LING 83800 - Methods in Computational Linguistics II

Spring 2012
Friday 9:30am - 11:30am
GC 7314
Instructor: Andrew Rosenberg (andrew_at_cs.qc.cuny.edu)
Office Hours: Friday 4:00pm-5:00pm GC 4420

Course Description

This is the second of a two-part course sequence that trains students with a linguistics background in the core methodologies of computational linguistics. Successful completion of this two-course sequence will enable students to take graduate-level elective courses in computational linguistics, both those offered by the Graduate Center's Linguistics Program and those offered by the Computer Science Program. This course will provide training in: the use of computational libraries built specifically for computational linguistics; the techniques used to perform computational analyses of electronic natural language corpora; and the foundational mathematics, probabilistic methods, and statistics that are the backbone of modern computational linguistics. The course will go significantly beyond a survey of these topics. By completing the Methods in Computational Linguistics sequence at the end of their first year, Computational Linguistics Master's students will have the skills they need to engage in further study of state-of-the-art topics in natural language processing.
Successful completion of Methods in Computational Linguistics I is a prerequisite for this course.

Learning Outcomes

Upon successful completion of this course, a student can expect to:

  1. Be able to extract and analyze statistics from text corpora
  2. Understand foundational tasks in Computational Linguistics: tagging, parsing, segmentation
  3. Have the ability to comprehend a Computational Linguistics conference paper (e.g., ACL).

Textbook

Natural Language Processing with Python by Steven Bird, Ewan Klein and Edward Loper. O'Reilly.
ISBN: 978-0596516499
Also available online at: http://www.nltk.org/book
This should be available through the bookstore, but may be found through other outlets at a discount.

Class Policies

Come to Class. It will be difficult to do well in the class without regular attendance. There is no additional penalty for missing class.

Cell phones must be on silent and are not to be checked or used during class. If you are expecting an urgent call, tell the instructor at the start of class.

Laptops, tablets, or lab computers are welcome in class. There will be some interactive exercises.

Cell phone and laptop policy: after one warning, each further violation costs 5 points on the next homework. The same policy applies to the instructor: after one warning, each further violation gives everyone 5 extra points on the next homework.

Grading Policy

Assignments: 80% (5 x 16%)

Attendance and write-up of seminars: 20%

The Final Letter Grade will be based on a scaled adjustment of the Final Numeric Grade. When the scale has been determined, the class will be informed either in class or over email, and it will be posted to the course webpage (here).
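As a worked example of the weights above, the final numeric grade is a weighted sum: each of the five assignments (scored out of 100) contributes 16%, and the seminar component contributes 20%. The sketch below assumes the seminar component is also reported as a score out of 100; the function name `final_numeric_grade` is hypothetical, and it computes only the numeric grade, not the letter-grade scaling, which has not yet been determined.

```python
def final_numeric_grade(assignment_scores, seminar_score):
    """Return the weighted final numeric grade on a 0-100 scale.

    assignment_scores: the five assignment scores, each out of 100 (16% each).
    seminar_score: seminar attendance/write-up score, ASSUMED to be out of 100 (20%).
    """
    if len(assignment_scores) != 5:
        raise ValueError("expected exactly 5 assignment scores")
    scores = list(assignment_scores) + [seminar_score]
    if any(not 0 <= s <= 100 for s in scores):
        raise ValueError("scores must be between 0 and 100")
    # 5 assignments x 16% = 80% of the grade
    assignment_part = 0.16 * sum(assignment_scores)
    # Seminar attendance and write-ups = 20% of the grade
    seminar_part = 0.20 * seminar_score
    return assignment_part + seminar_part


# Example: five assignment scores plus a full seminar score.
print(round(final_numeric_grade([90, 85, 100, 70, 95], 100), 2))  # prints 90.4
```

Because each assignment carries 16% of the grade, a 5-point deduction on one homework (as under the cell phone and laptop policy) lowers the final numeric grade by 5 × 0.16 = 0.8 points.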

Assignment Policy

Do not cheat. You may discuss assignments with your classmates, but write or program your assignment alone. Do not ask for, or offer to share, code or written assignments. If you discuss an assignment with a classmate or on an online forum, include the classmate's name or the forum's URL on your assignment or in the documentation of your code. The first instance of cheating results in an automatic zero for the assignment (or final project). A second instance of cheating results in a zero (F) for the course. The Computer Science Department will be notified in writing of all instances of cheating. After a second instance, a report will be submitted to the Office of Academic Integrity.

Assignments will be posted to the website (here) after class on the date that they are assigned.

All assignments will be scored out of 100 points.

There are 5 assignments. Each assignment will have a theoretical (pen-and-paper) component. Assignments may also include an implementation (coding) component.

Assignments will be due by 11:59pm on their due date. Assignments should be delivered electronically, via email.

Deliver assignments with a timestamp before 11:59pm on the due date to avoid a late penalty. If you need an extension, let me know as soon as possible. I will do my best to be reasonable to you and fair to the rest of the class.

Incomplete Policy

In extenuating circumstances, students may be given an Incomplete if material has not been completed by the end of the semester. When an Incomplete is granted, the student and instructor will specify, in writing, a timeframe for all outstanding material to be submitted. If no other timeframe has been specified in writing, the deadline for submitting all outstanding material to resolve an Incomplete will be one month following the last meeting of the class. This semester, that would make the deadline: June 18. An Incomplete that is not resolved by the deadline will become an F.

Seminar Attendance and Writeup

Students are expected to attend three CUNY-NLP seminars and write a two-page summary of each speaker's talk. Each summary should include: 1) the topic and motivation of the work, 2) a description of the approach discussed, and 3) a critical response to the work: major strengths and weaknesses, and potential improvements or other applications of the approach.

Schedule

Date Material (Tentative) Assignments
Friday, January 27 No Class.
Friday, February 3 Welcome. Introduction to NLTK.
Sorting and Searching.
[pptx]
Read Chapter 1 of NLTK book.
Friday, February 10 Counting Things: Probability, Bayes Rule. NLTK: FreqDist
[pptx]
Friday, February 17 Counting More Things.
Conditionals
NLTK: ConditionalFreqDist
[pptx]
Homework 1 Assigned
Friday, February 24 Matching Things
Regular Expressions.
Friday, March 2 Annotating Things
Corpus Construction.
Part-of-speech tagging
Parsing
More Regular Expressions
[pptx]
[idle output]
[demo python code]
Homework 1 Due
Friday, March 9 Relating Things
Assessing word similarity
WordNet
Co-occurrences
Word classes
List Comprehensions.
[pptx]
[idle output]
[demo python code]
Homework 2 Assigned html files
Friday, March 16 Go to Computational Presentations as part of CUNY 2012.
Friday, March 23 Structuring Things
Functions
Object Oriented Programming
Data structures: 2d lists, trees, graphs.
[pptx]
Homework 2 Due
Homework 3 Assigned
Friday, March 30 Recursion.
Inheritance.
Dynamic Programming.
[pptx]
[inh.py]
[tree.py]
[shape.py]
[class_example.py]
Homework 3 Due on April 2
Friday, April 6 No Class: Spring Recess
Friday, April 13 No Class: Spring Recess
Friday, April 20 Classifying Things
Machine Learning in Computational Linguistics
Using NLTK classification routines.
Evaluation
key
demo code
Homework 4 Assigned
Friday, April 27 Part-of-speech Tagging
Dictionaries
Training a Tagger in NLTK
key
Friday, May 4 Dynamic Programming
Minimum Edit Distance
key
fibonacci example
making change example
minimum edit distance example
Homework 5 Assigned. Homework 4 Due
Friday, May 11 Last Day of Class
Plotting and Graphics with nltk and matplotlib
Segmentation.
Textual Entailment.
key
examples
Tuesday, May 22 No class -- Finals week. Homework 5 Due