LING 83800: Methods in Computational Linguistics II

Spring 2015
Wednesdays 11:45am - 1:45pm
Location CUNY GC Room 7395
Instructor: Andrew Rosenberg
Office Hours: By Appointment GC 4420

Course Description

This is the second of a two-part course sequence to train students with a linguistics background in the core methodologies of computational linguistics. Successful completion of this two-course sequence will enable students to take graduate-level elective courses in computational linguistics, both those offered by the Graduate Center's Linguistics Program and those offered by the Computer Science Program. This course will provide training in the use of computational libraries built specifically for computational linguistics; the techniques used in performing computational analyses of electronic natural language corpora; and the foundational mathematics, probabilistic methods, and statistics that are the backbone of modern computational linguistics. The course will go significantly beyond a survey of these topics. By completing the Methods in Computational Linguistics sequence, Computational Linguistics Master's students will, by the end of their first year, have the skills they need to engage in further study of state-of-the-art topics in natural language processing.
Successful completion of Methods in Computational Linguistics I is a prerequisite for this course.

Learning Outcomes

Upon successful completion of this course, a student can expect to:

  1. Be able to extract and analyze statistics from text corpora.
  2. Understand foundational tasks in Computational Linguistics (tagging, parsing, segmentation).
  3. Have the ability to comprehend a Computational Linguistics conference paper (e.g., from ACL).


Textbook

Natural Language Processing with Python by Steven Bird, Ewan Klein and Edward Loper. O'Reilly.
ISBN: 978-0596516499
Also available online at:
This should be available through the bookstore, but may be found through other outlets at a discount.

Class Policies

Come to Class. It will be difficult to do well in the class without regular attendance. There is no penalty for missing up to 3 classes. Each missed class beyond 3 will reduce the maximum Attendance and Participation grade by 1%, down to a floor of 5%. (To get 5 points of Participation while missing more than 8 classes, you'd better be doing something outrageous when you're there.)
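
As a rough sketch of the attendance arithmetic above, the cap on the Attendance and Participation grade could be computed as follows. This assumes that the first 3 absences carry no penalty and that the ceiling is the 10% weight listed under Grading Policy. The function name is made up for illustration and is not part of any course tooling.

```python
def max_participation_percent(missed_classes: int) -> int:
    """Cap on the Attendance and Participation grade, in percentage points.

    Hypothetical helper: assumes a 10% ceiling, 3 penalty-free absences,
    a 1% reduction per additional absence, and a floor of 5%.
    """
    full_credit = 10    # Attendance and Participation weight (Grading Policy)
    free_absences = 3   # assumed: no penalty for the first 3 missed classes
    floor = 5           # the cap never drops below 5%
    penalty = max(0, missed_classes - free_absences)  # 1% per class beyond 3
    return max(floor, full_credit - penalty)


# Print the cap for a few absence counts.
for missed in (0, 3, 4, 8, 12):
    print(missed, max_participation_percent(missed))
```

Under these assumptions the cap reaches its floor at 8 missed classes and stays there for any further absences.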

Cell phones must be on silent, and are not to be checked or used during class. If you are expecting an urgent call, tell the instructor at the start of class.

Laptops, tablets or lab computers are welcome in class.

Cell phone policy: One warning; after that, 5 points off the next homework for each incident. The same policy applies to the instructor: one warning; after that, everyone gets 5 points on the next homework.

Grading Policy

Assignments: 90% (6 x 15%)

Attendance and Participation: 10%

The Final Letter Grade will be based on a scaled adjustment of the Final Numeric Grade. Note that there may be no scaled adjustment; this depends on the performance of the class as a whole. When the scale has been determined, the class will be informed either in class or over email, and it will be posted to the course webpage.
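
To illustrate how the weights above combine, here is a minimal sketch of the Final Numeric Grade before any scaled adjustment. It assumes each of the 6 assignments is scored out of 100 and counts for 15%, and that Attendance and Participation contributes between 0 and 10 points. The function and parameter names are hypothetical, and the scaling to a letter grade is deliberately left out because it is decided later.

```python
def final_numeric_grade(assignment_scores, participation_points):
    """Combine six assignment scores (each out of 100) with participation (0-10).

    Hypothetical helper: returns the unscaled numeric grade out of 100.
    """
    if len(assignment_scores) != 6:
        raise ValueError("expected exactly 6 assignment scores")
    if not 0 <= participation_points <= 10:
        raise ValueError("participation must be between 0 and 10 points")
    # Each assignment is worth 15% of the final grade (6 x 15% = 90%).
    assignment_part = 0.15 * sum(assignment_scores)
    return assignment_part + participation_points


# Example with made-up scores, for illustration only.
print(final_numeric_grade([100, 90, 80, 95, 85, 70], 10))
```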

Assignment Policy

Do not cheat. You may discuss assignments with your classmates, but write or program your assignment alone. Do not ask for or offer to share code or written assignments. If you discuss an assignment with a classmate, or on an online forum, include the name of the classmate or the URL of the forum on your assignment or in the documentation of your code. The first instance of cheating results in an automatic zero for the assignment (or final project). A second instance of cheating results in a zero (F) for the course. The Computer Science Department will be notified in writing of all instances of cheating. On a second instance, a report will be submitted to the Office of Academic Integrity.

Assignments will be posted to the course website after class on the date that they are assigned.

All assignments will be scored out of 100 points.

There are 6 assignments. Each assignment will have a theoretical (pen-and-paper) component and/or an implementation (coding) component.

Assignments will be due by 11:59pm on their due date. Assignments should be delivered electronically, via email.

Deliver assignments with a timestamp before 11:59pm on the due date to avoid a late penalty. If an extension is needed, let me know as soon as possible. I will do my best to be reasonable to you and fair to the rest of the class.

Incomplete Policy

In extenuating circumstances, students may be given an Incomplete if material has not been completed by the end of the semester. When an Incomplete is granted, the student and instructor will specify, in writing, a timeframe for all outstanding material to be submitted. If no other timeframe has been specified in writing, the deadline for all outstanding material to be submitted to resolve an Incomplete will be one month following the last meeting of the class. This semester, that would make the deadline June 20. An Incomplete that is not resolved by the deadline will become an F.


Schedule (Tentative)

Week 1: Wednesday, January 28
  Material: Welcome. Introduction to NLTK. Sorting and Searching.
  Assignments: Read Chapter 1 of NLTK book.

Week 2: Wednesday, February 4
  Material: Counting Things: Probability, Bayes Rule. NLTK: FreqDist.
  Assignments: Assignment 1 Out.

Week 3: Wednesday, February 11
  Material: Counting More Things. NLTK: ConditionalFreqDist. [pptx] [sample code]

Wednesday, February 18
  No Classes. GC Classes follow Monday Schedule.

Week 4: Wednesday, February 25
  Material: Matching Things. Regular Expressions.
  Assignments: Assignment 1 Due. Assignment 2 Out. [input files]

Week 5: Wednesday, March 4
  Material: Annotating Things. List Comprehensions. zip, map and lambda.

Week 6: Wednesday, March 11
  Material: Relating Things. Lexical Annotation. [format conversion ipython notebook] [format conversion html] [more regular expressions ipython notebook] [more regular expressions html]
  Assignments: Assignment 2 Due.

Week 7: Wednesday, March 18
  Material: WordNet. [wordnet demo] Data Structures (Trees and Graphs).
  Assignments: Assignment 3 Out.

Week 8: Wednesday, March 25
  Material: Unix utilities. [pdf] Dynamic Programming. Minimum Edit Distance. Fibonacci example, making change example, minimum edit distance example.

Week 9: Wednesday, April 1
  Material: Pandas and scikit-learn.
  Assignments: Assignment 3 Due. Assignment 4 Out.

Wednesday, April 8
  No Class: Spring Recess.

Week 10: Wednesday, April 15
  Material: Classifying Things. Machine Learning in Computational Linguistics. Using classification. [pdf] [ipython notebook]

Week 11: Wednesday, April 22
  Material: More Machine Learning. Interaction between NLTK and scikit-learn. Part of Speech Tagging. Training a Tagger in NLTK. [pdf] [ipynb] [html]

Week 12: Wednesday, April 29
  Material: Plotting and Graphics. [pdf] [ipynb] [html]
  Assignments: Assignment 4 Due. Assignment 5 Out.

Week 13: Wednesday, May 6
  Material: Sentiment Analysis. [pdf] [ipynb] [html]

Week 14: Wednesday, May 13
  Material: Textual Entailment. Unicode and Polyglot.
  Assignments: Assignment 5 Due. Assignment 6 Out.

Wednesday, May 20
  Final Exam Week. No Class.
  Assignments: Assignment 6 Due.