LING 78100 - Methods in Computational Linguistics I
Graduate Center 4422
Fall 2010
Friday 11:45am - 1:45pm

Instructor: Andrew Rosenberg
Office Hours: Fridays 10:30-11:30am, Room 4420


This is the first of a two-course sequence that exposes students to core methodologies in computational linguistics. After you successfully complete this two-course sequence, you will be able to take graduate-level elective courses in computational linguistics, including courses offered by the Graduate Center's Linguistics Program as well as courses offered by the Computer Science Program. This course will introduce you to computer programming at a level that will allow you to begin building computer applications that perform various computational linguistic tasks. No programming experience is required whatsoever. The programming language we will use is Python. Although assignments, lectures and projects will be mostly text/linguistics-oriented, the course provides sufficient 'general' programming background for students from any discipline who want to learn programming. A 1-credit practicum, held in a computer lab, runs in conjunction with this course. The practicum covers the course material in a "hands-on" setting. It is strongly recommended that you take the practicum.

Class Policy

Come to Class. It will be difficult to do well in the class without regular attendance. There is no additional penalty for missing class.

Cell phones must be on silent, and are not to be checked or used during class. If you are expecting an urgent call, tell the instructor at the start of class.

Laptops are fine for taking notes. No internet, no chat, no games.

Cell phone and Laptop policy: One warning, after that 5 points off the next homework or exam for each issue. Same policy for the instructor. One warning, after that, everyone gets 5 points on the next homework or exam.

Grading Policy

Assignments: 50%

Final Project: 40%

Presentation: 10%

The Final Letter Grade will be based on a scaled adjustment of the Final Numeric Grade. When the scale has been determined, the class will be informed either in class or over email, and it will be posted to the course webpage (here).
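As a rough illustration of how the weights above combine into a Final Numeric Grade, here is a minimal Python 3 sketch. It assumes the assignment component is the plain average of the assignment scores, which the policy does not state, and the function name is hypothetical. The scaled adjustment to a letter grade is not modeled, since the scale is determined later.

```python
# Component weights taken from the Grading Policy above.
WEIGHTS = {"assignments": 0.50, "final_project": 0.40, "presentation": 0.10}


def final_numeric_grade(assignment_scores, final_project, presentation):
    """Combine component scores (each out of 100) into one numeric grade.

    Assumption: the assignment component is the unweighted mean of all
    assignment scores. The syllabus does not say how they are combined.
    """
    assignments_avg = sum(assignment_scores) / len(assignment_scores)
    return (WEIGHTS["assignments"] * assignments_avg
            + WEIGHTS["final_project"] * final_project
            + WEIGHTS["presentation"] * presentation)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    print(final_numeric_grade([90, 80, 100, 70, 60], 85, 90))
```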

Assignment Policy

Do not cheat. You may discuss assignments with your classmates, but write or program your assignment alone. Do not ask for or offer to share code, or written assignments. If you discuss an assignment with a classmate, or on an online forum, include the name of the classmate or URL of the forum on your assignment or in the documentation of your code. The first instance of cheating results in an automatic zero for the assignment (or midterm or final). A second instance of cheating results in a zero (F) for the course. The Linguistics Department will be notified in writing of all instances of cheating. On a second instance a report will be submitted to the Office of Academic Integrity.

Assignments will be posted to the website (here) after class on Fridays.

All assignments will be scored out of 100 points.

Assignments will be due just after the start of class. Written assignments should be emailed, or hard copies should be delivered in class.

Deliver assignments at the start of class or email with a timestamp before 11:50am to avoid a late penalty. If an extension is needed, let me know as soon as possible. I will do my best to be reasonable to you and fair to the rest of class. An assignment delivered by someone more than 5 minutes late for class will be considered Late. There is a 5 point Late Penalty for each 12 hours late the assignment is delivered:
Friday 11:50am - Friday 11:50pm: -5 points.
Friday 11:50pm - Saturday 11:50am: -10 points.
Saturday 11:50am - Saturday 11:50pm: -15 points.
Saturday 11:50pm - Sunday 11:50am: -20 points.

Grades will be posted at 11:30am on Fridays, just before class. No assignments will be accepted after 11:50am two days after the assignment was due. Assignments that were delivered on time will be returned during the following class.
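The late penalty arithmetic described above can be sketched in Python 3 as follows. This is only an illustration, not an official grading tool: the function name is hypothetical, and it assumes a submission stamped exactly at the deadline counts as on time and that each 12-hour window includes its closing minute.

```python
import math
from datetime import datetime, timedelta

PENALTY_PER_PERIOD = 5            # points deducted per late period
PERIOD = timedelta(hours=12)      # length of one late period
CUTOFF = timedelta(days=2)        # nothing is accepted after this


def late_penalty(deadline, submitted):
    """Return the points deducted, or None if the assignment is not accepted.

    Assumption: a timestamp equal to the deadline is treated as on time.
    """
    lateness = submitted - deadline
    if lateness <= timedelta(0):
        return 0
    if lateness > CUTOFF:
        return None
    # Round up to whole 12-hour periods: 1 hour late is 1 period, 13 hours is 2.
    periods = math.ceil(lateness / PERIOD)
    return PENALTY_PER_PERIOD * periods


if __name__ == "__main__":
    # Friday, September 17, 2010 at 11:50am, the Assignment 1 due date.
    deadline = datetime(2010, 9, 17, 11, 50)
    for hours in (0, 1, 13, 25, 47, 49):
        print(hours, late_penalty(deadline, deadline + timedelta(hours=hours)))
```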

After each assignment is graded, anonymous mean, median, maximum and minimum scores will be posted to this page.

Final Project

A relatively substantial programming final project will be required. This assignment should address a linguistics question or task from a computational perspective. Individual meetings will be held at or around the 6th week of class to help guide and formulate an appropriate project.

Text Book

Natural Language Processing with Python by Steven Bird, Ewan Klein and Edward Loper. O'Reilly. ISBN: 978-0596516499

This should be available through the bookstore, but may be found through other outlets at a discount.

The MIT OpenCourseWare site for Introduction to Computer Science and Programming has some publicly available Python resources [link].


Date Material Assignments
August 27 Paper Work & Overview.
pdf keynote math demos strings demo
Read Chapter 1.
September 3 Language Processing with Python
pdf keynote demos Practicum1
Assignment 1 -- Introduction to Python and NLTK.
September 10 No Class. Practicum2
September 14 Text Corpora. Note: Class meets on Tuesday!
pdf keynote demos stored function demo files [zip]
practicum3 exercises practicum3 program
Read Chapter 2
September 17 Lists and Tuples
pdf keynote demos demo python programs [zip]
Assignment 1 Due.
Assignment 2 -- Processing Text
September 24 No Class.
October 1 Guest Instructor: Matt Huenerfauth. Input-Output.
pdf powerpoint
practicum files
October 8 List Comprehensions and Dictionaries
pdf keynote demos practicum5
Assignment 3 -- File Processing
Childes data
October 15 Objects and Classes
pdf keynote sample objects demos
October 22 Python Regular Expressions
pdf keynote demo interpreter output
October 29 Processing Raw Text with Regular Expressions
pdf keynote email processing program shells from recent practica
Assignment 3 Due.
Assignment 4 -- Regular Expressions html files to process
November 5 Part of Speech Tagging with NLTK
pdf keynote demo practicum tagging example
Read Chapter 4
November 12 Parsing with NLTK
pdf keynote demo
November 19 Using WordNet
pdf keynote demo practicum example practicum ppt
Assignment 4 Due.
Assignment 5 -- Corpus Statistics Files to process
November 26 No Class.
December 3 Machine Learning Primer  
December 10 Work on Projects -- No Class
December 17 Project Presentations