Overview
This is the first course in a two-course sequence that exposes students to core methodologies in computational linguistics. After you successfully complete the sequence, you will be able to take graduate-level elective courses in computational linguistics, including courses offered by the Graduate Center's Linguistics Program and by the Computer Science Program. This course introduces computer programming at a level that will allow you to begin building applications that perform various computational linguistic tasks. No programming experience is required. The programming language we will use is Python. Although assignments, lectures and projects will be mostly text- and linguistics-oriented, the course provides sufficient general programming background for students from any discipline who want to learn programming. A 1-credit practicum, held in a computer lab, runs in conjunction with this course. The practicum covers the course material in a "hands-on" setting. It is strongly recommended that you take the practicum.
Class Policy
Come to Class. It will be difficult to do well in the class without regular attendance. There is no additional penalty for missing class.
Cell phones must be on silent, and are not to be checked or used during class - if you are expecting an urgent call, tell the instructor at the start of class.
Laptops are fine for taking notes. No internet, no chat, no games.
Cell phone and laptop policy: one warning, then 5 points off the next homework or exam for each further issue. The same policy applies to the instructor: one warning, after which everyone gets 5 points on the next homework or exam.
Grading Policy
Assignments: 50%
Final Project: 40%
Presentation: 10%
The Final Letter Grade will be based on a scaled adjustment of the Final Numeric Grade. When the scale has been determined, the class will be informed either in class or over email, and it will be posted to the course webpage (here).
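The weighting above can be sketched in Python as a simple weighted average. This is only an illustration, not the official grade calculation: the function name `final_numeric_grade` is hypothetical, the scores are made up, and it assumes that every assignment counts equally within the 50% assignment share and that the project and presentation are also scored out of 100. The scaled adjustment to a letter grade is not modeled here.

```python
# Weights taken from the Grading Policy above.
WEIGHTS = {"assignments": 0.50, "final_project": 0.40, "presentation": 0.10}


def final_numeric_grade(assignment_scores, final_project, presentation):
    """Return the weighted Final Numeric Grade on a 0-100 scale.

    Assumption: all assignments are weighted equally, so the assignment
    component is the plain mean of the individual scores.
    """
    assignments_avg = sum(assignment_scores) / len(assignment_scores)
    return (WEIGHTS["assignments"] * assignments_avg
            + WEIGHTS["final_project"] * final_project
            + WEIGHTS["presentation"] * presentation)


# Made-up scores for five assignments, a project, and a presentation.
grade = final_numeric_grade([90, 80, 100, 70, 85], 88, 95)
print(round(grade, 1))
```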
Assignment Policy
Do not cheat. You may discuss assignments with your classmates, but write or program your assignment alone. Do not ask for, or offer to share, code or written assignments. If you discuss an assignment with a classmate, or on an online forum, include the name of the classmate or the URL of the forum on your assignment or in the documentation of your code. The first instance of cheating results in an automatic zero for the assignment (or midterm or final). A second instance of cheating results in a zero (F) for the course. The Linguistics Department will be notified in writing of all instances of cheating. On a second instance, a report will be submitted to the Office of Academic Integrity.
Assignments will be posted to the website (here) after class on Fridays.
All assignments will be scored out of 100 points.
Assignments will be due just after the start of class. Written assignments should be emailed, or hard copies should be delivered in class.
Deliver assignments at the start of class, or email them with a timestamp before 11:50am, to avoid a late penalty. If you need an extension, let me know as soon as possible. I will do my best to be reasonable to you and fair to the rest of the class. An assignment delivered in person when you are more than 5 minutes late for class will be considered late. There is a 5 point late penalty for each 12 hours the assignment is late. Friday 11:50am - Friday 11:50pm: -5 points. Friday 11:50pm - Saturday 11:50am: -10 points. Saturday 11:50am - Saturday 11:50pm: -15 points. Saturday 11:50pm - Sunday 11:50am: -20 points.
Grades will be posted at 11:30am on Fridays, just before class. No assignments will be accepted after 11:50am two days after the due date. Assignments that were delivered on time will be returned during the following class.
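As a rough illustration of the late penalty arithmetic, here is a minimal Python sketch. It is not an official grading tool: the function name `late_penalty` is hypothetical, and it assumes that each started 12-hour window after the 11:50am deadline costs 5 points and that a submission more than two days late is not accepted. The due date in the example is the Assignment 1 due date from the schedule.

```python
import math
from datetime import datetime, timedelta

PENALTY_PER_WINDOW = 5           # points lost per started 12-hour window
WINDOW = timedelta(hours=12)     # length of one penalty window
CUTOFF = timedelta(days=2)       # assumed: nothing accepted after this


def late_penalty(due, submitted):
    """Return the points deducted, or None if the work is too late to accept."""
    lateness = submitted - due
    if lateness <= timedelta(0):
        return 0                 # on time: no penalty
    if lateness > CUTOFF:
        return None              # past the two-day cutoff
    # Count every started 12-hour window as a full window.
    windows = math.ceil(lateness / WINDOW)
    return PENALTY_PER_WINDOW * windows


due = datetime(2010, 9, 17, 11, 50)                     # Friday, 11:50am
print(late_penalty(due, datetime(2010, 9, 17, 15, 0)))  # Friday afternoon
print(late_penalty(due, datetime(2010, 9, 18, 9, 0)))   # Saturday morning
print(late_penalty(due, datetime(2010, 9, 19, 12, 0)))  # Sunday, after cutoff
```

Under these assumptions the three calls fall into the first window, the second window, and the period after the cutoff, respectively.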
After each assignment is graded, anonymous mean, median, maximum and minimum scores will be posted to this page.
Final Project
A relatively substantial programming final project will be required. This assignment should address a linguistics question or task from a computational perspective. Individual meetings will be held at or around the 6th week of class to help guide and formulate an appropriate project.
Text Book
Natural Language Processing with Python by Steven Bird, Ewan Klein and Edward Loper. O'Reilly. ISBN: 978-0596516499
This should be available through the bookstore, but may be found through other outlets at a discount.
The MIT OpenCourseWare site for Introduction to Computer Science and Programming has some publicly available Python resources [link].
Schedule
| Date | Material | Assignments |
|---|---|---|
| August 27 | Paper Work & Overview. pdf keynote math demos strings demo | Read Chapter 1. |
| September 3 | Language Processing with Python pdf keynote demos Practicum1 | Assignment 1 -- Introduction to Python and NLTK. |
| September 10 | No Class. Practicum2 | |
| Tuesday, September 14 | Text Corpora. Note: Class meets on Tuesday! pdf keynote demos stored function demo files [zip] practicum3 exercises practicum3 program | Read Chapter 2 |
| September 17 | Lists and Tuples pdf keynote demos demo python programs [zip] | Assignment 1 Due. Assignment 2 -- Processing Text |
| September 24 | No Class. | |
| October 1 | Guest Instructor: Matt Huenerfauth. Input-Output. pdf powerpoint practicum files | |
| October 8 | List Comprehensions and Dictionaries pdf keynote demos practicum5 | Assignment 3 -- File Processing Childes data |
| October 15 | Objects and Classes pdf keynote sample objects demos | |
| October 22 | Python Regular Expressions pdf keynote demo interpreter output | |
| October 29 | Processing Raw Text with Regular Expressions pdf keynote email processing program shells from recent practica | Assignment 3 Due. Assignment 4 -- Regular Expressions html files to process |
| November 5 | Part of Speech Tagging with NLTK pdf keynote demo practicum tagging example | Read Chapter 4 |
| November 12 | Parsing with NLTK pdf keynote demo | |
| November 19 | Using WordNet pdf keynote demo practicum example practicum ppt | Assignment 4 Due. Assignment 5 -- Corpus Statistics Files to process |
| November 26 | No Class. | |
| December 3 | Machine Learning Primer | |
| December 10 | Work on Projects -- No Class | |
| December 17 | Project Presentations | |