Natural Language Processing
(Language Technology)

Fall 2015
Wednesday 11:45-1:45
Location CUNY GC Room 3207
Instructor: Andrew Rosenberg (andrew_at_cs.qc.cuny.edu)
Office Hours: by appointment

Course Description

Natural Language Processing (NLP) is an interdisciplinary subfield of computer science concerned with how computers interact with language. Drawing heavily on linguistics, NLP underlies a wide array of technologies that we use on a daily basis, including search, recommendation systems, speech recognition and machine translation.

This course is an introduction to NLP. The first half of the course will build up the machinery of the Finite State Machine through the example of speech recognition. The second half of the course will cover syntactic parsing, semantics and discourse, ending with machine translation.

This course satisfies the "Language Technology" requirement of the CUNY Graduate Center Computational Linguistics MA/PhD Certificate Program. Linguistics students must have successfully completed Methods in Computational Linguistics I and II.

Schedule

Date Material Assignments
Wednesday, September 2 Week 1 - Structure of Language
[pdf] [keynote]
Readings:
  • Jurafsky and Martin, 2nd Edition (J&M) Chapter 1
Wednesday, September 9 Week 2 - Regular Expressions and Automata
[pdf] [keynote]
Readings:
  • J&M Chapter 2
Assignment 1: Due Sept. 24
Wednesday, September 16 Week 3 - Morphology
(Guest Lecturer: David Guy Brizan)
[Google Docs]
Readings:
  • J&M Chapter 3
Wednesday, September 23 Graduate Center Closed. No Classes
Wednesday, September 30 Week 4 - Pronunciation and Spelling
(Guest Lecturer: Rivka Levitan)
[Google Docs]
Readings:
  • J&M Chapters 7 and 9
Wednesday, October 7 Week 5 - N-grams and Language Modeling
(Guest Lecturer: Michelle Morales)
[PowerPoint Online]
Readings:
Assignment 2: Due Oct. 26
Wednesday, October 14 Week 6 - HMMs and Speech Recognition
(Guest Lecturer: Ali Raza Syed)
[pdf]
Readings:
  • J&M Chapters 6 and 9
Wednesday, October 21 Week 7 - Intonation
[pdf] [keynote]
Readings:
  • None
Wednesday, October 28 Week 8 - Machine Learning
[pdf] [keynote]
Readings:
  • None
Assignment 3: Due Nov. 11
Wednesday, November 4 Week 9 - POS Tagging
[pdf]
Readings:
  • J&M2ed Chapters 5 and 6
Wednesday, November 11 Week 10 - Syntactic Parsing (week 1 of 2)
[pdf]
Readings:
  • J&M2ed Chapters 12, 13 and 14
Wednesday, November 18 Week 11 - Syntactic Parsing (week 2 of 2)
Readings:
Assignment 4: Due Dec. 2
Wednesday, November 25 Week 12 - Semantics
[pdf]
Readings:
  • J&M2ed Chapters 18, 19 and 22
Wednesday, December 2 Week 13 - Discourse and Dialog
[pdf]
Readings:
  • J&M2ed Chapters 21 and 24
Assignment 5: Due Dec. 16
Wednesday, December 9 Week 14 - Machine Translation
Readings:
  • J&M2ed Chapter 25
Wednesday, December 16 Finals Week - Final Exam

Learning Outcomes

Upon successful completion of this course, a student can expect to:

  1. Understand the basics of papers published at ACL, NAACL and EMNLP (though without exposure to every technical detail)
  2. Understand the main components of linguistic structure
  3. Understand Finite State Automata and Finite State Transducers
  4. Understand the basic components of speech recognition
  5. Understand the main approaches to syntactic parsing
  6. Understand some basic approaches to machine translation
  7. Understand some machine learning algorithms, including the perceptron and neural networks

Textbook

Required: Speech and Language Processing, Author: Jurafsky and Martin, Publisher: Prentice Hall, Edition: 2, ISBN: 978-0131873216

Class Policies

Come to class.

Cell phones on silent.

Do the readings.

Do not cheat.

Grading Policy

Assignments: 65%

Final Exam: 30%

Participation: 5%
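
As a rough illustration of how the weights above combine, the following Python sketch computes an overall course score. The function name course_score, the equal weighting of assignments, and the 0-100 scale for the final exam and participation are assumptions made for this example; they are not stated in the policy.

```python
# Weights taken from the Grading Policy section.
WEIGHTS = {"assignments": 0.65, "final_exam": 0.30, "participation": 0.05}


def course_score(assignment_scores, final_exam, participation):
    """Return a weighted course score on a 0-100 scale.

    Assumptions (not stated in the syllabus): every assignment counts
    equally, and all inputs are on a 0-100 scale.
    """
    if not assignment_scores:
        raise ValueError("at least one assignment score is required")
    assignment_avg = sum(assignment_scores) / len(assignment_scores)
    return (
        WEIGHTS["assignments"] * assignment_avg
        + WEIGHTS["final_exam"] * final_exam
        + WEIGHTS["participation"] * participation
    )


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    print(course_score([90, 80, 100, 70, 85], final_exam=80, participation=100))
```

With the hypothetical scores shown, the assignment average is 85, so the weighted sum is 0.65 x 85 + 0.30 x 80 + 0.05 x 100 = 84.25.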

Assignment Policy

Do not cheat. You may discuss assignments with your classmates, but write or program your assignment alone. Do not ask for or offer to share code or written assignments. If you discuss an assignment with a classmate or on an online forum, include the name of the classmate or the URL of the forum on your assignment or in the documentation of your code. The first instance of cheating results in a zero (F) for the course. The Computer Science and/or Linguistics Program Executive Officer(s) will be notified in writing of all instances of academic dishonesty.

Assignments will be posted to this website after class on the date that they are assigned.

All assignments will be scored out of 100 points.

There are 5 assignments. Most assignments will have a theoretical (pen-and-paper) component and an implementation (coding) component.

Unless otherwise noted, assignments will be due by 11:59pm on their due date. Assignments should be delivered electronically, via email to the instructor.

No late assignments will be accepted. If you need an extension, let me know as early as possible. I will do my best to be reasonable to you and fair to the rest of the class. No extensions will be granted less than 24 hours before the assignment is due.
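
The timing rules above can be sketched in Python as follows. This is only an illustration: the helper names (due_datetime, is_on_time, can_request_extension) are hypothetical, times are treated as naive local times, and "by 11:59pm" is read as "no later than 23:59:00", which is an assumption.

```python
from datetime import date, datetime, time, timedelta

DUE_TIME = time(23, 59)  # 11:59pm on the due date


def due_datetime(due_date):
    """Combine a due date with the 11:59pm cutoff."""
    return datetime.combine(due_date, DUE_TIME)


def is_on_time(submitted_at, due_date):
    """True if a submission arrives no later than the cutoff (assumed inclusive)."""
    return submitted_at <= due_datetime(due_date)


def can_request_extension(requested_at, due_date):
    """True if the request is made at least 24 hours before the cutoff."""
    return requested_at <= due_datetime(due_date) - timedelta(hours=24)


if __name__ == "__main__":
    # Assignment 1 is listed as due Sept. 24; the year comes from "Fall 2015".
    a1_due = date(2015, 9, 24)
    print(is_on_time(datetime(2015, 9, 24, 23, 0), a1_due))
    print(can_request_extension(datetime(2015, 9, 24, 9, 0), a1_due))
```

Individual assignments may set different deadlines, so any real check would need to follow the instructions on the assignment itself.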

Coding Assignments

If an assignment has a programming component, it can be written in C++, Java or Python.

In general, grading of coding assignments will be 65% Implementation (compilation, passing tests, implementation details) and 35% Documentation and Style. This may be adjusted for some assignments. Always read the assignment for the grading breakdown.

Detailed requirements will accompany each assignment. The instructions and requirements on a particular assignment always take precedence over the general guidelines on the course website.

Coding assignments should be submitted electronically. Submitting multiple times is fine. The latest submission received on time will be graded.

README guidelines

Each coding assignment will require a README file as a component of its documentation. A README file should provide a high-level description of your assignment or project.

A successful README file will include the following:

Written Assignments

Written assignments should also be delivered electronically, via email or another sharing service such as Dropbox or Google Drive.

Electronic copies must be in one of the following formats: .txt (plaintext), .pdf, Microsoft Word .doc/.docx, Google Drive/Docs.

Points for each question will be described in each assignment.

Incomplete Policy

In extenuating circumstances, students may be given an Incomplete if material has not been completed by the end of the semester. When an Incomplete is granted, the student and instructor will specify, in writing, a timeframe for submitting all outstanding material. If no other timeframe has been specified in writing, the deadline for submitting all outstanding material to resolve an Incomplete will be one month following the last meeting of the class. This semester, the deadline will be January 16. An Incomplete that is not resolved by the deadline will become an F or the grade corresponding to the completed material.