Overview
This class will teach the foundations of spoken language processing.
We will build from the fundamentals of modern speech recognition and
speech synthesis. This work will be motivated by spoken dialog systems
and conversational agents as examples of machine speech perception and
production.
By the end of the semester, students will have a basic
knowledge of the core problems and the state of the art in spoken language
processing, and be prepared to participate in a semester-long spoken
language processing seminar course.
Class Policy
Come to Class. It will be difficult to do well in the class without regular attendance. There is no additional penalty for missing class.
Cell phones must be on silent, and are not to be checked or used during class - if you are expecting an urgent call, tell the instructor at the start of class.
Laptops are fine for taking notes. No internet, no chat, no games.
Cell phone and Laptop policy: One warning, after that 5 points off the next homework or exam for each issue. Same policy for the instructor. One warning, after that, everyone gets 5 points on the next homework or exam.
Grading Policy
Midterm: 25%
Project in 5 parts: 75% (35% final project, plus 10% per milestone assignment)
The Final Letter Grade will be based on a scaled adjustment of the Final Numeric Grade. When the scale has been determined, the class will be informed either in class or over email, and it will be posted to the course webpage (here).
Assignment Policy
Do not cheat. You may discuss assignments with your classmates, but write or program your assignment alone. Do not ask for or offer to share code or written assignments. If you discuss an assignment with a classmate, or on an online forum, include the name of the classmate or the URL of the forum on your assignment or in the documentation of your code. The first instance of cheating results in an automatic zero (F) for the course. The Computer Science Department will be notified in writing of all instances of cheating and a report will be submitted to the Office of Academic Integrity.
Assignments will be posted to the website (here) after class on Tuesdays.
All assignments will be scored out of 100 points.
Assignments will be due just after the start of class, 3:10pm. Written assignments should be emailed or delivered as hard copies in class.
Deliver assignments at the start of class or email with a timestamp before 3:10pm to avoid a late penalty. If an extension is needed, let me know as soon as possible. I will do my best to be reasonable to you and fair to the rest of the class. An assignment delivered by a student who arrives more than 5 minutes late for class will be considered Late. There is a 5 point Late Penalty for each 12 hours late the assignment is delivered. Due Date (DD) 3:10pm to DD+1 3:10am: -5 points. DD+1 3:10am to DD+1 3:10pm: -10 points. DD+1 3:10pm to DD+2 3:10am: -15 points. DD+2 3:10am to DD+2 3:10pm: -20 points.
After 6:30, 2 days after an assignment was due, no assignments will be accepted.
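The late penalty windows above can be sketched as a small calculation. This is only an illustration, not the grading script used for the course: the function name `late_penalty` is hypothetical, a submission exactly on a window boundary is assumed to fall in the earlier window, and anything later than the four published windows (DD+2 3:10pm) is reported as outside the table rather than guessed at.

```python
import math
from datetime import datetime, timedelta
from typing import Optional

PENALTY_PER_WINDOW = 5          # points lost per 12-hour window
WINDOW = timedelta(hours=12)    # length of one late window
MAX_WINDOWS = 4                 # the policy lists four windows (up to DD+2 3:10pm)


def late_penalty(due: datetime, submitted: datetime) -> Optional[int]:
    """Return the points deducted, or None if past the published windows.

    Assumption: boundaries are inclusive on the late end, so a submission
    exactly 12 hours late still loses only 5 points.
    """
    if submitted <= due:
        return 0
    windows = math.ceil((submitted - due) / WINDOW)
    if windows > MAX_WINDOWS:
        return None
    return windows * PENALTY_PER_WINDOW


# Hypothetical due date, chosen only for illustration.
due = datetime(2000, 1, 3, 15, 10)
print(late_penalty(due, due + timedelta(hours=13)))  # 13 hours late: second window, prints 10
```

A return value of `None` means the submission falls after the last published window, where the cutoff in the policy applies.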
After each assignment and the midterm are graded, anonymous mean, median, maximum and minimum scores will be distributed to the class.
Coding Assignments
Assignments will be written in C++, Java or Python and must compile using g++ on venus.cs.qc.cuny.edu with no unsubmitted, external libraries.
In general, grading will be 15% Compilation, 15% Execution, 35% Correctness (15% passing tests, 20% implementation details), 35% Documentation and Style. This may be adjusted for some assignments. Always read the assignment for the grading breakdown.
Testing will be performed automatically. Sample tests will be delivered with each assignment. If code does not operate using the published and distributed testing format, the assignment will be considered Incorrect and zero "Correctness" points will be awarded.
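As a rough illustration of how the general breakdown combines into a score out of 100, here is a minimal sketch. The component names, the `assignment_score` function, and the idea of scoring each component as a fraction between 0 and 1 are assumptions made for the example; individual assignments may use a different breakdown.

```python
# General weights from the policy above, in points out of 100.
# Correctness (35) is split into passing tests (15) and implementation details (20).
WEIGHTS = {
    "compilation": 15,
    "execution": 15,
    "tests": 15,
    "implementation": 20,
    "documentation_style": 35,
}


def assignment_score(fractions, uses_test_format=True):
    """Combine per-component fractions (0.0 to 1.0) into a score out of 100.

    If the code does not follow the distributed testing format, both
    Correctness components are zeroed, as the policy describes.
    """
    earned = dict(fractions)
    if not uses_test_format:
        earned["tests"] = 0.0
        earned["implementation"] = 0.0
    return sum(WEIGHTS[name] * earned.get(name, 0.0) for name in WEIGHTS)


full_marks = {name: 1.0 for name in WEIGHTS}
print(assignment_score(full_marks))                          # prints 100.0
print(assignment_score(full_marks, uses_test_format=False))  # prints 65.0
```

The second call shows the cost of ignoring the testing format: even otherwise perfect work loses all 35 Correctness points.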
Detailed requirements will accompany each assignment. The instructions and requirements on a particular assignment always take precedence over the general guidelines on the course website.
Submission of coding assignments should be performed over email. Don't forget to attach your files. Submitting multiple times is fine. The latest assignment submitted on time will be graded. If you submit an assignment late, after submitting an assignment on time, you must let me know, via email, that you would like the late submission graded for the assignment.
Written Assignments
Written Assignments can be delivered electronically by email, or hard copies can be delivered by hand, either in class or dropped off at my office, NSB A330.
Electronic copies must be in one of the following formats: .pdf, Microsoft Word .doc, Google Docs, OpenOffice.
Handwritten copies are acceptable, but be very careful that the work is clear. If I can't read that an answer is correct, it is wrong.
Points for each question will be described in each assignment.
Exam Policy
The Midterm will be held during class on November 2nd. If you will not be able to make this date, let me know as early as possible, and I will do my best to schedule another time for you to take the Midterm.
Text Book
Speech and Language Processing, Authors: Jurafsky and Martin, Publisher: Prentice Hall, Edition: 2, ISBN: 978-0131873216
This should be available through the bookstore, but may be found through other outlets at a discount.
Schedule
| Date | Material | Assignments |
|---|---|---|
| August 29 | No Class. | |
| August 31 | No Class. | |
| September 5 | No Class - Labor Day | |
| September 7 | Course Overview | |
| September 12 | From Sounds to Language | Homework 1 Assigned. Text file with nonsense IPA symbols |
| September 14 | Spoken Dialog Systems | |
| September 19 | Acoustics of Speech | |
| September 21 | Speech Recognition Overview | |
| September 26 | Fast Fourier Transform | Homework 1 Due. Homework 2 Assigned. |
| September 28 | No Class - Rosh Hashanah | |
| October 3 | Speech Representations - MFCC | |
| October 5 | Introduction to Statistical Modeling and Machine Learning | |
| October 10 | No Class - Columbus Day | |
| October 12 | Acoustic modeling - Gaussian Mixture Model | |
| October 17 | Sequential modeling - Hidden Markov Model | Homework 2 Due. |
| October 19 | Pronunciation Modeling | Homework 3 Assigned |
| October 24 | Language Modeling - N-grams | |
| October 26 | Human Speech Perception | |
| October 31 | Slack day and Midterm review | |
| November 2 | In-class Midterm Exam | Homework 3 Due. |
| November 7 | Introduction to Prosody | |
| November 9 | "Rich" Transcription | Homework 4 Assigned. |
| November 14 | Modeling Prosody | |
| November 16 | Modeling Prosody Part II. | |
| November 21 | Introduction to Speech Synthesis. Guest Lecture by Raul Fernandez (IBM) | |
| November 23 | Predicting Prosody from Text | Homework 4 Due. |
| November 28 | Text Normalization | Homework 5 Assigned. |
| November 30 | User Interaction | |
| December 5 | User Interaction ctd. | |
| December 7 | Machine Learning in Spoken Language Processing | Homework 5 Due. Final Project Writeup Description |
| December 12 | Spoken Dialog System Working Session | |
| December 21 @ 1:45 - 3:45 | Spoken Dialog System Demos. Note different start time. | Final Project Due. [No extensions will be granted] |