CSCI 3813/780: Machine Learning

Spring 2010
Tuesday Thursday 5:00pm - 6:15pm
NSB A103
Instructor: Andrew Rosenberg (andrew_at_cs.qc.cuny.edu)
Office Hours: Thursday 3:30-4:30pm NSB A330

Machine Learning addresses the problem of identifying patterns in data. The major goal of machine learning is to allow computers to learn (potentially complex) patterns from data, and then make decisions based on these patterns. This class will provide an introduction to the fundamentals of this discipline.

Much of machine learning relies on mathematical foundations from Probability and Statistics. The course will provide an overview of the requisite math. However, students with some exposure to this field (through MATH 241 or other coursework) will have a smoother time understanding the mathematical underpinnings of the material.

Upon successful completion of this course, a student can expect to have:

  1. Familiarity with a set of well-known supervised, unsupervised and semi-supervised learning algorithms.
  2. The ability to implement some basic machine learning algorithms.
  3. Understanding of how machine learning algorithms are evaluated.
  4. The ability to comprehend a Machine Learning conference paper (e.g., NIPS, ICML).

Class Policy

Come to Class. It will be difficult to do well in the class without regular attendance. There is no additional penalty for missing class.

Cell phones must be on silent, and are not to be checked or used during class - if you are expecting an urgent call, tell the instructor at the start of class.

Laptops are fine for taking notes. No internet, no chat, no games.

Cell phone and laptop policy: one warning, then 5 points off the next homework for each further issue. The same policy applies to the instructor: one warning, and after that everyone gets 5 points on the next homework.

Text Book

Pattern Recognition and Machine Learning by Christopher Bishop.
ISBN-13: 978-0387310732
Retail Price: $89.95 (12/27 Amazon Price: $58.83)

Assignments: 50% (5 x 10%)

Final Project: 50%
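As a rough illustration of the weighting above, the final numeric grade could be computed as in the sketch below. The function name and the requirement of exactly five counted assignment scores are assumptions made for this example; it does not decide which assignments count, and it does not model the later scaling to a letter grade.

```python
# Weights in percent, taken from the breakdown above.
ASSIGNMENT_WEIGHT = 10   # each counted assignment ("5 x 10%")
PROJECT_WEIGHT = 50      # final project


def final_numeric_grade(assignment_scores, project_score):
    """Combine scores (each out of 100) into a numeric grade out of 100.

    Hypothetical helper: assumes exactly five counted assignment scores.
    """
    if len(assignment_scores) != 5:
        raise ValueError("expected exactly five counted assignment scores")
    weighted = ASSIGNMENT_WEIGHT * sum(assignment_scores) + PROJECT_WEIGHT * project_score
    return weighted / 100


# Example: five assignment scores and a project score of 85.
print(final_numeric_grade([90, 80, 70, 100, 60], 85))
```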

The Final Letter Grade will be based on a scaled adjustment of the Final Numeric Grade. When the scale has been determined, the class will be informed either in class or over email, and it will be posted to the course webpage (here).

Note: Grading requirements for Master's students and Undergraduates will be outlined on each homework assignment. Typically Master's students will have an additional problem on written assignments, and more thorough experimentation on implementational assignments.

Do not cheat. You may discuss assignments with your classmates, but write or program your assignment alone. Do not ask for or offer to share code, or written assignments. If you discuss an assignment with a classmate, or on an online forum, include the name of the classmate or URL of the forum on your assignment or in the documentation of your code. The first instance of cheating results in an automatic zero for the assignment (or final project). A second instance of cheating results in a zero (F) for the course. The Computer Science Department will be notified in writing of all instances of cheating. On a second instance a report will be submitted to the Office of Academic Integrity.

Assignments will be posted to the website (here) after class on Tuesdays.

All assignments will be scored out of 100 points.

There are 6 assignments. Each assignment will have a theoretical (pen-and-paper) component. Assignments may also include an implementation (coding) component.

Assignments will be due just after the start of class, 5:05pm. Assignments should be delivered electronically.

Deliver assignments with a timestamp before 5:05pm to avoid a late penalty. If an extension is needed, let me know as soon as possible. I will do my best to be reasonable to you and fair to the rest of the class. An assignment delivered more than 5 minutes after the start of class will be considered Late. There is a 5 point Late Penalty for each 12 hours (or part thereof) that the assignment is late. For an assignment due on a Tuesday:
Tuesday 5:05pm - Wednesday 5:05am: -5 points.
Wednesday 5:06am - Wednesday 5:05pm: -10 points.
Wednesday 5:06pm - Thursday 5:05am: -15 points.
Thursday 5:06am - Thursday 5:05pm: -20 points.

Grades will be posted at 5:00pm, 2 days following the due date. After 5:00pm, 2 days after an assignment was due, no assignments will be accepted. Assignments that were delivered on time will be returned 2 days following the due date.
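The late-penalty schedule and the acceptance cutoff described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name late_penalty is hypothetical, a timestamp exactly at the deadline is treated as late (since only timestamps before 5:05pm avoid the penalty), and a return value of None stands for "no longer accepted".

```python
from datetime import datetime, time, timedelta

POINTS_PER_BLOCK = 5           # points lost per 12-hour block
BLOCK = timedelta(hours=12)


def late_penalty(due, submitted):
    """Return the points deducted, or None if the submission is past the cutoff."""
    # No assignments are accepted after 5:00pm, two days after the due date.
    cutoff = datetime.combine(due.date() + timedelta(days=2), time(17, 0))
    if submitted > cutoff:
        return None
    # A timestamp strictly before the deadline carries no penalty.
    if submitted < due:
        return 0
    lateness = submitted - due
    # Ceiling division on timedeltas; a submission exactly at the deadline
    # still counts as being in the first 12-hour block.
    blocks = max(1, -(-lateness // BLOCK))
    return POINTS_PER_BLOCK * blocks


# Example: an assignment due Tuesday, February 23 at 5:05pm.
due = datetime(2010, 2, 23, 17, 5)
print(late_penalty(due, datetime(2010, 2, 24, 5, 6)))   # Wednesday 5:06am
```

Working with whole 12-hour blocks on timedelta values avoids floating-point rounding at the block boundaries.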

After each assignment and the midterm are graded, the anonymous mean, median and standard deviation of the scores will be presented during class.

Coding Assignments

Coding assignments can be written in C++, Java or Python.

In general, grading will be 65% Implementation (compilation, passing tests, implementational details) and 35% Documentation and Style. This may be adjusted for some assignments. Always read the assignment for the grading breakdown.
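As a small worked example of the default split above, a coding assignment score out of 100 could be combined as follows. The function name and the use of fractions between 0 and 1 for each component are assumptions made for illustration; individual assignments may specify a different breakdown.

```python
def coding_assignment_score(implementation, documentation,
                            impl_weight=65, doc_weight=35):
    """Combine two component fractions (0.0 to 1.0) into a score out of 100.

    Hypothetical helper: the default weights follow the general 65/35 split.
    """
    for fraction in (implementation, documentation):
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("component fractions must be between 0 and 1")
    return impl_weight * implementation + doc_weight * documentation


# Example: half credit on implementation, full credit on documentation and style.
print(coding_assignment_score(0.5, 1.0))
```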

Testing will be performed automatically. Sample tests will be delivered with each assignment. If code does not operate using the published and distributed testing format, the assignment will be considered Incorrect and a significant (~50%) Implementation penalty will be imposed.

Detailed requirements will accompany each assignment. The instructions and requirements on a particular assignment always take precedence over the general guidelines on the course website.

Coding assignments should be submitted electronically. Submitting multiple times is fine. The latest assignment submitted on time will be graded. If you submit an assignment late, after submitting an assignment on time, you must let me know, via email, that you would like the late submission graded for the assignment.

README guidelines

Each coding assignment will require a README file as a component of its documentation. A README file should provide a high-level description of your assignment, or project.

A successful README file will include the following:

A sample README will be distributed with the first implementation assignment to serve as a template.

Written Assignments

Written Assignments should also be delivered electronically.

Electronic copies must be in one of the following formats: .pdf, Microsoft Word .doc, Google Docs.

Points for each question will be described in each assignment.

The Final Project can take one of two forms: A paper or an implementation project. Possible project ideas will be presented in class. Individual meetings about the project topics will take place early in the semester. A progress meeting will take place more than 2 weeks before the project is due. Part of the project will be a short (5-10 minute) presentation of your work.

Survey Paper.
Identify a relatively narrow machine learning topic. Extensively review the literature, and write a paper describing the topic, approaches and evaluation. The goal of this paper should be that an intelligent and reasonably informed reader with no exposure to the topic would be able to understand the topic, know the current open issues (unresolved research questions), and know where to look for more information on the topic.
A successful survey paper would be approximately 10 pages.

Project
Perform a machine learning experiment. This will involve implementing a machine learning algorithm, evaluating it on a data set, and comparing the results to other approaches. Some part of this experiment should be novel: either a modification to an existing machine learning algorithm, a novel evaluation technique, or an application of the algorithm to a new problem or new set of data. Note: a successful project does not need to generate state-of-the-art results. Some element of novelty, however, is required. A short (4 page) report on the algorithm, dataset/problem, and evaluation is expected as part of the project.

Date Material Assignments
Thursday, January 28 Welcome. Introduction. Slides
Tuesday, February 2 Probability and Statistics. Slides
Thursday, February 4 Linear Algebra and Vector Calculus. Slides
Tuesday, February 9 Linear Regression. Slides HW-1 stats Reading 1.1, 1.2, 3.1
Thursday, February 11 Linear Regression II. Slides Reading 1.2, 2.3, 3.3
Tuesday, February 16 Logistic Regression. Slides Reading 1.5, 4.1, 4.2, 4.3
Thursday, February 18 No Class -- QC Monday Schedule.
Tuesday, February 23 Graphical Models. Slides HW 1 Due @ 5pm Reading Ch 8.1, 8.2
Thursday, February 25 Graphical Models II. Slides Reading Ch 8.3, 8.4
Tuesday, March 2 Graphical Models III. Slides Reading Ch 8.4
Thursday, March 4 Junction Tree Algorithm. Slides
Tuesday, March 9 Clustering overview. Project discussion. Slides HW-2 train test LinearRegression.java Jama Download Math Example stats
Thursday, March 11 Hidden Markov Models. Slides Read Chapter 13.1, 13.2
Tuesday, March 16 Perceptrons Slides ppt Read Chapter 4.1.7, 5 HW-3
Thursday, March 18 Neural Networks. Slides ppt Read Chapter 5
Tuesday, March 23 Support Vector Machines. Slides ppt Kernel Methods.
Thursday, March 25 Support Vector Machines II and Kernels Slides ppt HW-3 Due 3/28/2010
Tuesday, March 30 No Class.
Thursday, April 1 No Class.
Tuesday, April 6 Recap of Supervised Learning. Introduction to Unsupervised Learning. Slides ppt HW-4 Assigned stats
Thursday, April 8 Class Cancelled.
Tuesday, April 13 Expectation Maximization. slides ppt Reading 9.1, 9.2
Thursday, April 15 Expectation Maximization II. slides ppt Reading 9.3, 9.4.
Tuesday, April 20 Model Adaptation. slides ppt HW-4 Due. HW-5 EM.java EMExperimenter.java data stats
Thursday, April 22 Spectral Clustering. slides ppt
Tuesday, April 27 Supervised Evaluation slides ppt
Thursday, April 29 Unsupervised Evaluation and Slack slides ppt
Tuesday, May 4 Research Presentations. HW-5 Due.
Thursday, May 6 Research Presentations.
Tuesday, May 11 Work on your Project.
Thursday, May 13 Work on your Project. HW-6 Due
Tuesday, May 25 Final Projects Due. Final Project Due at 4:00pm
