Machine Learning

Spring 2010
Friday 2:00pm - 4:00pm
Location: Graduate Center. Room TBD
Instructor: Andrew Rosenberg (andrew_at_cs.qc.cuny.edu)
Office Hours: By Appointment

Machine Learning addresses the problem of identifying patterns in data. The major goal of machine learning is to allow computers to learn (potentially complex) patterns from data, and then make decisions based on these patterns. This class will provide an introduction to the fundamentals of this discipline.

Much of machine learning relies on mathematical foundations from Probability and Statistics. The course will provide an overview of the requisite math. However, students with some exposure to these fields will have a smoother time understanding the mathematical underpinnings of the material.

Upon successful completion of this course, a student can expect to have:

  1. Familiarity with a set of well-known supervised, unsupervised and semi-supervised learning algorithms.
  2. The ability to implement some basic machine learning algorithms.
  3. Understanding of how machine learning algorithms are evaluated.
  4. The ability to comprehend a Machine Learning conference paper (e.g., NIPS, ICML).

Class Policy

Come to Class. It will be difficult to do well in the class without regular attendance. There is no additional penalty for missing class.

Cell phones must be on silent, and are not to be checked or used during class - if you are expecting an urgent call, tell the instructor at the start of class.

Laptops are fine for taking notes. No internet, no chat, no games.

Cell phone and Laptop policy: One warning, after that 5 points off the next homework or exam for each issue. Same policy for the instructor. One warning, after that, everyone gets 5 points on the next homework or exam.

Text Book

Pattern Recognition and Machine Learning by Christopher Bishop.
ISBN-13: 978-0387310732
Retail Price: $89.95 (12/27 Amazon Price: $58.83)

Assignments: 50% (5 x 10%)

Final Project: 50%
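
The weighting above can be sketched as a small calculation. This is only an illustration: the function name `final_numeric_grade` is hypothetical, and it assumes the final project is scored out of 100 like the assignments, which the syllabus does not state. It does not model the scaled adjustment used for the letter grade.

```python
# Weights taken from the syllabus: five assignments at 10% each, project at 50%.
ASSIGNMENT_WEIGHT = 0.10
PROJECT_WEIGHT = 0.50
NUM_ASSIGNMENTS = 5


def final_numeric_grade(assignment_scores, project_score):
    """Hypothetical helper: combine scores (each assumed to be out of 100)."""
    if len(assignment_scores) != NUM_ASSIGNMENTS:
        raise ValueError(f"expected {NUM_ASSIGNMENTS} assignment scores")
    weighted_assignments = ASSIGNMENT_WEIGHT * sum(assignment_scores)
    return weighted_assignments + PROJECT_WEIGHT * project_score


# Made-up scores, purely for illustration.
example = final_numeric_grade([90, 80, 100, 70, 85], 88)
print(f"{example:.1f}")
```

With these made-up scores the assignments contribute 42.5 points and the project 44, for 86.5 before any scaling.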

The Final Letter Grade will be based on a scaled adjustment of the Final Numeric Grade. When the scale has been determined, the class will be informed either in class or over email, and it will be posted to the course webpage (here).

Do not cheat. You may discuss assignments with your classmates, but write or program your assignment alone. Do not ask for, or offer to share, code or written assignments. If you discuss an assignment with a classmate, or on an online forum, include the name of the classmate or URL of the forum on your assignment or in the documentation of your code. The first instance of cheating results in an automatic zero for the assignment (or midterm or final project). A second instance of cheating results in a zero (F) for the course. The Computer Science Department will be notified in writing of all instances of cheating. On a second instance a report will be submitted to the Office of Academic Integrity.

Assignments will be posted to the website (here) after class on Tuesdays.

All assignments will be scored out of 100 points.

There are 5 assignments. Each assignment will have a theoretical (pen-and-paper) component. Assignments may also include an implementation (coding) component.

Assignments will be due just after the start of class, 2:05pm. Assignments should be delivered electronically.

Deliver assignments with a timestamp before 2:05pm to avoid a late penalty. If an extension is needed, let me know as soon as possible. I will do my best to be reasonable to you and fair to the rest of the class. Delivering an assignment while being more than 5 minutes late for class will cause the assignment to be considered Late. There is a 5 point Late Penalty for each 12 hours the assignment is late. Tuesday 2:05pm - Wednesday 2:05am: -5 points. Wednesday 2:06am - Wednesday 2:05pm: -10 points. Wednesday 2:06pm - Thursday 2:05am: -15 points. Thursday 2:06am - Thursday 2:05pm: -20 points.

Grades will be posted at 2:00pm, 2 days following the due date. After 2:00pm, 2 days after an assignment was due, no assignments will be accepted. Assignments that were delivered on time will be returned 2 days following the due date.
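
The late penalty and cutoff described above can be sketched as follows. This is a minimal illustration, not the grading script used in the course: the function name `late_penalty` and the example dates are hypothetical, and it assumes a timestamp exactly at the deadline counts as late and that each started 12-hour block costs 5 points.

```python
import math
from datetime import datetime, timedelta

BLOCK = timedelta(hours=12)   # one penalty step per 12 hours late
POINTS_PER_BLOCK = 5          # points deducted per step


def late_penalty(submitted, deadline, cutoff):
    """Hypothetical helper: points deducted, or None if no longer accepted."""
    if submitted > cutoff:
        return None  # past the cutoff: the assignment is not accepted
    if submitted < deadline:
        return 0     # timestamp before the deadline: no penalty
    late_by = submitted - deadline
    # Assumption: a submission at the deadline itself already falls in the
    # first block, and exactly 12 hours late still counts as one block.
    blocks = max(1, math.ceil(late_by / BLOCK))
    return POINTS_PER_BLOCK * blocks


# Illustrative dates only: due at 2:05pm, cutoff at 2:00pm two days later.
deadline = datetime(2010, 2, 19, 14, 5)
cutoff = datetime(2010, 2, 21, 14, 0)
for stamp in (
    datetime(2010, 2, 19, 14, 4),
    datetime(2010, 2, 20, 2, 6),
    datetime(2010, 2, 21, 14, 1),
):
    print(stamp, late_penalty(stamp, deadline, cutoff))
```

Because the cutoff falls 5 minutes before the 48-hour mark, the largest penalty this sketch can return is 20 points.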

After each assignment and the midterm are graded, the anonymous mean, median and standard deviation of the scores will be presented during class.
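
For reference, these summary statistics can be computed with Python's standard `statistics` module. This is a sketch under assumptions: the helper name `score_summary` and the scores are made up, and it uses the population standard deviation (`pstdev`), since the syllabus does not say whether the population or sample form is reported.

```python
import statistics


def score_summary(scores):
    """Hypothetical helper: anonymous summary statistics for a list of scores."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        # Assumption: population standard deviation; use statistics.stdev
        # instead for the sample standard deviation.
        "std_dev": statistics.pstdev(scores),
    }


# Made-up scores, purely for illustration.
print(score_summary([70, 80, 90, 100]))
```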

Coding Assignments

Coding assignments can be written in C++, Java or Python.

In general, grading will be 65% Implementation (compilation, passing tests, implementation details) and 35% Documentation and Style. This may be adjusted for some assignments. Always read the assignment for the grading breakdown.

Testing will be performed automatically. Sample tests will be delivered with each assignment. If code does not operate using the published and distributed testing format, the assignment will be considered Incorrect and a significant (~50%) Implementation penalty will be imposed.

Detailed requirements will accompany each assignment. The instructions and requirements on a particular assignment always take precedence over the general guidelines on the course website.

Coding assignments should be submitted electronically. Submitting multiple times is fine. The latest assignment submitted on time will be graded. If you submit an assignment late, after submitting an assignment on time, you must let me know, via email, that you would like the late submission graded for the assignment.

README guidelines

Each coding assignment will require a README file as a component of its documentation. A README file should provide a high-level description of your assignment, or project.

A successful README file will include the following:

A sample README will be distributed with the first implementation assignment to serve as a template.

Written Assignments

Written Assignments should also be delivered electronically.

Electronic copies must be in one of the following formats: .pdf, Microsoft Word .doc, Google Docs.

Points for each question will be described in each assignment.

The Final Project can take one of two forms: A paper or an implementation project. Possible project ideas will be presented in class. Individual meetings about the project topics will take place early in the semester. A progress meeting will take place at least 2 weeks before the project is due. Part of the project will be a short (15-20 minute) presentation of your work.

Survey Paper.
Identify a relatively narrow machine learning topic. Extensively review the literature, and write a paper describing the topic, approaches and evaluation. The goal of this paper should be that an intelligent and reasonably informed reader with no exposure to the topic would be able to understand the topic, know the current open issues (unresolved research questions), and know where to look for more information on the topic.
A successful survey paper would be approximately 10 pages.

Project
Perform a machine learning experiment. This will involve implementing a machine learning algorithm, evaluating it on a data set, and comparing the results to other approaches. Some part of this experiment should be novel: either a modification to an existing machine learning algorithm, a novel evaluation technique, or an application of the algorithm to a new problem or new set of data. Note: a successful project does not need to generate state-of-the-art results. Some element of novelty, however, is required. A short, 4 page report on the algorithm, dataset/problem, and evaluation is expected as part of the project.

| Date | Material | Assignments |
| --- | --- | --- |
| January 29 | Welcome. Introduction. Probability Review. Slides -- Part 1. Part 2. | |
| February 5 | Linear Algebra and Vector Calculus Review. Linear Regression. Slides -- Part 1. Part 2. | Read Chapter 1.1, 1.2, 2.1, 2.2, 2.3, 3.1. HW 1 stats |
| February 12 | No Class. | |
| February 19 | Regularization. Logistic Regression. Part 1. Part 2. | HW 1 Due. Read Chapter 1.1, 1.2, 1.5, 4.1, 4.2, 4.3 |
| February 26 | Class Cancelled Due to Snow. | HW 2 train test stats |
| March 5 | Graphical Models. Part 1. Part 2. Belief Propagation. | Read Chapter 8.1, 8.2, 8.4 |
| March 12 | Graphical Models. Junction Tree Algorithm. Clustering Preview. Part 1. Part 2. Part 3. | HW 2 Due. Read Chapter 13.1, 13.2 |
| March 19 | Hidden Markov Models. Slides. Perceptrons. Slides ppt. Neural Networks. Slides ppt. | Read Chapter 4.1.7, 5. HW 3 |
| March 26 | Support Vector Machines and Kernel Methods. Part 1 ppt. Part 2 ppt. | Read Chapter 6. HW 3 Due on 3/28/2010 |
| April 2 | No Class. | |
| April 9 | Expectation Maximization. Guest Speaker: Sameer Maskey - IBM. | HW-4 Assigned stats |
| April 16 | Expectation Maximization in HMMs. Part 1 ppt. Part 2 ppt. Model Adaptation. Part 3 ppt. | |
| April 23 | Spectral Clustering. Part 1 ppt. Part 2 ppt. | HW-4 Due. HW-5 Assigned clustering data stats |
| April 30 | Evaluation Methods and Slack. Part 1 ppt. | |
| May 7 | Research Presentations. | HW-5 Due Monday May 10 @ 11:59pm |
| May 14 | Work on your project. | |
| May 21 | Research Presentations. | Final Project Due 2:00pm |
