CSCI 3813/780: Machine Learning

Spring 2011
Monday Wednesday 5:00pm - 6:15pm
NSB B141
Instructor: Andrew Rosenberg (andrew_at_cs.qc.cuny.edu)
Office Hours: Monday 3:30-4:30pm NSB A330

Course Description

Machine Learning addresses the problem of identifying patterns in data. The major goal of machine learning is to allow computers to learn (potentially complex) patterns from data, and then make decisions based on these patterns. This class will provide an introduction to the fundamentals of this discipline.

Much of machine learning relies on mathematical foundations from Probability and Statistics. The course will provide an overview of the requisite math. However, students with some exposure to this field (through MATH 241 or other coursework) will have a smoother time understanding the mathematical underpinnings of the material.

Learning Outcomes

Upon successful completion of this course, a student can expect to have:

  1. Familiarity with a set of well-known supervised, unsupervised and semi-supervised learning algorithms.
  2. The ability to implement some basic machine learning algorithms.
  3. Understanding of how machine learning algorithms are evaluated.
  4. The ability to comprehend a Machine Learning conference paper (e.g., NIPS, ICML).

Textbook

Machine Learning: An Algorithmic Perspective by Stephen Marsland.
ISBN-13: 978-1420067187
Retail Price: $72.95 (1/11 Amazon Price: $59.28)

Optional Supplemental Text:
Pattern Recognition and Machine Learning by Christopher Bishop.
ISBN-13: 978-0387310732
Retail Price: $94.95 (2/27 Amazon Price: $61.77)

Class Policies

Come to Class. It will be difficult to do well in the class without regular attendance. There is no additional penalty for missing class.

Cell phones must be on silent, and are not to be checked or used during class - if you are expecting an urgent call, tell the instructor at the start of class.

No laptops, tablets or lab computers.

Cell phone and Laptop policy: One warning, after that 5 points off the next homework for each issue. Same policy for the instructor. One warning, after that, everyone gets 5 points on the next homework.

Grading Policy

Assignments: 50% (5 x 10%)

Final Project: 50%

The Final Letter Grade will be based on a scaled adjustment of the Final Numeric Grade. When the scale has been determined, the class will be informed either in class or over email, and it will be posted to the course webpage (here).
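
As a rough illustration of how the weights above combine into the Final Numeric Grade, here is a minimal Python sketch. The function name and the example scores are hypothetical, and the sketch assumes the final project is scored out of 100 like the assignments. It does not model the scaled adjustment to a letter grade, which has not been specified.

```python
def final_numeric_grade(assignment_scores, project_score):
    """Combine five assignment scores and a project score, each out of 100."""
    if len(assignment_scores) != 5:
        raise ValueError("expected exactly 5 assignment scores")
    # Each assignment is worth 10% and the final project is worth 50%.
    # Integer weights are used so the arithmetic stays exact until the division.
    weighted_total = 10 * sum(assignment_scores) + 50 * project_score
    return weighted_total / 100


# Hypothetical scores, for illustration only.
print(final_numeric_grade([90, 80, 100, 70, 85], 88))  # 86.5
```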

Assignment Policy

Do not cheat. You may discuss assignments with your classmates, but write or program your assignment alone. Do not ask for or offer to share code, or written assignments. If you discuss an assignment with a classmate, or on an online forum, include the name of the classmate or URL of the forum on your assignment or in the documentation of your code. The first instance of cheating results in an automatic zero for the assignment (or final project). A second instance of cheating results in a zero (F) for the course. The Computer Science Department will be notified in writing of all instances of cheating. On a second instance a report will be submitted to the Office of Academic Integrity.

Assignments will be posted to the website (here) after class on the date that they are assigned.

All assignments will be scored out of 100 points.

There are 5 assignments. Each assignment will have a theoretical (pen-and-paper) component. Assignments may also include an implementation (coding) component.

Assignments will be due by 11:59pm on their due date. Assignments should be delivered electronically, via email.

Deliver assignments with a timestamp before 11:59pm on the due date to avoid a late penalty. If an extension is needed, let me know as soon as possible. I will do my best to be reasonable to you and fair to the rest of the class.

Late penalties will be assessed at 10 points for each day the assignment is late. No assignments will be accepted after the next class meeting after the assignment is due.
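
The late penalty arithmetic can be sketched in Python as follows. This is only an illustration: the function name is hypothetical, the number of days late is assumed to have been counted already, and flooring the score at zero is an assumption that the policy above does not state. The cutoff at the next class meeting is not modeled.

```python
POINTS_PER_DAY_LATE = 10  # from the late policy above


def score_after_late_penalty(raw_score, days_late):
    """Apply the per-day late penalty to a score out of 100."""
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    penalty = POINTS_PER_DAY_LATE * days_late
    # Assumption: a score cannot drop below zero.
    return max(0, raw_score - penalty)


# Hypothetical example: an 85-point assignment turned in two days late.
print(score_after_late_penalty(85, 2))  # 65
```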

Coding Assignments

Coding assignments can be written in C++, Java, or Python.

In general, grading will be 65% Implementation (compilation, passing tests, implementational details) and 35% Documentation and Style. This may be adjusted for some assignments. Always read the assignment for the grading breakdown.

Testing will be performed automatically. Sample tests will be delivered with each assignment. If code does not operate using the published and distributed testing format, the assignment will be considered Incorrect and a significant (~50%) Implementation penalty will be imposed.

Detailed requirements will accompany each assignment. The instructions and requirements on a particular assignment always take precedence over the general guidelines on the course website.

Coding assignments should be submitted electronically. Submitting multiple times is fine. The latest assignment submitted on time will be graded. If you submit an assignment late after submitting one on time, you must let me know, via email, that you would like the late submission graded instead.

README guidelines

Each coding assignment will require a README file as a component of its documentation. A README file should provide a high-level description of your assignment, or project.

A successful README file will include the following:

Written Assignments

Written Assignments should also be delivered electronically, via email or Google Docs.

Electronic copies must be in one of the following formats: .pdf, Microsoft Word .doc, Google Docs.

Points for each question will be described in each assignment.

Incomplete Policy

In extenuating circumstances, students may be given an Incomplete if material has not been completed by the end of the semester. When an incomplete is granted, the student and instructor will specify, in writing, a timeframe for all outstanding material to be submitted. If no other timeframe has been specified in writing, the deadline for all outstanding material to be submitted to resolve an incomplete will be one month following the last meeting of the class. This semester, that would make the deadline: June 14. An incomplete that is not resolved by this deadline will become an F.

Final Project

The Final Project will be a machine learning experiment. Possible project ideas will be presented in class. Individual meetings about the project topics will take place early in the semester. A progress meeting will take place more than 2 weeks before the project is due. Part of the project will be a short (5-10 minute) presentation of your work.

The goal of the project is to perform a machine learning experiment and report its results. This will involve implementing a machine learning algorithm, evaluating it on a data set, and comparing the results to other approaches. Some part of this experiment should be novel. Good project ideas will involve a modification to an existing machine learning algorithm, a novel evaluation technique, or an application of the algorithm to a new problem or new set of data. Note: a successful project does not need to generate state-of-the-art results. Some element of novelty, however, is expected. A short, 4-page report on the algorithm, dataset/problem, and evaluation is expected as part of the project.

Schedule

Date Material Assignments
Monday, January 30 No Class.
Wednesday, February 1 Welcome. Introduction.
[pptx]
Monday, February 6 Decision Trees
[pptx]
Read Chapter 6
Wednesday, February 8 Math Primer: Probability and Statistics.
[pptx]
Read Chapter 8
Homework 1 Assigned. [pdf]
[training.txt]
[testing.txt]
Monday, February 13 Lincoln's Birthday Observed. No class.
Wednesday, February 15 Math Primer: Linear Algebra, Vector Calculus, Lagrange Multipliers.
[pptx]
Monday, February 20 President's Day. No class
Tuesday, February 21 Linear Regression. (Note unusual meeting date: classes follow a Monday schedule.)
[ppt]
Wednesday, February 22 Linear Regression II: Regularization
[ppt]
Homework 2 Assigned
training
testing
Monday, February 27 Clustering.
[ppt]
Wednesday, February 29 Evaluation.
[ppt]
Monday, March 5 Evaluation II -- Clustering Evaluation.
[ppt]
Wednesday, March 7 Support Vector Machines (probably part 1 of 2)
[ppt]
Homework 3 Assigned [pdf]
Monday, March 12 Perceptrons
[pptx]
Wednesday, March 14 Neural Networks. (Multilayer Perceptron Networks)
Guest Lecturer: Taylor Cassidy
[pptx]
Monday, March 19 Support Vector Machines (probably part 2 of 2)
[ppt]
Wednesday, March 21 Kernel Methods
[pptx]
Homework 4 Assigned
Monday, March 26 Logistic Regression (part 1 of 2) [ppt]
Wednesday, March 28 Logistic Regression (part 2 of 2)
Monday, April 2 Gaussian Mixture Models
[pptx]
Wednesday, April 4 Expectation Maximization.
[pptx]
Homework 5 Assigned
EM.java
EMExperimenter.java data
Monday, April 9 Spring Recess. No class.
Wednesday, April 11 Spring Recess. No class.
Monday, April 16 Hidden Markov Models. [pdf]
Wednesday, April 18 Hidden Markov Models II.
[as previous class]
Monday, April 23 Graphical Models [ppt] Homework 5 Due
Wednesday, April 25 Inference in Graphical Models. [ppt]
Monday, April 30 Sampling [ppt]
Wednesday, May 2 Markov Chain Monte Carlo. [as previous class]
Monday, May 7 Semi-supervised Approaches [pptx]
Wednesday, May 9 Ensemble Methods [ppt]
Monday, May 14 Research Presentations. Last Day of Class Final Project Due