Machine Learning addresses the problem of identifying patterns in data. The major goal of machine learning is to allow computers to learn (potentially complex) patterns from data, and then make decisions based on these patterns.
This class will provide an introduction to the fundamentals of this discipline.
Much of machine learning relies on mathematical foundations from Probability and Statistics. The course will provide an overview of the requisite math. However, students with some exposure to this field (through MATH 241 or other coursework) will have a smoother time understanding the mathematical underpinnings of the material.
Upon successful completion of this course, a student can expect to have:
- Familiarity with a set of well-known supervised, unsupervised and semi-supervised learning algorithms.
- The ability to implement some basic machine learning algorithms.
- Understanding of how machine learning algorithms are evaluated.
- The ability to comprehend a Machine Learning conference paper (e.g., NIPS, ICML).
Class Policy
Come to Class. It will be difficult to do well in the class without regular attendance. There is no additional penalty for missing class.
Cell phones must be on silent, and are not to be checked or used during class - if you are expecting an urgent call, tell the instructor at the start of class.
Laptops are fine for taking notes. No internet, no chat, no games.
Cell phone and Laptop policy: One warning, after that 5 points off the next homework for each issue. Same policy for the instructor. One warning, after that, everyone gets 5 points on the next homework.
Text Book
Pattern Recognition and Machine Learning by Christopher Bishop.
ISBN-13: 978-0387310732
Retail Price: $89.95 (12/27 Amazon Price: $58.83)
Assignments: 50% (5 x 10%)
Final Project: 50%
The Final Letter Grade will be based on a scaled adjustment of the Final Numeric Grade. When the scale has been determined, the class will be informed either in class or over email, and it will be posted to the course webpage.
Note: Grading requirements for Master's students and Undergraduates will be outlined on each homework assignment. Typically, Master's students will have an additional problem on written assignments, and more thorough experimentation on implementation assignments.
Do not cheat. You may discuss assignments with your classmates, but write or program your assignment alone. Do not ask for, or offer to share, code or written assignments. If you discuss an assignment with a classmate, or on an online forum, include the name of the classmate or URL of the forum on your assignment or in the documentation of your code. The first instance of cheating results in an automatic zero for the assignment (or final project). A second instance of cheating results in a zero (F) for the course. The Computer Science Department will be notified in writing of all instances of cheating. On a second instance, a report will be submitted to the Office of Academic Integrity.
Assignments will be posted to the website after class on Tuesdays.
All assignments will be scored out of 100 points.
There are 6 assignments. Each assignment will have a theoretical (pen-and-paper) component. Assignments may also include an implementation (coding) component.
Assignments will be due just after the start of class, 5:05pm. Assignments should be delivered electronically.
Deliver assignments with a timestamp before 5:05pm to avoid a late penalty. If an extension is needed, let me know as soon as possible. I will do my best to be reasonable to you and fair to the rest of the class. Delivering an assignment while being more than 5 minutes late for class will cause the assignment to be considered Late. There is a 5 point Late Penalty for each 12 hours (or part thereof) that the assignment is late:
- Tuesday 5:05pm - Wednesday 5:05am: -5 points.
- Wednesday 5:06am - Wednesday 5:05pm: -10 points.
- Wednesday 5:06pm - Thursday 5:05am: -15 points.
- Thursday 5:06am - Thursday 5:05pm: -20 points.
Grades will be posted at 5:00pm, 2 days following the due date. After 5:00pm, 2 days after an assignment was due, no assignments will be accepted. Assignments that were delivered on time will be returned 2 days following the due date.
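The late policy above can be sketched as a small calculation. This is only an illustration, not the official grading script: the function name `late_penalty`, the example due date, and the choice to compare timestamps at minute resolution are assumptions. The cutoff is read as 5:00pm on the second day after the due date.

```python
from datetime import datetime, timedelta

POINTS_PER_PERIOD = 5        # from the policy: 5 points per 12 hours late
PERIOD_MINUTES = 12 * 60     # length of one penalty period


def late_penalty(due, submitted):
    """Return points deducted, or None if the submission is too late to accept."""
    # Assumption: compare at minute resolution, matching the 5:05 / 5:06 boundaries.
    submitted = submitted.replace(second=0, microsecond=0)
    if submitted < due:
        return 0  # timestamp before the deadline: no penalty
    # Assumed cutoff: 5:00pm on the second day after the due date.
    cutoff = (due + timedelta(days=2)).replace(hour=17, minute=0)
    if submitted > cutoff:
        return None  # no longer accepted
    minutes_late = int((submitted - due).total_seconds()) // 60
    # Ceiling division; a submission exactly at the deadline still counts as one period.
    periods = max(1, -(-minutes_late // PERIOD_MINUTES))
    return POINTS_PER_PERIOD * periods


due = datetime(2024, 1, 9, 17, 5)  # hypothetical Tuesday due date, 5:05pm
print(late_penalty(due, datetime(2024, 1, 10, 5, 6)))  # Wednesday 5:06am -> 10
```

Under this reading, anything delivered after Thursday 5:00pm returns None, so the last listed window ends at the cutoff rather than at 5:05pm.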
After each assignment and the midterm are graded, the anonymous mean, median and standard deviation of scores will be presented during class.
Coding Assignments
Coding assignments can be written in C++, Java or Python.
In general, grading will be 65% Implementation (compilation, passing tests, implementational details) and 35% Documentation and Style. This may be adjusted for some assignments. Always read the assignment for the grading breakdown.
Testing will be performed automatically. Sample tests will be delivered with each assignment. If code does not operate using the published and distributed testing format, the assignment will be considered Incorrect and a significant (~50%) Implementation penalty will be imposed.
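As a rough illustration of the default breakdown above, the sketch below combines the two components into a score out of 100. The function name `coding_score` and its parameters are hypothetical, and treating the "~50%" penalty as halving the Implementation component is one assumed reading. Individual assignments may use a different breakdown.

```python
def coding_score(implementation, documentation, format_ok=True):
    """Combine two 0-100 component scores into one 0-100 assignment score.

    Uses the default 65% Implementation / 35% Documentation and Style split.
    """
    if not format_ok:
        # Assumed reading of the "~50%" penalty: halve the Implementation component.
        implementation = implementation / 2
    return (65 * implementation + 35 * documentation) / 100


print(coding_score(80, 90))                   # 83.5
print(coding_score(80, 90, format_ok=False))  # 57.5
```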
Detailed requirements will accompany each assignment. The instructions and requirements on a particular assignment always take precedence over the general guidelines on the course website.
Coding assignments should be submitted electronically. Submitting multiple times is fine. The latest assignment submitted on time will be graded. If you submit an assignment late, after submitting an assignment on time, you must let me know, via email, that you would like the late submission graded for the assignment.
README guidelines
Each coding assignment will require a README file as a component of its documentation. A README file should provide a high-level description of your assignment, or project.
A successful README file will include the following:
- A description of the problem addressed -- in plain English.
- A description of your solution to the problem -- again, in plain English.
- If you feel that either of these descriptions can benefit from the inclusion of code, include pseudocode rather than a verbatim code listing.
- A description of each file that is part of the submission.
- Information about how to use your code -- instructions for compiling and running your code from the command line and, if it is structured as an API, how to use the implemented methods (arguments, preconditions, postconditions).
- Indication of any areas where your code differs from the assignment requirements -- any area of incompleteness, different method signatures, different command line parameters, etc.
A sample README will be distributed with the first implementation assignment to serve as a template.
Written Assignments
Written Assignments should also be delivered electronically.
Electronic copies must be in one of the following formats: .pdf, Microsoft Word .doc, Google Docs.
Points for each question will be described in each assignment.
The Final Project can take one of two forms: A paper or an implementation project. Possible project ideas will be presented in class. Individual meetings about the project topics will take place early in the semester. A progress meeting will take place more than 2 weeks before the project is due. Part of the project will be a short (5-10 minute) presentation of your work.
Survey Paper.
Identify a relatively narrow machine learning topic. Extensively review the literature, and write a paper describing the topic, approaches and evaluation. The goal of this paper should be that an intelligent and reasonably informed reader with no exposure to the topic would be able to understand the topic, know the current open issues (unresolved research questions), and know where to look for more information on the topic.
A successful survey paper would be approximately 10 pages.
Project
Perform a machine learning experiment. This will involve implementing a machine learning algorithm, evaluating it on a data set, and comparing the results to other approaches. Some part of this experiment should be novel: either a modification to an existing machine learning algorithm, a novel evaluation technique, or an application of the algorithm to a new problem or new set of data. Note: a successful project does not need to generate state-of-the-art results. Some element of novelty, however, is required. A short, 4 page report on the algorithm, dataset/problem, and evaluation is expected as part of the project.