Effective Algorithms in Machine Learning and Statistics (CSCI599)
Spring 2016
This course is a graduate introduction to the algorithmic aspects of machine learning and statistics. Our focus will be on the design and analysis of statistically and computationally efficient inference algorithms in well-defined learning models.
The first part of the course will survey recent algorithmic techniques in the context of unsupervised learning. The goal of unsupervised learning is to discover hidden structure in a set of unlabeled data. We will study algorithms for learning probability distributions in various settings (shape restricted distributions, mixture models, latent variable models, topic models).
The second part of the course will focus on algorithmic problems in supervised learning. The goal of supervised learning is to infer a function from a set of labeled observations. More specifically, we will study algorithms for learning Boolean functions from labeled examples in a number of models (online learning, PAC learning, SQ learning, learning with noise, etc.).
The course is primarily targeted toward graduate students who want to gain familiarity with mathematical and algorithmic tools for the analysis of large datasets. Advanced undergraduate students with sufficient mathematical maturity are welcome (subject to instructor's approval).
Course Information
- Instructor: Ilias Diakonikolas
Office Hours: By appointment (send me an email).
- Teaching Assistant: Li Han
Office Hours: By appointment (send Li an email).
- Lectures: Wednesdays 14:00-17:20, THH 215.
Prerequisites
Mathematical maturity. Solid background in algorithms, linear algebra, and probability.
Course Outline
Here is a rough outline of the course material:
- Introduction: Supervised versus Unsupervised Learning
- Unsupervised Learning and Estimation
- Distribution Property Testing
- Learning Structured Distributions
- Method of Moments: Learning Mixture Models
- Tensor Methods; Learning Latent Variable Models, Topic Models
- Online Learning: Winnow, Perceptron
- PAC Learning: Learning Decision Trees and DNFs
- Uniform Distribution PAC Learning: Fourier Analysis, Low Degree Algorithm
- Learning with Noise and Statistical Queries, Learning Juntas and Noisy Parity
- Agnostic Learning, Noise Sensitivity
- Learning from Limited Information, Chow Parameters Problem
Lectures
- Lecture 1 (January 13) Introduction.
Supervised versus Unsupervised Learning. Examples: Learning Linear Separators, Learning Discrete Distributions. Introduction to Property Testing: Testing Uniformity.
- Lecture 2 (January 20) Distribution Property Testing: L2 testing, Reduction of L1 testing to L2 testing; Testing Identity, Closeness, and Independence.
slides
- Lecture 3 (January 27) Analysis of L2 tester; Sample complexity lower bounds via information theory.
- Lecture 4 (February 3) Tight Lower Bound for Testing Closeness; Introduction to Learning Structured Distributions.
slides
- Lecture 5 (February 10) Learning Structured Distributions: Overview of Piecewise Polynomial Learning and
Learning Sums of Random Variables. slides
- Lecture 6 (February 17) Distribution Learning via Piecewise Polynomial Approximation.
slides
Course Evaluation
Homework Assignments: There will be 2 homework assignments will count for 30% of the grade. The assignments which will be proof-based, and are intended to be challenging.
Collaboration and discussion among students is allowed, though students must write up their solutions independently.
- First Homework: [pdf] (due on March 2)
Course Project: A major part of the course (50% of the grade) is a research project on a topic related to algorithmic aspects of learning theory. Projects can be completed individually or in groups of two students.
The goal of the project is to become an expert in an area related to the class material, and potentially contribute to the state of the art. There are two aspects to the course project. The first is a literature review: Students must decide on a topic and a list of papers, understand these papers in depth, and write a survey presentation of the results in their own words. The second aspect is to identify a research problem/direction on the chosen topic, think about it, and describe the progress they make.
Students must consult with the instructor during the first half of the course for help in forming project teams, selecting a suitable project topic, and selecting a suitable set of research papers.
Students will be graded on the project proposal (10%), the progress report (10%), and the final report (20%). Each student will give a project presentation (10%) during the last week of classes.
The remaining part of the grade will be based on class participation (10%) and scribing lecture notes (10%).
Possible project topics can be found here.
Readings
Distribution Property Testing:
- Survey article by Rubinfeld; and
Section 3 of survey by Canonne.
- Two papers: Chan, Diakonikolas, Valiant, and Valiant [pdf];
Diakonikolas and Kane [pdf]
- Basics of Information Theory: Chapter 2 of Cover and Thomas book.
- Sample Complexity Lower Bounds: Sections 2.3, 2.4, 4.2, 4.3 of Bar-Yossef's thesis;
Section 3 of [pdf].
Learning Structured Distributions:
- Survey article by instructor.
- Two papers: [pdf];
and [pdf]
- Useful Probabilistic Tools in Density Estimation: Chapters 3-5 of Devroye-Lugosi book.
- Sample Complexity Lower Bounds: Chapter 2 of Tsybakov's book.
Statement on Academic Conduct and Support Systems
Academic Conduct
Plagiarism - passing someone else's ideas as your own, either verbatim
or recast in your own words - is a serious academic offense with serious
consequences. Please familiarize yourself with the discussion of
plagiarism in SCampus in Section 11, Behavior Violating
University Standards
https://scampus.usc.edu/1100-behavior-violating-university-standards-and-appropriate-sanctions/.
Other forms of academic dishonesty are equally unacceptable. See
additional information in SCampus and university policies on scientific
misconduct, http://policy.usc.edu/scientific-misconduct/.
Discrimination, sexual assault, and harassment are not tolerated by the
university. You are encouraged to report any incidents to the Office
of Equity and Diversity http://equity.usc.edu/ or to the
Department of Public Safety http://capsnet.usc.edu/department/department-public-safety/online-forms/contact-us.
This is important for the safety whole USC community. Another member of
the university community - such as a friend, classmate, advisor, or
faculty member - can help initiate the report, or can initiate the report
on behalf of another person. The Center for Women and Men http://www.usc.edu/student-affairs/cwm/
provides 24/7 confidential support, and the sexual assault resource center
webpage sarc@usc.edu describes reporting
options and other resources.
Support Systems
A number of USC's schools provide support for students who need
help with scholarly writing. Check with your advisor or program staff to
find out more. Students whose primary language is not English should check
with the American Language Institute
http://dornsife.usc.edu/ali, which
sponsors courses and workshops specifically for international graduate
students. The Office of Disability Services and Programs
http://sait.usc.edu/academicsupport/centerprograms/dsp/home_index.html
provides certification for students with disabilities and helps arrange
the relevant accommodations. If an officially declared emergency makes
travel to campus infeasible, USC Emergency Information
http://emergency.usc.edu/ will
provide safety and other updates, including ways in which instruction will
be continued by means of blackboard, teleconferencing, and other
technology.