Stat 613: Statistical Machine Learning

Course Information:
Statistics 613
Fall 2017
Tuesdays & Thursdays, 10:50am - 12:05pm
Location: DH 1042
Piazza Course Webpage

Genevera Allen
Office: DH2080
Office Hours: Thursdays 5:15 - 6:30pm

Frederick Campbell
Office: DH2077
Office Hours: Tuesdays 12-1pm

Teaching Assistant:
Minjie Wang
Office: DH2125
Office Hours: Mondays 3-4pm
Recitation: Wednesdays 5-6pm, DH 1046


Recommended Textbooks:
The Elements of Statistical Learning by Hastie, Tibshirani & Friedman.
An Introduction to Statistical Learning by James, Witten, Hastie & Tibshirani.
Statistical Learning with Sparsity by Hastie, Tibshirani & Wainwright.
Statistics for High-Dimensional Data by Bühlmann & van de Geer.
Convex Optimization by Boyd & Vandenberghe.

Grading Policy:
Homeworks 35%
Midterm Exam 20%
Final Project / Competition 40%
Class Participation 5%

Homework Policy:
A hard copy of each homework is due in class, or by 12pm to the TA in Duncan Hall. All homeworks must be typeset using LaTeX, and no late homeworks will be accepted. Homeworks may be discussed with classmates but must be written and submitted individually.

Midterm Exam:
There will be an in-class, open-book, open-notes midterm exam on November 7.

Final Project & Contest:
The final project will be a data analysis contest. The competition will begin on September 19, and students may compete in teams of two. Competition Description: [pdf]

Important Deadlines:
September 8: Add Deadline
October 6: Drop Deadline

Announcements, Assignments & Lectures:

  • Data Science Competition @ Kaggle In Class

  • Link to join the competition

  • Instructions for Competition Presentations on Tuesday, November 28th and Thursday, November 30th [.pdf]

  • Link to vote on competition presentations.

  • Homework 4 Solutions [.pdf]

  • Lecture 23 Outline [.pdf] Lecture 23 Code [.R]

  • Lecture 22 Outline [.pdf] Lecture 22 Code [.R]

  • Homework 5 Assignment [.pdf]

  • Lecture 21 Outline [.pdf] Lecture 21 Code [.R]

  • Homework 3 Solutions [.pdf]

  • Homework 4 Assignment [.pdf] Author data [.csv]

  • Lecture 19 Outline [.pdf] Lecture 19 Code [.R]

  • Lecture 18 Outline [.pdf] Lecture 18 Code [.R]

  • Lecture 17 Outline [.pdf] Lecture 17 Code [.R]

  • Lecture 16 Outline [.pdf] Lecture 16 Code [.R]

  • Midterm Study Guide [.pdf]

  • Lecture 15 Outline [.pdf] Lecture 15 Code [.R] Model Validation Discussion Scenarios [.pdf]

  • Homework 3 Assignment [.pdf]

  • Lecture 14 Outline [.pdf] Lecture 14 Code [.R]

  • Homework 2 Solutions [.pdf]

  • Lecture 13 Code [.m]

  • Polynomial Example Code [.R]

  • Homework 2 Assignment [.pdf]

  • Homework 1 Solutions [.pdf]

  • Lecture 11 Code [.m]

  • GLM Example Code [.R]

  • Lecture 8 Code [.m]

  • Lecture 6 Outline [.pdf]

  • Lecture 4 Outline [.pdf]

  • Homework 1 Assignment [.pdf]

  • Homework 0 Solutions [.pdf]

  • Lecture 3 Outline [.pdf] Lecture 3 Code [.R]

  • Class resumes on Tuesday, September 5, in a new location: DH 1042. HW 0 is also due that day.

  • Lecture 2 Outline [.pdf] Lecture 2 Code [.R]

  • Homework 0 Assignment [.pdf]

  • Lecture 1 Outline [.pdf] Lecture 1 Code [.R]

  • All students who are interested in this course (even those on the waitlist or who aren't registered) should attend the first lecture. HW 0, assigned on the first day, will illustrate the mathematical and computational level of the course and ensure that students have met the course prerequisites.

Course Description

This course is an advanced survey of statistical machine learning theory and methods. Emphasis will be placed on the methodological, theoretical, and computational aspects of tools such as regularized regression, classification, kernels, dimension reduction, clustering, graphical models, trees, and ensemble learning. Students will learn how and when to apply statistical learning techniques, their comparative strengths and weaknesses, their mathematical and statistical properties, how to compute each method, and how to critically evaluate the performance of learning algorithms. Students completing this course should be able to (i) apply sophisticated statistical learning methods to build predictive models or perform exploratory analysis, (ii) evaluate methods from a mathematical, statistical, and computational perspective, and (iii) properly validate statistical learning models and interpret their results.

Tentative Lecture Schedule:

August 22: Intro & MSE / Least Squares
August 24: Ridge Regression
September 5: Sparse Regression I
September 7: Sparse Regression II
September 12: High-Dimensional Theory I
September 14: High-Dimensional Theory II
September 19: GLMs & Regularized GLMs
September 21 & Bayes Classifiers
September 26: LDA
September 28: SVMs I
October 3: SVMs II
October 5: Non-Linear I
October 12: Non-Linear II
October 17: Model Validation I
October 19: Model Validation II
October 24: Dimension Reduction I
October 26: Dimension Reduction & Clustering
October 31: Clustering II
November 2: Graphical Models
November 7: In-Class Midterm Exam
November 9: Trees & Bagging
November 14: Random Forests
November 16: Boosting
November 21: Boosting & Ensemble Learning
November 28: Competition Presentations
November 30: Competition Presentations

Assignment Schedule

Assignment                    Assigned        Due
HW 0                          August 22       September 5
HW 1                          September 7     September 21
HW 2                          September 26    October 12
HW 3                          October 17      October 31
HW 4                          October 26      November 9
HW 5                          November 9      November 21
Competition Progress Report   September 19    October 19
Competition Final Report      September 19    December 1