Stat 413: Introduction to Statistical Machine Learning

Course Information:
Statistics 413
Fall 2017
Tuesdays & Thursdays, 4:00 - 5:15pm
Location: Herzstein Hall 210
Piazza Course Webpage

Genevera Allen
Office: DH2080
Office Hours: Thursday 5:15 - 6:30pm

Teaching Assistant:
Tianyi Yao
Office: DH2090
Office Hours: Wednesdays 4-5pm
Recitation: Wednesdays 5-6pm, Mech Lab 254


Course Schedule:

Recommended Textbooks:
Elements of Statistical Learning by Hastie, Tibshirani & Friedman.
Introduction to Statistical Learning by James, Witten, Tibshirani & Hastie.
Statistical Learning with Sparsity by Hastie, Tibshirani & Wainwright.

Grading Policy:
Homeworks 35%
Midterm Exam 20%
Final Project / Competition 40%
Class Participation 5%

Homework Policy:
A hard copy of each homework is due in class, or by 5pm to the TAs in Duncan Hall. Homeworks that are not typeset using LaTeX will receive a deduction of 25% of the grade. Late homeworks will receive a deduction of 25% of the grade for each day they are late. Homeworks may be discussed with classmates but must be written and submitted individually.

Midterm Exam:
There will be an in-class midterm exam (open book / open notes) on November 7.

Final Project & Contest:
The final project will be a data analysis contest. The competition will begin on September 19 and can be done in teams of two people. Competition Description: [pdf]

Important Deadlines:
September 8: Add Deadline
October 6: Drop Deadline

Announcements, Assignments & Lectures:

  • Data Science Competition @ Kaggle In Class

  • Link to join the competition

  • Link to vote on competition presentations.

  • Instructions for Competition Presentations on Thursday, November 30th [.pdf]

  • Lecture 24 Outline [.pdf] Lecture 24 Code [.R]

  • Homework 4 Solutions [.pdf]

  • Lecture 23 Outline [.pdf] Lecture 23 Code [.R]

  • Homework 5 Assignment [.pdf]

  • Lecture 22 Outline [.pdf] Lecture 22 Code [.R]

  • Lecture 21 Outline [.pdf] Lecture 21 Code [.R]

  • Homework 3 Solutions [.pdf]

  • Lecture 19 Outline [.pdf] Lecture 19 Code [.R]

  • Lecture 18 Outline [.pdf] Lecture 18 Code [.R]

  • Homework 4 Assignment [.pdf] Author data [.csv]

  • Lecture 17 Outline [.pdf] Lecture 17 Code [.R]

  • Lecture 16 Outline [.pdf] Lecture 16 Code [.R]

  • Midterm Study Guide [.pdf]

  • Lecture 15 Outline [.pdf] Lecture 15 Code [.R]

  • Homework 3 Assignment [.pdf]

  • Lecture 14 Outline [.pdf] Lecture 14 Code [.R]

  • Homework 2 Solutions [.pdf]

  • Lecture 13 Outline [.pdf] Lecture 13 Code [.R]

  • Lecture 12 Outline [.pdf] Lecture 12 Code [.R]

  • Lecture 11 Outline [.pdf] Lecture 11 Code [.R]

  • Lecture 10 Outline [.pdf] Lecture 10 Code [.R]

  • Homework 2 Assignment [.pdf]

  • Homework 1 Solutions [.pdf]

  • Lecture 9 Outline [.pdf] Lecture 9 Code [.R]

  • Lecture 8 Outline [.pdf] Lecture 8 Code [.R] Model Validation Discussion Scenarios [.pdf]

  • Lecture 7 Outline [.pdf] Lecture 7 Code [.R] Model Validation Discussion Scenarios [.pdf]

  • Lecture 6 Outline [.pdf] Lecture 6 Code [.R]

  • Lecture 5 Outline [.pdf] Lecture 5 Code [.R]

  • Homework 1 Assignment [.pdf]

  • Homework 0 Solutions [.pdf]

  • Lecture 4 Outline [.pdf] Lecture 4 Code [.R]

  • Lecture 3 Outline [.pdf] Lecture 3 Code [.R]

  • Lecture 2 Outline [.pdf] Lecture 2 Code [.R]

  • Homework 0 Assignment [.pdf]

  • Lecture 1 Outline [.pdf] Lecture 1 Code [.R]

  • All students who are interested in this course (even those on the waitlist or who aren't registered) should attend the first lecture. HW 0, assigned on the first day, will illustrate the mathematical and computational level of the course and ensure that students have met the course prerequisites.

Course Description

This course is an introduction to concepts, methods, and best practices in statistical machine learning. Topics covered include regularized regression, classification, kernels, dimension reduction, clustering, trees, and ensemble learning. Emphasis will be placed on applied data analysis. Students will learn how and when to apply statistical learning techniques, their comparative strengths and weaknesses, and how to critically evaluate the performance of learning algorithms. Students completing this course should be able to (i) apply basic statistical learning methods to build predictive models or perform exploratory analysis, (ii) properly tune, select and validate statistical learning models, and (iii) interpret their results.

Tentative Lecture Schedule:

August 22: Introduction & KNN
August 24: MSE & Least Squares
September 5: Ridge Regression
September 7: Sparse Regression I
September 12: Sparse Regression II
September 14: Model Validation I
September 19: Model Validation II
September 21: GLMs I
September 26: GLMs II
September 28: Sparse GLMs & Bayes Classifiers
October 3: LDA
October 5: SVMs I
October 12: SVMs II
October 17: Intro to Non-Linear I
October 19: Intro to Non-Linear II
October 24: Dimension Reduction I
October 26: Dimension Reduction II
October 31: Clustering I
November 2: Clustering II
November 7: In-Class Midterm Exam
November 9: Trees & Bagging
November 14: Random Forests
November 16: Boosting
November 21: Boosting & Ensemble Learning
November 28: Competition Presentations
November 30: Competition Presentations

Assignment Schedule

Assignment                   Assigned      Due
HW 0                         August 22     September 5
HW 1                         September 7   September 21
HW 2                         September 26  October 12
HW 3                         October 17    October 31
HW 4                         October 26    November 9
HW 5                         November 9    November 21
Competition Progress Report  September 19  October 19
Competition Final Report     September 19  December 1