Statistics 640 / 444
Tuesdays & Thursdays, 4:00 - 5:15pm
Location: Herzstein Hall 210
Piazza Course Webpage
Office Hours: Thursday 5:15 - 6:15pm
Stat 444 Teaching Assistant:
Office Hours: Wednesday 1 - 2pm
Stat 444 Recitation: Wednesday 5 - 6pm in Duncan 1070.
Stat 640 Teaching Assistant:
Office Hours: Wednesday 1 - 2pm
Stat 640 Recitation: Monday 5:30 - 6:30pm in Duncan 1070.
Elements of Statistical Learning by Hastie, Tibshirani & Friedman.
Introduction to Statistical Learning by James, Witten, Tibshirani & Hastie.
Statistical Learning with Sparsity by Hastie, Tibshirani & Wainwright.
Statistics for High-Dimensional Data by Bühlmann & van de Geer.
Convex Optimization by Boyd & Vandenberghe.
Midterm Exam 25%
Final Project 45%
Class Participation 5%
Hard copies of homework assignments are due by 5pm to the TAs in Duncan Hall. Late homework will NOT be accepted. Homework may be discussed with classmates but must be written up and submitted individually.
There will be an open-book, open-notes in-class midterm exam on November 3.
Final Project & Contest:
The final project will be a data analysis contest. The competition will begin on Tuesday, September 13 and may be completed in teams of two.
Competition Description: [pdf]
Data Use Agreement: [pdf]
September 2: Add Deadline
October 7: Drop Deadline
Announcements, Assignments & Lectures:
Course Description:
This course is a survey of statistical learning methods and will cover major techniques and concepts for both supervised and unsupervised learning. Topics covered include penalized regression and classification, support vector machines, kernel methods, model selection, matrix factorizations, graphical models, clustering, boosting, random forests, and ensemble learning. Students will learn how and when to apply statistical learning techniques, their comparative strengths and weaknesses, and how to critically evaluate the performance of learning algorithms. Students completing this course should be able to (i) apply basic statistical learning methods to build predictive models or perform exploratory analysis, (ii) properly tune, select, and validate statistical learning models, and (iii) build an ensemble of learning algorithms.
Tentative Schedule:
HW 0 Assigned: August 23
August 23: Introduction & KNN
August 25: MSE & Least Squares
August 30: Ridge Regression
September 1: Sparse Regression I
HW 0 Due: September 6
HW I Assigned: September 6
September 6: Sparse Regression II
September 8: Non-Linear Regression
Competition Opens: September 13
September 13: Competition Intro
September 15: Linear Discriminant Analysis
September 20: LDA & Logistic Regression
HW I Due: September 20
September 22: Sparse Classification
HW II Assigned: September 22
September 27: Support Vector Machines I
September 29: SVMs II
October 4: Non-Linear Classification
Progress Report I Due: October 4
October 6: Matrix Factorizations I
HW II Due: October 13
October 13: Matrix Factorizations II
HW III Assigned: October 18
October 18: Clustering I
October 20: Clustering II
October 25: Model Validation I
October 27: Model Validation II
Progress Report II Due: November 1
November 1: Graphical Models
November 3: In-Class Midterm Exam
November 8: Trees & Bagging
HW III Due: November 10
November 10: Boosting
November 15: Boosting & Ensemble Learning
November 17: Random Forests
Competition Closes: November 20
November 22: Best Practices
November 29: Competition Presentations
December 1: Competition Presentations
Final Report Due: December 2