Stat 640 / Stat 444: Statistical Learning



Course Information:
Statistics 640 / 444
Fall 2016
Tuesdays & Thursdays, 4:00 - 5:15pm
Location: Herzstein Hall 210
Piazza Course Webpage

Instructor:
Genevera Allen
Office: DH2098
Email: gallen@rice.edu
Office Hours: Thursday 5:15 - 6:15pm

Stat 444 Teaching Assistant:
John Nagorski
Office: DH1041
Email: john.nagorski@rice.edu
Office Hours: Wednesday 1 - 2pm
Stat 444 Recitation: Wednesday 5 - 6pm in Duncan 1070.

Stat 640 Teaching Assistant:
Michael Weylandt
Office: DH1041
Email: michael.weylandt@rice.edu
Office Hours: Wednesday 1 - 2pm
Stat 640 Recitation: Monday 5:30 - 6:30pm in Duncan 1070.

Syllabus:
[pdf]

Course Schedule:
[pdf]

Recommended Textbooks:
Elements of Statistical Learning by Hastie, Tibshirani & Friedman.
Introduction to Statistical Learning by James, Witten, Tibshirani & Hastie.
Statistical Learning with Sparsity by Hastie, Tibshirani & Wainwright.
Statistics for High-Dimensional Data by Bühlmann & van de Geer.
Convex Optimization by Boyd & Vandenberghe.

Grading Policy:
Homeworks 25%
Midterm Exam 25%
Final Project 45%
Class Participation 5%

Homework Policy:
A hard copy of each homework is due to the TAs in Duncan Hall by 5pm. Late homework will NOT be accepted. Homework may be discussed with classmates but must be written up and submitted individually.

Midterm Exam:
There will be an in-class midterm exam (open book / open notes) on November 3.

Final Project & Contest:
The final project will be a data analysis contest. The competition begins on Tuesday, September 13, and may be completed in teams of two people. Competition Description: [pdf] Data Use Agreement: [pdf]

Important Deadlines:
September 2: Add Deadline
October 7: Drop Deadline

Announcements, Assignments & Lectures:


  • Data Mining Competition @ Kaggle In Class

  • Lecture 25 Outline: [pdf] Lecture 25 Code: [.R]

  • Homework 3 Solutions: [.pdf].

  • Lecture 24 Outline: [pdf] Lecture 24 Code: [.R]

  • Lecture 23 Outline: [pdf] Lecture 23 Code: [.R]

  • Lecture 22 Outline: [pdf] Lecture 22 Code: [.R]

  • Lecture 20 Outline: [pdf] Lecture 20 Discussion Scenarios: [.pdf]

  • Lecture 19 Outline: [pdf] Lecture 19 Discussion Scenarios: [.pdf]

  • Homework 2 Solutions: [.pdf].

  • Lecture 18 Outline: [pdf] Lecture 18 Code: [.R]

  • Lecture 17 Outline: [pdf] Lecture 17 Code: [.R]

  • Lecture 16 Outline: [pdf] Lecture 16 Code: [.R]

  • Midterm Exam Instructions: [.pdf]. Practice Midterm Exam: [.pdf]

  • Homework 3 Assignment [.pdf] Author data [.csv]

  • Lecture 15 Outline: [pdf] Lecture 15 Code: [.R]

  • Lecture 14 Outline: [pdf] Lecture 14 Code: [.R]

  • Lecture 13 Outline: [pdf] Lecture 13 Code: [.m] [.R]

  • Lecture 12 Outline: [pdf] Lecture 12 Code: [.m]

  • Lecture 11 Outline: [pdf] Lecture 11 Code: [.m]

  • Homework 1 Solutions: [.pdf].

  • Homework 2 Assignment [.pdf]

  • Lecture 10 Outline: [pdf] Lecture 10 Code: [.R]

  • Lecture 9 Outline: [pdf] Lecture 9 Code: [.R]

  • Lecture 8 Outline: [pdf] Lecture 8 Code: [.m]

  • Competition Description: [pdf] Data Use Agreement: [pdf]

  • Lecture 6 Outline: [pdf] Lecture 6 Code: [.R] [.m]

  • Homework 0 Solutions: [.pdf].

  • Homework 1 Assignment [.pdf] Ozone data: [.csv] Microarray data: [X.csv] [Y.csv]

  • Lecture 5 Outline: [pdf] Lecture 5 Code: [.R]

  • Lecture 4 Outline: [pdf] Lecture 4 Code: [.R]

  • Lecture 3 Outline: [pdf] Lecture 3 Code: [.R]

  • Lecture 2 Outline: [pdf] Lecture 2 Code: [.R]

  • Lecture 1 Outline: [pdf] Lecture 1 Code: [.R]

  • Homework 0 Assignment [.pdf] This assignment covers prerequisites; students must perform satisfactorily on it to remain enrolled in the course.

  • Review session on Linear Regression & Matrix Analysis: Wednesday, August 24th, 5 - 7pm in Duncan 1064.

  • Review session on Optimization & Matrix Analysis: Wednesday, August 31st, 5 - 7pm in Duncan 1064.

Course Description

This course is a survey of statistical learning methods and will cover major techniques and concepts for both supervised and unsupervised learning. Topics covered include penalized regression and classification, support vector machines, kernel methods, model selection, matrix factorizations, graphical models, clustering, boosting, random forests, and ensemble learning. Students will learn how and when to apply statistical learning techniques, their comparative strengths and weaknesses, and how to critically evaluate the performance of learning algorithms. Students completing this course should be able to (i) apply basic statistical learning methods to build predictive models or perform exploratory analysis, (ii) properly tune, select, and validate statistical learning models, and (iii) build an ensemble of learning algorithms.

Tentative Schedule:

HW 0 Assigned: August 23
August 23: Introduction & KNN
August 25: MSE & Least Squares
August 30: Ridge Regression
September 1: Sparse Regression I
HW 0 Due: September 6
HW I Assigned: September 6
September 6: Sparse Regression II
September 8: Non-Linear Regression
Competition Opens: September 13
September 13: Competition Intro
September 15: Linear Discriminant Analysis
September 20: LDA & Logistic Regression
HW I Due: September 20
September 22: Sparse Classification
HW II Assigned: September 22
September 27: Support Vector Machines I
September 29: SVMs II
October 4: Non-Linear Classification
Progress Report I Due: October 4
October 6: Matrix Factorizations I
HW II Due: October 13
October 13: Matrix Factorizations II
HW III Assigned: October 18
October 18: Clustering I
October 20: Clustering II
October 25: Model Validation I
October 27: Model Validation II
Progress Report II Due: November 1
November 1: Graphical Models
November 3: In-Class Midterm Exam
HW III Due: November 10
November 8: Trees & Bagging
November 10: Boosting
November 15: Boosting & Ensemble Learning
November 17: Random Forests
Competition Closes: November 20
November 22: Best Practices
November 29: Competition Presentations
December 1: Competition Presentations
Final Report Due: December 2