Course Information:
Statistics 640 / 444, Fall 2016
Tuesdays & Thursdays, 4:00-5:15pm
Location: Herzstein Hall 210
Piazza Course Webpage

Instructor: Genevera Allen
Office: DH2098
Email: gallen@rice.edu
Office Hours: Thursday 5:15-6:15pm

Stat 444 Teaching Assistant: John Nagorski
Office: DH1041
Email: john.nagorski@rice.edu
Office Hours: Wednesday 1-2pm
Stat 444 Recitation: Wednesday 5-6pm in Duncan 1070.

Stat 640 Teaching Assistant: Michael Weylandt
Office: DH1041
Email: michael.weylandt@rice.edu
Office Hours: Wednesday 1-2pm
Stat 640 Recitation: Monday 5:30-6:30pm in Duncan 1070.

Syllabus: [pdf]
Course Schedule: [pdf]

Recommended Textbooks:
Elements of Statistical Learning by Hastie, Tibshirani & Friedman.
Introduction to Statistical Learning by James, Witten, Tibshirani & Hastie.
Statistical Learning with Sparsity by Hastie, Tibshirani & Wainwright.
Statistics for High-Dimensional Data by Bühlmann & van de Geer.
Convex Optimization by Boyd & Vandenberghe.

Grading Policy:
Homeworks 25%
Midterm Exam 25%
Final Project 45%
Class Participation 5%

Homework Policy: A hard copy of each homework is due to the TAs in Duncan Hall by 5pm. Late homeworks will NOT be accepted. Homeworks may be discussed with classmates but must be written and submitted individually.

Midterm Exam: There will be an in-class midterm exam (open book / open notes) on November 3.

Final Project & Contest: The final project will be a data analysis contest. The competition will begin on Tuesday, September 13 and can be done in teams of two people.
Competition Description: [pdf]
Data Use Agreement: [pdf]

Important Deadlines:
September 2: Add Deadline
October 7: Drop Deadline
Announcements, Assignments & Lectures:

Course Description
This course is a survey of statistical learning methods and will cover major techniques and concepts for both supervised and unsupervised learning. Topics covered include penalized regression and classification, support vector machines, kernel methods, model selection, matrix factorizations, graphical models, clustering, boosting, random forests, and ensemble learning. Students will learn how and when to apply statistical learning techniques, their comparative strengths and weaknesses, and how to critically evaluate the performance of learning algorithms. Students completing this course should be able to (i) apply basic statistical learning methods to build predictive models or perform exploratory analysis, (ii) properly tune, select, and validate statistical learning models, and (iii) build an ensemble of learning algorithms.

Tentative Schedule:
HW 0 Assigned: August 23
August 23: Introduction & KNN
August 25: MSE & Least Squares
August 30: Ridge Regression
September 1: Sparse Regression I
HW 0 Due: September 6
HW I Assigned: September 6
September 6: Sparse Regression II
September 8: Non-Linear Regression
Competition Opens: September 13
September 13: Competition Intro
September 15: Linear Discriminant Analysis
September 20: LDA & Logistic Regression
HW I Due: September 20
September 22: Sparse Classification
HW II Assigned: September 22
September 27: Support Vector Machines I
September 29: SVMs II
October 4: Non-Linear Classification
Progress Report I Due: October 4
October 6: Matrix Factorizations I
HW II Due: October 13
October 13: Matrix Factorizations II
HW III Assigned: October 18
October 18: Clustering I
October 20: Clustering II
October 25: Model Validation I
October 27: Model Validation II
Progress Report II Due: November 1
November 1: Graphical Models
November 3: In-Class Midterm Exam
HW III Due: November 10
November 8: Trees & Bagging
November 10: Boosting
November 15: Boosting & Ensemble Learning
November 17: Random Forests
Competition Closes: November 20
November 22: Best Practices
November 29: Competition Presentations
December 1: Competition Presentations
Final Report Due: December 2