Session Slot: 4:00-5:50 Sunday
Estimated Audience Size:
AudioVisual Request: Two Overheads
Session Title: Local Learning: Modern Nonparametric Regression
Theme Session: No
Applied Session: No
Session Organizer: Tibshirani, Rob, University of Toronto, Toronto, Canada
Address: Dept of Preventive Med & Biostats, and Dept of Statistics, Univ of Toronto, Toronto, Canada M5S 1A8.
Phone: 416-978-4642 (PMB), 416-978-0673 (stats)
Fax: 416 978-8299
Session Timing: 110 minutes total (Sorry about format):
Opening Remarks by Chair - 5 minutes
First Speaker - 25 minutes
Second Speaker - 25 minutes
Third Speaker - 25 minutes
Discussant - 10 minutes
Floor Discussion - 10 minutes
Session Chair: TBN
1. Theory and practice of boosting
Schapire, Robert, AT&T Labs
Address: 180 Park Avenue, Room A279, Florham Park, NJ 07932-0971
Abstract: Boosting is a general method for producing a very accurate classification rule by combining rough and moderately inaccurate ``rules-of-thumb.'' Any given classification method can be used to find these weak rules-of-thumb. Boosting works by running the given classification method many times, each time on a different set of training examples, and then combining the resulting rules-of-thumb.
In this talk, I will introduce the most recent boosting algorithm, called AdaBoost, explain the basic, underlying theory of boosting, and review some experiments comparing AdaBoost to other algorithms, including Breiman's ``bagging'' algorithm. Besides demonstrating that AdaBoost often performs well in practice, several of those who experimented with AdaBoost (including Drucker and Cortes, Quinlan, and Breiman) made the surprising observation that the algorithm usually does not suffer from overfitting, even when it generates models that are very complex. I will describe a new theoretical analysis which explains this phenomenon.
[This talk includes joint work with Yoav Freund, Peter Bartlett and Wee Sun Lee.]
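The reweight-and-combine idea described in this abstract can be sketched in a few lines. The code below is only an illustration under stated assumptions, not the speaker's implementation: it uses the standard AdaBoost update for labels in {-1, +1}, one-dimensional inputs, and a threshold "stump" as the weak rule. The names `train_stump`, `adaboost`, and `predict` are hypothetical and chosen for this sketch.

```python
import math

def stump_output(x, threshold, polarity):
    """Weak rule: predict `polarity` when x >= threshold, else -polarity."""
    return polarity if x >= threshold else -polarity

def train_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_output(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, rounds):
    """Run `rounds` boosting iterations; return [(alpha, threshold, polarity)]."""
    n = len(xs)
    weights = [1.0 / n] * n          # start with uniform example weights
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-12), 1 - 1e-12)   # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote of this weak rule
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified examples, decrease the rest.
        weights = [w * math.exp(-alpha * y * stump_output(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all weak rules."""
    score = sum(alpha * stump_output(x, threshold, polarity)
                for alpha, threshold, polarity in ensemble)
    return 1 if score >= 0 else -1

# Toy data (an assumption for illustration): no single stump fits it.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, rounds=3)
training_errors = sum(predict(ensemble, x) != y for x, y in zip(xs, ys))
```

On this toy data the best single stump misclassifies three of the ten points, while the weighted vote of three stumps fits all ten, which is the "combining rough rules-of-thumb" effect in miniature.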
2. Locally Bagged Decision Trees
Rao, J.S., Cleveland Clinic
Address: Department of Biostatistics, The Cleveland Clinic, 9500 Euclid Ave, Cleveland, OH 44195
Potts, W.J.E., SAS Institute Inc.
Address: SAS Institute Inc., SAS Campus Drive, Bldg H, Cary, NC 27513
Phone: 919-677-8000 ext. 4629
Abstract: Tree-structured classifiers (CART, C4.5) are attractive in that they produce flexible models of simple structure. They are, however, unstable: small perturbations of the training data can give different trees. Bootstrap aggregation of trees (bagging), introduced by Breiman, can greatly improve predictions, but at the cost of interpretable structure. Under the hypothesis that instability may be local in nature, we introduce a version of bagging that applies bootstrap resampling at a node, together with a determination of local instability using principal coordinate analysis. The result is a new class of models that offers bagging-like improvements in prediction accuracy while maintaining a tree-like topology. The method is illustrated on benchmark datasets from the University of California, Irvine machine learning repository.
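For readers unfamiliar with bagging, the sketch below shows ordinary (global) bootstrap aggregation as introduced by Breiman, which is the baseline this abstract builds on. It does not implement the node-level resampling or the principal coordinate analysis step proposed by the authors, whose details are not given here. As simplifying assumptions, each "tree" is a single-split stump on one-dimensional data with labels in {-1, +1}, and the names `fit_stump`, `bagged_stumps`, and `vote` are hypothetical.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Fit a one-split classifier to a list of (x, y) pairs, y in {-1, +1}."""
    best = None
    for threshold in sorted({x for x, _ in sample}):
        for polarity in (1, -1):
            # The stump predicts `polarity` when x >= threshold, else -polarity.
            errors = sum(1 for x, y in sample
                         if (polarity if x >= threshold else -polarity) != y)
            if best is None or errors < best[0]:
                best = (errors, threshold, polarity)
    _, threshold, polarity = best
    return lambda x: polarity if x >= threshold else -polarity

def bagged_stumps(data, n_models, seed=0):
    """Fit one stump per bootstrap resample of `data`."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw len(data) points with replacement.
        sample = [rng.choice(data) for _ in data]
        models.append(fit_stump(sample))
    return models

def vote(models, x):
    """Aggregate by majority vote (use an odd n_models to avoid ties)."""
    return Counter(model(x) for model in models).most_common(1)[0][0]

# Toy data (an assumption for illustration): label flips at x = 5.
data = [(x, -1 if x < 5 else 1) for x in range(10)]
models = bagged_stumps(data, n_models=25)
label_low, label_high = vote(models, 0), vote(models, 9)
```

Each resample can place the split slightly differently, which is the instability the abstract refers to; the vote averages over those differences, but the aggregate is no longer a single tree.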
3. Minimum Description Length and Extended Linear Modeling
Hansen, Mark, Bell Labs, Lucent Technologies
Abstract: In this talk, we investigate the use of the principle of minimum description length (MDL) for problems of model selection in the context of an extended linear model. By viewing statistical modeling as a means of generating descriptions of observed data, the MDL framework discriminates between competing models based on the complexity of each description. In our application, we will consider a variety of spline bases (including classical polynomial and smoothing splines) as building blocks and employ MDL to identify promising models.
This is joint work with Bin Yu, University of California at Berkeley.
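The flavor of MDL-based model selection can be illustrated with a crude two-part code length. The sketch below is an assumption-laden illustration, not the criterion used in the talk: it scores a model with k parameters on n observations by (n/2) log(RSS/n) + (k/2) log n, and the candidate models are piecewise-constant fits (degree-0 splines) on k equal-width bins. The function names are hypothetical.

```python
import math

def piecewise_constant_rss(xs, ys, k):
    """Residual sum of squares of the least-squares fit on k equal-width bins.

    Assumes the xs are not all identical.
    """
    lo, hi = min(xs), max(xs)
    bins = [[] for _ in range(k)]
    for x, y in zip(xs, ys):
        idx = min(int((x - lo) / (hi - lo) * k), k - 1)
        bins[idx].append(y)
    rss = 0.0
    for values in bins:
        if values:  # the least-squares constant on a bin is its mean
            mean = sum(values) / len(values)
            rss += sum((y - mean) ** 2 for y in values)
    return rss

def description_length(xs, ys, k):
    """Two-part code length: cost of the residuals plus cost of k parameters."""
    n = len(xs)
    rss = max(piecewise_constant_rss(xs, ys, k), 1e-12)  # guard against log(0)
    return 0.5 * n * math.log(rss / n) + 0.5 * k * math.log(n)

def select_bins(xs, ys, max_k):
    """Pick the number of bins with the shortest description."""
    return min(range(1, max_k + 1),
               key=lambda k: description_length(xs, ys, k))

# Toy data (an assumption for illustration): a step at x = 0.5 plus
# a small alternating perturbation.
xs = [i / 40 for i in range(40)]
ys = [(0.0 if x < 0.5 else 1.0) + 0.05 * (-1) ** i for i, x in enumerate(xs)]
best_k = select_bins(xs, ys, max_k=8)
```

Adding bins never increases the residual sum of squares, so the first term alone would always favor the largest model; the (k/2) log n term charges for describing the extra parameters, and the two together favor the two-bin model on this step-shaped data.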
List of speakers who are nonmembers: