ims.21

IMS

Session Slot: 4:00-5:50 Sunday

Estimated Audience Size:

AudioVisual Request: Two Overheads

Session Title: Local Learning: Modern Nonparametric Regression and Classification

Theme Session: No

Applied Session: No

Session Organizer: Tibshirani, Rob, University of Toronto, Toronto, Canada

Address: Dept of Preventive Med & Biostats and Dept of Statistics, Univ of Toronto, Toronto, Canada M5S 1A8

Phone: 416-978-4642 (PMB), 416-978-0673 (stats)

Fax: 416-978-8299

Email: tibs@utstat.toronto.edu

Session Timing: 110 minutes total (Sorry about format):

Opening Remarks by Chair - 5 minutes
First Speaker - 25 minutes
Second Speaker - 25 minutes
Third Speaker - 25 minutes
Discussant - 10 minutes
Floor Discussion - 10 minutes

Session Chair: TBN

Address:

Phone:

Fax:

Email:

1. Theory and practice of boosting

Schapire, Robert, AT&T Labs

Address: 180 Park Avenue, Room A279, Florham Park, NJ 07932-0971

Phone: 973-360-8329

Fax: 973-360-8970

Email: schapire@research.att.com

Abstract: Boosting is a general method for producing a very accurate classification rule by combining rough and moderately inaccurate "rules-of-thumb." Any given classification method can be used to find these weak rules-of-thumb. Boosting works by running the given classification method many times, each time on a different set of training examples, and then combining the resulting rules-of-thumb.
In this talk, I will introduce the most recent boosting algorithm, called AdaBoost, explain the basic, underlying theory of boosting, and review some experiments comparing AdaBoost to other algorithms, including Breiman's "bagging" algorithm. Besides demonstrating that AdaBoost often performs well in practice, several of those who experimented with AdaBoost (including Drucker and Cortes, Quinlan, and Breiman) made the surprising observation that the algorithm usually does not suffer from overfitting, even when it generates models that are very complex. I will describe a new theoretical analysis which explains this phenomenon.
[This talk includes joint work with Yoav Freund, Peter Bartlett and Wee Sun Lee.]
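As a rough illustration of the procedure described in the abstract, the following is a minimal sketch of AdaBoost, not the speaker's own code. It assumes one-dimensional inputs, labels in {-1, +1}, and threshold rules ("stumps") as the weak rules-of-thumb. Reweighting the examples after each round plays the role of presenting "a different set of training examples." The toy data and the function names `train_stump`, `adaboost`, and `predict` are hypothetical, chosen only for this example.

```python
import math

def train_stump(xs, ys, w):
    """Return (weighted_error, threshold, polarity) of the best threshold rule.

    A stump predicts `polarity` when x >= threshold and `-polarity` otherwise.
    """
    best = None
    for thr in sorted(set(xs)):
        for pol in (1, -1):
            err = sum(wi for x, y, wi in zip(xs, ys, w)
                      if (pol if x >= thr else -pol) != y)
            if best is None or err < best[0]:
                best = (err, thr, pol)
    return best

def adaboost(xs, ys, rounds):
    """Run AdaBoost for a fixed number of rounds; return [(alpha, thr, pol)]."""
    n = len(xs)
    w = [1.0 / n] * n                      # start with uniform example weights
    ensemble = []
    for _ in range(rounds):
        err, thr, pol = train_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this rule
        ensemble.append((alpha, thr, pol))
        # Increase the weight of misclassified examples, decrease the rest.
        w = [wi * math.exp(-alpha * y * (pol if x >= thr else -pol))
             for x, y, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    """Weighted majority vote of the stumps in the ensemble."""
    score = sum(alpha * (pol if x >= thr else -pol)
                for alpha, thr, pol in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: no single threshold rule separates these labels.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, 3)
```

On this toy data every single stump misclassifies at least three points, while the weighted vote of three stumps fits all ten training points.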

2. Locally Bagged Decision Trees

Rao, J.S., Cleveland Clinic

Address: Department of Biostatistics, The Cleveland Clinic, 9500 Euclid Ave, Cleveland, OH 44195

Phone: 216-445-7844

Fax: 216-444-8023

Email: srao@bio.ri.ccf.org

Potts, W.J.E., SAS Institute Inc.

Address: SAS Institute Inc., SAS Campus Drive, Bldg H, Cary, NC 27513

Phone: 919-677-8000 ext. 4629

Fax: 919-677-8225

Email: saswzp@wnt.sas.com

Abstract: Tree-structured classifiers (CART, C4.5) are attractive in that they produce flexible models of simple structure. They are, however, unstable in that small perturbations of the training data can give different trees. The idea of bootstrap aggregation of trees (bagging) was introduced by Breiman and can lead to greatly improved predictions, but at the cost of losing interpretable structure. Under the hypothesis that instability may be local in nature, we introduce a version of bagging that uses bootstrap resampling at a node along with a determination of local instability using principal coordinate analysis. The result is a new class of models that achieve bagging-like improvements in prediction accuracy while maintaining a tree-like topology. The method is illustrated on benchmark datasets from the University of California, Irvine machine learning repository.
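For orientation, here is a minimal sketch of ordinary (global) bagging in Breiman's sense, which the abstract takes as its starting point. It is not the locally bagged method proposed in the talk, whose node-level details are not given here. It assumes one-dimensional inputs, labels in {-1, +1}, and depth-one trees (stumps) as the base classifier. The names `fit_stump`, `bagged_stumps`, and `vote` and the toy data are hypothetical.

```python
import random

def fit_stump(points):
    """Fit the best threshold rule to (x, y) pairs; return (threshold, polarity).

    The rule predicts `polarity` when x >= threshold and `-polarity` otherwise.
    """
    best = None
    for thr in sorted({x for x, _ in points}):
        for pol in (1, -1):
            errors = sum(1 for x, y in points
                         if (pol if x >= thr else -pol) != y)
            if best is None or errors < best[0]:
                best = (errors, thr, pol)
    return best[1], best[2]

def bagged_stumps(points, n_models, rng):
    """Fit one stump per bootstrap resample of the training data."""
    models = []
    for _ in range(n_models):
        # Bootstrap: draw len(points) examples with replacement.
        sample = [rng.choice(points) for _ in points]
        models.append(fit_stump(sample))
    return models

def vote(models, x):
    """Aggregate the stumps by unweighted majority vote."""
    total = sum(pol if x >= thr else -pol for thr, pol in models)
    return 1 if total >= 0 else -1

# Hypothetical toy data: label -1 below 5, +1 from 5 upward.
data = [(x, -1 if x < 5 else 1) for x in range(10)]
models = bagged_stumps(data, 25, random.Random(0))
```

Each resample can yield a slightly different threshold, which is the instability the abstract refers to. The vote averages over those differences, but the aggregate is no longer a single tree.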

3. Minimum Description Length and Extended Linear Modeling

Hansen, Mark, Bell Labs, Lucent Technologies

Address:

Phone:

Fax:

Email:

Abstract: In this talk, we investigate the use of the principle of minimum description length (MDL) for problems of model selection in the context of an extended linear model. By viewing statistical modeling as a means of generating descriptions of observed data, the MDL framework discriminates between competing models based on the complexity of each description. In our application, we will consider a variety of spline bases (including classical polynomial and smoothing splines) as building blocks and employ MDL to identify promising models.
This is joint work with Bin Yu, University of California at Berkeley.
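To make the idea of comparing models by description length concrete, one common two-part form of the criterion is sketched below. It is offered only as an illustration and is not necessarily the form used in the talk. It assumes a linear model with Gaussian errors fitted by least squares to n observations, where k is the number of basis functions (for example, spline basis elements) and RSS_k is the residual sum of squares of the k-term fit. These symbols are introduced here and do not appear in the abstract.

```latex
\hat{k} \;=\; \arg\min_{k}\;
  \left[ \frac{n}{2}\,\log\frac{\mathrm{RSS}_k}{n} \;+\; \frac{k}{2}\,\log n \right]
```

The first term approximates the cost of encoding the data given the fitted model, and the second the cost of encoding the k estimated coefficients, so a larger basis is preferred only when it shortens the total description.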

List of speakers who are nonmembers:

David Scott
6/1/1998