7th US Army Conference on Applied Statistics
22-26 October 2001
Santa Fe, NM

Title: On a New Approach to Robust Estimation

David W. Scott
Noah Harding Professor of Statistics
Rice University
Houston, TX 77251-1892
scottdw@rice.edu
713-348-6037
713-348-5476 (Fax)

Abstract: In this talk, I describe an alternative approach to robust estimation. Robust estimation provides a powerful solution to practical problems in applied statistics. With large datasets, even simple tasks such as data cleaning may be prohibitively expensive, and robust techniques reduce the need for them. These techniques may also handle the difficult situation where a dataset contains large clusters of outliers. In order to use a robust estimation algorithm (such as the M-estimator described by Hampel and Huber), the shape and scale of the influence function must be specified. Tukey's biweight function is a popular choice, but there are many other possibilities. The scale may be determined by a simple robust method (such as the interquartile range), or by iteratively reweighting the data. In our approach, maximum likelihood is replaced by a data-based minimum-distance criterion. I show that the specification of the shape and scale of the influence function can be replaced by a single choice of a distribution function for the data. This idea is illustrated for several common choices of distribution, including the Gaussian. This framework works well in both density and regression problems. Groups of multivariate outliers may be readily identified. Experimental design with messy data is facilitated. Semiparametric models such as mixtures of normals also fall within this paradigm. Several case studies are presented and actual code is given.
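
As an illustration of the minimum-distance idea for the Gaussian case, here is a minimal sketch in Python. The abstract does not state which distance is used, so the sketch assumes an integrated squared error (L2) criterion between a normal model density and the unknown data density, estimated from the sample as 1/(2 sigma sqrt(pi)) - (2/n) sum_i phi(x_i; mu, sigma). The function names l2e_criterion and fit_l2e_normal, the simulated data, and the plain compass search used as optimizer are all hypothetical choices made for this example, not the code given in the talk.

```python
import math
import random
import statistics


def l2e_criterion(data, mu, sigma):
    """Assumed L2 criterion for a N(mu, sigma^2) model.

    The integral of the squared normal density is 1 / (2 * sigma * sqrt(pi));
    the cross term is estimated by the sample mean of the model density.
    """
    n = len(data)
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    fitted = sum(norm * math.exp(-0.5 * ((x - mu) / sigma) ** 2) for x in data)
    return 1.0 / (2.0 * sigma * math.sqrt(math.pi)) - 2.0 * fitted / n


def fit_l2e_normal(data, tol=1e-6):
    """Minimize the criterion over (mu, log sigma) by a simple compass search."""
    mu = statistics.median(data)                 # robust starting location
    q1, _, q3 = statistics.quantiles(data, n=4)
    scale0 = (q3 - q1) / 1.349                   # IQR-based starting scale
    log_s = math.log(scale0)
    best = l2e_criterion(data, mu, scale0)
    step = 0.5
    while step > tol:
        improved = False
        # Try one step in each coordinate direction; keep any improvement.
        for d_mu, d_ls in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            cand_mu = mu + d_mu * scale0
            cand_ls = log_s + d_ls
            value = l2e_criterion(data, cand_mu, math.exp(cand_ls))
            if value < best:
                mu, log_s, best = cand_mu, cand_ls, value
                improved = True
        if not improved:
            step /= 2.0                          # refine the search mesh
    return mu, math.exp(log_s)


# Simulated example: 900 points from N(0, 1) plus a cluster of 100 outliers near 8.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(900)]
data += [random.gauss(8.0, 0.5) for _ in range(100)]

mu_hat, sigma_hat = fit_l2e_normal(data)
print("sample mean and std:", statistics.mean(data), statistics.stdev(data))
print("minimum-distance fit:", mu_hat, sigma_hat)
```

In this setup the sample mean and standard deviation are pulled toward the outlier cluster, while the minimum-distance fit should stay close to the main component. No influence function or tuning constant is specified; the only modelling choice is the normal family itself. The fitted scale can come out somewhat above 1 because the model density is constrained to integrate to one while the main component carries only 90% of the data.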