Data Mining, Clustering, and Robust Partial Mixture Estimation
Abstract:
Mining large datasets successfully requires careful application
of statistical modeling tools. Such data are seldom clean, so
robust statistical methods, which cope automatically with large
numbers of outliers, are especially appropriate. In particular,
we discuss robust estimation of normal mixtures. Such models are
useful for clustering, since a cluster can be associated with
each component of the mixture model. Finally, we present a
number of examples including simple regression, robust
covariance estimation, incomplete model specification,
lightning detection, particle physics detection, and
finding the leading eigenvector in a mixture dataset.
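The abstract does not say which criterion drives the robust fit. As an assumption, the sketch below uses minimum integrated squared error (L2E) to fit a single weighted normal component w·N(mu, sigma^2) to data that contain gross outliers, which is one way to obtain a "partial" mixture fit. The criterion is w^2/(2·sigma·sqrt(pi)) − (2w/n)·Σ phi(x_i | mu, sigma). The function name `l2e_partial_fit`, the grids, the data, and the 3-sigma outlier rule are all hypothetical choices made for illustration. A plain grid search stands in for a real optimizer.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def l2e_partial_fit(xs, mu_grid, sigma_grid):
    """Fit one weighted normal component w * N(mu, sigma^2) by minimizing
        w^2 / (2 sigma sqrt(pi)) - (2 w / n) * sum_i phi(x_i | mu, sigma)
    over a grid of (mu, sigma). For fixed (mu, sigma) the criterion is
    quadratic in w, so w is solved in closed form and capped at 1.
    Returns (criterion, w, mu, sigma) for the best grid point."""
    n = len(xs)
    best = None
    for mu in mu_grid:
        for sigma in sigma_grid:
            a = 1.0 / (2.0 * sigma * math.sqrt(math.pi))  # integral of phi^2
            b = 2.0 * sum(normal_pdf(x, mu, sigma) for x in xs) / n
            w = min(b / (2.0 * a), 1.0)  # minimizer of a*w^2 - b*w, capped
            crit = a * w * w - b * w
            if best is None or crit < best[0]:
                best = (crit, w, mu, sigma)
    return best

# Hypothetical data: 21 "clean" points on [-1, 1] plus 6 gross outliers.
core = [i / 10.0 - 1.0 for i in range(21)]
outliers = [8.0, 9.0, 10.0, 11.0, 12.0, 13.0]
xs = core + outliers

mu_grid = [-2.0 + 0.1 * i for i in range(161)]     # -2.0 .. 14.0
sigma_grid = [0.1 + 0.05 * j for j in range(59)]   # 0.10 .. 3.0
crit, w, mu, sigma = l2e_partial_fit(xs, mu_grid, sigma_grid)

# Illustrative rule: flag points more than 3 fitted sigmas from the fitted mean.
flagged = [x for x in xs if abs(x - mu) > 3.0 * sigma]
print(f"w={w:.2f}  mu={mu:.2f}  sigma={sigma:.2f}  flagged={flagged}")
```

The fitted component locks onto the clean bulk of the data, and its weight w comes out below 1 rather than being forced to cover the outliers. A least-squares (maximum likelihood) normal fit, by contrast, would pull the mean toward the outlying points.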
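Associating a cluster with each mixture component can be made concrete with one common rule. Each point is assigned to the component with the largest posterior probability w_k·phi(x | mu_k, sigma_k) / f(x), where f is the mixture density. The sketch below assumes a univariate normal mixture whose weights, means, and standard deviations have already been estimated. The function `assign_clusters` and the parameter values are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def assign_clusters(xs, weights, mus, sigmas):
    """Label each point with the index of the most probable mixture component."""
    labels = []
    for x in xs:
        # Unnormalized posteriors w_k * phi(x | mu_k, sigma_k); dividing by the
        # common mixture density f(x) would not change which one is largest.
        scores = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels

# Two equally weighted components centered at 0 and 5 (hypothetical values).
print(assign_clusters([-2.1, 0.2, 4.8, 5.3], [0.5, 0.5], [0.0, 5.0], [1.0, 1.0]))
# [0, 0, 1, 1]
```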