"Alternatives to Mixture Modeling in High Dimensions"
Clustering p-dimensional data by fitting a mixture
of K normals has enjoyed renewed interest (see, for
example, the S-PLUS function "mclust"). However, the number of
parameters for the model grows rapidly with dimension p.
For example, even if all the covariance matrices are
assumed to be equal, the number of parameters is
(K-1) + K*p + p(p+1)/2 for the weights, means and
covariance matrix. At ACAS in 2001, Scott introduced
the partial mixture component algorithm which fits
only one component of the mixture model at a time.
This algorithm requires only 1 + p + p(p+1)/2 parameters
for the weight, mean vector, and covariance matrix.
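The parameter counts above can be checked with a small sketch; the function names here are illustrative, not part of any package:

```python
# Parameter counts for the models discussed above (hypothetical helper
# names, for illustration only).

def full_mixture_params(K, p):
    # (K-1) weights + K*p mean entries + p(p+1)/2 entries
    # of one shared covariance matrix
    return (K - 1) + K * p + p * (p + 1) // 2

def partial_mixture_params(p):
    # 1 weight + p mean entries + p(p+1)/2 covariance entries
    return 1 + p + p * (p + 1) // 2

def line_model_params(p):
    # a point and a direction: 2*p parameters, linear in p
    return 2 * p

for p in (2, 10, 50):
    print(p, full_mixture_params(3, p),
          partial_mixture_params(p), line_model_params(p))
```

For K = 3 and p = 50, for example, the full mixture needs 1427 parameters, a single partial component 1326, and the line model only 100, showing the quadratic-versus-linear gap.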
In this talk, we introduce a new algorithm which attempts
to find the "best" line through individual clusters.
This model requires only 2*p parameters. That is,
the new algorithm is linear rather than quadratic
in p. By repeatedly reinitializing the search algorithm,
many clusters may be identified. Intuitively, the line
found is approximately the largest eigenvector of the
local covariance matrix. The GGobi visualization
program will be used to illustrate the success of
this algorithm on real and simulated data.
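The intuition that the fitted line approximates the largest eigenvector of the local covariance matrix can be sketched as follows. This is a minimal illustration on simulated data, not the search algorithm from the talk:

```python
import numpy as np

# Minimal sketch: points scattered around a line in 3-D; the top
# eigenvector of the (local) covariance matrix recovers the direction.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
direction = np.array([3.0, 4.0, 0.0]) / 5.0   # true unit direction
points = np.outer(t, direction) + 0.05 * rng.normal(size=(200, 3))

center = points.mean(axis=0)
cov = np.cov(points - center, rowvar=False)

# eigh returns eigenvalues in ascending order, so the last column
# of eigvecs is the eigenvector of the largest eigenvalue
eigvals, eigvecs = np.linalg.eigh(cov)
top = eigvecs[:, -1]
top = top if top @ direction > 0 else -top    # fix sign for comparison
print(np.round(top, 2))                        # close to `direction`
```

In the talk's setting the covariance would be estimated locally, near one cluster at a time, rather than from the whole sample as here.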