Principal component analysis for structured high-dimensional data
Debashis Paul
Statistics Department
University of California, Davis

Abstract
Increasingly we are confronting multivariate data
with very high dimension and comparatively low sample size, e.g. in
medical imaging, microarray analysis, speech and image recognition,
atmospheric science, finance, etc. In this talk we consider the problem
of estimation of the principal components in situations where the
dimension of the observation vectors is comparable to the sample size,
even though the intrinsic dimensionality of the signal part of the data
is small. It will be demonstrated that the standard principal component
analysis can fail to provide good estimates of the eigenvectors of the
population covariance matrix. However, if the eigenvectors
corresponding to the larger eigenvalues of the population covariance
matrix are sparse in a suitable sense, then one can get much better
estimates. A two-stage algorithm to efficiently deal with the problem
of estimating the population eigenvectors will be proposed and
analyzed. As a related problem, a brief survey will be presented on
some recent work on functional principal components analysis for
irregularly sampled longitudinal data.
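
The contrast between standard PCA and a sparsity-aware two-stage
estimate can be illustrated with a small simulation. The sketch below
is not the algorithm proposed in the talk. It assumes a single-spike
covariance model (identity plus a rank-one term) whose leading
eigenvector is supported on a few coordinates, and it uses a simple
stand-in for a two-stage scheme: first keep the coordinates with the
largest sample variances, then run PCA on those coordinates only. The
function names and the parameters n, p, k, spike and n_keep are
hypothetical choices made for illustration.

```python
import numpy as np


def simulate_spiked(n, p, k, spike, rng):
    """Draw n samples from N(0, I + spike * v v^T).

    The population eigenvector v is sparse: it is supported on the
    first k coordinates (an assumption made for this illustration).
    """
    v = np.zeros(p)
    v[:k] = 1.0 / np.sqrt(k)
    noise = rng.standard_normal((n, p))
    scores = np.sqrt(spike) * rng.standard_normal(n)
    return noise + np.outer(scores, v), v


def leading_eigvec(x):
    """Leading eigenvector of the sample covariance (via SVD)."""
    xc = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(xc, full_matrices=False)
    return vt[0]


def two_stage_pca(x, n_keep):
    """Simplified two-stage estimate (illustrative stand-in only).

    Stage 1: keep the n_keep coordinates with the largest sample variance.
    Stage 2: run PCA on the selected coordinates and pad with zeros.
    """
    variances = x.var(axis=0, ddof=1)
    keep = np.argsort(variances)[-n_keep:]
    v_hat = np.zeros(x.shape[1])
    v_hat[keep] = leading_eigvec(x[:, keep])
    return v_hat


def abs_cosine(a, b):
    """Absolute cosine of the angle between two vectors (sign-invariant)."""
    return abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))


rng = np.random.default_rng(0)
x, v = simulate_spiked(n=200, p=1000, k=10, spike=10.0, rng=rng)

cos_std = abs_cosine(leading_eigvec(x), v)
cos_two = abs_cosine(two_stage_pca(x, n_keep=20), v)

print(f"standard PCA  |cos(angle)| = {cos_std:.3f}")
print(f"two-stage PCA |cos(angle)| = {cos_two:.3f}")
```

In this setup the dimension is five times the sample size, so the
standard estimate spreads weight over many pure-noise coordinates,
while the selection step removes most of them before the eigenvector
is computed. In practice n_keep would be replaced by a data-driven
threshold; fixing it here only keeps the sketch short.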
