Principal component analysis for structured high-dimensional data

Debashis Paul
Statistics Department
University of California, Davis


Abstract


        Increasingly, we are confronted with multivariate data of very high dimension and comparatively small sample size, e.g. in medical imaging, microarray analysis, speech and image recognition, atmospheric science, and finance. In this talk we consider the problem of estimating the principal components when the dimension of the observation vectors is comparable to the sample size, even though the intrinsic dimensionality of the signal part of the data is small. It will be demonstrated that standard principal component analysis can fail to provide good estimates of the eigenvectors of the population covariance matrix. However, if the eigenvectors corresponding to the larger eigenvalues of the population covariance matrix are sparse in a suitable sense, then one can obtain much better estimates. A two-stage algorithm for efficiently estimating the population eigenvectors will be proposed and analyzed. As a related problem, a brief survey will be presented of some recent work on functional principal component analysis for irregularly sampled longitudinal data.
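The contrast between standard PCA and a sparsity-aware estimate can be illustrated with a small simulation. The sketch below is not the two-stage algorithm of the talk, whose stages are not described in this abstract. It assumes a single-spike covariance model (identity plus a rank-one term) with a sparse leading eigenvector, and it uses a simple stand-in heuristic: keep the coordinates with the largest sample variance, then run PCA on those coordinates only. All names and parameter values (`n`, `p`, `k`, `lam`, `select_then_pca`) are hypothetical choices made for illustration, and the number of kept coordinates is assumed known.

```python
import numpy as np

def leading_eigvec(X):
    """Leading eigenvector of the (uncentered) sample covariance of the rows of X.

    Computed as the first right singular vector of X; the data below are
    generated with mean zero, so no centering is applied.
    """
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[0]

def select_then_pca(X, k):
    """Illustrative heuristic (an assumption, not the talk's algorithm):
    keep the k coordinates with the largest sample variance, run PCA on
    them, and embed the result back into the full space with zeros elsewhere.
    """
    keep = np.argsort(X.var(axis=0))[-k:]
    v = np.zeros(X.shape[1])
    v[keep] = leading_eigvec(X[:, keep])
    return v

rng = np.random.default_rng(0)
n, p, k = 200, 1000, 5          # sample size, dimension, sparsity (assumed values)
lam = 10.0                      # spike strength: covariance = I + lam * theta theta^T

theta = np.zeros(p)
theta[:k] = 1.0 / np.sqrt(k)    # sparse unit-norm population eigenvector

# Rows of X are independent N(0, I + lam * theta theta^T) vectors.
noise = rng.standard_normal((n, p))
scores = rng.standard_normal((n, 1))
X = noise + np.sqrt(lam) * scores * theta

# Absolute cosine between each estimate and the true eigenvector (1 is perfect).
cos_pca = abs(leading_eigvec(X) @ theta)
cos_sel = abs(select_then_pca(X, k) @ theta)
print(f"standard PCA:        |cos| = {cos_pca:.3f}")
print(f"select-then-PCA:     |cos| = {cos_sel:.3f}")
```

With the dimension five times the sample size, the standard PCA estimate absorbs noise from all 1000 coordinates, whereas the restricted estimate only has to contend with noise in the few retained ones, so its cosine with the true eigenvector should be noticeably closer to 1 in this setting.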