Scaled Principal Components and Correspondence Analysis: clustering and ordering

Chris Ding
Lawrence Berkeley National Laboratory

Correspondence Analysis (COA) is a method in multivariate statistics for
analyzing contingency tables using a technique similar to principal
component analysis (PCA). The relationship between COA and PCA, however,
has not been fully explored so far.

In this paper, we develop the theory of scaled principal component analysis
(SPCA), first on a matrix of pairwise similarities. Extending this approach
to asymmetric (and rectangular) similarities of contingency tables, the
resulting SPCA components are precisely those in COA.

SPCA is motivated by data ordering (ordination). Given $n$ objects and pairwise
similarities among them, we seek an optimal ordering such that similarities
between adjacent objects are maximized while similarities between distant
objects are minimized. When such an ordination objective function is optimized,
the SPCA components are continuous solutions for the desired index permutations.
Extending this approach to simultaneous ordering of the rows and columns of a
contingency table, the resulting SPCA components are precisely those in COA.

SPCA can also be derived from data clustering. Given $n$ objects and pairwise
similarities among them, we seek to partition them into two clusters
such that the between-cluster similarities are minimized while the within-cluster
similarities are maximized. When such an objective function is optimized,
the SPCA components are continuous solutions
for the desired cluster membership indicators. Extending this approach to
simultaneous clustering of the rows and columns of a contingency table,
the resulting cluster membership indicators are precisely those in COA.

Underlying the objective function optimizations for data ordering and clustering
is a fundamental property of SPCA: cluster self-aggregation. In the space
spanned by the $K$ SPCA components, objects within each cluster self-aggregate
towards each other; in a properly defined connectivity matrix, connections between
different clusters are automatically suppressed while connections within the same
cluster are enhanced.

We illustrate the SPCA theory with examples and apply it to the analysis of
DNA microarray gene expression profiles.
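
The construction summarized above can be sketched numerically. The following is a minimal illustration under stated assumptions, not the paper's implementation. It assumes that the scaling uses the degree matrix $D$ (row sums of the similarity matrix $W$), that the components are the eigenvectors $z_k$ of $D^{-1/2} W D^{-1/2}$ rescaled as $q_k = D^{-1/2} z_k$, and that the contingency-table analogue is the singular value decomposition of $D_r^{-1/2} P D_c^{-1/2}$, with $D_r$ and $D_c$ holding row and column sums. The symbols, the function names `spca_components` and `coa_scores`, and the toy data are all introduced here for illustration only.

```python
import numpy as np


def spca_components(W, k):
    """Leading k scaled components of a symmetric, nonnegative similarity matrix W.

    Assumption: scaling by degrees d_i = sum_j W_ij, which must all be positive.
    """
    W = np.asarray(W, dtype=float)
    s = 1.0 / np.sqrt(W.sum(axis=1))          # diagonal of D^{-1/2}
    W_scaled = s[:, None] * W * s[None, :]    # D^{-1/2} W D^{-1/2}
    vals, vecs = np.linalg.eigh(W_scaled)     # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]          # indices of the k largest
    return vals[top], s[:, None] * vecs[:, top]   # q_k = D^{-1/2} z_k


def coa_scores(P, k):
    """Leading k row and column scores of a contingency table P (rows x columns).

    Assumption: every row sum and column sum of P is positive.
    """
    P = np.asarray(P, dtype=float)
    r = 1.0 / np.sqrt(P.sum(axis=1))          # diagonal of D_r^{-1/2}
    c = 1.0 / np.sqrt(P.sum(axis=0))          # diagonal of D_c^{-1/2}
    U, sing, Vt = np.linalg.svd(r[:, None] * P * c[None, :], full_matrices=False)
    return sing[:k], r[:, None] * U[:, :k], c[:, None] * Vt[:k].T


# Toy similarity matrix (made up): two interleaved groups of three objects,
# similarity 1.0 within a group and 0.1 between groups.
labels = np.array([0, 1, 0, 1, 1, 0])
W = np.where(labels[:, None] == labels[None, :], 1.0, 0.1)
eigvals, Q = spca_components(W, 2)
split = Q[:, 1] > 0            # two-way clustering from the sign of q_2
order = np.argsort(Q[:, 1])    # ordering of the objects along q_2

# Toy contingency table (made up): rows 0-1 go with columns 0-1,
# rows 2-3 go with columns 2-4.
P = np.array([[5, 5, 1, 1, 1],
              [5, 5, 1, 1, 1],
              [1, 1, 5, 5, 5],
              [1, 1, 5, 5, 5]])
sing, row_scores, col_scores = coa_scores(P, 2)
```

On this toy data the leading component is constant and carries no grouping information, while the sign of the second component separates the two groups, and sorting by it places objects of the same group next to each other. For the contingency table, the second row and column scores share signs within each row-column block, which is the simultaneous grouping of rows and columns described above.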