A Distribution Free Summarization Method for Affymetrix GeneChip® Arrays


Monnie McGee
Department of Statistical Science
Southern Methodist University

Abstract


        Affymetrix GeneChip arrays require summarization to combine the probe-level intensities into a single value representing the expression level of a gene. However, probe intensity measurements are expected to be affected by varying degrees of non-specific hybridization and of cross-hybridization to non-target transcripts.  Here we present a new summarization technique, the Distribution Free Weighted method (DFW), which uses information about the variability in probe behavior to estimate the extent of non-specific and cross-hybridization for each probe.  Each probe's contribution is then weighted accordingly during summarization, without making any distributional assumptions about the probe-level data.

        We compare DFW with several popular summarization methods on spike-in data sets, via both our own calculations and the 'Affycomp II' competition.  The results show that DFW outperforms the other methods when sensitivity and specificity are considered simultaneously. With the Affycomp spike-in data sets, the area under the Receiver Operating Characteristic (ROC) curve for DFW is nearly 1.0 (the value for a perfect classifier), indicating that DFW can identify all differentially expressed genes with few false positives.  The approach is also computationally faster than most other methods in current use.
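
        For readers less familiar with the ROC criterion, the following minimal Python sketch shows how an area under the ROC curve can be computed from per-gene scores and known spike-in status. It is an illustration only, not the Affycomp assessment code: the function name roc_auc and the toy scores and labels are hypothetical, and the area is computed through the equivalent rank (Mann-Whitney) formulation, i.e. the fraction of (spiked-in, non-spiked) gene pairs in which the spiked-in gene receives the higher score.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: a score per gene (e.g. an absolute log fold change)
# and a flag marking whether the gene was truly spiked in.
toy_scores = [4.1, 3.7, 0.9, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)
```

        In this toy case one spiked-in gene is ranked below one non-spiked gene, so the area falls below 1.0; a value of exactly 1.0 requires every spiked-in gene to score above every non-spiked gene.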


This is joint work with Zhongxue Chen, Qingzhong Liu, and Richard H. Scheuermann.