A Distribution Free Summarization Method for Affymetrix GeneChip® Arrays
Monnie McGee
Department of Statistical Science
Southern Methodist University

Abstract
Affymetrix
GeneChip arrays require summarization in order to combine the
probe-level intensities into one value representing the expression
level of a gene. However, probe intensity measurements are expected to
be affected by different levels of non-specific hybridization and
cross-hybridization to non-target transcripts. Here we present
a new summarization technique, the Distribution Free Weighted method
(DFW), which uses information about the variability in probe behavior
to estimate the extent of non-specific and cross-hybridization for each
probe. The contribution of the probe is weighted accordingly
during summarization, without making any distributional assumptions for
the probe-level data.
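To make the idea of weighting probes during summarization concrete, here is a minimal sketch of a generic weighted probe-set summary. It is not the DFW formula, which this abstract does not give. The function names `variability_weights` and `weighted_summary`, the input layout (one list of log2 intensities per probe, one entry per array), and the rule that makes a probe's weight proportional to its standard deviation across arrays are all assumptions chosen for illustration.

```python
from statistics import pstdev


def variability_weights(probe_rows):
    """Return one weight per probe, proportional to its spread across arrays.

    Assumption for illustration only: a probe whose intensity barely changes
    across arrays is treated as less informative. This rule is hypothetical
    and is not taken from the DFW method.
    """
    spreads = [pstdev(row) for row in probe_rows]
    total = sum(spreads)
    if total == 0:
        # All probes are constant: fall back to equal weights.
        return [1.0 / len(probe_rows)] * len(probe_rows)
    return [s / total for s in spreads]


def weighted_summary(probe_rows, weights):
    """Combine probe-level values into one expression value per array."""
    n_arrays = len(probe_rows[0])
    return [
        sum(w * row[j] for w, row in zip(weights, probe_rows))
        for j in range(n_arrays)
    ]


# Three probes for one gene, measured on two arrays (log2 scale, made-up values).
probes = [[8.0, 10.0], [7.0, 9.0], [6.0, 6.0]]
weights = variability_weights(probes)
summary = weighted_summary(probes, weights)
```

No distribution is fitted to the probe-level data at any point here; the weights come only from an empirical spread statistic, which is the sense in which a summary of this kind is distribution free.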
We compare DFW with several popular summarization methods on spike-in data sets, via both our own calculations and the 'Affycomp II' competition. The results show that DFW outperforms other methods when sensitivity and specificity are considered simultaneously. With the Affycomp spike-in data sets, the area under the Receiver Operating Characteristic (ROC) curve for DFW is nearly 1.0 (a perfect value), indicating that DFW can identify all differentially expressed genes with few false positives. The approach is also computationally faster than most other methods in current use.
This is joint work with Zhongxue Chen, Qingzhong Liu, and Richard H. Scheuermann.
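As a reminder of what an area under the ROC curve close to 1.0 means, the following minimal sketch computes the AUC as the fraction of (spiked-in, non-spiked-in) gene pairs in which the spiked-in gene receives the higher score, counting ties as half. The function name `roc_auc`, the scores, and the labels are hypothetical and are not data from the Affycomp assessment.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    scores: one differential-expression score per gene (higher = more extreme).
    labels: truthy for genes that are truly differentially expressed.
    """
    positives = [s for s, is_pos in zip(scores, labels) if is_pos]
    negatives = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative gene")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Made-up scores: every true positive outranks every negative.
perfect = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])

# Made-up scores: one true positive is ranked below a negative.
imperfect = roc_auc([0.9, 0.8, 0.7, 0.2, 0.1], [1, 1, 0, 1, 0])
```

In the first case every spiked-in gene outranks every other gene, so the AUC is exactly 1.0; in the second, one of the six pairs is ordered the wrong way, so the AUC drops to 5/6.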