Statistical challenges in analyzing mass spectrometry proteomic data

Xihong Lin
Department of Biostatistics, School of Public Health, Harvard University

Abstract
In high-throughput mass spectrometry (MS) proteomic
experiments, we can simultaneously detect and quantify a large number
of peptides/proteins. Such techniques hold considerable promise for
discovering new disease biomarkers. The resulting data (spectra) from such
experiments are large and can be treated as finely sampled functions.
Most existing MS analyses apply multiple ad hoc sequential
methods for pre-processing the MS data, such as baseline subtraction,
truncation, normalization, peak detection, and peak alignment. We will
discuss challenges in analyzing MS proteomic data and propose a unified
statistical framework for pre-processing and post-processing mass
spectra using advanced nonparametric regression and functional data
analysis techniques in conjunction with statistical learning methods.
We stress that pre-processing is critical in the analysis of mass
spectrometry proteomic data. We apply the methodology to a motivating
data set obtained from a study of lung cancer patients whose serum
samples were collected and processed using a surface-enhanced laser
desorption/ionization time-of-flight (SELDI-TOF) mass spectrometry (MS)
instrument.
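
To make the ad hoc sequential steps named above concrete (truncation, baseline subtraction, normalization, peak detection), the following is a minimal sketch on a single toy spectrum. It is not the unified framework proposed in this abstract. Every function name, the running-minimum baseline, the total-ion-current normalization, the thresholds, and the toy data are illustrative assumptions. Peak alignment is omitted because it requires several spectra.

```python
# Illustrative sketch only: a naive sequential pre-processing pipeline.
# All names, parameter values, and method choices are assumptions made
# for this example; they are not taken from the abstract.

def truncate(mz, intensity, mz_min):
    """Drop the low-mass region below mz_min."""
    kept = [(m, y) for m, y in zip(mz, intensity) if m >= mz_min]
    return [m for m, _ in kept], [y for _, y in kept]

def subtract_baseline(intensity, half_window):
    """Estimate the baseline as a running minimum and subtract it."""
    n = len(intensity)
    corrected = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        corrected.append(intensity[i] - min(intensity[lo:hi]))
    return corrected

def normalize(intensity):
    """Scale intensities so they sum to one (total ion current)."""
    total = sum(intensity)
    if total <= 0:
        return list(intensity)
    return [y / total for y in intensity]

def detect_peaks(mz, intensity, threshold):
    """Return m/z values of strict local maxima above the threshold."""
    peaks = []
    for i in range(1, len(intensity) - 1):
        is_local_max = intensity[i - 1] < intensity[i] > intensity[i + 1]
        if is_local_max and intensity[i] > threshold:
            peaks.append(mz[i])
    return peaks

def preprocess(mz, intensity, mz_min=2000.0, half_window=2, threshold=0.1):
    """Run the four steps in sequence; each step feeds the next."""
    mz_t, y_t = truncate(mz, intensity, mz_min)
    y_b = subtract_baseline(y_t, half_window)
    y_n = normalize(y_b)
    return mz_t, y_n, detect_peaks(mz_t, y_n, threshold)

# Toy spectrum (made-up numbers, not real SELDI-TOF data).
toy_mz = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0, 6000.0, 7000.0, 8000.0]
toy_intensity = [50.0, 5.0, 6.0, 20.0, 6.0, 5.0, 12.0, 5.0]

kept_mz, normalized, peak_mz = preprocess(toy_mz, toy_intensity)
print(peak_mz)
```

Because each step consumes the output of the one before it, a poor choice early on (for example, too narrow a baseline window) propagates to the detected peaks. This is one reason a joint treatment of the steps can be attractive.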