Abstracts
Scott Cantor, Ph.D.
Department of Biostatistics
U. T. M. D. Anderson Cancer Center
Medical Decision Making and Its Application to the Decision
to Screen for Prostate Cancer
Abstract:
Medical decision making applies concepts from probability theory and decision
analysis to clinical practice and health policy. The first part of this talk
will introduce the discipline of medical decision making and explain its usefulness
as a tool for making patient and policy decisions. The second part of this talk
focuses on an application of medical decision making, namely the decision to
screen an asymptomatic man for prostate cancer. We incorporated preferences
for prostate cancer treatment outcomes--assessed in a novel way--into a model
of the screening decision, which resulted in conflicting recommendations for
screening.
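To make the flavor of such a decision analysis concrete, here is a toy expected-utility calculation in Python. It is not Dr. Cantor's model: every probability and utility below is an invented placeholder.

```python
# Toy expected-utility decision tree for a screening decision.
# All probabilities and utilities are invented placeholders,
# not values from the talk.

P_CANCER = 0.10            # hypothetical prevalence of detectable cancer
SENS, SPEC = 0.80, 0.90    # hypothetical test sensitivity and specificity

# Hypothetical utilities on a 0-1 scale (1 = best outcome).
U_TREATED_CANCER = 0.85    # cancer found by screening and treated
U_MISSED_CANCER = 0.60     # cancer missed by screening
U_FALSE_POSITIVE = 0.95    # healthy, but endured workup after false positive
U_HEALTHY = 1.00           # healthy, correctly reassured
U_UNSCREENED_CANCER = 0.65 # cancer presenting later without screening

def expected_utility_screen():
    return (P_CANCER * (SENS * U_TREATED_CANCER + (1 - SENS) * U_MISSED_CANCER)
            + (1 - P_CANCER) * (SPEC * U_HEALTHY + (1 - SPEC) * U_FALSE_POSITIVE))

def expected_utility_no_screen():
    return P_CANCER * U_UNSCREENED_CANCER + (1 - P_CANCER) * U_HEALTHY

if __name__ == "__main__":
    eu_s, eu_n = expected_utility_screen(), expected_utility_no_screen()
    print(f"EU(screen)    = {eu_s:.4f}")
    print(f"EU(no screen) = {eu_n:.4f}")
    print("Recommend screening" if eu_s > eu_n else "Recommend no screening")
```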
Dennis Cox, Ph.D.
Department of Statistics
Rice University
Resampling Methods for Multiple Comparisons of Randomization
Tests
Abstract:
In their book "Resampling-Based Multiple Testing," Westfall and Young
claim that (except in trivial cases) it is impractical to use resampling to
adjust for multiple comparisons when the individual comparisons are made with
a randomization test, because the replications of the randomization testing would
be nested within the replications of the resampling. We describe a method to
avoid this problem in most settings. This involves only two basic sets of randomizations
(as opposed to the thousands that Westfall and Young had in mind). A "reference"
set of randomized data is used for the basic tests, and a "correction"
set is used for the correction for multiple comparisons. A motivating application
is described which consists of testing for a signal due to the menstrual cycle
in about 700 fluorescence spectroscopy channels measured on the cervix. We are
part of a team at M. D. Anderson that is investigating the potential use of
fluorescence spectroscopy as a painless probe for detection of cancerous and
precancerous lesions of the cervix.
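To illustrate the two-set idea on toy data, here is a small Python sketch. It is my reading of the scheme, not the authors' code; the channel count, test statistic, and randomization sizes are all illustrative.

```python
# Sketch of a two-set resampling scheme for multiplicity-adjusted
# randomization tests: a "reference" set supports the basic per-channel
# tests, a "correction" set supports the min-p adjustment, and no
# randomizations are nested inside others.

import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 50                       # subjects x channels (toy sizes)
labels = np.repeat([0, 1], n // 2)  # two-group randomization
X = rng.normal(size=(n, m))
X[labels == 1, 0] += 1.5            # one channel carries a real signal

def stat(X, lab):
    """Absolute mean difference per channel."""
    return np.abs(X[lab == 1].mean(axis=0) - X[lab == 0].mean(axis=0))

reference = [rng.permutation(labels) for _ in range(499)]   # basic tests
correction = [rng.permutation(labels) for _ in range(499)]  # adjustment

ref_stats = np.stack([stat(X, lab) for lab in reference])   # (499, m)
obs = stat(X, labels)

def pvals(t):
    """Randomization p-values of statistics t against the reference set."""
    return (1 + (ref_stats >= t).sum(axis=0)) / (1 + len(ref_stats))

p_obs = pvals(obs)

# For each correction randomization, recompute p-values against the SAME
# reference set -- avoiding nesting -- and record the minimum over channels.
min_p_null = np.array([pvals(stat(X, lab)).min() for lab in correction])
p_adj = np.array([(1 + (min_p_null <= p).sum()) / (1 + len(min_p_null))
                  for p in p_obs])

print("smallest raw p:", p_obs.min(), "adjusted:", p_adj[p_obs.argmin()])
```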
Kim-Anh Do, Ph.D.
Department of Biostatistics
U. T. M. D. Anderson Cancer Center
Some Recent Methods in Clustering Microarray Gene Expression
Data and Applications
Abstract:
We discuss the statistical development of two methods of clustering gene microarray
data: "gene shaving" as developed by Hastie et al. (2000) and the mixture-model
based clustering program EMMIX-GENE of McLachlan et al. (2000). We applied
these methods to the analysis of some well-known data sets: the colon data of
Alon et al. (2000), and the leukemia data of Golub et al. (2000). We also analyzed
the NCI 60 data using gene shaving alone. A close correspondence is found between
the gene clusters identified by the two methods in the case of the Alon data. We also
comment on the two distinct tissue clusterings found in this data
set, which can be explained by the external classification of tumor/non-tumor
tissues and by the protocol change described in Getz et al. (2000). In the Golub
data, clusters are found which produce a division of tissues corresponding closely
to the external classification for both methods. The pros and cons of the two
methods will be compared and contrasted in detail.
We will also demonstrate GENECLUST, software developed in the Department of Biostatistics at MDACC based on the gene-shaving idea.
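For readers unfamiliar with gene shaving, the following simplified Python sketch shows the core shave-by-correlation loop of Hastie et al. (2000). The gap-statistic choice of cluster size and the orthogonalization used for subsequent clusters are omitted, and the data are simulated.

```python
# Simplified sketch of one "gene shaving" pass (after Hastie et al. 2000):
# repeatedly drop the genes least correlated with the leading principal
# component of the current gene block.

import numpy as np

def shave(X, target=10, frac=0.10):
    """X: genes x samples matrix. Returns indices of the shaved cluster."""
    idx = np.arange(X.shape[0])
    while len(idx) > target:
        block = X[idx] - X[idx].mean(axis=1, keepdims=True)
        # leading right singular vector = first PC scores over samples
        _, _, vt = np.linalg.svd(block, full_matrices=False)
        pc = vt[0]
        corr = np.abs([np.corrcoef(g, pc)[0, 1] for g in block])
        keep = max(target, int(np.ceil(len(idx) * (1 - frac))))
        idx = idx[np.argsort(corr)[::-1][:keep]]   # keep highest-corr genes
    return idx

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 40))
X[:25] += 3 * np.sin(np.linspace(0, 6, 40))  # plant a coherent 25-gene cluster
print("shaved cluster:", sorted(shave(X, target=25)))  # typically genes 0-24
```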
Ian Harris
Southern Methodist University
Density Power Divergence for Estimation
Abstract:
I will discuss a minimum divergence estimation method. The method uses
density-based divergences, which are indexed by a parameter alpha.
Varying alpha produces a trade-off between robustness and efficiency.
The method can be viewed as a robust extension of maximum likelihood, and as a
"bridge" between maximum likelihood and minimum L_2 estimation:
alpha = 0 corresponds to maximum likelihood estimation, and alpha = 1
to minimum L_2 estimation.
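As a concrete illustration, here is a minimal Python sketch of minimum density power divergence estimation for a normal location-scale model, using the Basu et al. (1998) form of the divergence; the contaminated sample is simulated for illustration.

```python
# Minimum density power divergence estimation for a normal model.
# The empirical objective is  int f^(1+alpha) - (1 + 1/alpha) * mean(f(x)^alpha);
# alpha -> 0 recovers maximum likelihood, alpha = 1 gives minimum L_2.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def dpd_objective(params, x, alpha):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    # closed form of int f^(1+alpha) dz for the normal density
    integral = (2 * np.pi) ** (-alpha / 2) * sigma ** (-alpha) / np.sqrt(1 + alpha)
    term = (1 + 1 / alpha) * np.mean(norm.pdf(x, mu, sigma) ** alpha)
    return integral - term

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0, 1, 95), rng.normal(8, 1, 5)])  # 5% outliers

for alpha in (0.1, 0.5, 1.0):
    res = minimize(dpd_objective, x0=[np.median(x), 0.0], args=(x, alpha))
    print(f"alpha={alpha}: mu_hat={res.x[0]:.3f}, sigma_hat={np.exp(res.x[1]):.3f}")
print(f"MLE (alpha->0): mu_hat={x.mean():.3f}")  # dragged upward by outliers
```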
Lem Moye
UT School of Public Health
Multiple Statistical Analyses in Clinical Trials
Abstract:
The execution of multiple analyses in clinical trials is the process by which
several statistical hypothesis tests are crafted, executed and evaluated within
a single clinical trial. The research and medical communities are frequently
exposed to the results of these analyses. Multiple analyses in clinical trials
can appear in many forms, e.g., the analysis of multiple endpoints in a study,
the inter-treatment group comparisons in a clinical trial with more than one
active group, and subgroup analyses. The circumstances of multiple analyses
in clinical trials are more complicated than these examples suggest because,
in reality, they commonly occur in complex mixtures. For example,
a clinical trial may report the effect of therapy on several different endpoints,
then proceed to report subgroup findings for the effect of therapy on a completely
different endpoint. Some of these hypothesis tests were designed before the
study was executed, while others were merely "targets of opportunity"
that the investigators noticed as the clinical trial evaluation proceeded to
the end; some of these analyses have small p-values, others do not. The development
of complex and rapid statistical analysis systems perpetuates and accelerates
the execution of multiple analyses in clinical trials; nevertheless, the
end results are often chaotic and inconsistent. This line of research, reflecting
several published manuscripts and work in progress, seeks to identify an easily
understood, executable plan of multiple analyses in clinical trials which is
both statistically justifiable and comprehensible to clinicians and the regulatory
community.
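A quick simulation makes the underlying multiplicity problem concrete. The endpoint counts and significance level below are arbitrary choices, not values from the talk.

```python
# The chance of at least one false-positive "finding" grows rapidly with
# the number of unplanned analyses, even when therapy has no effect at all.

import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)
n_trials, n_endpoints, n_per_arm = 2000, 10, 50

false_positive_any = 0
for _ in range(n_trials):
    control = rng.normal(size=(n_per_arm, n_endpoints))
    treated = rng.normal(size=(n_per_arm, n_endpoints))  # no true effect
    p = ttest_ind(treated, control).pvalue               # one test per endpoint
    if (p < 0.05).any():
        false_positive_any += 1

print(f"P(at least one p<0.05 across {n_endpoints} null endpoints) "
      f"~ {false_positive_any / n_trials:.2f}")   # roughly 1 - 0.95**10 = 0.40
print(f"Bonferroni-style prospective plan: test each at {0.05 / n_endpoints:.4f}")
```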
Chad Shaw
Baylor College of Medicine
Abstract:
Raw microarray quantitation data are contaminated by experimental artifacts and
variable data quality. One of the most important contributions statisticians can
make is to process values from individual experiments into standardized data sets.
This process of data normalization makes large-scale multi-array experiments
feasible and adds noticeably to the quality of biological results. This study
considers cDNA microarrays with spatially distributed on-chip replication. I show
the benefits of normalization for such data. I conclude with a message derived
from two years of work inside an academic medical institution: I argue that
statisticians are the computational scientists uniquely suited to address the
data normalization problem, and that this process of normalization is essential
to other aspects of microarray analysis.
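As a concrete, highly simplified illustration of what a normalization step does (not the study's actual pipeline), consider median-centering each array and summarizing on-chip replicate spots:

```python
# Toy normalization for cDNA arrays with on-chip replicates: center each
# array's log-ratios, then summarize replicate spots per gene. Real
# pipelines (e.g., print-tip loess) are considerably more elaborate.

import numpy as np

rng = np.random.default_rng(4)
n_genes, n_reps, n_arrays = 200, 3, 4

# simulated log-ratios: gene effects + array-specific offsets + noise
gene_effect = rng.normal(0, 1, size=n_genes)
array_offset = rng.normal(0.5, 0.3, size=n_arrays)   # dye/scanner artifact
M = (gene_effect[:, None, None] + array_offset[None, None, :]
     + rng.normal(0, 0.25, size=(n_genes, n_reps, n_arrays)))

# 1) normalize: subtract each array's median so arrays are comparable
M_norm = M - np.median(M.reshape(-1, n_arrays), axis=0)

# 2) summarize spatially replicated spots by their median per gene/array
summary = np.median(M_norm, axis=1)                  # (n_genes, n_arrays)

print("array medians before:", np.round(np.median(M.reshape(-1, n_arrays), axis=0), 2))
print("array medians after :", np.round(np.median(summary, axis=0), 2))
```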
Bayesian Analysis of Poisson Data with False Positives and
False Negatives
Abstract:
Misclassified count data is a common problem in many fields, especially economics,
marketing, and epidemiology. Much research exists on the binomial distribution
with misclassification; see for example Tenenbein (1970). The Poisson distribution
with underreporting is also thoroughly researched. The Poisson model allowing
for both false positive and false negative counts is not as well researched.
A model for Poisson counts with misclassification will be presented, along with
Bayesian estimators for each of the parameters.
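A toy sketch of the kind of model involved follows. The parameterization below (true counts thinned by a false-negative rate plus an additive false-positive rate, with the misclassification rates held fixed) is my illustration, not the presented model, which estimates all parameters.

```python
# Toy Bayesian analysis of Poisson counts with false negatives and false
# positives:  y_i ~ Poisson(lam*(1 - fn) + fp),  fn and fp fixed here.

import numpy as np
from scipy.stats import poisson, gamma

rng = np.random.default_rng(5)
lam_true, fn, fp = 4.0, 0.2, 0.5
y = rng.poisson(lam_true * (1 - fn) + fp, size=100)

# grid posterior for lam under a Gamma(2, 1) prior
lam_grid = np.linspace(0.01, 12, 2000)
log_post = (gamma.logpdf(lam_grid, a=2, scale=1)
            + poisson.logpmf(y[:, None], lam_grid * (1 - fn) + fp).sum(axis=0))
post = np.exp(log_post - log_post.max())
post /= np.trapz(post, lam_grid)

mean = np.trapz(lam_grid * post, lam_grid)
print(f"posterior mean of lambda ~ {mean:.2f} (truth {lam_true})")
print(f"naive mean of y (ignores misclassification): {y.mean():.2f}")
```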
Shan Sun
Texas Tech University
Smooth Quantile Processes from Right Censored Data and Construction
of Simultaneous Confidence Bands
Abstract:
We study the asymptotic properties of smooth quantile processes based on
randomly right censored life time data. The bootstrap approaches to approximate
the distributions of the smooth quantile processes are investigated and are
used to construct simultaneous confidence bands for quantile functions. Data-based
selection of the bandwidth required for computing smooth distribution functions
is also investigated using bootstrap methods. A Monte Carlo simulation is carried
out to assess the small-sample performance of the proposed confidence bands. An application
to construct confidence bands for the quantile function of the time between
a manuscript's submission and its first review is provided using a JASA data
set. The developed results can be applied to construct simultaneous confidence
bands for the difference of two quantile functions and to check whether there
is a location shift or scale change for two distributions under study.
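To make the construction concrete, here is a small Python sketch that bootstraps bands for a quantile function estimated from simulated right-censored data via Kaplan-Meier. The kernel smoothing and the simultaneous (rather than pointwise) calibration studied in the talk are omitted for brevity.

```python
# Pointwise percentile-bootstrap bands for a Kaplan-Meier quantile function.

import numpy as np

rng = np.random.default_rng(6)

def km_quantile(t, d, probs):
    """Quantiles from the Kaplan-Meier estimator.
    t: observed times, d: event indicator (1 = event, 0 = censored)."""
    order = np.argsort(t)
    t, d = t[order], d[order]
    at_risk = len(t) - np.arange(len(t))
    cdf = 1 - np.cumprod(1 - d / at_risk)          # 1 - S(t), nondecreasing
    return np.array([t[np.searchsorted(cdf, p)] if cdf[-1] >= p else np.nan
                     for p in probs])

# simulated exponential lifetimes with random right censoring
true_t = rng.exponential(10, size=200)
cens = rng.exponential(25, size=200)
t, d = np.minimum(true_t, cens), (true_t <= cens).astype(float)

probs = np.linspace(0.1, 0.7, 7)
q_hat = km_quantile(t, d, probs)

boot = np.array([km_quantile(t[i], d[i], probs)
                 for i in (rng.integers(0, len(t), len(t)) for _ in range(500))])
lo, hi = np.nanpercentile(boot, [2.5, 97.5], axis=0)
for p, q, a, b in zip(probs, q_hat, lo, hi):
    print(f"p={p:.1f}: Q_hat={q:6.2f}  band=({a:6.2f}, {b:6.2f})")
```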
Marina Vannucci
Texas A&M University
Multinomial Data, Model Selection and Microarray Data
Abstract:
Our general context is a classification problem where the response is a categorical
variable with two or more categories and where the number of predictors substantially
exceeds the sample size. In one of our practical contexts, near-infrared spectra
are measured at 100 wavelengths on 3 different wheat varieties. A second example
involves expression profiles of 755 genes and 2 treatments with only around
30 observations. We use probit models and data augmentation approaches. By making
use of latent variables we write multinomial models in a regression setting
and then use mixture priors to select variables. We develop appropriate computational
schemes that use MCMC methods and truncated sampling techniques. We explore
ways to classify future observations. We treat both ordinal and nominal responses.
In the microarray example, very good predictions can be obtained with an extremely
small number of genes. Moreover, marginal probabilities of inclusion offer insight
into larger groups of promising genes that may be of interest to biologists
who study relationships and functions. Envisaged extensions include trying to
tease out interaction effects.
This is joint work with Naijun Sha and Philip J. Brown.
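As a minimal illustration of the data-augmentation engine behind this approach, the sketch below implements the Albert and Chib (1993) probit augmentation with a plain Gaussian prior; the mixture (spike-and-slab) priors and variable-selection moves developed in the talk are omitted, and the data are simulated.

```python
# Probit data augmentation (Albert & Chib 1993): binary responses become a
# Gaussian regression on latent variables z, giving simple Gibbs updates.

import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(7)
n, p = 200, 5
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.5, 1.0, 0.0, 0.0])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)

tau = 3.0                                        # prior sd of coefficients
V = np.linalg.inv(X.T @ X + np.eye(p) / tau**2)  # posterior covariance (fixed)
beta = np.zeros(p)
draws = []

for it in range(2000):
    # 1) z | beta, y: normal around X beta, truncated to the half-line
    #    implied by the observed binary response
    mu = X @ beta
    z = np.where(y == 1,
                 truncnorm.rvs(-mu, np.inf, loc=mu, random_state=rng),
                 truncnorm.rvs(-np.inf, -mu, loc=mu, random_state=rng))
    # 2) beta | z: conjugate Gaussian regression update (unit residual var)
    beta = rng.multivariate_normal(V @ (X.T @ z), V)
    if it >= 500:                                # discard burn-in
        draws.append(beta)

print("posterior means:", np.round(np.mean(draws, axis=0), 2))
print("truth          :", beta_true)
```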