Abstracts
Scott Cantor, Ph.D.
Department of Biostatistics
U. T. M. D. Anderson Cancer Center
Medical Decision Making and Its Application to the Decision
to Screen for Prostate Cancer
Abstract:
Medical decision making applies concepts from probability theory and decision
analysis to clinical practice and health policy. The first part of this talk
will introduce the discipline of medical decision making and explain its usefulness
as a tool for making patient and policy decisions. The second part of this talk
focuses on an application of medical decision making, namely the decision to
screen an asymptomatic man for prostate cancer. We incorporated preferences
for prostate cancer treatment outcomes--assessed in a novel way--into a model
of the screening decision, which resulted in conflicting recommendations for
screening.
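To make the flavor of such a decision analysis concrete, here is a toy expected-utility calculation in Python. It is not Dr. Cantor's model: every probability and utility below is an invented placeholder.

```python
# Toy expected-utility decision tree for a screening decision.
# All probabilities and utilities are invented placeholders,
# not values from the talk.

P_CANCER = 0.10            # hypothetical prevalence of detectable cancer
SENS, SPEC = 0.80, 0.90    # hypothetical test sensitivity and specificity

# Hypothetical utilities on a 0-1 scale (1 = best outcome).
U_TREATED_CANCER = 0.85    # cancer found by screening and treated
U_MISSED_CANCER = 0.60     # cancer missed by screening
U_FALSE_POSITIVE = 0.95    # healthy, but endured workup after false positive
U_HEALTHY = 1.00           # healthy, correctly reassured
U_UNSCREENED_CANCER = 0.65 # cancer presenting later without screening

def expected_utility_screen():
    return (P_CANCER * (SENS * U_TREATED_CANCER + (1 - SENS) * U_MISSED_CANCER)
            + (1 - P_CANCER) * (SPEC * U_HEALTHY + (1 - SPEC) * U_FALSE_POSITIVE))

def expected_utility_no_screen():
    return P_CANCER * U_UNSCREENED_CANCER + (1 - P_CANCER) * U_HEALTHY

if __name__ == "__main__":
    eu_s, eu_n = expected_utility_screen(), expected_utility_no_screen()
    print(f"EU(screen)    = {eu_s:.4f}")
    print(f"EU(no screen) = {eu_n:.4f}")
    print("Recommend screening" if eu_s > eu_n else "Recommend no screening")
```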
Dennis Cox, Ph.D.
Department of Statistics
Rice University
Resampling Methods for Multiple Comparisons of Randomization
Tests
Abstract:
In their book "Resampling-Based Multiple Testing," Westfall and Young
claim that (except in trivial cases) it is impractical to use resampling to
adjust for multiple comparisons when the individual comparisons are made with
a randomization test, because the replications of the randomization testing would
be nested within the replications of the resampling. We describe a method to
avoid this problem in most settings. This involves only two basic sets of randomizations
(as opposed to the thousands that Westfall and Young had in mind). A "reference"
set of randomized data is used for the basic tests, and a "correction"
set is used for the correction for multiple comparisons. A motivating application
is described which consists of testing for a signal due to the menstrual cycle
in about 700 fluorescence spectroscopy channels measured on the cervix. We are
part of a team at M. D. Anderson that is investigating the potential use of
fluorescence spectroscopy as a painless probe for detection of cancerous and
precancerous lesions of the cervix.
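To illustrate the two-set idea on toy data, here is a small Python sketch. It is my reading of the scheme, not the authors' code; the channel count, test statistic, and randomization sizes are all illustrative.

```python
# Sketch of a two-set resampling scheme for multiplicity-adjusted
# randomization tests: a "reference" set supports the basic per-channel
# tests, a "correction" set supports the min-p adjustment, and no
# randomizations are nested inside others.

import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 50                       # subjects x channels (toy sizes)
labels = np.repeat([0, 1], n // 2)  # two-group randomization
X = rng.normal(size=(n, m))
X[labels == 1, 0] += 1.5            # one channel carries a real signal

def stat(X, lab):
    """Absolute mean difference per channel."""
    return np.abs(X[lab == 1].mean(axis=0) - X[lab == 0].mean(axis=0))

reference = [rng.permutation(labels) for _ in range(499)]   # basic tests
correction = [rng.permutation(labels) for _ in range(499)]  # adjustment

ref_stats = np.stack([stat(X, lab) for lab in reference])   # (499, m)
obs = stat(X, labels)

def pvals(t):
    """Randomization p-values of statistics t against the reference set."""
    return (1 + (ref_stats >= t).sum(axis=0)) / (1 + len(ref_stats))

p_obs = pvals(obs)

# For each correction randomization, recompute p-values against the SAME
# reference set -- avoiding nesting -- and record the minimum over channels.
min_p_null = np.array([pvals(stat(X, lab)).min() for lab in correction])
p_adj = np.array([(1 + (min_p_null <= p).sum()) / (1 + len(min_p_null))
                  for p in p_obs])

print("smallest raw p:", p_obs.min(), "adjusted:", p_adj[p_obs.argmin()])
```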
Kim-Anh Do, Ph.D.
Department of Biostatistics
U. T. M. D. Anderson Cancer Center
Some Recent Methods in Clustering Microarray Gene Expression
Data and Applications
Abstract:
We discuss the statistical development of two methods of clustering gene microarray
data: "gene shaving" as developed by Hastie et al. (2000) and the mixture-model
based clustering program EMMIX-GENE of McLachlan et al. (2000). We applied
these methods to the analysis of some well-known data sets: the colon data of
Alon et al. (2000), and the leukemia data of Golub et al. (2000). We also analyzed
the NCI 60 data using gene shaving alone. A close correspondence is found between
the gene clusters identified by the two methods in the case of the Alon data. We also
comment on the two distinct tissue clusterings found in this data
set, which can be explained by the external classification of tumor/non-tumor
tissues and by the protocol change described in Getz et al. (2000). In the Golub
data, clusters are found which produce a division of tissues corresponding closely
to the external classification for both methods. The pros and cons of the two
methods will be compared and contrasted in detail.
We will also demonstrate GENECLUST, software developed in the Department of Biostatistics at MDACC based on the gene-shaving idea.
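For readers unfamiliar with gene shaving, the following simplified Python sketch shows the core shave-by-correlation loop of Hastie et al. (2000). The gap-statistic choice of cluster size and the orthogonalization used for subsequent clusters are omitted, and the data are simulated.

```python
# Simplified sketch of one "gene shaving" pass (after Hastie et al. 2000):
# repeatedly drop the genes least correlated with the leading principal
# component of the current gene block.

import numpy as np

def shave(X, target=10, frac=0.10):
    """X: genes x samples matrix. Returns indices of the shaved cluster."""
    idx = np.arange(X.shape[0])
    while len(idx) > target:
        block = X[idx] - X[idx].mean(axis=1, keepdims=True)
        # leading right singular vector = first PC scores over samples
        _, _, vt = np.linalg.svd(block, full_matrices=False)
        pc = vt[0]
        corr = np.abs([np.corrcoef(g, pc)[0, 1] for g in block])
        keep = max(target, int(np.ceil(len(idx) * (1 - frac))))
        idx = idx[np.argsort(corr)[::-1][:keep]]   # keep highest-corr genes
    return idx

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 40))
X[:25] += 3 * np.sin(np.linspace(0, 6, 40))  # plant a coherent 25-gene cluster
print("shaved cluster:", sorted(shave(X, target=25)))  # typically genes 0-24
```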
Ian Harris
Southern Methodist University
Density Power Divergence for Estimation
Abstract:
I will discuss a minimum divergence estimation method. The method uses
density-based divergences, which are indexed by a parameter alpha.
Varying alpha produces a trade-off between robustness and efficiency.
The method can be viewed as a robust extension of maximum likelihood, and as a
"bridge" between maximum likelihood and minimum L_2 estimation:
alpha = 0 corresponds to maximum likelihood estimation, and alpha = 1
to minimum L_2 estimation.
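As a concrete illustration, here is a minimal Python sketch of minimum density power divergence estimation for a normal location-scale model, using the Basu et al. (1998) form of the divergence; the contaminated sample is simulated for illustration.

```python
# Minimum density power divergence estimation for a normal model.
# The empirical objective is  int f^(1+alpha) - (1 + 1/alpha) * mean(f(x)^alpha);
# alpha -> 0 recovers maximum likelihood, alpha = 1 gives minimum L_2.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def dpd_objective(params, x, alpha):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    # closed form of int f^(1+alpha) dz for the normal density
    integral = (2 * np.pi) ** (-alpha / 2) * sigma ** (-alpha) / np.sqrt(1 + alpha)
    term = (1 + 1 / alpha) * np.mean(norm.pdf(x, mu, sigma) ** alpha)
    return integral - term

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0, 1, 95), rng.normal(8, 1, 5)])  # 5% outliers

for alpha in (0.1, 0.5, 1.0):
    res = minimize(dpd_objective, x0=[np.median(x), 0.0], args=(x, alpha))
    print(f"alpha={alpha}: mu_hat={res.x[0]:.3f}, sigma_hat={np.exp(res.x[1]):.3f}")
print(f"MLE (alpha->0): mu_hat={x.mean():.3f}")  # dragged upward by outliers
```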
Lem Moye
UT School of Public Health
Multiple Statistical Analyses in Clinical Trials
Abstract:
The execution of multiple analyses in clinical trials is the process by which
several statistical hypothesis tests are crafted, executed and evaluated within
a single clinical trial. The research and medical communities are frequently
exposed to the results of these analyses. Multiple analyses in clinical trials
can appear in many forms, e.g., the analysis of multiple endpoints in a study,
the inter-treatment group comparisons in a clinical trial with more than one
active group, and subgroup analyses. The circumstances of multiple analyses
in clinical trials are more complicated than these examples suggest because,
in reality, they commonly occur in complex mixtures. For example,
a clinical trial may report the effect of therapy on several different endpoints,
then proceed to report subgroup findings for the effect of therapy on a completely
different endpoint. Some of these hypothesis tests were designed before the
study was executed, while others were merely "targets of opportunity"
that the investigators noticed as the clinical trial evaluation proceeded to
the end; some of these analyses have small p-values, others do not. The development
of complex and rapid statistical analysis systems perpetuates and accelerates
the execution of multiple analyses in clinical trials; nevertheless, the
end results are often chaotic and inconsistent. This line of research, reflecting
several published manuscripts and work in progress, seeks to identify an easily
understood, executable plan of multiple analyses in clinical trials which is
both statistically justifiable and comprehensible to clinicians and the regulatory
community.
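A quick simulation makes the underlying multiplicity problem concrete. The endpoint counts and significance level below are arbitrary choices, not values from the talk.

```python
# The chance of at least one false-positive "finding" grows rapidly with
# the number of unplanned analyses, even when therapy has no effect at all.

import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)
n_trials, n_endpoints, n_per_arm = 2000, 10, 50

false_positive_any = 0
for _ in range(n_trials):
    control = rng.normal(size=(n_per_arm, n_endpoints))
    treated = rng.normal(size=(n_per_arm, n_endpoints))  # no true effect
    p = ttest_ind(treated, control).pvalue               # one test per endpoint
    if (p < 0.05).any():
        false_positive_any += 1

print(f"P(at least one p<0.05 across {n_endpoints} null endpoints) "
      f"~ {false_positive_any / n_trials:.2f}")   # roughly 1 - 0.95**10 = 0.40
print(f"Bonferroni-style prospective plan: test each at {0.05 / n_endpoints:.4f}")
```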
Chad Shaw
Baylor College of Medicine
Abstract:
Raw microarray quantitation data are contaminated by experimental artifacts and
variable data quality. One of the most important contributions statisticians can
make is to process values from individual experiments into standardized data sets.
This process of data normalization makes large-scale multi-array experiments
feasible and adds noticeably to the quality of biological results. This study
considers cDNA microarrays with spatially distributed on-chip replication. I show
the benefits of normalization for such data. I conclude with a message derived
from two years of work inside an academic medical institution: I argue that
statisticians are the computational scientists uniquely suited to address the
data normalization problem, and that this process of normalization is essential
to other aspects of microarray analysis.
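As a concrete, highly simplified illustration of what a normalization step does (not the study's actual pipeline), consider median-centering each array and summarizing on-chip replicate spots:

```python
# Toy normalization for cDNA arrays with on-chip replicates: center each
# array's log-ratios, then summarize replicate spots per gene. Real
# pipelines (e.g., print-tip loess) are considerably more elaborate.

import numpy as np

rng = np.random.default_rng(4)
n_genes, n_reps, n_arrays = 200, 3, 4

# simulated log-ratios: gene effects + array-specific offsets + noise
gene_effect = rng.normal(0, 1, size=n_genes)
array_offset = rng.normal(0.5, 0.3, size=n_arrays)   # dye/scanner artifact
M = (gene_effect[:, None, None] + array_offset[None, None, :]
     + rng.normal(0, 0.25, size=(n_genes, n_reps, n_arrays)))

# 1) normalize: subtract each array's median so arrays are comparable
M_norm = M - np.median(M.reshape(-1, n_arrays), axis=0)

# 2) summarize spatially replicated spots by their median per gene/array
summary = np.median(M_norm, axis=1)                  # (n_genes, n_arrays)

print("array medians before:", np.round(np.median(M.reshape(-1, n_arrays), axis=0), 2))
print("array medians after :", np.round(np.median(summary, axis=0), 2))
```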
Bayesian Analysis of Poisson Data with False Positives and
False Negatives
Abstract:
Misclassified count data is a common problem in many fields, especially economics,
marketing, and epidemiology. Much research exists on the binomial distribution
with misclassification; see for example Tenenbein (1970). The Poisson distribution
with underreporting is also thoroughly researched. The Poisson model allowing
for both false positive and false negative counts is not as well researched.
A model for Poisson counts with misclassification will be presented, along with
Bayesian estimators for each of the parameters.
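A toy sketch of the kind of model involved follows. The parameterization below (true counts thinned by a false-negative rate plus an additive false-positive rate, with the misclassification rates held fixed) is my illustration, not the presented model, which estimates all parameters.

```python
# Toy Bayesian analysis of Poisson counts with false negatives and false
# positives:  y_i ~ Poisson(lam*(1 - fn) + fp),  fn and fp fixed here.

import numpy as np
from scipy.stats import poisson, gamma

rng = np.random.default_rng(5)
lam_true, fn, fp = 4.0, 0.2, 0.5
y = rng.poisson(lam_true * (1 - fn) + fp, size=100)

# grid posterior for lam under a Gamma(2, 1) prior
lam_grid = np.linspace(0.01, 12, 2000)
log_post = (gamma.logpdf(lam_grid, a=2, scale=1)
            + poisson.logpmf(y[:, None], lam_grid * (1 - fn) + fp).sum(axis=0))
post = np.exp(log_post - log_post.max())
post /= np.trapz(post, lam_grid)

mean = np.trapz(lam_grid * post, lam_grid)
print(f"posterior mean of lambda ~ {mean:.2f} (truth {lam_true})")
print(f"naive mean of y (ignores misclassification): {y.mean():.2f}")
```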
Shan Sun
Texas Tech University
Smooth Quantile Processes from Right Censored Data and Construction
of Simultaneous Confidence Bands
Abstract:
We study the asymptotic properties of smooth quantile processes based on
randomly right censored life time data. The bootstrap approaches to approximate
the distributions of the smooth quantile processes are investigated and are
used to construct simultaneous confidence bands for quantile functions. Data-based
selection of the bandwidth required for computing smooth distribution functions
is also investigated using bootstrap methods. A Monte Carlo simulation is carried
out to assess the small-sample performance of the proposed confidence bands. An application
to construct confidence bands for the quantile function of the time between
a manuscript's submission and its first review is provided using a JASA data
set. The developed results can be applied to construct simultaneous confidence
bands for the difference of two quantile functions and to check whether there
is a location shift or scale change for two distributions under study.
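To make the construction concrete, here is a small Python sketch that bootstraps bands for a quantile function estimated from simulated right-censored data via Kaplan-Meier. The kernel smoothing and the simultaneous (rather than pointwise) calibration studied in the talk are omitted for brevity.

```python
# Pointwise percentile-bootstrap bands for a Kaplan-Meier quantile function.

import numpy as np

rng = np.random.default_rng(6)

def km_quantile(t, d, probs):
    """Quantiles from the Kaplan-Meier estimator.
    t: observed times, d: event indicator (1 = event, 0 = censored)."""
    order = np.argsort(t)
    t, d = t[order], d[order]
    at_risk = len(t) - np.arange(len(t))
    cdf = 1 - np.cumprod(1 - d / at_risk)          # 1 - S(t), nondecreasing
    return np.array([t[np.searchsorted(cdf, p)] if cdf[-1] >= p else np.nan
                     for p in probs])

# simulated exponential lifetimes with random right censoring
true_t = rng.exponential(10, size=200)
cens = rng.exponential(25, size=200)
t, d = np.minimum(true_t, cens), (true_t <= cens).astype(float)

probs = np.linspace(0.1, 0.7, 7)
q_hat = km_quantile(t, d, probs)

boot = np.array([km_quantile(t[i], d[i], probs)
                 for i in (rng.integers(0, len(t), len(t)) for _ in range(500))])
lo, hi = np.nanpercentile(boot, [2.5, 97.5], axis=0)
for p, q, a, b in zip(probs, q_hat, lo, hi):
    print(f"p={p:.1f}: Q_hat={q:6.2f}  band=({a:6.2f}, {b:6.2f})")
```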
Marina Vannucci
Texas A&M University
Multinomial Data, Model Selection and Microarray Data
Abstract:
Our general context is a classification problem where the response is a categorical
variable with two or more categories and where the number of predictors substantially
exceeds the sample size. In one of our practical contexts, near-infrared spectra
are measured at 100 wavelengths on 3 different wheat varieties. A second example
involves expression profiles of 755 genes and 2 treatments with only around
30 observations. We use probit models and data augmentation approaches. By making
use of latent variables we write multinomial models in a regression setting
and then use mixture priors to select variables. We develop appropriate computational
schemes that use MCMC methods and truncated sampling techniques. We explore
ways to classify future observations. We treat both ordinal and nominal responses.
In the microarray example, very good predictions can be obtained with an extremely
small number of genes. Moreover, marginal probabilities of inclusion offer insight
into larger groups of promising genes that may be of interest to biologists
who study relationships and functions. Envisaged extensions include trying to
tease out interaction effects.
This is joint work with Naijun Sha and Philip J. Brown.
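As a minimal illustration of the data-augmentation engine behind this approach, the sketch below implements the Albert and Chib (1993) probit augmentation with a plain Gaussian prior; the mixture (spike-and-slab) priors and variable-selection moves developed in the talk are omitted, and the data are simulated.

```python
# Probit data augmentation (Albert & Chib 1993): binary responses become a
# Gaussian regression on latent variables z, giving simple Gibbs updates.

import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(7)
n, p = 200, 5
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.5, 1.0, 0.0, 0.0])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)

tau = 3.0                                        # prior sd of coefficients
V = np.linalg.inv(X.T @ X + np.eye(p) / tau**2)  # posterior covariance (fixed)
beta = np.zeros(p)
draws = []

for it in range(2000):
    # 1) z | beta, y: normal around X beta, truncated to the half-line
    #    implied by the observed binary response
    mu = X @ beta
    z = np.where(y == 1,
                 truncnorm.rvs(-mu, np.inf, loc=mu, random_state=rng),
                 truncnorm.rvs(-np.inf, -mu, loc=mu, random_state=rng))
    # 2) beta | z: conjugate Gaussian regression update (unit residual var)
    beta = rng.multivariate_normal(V @ (X.T @ z), V)
    if it >= 500:                                # discard burn-in
        draws.append(beta)

print("posterior means:", np.round(np.mean(draws, axis=0), 2))
print("truth          :", beta_true)
```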