Sponsoring Section/Society: ASA-BIOMETRICS
Session Slot: 8:30-10:20 Tuesday
Estimated Audience Size: 150-200
AudioVisual Request: LCD Projection Panel (SVGA capable) with connection for laptop, overhead, slide projector
Session Title: Issues in Using Regression, Classification Trees,
and Neural Networks for Predicting Binary Outcomes
This session explores the use of regression models, classification trees, and neural networks for the construction of prediction models for binary outcomes. Over the past twenty years, statistical prediction models for binary outcomes have become widely used to assess and predict the probability of an event. For example, regression models are central to health policy and health services research and, particularly over the past decade, have been used as clinical decision aids. Logistic regression models for medical events are central to most current probabilistic predictive clinical decision aids and severity of illness tools, and are fundamental to comparative analyses of medical care based on risk-adjusted events. This session will look at the utility and limitations of three widely used methods for constructing these prediction models.
Theme Session: No
Applied Session: Yes
Session Organizer: Ruthazer, Robin New England Medical Center
Address: Robin Ruthazer, MPH New England Medical Center 750 Washington Street, NEMC #63 Boston, MA. 02111
Session Timing: 110 minutes total. Please allocate:
Opening Remarks by Chair - 5 or 0 minutes
First Speaker - 30 minutes (or 25)
Second Speaker - 30 minutes
Third Speaker - 30 minutes
Discussant - 10 minutes (or none)
Floor Discussion - 10 minutes (or 5 or 15)
Session Chair: Schmid, Christopher Tufts University School of Medicine
Address: Tufts University School of Medicine New England Medical Center 750 Washington Street, NEMC #63 Boston, MA. 02111
1. A Comparison of Regression Models, Classification Trees, and Neural Networks for the Prediction of Cardiac Complications
Griffith, John, New England Medical Center
Address: Tufts University School of Medicine Director Biostatistics Research Center New England Medical Center 750 Washington Street, NEMC #63 Boston, MA. 02111
Ruthazer, Robin, New England Medical Center
Schmid, Christopher, Tufts University School of Medicine
Terrin, Norma, New England Medical Center
Abstract: Prediction models of various types have long been central to health policy and health services research and, particularly over the past decade, have served as clinical decision aids. We report results of a comparison of models constructed using regression methods, classification trees, and neural networks to predict cardiac complications among patients presenting to the emergency department with ischemic symptoms. Data used were collected in a large prospective multi-center clinical trial of 10,783 patients. Models were constructed on a random subset of these patients and performance tested on the remaining data. Modeling methods included standard logistic regression, regression models with smoothing splines, generalized additive models, recursive partitioning with pruning and shrinkage, and back-propagated neural networks. Model performance was compared on the independent test dataset using the bias, area under the ROC curve, calibration chi-square, mean discrimination, and the ratio of the number of cases in the highest to the lowest quintile of predictions.
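As an illustration of the test-set measures named in this abstract, the following is a minimal sketch, not the authors' code. It assumes the bias is the mean prediction minus the observed event rate, and that the calibration chi-square is a Hosmer-Lemeshow-style statistic over quintiles of predicted probability; the abstract does not define either. All function names and the ten example values are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_auc(y: Sequence[int], p: Sequence[float]) -> float:
    """Area under the ROC curve: the share of (event, non-event) pairs in
    which the event has the higher prediction; ties count one half."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for a in events:
        for b in non_events:
            if a > b:
                score += 1.0
            elif a == b:
                score += 0.5
    return score / (len(events) * len(non_events))


def mean_bias(y: Sequence[int], p: Sequence[float]) -> float:
    """Assumed definition: mean predicted probability minus observed rate."""
    return sum(p) / len(p) - sum(y) / len(y)


def quintile_table(y: Sequence[int], p: Sequence[float],
                   groups: int = 5) -> List[Tuple[int, int, float]]:
    """Sort by prediction, cut into equal-sized groups, and return
    (group size, observed events, expected events) for each group."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    table = []
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        table.append((len(idx), sum(y[i] for i in idx), sum(p[i] for i in idx)))
    return table


def calibration_chi_square(table: List[Tuple[int, int, float]]) -> float:
    """Hosmer-Lemeshow-style sum of (O - E)^2 / (E * (1 - E/n)) over groups."""
    total = 0.0
    for n, observed, expected in table:
        if n == 0:
            continue  # empty group when there are fewer cases than groups
        variance = expected * (1.0 - expected / n)
        if variance > 0:
            total += (observed - expected) ** 2 / variance
    return total


# Hypothetical test-set outcomes and predictions, for illustration only.
y_test = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]
p_test = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.60, 0.65, 0.80, 0.90]
table = quintile_table(y_test, p_test)
print("AUC:", roc_auc(y_test, p_test))
print("bias:", mean_bias(y_test, p_test))
print("calibration chi-square:", calibration_chi_square(table))
print("events in lowest and highest quintile:", table[0][1], table[-1][1])
```

The highest-to-lowest quintile ratio can be read off the first and last rows of the table; it is undefined when the lowest quintile contains no events, as in this toy example.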
2. Logistic Regression - Tools for Measuring and Improving Predictive Accuracy
Lee, Kerry L., Duke University Medical Center
Address: Community and Family Medicine Duke University Medical Center P.O. Box 3363 Durham, NC 27710
Harrell, Frank, Duke University Medical Center
Abstract: Logistic regression, along with other multivariable regression models, is widely used in risk assessment and in studies of clinical outcomes. Uncritical application of modeling techniques, however, can result in models that inadequately fit the dataset at hand or, even more likely, inaccurately predict outcomes for new subjects. It is important to measure qualities of a model's fit in order to avoid poorly fitted or overfitted models. This talk will discuss several measures for quantifying predictive accuracy, including an easily interpretable measure of predictive discrimination as well as methods for assessing calibration of predicted probabilities of the outcome event. Both types of predictive accuracy should be validated without bias, using bootstrapping or cross-validation, before applying a model to a new group of subjects. Using real-life examples with actual clinical data and readily available software tools, the talk will also present some strategies for avoiding the hazards of poorly fitted or overfitted regression models and the tendency to place too much trust in the unvalidated fit of a model.
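The bootstrap validation mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the speakers' software: it uses the concordance index (c-index) as the discrimination measure, applies the usual optimism correction (refit on each resample, then subtract the average drop in performance when the resampled model is applied to the original data), fits the model by plain gradient ascent, and runs on simulated data. All function names are hypothetical. Calibration can be validated the same way by swapping in a calibration measure.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(beta, X):
    """Predicted probabilities for coefficients [intercept, slopes...]."""
    return [sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], row)))
            for row in X]


def fit_logistic(X, y, steps=200, lr=0.5):
    """Fit by gradient ascent on the mean log-likelihood (a simple stand-in
    for the iterative fitting done by standard statistical software)."""
    n, k = len(y), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for row, yi, pi in zip(X, y, predict(beta, X)):
            grad[0] += yi - pi
            for j, v in enumerate(row):
                grad[j + 1] += (yi - pi) * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def c_index(y, p):
    """Share of (event, non-event) pairs ranked correctly; ties count half."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = sum(1.0 if a > b else 0.5 if a == b else 0.0
                for a in events for b in non_events)
    return score / (len(events) * len(non_events))


def simulate(n, seed):
    """Simulated data, not clinical data: x1 drives the outcome, x2 and x3
    are pure noise that invites overfitting."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
    y = [1 if rng.random() < sigmoid(row[0]) else 0 for row in X]
    return X, y


def optimism_corrected_c(X, y, n_boot=30, seed=0):
    """Return (apparent c, mean optimism, optimism-corrected c)."""
    rng = random.Random(seed)
    n = len(y)
    apparent = c_index(y, predict(fit_logistic(X, y), X))
    drops = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # skip resamples that contain only one outcome class
        beta_b = fit_logistic(Xb, yb)
        c_boot = c_index(yb, predict(beta_b, Xb))  # performance on resample
        c_orig = c_index(y, predict(beta_b, X))    # same model, original data
        drops.append(c_boot - c_orig)
    optimism = sum(drops) / len(drops) if drops else 0.0
    return apparent, optimism, apparent - optimism


X, y = simulate(60, seed=1)
apparent, optimism, corrected = optimism_corrected_c(X, y)
print("apparent c:", apparent)
print("estimated optimism:", optimism)
print("optimism-corrected c:", corrected)
```

The apparent c-index is computed on the same data used for fitting, which is the "unvalidated fit" the abstract warns against trusting; the corrected value is the estimate intended to carry over to new subjects.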
3. Hybrid CART-Logit for Classification and Segmentation
Steinberg, Dan, Salford Systems
Address: Salford Systems 8800 Rio San Diego Dr. San Diego, CA 92108
Cardell, Scott N., Salford Systems
Abstract: CART and logistic regression are among the most widely used classification and response probability modeling tools, and both have exhibited good to excellent performance in a variety of data analysis problems. Since the two methods have quite different strengths and weaknesses, it is natural to investigate whether some combination might prove superior to either used separately. We introduce a new method for combining CART and logit which does exhibit performance superiority, and which admits of a natural set of statistical tests for the incremental value of the combination. This method differs considerably from previous hybridization experiments in that the logistic component is not run within CART child nodes; thus our hybrid exploits CART's ability to detect local data structure, and uses the logit to recognize global structure.
4. Neural Networks: Advantages and Limitations for Statistical Modeling
Goodman, Philip H., University of Nevada School of Medicine, Washoe Medical Center
Address: Division of General Internal Medicine Department of Internal Medicine University of Nevada School of Medicine Washoe Medical Center 77 Pringle Way Reno, NV 89520
Harrell, Frank, Duke University Medical Center
Abstract: The principal advantage of an artificial neural network (ANN) is that a single probabilistic architecture can simultaneously and flexibly capture the effects of predictor curvilinearity and global interaction. Theoretically, then, a (properly fitting) ANN is more likely than a GLM to be a "true" model of a complex predictive relationship. As a screening tool, an ANN can efficiently indicate when evidence of generalizable nonlinearity is lacking, thereby justifying the use of only a simple GLM (decreasing both the risk of finding false positive associations, and the investment of human effort). A general approach to improving predictive modeling through the complementary use of ANNs and GLMs will be demonstrated through the analyses of several large datasets. The presentation will focus on issues of overfitting and estimating optimistic bias in calibration and discrimination, on the interpretation of nonlinear predictive "effects", and on Bayesian evidence-based hyperpenalties for variable selection. Free software and tutorials may be downloaded from ftp.scs.unr.edu/pub/cbmr/nevpropdir.
List of speakers who are nonmembers: None