asa.biometrics.04

Sponsoring Section/Society: ASA-BIOMETRICS

Session Slot: 8:30-10:20 Tuesday

Estimated Audience Size: 150-200

AudioVisual Request: LCD projection panel (SVGA capable) with connection for laptop, overhead projector, slide projector

Session Title: Issues in Using Regression, Classification Trees, and Neural Networks for Predicting Binary Outcomes

This session explores the use of regression models, classification trees, and neural networks for the construction of prediction models for binary outcomes. Over the past twenty years, statistical prediction models for binary outcomes have become widely used to assess and predict the probability of an event. For example, regression models are central to health policy and health services research and, particularly over the past decade, have come into use as clinical decision aids. Logistic regression models for medical events are central to most current probabilistic predictive clinical decision aids and severity-of-illness tools, and are fundamental to comparative analyses of medical care based on risk-adjusted events. This session will look at the utility and limitations of three widely used methods for constructing these prediction models.

Theme Session: No

Applied Session: Yes

Session Organizer: Ruthazer, Robin New England Medical Center

Address: Robin Ruthazer, MPH New England Medical Center 750 Washington Street, NEMC #63 Boston, MA. 02111

Phone: (617)-636-8819

Fax: (617)-636-5560

Email: rruthazer@es.nemc.org

Session Timing: 110 minutes total. Please allocate:

Opening Remarks by Chair - 5 or 0 minutes
First Speaker - 30 minutes (or 25)
Second Speaker - 30 minutes
Third Speaker - 30 minutes
Discussant - 10 minutes (or none)
Floor Discussion - 10 minutes (or 5 or 15)

Session Chair: Schmid, Christopher Tufts University School of Medicine

Address: Tufts University School of Medicine New England Medical Center 750 Washington Street, NEMC #63 Boston, MA. 02111

Phone: (617)-636-5179

Fax: (617)-636-8023

Email: cschmid@es.nemc.org

1. A Comparison of Regression Models, Classification Trees, and Neural Networks for the Prediction of Cardiac Complications

Griffith, John, New England Medical Center

Address: Tufts University School of Medicine Director Biostatistics Research Center New England Medical Center 750 Washington Street, NEMC #63 Boston, MA. 02111

Phone: (617)-636-4619

Fax: (617)-636-5560

Email: John.griffith@es.nemc.org

Ruthazer, Robin, New England Medical Center

Schmid, Christopher, Tufts University School of Medicine

Terrin, Norma, New England Medical Center

Abstract: Prediction models of various types have long been central to health policy and health services research and, particularly over the past decade, have been used as clinical decision aids. We report results of a comparison of models constructed using regression methods, classification trees, and neural networks to predict cardiac complications among patients presenting to the emergency department with ischemic symptoms. Data were collected in a large prospective multi-center clinical trial of 10,783 patients. Models were constructed on a random subset of these patients, and performance was tested on the remaining data. Modeling methods included standard logistic regression, regression models with smoothing splines, generalized additive models, recursive partitioning with pruning and shrinkage, and back-propagated neural networks. On the independent test dataset, we compared measures of model performance and predictive accuracy: bias, area under the ROC curve, calibration chi-square, mean discrimination, and the ratio of the number of cases in the highest to the lowest quintile of predictions.
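To make three of the test-set measures named in this abstract concrete, here is a minimal Python sketch. It is not the authors' code. The definitions are assumptions chosen for illustration: bias is taken as mean predicted probability minus observed event rate, the area under the ROC curve is computed as the Mann-Whitney statistic, and the calibration chi-square is a Hosmer-Lemeshow-style statistic over quintiles of predicted risk. The outcomes and predictions (`y_test`, `p_test`) are made-up values, not trial data.

```python
from typing import List, Sequence, Tuple


def auc(y: Sequence[int], p: Sequence[float]) -> float:
    """Area under the ROC curve as the Mann-Whitney statistic (ties count 1/2)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def bias(y: Sequence[int], p: Sequence[float]) -> float:
    """Assumed definition: mean predicted probability minus observed event rate."""
    return sum(p) / len(p) - sum(y) / len(y)


def risk_groups(y: Sequence[int], p: Sequence[float],
                groups: int = 5) -> List[List[Tuple[int, float]]]:
    """Sort by predicted risk and split into equal-size groups (quintiles by default)."""
    pairs = sorted(zip(y, p), key=lambda t: t[1])
    n = len(pairs)
    return [pairs[i * n // groups:(i + 1) * n // groups] for i in range(groups)]


def calibration_chi_square(y: Sequence[int], p: Sequence[float],
                           groups: int = 5) -> float:
    """Hosmer-Lemeshow-style sum over groups of (O - E)^2 / (E * (1 - E / n))."""
    stat = 0.0
    for g in risk_groups(y, p, groups):
        n_g = len(g)
        observed = sum(yi for yi, _ in g)   # events observed in the group
        expected = sum(pi for _, pi in g)   # events expected from the predictions
        stat += (observed - expected) ** 2 / (expected * (1 - expected / n_g))
    return stat


# Hypothetical held-out outcomes and predicted probabilities (illustration only).
y_test = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]
p_test = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

print("AUC:", auc(y_test, p_test))
print("bias:", bias(y_test, p_test))
print("calibration chi-square:", calibration_chi_square(y_test, p_test))
```

Evaluating these only on data that played no part in fitting matches the split design described in the abstract and is what keeps them honest as estimates of performance on new patients.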

2. Logistic Regression - Tools for Measuring and Improving Predictive Accuracy

Lee, Kerry L., Duke University Medical Center

Address: Community and Family Medicine Duke University Medical Center P.O. Box 3363 Durham, NC 27710

Phone: (919)-286-8725

Fax: (919)-286-2947

Email: lee0001@mc.duke.edu

Harrell, Frank, Duke University Medical Center

Abstract: Logistic regression, along with other multivariable regression models, is widely used in risk assessment and in studies of clinical outcomes. Uncritical application of modeling techniques, however, can result in models that inadequately fit the dataset at hand, or even more likely, inaccurately predict outcomes for new subjects. It is important to measure qualities of a model's fit in order to avoid poorly fitted or overfitted models. This talk will discuss several measures for quantifying predictive accuracy, including an easily interpretable measure of predictive discrimination as well as methods for assessing calibration of predicted probabilities of the outcome event. Both types of predictive accuracy should be unbiasedly validated using bootstrapping or cross validation before applying a model to a new group of subjects. Using real-life examples with actual clinical data and readily available software tools, the talk will also present some strategies for avoiding the hazards of poorly fitted or overfitted regression models and the tendency to place too much trust in the unvalidated fit of a model.
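The abstract does not say which validation procedure the talk uses. As an illustration of bootstrap validation of predictive discrimination, the following Python sketch applies the commonly described optimism bootstrap to the c-index (the area under the ROC curve). Everything here is an assumption made for the example: the simulated data, the helper names (`fit_logistic`, `c_index`, `optimism_corrected_c`), and the simple gradient-ascent fit, which stands in for proper logistic regression software.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    """Predicted probabilities from intercept w[0] and slopes w[1:]."""
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row))) for row in X]


def fit_logistic(X, y, steps: int = 150, lr: float = 0.5) -> List[float]:
    """Gradient-ascent logistic regression; illustrative, not production quality."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(steps):
        p = predict(w, X)
        grad = [sum(yi - pi for yi, pi in zip(y, p))]
        grad += [sum((yi - pi) * row[j] for yi, pi, row in zip(y, p, X))
                 for j in range(k)]
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


def c_index(y: Sequence[int], p: Sequence[float]) -> float:
    """Probability that a random event is ranked above a random non-event."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def optimism_corrected_c(X, y, n_boot: int = 20, seed: int = 0) -> Tuple[float, float]:
    """Return (apparent c, apparent c minus mean bootstrap optimism)."""
    rng = random.Random(seed)
    n = len(y)
    apparent = c_index(y, predict(fit_logistic(X, y), X))
    total, used = 0.0, 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:  # skip resamples with only one outcome class
            continue
        wb = fit_logistic(Xb, yb)
        # Optimism: performance on the resample minus performance on the original data.
        total += c_index(yb, predict(wb, Xb)) - c_index(y, predict(wb, X))
        used += 1
    return apparent, apparent - total / used


# Simulated data: one informative predictor and four pure-noise predictors.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(5)] for _ in range(120)]
y = [1 if data_rng.random() < sigmoid(1.5 * row[0]) else 0 for row in X]

apparent, corrected = optimism_corrected_c(X, y)
print("apparent c:", round(apparent, 3), "optimism-corrected c:", round(corrected, 3))
```

The gap between the two printed values estimates how much the apparent fit overstates discrimination for new subjects, which is the overfitting hazard the abstract warns about.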

3. Hybrid CART-Logit for Classification and Segmentation

Steinberg, Dan, Salford Systems

Address: Salford Systems 8800 Rio San Diego Dr. San Diego, CA 92108

Phone: (619)-543-8880

Fax: (619)-543

Email: dstein@salford-systems.com

Cardell, Scott N., Salford Systems

Abstract: CART and logistic regression are among the most widely used tools for classification and response probability modeling, and both have exhibited good to excellent performance in a variety of data analysis problems. Since the two methods have quite different strengths and weaknesses, it is natural to investigate whether some combination might prove superior to either used separately. We introduce a new method for combining CART and logit which does exhibit superior performance, and which admits a natural set of statistical tests for the incremental value of the combination. This method differs considerably from previous hybridization experiments in that the logistic component is not run within CART child nodes; thus our hybrid exploits CART's ability to detect local data structure, and uses the logit to recognize global structure.

4. Neural Networks: Advantages and Limitations for Statistical Modeling

Goodman, Philip H., University of Nevada School of Medicine, Washoe Medical Center

Address: Division of General Internal Medicine Department of Internal Medicine University of Nevada School of Medicine Washoe Medical Center 77 Pringle Way Reno, NV 89520

Phone: (702)-328-4869

Fax: (702)-328-4871

Email: goodman@unr.edu

Harrell, Frank, Duke University Medical Center

Abstract: The principal advantage of an artificial neural network (ANN) is that a single probabilistic architecture can simultaneously and flexibly capture the effects of predictor curvilinearity and global interaction. Theoretically, then, a (properly fitting) ANN is more likely than a GLM to be a "true" model of a complex predictive relationship. As a screening tool, an ANN can efficiently indicate when evidence of generalizable nonlinearity is lacking, thereby justifying the use of only a simple GLM (decreasing both the risk of finding false positive associations, and the investment of human effort). A general approach to improving predictive modeling through the complementary use of ANNs and GLMs will be demonstrated through the analyses of several large datasets. The presentation will focus on issues of overfitting and estimating optimistic bias in calibration and discrimination, on the interpretation of nonlinear predictive "effects", and on Bayesian evidence-based hyperpenalties for variable selection. Free software and tutorials may be downloaded from ftp.scs.unr.edu/pub/cbmr/nevpropdir.

List of speakers who are nonmembers: None

David Scott
6/1/1998