
Introduction

We give an example of "model selection" in multiple regression. Here, "model selection" means choosing, from a relatively large collection of independent variables, which ones to keep (with nonzero coefficients) in the regression equation. The subject is discussed in the text (Section 11.10, pp. 531-539). We use both methods discussed there (stepwise regression and Mallows' $C_p$) and introduce a third method: cross-validation.
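To make the two subset-scoring criteria concrete, here is a minimal Python sketch (stdlib only) that scores every subset of predictors by Mallows' $C_p$ and by leave-one-out cross-validation. It assumes one common form of $C_p$, namely $C_p = SSE_p/\hat\sigma^2 - n + 2p$, where $p$ counts the fitted coefficients including the intercept and $\hat\sigma^2$ is the mean squared error of the full model; the text's convention for $p$ may differ. The cross-validation shown is leave-one-out (the PRESS statistic), which is only one possible scheme. Stepwise selection is not sketched. All function names and the synthetic data are hypothetical illustrations, not the procedure used in this study.

```python
import itertools
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def sse(rows, y, b):
    """Residual sum of squares of the fit b."""
    return sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
               for r, yi in zip(rows, y))


def design(data, cols):
    """Design-matrix rows: an intercept plus the chosen predictor columns."""
    return [[1.0] + [row[c] for c in cols] for row in data]


def mallows_cp(sse_p, p, n, sigma2):
    """Mallows' C_p, with p = number of fitted coefficients (intercept included)."""
    return sse_p / sigma2 - n + 2 * p


def loocv_press(rows, y):
    """Leave-one-out CV: refit without case i, predict case i, sum squared errors."""
    total = 0.0
    for i in range(len(y)):
        b = fit_ols(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        pred = sum(bj * xj for bj, xj in zip(b, rows[i]))
        total += (y[i] - pred) ** 2
    return total


def compare_subsets(data, y):
    """Return (columns, C_p, PRESS) for every subset of the predictors."""
    n, k = len(y), len(data[0])
    full = design(data, range(k))
    sigma2 = sse(full, y, fit_ols(full, y)) / (n - (k + 1))  # full-model MSE
    results = []
    for size in range(k + 1):
        for cols in itertools.combinations(range(k), size):
            rows = design(data, cols)
            s = sse(rows, y, fit_ols(rows, y))
            results.append((cols, mallows_cp(s, size + 1, n, sigma2),
                            loocv_press(rows, y)))
    return results


# Hypothetical data: y depends on predictor 0 only; predictors 1 and 2 are pure noise.
rng = random.Random(0)
data = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(30)]
y = [1.0 + 2.0 * row[0] + rng.gauss(0, 0.5) for row in data]
for cols, cp, press in compare_subsets(data, y):
    print(cols, round(cp, 2), round(press, 2))
```

Under this form of $C_p$, the full model always has $C_p = p$ exactly, so the criterion is informative only for the smaller subsets; a subset with $C_p$ near its own $p$ is the usual sign of little bias.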

It is necessary to consider model selection because including too many variables increases the uncertainty (variance) of the estimated coefficients. This is especially true when new predictors are computed from the given ones, e.g., when squared, cubed, and higher powers of a predictor are added to fit polynomial terms, since such derived predictors are typically strongly correlated with the originals.
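The variance effect can be checked numerically. With error variance $\sigma^2$, the least-squares estimate $b_j$ has $\mathrm{Var}(b_j) = \sigma^2 [(X'X)^{-1}]_{jj}$, so holding $\sigma^2$ fixed we can compare the factor $[(X'X)^{-1}]_{jj}$ for the linear coefficient as quadratic and cubic terms are added. The Python sketch below assumes hypothetical predictor values $x = 1, \dots, 10$ and a known, fixed $\sigma^2$; the function names are illustrative only.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def coef_variance_factor(xs, degree, j=1):
    """[(X'X)^{-1}]_{jj} for the polynomial design 1, x, ..., x^degree.

    Var(b_j) = sigma^2 * this factor; j=1 is the coefficient of x itself.
    """
    k = degree + 1
    rows = [[x ** d for d in range(k)] for x in xs]
    xtx = [[sum(r[i] * r[c] for r in rows) for c in range(k)] for i in range(k)]
    e_j = [1.0 if i == j else 0.0 for i in range(k)]  # j-th unit vector
    return solve(xtx, e_j)[j]  # j-th entry of the j-th column of (X'X)^{-1}


xs = [float(x) for x in range(1, 11)]  # hypothetical predictor values 1..10
for degree in (1, 2, 3):
    print(degree, coef_variance_factor(xs, degree))
```

For the straight-line fit the factor is $1/\sum (x_i - \bar x)^2 = 1/82.5$; it grows as $x^2$ and $x^3$ enter, because over this range those columns are nearly collinear with $x$.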


Dennis Cox 2002-12-01