
Conclusions

The models selected by Stepwise in the first two tries had fewer variables than those selected by Best Subsets. The final model selected by Cross Validation is probably the best for prediction, because cross validation makes fewer assumptions than the other methods and directly estimates the error incurred when predicting new observations.

Cross Validation can be used in many other settings. For example, when the response is binary or categorical (usually referred to as a classification problem), cross validation can be used to estimate the error rate. There are other variations on cross validation. A particularly popular one is "leave one out", in which each observation is left out in turn and predicted from a regression fit to the rest of the data. This cross validation method can be used with smaller sample sizes than the subsampling method we considered.
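
As a rough illustration of the leave one out idea, the following Python sketch refits an ordinary least squares regression n times, each time holding out one observation and predicting it from the fit to the rest. It is only a minimal sketch, not the procedure used earlier in these notes: the function names loo_prediction_errors and loo_mse, the use of NumPy, and the simulated data are all assumptions made for the example.

```python
import numpy as np


def loo_prediction_errors(X, y):
    """Leave-one-out prediction errors for least squares with an intercept.

    X is an (n, p) array of predictors and y an (n,) response.
    (Hypothetical helper written for this illustration.)
    """
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # design matrix with intercept column
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # every observation except the i-th
        beta, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        errors[i] = y[i] - A[i] @ beta  # error predicting the held-out point
    return errors


def loo_mse(X, y):
    """Mean squared leave-one-out prediction error."""
    return float(np.mean(loo_prediction_errors(X, y) ** 2))


# Simulated data (an assumption for the example): only the first of
# three candidate predictors actually affects the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(scale=0.5, size=30)

mse_one_variable = loo_mse(X[:, :1], y)  # model using the first predictor only
mse_all_variables = loo_mse(X, y)        # model using all three predictors
print(mse_one_variable, mse_all_variables)
```

Comparing the two estimates across candidate models is one way to use leave one out for variable selection: the model with the smaller estimated prediction error is preferred. Because every observation but one is used in each fit, the method remains usable when the sample is too small to set aside a sizeable validation subsample.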

Dennis Cox 2002-12-01