The best subsets regression results with the original origin variable are:
Best Subsets Regression
Response is mpg
Candidate predictors: cylinder, displace, hp, weight, acc, year, origin
Vars R-Sq R-Sq(adj) C-p s Included predictors (one X each)
1 69.3 69.2 273.2 4.3327 X
1 64.8 64.7 368.7 4.6351 X
2 80.8 80.7 26.6 3.4272 X X
2 74.1 74.0 171.4 3.9835 X X
3 81.7 81.6 8.7 3.3476 X X X
3 80.9 80.7 27.7 3.4276 X X X
4 81.8 81.6 9.3 3.3460 X X X X
4 81.8 81.6 9.3 3.3461 X X X X
5 82.0 81.8 7.1 3.3325 X X X X X
5 82.0 81.8 7.4 3.3339 X X X X X
6 82.1 81.8 6.7 3.3262 X X X X X X
6 82.1 81.8 7.5 3.3299 X X X X X X
7 82.1 81.8 8.0 3.3277 X X X X X X X
After recoding the origin variable as the indicator variables euro and japan, the results are:
Best Subsets Regression
Response is mpg
Candidate predictors: cylinder, displace, hp, weight, acc, year, euro, japan
Vars R-Sq R-Sq(adj) C-p s Included predictors (one X each)
1 69.3 69.2 281.6 4.3327 X
1 64.8 64.7 378.4 4.6351 X
2 80.8 80.7 31.9 3.4272 X X
2 74.1 74.0 178.6 3.9835 X X
3 81.2 81.1 25.1 3.3952 X X X
3 81.1 80.9 28.8 3.4106 X X X
4 81.9 81.7 12.3 3.3374 X X X X
4 81.3 81.1 25.4 3.3926 X X X X
5 82.1 81.8 10.7 3.3268 X X X X X
5 81.9 81.7 13.4 3.3380 X X X X X
6 82.3 82.0 8.1 3.3113 X X X X X X
6 82.3 82.0 8.6 3.3136 X X X X X X
7 82.4 82.1 7.6 3.3050 X X X X X X X
7 82.3 82.0 8.8 3.3098 X X X X X X X
8 82.4 82.1 9.0 3.3065 X X X X X X X X
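The recoding of origin into the euro and japan indicator columns listed in the table above can be sketched in Python as follows. This is only an illustration: the numeric coding (1 = US, 2 = Europe, 3 = Japan), the choice of US as the baseline category, and the function name recode_origin are assumptions, not something stated in these notes.

```python
def recode_origin(origin_codes, mapping=None):
    """Turn a 3-level origin code into two 0/1 indicator columns (euro, japan)."""
    # Assumed coding (not given in the notes): 1 = US, 2 = Europe, 3 = Japan.
    # US is the baseline level, so it gets no indicator column of its own.
    if mapping is None:
        mapping = {1: "us", 2: "euro", 3: "japan"}
    euro, japan = [], []
    for code in origin_codes:
        region = mapping[code]
        euro.append(1 if region == "euro" else 0)
        japan.append(1 if region == "japan" else 0)
    return euro, japan


euro, japan = recode_origin([1, 3, 2, 1, 3])
print(euro)   # [0, 0, 1, 0, 0]
print(japan)  # [0, 1, 0, 0, 1]
```

With a 3-level categorical variable, two indicators are enough; adding a third for the baseline would make the columns perfectly collinear with the intercept.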
Best subsets reports the two best variable subsets
for each subset size, except for the full model, which
has only one. Note in the first 4 selections how weight
and displacement switch back and forth, replacing
each other, with weight being the better of the two. Heavier
cars tend to have larger engines, but car weight is
the better predictor. year and the origin variables
appear next as the best predictors, with the others entering
later. The best value of
is
which includes
all variables except cylinder (number of cylinders)
and acc (acceleration). Both of these may be good
predictors (I would certainly expect an 8-cylinder engine
to be less fuel-efficient than a 4-cylinder engine), but
they are presumably left out because other variables that are
highly correlated with them (e.g., displacement) do a
better job of predicting mpg.
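To make the procedure concrete, here is a minimal Python sketch of a best subsets search. It is not the software's own implementation, and it runs on synthetic data rather than the car data: the variable names are borrowed from the tables, but the generated values and coefficients are made up for illustration. For every subset of predictors it fits ordinary least squares and computes R-Sq, adjusted R-Sq, s, and Mallows' C-p using the usual definition C-p = SSE_p / MSE_full - (n - 2p), where p counts the fitted parameters including the intercept.

```python
from itertools import combinations

import numpy as np


def fit_sse(X, y, cols):
    """Residual sum of squares for OLS of y on an intercept plus X[:, cols]."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def best_subsets(X, y, names, keep=2):
    """Return, for each subset size, the `keep` best subsets ranked by R-Sq."""
    n, k = X.shape
    sst = float(((y - y.mean()) ** 2).sum())
    mse_full = fit_sse(X, y, range(k)) / (n - k - 1)  # full-model error variance
    rows = []
    for size in range(1, k + 1):
        fits = []
        for cols in combinations(range(k), size):
            sse = fit_sse(X, y, cols)
            p = size + 1  # parameters, including the intercept
            fits.append({
                "vars": [names[j] for j in cols],
                "r_sq": 100 * (1 - sse / sst),
                "adj_r_sq": 100 * (1 - (sse / (n - p)) / (sst / (n - 1))),
                "cp": sse / mse_full - (n - 2 * p),
                "s": (sse / (n - p)) ** 0.5,
            })
        fits.sort(key=lambda f: -f["r_sq"])
        rows.extend(fits[:keep])
    return rows


# Synthetic data (an assumption for illustration): displace is built to be
# highly correlated with weight, but only weight and year drive the response.
rng = np.random.default_rng(0)
n = 200
weight = rng.normal(size=n)
displace = weight + 0.5 * rng.normal(size=n)
year = rng.normal(size=n)
mpg = -3.0 * weight + 1.5 * year + rng.normal(size=n)

names = ["displace", "weight", "year"]
X = np.column_stack([displace, weight, year])
rows = best_subsets(X, mpg, names)
for r in rows:
    print(len(r["vars"]), f'{r["r_sq"]:.1f}', f'{r["adj_r_sq"]:.1f}',
          f'{r["cp"]:.1f}', f'{r["s"]:.4f}', r["vars"])
```

In this synthetic setup displace is a reasonable predictor on its own, yet it adds little once weight is in the model, which mirrors the explanation given above for why a correlated variable can be left out. Note also that C-p for the full model always equals the number of fitted parameters, so only the smaller subsets carry information in that column.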
When we dropped the variables cylinder and acc and took out the test data, the Best Subsets results were essentially unchanged:
Best Subsets Regression
Response is mpg
Candidate predictors: displace, hp, weight, year, euro, japan
Vars R-Sq R-Sq(adj) C-p s Included predictors (one X each)
1 69.4 69.3 233.9 4.3909 X
1 64.2 64.1 328.4 4.7475 X
2 80.8 80.7 26.7 3.4783 X X
2 73.4 73.2 162.9 4.1004 X X
3 81.1 81.0 23.1 3.4559 X X X
3 81.1 80.9 23.6 3.4584 X X X
4 81.8 81.6 12.8 3.3988 X X X X
4 81.2 81.0 23.5 3.4531 X X X X
5 82.1 81.8 9.3 3.3759 X X X X X
5 81.8 81.6 14.3 3.4013 X X X X X
6 82.4 82.0 7.0 3.3586 X X X X X X