(30 pts)
In assessing street stability, the "rate of rutting" was
measured on 31 experimental asphalt pavements. Five independent
or predictor variables were used to specify the conditions under
which asphalt was prepared, while a sixth "dummy" variable was
used to express the difference between the two separate "blocks"
of runs into which the experiment was divided (15 in one block,
16 in the other). The equation used to fit the data was
Y = B0 + B1X1 + B2X2 + B3X3 + B4X4 + B5X5 + B6X6 + E
where
Y = log(change of rut depth in inches per million wheel passes)
X1 = log(viscosity of asphalt)
X2 = percent asphalt in surface course
X3 = percent asphalt in base course
X4 = dummy variable to separate two sets of runs
X5 = percent fines in surface course
X6 = percent voids in surface course
You may assume that the above equation is "complete" in the
sense that it includes all the relevant terms. Your assignment
is to select a suitable subset of these terms as the "best"
regression equation under the circumstances. The residual
sums of squares for all 64 (= 2^6) possible regressions are
given below, together with a 0/1 indicator for each X showing
whether the corresponding term is present in (1) or absent
from (0) the model being fit. (Note: the number of
observations is n = 31; the number of models fit is 64.)
The first row, with all indicators 0, is the constant-only
fit (Y = B0 + E).
X1 X2 X3 X4 X5 X6 Residual Sum of Squares
0 0 0 0 0 0 11.0580
1 0 0 0 0 0 0.6070
0 1 0 0 0 0 10.7950
1 1 0 0 0 0 0.4990
0 0 1 0 0 0 10.6630
1 0 1 0 0 0 0.6000
0 1 1 0 0 0 10.1680
1 1 1 0 0 0 0.4980
0 0 0 1 0 0 1.5220
1 0 0 1 0 0 0.5820
0 1 0 1 0 0 1.2180
1 1 0 1 0 0 0.4500
0 0 1 1 0 0 1.4530
1 0 1 1 0 0 0.5810
0 1 1 1 0 0 1.0410
1 1 1 1 0 0 0.4410
0 0 0 0 1 0 9.9220
1 0 0 0 1 0 0.5970
0 1 0 0 1 0 9.4790
1 1 0 0 1 0 0.4770
0 0 1 0 1 0 9.8910
1 0 1 0 1 0 0.5820
0 1 1 0 1 0 9.3620
1 1 1 0 1 0 0.4750
0 0 0 1 1 0 1.3970
1 0 0 1 1 0 0.5690
0 1 0 1 1 0 1.0300
1 1 0 1 1 0 0.4130
0 0 1 1 1 0 1.3830
1 0 1 1 1 0 0.5610
0 1 1 1 1 0 0.9580
1 1 1 1 1 0 0.4120
0 0 0 0 0 1 9.1960
1 0 0 0 0 1 0.5760
0 1 0 0 0 1 9.1920
1 1 0 0 0 1 0.3670
0 0 1 0 0 1 8.8480
1 0 1 0 0 1 0.5670
0 1 1 0 0 1 8.8380
1 1 1 0 0 1 0.3650
0 0 0 1 0 1 1.5070
1 0 0 1 0 1 0.5580
0 1 0 1 0 1 1.1920
1 1 0 1 0 1 0.3230
0 0 1 1 0 1 1.4370
1 0 1 1 0 1 0.5550
0 1 1 1 0 1 0.9950
1 1 1 1 0 1 0.3110
0 0 0 0 1 1 7.6800
1 0 0 0 1 1 0.5740
0 1 0 0 1 1 7.6790
1 1 0 0 1 1 0.3640
0 0 1 0 1 1 7.6780
1 0 1 0 1 1 0.5610
0 1 1 0 1 1 7.6750
1 1 1 0 1 1 0.3640
0 0 0 1 1 1 1.3520
1 0 0 1 1 1 0.5530
0 1 0 1 1 1 1.0240
1 1 0 1 1 1 0.3130
0 0 1 1 1 1 1.3420
1 0 1 1 1 1 0.5450
0 1 1 1 1 1 0.9390
1 1 1 1 1 1 0.3070
a) The information provided in the table above is enough to
compute, and hence plot, R^2, R^2_adj, and C_p for every one
of the models. Describe how.
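One way to answer a) uses the usual textbook definitions. These formulas are standard (C_p is Mallows' statistic) and are not stated in the problem itself. Take p as the number of parameters including B0, so p = 1 + the number of X's present. The residual sum of squares of the constant-only fit equals the total corrected sum of squares SST. The error variance s^2 is estimated from the full six-variable model, which has n - 7 = 24 residual degrees of freedom:

```latex
\begin{aligned}
\mathrm{SST} &= \mathrm{RSS}_{\text{const}} = 11.0580, \qquad n = 31,\\
R_p^2 &= 1 - \frac{\mathrm{RSS}_p}{\mathrm{SST}},\\
R_{\text{adj},p}^2 &= 1 - \frac{\mathrm{RSS}_p/(n-p)}{\mathrm{SST}/(n-1)}
      = 1 - \left(1 - R_p^2\right)\frac{n-1}{n-p},\\
s^2 &= \frac{\mathrm{RSS}_{\text{full}}}{n-7} = \frac{0.3070}{24},\\
C_p &= \frac{\mathrm{RSS}_p}{s^2} - (n - 2p).
\end{aligned}
```

As a check, the full model gives C_p = 0.3070/(0.3070/24) - (31 - 14) = 24 - 17 = 7 = p. Under this choice of s^2, the full model always lies exactly on the line C_p = p.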
b) Produce plots using the above three criteria. If some
of your values are excessively large, produce one plot
including all values and another truncated so as to highlight
those of interest. Include the line C_p = p in the C_p plot
for visual reference.
c) Produce a table of the corresponding values, in the form
Variables R^2 R^2_adj C_p
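A table of this form can be generated directly from the residual sums of squares. The sketch below assumes the standard definitions of R^2, R^2_adj, and Mallows' C_p, with p counting the intercept and s^2 taken from the full six-variable model. The names `RSS`, `variables`, and `criteria` are hypothetical, and the extra p column is added only as a convenience for comparing C_p with p:

```python
# Residual sums of squares copied from the table, in row order.
# Row k has X_j = 1 exactly when bit j-1 of k is set (X1 is the lowest bit).
RSS = [
    11.0580, 0.6070, 10.7950, 0.4990, 10.6630, 0.6000, 10.1680, 0.4980,
    1.5220, 0.5820, 1.2180, 0.4500, 1.4530, 0.5810, 1.0410, 0.4410,
    9.9220, 0.5970, 9.4790, 0.4770, 9.8910, 0.5820, 9.3620, 0.4750,
    1.3970, 0.5690, 1.0300, 0.4130, 1.3830, 0.5610, 0.9580, 0.4120,
    9.1960, 0.5760, 9.1920, 0.3670, 8.8480, 0.5670, 8.8380, 0.3650,
    1.5070, 0.5580, 1.1920, 0.3230, 1.4370, 0.5550, 0.9950, 0.3110,
    7.6800, 0.5740, 7.6790, 0.3640, 7.6780, 0.5610, 7.6750, 0.3640,
    1.3520, 0.5530, 1.0240, 0.3130, 1.3420, 0.5450, 0.9390, 0.3070,
]
N = 31  # number of observations
K = 6   # number of candidate predictors X1..X6


def variables(row):
    """Names of the X's present in the model on the given table row."""
    return [f"X{j + 1}" for j in range(K) if (row >> j) & 1]


def criteria(rss_list, n):
    """Return (row, p, R^2, R^2_adj, C_p) for every model in rss_list."""
    sst = rss_list[0]                       # constant-only RSS = total corrected SS
    k_full = len(rss_list) - 1              # row with every X present
    p_full = bin(k_full).count("1") + 1     # 7 parameters for the full model
    s2 = rss_list[k_full] / (n - p_full)    # error variance from the full model
    rows = []
    for row, rss in enumerate(rss_list):
        p = bin(row).count("1") + 1         # parameters, counting the intercept B0
        r2 = 1 - rss / sst
        r2_adj = 1 - (rss / (n - p)) / (sst / (n - 1))
        cp = rss / s2 - (n - 2 * p)
        rows.append((row, p, r2, r2_adj, cp))
    return rows


if __name__ == "__main__":
    print(f"{'Variables':<22}{'p':>3}{'R^2':>9}{'R^2_adj':>9}{'C_p':>9}")
    for row, p, r2, r2_adj, cp in criteria(RSS, N):
        names = " ".join(variables(row)) or "(constant)"
        print(f"{names:<22}{p:>3}{r2:>9.4f}{r2_adj:>9.4f}{cp:>9.2f}")
```

The same (p, value) pairs can be passed to any plotting tool for part b).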
d) Choose the best 4 or 5 models based on the above.
What variables do they have in common?
Which variable values would you like to know?
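Once a short list of models has been chosen, the variables they share can be read off by intersecting their indicator patterns. The sketch below assumes each model is identified by its 0-based row in the table (binary order, X1 as the lowest bit). The helper names are hypothetical, and the rows used in the usage line are arbitrary examples, not a recommended answer:

```python
def present(row, k=6):
    """Set of 1-based indices j for which X_j = 1 on the given table row."""
    return {j + 1 for j in range(k) if (row >> j) & 1}


def common_variables(rows, k=6):
    """Predictors present in every one of the chosen models (table rows)."""
    if not rows:
        return []
    shared = set.intersection(*(present(r, k) for r in rows))
    return [f"X{j}" for j in sorted(shared)]


if __name__ == "__main__":
    # Rows 43 and 47 are arbitrary illustrative picks, not a recommended answer.
    print(common_variables([43, 47]))  # → ['X1', 'X2', 'X4', 'X6']
```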