
Solution to Exercise 12.12.

  Minitab output:
Regression Analysis


The regression equation is
removal = 0.63 + 0.652 loading

Predictor        Coef       StDev          T        P
Constant        0.626       2.135       0.29    0.774
loading       0.65229     0.04041      16.14    0.000

S = 5.715       R-Sq = 95.6%     R-Sq(adj) = 95.2%

Analysis of Variance

Source            DF          SS          MS         F        P
Regression         1      8510.9      8510.9    260.56    0.000
Residual Error    12       392.0        32.7
Total             13      8902.9

Obs    loading    removal         Fit   StDev Fit    Residual    St Resid
  1          3       4.00        2.58        2.05        1.42        0.27  
  2          8       7.00        5.84        1.92        1.16        0.21  
  3         10       8.00        7.15        1.88        0.85        0.16  
  4         11       8.00        7.80        1.85        0.20        0.04  
  5         13      10.00        9.11        1.81        0.89        0.16  
  6         16      11.00       11.06        1.75       -0.06       -0.01  
  7         27      16.00       18.24        1.58       -2.24       -0.41  
  8         30      26.00       20.19        1.55        5.81        1.06  
  9         35      21.00       23.46        1.53       -2.46       -0.45  
 10         37       9.00       24.76        1.53      -15.76       -2.86R 
 11         38      31.00       25.41        1.53        5.59        1.01  
 12         44      30.00       29.33        1.55        0.67        0.12  
 13        103      75.00       67.81        3.08        7.19        1.49  
 14        142      90.00       93.25        4.51       -3.25       -0.93 X

R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.

(a) The least squares line is

removal = 0.63 + 0.652 loading

or, in more conventional notation and using the more accurate values of the estimates given in the coefficient table (the Coef column):

\begin{displaymath}
\hat{y} \; = \; 0.626 \, + \, 0.65229 x .\end{displaymath}

Note: I like to write this as $\hat{y}$ since it is the estimated (conditional) mean of the population of y values corresponding to the given x value. It uses the estimated coefficients and, unlike equation (12.1), p. 492, it does not include the ``error'' term $\epsilon$.
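As a cross-check outside Minitab, here is a minimal Python sketch of the least squares fit, using the loading and removal values transcribed from the Obs table above. The function name `least_squares` is my own, not anything from Minitab or the textbook.

```python
# Data transcribed from the Obs table in the Minitab output above.
loading = [3, 8, 10, 11, 13, 16, 27, 30, 35, 37, 38, 44, 103, 142]
removal = [4, 7, 8, 8, 10, 11, 16, 26, 21, 9, 31, 30, 75, 90]


def least_squares(x, y):
    """Return (intercept, slope) of the simple least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sums of squares and cross products.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


b0, b1 = least_squares(loading, removal)
print(f"removal = {b0:.3f} + {b1:.5f} loading")
```

The printed coefficients should agree with the Coef column to the digits Minitab shows.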

(b) I had the residuals printed out (one of the Minitab options in regression). Looking in the list of residuals, I see the predicted value at x = 35 is $\hat{y}$ = 23.46, from Obs number 9, and the residual is $y_9 - \hat{y}_9$ = 21 - 23.46 = -2.46.
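The same two numbers can be reproduced by hand from the fitted line. This is only a sketch, using the coefficients as printed in the coefficient table; the variable names are mine.

```python
# Coefficients as printed in the Minitab coefficient table.
b0, b1 = 0.626, 0.65229

x9, y9 = 35, 21.00      # observation 9: loading and observed removal
fit9 = b0 + b1 * x9     # predicted removal at x = 35
resid9 = y9 - fit9      # residual = observed minus fitted
print(f"fit = {fit9:.2f}, residual = {resid9:.2f}")
```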

(c) The SSE is 392.0 and the MSE is $S^2 = 32.7$, so the point estimate of $\sigma$ is $S = \sqrt{32.7} = 5.718$. The Minitab output gives S = 5.715, which is the more accurate value: it comes from the unrounded MSE, 392.0/12 = 32.667, rather than from the rounded 32.7.

(d) The proportion of variation explained by the regression is R-Sq = 95.6%.
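Parts (c) and (d) are both simple arithmetic on the ANOVA table. A short sketch, assuming only the sums of squares and degrees of freedom printed there (variable names are mine):

```python
import math

# Sums of squares and error degrees of freedom from the ANOVA table.
ss_regression = 8510.9
ss_error = 392.0
ss_total = 8902.9
df_error = 12

mse = ss_error / df_error          # estimate of sigma^2
s = math.sqrt(mse)                 # estimate of sigma
r_sq = ss_regression / ss_total    # proportion of variation explained
print(f"MSE = {mse:.2f}, S = {s:.3f}, R-Sq = {100 * r_sq:.1f}%")
```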

(e) It's easy enough to delete those two observations (Obs 13 and 14, the two with the largest loadings) and rerun Minitab:

Regression Analysis


The regression equation is
removal = 2.29 + 0.564 loading

Predictor        Coef       StDev          T        P
Constant        2.289       3.166       0.72    0.486
loading        0.5645      0.1202       4.69    0.001

S = 5.584       R-Sq = 68.8%     R-Sq(adj) = 65.7%

Analysis of Variance

Source            DF          SS          MS         F        P
Regression         1      687.13      687.13     22.04    0.001
Residual Error    10      311.79       31.18
Total             11      998.92

Unusual Observations
Obs    loading    removal         Fit   StDev Fit    Residual    St Resid
 10       37.0       9.00       23.17        2.36      -14.17       -2.80R 

R denotes an observation with a large standardized residual

The intercept estimate changes from 0.63 to 2.29, and the slope estimate changes from 0.652 to 0.564. The $r^2$ went down from $95.6\%$ to $68.8\%$, which is a substantial drop. We conclude that these two values do have a big effect on the regression equation and on the proportion of variation explained by the regression. The question comes up: do we really trust these values and the regression line fitted with them included? That has to be decided by the scientist.
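The before-and-after comparison can also be sketched in Python. The assumption here is that the two deleted observations are the last two rows of the Obs table (loadings 103 and 142), which is consistent with the Total DF of 11 and Total SS of 998.92 in the rerun. The helper `fit_line` is my own name.

```python
# Data transcribed from the Obs table in the first Minitab output.
loading = [3, 8, 10, 11, 13, 16, 27, 30, 35, 37, 38, 44, 103, 142]
removal = [4, 7, 8, 8, 10, 11, 16, 26, 21, 9, 31, 30, 75, 90]


def fit_line(x, y):
    """Return (intercept, slope, r_squared) for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    r_squared = s_xy ** 2 / (s_xx * s_yy)
    return intercept, slope, r_squared


full = fit_line(loading, removal)
# Assumption: the deleted points are the last two rows (Obs 13 and 14).
reduced = fit_line(loading[:12], removal[:12])

for label, (b0, b1, r2) in (("all 14 obs", full), ("first 12 obs", reduced)):
    print(f"{label}: intercept = {b0:.3f}, slope = {b1:.4f}, R-Sq = {100 * r2:.1f}%")
```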

Extra: Find 95% CI for the slope, and 90% PI when x = 50 and x = 200.

The 95% CI is

\begin{displaymath}
\hat{\beta}_1 \pm t_{.025,14-2} \times SE(\hat{\beta}_1)
 \; = \; 0.65229 \pm 2.179 \times 0.04041\end{displaymath}

\begin{displaymath}
\; = \; 0.65229 \pm 0.088053
 \; = \; ( 0.56424, \; 0.74034)\end{displaymath}
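The interval arithmetic is easy to verify. A minimal sketch, taking the slope estimate, its standard error, and the t critical value exactly as quoted above (variable names are mine):

```python
# Values quoted in the text: slope estimate, its standard error, and t_{.025,12}.
b1 = 0.65229
se_b1 = 0.04041
t_crit = 2.179

half_width = t_crit * se_b1
lower, upper = b1 - half_width, b1 + half_width
print(f"95% CI for the slope: ({lower:.5f}, {upper:.5f})")
```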

I had to rerun Minitab to get the prediction intervals (without a lot of calculation on my part). To get it to work, I entered a new column with the values 50 and 200, and input the column name (c3) into the ``Prediction intervals for new observations'' field in the Options dialogue box. I also changed the default confidence level from 95 to 90.

The results are:

Predicted Values

     Fit  StDev Fit         90.0% CI             90.0% PI
       *          *   (       *,       *)  (       *,       *)   
       *          *   (       *,       *)  (       *,       *)   
       *          *   (       *,       *)  (       *,       *)   
       *          *   (       *,       *)  (       *,       *)   
   33.24       1.62   (   30.36,   36.12)  (   22.66,   43.83)   
  131.08       6.76   (  119.03,  143.14)  (  115.30,  146.87) XX
X  denotes a row with X values away from the center
XX denotes a row with very extreme X values

I don't know why Minitab printed out the first 4 lines; I left them in anyway. The estimated value at x = 50 is $\hat{y}$ = 33.24 and a 90% PI is (22.66, 43.83). The corresponding point estimate at x = 200 is $\hat{y}$ = 131.08 and the 90% PI is (115.30, 146.87). Note that Minitab flags this last value as being very far from the center of the data (the largest observed loading is 142), suggesting that the prediction is not to be relied on unless we are willing to believe in the linear regression model at large distances from where we have observations.
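For anyone who does want the "lot of calculation", here is a sketch of the standard prediction interval formula for simple linear regression, $\hat{y} \pm t \sqrt{S^2 + SE(\hat{y})^2}$, applied to the data from the Obs table. The critical value $t_{.05,12} = 1.782$ is taken from a t table and is an assumption on my part, since it is not printed in the Minitab output; the function name `prediction_interval` is also mine.

```python
import math

# Data transcribed from the Obs table in the first Minitab output.
loading = [3, 8, 10, 11, 13, 16, 27, 30, 35, 37, 38, 44, 103, 142]
removal = [4, 7, 8, 8, 10, 11, 16, 26, 21, 9, 31, 30, 75, 90]

# Assumption: t_{.05,12} from a t table (not shown in the Minitab output).
T_CRIT_90 = 1.782


def prediction_interval(x, y, x_new, t_crit):
    """Return (fit, lower, upper) for a new observation at x_new."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # Mean squared error on n - 2 degrees of freedom.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    fit = b0 + b1 * x_new
    # Variance of the fitted mean at x_new (the square of "StDev Fit").
    var_fit = mse * (1 / n + (x_new - x_bar) ** 2 / s_xx)
    half_width = t_crit * math.sqrt(mse + var_fit)
    return fit, fit - half_width, fit + half_width


pi_50 = prediction_interval(loading, removal, 50, T_CRIT_90)
pi_200 = prediction_interval(loading, removal, 200, T_CRIT_90)
for x_new, (fit, lo, hi) in ((50, pi_50), (200, pi_200)):
    print(f"x = {x_new}: fit = {fit:.2f}, 90% PI = ({lo:.2f}, {hi:.2f})")
```

The results should match the last two rows of the Predicted Values output up to rounding in the critical value.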


Dennis Cox
5/3/2001