Relation of Covariance and Correlation to (Linear) Prediction.

Suppose (X,Y) are jointly distributed random variables and we wish to ``predict'' Y from X. This situation arises, for instance, when we will observe X but not Y. As an example, X may be years of education and Y may be annual income. It is typically easy to get data on X and somewhat harder to get data on Y, so we may wish to predict income from education. Our predictor of Y given X will of course be some function of X, say $f(X)$. We will measure the accuracy of prediction by the so-called Mean Squared Prediction Error (MSPE):

\[ \mathrm{MSPE}(f) \;=\; E\left[ \bigl( Y - f(X) \bigr)^{2} \right]. \]

One can in fact find the best predictor (in the sense of minimizing MSPE) over all predictors, i.e., over all functions of X, namely

\[ f^{*}(x) \;=\; E[Y \mid X = x] \;=\; \int y \, f_{Y \mid X}(y \mid x) \, dy, \]

where $f_{Y \mid X}(y \mid x)$ is the conditional density

\[ f_{Y \mid X}(y \mid x) \;=\; \frac{f_{X,Y}(x,y)}{f_{X}(x)}. \]
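
As a quick numerical sanity check (a minimal simulation sketch; the bivariate normal setup and all numbers are chosen purely for illustration): if (X,Y) is standard bivariate normal with correlation $\rho$, the best predictor is $E[Y \mid X] = \rho X$, and Monte Carlo estimates of the MSPE show it beating other candidates:

    import numpy as np

    rng = np.random.default_rng(0)

    # Standard bivariate normal with correlation rho: E[Y|X] = rho * X.
    rho = 0.6
    n = 100_000
    x = rng.standard_normal(n)
    y = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)

    def mspe(pred):
        # Monte Carlo estimate of E[(Y - f(X))^2] for a predictor f evaluated on the sample.
        return np.mean((y - pred) ** 2)

    print(mspe(rho * x))      # best predictor E[Y|X]: approx 1 - rho^2 = 0.64
    print(mspe(0.2 * x))      # a suboptimal linear predictor: approx 0.80
    print(mspe(np.zeros(n)))  # the constant predictor 0: approx Var(Y) = 1.0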

However, it is sometimes desirable to restrict the class of predictors to so-called linear predictors, which are predictors of the form

\[ f(X) \;=\; aX + b, \]

where a and b are constants. It is easy enough to find the best linear predictor. For convenience, assume for now

\[ E[X] = 0, \qquad E[Y] = 0. \tag{10} \]

We will show how to deal with nonzero means in a moment.

\begin{eqnarray*}
\mathrm{MSPE}(a,b) &=& E\left[ (Y - aX - b)^{2} \right] \\
&=& E[Y^2] - 2a\,E[XY] - 2b\,E[Y] + a^2\,E[X^2] + 2ab\,E[X] + b^2 \\
&=& E[Y^2] - 2a\,E[XY] + a^2\,E[X^2] + b^2,
\end{eqnarray*}

where the last step uses (10).

Taking (partial) derivatives and setting equal to 0 gives

\begin{eqnarray*}
\frac{\partial}{\partial a}\,\mathrm{MSPE}(a,b) &=& -2\,E[XY] + 2a\,E[X^2] \;=\; 0, \\
\frac{\partial}{\partial b}\,\mathrm{MSPE}(a,b) &=& 2b \;=\; 0,
\end{eqnarray*}

which gives

\begin{eqnarray*}
a &=& \frac{E[XY]}{E[X^2]} \;=\; \frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(X)}, \\
b &=& 0,
\end{eqnarray*}

since under (10) we have $E[XY] = \mathrm{Cov}(X,Y)$ and $E[X^2] = \mathrm{Var}(X)$.
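
One can check that this indeed gives a minimum: the matrix of second derivatives,

\[
\begin{pmatrix}
\partial^2 \mathrm{MSPE}/\partial a^2 & \partial^2 \mathrm{MSPE}/\partial a\,\partial b \\
\partial^2 \mathrm{MSPE}/\partial a\,\partial b & \partial^2 \mathrm{MSPE}/\partial b^2
\end{pmatrix}
\;=\;
\begin{pmatrix}
2\,E[X^2] & 2\,E[X] \\
2\,E[X] & 2
\end{pmatrix}
\;=\;
\begin{pmatrix}
2\,E[X^2] & 0 \\
0 & 2
\end{pmatrix},
\]

is positive definite whenever $\mathrm{Var}(X) > 0$ (the off-diagonal entries vanish by (10)), so the MSPE is a convex quadratic in $(a,b)$ and the critical point is the global minimum.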

If (10) doesn't hold, then replace X by $\tilde{X} = X - E[X]$ and similarly replace Y by $\tilde{Y} = Y - E[Y]$. The previous result then applies to $\tilde{X}$ and $\tilde{Y}$ and says that the best linear predictor of $\tilde{Y}$ given $\tilde{X}$ is

\[ \tilde{Y}^{*} \;=\; \frac{\mathrm{Cov}(\tilde{X},\tilde{Y})}{\mathrm{Var}(\tilde{X})}\,\tilde{X} \;=\; \frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(X)}\,\bigl( X - E[X] \bigr), \]

since centering changes neither the covariance nor the variance.

To get the best linear predictor of $Y = \tilde{Y} + E[Y]$, we simply add $E[Y]$:

\[ Y^{*} \;=\; E[Y] \;+\; \frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(X)}\,\bigl( X - E[X] \bigr). \tag{11} \]

Note that the corresponding optimal coefficients are

\[ a^{*} \;=\; \frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(X)}, \qquad b^{*} \;=\; E[Y] - a^{*}\,E[X]. \tag{12} \]
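
For a concrete feel for (12), here is a toy calculation with made-up numbers (purely illustrative): suppose education $X$ has $E[X] = 12$ years and $\mathrm{Var}(X) = 9$, income $Y$ has $E[Y] = 40$ (thousands of dollars per year), and $\mathrm{Cov}(X,Y) = 18$. Then

\[ a^{*} \;=\; \frac{18}{9} \;=\; 2, \qquad b^{*} \;=\; 40 - 2 \cdot 12 \;=\; 16, \]

so the best linear predictor is $Y^{*} = 16 + 2X$: each additional year of education adds 2 (thousand dollars) to the predicted annual income.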

When one has data $(x_1,y_1), (x_2,y_2), \ldots, (x_n,y_n)$, it is common to consider the linear function of x which best predicts y in the data set. This leads to the so-called least squares regression line, which is defined by the slope a and intercept b which minimize the Residual Sum of Squares (abbreviated RSS)

\[ \mathrm{RSS}(a,b) \;=\; \sum_{i=1}^{n} \bigl( y_i - a x_i - b \bigr)^{2}. \]

It is easy enough to derive the least squares estimates of a and b:

\[ \hat{a} \;=\; \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^{2}}, \qquad \hat{b} \;=\; \bar{y} - \hat{a}\,\bar{x}, \tag{13} \]

where $\bar{x}$ and $\bar{y}$ denote the sample means.

Note the similarity with (12). In fact, the least squares line is perhaps best thought of as a sample-based estimate of the best linear predictor. Thus, for example, if we wish to estimate the best linear predictor of income given education, we simply do least squares regression with a sample of (education, income) data.
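
To see this numerically, here is a minimal simulation sketch (the model and all numbers are invented for illustration, reusing the toy coefficients $a^{*} = 2$ and $b^{*} = 16$ from the example above): with a large sample, the least squares estimates (13) should land close to the population coefficients (12).

    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical population: X = years of education, Y = annual income (thousands),
    # constructed so that the best linear predictor is Y* = 16 + 2 X.
    n = 5_000
    x = rng.normal(loc=12.0, scale=3.0, size=n)          # E[X] = 12, Var(X) = 9
    y = 16.0 + 2.0 * x + rng.normal(scale=5.0, size=n)   # E[Y | X] = 16 + 2 X, plus noise

    # Least squares estimates from (13).
    xbar, ybar = x.mean(), y.mean()
    a_hat = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    b_hat = ybar - a_hat * xbar

    print(a_hat, b_hat)  # should be close to the population values a* = 2, b* = 16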



Dennis Cox
Tue Jan 21 09:20:27 CST 1997