Suppose X is a r.v. (random variable) with density
$f(x)$. If its distribution is discrete, then
the density is also known as the probability mass
function and in fact gives the ``point'' probabilities:
$$P[X = x] = f(x).$$
Probabilities of sets with multiple points are obtained by summing:
$$P[X \in A] = \sum_{x \in A} f(x).$$
If the distribution of X is continuous, then all point probabilities are 0 and the density gives probabilities of intervals through integration:
$$P[a \le X \le b] = \int_a^b f(x)\,dx.$$
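These two uses of a density can be illustrated with a short sketch. The particular distributions here (a fair six-sided die for the discrete case and a standard exponential for the continuous case) are hypothetical examples, not taken from the text, and the integral is approximated numerically with the midpoint rule:

```python
from math import exp

# Discrete case: pmf f(x) = 1/6 for x in {1, ..., 6} (fair die, hypothetical example)
f = {x: 1 / 6 for x in range(1, 7)}

# Point probability P[X = 3] is read directly off the pmf
p_point = f[3]

# Set probability by summing: P[X in {2, 4, 6}]
p_even = sum(f[x] for x in {2, 4, 6})

# Continuous case: f(x) = e^{-x} for x >= 0 (standard exponential, hypothetical)
def f_cont(x):
    return exp(-x)

# P[1 <= X <= 2] via numerical integration (midpoint rule)
a, b, n = 1.0, 2.0, 100_000
h = (b - a) / n
p_interval = sum(f_cont(a + (i + 0.5) * h) for i in range(n)) * h

print(p_point, p_even, p_interval)  # ~0.1667, 0.5, ~e^{-1} - e^{-2} ≈ 0.2325
```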
The mathematical expectation of a function of a r.v. is defined to be
$$E[h(X)] = \sum_x h(x) f(x) \quad \text{(discrete case)}, \qquad
E[h(X)] = \int h(x) f(x)\,dx \quad \text{(continuous case)}.$$
The formulae displayed here illustrate the
general principle: summations for discrete r.v.'s
and integration for continuous r.v.'s. Also,
whenever the limits of integration or summation
are not shown, it is assumed that they are over
the entire ``space,'' which effectively means
all values of x where $f(x) > 0$.
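In the discrete case the definition of E[h(X)] is just a weighted sum over the support, which can be written in a few lines. The die pmf below is the same hypothetical example as before, and `expectation` is an illustrative helper name, not from the text:

```python
# pmf of a fair six-sided die (hypothetical example)
f = {x: 1 / 6 for x in range(1, 7)}

def expectation(h, f):
    """Sum h(x) * f(x) over all x in the support, i.e. where f(x) > 0."""
    return sum(h(x) * p for x, p in f.items())

mean = expectation(lambda x: x, f)       # E[X] = 3.5
second = expectation(lambda x: x**2, f)  # E[X^2] = 91/6
```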
Technical Note:
To define E[h(X)], it is usually required that either
(i) $h(x) \ge 0$, so that the summands or integrand is never
negative, and then E[h(X)] may possibly be
$+\infty$, or
(ii) $E[|h(X)|] < \infty$, which means effectively
that the summation or integration converges absolutely.
We will generally not bother ourselves with such details.
We always assume that the integral or summation satisfies
whatever mathematical properties are needed for things to
make sense. There are very few practical situations where
problems of infinite expectation arise.
The connection between mathematical expectation and data
comes through the notion of long run averages:
If
$$x_1, x_2, \ldots, x_n$$
is a sample of realized
values of the r.v. X, then as $n \to \infty$
the sample mean
$$\frac{1}{n} \sum_{i=1}^{n} h(x_i)$$
tends to
E[h(X)]. The precise mathematical formulation of
this ``principle'' is the Law of Large Numbers, which
makes certain assumptions on how the sample is generated
(e.g. that the sample consists of realized values of independent
and identically distributed (abbreviated i.i.d.) random
variables with the same distribution as X).
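This long-run-average principle is easy to see in a simulation. A sketch, assuming i.i.d. uniform(0,1) draws and $h(x) = x^2$, so that $E[h(X)] = \int_0^1 x^2\,dx = 1/3$:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def running_mean(h, n):
    """Average h over n i.i.d. uniform(0,1) draws."""
    total = 0.0
    for _ in range(n):
        total += h(random.random())
    return total / n

# As n grows, the sample mean settles near E[h(X)] = 1/3
for n in (100, 10_000, 1_000_000):
    print(n, running_mean(lambda x: x * x, n))
```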
Of course, there are certain mathematical expectations which are of most interest, namely the mean and variance of the r.v.:
$$\mu = E[X], \qquad \sigma^2 = \mathrm{Var}(X) = E[(X - \mu)^2].$$
Some useful properties of these mathematical ``operators'' are summarized in the next proposition. Note that if X and Y are r.v.'s, then so are h(X) and g(X,Y) for any appropriately defined real valued functions h(x) and g(x,y).

Proposition 1.
(i) $E[aX + b] = aE[X] + b$ for any constants a and b.
(ii) If $P[X \ge 0] = 1$, then $E[X] \ge 0$.
(iii) If $P[X \ge 0] = 1$ and $E[X] = 0$, then $P[X = 0] = 1$.
Proof. Part (i) is proved in Hogg & Craig. For part (ii), assuming X is a continuous r.v. we have
$$E[X] = \int x f(x)\,dx \ge 0,$$
since $f(x) = 0$ for $x < 0$ (as $P[X \ge 0] = 1$), so the integrand $x f(x)$ is never negative. The discrete case is the same with a sum in place of the integral.
For (iii) we apply the version of Chebyshev's inequality that
says if X is a nonnegative r.v. and c > 0 then
$P[X \ge c] \le E[X]/c$. See Hogg & Craig. Since
E[X] = 0, it follows that $P[X \ge c] = 0$ for all
c > 0. In particular, $P[X > 0] =
\lim_{n \to \infty} P[X \ge 1/n] = 0$. Since we
know $P[X \ge 0] = 1$,
we have
$$1 = P[X \ge 0] = P[X = 0] + P[X > 0] = P[X = 0].$$
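The inequality $P[X \ge c] \le E[X]/c$ used in part (iii) can be checked numerically. A sketch, assuming X follows a standard exponential distribution (so $E[X] = 1$); this particular distribution is just an illustrative choice:

```python
import random

random.seed(1)
n = 200_000
# Simulated draws from an exponential(1) r.v., a nonnegative r.v. with E[X] = 1
xs = [random.expovariate(1.0) for _ in range(n)]

for c in (0.5, 1.0, 2.0, 4.0):
    p = sum(x >= c for x in xs) / n  # Monte Carlo estimate of P[X >= c]
    bound = 1.0 / c                  # E[X]/c with E[X] = 1
    print(c, p, bound)               # the estimate stays below the bound
```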
Proposition 2.
(i) $\mathrm{Var}(X) \ge 0$, and $\mathrm{Var}(X) = 0$ if and only if $P[X = \mu] = 1$.
(ii) $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$ for any constants a and b.
(iii) $\mathrm{Var}(X) = E[X^2] - (E[X])^2$.

Proof.
Since the r.v. $(X - \mu)^2$ is nonnegative, it follows
that
$$\mathrm{Var}(X) = E[(X - \mu)^2] \ge 0$$
by part (ii)
of Proposition 1. Continuing with the fact that
$(X - \mu)^2 \ge 0$, if
$$0 = \mathrm{Var}(X) = E[(X - \mu)^2],$$
then by part (iii) of Proposition 1 it follows that
the r.v.
$(X - \mu)^2 = 0$ with probability 1, i.e.\
$X = \mu$ with probability 1. Since
$\mu$ is a constant,
this completes the proof of part (i) of Proposition 2.
For part (ii), note from part (i) of Proposition 1 that $E[aX + b] = aE[X] + b$, so
$$\mathrm{Var}(aX + b) = E\left[(aX + b - (a\mu + b))^2\right] = E\left[a^2 (X - \mu)^2\right] = a^2\,\mathrm{Var}(X).$$
Part (iii) is already proved in Hogg & Craig, right after
the definition of variance.
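The identity in part (iii) is easy to confirm by direct computation. A sketch, again using the hypothetical fair-die pmf, evaluating the variance both from its definition and from the $E[X^2] - (E[X])^2$ formula:

```python
# pmf of a fair six-sided die (hypothetical example)
f = {x: 1 / 6 for x in range(1, 7)}

mean = sum(x * p for x, p in f.items())                     # E[X] = 3.5
var_def = sum((x - mean) ** 2 * p for x, p in f.items())    # E[(X - mu)^2]
var_alt = sum(x * x * p for x, p in f.items()) - mean ** 2  # E[X^2] - mu^2

print(var_def, var_alt)  # both equal 35/12 ≈ 2.9167
```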
The corresponding ``sample'' quantities are given by
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2.$$
An ``alternative'' sample variance is sometimes considered:
$$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2.$$
The difference between the two sample variances is unimportant when n is large.
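The two divisors differ only by the factor $(n-1)/n$, which tends to 1. A sketch computing both versions on a small hypothetical data set (the numbers below are illustrative, not from the text):

```python
def sample_stats(xs):
    """Return the sample mean and both sample variances (n-1 and n divisors)."""
    n = len(xs)
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)  # sum of squared deviations
    return xbar, ss / (n - 1), ss / n

# hypothetical data set
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
xbar, s2, s2_alt = sample_stats(xs)
print(xbar, s2, s2_alt)  # 5.0, 32/7 ≈ 4.571, 4.0
```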