Some Reflections On The Foundations Of Statistics

David A. Freedman
University of California at Berkeley


This talk reviews the basis for inferring causation from regression, proceeding
by example: simple regressions, path models, simultaneous equations. After that
come nonlinear graphical models. Causal relationships cannot be inferred from a
data set by running regressions, unless there is substantial prior knowledge
about the mechanism that generates the data. Some kind of invariance assumption is
needed, and exogeneity is a further issue. Parameters need to be invariant to
interventions, a well-known condition. Invariance is also needed for (i) errors or
(ii) error distributions. Furthermore, "manipulation" theorems for graphical models
can be interpreted in purely probabilistic terms, which permits a clearer view of
the connection between the mathematical framework and causality in the world.
However, there are few successful applications, mainly because causal pathways can
seldom be excluded on a priori grounds. Invariance remains to be assessed.
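
As a rough illustration of the point about prior knowledge (not an example taken from the talk), the following Python sketch simulates two hypothetical mechanisms that produce the same joint distribution for (X, Y), and hence the same regression slope, yet respond differently when X is set by intervention. The mechanism names, coefficients, variances, sample size, and the helper functions `draw` and `ols_slope` are all assumptions made for this sketch.

```python
import numpy as np

N = 200_000  # sample size (an arbitrary choice for this sketch)


def draw(mechanism, rng, n=N, set_x=None):
    """Return (x, y) from one of two hypothetical mechanisms.

    set_x=None gives observational data; a number simulates an
    intervention that fixes X at that value for every unit.
    """
    if mechanism == "causal":
        # X acts on Y: Y = 0.5*X + error, with Var(X) = 2, Var(error) = 1.5.
        x = rng.normal(scale=np.sqrt(2.0), size=n)
        if set_x is not None:
            x = np.full(n, float(set_x))
        y = 0.5 * x + rng.normal(scale=np.sqrt(1.5), size=n)
    elif mechanism == "confounded":
        # An unobserved Z drives both X and Y; X has no effect on Y.
        z = rng.normal(size=n)
        x = z + rng.normal(size=n)
        if set_x is not None:
            x = np.full(n, float(set_x))  # the intervention overrides Z's influence on X
        y = z + rng.normal(size=n)
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return x, y


def ols_slope(x, y):
    """Least-squares slope from regressing y on x (with intercept)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


rng = np.random.default_rng(0)

# Observational data: both mechanisms give a slope near 0.5.
slope_causal = ols_slope(*draw("causal", rng))
slope_confounded = ols_slope(*draw("confounded", rng))

# Interventional data: set X = 2 for every unit and look at the mean of Y.
mean_y_causal = draw("causal", rng, set_x=2.0)[1].mean()
mean_y_confounded = draw("confounded", rng, set_x=2.0)[1].mean()

print("observational slopes:", slope_causal, slope_confounded)
print("mean of Y when X is set to 2:", mean_y_causal, mean_y_confounded)
```

In this sketch the two observational slopes agree up to sampling error, so the regression alone cannot distinguish the mechanisms. Only in the first mechanism is the slope invariant to the intervention: setting X to 2 moves the mean of Y to about 1 there, while it leaves the mean of Y near 0 in the confounded mechanism.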