The data used here were taken from http://www.zoology.ubc.ca/~schluter/bio300www/poissonnotes.html. Quoting from that web page:
The main use of the Poisson distribution is in providing us with a null hypothesis for events in space or time. Deviations from this hypothesis tell us something about real processes in nature. For example, consider the distribution of extinction events over intervals of time across millions of years of fossil history. The best record of extinctions through earth's history come from fossil marine invertebrates, because they have shells and preserve well. Below I list the number of recorded extinctions of marine invertebrate families (a high-level taxonomic category) in 76 intervals of time, ...
The question of interest is whether the pattern of extinction events through the fossil record is "random" in time, or whether instead extinctions tend to be clumped and occur in bursts ("mass extinctions"). Alternatively, extinctions may occur in a dispersed pattern. The easiest way to test this is to compare the frequency distribution of extinctions to that expected from a Poisson distribution using a chi-square goodness of fit test. Our hypotheses are:
Ho: The number of extinctions per time interval has a Poisson distribution.
Ha: The number of extinctions per time interval does NOT have a Poisson distribution.
In Table 1 we show the data. Because several of the categories have expected frequency less than 5, we grouped the first two categories (corresponding to time periods with 0 and 1 extinction events) and the last 13 categories (corresponding to time periods with 8 or more extinction events). The grouped data are shown in Table 2.
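The grouping step can be sketched in Python as follows. This is only an illustration, not the procedure actually used to build Table 2: the function names are made up for this example, and the mean passed in (4.0) is a hypothetical placeholder rather than the estimate from the data. The 21 categories (0 through 20 events) follow from the description above, where "8 or more" spans the last 13 categories.

```python
import math

def poisson_pmf(k, mean):
    """P(X = k) for a Poisson variable with the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def expected_counts(mean, n_intervals, max_k):
    """Expected number of intervals with 0, 1, ..., max_k events.

    The final entry absorbs the whole upper tail (max_k or more events),
    so the expected counts sum to n_intervals.
    """
    probs = [poisson_pmf(k, mean) for k in range(max_k)]
    probs.append(1.0 - sum(probs))  # P(X >= max_k)
    return [n_intervals * p for p in probs]

def pool(values, n_low, n_high):
    """Merge the first n_low and the last n_high entries into single cells."""
    cut = len(values) - n_high
    return [sum(values[:n_low])] + list(values[n_low:cut]) + [sum(values[cut:])]

# Hypothetical mean, used only to make the sketch runnable.
hypothetical_mean = 4.0
expected = expected_counts(hypothetical_mean, n_intervals=76, max_k=20)

# Pool categories 0-1 and 8-20, as described in the text.
grouped = pool(expected, n_low=2, n_high=13)
print(len(grouped))  # 8 grouped categories
```

The same `pool` call would be applied to the observed counts so that observed and expected frequencies stay aligned cell by cell.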
To test the goodness of fit of the Poisson model (under which extinction events occur completely at random), we need to estimate its single parameter, the mean number of events per time interval. The maximum likelihood estimate (MLE) is the sample mean, and the expected frequencies were computed with this value. With the grouped data, we computed a chi-square value of 23.94996. The number of degrees of freedom is k - p - 1, where k is the number of categories and p is the number of parameters estimated; with the 8 grouped categories and 1 estimated parameter, this gives 6 degrees of freedom. Since our observed value exceeds the critical value, we conclude that the observed data do differ significantly from the Poisson distribution. Examining the Z-scores in Table 2, we note that two values fall outside the range ±1.960: there are "too many" time intervals with 0 to 1 extinction events and "too few" time intervals with 4 events. There is also a "suggestion" of "too many" intervals with 8 or more events, although we would not consider that Z-score significant.
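The arithmetic of the test can be sketched with the Python standard library alone. The function names are illustrative, and the 5% significance level is an assumption inferred from the ±1.960 range used for the Z-scores. The tail probability uses a closed form that holds only for an even number of degrees of freedom, which happens to cover this case (8 grouped categories, 1 estimated parameter).

```python
import math

def chi_square_statistic(observed, expected):
    """Pearson chi-square: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def degrees_of_freedom(n_categories, n_estimated_params):
    """Categories minus estimated parameters minus one."""
    return n_categories - n_estimated_params - 1

def chi_square_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable.

    Valid only for even df, where the tail equals
    exp(-x/2) * sum_{i < df/2} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

# 8 grouped categories and 1 estimated parameter (the mean).
df = degrees_of_freedom(8, 1)

# Statistic reported in the text.
p_value = chi_square_sf_even_df(23.94996, df)

alpha = 0.05  # assumed significance level
print(df, p_value < alpha)
```

For an odd number of degrees of freedom the closed form does not apply, and a general incomplete-gamma routine or a statistics library would be needed instead.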
The results are rather intriguing. Those for 0 to 1 and for 8 or more events suggest that extinction events are "clumping together," but the result for periods with 4 events is perplexing.
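A cell-by-cell reading like the one above can be reproduced with standardized residuals. The sketch below uses Pearson residuals, (O - E) / sqrt(E), which is one common definition; whether the Z-scores in Table 2 were computed exactly this way is an assumption. The counts are hypothetical and serve only to show the mechanics.

```python
import math

def pearson_residuals(observed, expected):
    """Pearson residual (O - E) / sqrt(E) for each category."""
    return [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]

def flag_cells(residuals, threshold=1.960):
    """Indices of categories whose residual lies outside +/- threshold."""
    return [i for i, z in enumerate(residuals) if abs(z) > threshold]

# Hypothetical observed and expected counts, not the extinction data.
observed = [20, 10, 5]
expected = [10.0, 10.0, 15.0]

residuals = pearson_residuals(observed, expected)
flagged = flag_cells(residuals)
print(flagged)
```

A positive residual marks a category with more intervals than the Poisson model predicts, a negative one marks fewer, and the squared residuals sum to the chi-square statistic.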