boxplots
normal probability plots
It is important to observe that none of these procedures is automatic, and in many real situations deciding one way or the other is very much a subjective judgment. This lab is also designed to familiarize the student with the basic descriptive statistics associated with the correlation between two variables. Both summary statistics and graphical methods will be explored. The methods shown here can be used not only in two-variable applications but also in other types of problems.
LET NORMAL=ZRN
LET HEAVY=TRN(2)
LET LIGHT=URN
LET ASYMETR=XRN(1)
IF CASE<21 THEN LET SAMPLE=1
IF CASE>20 AND CASE<41 THEN LET SAMPLE=2
IF CASE>40 AND CASE<61 THEN LET SAMPLE=3
IF CASE>60 AND CASE<81 THEN LET SAMPLE=4
IF CASE>80 AND CASE<101 THEN LET SAMPLE=5
These commands fill the NORMAL column with generated data from
a N(0,1) distribution, the HEAVY column with data from a
t-distribution with 2 df, the LIGHT column with data from
a U(0,1) distribution, and the ASYMETR column with data
from a chi-square distribution with 1 df. You do not need to worry about
what these different distributions are for now; you will learn about
them later in the semester.
For this exercise, we want 5 groups, each with a sample size of 20. The first four commands above generate samples of size 100. The second set of commands (those involving CASE) creates a variable called SAMPLE whose values 1 through 5 identify the 5 groups. Thus, we have created 5 samples of 20 observations each for each of the 4 situations: NORMAL, HEAVY, LIGHT, and ASYMETR.
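For students who want to see the same setup outside Systat, the following is a minimal Python sketch, not the lab's own procedure. It assumes that XRN(1) draws from a chi-square distribution with 1 df; the names make_columns and t_variate and the fixed seed are hypothetical choices made only for illustration.

```python
import math
import random

random.seed(1)  # arbitrary fixed seed so the sketch is reproducible

N_CASES = 100     # total cases, as in the lab
GROUP_SIZE = 20   # cases per sample


def t_variate(df):
    """Draw one value from a t-distribution with df degrees of freedom."""
    z = random.gauss(0.0, 1.0)
    chi2 = random.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
    return z / math.sqrt(chi2 / df)


def make_columns(n=N_CASES):
    """Build the four data columns plus the SAMPLE grouping variable."""
    columns = {
        "NORMAL": [random.gauss(0.0, 1.0) for _ in range(n)],
        "HEAVY": [t_variate(2) for _ in range(n)],
        "LIGHT": [random.random() for _ in range(n)],
        # Assumption: XRN(1) is chi-square with 1 df, i.e. a squared N(0,1).
        "ASYMETR": [random.gauss(0.0, 1.0) ** 2 for _ in range(n)],
    }
    # CASE runs 1..n; cases 1-20 get SAMPLE 1, cases 21-40 get SAMPLE 2, and so on.
    columns["SAMPLE"] = [(case - 1) // GROUP_SIZE + 1 for case in range(1, n + 1)]
    return columns


data = make_columns()
```

The integer division on CASE does the same job as the five IF statements: each block of 20 consecutive cases receives one SAMPLE value.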
The fast way to make the boxplots is to go to the command window and type:
GRAPH
BOX NORMAL*SAMPLE/NSORT
BOX HEAVY*SAMPLE/NSORT
BOX LIGHT*SAMPLE/NSORT
BOX ASYMETR*SAMPLE/NSORT
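Each boxplot is essentially a picture of a five-number summary (minimum, lower quartile, median, upper quartile, maximum) for one sample. The Python sketch below computes those summaries by group, as a rough numerical companion to the plots. It is an illustration only: quartile conventions differ between packages, so the values need not match the hinges Systat draws, and the function names are hypothetical.

```python
import random
import statistics
from collections import defaultdict


def five_number_summary(values):
    """Return (min, lower quartile, median, upper quartile, max)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


def summarize_by_sample(values, sample_ids):
    """Group the values by sample id and summarize each group."""
    groups = defaultdict(list)
    for value, sample_id in zip(values, sample_ids):
        groups[sample_id].append(value)
    return {sid: five_number_summary(vals) for sid, vals in sorted(groups.items())}


random.seed(2)  # arbitrary seed for a reproducible illustration
normal = [random.gauss(0.0, 1.0) for _ in range(100)]
sample = [i // 20 + 1 for i in range(100)]  # 5 groups of 20 cases

for sid, summary in summarize_by_sample(normal, sample).items():
    print(sid, [round(x, 2) for x in summary])
```

Comparing the five rows gives a sense of how much boxplots of 20 observations can vary even when every sample comes from the same distribution.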
DATA: In 1798, H. Cavendish set out to determine the density of the earth relative to that of water, using a torsion balance. He conducted 29 very precise experiments assessing this relative density. We begin with the working assumption that his measurements were drawn from a Normal distribution centered on the true relative density. The data from all 29 experiments are given below (taken from Stigler, S., "Do Robust Estimators Work With Real Data?", Annals of Statistics, 5, 1977).
5.50 | 5.55 | 5.57 | 5.34 | 5.42 | 5.30 |
5.61 | 5.36 | 5.53 | 5.79 | 5.47 | 5.75 |
4.88 | 5.29 | 5.62 | 5.10 | 5.63 | 5.68 |
5.07 | 5.58 | 5.29 | 5.27 | 5.34 | 5.85 |
5.26 | 5.65 | 5.44 | 5.39 | 5.46 |
LET STDCAV=(CAVEND-5.4479)/0.2209
This creates a standardized version of CAVEND in a new
column called STDCAV, by subtracting the sample mean (5.4479)
and dividing by the sample standard deviation (0.2209).
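As a check on the constants in the LET command, the standardization can be sketched in Python from the 29 values listed above. This is an illustration, not part of the Systat session; it assumes the sample standard deviation (n - 1 divisor) is the one intended, and the computed mean and standard deviation should agree with 5.4479 and 0.2209 up to rounding.

```python
import statistics

# Cavendish's 29 relative-density measurements, as listed in the lab.
CAVEND = [
    5.50, 5.55, 5.57, 5.34, 5.42, 5.30,
    5.61, 5.36, 5.53, 5.79, 5.47, 5.75,
    4.88, 5.29, 5.62, 5.10, 5.63, 5.68,
    5.07, 5.58, 5.29, 5.27, 5.34, 5.85,
    5.26, 5.65, 5.44, 5.39, 5.46,
]

mean = statistics.mean(CAVEND)
sd = statistics.stdev(CAVEND)  # sample standard deviation (n - 1 divisor)

# Standardize: subtract the mean, then divide by the standard deviation.
STDCAV = [(x - mean) / sd for x in CAVEND]

print(round(mean, 4), round(sd, 4))
```

After standardizing, STDCAV has mean 0 and standard deviation 1, which puts it on the same scale as generated N(0,1) data.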
Also, we want to generate several data sets from a standard normal distribution with the same sample size as the Cavendish data (i.e., 29). In the command window, type:
LET Z1=ZRN
LET Z2=ZRN
.
.
.
LET Z10=ZRN
This creates 10 new columns, each filled with 29 generated N(0,1) values.
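The same idea can be sketched in Python, together with the coordinates a normal probability plot would display for one of the generated columns. This is a hedged illustration: the plotting positions (i - 0.5)/n are one common convention and are an assumption here, since Systat may use a different one, and the name probability_plot_points is hypothetical.

```python
import random
from statistics import NormalDist

random.seed(3)  # arbitrary seed for a reproducible illustration

N = 29        # same sample size as the Cavendish data
N_SETS = 10   # columns Z1 ... Z10

z_columns = {
    f"Z{k}": [random.gauss(0.0, 1.0) for _ in range(N)]
    for k in range(1, N_SETS + 1)
}


def probability_plot_points(values):
    """Pair each sorted observation with its expected N(0,1) quantile."""
    n = len(values)
    std_normal = NormalDist()
    # Assumption: plotting positions (i - 0.5) / n; other conventions exist.
    expected = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, sorted(values)))


points = probability_plot_points(z_columns["Z1"])
```

Plotting the pairs in points for each of the 10 columns shows how far from a straight line truly normal samples of size 29 can stray, which gives a reference for judging a plot of real data.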
DATA: The third data set in this lab is found in the Systat data file ROTATE. These data measure the angle of rotation in degrees, ANGLE, and the reaction time in seconds, RT, for a perception study. The experiment measured the time it took subjects to make the same judgements when comparing a picture of a three-dimensional object to a picture of possible rotations of the object.