
Lab 2: Normality Diagnostics and Regression-Correlation Descriptive Statistics

1.
Open a new Systat file and name its columns NORMAL, HEAVY, LIGHT, and ASYMETR; the names abbreviate normal, heavy-tailed, light-tailed, and asymmetric. Use the Fill Worksheet option under the Data menu to fill the worksheet to 100 rows. Then go to the command window and type:



LET NORMAL=ZRN

LET HEAVY=TRN(2)

LET LIGHT=URN

LET ASYMETR=XRN(1)



IF CASE<21 THEN LET SAMPLE=1

IF CASE>20 AND CASE<41 THEN LET SAMPLE=2

IF CASE>40 AND CASE<61 THEN LET SAMPLE=3

IF CASE>60 AND CASE<81 THEN LET SAMPLE=4

IF CASE>80 AND CASE<101 THEN LET SAMPLE=5



The four LET commands fill the NORMAL column with data generated from a N(0,1) distribution, the HEAVY column with data from a t-distribution with 2 degrees of freedom (df), the LIGHT column with data from a U(0,1) distribution, and the ASYMETR column with data from a $\chi^2(1)$ (chi-square with 1 df) distribution. You do not need to worry about what these distributions are yet; you will learn about them later in the semester.
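For readers who want to experiment outside Systat, here is a minimal Python sketch that draws the same four kinds of data. It is an illustration only. It uses Python's standard `random` module rather than Systat's ZRN, TRN, URN, and XRN generators, so the numbers will not match a Systat session. The function name `simulate_columns` is hypothetical. The t and chi-square draws are built from standard normals.

```python
import math
import random


def simulate_columns(n=100, seed=None):
    """Draw n values for each lab column (a stand-in for ZRN, TRN(2), URN, XRN(1))."""
    rng = random.Random(seed)

    def z():
        # one standard normal N(0,1) draw
        return rng.gauss(0.0, 1.0)

    normal = [z() for _ in range(n)]
    # t with 2 df: Z / sqrt(V / 2), where V = Z1^2 + Z2^2 is chi-square with 2 df
    heavy = [z() / math.sqrt((z() ** 2 + z() ** 2) / 2.0) for _ in range(n)]
    # U(0,1): random() returns values in [0.0, 1.0)
    light = [rng.random() for _ in range(n)]
    # chi-square with 1 df is the square of one standard normal draw
    asymetr = [z() ** 2 for _ in range(n)]
    return {"NORMAL": normal, "HEAVY": heavy, "LIGHT": light, "ASYMETR": asymetr}


columns = simulate_columns(seed=1)
for name, values in columns.items():
    print(name, round(min(values), 3), round(max(values), 3))
```

Comparing the printed minima and maxima across columns already hints at the tail behaviour. HEAVY usually shows the most extreme values, and LIGHT never leaves the interval from 0 to 1.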

For this exercise, we want 5 groups, each with a sample size of 20. The first four commands above generate 100 observations for each distribution. The IF ... THEN commands then create a variable called SAMPLE that takes the values 1 through 5. Cases 1-20, 21-40, 41-60, 61-80, and 81-100 form the five groups. Thus we have 5 samples of 20 observations each for each of the 4 situations: NORMAL, HEAVY, LIGHT, and ASYMETR.
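The chain of IF CASE ... THEN LET SAMPLE=k commands is equivalent to integer division on the case number. The Python sketch below is illustrative only. The helper names `sample_label` and `split_by_sample` are hypothetical and are not Systat functions. The sketch assigns group labels and then splits a column by group.

```python
def sample_label(case, group_size=20):
    """Map a 1-based CASE number to its SAMPLE group, as the IF CASE ... commands do."""
    # cases 1-20 -> 1, 21-40 -> 2, 41-60 -> 3, 61-80 -> 4, 81-100 -> 5
    return (case - 1) // group_size + 1


def split_by_sample(values, labels):
    """Collect the values of one column into a dict keyed by SAMPLE group."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return groups


labels = [sample_label(case) for case in range(1, 101)]
groups = split_by_sample(list(range(1, 101)), labels)
print({k: (v[0], v[-1], len(v)) for k, v in groups.items()})
# {1: (1, 20, 20), 2: (21, 40, 20), 3: (41, 60, 20), 4: (61, 80, 20), 5: (81, 100, 20)}
```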

2.
Now we want to try out our normality diagnostics on some real data. The first data set we will consider is the Cavendish data.

DATA: In 1798, H. Cavendish set out to determine the density of the earth relative to that of water, using a torsion balance. He conducted 29 very precise experiments measuring this relative density. We begin with the working assumption that his measurements were drawn from a Normal distribution centered on the true relative density. The data from all 29 experiments are given below (taken from Stigler, S., "Do Robust Estimators Deal with Real Data?", Annals of Statistics, 5, 1977).

5.50 5.55 5.57 5.34 5.42 5.30
5.61 5.36 5.53 5.79 5.47 5.75
4.88 5.29 5.62 5.10 5.63 5.68
5.07 5.58 5.29 5.27 5.34 5.85
5.26 5.65 5.44 5.39 5.46  
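One common normality diagnostic is the normal probability (Q-Q) plot, which plots the sorted data against normal scores. A roughly straight line is consistent with normality, and the correlation between the two coordinates summarizes how straight the line is. The Python sketch below computes these quantities for the Cavendish data. It relies on the following assumptions:

* It is not what Systat does internally.
* Blom's plotting positions $(i - 3/8)/(n + 1/4)$ are one common choice among several.
* The helper names are hypothetical.

```python
from statistics import NormalDist, mean, stdev

# The 29 Cavendish measurements, in the row order of the table above
CAVENDISH = [
    5.50, 5.55, 5.57, 5.34, 5.42, 5.30,
    5.61, 5.36, 5.53, 5.79, 5.47, 5.75,
    4.88, 5.29, 5.62, 5.10, 5.63, 5.68,
    5.07, 5.58, 5.29, 5.27, 5.34, 5.85,
    5.26, 5.65, 5.44, 5.39, 5.46,
]


def normal_scores(n):
    """Approximate expected standard normal order statistics (Blom's positions)."""
    std = NormalDist()
    return [std.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def pearson(x, y):
    """Sample (Pearson) correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def probability_plot(data):
    """Return (normal scores, sorted data, their correlation) for a normal probability plot."""
    ordered = sorted(data)
    scores = normal_scores(len(ordered))
    return scores, ordered, pearson(scores, ordered)


scores, ordered, r = probability_plot(CAVENDISH)
print(f"n={len(CAVENDISH)} mean={mean(CAVENDISH):.3f} sd={stdev(CAVENDISH):.3f} r={r:.3f}")
```

Plotting `ordered` against `scores` with any graphics tool gives the probability plot itself. The correlation `r` is only a one-number summary and does not replace looking at the plot.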

3.
The goal of descriptive statistics in a regression-correlation setting is to determine whether, and to what extent, a linear relationship exists between two variables.
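For reference, the two usual descriptive summaries are the sample correlation coefficient $r$ and the least-squares line $\hat{y} = b_0 + b_1 x$. The standard textbook formulas for $n$ pairs $(x_i, y_i)$ are sketched below. The shorthand $S_{xx}$, $S_{yy}$, $S_{xy}$ is introduced here only for brevity.

```latex
\begin{aligned}
S_{xx} &= \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad
S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \\
r &= \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}, \qquad
b_1 = \frac{S_{xy}}{S_{xx}} = r \, \frac{s_y}{s_x}, \qquad
b_0 = \bar{y} - b_1 \bar{x}.
\end{aligned}
```

Here $s_x$ and $s_y$ are the sample standard deviations. The coefficient $r$ always lies between $-1$ and $1$, and its sign matches that of the slope $b_1$. $r^2$ is the fraction of the variation in $y$ accounted for by the fitted line.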

DATA: The third data set in this lab is in the Systat data file ROTATE. For a perception study, it records the angle of rotation in degrees (ANGLE) and the reaction time in seconds (RT). The experiment measured how long subjects took to make a "same" judgement when comparing a picture of a three-dimensional object with a picture of a possible rotation of that object.
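The ROTATE values are not reproduced in this handout. The Python sketch below therefore takes the two columns as lists, for example after exporting ANGLE and RT from Systat, and returns the correlation, slope, and intercept of the least-squares line of RT on ANGLE. The function name `describe_linear` is hypothetical. The toy numbers in the usage lines are made up to check the arithmetic and are not the ROTATE data.

```python
from math import sqrt


def describe_linear(x, y):
    """Return (r, slope, intercept) for the least-squares line y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    xbar = sum(x) / n
    ybar = sum(y) / n
    # Sums of squares and cross-products about the means
    # (a constant column makes sxx or syy zero, which this sketch does not handle)
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return r, slope, intercept


# Made-up toy values (NOT the ROTATE data): RT rises exactly 0.02 s per degree.
angle = [0, 40, 80, 120, 160]
rt = [1.0 + 0.02 * a for a in angle]
r, slope, intercept = describe_linear(angle, rt)
print(round(r, 6), round(slope, 6), round(intercept, 6))
```

With real ROTATE data, `r` will be below 1. The slope can be read as the average change in reaction time per additional degree of rotation.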



Dennis Cox
1/28/1998