Project: Estimate recombination rates using LINKAGE


To reduce the TA's grading workload, please do not copy the entire output file. Select the relevant parts and highlight your key results.


First, read “A tutorial for LINKAGE”.


Problem 1: Here is a small family (Family 1 below). The color-filled individuals are affected. We also have two-allele marker information for each person. We assume the disease is dominant.


1)      Write down the likelihood as a function of the recombination rate \theta and find the value of \theta that maximizes it. You can solve the likelihood equation analytically or numerically, or simply evaluate the likelihood at several points and pick the one that gives the largest value. (Hint: the father's phase is unknown, so there are two possibilities.)
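A minimal Python sketch of the grid-search approach, assuming the usual setup for this kind of exercise: the affected father is heterozygous at both the disease locus and the marker, the mother is uninformative, and each of the n scorable children can be classified as recombinant or not once a paternal phase is fixed. If k children are recombinant under one phase, the other n - k are recombinant under the other phase, and each phase gets prior probability 1/2, so L(\theta) = 1/2 [\theta^k (1-\theta)^(n-k) + \theta^(n-k) (1-\theta)^k]. The counts n = 6 and k = 1 are hypothetical placeholders, not read from Family 1.

```python
def likelihood(theta: float, n: int, k: int) -> float:
    """Phase-unknown likelihood L(theta) for one father and n scorable children.

    Under one paternal phase, k children are recombinant; under the other,
    the remaining n - k are. Each phase has prior probability 1/2.
    """
    phase_a = theta**k * (1 - theta) ** (n - k)
    phase_b = theta ** (n - k) * (1 - theta) ** k
    return 0.5 * (phase_a + phase_b)


def grid_mle(n: int, k: int, step: float = 0.001) -> float:
    """Return the grid point in [0, 0.5] with the largest likelihood."""
    grid = [i * step for i in range(int(round(0.5 / step)) + 1)]
    return max(grid, key=lambda t: likelihood(t, n, k))


if __name__ == "__main__":
    n, k = 6, 1  # HYPOTHETICAL counts; replace with the ones from Family 1
    theta_hat = grid_mle(n, k)
    print(f"theta_hat = {theta_hat:.3f}, L(theta_hat) = {likelihood(theta_hat, n, k):.5f}")
```

A quick sanity check: with k = 0 (no recombinants under one phase) the likelihood decreases on [0, 0.5], so the grid search returns \theta = 0.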


2)      Use the LINKAGE package to do the same analysis. You can download the pedfile here: ex1.ped. Interpret the 2nd and 5th lines of ex1.ped; that is, explain what each entry on each of these lines tells you.


Parameter setting:

assume a dominant disease, i.e., set the penetrances for genotypes (1,1), (1,2), (2,2) to 0, 1, 1;

set the disease-locus allele frequencies to 0.99999 (allele 1) and 0.00001 (allele 2, the disease allele);

set the marker allele frequencies to 0.5 and 0.5.


Follow the instructions and use the parameter settings above to create a datafile for MLINK. Run MLINK and explain the results. Please include a figure of the lod score vs. \theta.
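MLINK reports lod scores at fixed values of \theta. As a rough cross-check on the shape of that curve, the sketch below computes LOD(\theta) = log10[L(\theta) / L(0.5)] from the simplified phase-unknown likelihood of 1) on a grid of \theta values and prints a crude text bar chart. The counts n and k are hypothetical placeholders, and the model is the hand-calculation model of 1), not MLINK's internal computation.

```python
import math


def likelihood(theta, n, k):
    """Phase-unknown likelihood: average over the father's two possible phases."""
    return 0.5 * (theta**k * (1 - theta) ** (n - k) + theta ** (n - k) * (1 - theta) ** k)


def lod(theta, n, k):
    """log10 likelihood ratio of theta versus free recombination (theta = 0.5)."""
    value = likelihood(theta, n, k)
    if value == 0.0:
        return float("-inf")  # e.g. theta = 0 when some child is recombinant
    return math.log10(value / likelihood(0.5, n, k))


def lod_table(n, k, thetas):
    """List of (theta, lod) pairs, the same shape of output you would plot."""
    return [(t, lod(t, n, k)) for t in thetas]


if __name__ == "__main__":
    n, k = 6, 1  # HYPOTHETICAL counts, not read from ex1.ped
    thetas = [i / 20 for i in range(11)]  # 0.00, 0.05, ..., 0.50
    for theta, score in lod_table(n, k, thetas):
        bar = "#" * int(max(score, 0.0) * 40)  # bars only for positive lod scores
        print(f"{theta:4.2f}  {score:8.3f}  {bar}")
```

For the figure itself, the (\theta, lod) pairs taken from the MLINK output can be plotted with any plotting tool.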


3)      Then use ILINK to find the recombination rate that maximizes the lod score. A prepared datafile for ILINK is provided: ex1ilink.dat. Run ILINK and explain the results.

ILINK is run in much the same way as MLINK. First copy ex1.ped to pedfile.dat and ex1ilink.dat to datafile.dat. Then invoke UNKNOWN, and finally invoke ILINK (program names are case insensitive).
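If you want to script these steps, here is a minimal Python sketch. It assumes the LINKAGE executables are on your PATH under the lowercase names `unknown` and `ilink` (an assumption; adjust to your installation). The fixed input names pedfile.dat and datafile.dat are the ones given above.

```python
import shutil
import subprocess
from pathlib import Path


def prepare_inputs(ped: str, dat: str, workdir: str = ".") -> None:
    """Copy the pedigree and data files to the fixed names LINKAGE reads."""
    work = Path(workdir)
    shutil.copyfile(ped, work / "pedfile.dat")
    shutil.copyfile(dat, work / "datafile.dat")


def run_ilink(workdir: str = ".") -> None:
    """Run UNKNOWN, then ILINK, stopping with an error if either one fails.

    ASSUMPTION: the executables are on PATH as lowercase `unknown` and `ilink`.
    """
    for program in ("unknown", "ilink"):
        subprocess.run([program], cwd=workdir, check=True)
```

Usage, from the directory holding both files: `prepare_inputs("ex1.ped", "ex1ilink.dat")` followed by `run_ilink()`.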


The ln(likelihood) and lod score can be found in final.dat in your directory. Calculate the P-value.
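One common way to get the P-value: the likelihood-ratio statistic for H0: \theta = 0.5 is 2[ln L(\theta_hat) - ln L(0.5)] = 2 ln(10) x LOD (about 4.605 x LOD), compared with a chi-square distribution with 1 degree of freedom. Because \theta is restricted to [0, 0.5] and the null value lies on the boundary, the chi-square tail probability is often halved (a one-sided test). The sketch below makes that halving a switch rather than assuming which convention is expected.

```python
import math


def chi2_from_lnl(lnl_hat: float, lnl_null: float) -> float:
    """Likelihood-ratio statistic 2 * [ln L(theta_hat) - ln L(0.5)]."""
    return 2.0 * (lnl_hat - lnl_null)


def chi2_from_lod(lod: float) -> float:
    """Same statistic from a lod score: LOD = log10(LR), so 2 ln(LR) = 2 ln(10) * LOD."""
    return 2.0 * math.log(10.0) * lod


def p_value(chi2: float, one_sided: bool = True) -> float:
    """Upper-tail probability of a chi-square(1) statistic, optionally halved.

    For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    Halving (one_sided=True) reflects the boundary null theta = 0.5.
    """
    x = max(chi2, 0.0)  # a negative statistic carries no evidence for linkage
    tail = math.erfc(math.sqrt(x / 2.0))
    return 0.5 * tail if one_sided else tail
```

For example, `p_value(chi2_from_lod(Z))` turns a maximum lod score Z into a one-sided P-value; a lod score of 3 corresponds to a one-sided P-value of roughly 10^-4.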


Compare the results from 1), 2), and 3). Are they consistent?


[Figure: Family 1]

4)      Now double the number of children, which gives a new pedfile, ex11.ped (see Family 2 below). You can use the same ILINK datafile as for Family 1, since the parameter settings are the same. Run ILINK and calculate the point estimate and the P-value. Compare them with the results from Family 1.


[Figure: Family 2]
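To think about how Family 2 differs from Family 1, here is a sketch that reuses the simplified phase-unknown model of 1), assuming that doubling the children doubles both counts (2n scorable children, 2k recombinant under one phase). All children still share one father, so there is only one unknown phase and the factor 1/2 for phase uncertainty enters once, not once per copy. The sketch prints the doubled family's maximum LOD next to twice the single-family maximum so the effect is visible. The counts are hypothetical.

```python
import math


def likelihood(theta, n, k):
    """Phase-unknown likelihood for one father with n scorable children."""
    return 0.5 * (theta**k * (1 - theta) ** (n - k) + theta ** (n - k) * (1 - theta) ** k)


def max_lod(n, k, step=0.001):
    """Grid-search MLE of theta on (0, 0.5] and the LOD score at that point."""
    grid = [i * step for i in range(1, int(round(0.5 / step)) + 1)]
    theta_hat = max(grid, key=lambda t: likelihood(t, n, k))
    score = math.log10(likelihood(theta_hat, n, k) / likelihood(0.5, n, k))
    return theta_hat, score


if __name__ == "__main__":
    n, k = 6, 1  # HYPOTHETICAL Family 1 counts
    theta1, lod1 = max_lod(n, k)
    theta2, lod2 = max_lod(2 * n, 2 * k)  # same father, twice the children
    print(f"Family 1: theta_hat = {theta1:.3f}, max LOD = {lod1:.3f}")
    print(f"Family 2: theta_hat = {theta2:.3f}, max LOD = {lod2:.3f}")
    print(f"2 x Family 1 max LOD = {2 * lod1:.3f}")
```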

Problem 2:

Consider a larger data set of 6 families. Download pedin.dat and datain.dat to your directory. The data contain an affection status and 11 markers. We want to know the recombination rates between the affection status (disease locus) and each of the first 6 markers. A batch file is ready to run; download it here: pedin2. This script was created by the LINKAGE CONTROL PROGRAM (LCP). When LCP is invoked, it asks questions about your parameter settings, what you want to estimate, where your pedfile and datafile are located, etc. At the command line, type pedin2 and check the results in final.out in your current directory.


If pedin2 doesn’t run, make it executable first:

yu@neyman:/home/helpdesk/bin % chmod +x pedin2
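The same two steps can be scripted in Python. `make_executable` adds the user, group, and other execute bits, similar to `chmod +x` (which also honors your umask). `run_batch` assumes, as an illustration, that pedin2 sits in the current directory and writes final.out there; it runs the script through `/bin/sh` via `shell=True`.

```python
import os
import stat
import subprocess


def make_executable(path: str) -> None:
    """Add execute permission for user, group, and others, similar to `chmod +x`."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def run_batch(script: str = "./pedin2", output: str = "final.out") -> str:
    """Run the LCP batch script through the shell, then return the output file's text.

    ASSUMPTION: pedin2 is in the current directory and writes final.out there.
    """
    subprocess.run(script, shell=True, check=True)
    with open(output) as fh:
        return fh.read()
```

Usage: `make_executable("pedin2")`, then `print(run_batch())` to view final.out.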


1)      Describe the family structure in pedin.dat. Since pedin.dat contains pedigree data, it is in pedigree-file format. Draw the family tree of the first family.

2)      Check the results in final.out and explain the relationship between the disease and each of the first 6 markers.
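For 1), a small sketch that summarizes each pedigree before you draw it. It assumes that the first four whitespace-separated columns of each line of pedin.dat are pedigree ID, individual ID, father ID, and mother ID, with 0 for an unknown parent; any later columns (sex, affection status, marker genotypes, and so on) are ignored. The function names are hypothetical helpers, not part of LINKAGE.

```python
from collections import defaultdict


def read_pedigree(lines):
    """Group records by pedigree ID as (individual, father, mother) tuples.

    ASSUMPTION: columns 1-4 are pedigree, individual, father, mother (0 = unknown).
    """
    peds = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or short lines
        ped, ind, father, mother = fields[:4]
        peds[ped].append((ind, father, mother))
    return dict(peds)  # keeps pedigrees in file order


def nuclear_families(members):
    """Map each (father, mother) pair to its children; founders are skipped."""
    families = defaultdict(list)
    for ind, father, mother in members:
        if (father, mother) != ("0", "0"):
            families[(father, mother)].append(ind)
    return dict(families)


def founders(members):
    """Individuals whose parents are both unknown (0) in the file."""
    return [ind for ind, father, mother in members if (father, mother) == ("0", "0")]
```

Usage: `peds = read_pedigree(open("pedin.dat"))`, then `first = next(iter(peds))`; `founders(peds[first])` and `nuclear_families(peds[first])` list the founders and the parent pairs with their children for the first pedigree in the file.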


Project due: Friday, April 2, 2004.


Problems? Email Zhaoxia Yu at yu@rice.edu.


Last updated: 3/23/2004.