A tutorial for LINKAGE
Zhaoxia Yu
yu@rice.edu
February 5, 2004
 
This file gives brief directions for solving the assigned problems
for STAT670. More ambitious students can consult the LINKAGE site
 
http://linkage.rockefeller.edu/soft/linkage/
 
and the Handbook of Human Genetic Linkage (1994) by Joseph Douglas
Terwilliger and Jurg Ott, The Johns Hopkins University Press.
 
 
Where to use LINKAGE? 
To access LINKAGE, 'cd' to /home/helpdesk/bin, e.g.,
yu@neyman:~ % cd /home/helpdesk/bin
Your prompt will then become "yu@neyman:/home/helpdesk/bin % ".
 
You can also copy all the files in /home/helpdesk/bin to a folder in
your home directory. I copied all the files to /home/yu/670soft, where
I will work.
 
 
What can LINKAGE do for us? 
The three major procedures in LINKAGE are: estimating recombination
rates and calculating maximum lod scores; constructing lod score
tables and doing risk analysis; and calculating location scores.
 
ILINK: estimate recombination rates and calculate the maximum lod
scores for general pedigrees.
CILINK: estimate recombination rates and calculate the maximum lod
scores for three-generation reference pedigrees.
 
MLINK: give lod score tables and do risk analysis.
 
LINKMAP: calculate location scores for general pedigrees.
CMAP: calculate location scores for three-generation reference
pedigrees.
 
 
How to use LINKAGE? 
 
Step 1: Prepare a .pre data set
 
If you already have data in pedigree format, you can skip this step
and go to Step 2.
 
We have a sample data set named sample.pre. I use '.pre' to denote
data that have not yet been converted to pedigree format.
 
We will use the following data set throughout the tutorial.
 
 
 
 
 
yu@neyman:~/670soft % cat sample.pre
1 1 0 0 1 2 1 1
1 2 0 0 2 1 2 1
1 3 1 2 2 1 2 1
1 4 1 2 2 2 1 1
1 5 1 2 1 2 2 1
1 6 1 2 2 1 1 1
1 7 1 2 1 1 1 1
2 1 0 0 1 2 1 1
2 2 0 0 2 1 2 1
2 3 1 2 2 1 1 1
2 4 1 2 2 2 2 1
2 5 1 2 1 2 1 2
 
There is one row per person. 
The 1st column is the ID of the pedigree. 
The 2nd column is the ID of the person. 
The 3rd column is the ID of the person’s father (0 if the father is
not in the data, as for persons 1 and 2 above). 
The 4th column is the ID of the person’s mother (0 if the mother is
not in the data). 
The 5th column is gender: 1 is male, 2 is female. 
 
Note: the order of the first 5 columns is fixed for all the “.pre”
data sets.
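 
Because the first five columns are fixed, they can be checked
mechanically before running makeped. The following Python sketch is
not part of LINKAGE: the function name check_pre_rows is made up for
illustration, and it assumes that a parent ID of 0 means the parent is
not in the data. It only looks at the five fixed columns of the sample
data shown above.

```python
# Hypothetical helper (not part of LINKAGE): sanity-check the five fixed
# columns of a .pre file: pedigree, person, father, mother, sex.

SAMPLE_PRE = """\
1 1 0 0 1 2 1 1
1 2 0 0 2 1 2 1
1 3 1 2 2 1 2 1
1 4 1 2 2 2 1 1
1 5 1 2 1 2 2 1
1 6 1 2 2 1 1 1
1 7 1 2 1 1 1 1
2 1 0 0 1 2 1 1
2 2 0 0 2 1 2 1
2 3 1 2 2 1 1 1
2 4 1 2 2 2 2 1
2 5 1 2 1 2 1 2
"""


def check_pre_rows(rows):
    """rows: list of (pedigree, person, father, mother, sex) integer tuples.

    Returns a list of problem descriptions (empty if none were found).
    """
    sex_of = {(ped, pid): sex for ped, pid, _, _, sex in rows}
    problems = []
    for ped, pid, father, mother, sex in rows:
        if sex not in (1, 2):
            problems.append(f"{ped}/{pid}: sex must be 1 or 2")
        for parent, expected, label in ((father, 1, "father"),
                                        (mother, 2, "mother")):
            if parent == 0:
                continue  # assumption: 0 means the parent is not in the data
            if (ped, parent) not in sex_of:
                problems.append(f"{ped}/{pid}: {label} {parent} not in pedigree")
            elif sex_of[(ped, parent)] != expected:
                problems.append(f"{ped}/{pid}: {label} {parent} has wrong sex")
    return problems


# Keep only the first five fields of every non-empty line.
rows = [tuple(int(x) for x in line.split()[:5])
        for line in SAMPLE_PRE.splitlines() if line.strip()]
problems = check_pre_rows(rows)
print(problems if problems else "no problems found")
```

A check like this catches mistyped parent IDs early, before they turn
into harder-to-read errors in the later programs.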
 
The remaining columns describe the genotypes and phenotypes. There
are 4 locus formats: numbered alleles, binary factors, affection
status, and quantitative traits. The DATAFILE must state which format
each locus uses. For codes and more details, see Step 3 of this
document. 
 
The 6th column in the example is the affection status: 1 is
unaffected, 2 is affected. Note: the affection status is also a
locus.
 
The 7th and 8th columns in the example are alleles for locus 1. In
this example, we use numbered alleles. 
 
Of course we can have more loci. If you are interested in only some
of the loci, you can specify in Step 3 which ones to analyze. 
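 
To make the column layout concrete, here is a minimal Python sketch
that reads the sample data into named fields. It is an illustration
only: the names PreRecord and parse_pre are made up, and the layout of
exactly 8 columns (5 fixed columns, affection status, two alleles)
holds for this sample file, not for .pre files in general.

```python
from dataclasses import dataclass

SAMPLE_PRE = """\
1 1 0 0 1 2 1 1
1 2 0 0 2 1 2 1
1 3 1 2 2 1 2 1
1 4 1 2 2 2 1 1
1 5 1 2 1 2 2 1
1 6 1 2 2 1 1 1
1 7 1 2 1 1 1 1
2 1 0 0 1 2 1 1
2 2 0 0 2 1 2 1
2 3 1 2 2 1 1 1
2 4 1 2 2 2 2 1
2 5 1 2 1 2 1 2
"""


@dataclass
class PreRecord:          # hypothetical name, for illustration only
    pedigree: int
    person: int
    father: int
    mother: int
    sex: int              # 1 = male, 2 = female
    affection: int        # 1 = unaffected, 2 = affected
    alleles: tuple        # the two numbered alleles at the marker locus


def parse_pre(text):
    """Parse the 8-column sample layout into a list of PreRecord objects."""
    records = []
    for number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 8:
            raise ValueError(f"line {number}: expected 8 columns, got {len(fields)}")
        v = [int(f) for f in fields]
        records.append(PreRecord(v[0], v[1], v[2], v[3], v[4], v[5], (v[6], v[7])))
    return records


records = parse_pre(SAMPLE_PRE)
affected = [(r.pedigree, r.person) for r in records if r.affection == 2]
print(len(records), "people; affected (pedigree, person):", affected)
```

With more loci, the extra columns would simply follow the same
pattern, one or two columns per locus depending on its format.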
 
 
Step 2: Get pedigree format data
 
Use the command
yu@neyman:~/670soft % makeped sample.pre sample.ped n
where the final n tells the program that no loops are present and
that probands should be selected automatically. This produces the
pedigree format file sample.ped. For more details about the makeped
program, refer to the linkhelp file.
 
 
Step 3: Write a DATAFILE (description of loci)
 
This can be done with the program preplink, or you can write one with
a standard editor. 
 
Using preplink to construct a DATAFILE.
 
yu@neyman:~/670soft % preplink
Copyright 1991 CEPH, University of Utah, and Columbia University New York
 
 
 
Program PREPLINK version 5.22
 
The program constants are set to the following maxima:
    60 loci in mapping problem (maxlocus)
    20 alleles at a single locus (maxallele)
    64 haplotypes for haplotype frequencies (maxhap)
    20 binary codes at a single locus (maxphen)
     4 quantitative factor(s) at a single locus (maxtrait)
    20 liability classes (maxliab)
   200 iterated parameters possible in Ilink (maxiter)
 
NOTES:
1. Disk files read as input files are assumed error free.  Errors
   will crash the program.
2. Presence of the ANSI.SYS driver is assumed.  If it is not installed
   you will see strange symbols such as ^[ but they are not harmful.
 

Press ENTER to continue

 

Press Enter and you will see

 

 

********** PRESENT STATUS **************

(a) Number of loci               :    2

(b) Sexlinked                    :    N

(c) Calculate Risk               :    N

(d) Mutation                     :    N

(e) Haplotype frequencies        :    N

(f) Locus Order                  :    1 2

(g) Interference                 :    N

(h) Recombination sex difference :    N

(i) Program used                 : MLINK

(j) Recombination values         :

       0.100

*********** OTHER OPTIONS **************

(k) See or modify loci description

(l) See or modify recombination to vary

(m) Read datafile

(n) Write datafile

(o) Exit

****************************************

Press letter to modify or see values

(To be continued)

We need to change the settings for our analysis. Suppose we want to
calculate the lod score at recombination fractions theta from 0 to
0.5 in increments of 0.1.

 

(a)    Number of loci, including affection status and/or continuous
 traits. In our example we have two loci, so no change is needed.
(b)    Tells the program whether we are using autosomal or X-linked
 markers. We are considering autosomal markers, so leave the default.
(c)    Whether to compute genetic risk. We do not want it, so no
 change.
(d)    Whether to model mutation. For our purpose, we ignore it.
(e)    We assume linkage equilibrium, so no change.
(f)    The order of the loci. We have only two loci, so no change is
 needed.
(g)    Interference. Not needed for our purpose; ignore it for the
 moment.
(h)    We assume no sex difference in recombination.
(i)    We will use MLINK for the lod score calculation.
(j)    The recombination value at which to calculate the lod score.
 Generally we do not need to change it, since we want lod scores at
 several recombination rates, which are specified in (l).
(k)    See or modify loci description. Type “k” and press ENTER, and
 we will see

(Continued)

k

 

*******************************************

(1) allele numbers   GENE FREQS :  0.50000  0.50000

(2) allele numbers   GENE FREQS :  0.50000  0.50000

*******************************************

(a) SEE OR MODIFY A LOCUS

(b) DELETE LOCUS

(c) ADD LOCUS

(d) CHANGE ORDER TO CORRESPOND TO PEDIGREE FILE (NOT CHROMOSOME

ORDER)

(e) CHANGE LOCUS TYPE

(f) RETURN TO MAIN MENU

*******************************************

Press letter to modify values

 

We need to change the first locus (the 6th column in the pre data) to
affection status format, so we type e

e

ENTER LOCUS TO CHANGE

1

ENTER NEW LOCUS TYPE:

(a) BINARY FACTORS

(b) QUANTITATIVE TRAIT

(c) AFFECTION STATUS

(d) ALLELE NUMBERS

c

 

 

 

 

 

*******************************************

(1) affection status GENE FREQS :  0.50000  0.50000

(2) allele numbers   GENE FREQS :  0.50000  0.50000

*******************************************

(a) SEE OR MODIFY A LOCUS

(b) DELETE LOCUS

(c) ADD LOCUS

(d) CHANGE ORDER TO CORRESPOND TO PEDIGREE FILE (NOT CHROMOSOME

ORDER)

(e) CHANGE LOCUS TYPE

(f) RETURN TO MAIN MENU

*******************************************

Press letter to modify values

Then we set the second locus (columns 7 and 8 in the pre data) to
numbered allele format in the same way:

e

ENTER LOCUS TO CHANGE

2

ENTER NEW LOCUS TYPE:

(a) BINARY FACTORS

(b) QUANTITATIVE TRAIT

(c) AFFECTION STATUS

(d) ALLELE NUMBERS

d

 

 

*******************************************

(1) affection status GENE FREQS :  0.50000  0.50000

(2) allele numbers   GENE FREQS :  0.50000  0.50000

*******************************************

(a) SEE OR MODIFY A LOCUS

(b) DELETE LOCUS

(c) ADD LOCUS

(d) CHANGE ORDER TO CORRESPOND TO PEDIGREE FILE (NOT CHROMOSOME

ORDER)

(e) CHANGE LOCUS TYPE

(f) RETURN TO MAIN MENU

*******************************************

Press letter to modify values

 

Now we can return to the main menu:

f

 

 

 

********** PRESENT STATUS **************

(a) Number of loci               :    2

(b) Sexlinked                    :    N

(c) Calculate Risk               :    N

(d) Mutation                     :    N

(e) Haplotype frequencies        :    N

(f) Locus Order                  :    1 2

(g) Interference                 :    N

(h) Recombination sex difference :    N

(i) Program used                 : MLINK

(j) Recombination values         :

       0.100

*********** OTHER OPTIONS **************

(k) See or modify loci description

(l) See or modify recombination to vary

(m) Read datafile

(n) Write datafile

(o) Exit

****************************************

Press letter to modify or see values

 

(l)    The recombination to vary, i.e. the recombination rate(s) at
 which you want LINKAGE to calculate the lod score.
 
Next we modify the recombination to vary:

l

 

 

 

 

 

*********************************************

(a) RECOMBINATION TO VARY             :          1

(b) STARTING VALUE                    : 0.1000

(c) INCREMENT                         : 0.1000

(d) FINISHING VALUE                   : 0.4500

(e) RETURN TO MAIN MENU

********************************************

Press letter to modify values

b (I want to modify the starting value to 0)

ENTER NEW STARTING VALUE

0

 

*********************************************

(a) RECOMBINATION TO VARY             :          1

(b) STARTING VALUE                    : 0.0000

(c) INCREMENT                         : 0.1000

(d) FINISHING VALUE                   : 0.4500

(e) RETURN TO MAIN MENU

********************************************

Press letter to modify values

d (I want to modify the finishing value to 0.5)

ENTER NEW FINISHING VALUE

0.5

 

 

*********************************************

(a) RECOMBINATION TO VARY             :          1

(b) STARTING VALUE                    : 0.0000

(c) INCREMENT                         : 0.1000

(d) FINISHING VALUE                   : 0.5000

(e) RETURN TO MAIN MENU

********************************************

Press letter to modify values

e (I have finished setting the parameters and want to return to the
main menu)

 

 

 

 

********** PRESENT STATUS **************

(a) Number of loci               :    2

(b) Sexlinked                    :    N

(c) Calculate Risk               :    N

(d) Mutation                     :    N

(e) Haplotype frequencies        :    N

(f) Locus Order                  :    1 2

(g) Interference                 :    N

(h) Recombination sex difference :    N

(i) Program used                 : MLINK

(j) Recombination values         :

       0.000

*********** OTHER OPTIONS **************

(k) See or modify loci description

(l) See or modify recombination to vary

(m) Read datafile

(n) Write datafile

(o) Exit

****************************************

Press letter to modify or see values

(m) & (n) Read and write a DATAFILE. You might already have a
 DATAFILE on disk that needs only a few changes. You can read the old
 file by choosing (m) and write a new file by choosing (n). The new
 file will then contain both the changes you made and the remaining
 settings from the old file. Here we will write a new file, since we
 do not have an old file to modify.

 

Now I have made all the changes needed to calculate the lod score,
and I will save the description to a DATAFILE, which I name
sample.dat:

 

n

 

 

Enter output file name - a file by the same name will be overwritten!

Press only Enter to skip

sample.dat

 

 

 

 

********** PRESENT STATUS **************

(a) Number of loci               :    2

(b) Sexlinked                    :    N

(c) Calculate Risk               :    N

(d) Mutation                     :    N

(e) Haplotype frequencies        :    N

(f) Locus Order                  :    1 2

(g) Interference                 :    N

(h) Recombination sex difference :    N

(i) Program used                 : MLINK

(j) Recombination values         :

       0.000

*********** OTHER OPTIONS **************

(k) See or modify loci description

(l) See or modify recombination to vary

(m) Read datafile

(n) Write datafile

(o) Exit

****************************************

Press letter to modify or see values

o (Now it’s time to exit)

 

yu@neyman:~/670soft %

 

 

 

yu@neyman:~/670soft % cat sample.dat

 2 0 0 5  << NO. OF LOCI, RISK LOCUS, SEXLINKED (IF 1) PROGRAM

 0 0.0 0.0 0 << MUT LOCUS, MUT MALE, MUT FEM, HAP FREQ (IF 1)

  1  2

1   2  << AFFECTION, NO. OF ALLELES

  0.50000  0.50000 << GENE FREQUENCIES

 1 << NO. OF LIABILITY CLASSES

 0 0  1.0000 << PENETRANCES

3   2  << ALLELE NUMBERS, NO. OF ALLELES

  0.50000  0.50000 << GENE FREQUENCIES

 0 0  << SEX DIFFERENCE, INTERFERENCE (IF 1 OR 2)

  0.00000000 << RECOMBINATION VALUES

 1 0.10000 0.50000 << REC VARIED, INCREMENT, FINISHING VALUE
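
The last two lines of sample.dat hold the starting value, the
increment and the finishing value chosen in preplink. As a rough
illustration, the Python sketch below reads those two lines and lists
the recombination fractions they describe. The helper names
strip_comment and theta_grid are made up, and the stepping rule is an
assumption, not MLINK's documented behavior; it does agree with the
THETAS values that MLINK prints in Step 5.

```python
def strip_comment(line):
    """Return the numeric tokens before the '<<' annotation on a line."""
    return line.split("<<", 1)[0].split()


def theta_grid(start, increment, finish):
    """Evenly spaced values from start up to finish (inclusive if reached).

    Counting whole steps avoids accumulating floating point error.
    Assumption: a finishing value that is not reachable is never overshot.
    """
    steps = int((finish - start) / increment + 1e-9)
    return [round(start + i * increment, 6) for i in range(steps + 1)]


# The final two lines of the sample.dat shown above.
tail = """\
  0.00000000 << RECOMBINATION VALUES
 1 0.10000 0.50000 << REC VARIED, INCREMENT, FINISHING VALUE
"""

lines = tail.splitlines()
start = float(strip_comment(lines[0])[0])
_, increment, finish = (float(x) for x in strip_comment(lines[1]))
grid = theta_grid(start, increment, finish)
print(grid)
```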

 

Now copy sample.ped and sample.dat to pedfile.dat and datafile.dat,
respectively. The reason is that the next program, UNKNOWN, reads its
input from “pedfile.dat” and “datafile.dat” by default.

 

yu@neyman:~/670soft % cp sample.ped pedfile.dat

yu@neyman:~/670soft % cp sample.dat datafile.dat
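
If you repeat the analysis for several data sets, the two copies can
be scripted. The Python sketch below is one possible way to do it; the
function name stage_inputs is made up, and the demonstration uses
placeholder files in a temporary directory rather than real pedigree
data.

```python
import shutil
import tempfile
from pathlib import Path


def stage_inputs(ped_file, dat_file, workdir="."):
    """Copy a pedigree file and a DATAFILE to the fixed names that
    UNKNOWN reads by default (pedfile.dat and datafile.dat)."""
    workdir = Path(workdir)
    targets = {"pedfile.dat": Path(ped_file), "datafile.dat": Path(dat_file)}
    for fixed_name, source in targets.items():
        if not source.is_file():
            raise FileNotFoundError(source)
        # Like cp, this silently replaces an existing file of the same name.
        shutil.copyfile(source, workdir / fixed_name)
    return [workdir / name for name in targets]


# Demonstration with placeholder content in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "sample.ped").write_text("placeholder pedigree file\n")
    (tmp / "sample.dat").write_text("placeholder datafile\n")
    staged = stage_inputs(tmp / "sample.ped", tmp / "sample.dat", workdir=tmp)
    staged_names = [p.name for p in staged]
    copied_text = (tmp / "pedfile.dat").read_text()

print(staged_names)
```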

 

Step 4: Invoke UNKNOWN

UNKNOWN is an auxiliary program. It infers possible genotypes and
mating combinations for parents with unknown genotypes, for use by
ILINK, MLINK and LINKMAP. UNKNOWN reads the files “pedfile.dat” and
“datafile.dat” in the current directory and produces the temporary
output files ipedfile.dat and speedfile.dat. Check your current
directory after you invoke UNKNOWN.

yu@neyman:~/670soft % unknown

Program UNKNOWN version 5.23

The following maximum values are in effect:

      20 loci

     120 single locus genotypes

      15 alleles at a single locus

     500 individuals in one pedigree

       3 marriage(s) for one male

       3 quantitative factor(s) at a single locus

      20 liability classes

      17 binary codes at a single locus

Opening DATAFILE.DAT

Opening PEDFILE.DAT

YOU ARE USING LINKAGE (V5.23) WITH  2-POINT AUTOSOMAL DATA

Ped.  1

Ped.  2

yu@neyman:~/670soft % 

 

Both ipedfile.dat and speedfile.dat will be used in the next step.
Leave them there! Each time you run unknown, you will get these two
files. If they already exist, they will be replaced.

 

Step 5: Invoke MLINK to finish the calculation.

The temporary files produced by UNKNOWN, together with datafile.dat,
are the input files of MLINK. Invoke mlink:

 

yu@neyman:~/670soft % mlink

Copyright (C) CEPH, University of Utah, and Columbia University 1990

Program MLINK version  5.10

 

The program constants are set to the following maxima:

     7 loci in mapping problem (maxlocus)

    15 alleles at a single locus (maxall)

 19532 recombination probabilities (maxneed)

 10000 maximum of censoring array (maxcensor)

   256 haplotypes = n1 x n2 x ... where ni = current # alleles at locus i

 32896 joint genotypes for a female

 32896 joint genotypes for a male

   500 individuals in all pedigrees combined (maxind)

    50 pedigrees (maxped)

    16 binary codes at a single locus (maxfact)

     2 quantitative factor(s) at a single locus

    10 liability classes

    16 binary codes at a single locus

    2.00 base scaling factor for likelihood (scale)

    3.00 scale multiplier for each locus (scalemult)

 0.00000 frequency for elimination of heterozygotes (minfreq)

 

YOU ARE USING LINKAGE (V5.1) WITH  2-POINT AUTOSOMAL DATA

Maxneed can be reduced to          7

-----------------------------------

-----------------------------------

THETAS  0.500

-----------------------------------

PEDIGREE |  LN LIKE  | LOG 10 LIKE

-----------------------------------

        1   -11.090355    -4.816480

        2    -8.317766    -3.612360

-----------------------------------

TOTALS      -19.408121    -8.428840

-2 LN(LIKE) =  3.88162421113569e+01 LOD SCORE =     0.000000

Maxcensor can be reduced to     -32749

-----------------------------------

-----------------------------------

THETAS  0.000

-----------------------------------

PEDIGREE |  LN LIKE  | LOG 10 LIKE

-----------------------------------

        1 -100000000000000000000.000000 -43429448190325178368.000000

        2    -6.931472    -3.010300

-----------------------------------

TOTALS    -100000000000000000000.000000 -43429448190325178368.000000

-2 LN(LIKE) =  2.00000000000000e+20 LOD SCORE = -43429448190325178368.000000

Maxcensor can be reduced to     -32749

-----------------------------------

-----------------------------------

THETAS  0.100

-----------------------------------

PEDIGREE |  LN LIKE  | LOG 10 LIKE

-----------------------------------

        1   -13.133657    -5.703875

        2    -7.246183    -3.146977

-----------------------------------

TOTALS      -20.379840    -8.850852

-2 LN(LIKE) =  4.07596798689245e+01 LOD SCORE =    -0.422012

Maxcensor can be reduced to     -32749

-----------------------------------

-----------------------------------

THETAS  0.200

-----------------------------------

PEDIGREE |  LN LIKE  | LOG 10 LIKE

-----------------------------------

        1   -11.982929    -5.204120

        2    -7.585398    -3.294297

-----------------------------------

TOTALS      -19.568327    -8.498417

-2 LN(LIKE) =  3.91366547344441e+01 LOD SCORE =    -0.069577

-----------------------------------

-----------------------------------

THETAS  0.300

-----------------------------------

PEDIGREE |  LN LIKE  | LOG 10 LIKE

-----------------------------------

        1   -11.439062    -4.967921

        2    -7.925724    -3.442098

-----------------------------------

TOTALS      -19.364786    -8.410020

-2 LN(LIKE) =  3.87295714843840e+01 LOD SCORE =     0.018820

-----------------------------------

-----------------------------------

THETAS  0.400

-----------------------------------

PEDIGREE |  LN LIKE  | LOG 10 LIKE

-----------------------------------

        1   -11.171999    -4.851937

        2    -8.204437    -3.563142

-----------------------------------

TOTALS      -19.376436    -8.415079

-2 LN(LIKE) =  3.87528727188240e+01 LOD SCORE =     0.013760

-----------------------------------

-----------------------------------

THETAS  0.500

-----------------------------------

PEDIGREE |  LN LIKE  | LOG 10 LIKE

-----------------------------------

        1   -11.090355    -4.816480

        2    -8.317766    -3.612360

-----------------------------------

TOTALS      -19.408121    -8.428840

-2 LN(LIKE) =  3.88162421113569e+01 LOD SCORE =     0.000000

yu@neyman:~/670soft %

 

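You can see the results on the screen. Both likelihoods and lod
scores are calculated. The value -100000000000000000000 reported for
pedigree 1 at theta = 0 stands for a log likelihood of minus
infinity: the likelihood is zero there, so the lod score at theta = 0
is effectively minus infinity. A copy of the results is also
automatically saved in outfile.dat. Another file, named stream.dat,
is also generated.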
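
The lod scores in the output can be reproduced from the TOTALS lines:
each one is the total log10 likelihood at that theta minus the total
at theta = 0.5. The Python sketch below checks this on an excerpt
copied from the output above (the theta = 0 block is left out). The
names parse_totals and lod_table are made up, and the parser is a
guess based only on the lines shown here, not on any documented
format of outfile.dat.

```python
import math

# THETAS and TOTALS lines copied from the MLINK output above.
EXCERPT = """\
THETAS  0.500
TOTALS      -19.408121    -8.428840
THETAS  0.100
TOTALS      -20.379840    -8.850852
THETAS  0.300
TOTALS      -19.364786    -8.410020
"""


def parse_totals(text):
    """Return {theta: (ln_like, log10_like)} from THETAS/TOTALS line pairs."""
    results, theta = {}, None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "THETAS":
            theta = float(fields[1])
        elif len(fields) == 3 and fields[0] == "TOTALS" and theta is not None:
            results[theta] = (float(fields[1]), float(fields[2]))
    return results


def lod_table(totals, baseline=0.5):
    """lod(theta) = log10 L(theta) - log10 L(baseline)."""
    base = totals[baseline][1]
    return {theta: log10_like - base
            for theta, (_, log10_like) in sorted(totals.items())}


totals = parse_totals(EXCERPT)
lods = lod_table(totals)

# The two likelihood columns differ only by the factor ln(10).
consistent = all(abs(ln_like / math.log(10) - log10_like) < 1e-5
                 for ln_like, log10_like in totals.values())

best_theta = max(lods, key=lods.get)
for theta, lod in lods.items():
    print(f"theta = {theta:.1f}   lod = {lod:9.6f}")
print("columns consistent:", consistent, "| largest lod at theta =", best_theta)
```

Among the thetas in this excerpt the largest lod score is at 0.3,
which is also the largest value in the full table above.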