` `
`A tutorial for LINKAGE`
`Zhaoxia Yu`
`yu@rice.edu`
`February 5, 2004`
` `
`This file gives brief directions for solving the assigned `
`problems for STAT670. More ambitious students should refer to the site `
` `
`http://linkage.rockefeller.edu/soft/linkage/`
` `
`and Handbook of Human Genetic Linkage (1994) by Joseph Douglas`
`Terwilliger and Jurg Ott, The Johns Hopkins University Press.`
` `
` `
`Where to use LINKAGE? `
`To access LINKAGE, you can 'cd' to /home/helpdesk/bin, e.g.,`
`yu@neyman:~ % cd /home/helpdesk/bin. Your command prompt will then`
`become "yu@neyman:/home/helpdesk/bin % ". `
` `
`You can also copy all the files in /home/helpdesk/bin to a folder in`
`your home directory. I copied all the files to /home/yu/670soft, where I`
`will work.`
` `
` `
`What can LINKAGE do for us? `
`The three major procedures in LINKAGE are estimating recombination rates`
`and calculating the maximum lod score, constructing lod score tables and`
`doing risk analysis, and calculating location scores.`
` `
`ILINK: estimate recombination rates and calculate the maximum lod`
`scores for general pedigrees.`
`CILINK: estimate recombination rates and calculate the maximum lod`
`scores for three-generation reference pedigrees.`
` `
`MLINK: give lod score tables and do risk analysis.`
` `
`LINKMAP: calculate location scores for general pedigrees.`
`CMAP: calculate location scores for three-generation reference`
`pedigrees.`
` `
` `
`How to use LINKAGE? `
` `
`Step 1: Prepare a .pre data set`
` `
`If you already have data in pedigree format, you can skip this step`
`and go to step 2.`
` `
`We have a sample data set named sample.pre. I use '.pre' to`
`denote data that have not yet been converted to pedigree`
`format.`
` `
`We will use the following data set throughout the tutorial.`
` `
`yu@neyman:~/670soft % cat sample.pre`
`1 1 0 0 1 2 1 1`
`1 2 0 0 2 1 2 1`
`1 3 1 2 2 1 2 1`
`1 4 1 2 2 2 1 1`
`1 5 1 2 1 2 2 1`
`1 6 1 2 2 1 1 1`
`1 7 1 2 1 1 1 1`
`2 1 0 0 1 2 1 1`
`2 2 0 0 2 1 2 1`
`2 3 1 2 2 1 1 1`
`2 4 1 2 2 2 2 1`
`2 5 1 2 1 2 1 2`
` `
`There is one row per person. `
`The 1st  column is the ID of the pedigree. `
`The 2nd  column is the ID for the person. `
`The 3rd column is the father’s ID of the person in the row. `
`The 4th column is the mother’s ID of the person in the row. `
`The 5th column is gender: 1 is male, 2 is female. `
` `
`Note: the order of the first 5 columns is fixed for all the “.pre”`
`data sets.`
` `
`The remaining columns describe the genotypes and phenotypes. There`
`are 4 formats: numbered alleles, binary factors, affection status,`
`and quantitative traits. The DATAFILE must state which format`
`each locus uses. For codes and more details, see Step 3 of`
`this document. `
` `
`The 6th column in the example is the affection status: 1 is`
`unaffected, 2 is affected. Note: the affection status is also a`
`locus.`
` `
`The 7th and 8th columns in the example are alleles for locus 1. In`
`this example, we use numbered alleles. `
` `
`Of course we can have more loci. If you are only interested in some`
`of the loci, you can specify in step 3 which loci to include in the`
`analysis. `
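
The column layout described in this step can be checked with a few lines of Python. This is only a sketch: the names Person and parse_pre are made up for illustration, and treating 0 in both parent columns as a founder follows the usual LINKAGE convention, which this tutorial does not state explicitly. It also assumes exactly one affection status locus followed by one numbered-allele locus, as in sample.pre.

```python
from dataclasses import dataclass

# Contents of sample.pre as listed in this tutorial.
SAMPLE_PRE = """\
1 1 0 0 1 2 1 1
1 2 0 0 2 1 2 1
1 3 1 2 2 1 2 1
1 4 1 2 2 2 1 1
1 5 1 2 1 2 2 1
1 6 1 2 2 1 1 1
1 7 1 2 1 1 1 1
2 1 0 0 1 2 1 1
2 2 0 0 2 1 2 1
2 3 1 2 2 1 1 1
2 4 1 2 2 2 2 1
2 5 1 2 1 2 1 2
"""


@dataclass
class Person:
    pedigree: int    # column 1: pedigree ID
    person: int      # column 2: person ID
    father: int      # column 3: father's ID
    mother: int      # column 4: mother's ID
    sex: int         # column 5: 1 = male, 2 = female
    affection: int   # column 6: 1 = unaffected, 2 = affected
    alleles: tuple   # columns 7 and 8: numbered alleles at the marker locus


def parse_pre(text):
    """Turn the text of a .pre file with this 8-column layout into Person records."""
    people = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        ped, pid, father, mother, sex, affection, a1, a2 = map(int, fields)
        people.append(Person(ped, pid, father, mother, sex, affection, (a1, a2)))
    return people


people = parse_pre(SAMPLE_PRE)
# Assumption: 0 in both parent columns marks a founder.
founders = [p for p in people if p.father == 0 and p.mother == 0]
affected = [p for p in people if p.affection == 2]
print(len(people), "people,", len(founders), "founders,", len(affected), "affected")
```

A .pre file with more loci would need more fields per row, so the fixed 8-way unpacking above would have to be generalized.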
` `
` `
`Step 2: Get pedigree-format data`
` `
`Use the command`
`yu@neyman:~/670soft % makeped sample.pre sample.ped n`
`where the final n tells the program that no loops are present and`
`that probands should be selected automatically. This produces a pedigree`
`format file, sample.ped. For more details about the makeped program,`
`refer to the linkhelp file.`
` `
` `
`Step 3: Write a DATAFILE (description of loci)`
` `
`This can be done with the program preplink, or you can write one using a`
`standard editor. `
` `
`Using preplink to construct a DATAFILE.`
` `
`yu@neyman:~/670soft % preplink`
`Copyright 1991 CEPH, University of Utah, and Columbia University New York`
` `
` `
` `
`Program PREPLINK version 5.22`
` `
`The program constants are set to the following maxima:`
`    60 loci in mapping problem (maxlocus)`
`    20 alleles at a single locus (maxallele)`
`    64 haplotypes for haplotype frequencies (maxhap)`
`    20 binary codes at a single locus (maxphen)`
`     4 quantitative factor(s) at a single locus (maxtrait)`
`    20 liability classes (maxliab)`
`   200 iterated parameters possible in Ilink (maxiter)`
` `
`NOTES:`
`1. Disk files read as input files are assumed error free.  Errors`
`   will crash the program.`
`2. Presence of the ANSI.SYS driver is assumed.  If it is not installed`
`   you will see strange symbols such as ^[ but they are not harmful.`
` `

Press ENTER to continue

Press Enter and you will see

` `

********** PRESENT STATUS **************

(a) Number of loci               :    2

(c) Calculate Risk               :    N

(d) Mutation                     :    N

(e) Haplotype frequencies        :    N

(f) Locus Order                  :    1 2

(g) Interference                 :    N

(h) Recombination sex difference :    N

(j) Recombination values         :

0.100

*********** OTHER OPTIONS **************

(k) See or modify loci description

`(l) See or modify recombination to vary`

(n) Write datafile

(o) Exit

****************************************

Press letter to modify or see values

(To be continued)

We need to change the settings for our analysis. Suppose we want to
calculate the lod score at recombination fractions theta from 0 to
0.5 in increments of 0.1.
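
The set of theta values this describes can be written out explicitly. The sketch below is an illustration only: theta_grid is a hypothetical helper, and it assumes the values are obtained by stepping from the starting value by the increment until the finishing value is reached, which matches the thetas that appear in the MLINK output but is not taken from the LINKAGE source.

```python
import math


def theta_grid(start, increment, finish):
    """List the recombination fractions start, start + increment, ... up to finish."""
    if increment <= 0:
        raise ValueError("increment must be positive")
    # Count whole steps first so that floating-point drift cannot drop the last point.
    steps = math.floor((finish - start) / increment + 1e-9)
    # Round only to tidy values such as 0.30000000000000004.
    return [round(start + i * increment, 10) for i in range(steps + 1)]


print(theta_grid(0.0, 0.1, 0.5))
```

With the preplink defaults shown later (starting value 0.1, increment 0.1, finishing value 0.45), the same helper gives 0.1, 0.2, 0.3 and 0.4, which is why the starting and finishing values are changed in this tutorial.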

(a)    The number of loci, including affection status and/or
       continuous traits. In our example we have two loci, so no
       change is needed for (a).

(b)    Tells the program whether we are using autosomal or X-linked
       markers. We are considering autosomal markers, so leave it
       at the default.

(c)    Whether to compute genetic risk. We do not want this, so no
       change.

(d)    Whether to allow for mutation. For our purpose, we ignore it.

(e)    Haplotype frequencies. We assume linkage equilibrium, so no
       change.

(f)    The order of loci. We only have two loci, so no change is
       needed.

(g)    Interference. Not very useful for our purpose; ignore it for
       the moment.

(h)    Recombination sex difference. We assume no difference.

(i)    The program to use. We will use MLINK for lod score
       calculation.

(j)    The recombination value at which to calculate the lod score.
       Generally we do not need to change it, since we want to
       calculate lod scores at several recombination rates, which
       will be specified in (l).

(k)    See or modify loci description. Type “k” and ENTER, and we
       will see

(Continued)

k

*******************************************

(1) allele numbers   GENE FREQS :  0.50000  0.50000

(2) allele numbers   GENE FREQS :  0.50000  0.50000

*******************************************

(a) SEE OR MODIFY A LOCUS

(b) DELETE LOCUS

(d) CHANGE ORDER TO CORRESPOND TO PEDIGREE FILE (NOT CHROMOSOME

ORDER)

(e) CHANGE LOCUS TYPE

*******************************************

Press letter to modify values

We need to change the first locus (the 6th column in the pre data) to
affection status format, so we type e:

e

ENTER LOCUS TO CHANGE

1

ENTER NEW LOCUS TYPE:

(a) BINARY FACTORS

(b) QUANTITATIVE TRAIT

(c) AFFECTION STATUS

(d) ALLELE NUMBERS

c

*******************************************

(1) affection status GENE FREQS :  0.50000  0.50000

(2) allele numbers   GENE FREQS :  0.50000  0.50000

*******************************************

(a) SEE OR MODIFY A LOCUS

(b) DELETE LOCUS

(d) CHANGE ORDER TO CORRESPOND TO PEDIGREE FILE (NOT CHROMOSOME

ORDER)

(e) CHANGE LOCUS TYPE

*******************************************

Press letter to modify values

Then we set the second locus (columns 7 and 8 in the pre data) to
numbered allele format by doing this:

e

ENTER LOCUS TO CHANGE

2

ENTER NEW LOCUS TYPE:

(a) BINARY FACTORS

(b) QUANTITATIVE TRAIT

(c) AFFECTION STATUS

(d) ALLELE NUMBERS

d

*******************************************

(1) affection status GENE FREQS :  0.50000  0.50000

(2) allele numbers   GENE FREQS :  0.50000  0.50000

*******************************************

(a) SEE OR MODIFY A LOCUS

(b) DELETE LOCUS

(d) CHANGE ORDER TO CORRESPOND TO PEDIGREE FILE (NOT CHROMOSOME

ORDER)

(e) CHANGE LOCUS TYPE

*******************************************

Press letter to modify values

f (return to the main menu)

********** PRESENT STATUS **************

(a) Number of loci               :    2

(c) Calculate Risk               :    N

(d) Mutation                     :    N

(e) Haplotype frequencies        :    N

(f) Locus Order                  :    1 2

(g) Interference                 :    N

(h) Recombination sex difference :    N

(j) Recombination values         :

0.100

*********** OTHER OPTIONS **************

(k) See or modify loci description

(l) See or modify recombination to vary

(n) Write datafile

(o) Exit

****************************************

Press letter to modify or see values

(l)    The recombination to vary, i.e. the recombination rate(s) at
       which you want LINKAGE to calculate the lod score.

Next we modify the recombination to vary:

l

*********************************************

(a) RECOMBINATION TO VARY             :          1

(b) STARTING VALUE                    : 0.1000

(c) INCREMENT                         : 0.1000

(d) FINISHING VALUE                   : 0.4500

********************************************

Press letter to modify values

b (I want to modify the starting value to 0)

ENTER NEW STARTING VALUE

0

*********************************************

(a) RECOMBINATION TO VARY             :          1

(b) STARTING VALUE                    : 0.0000

(c) INCREMENT                         : 0.1000

(d) FINISHING VALUE                   : 0.4500

********************************************

Press letter to modify values

d (I want to modify the finishing value to 0.5)

ENTER NEW FINISHING VALUE

0.5

*********************************************

(a) RECOMBINATION TO VARY             :          1

(b) STARTING VALUE                    : 0.0000

(c) INCREMENT                         : 0.1000

(d) FINISHING VALUE                   : 0.5000

********************************************

Press letter to modify values

e (I finished my parameter setting and want to return to the main menu)

********** PRESENT STATUS **************

(a) Number of loci               :    2

(c) Calculate Risk               :    N

(d) Mutation                     :    N

(e) Haplotype frequencies        :    N

(f) Locus Order                  :    1 2

(g) Interference                 :    N

(h) Recombination sex difference :    N

(j) Recombination values         :

0.000

*********** OTHER OPTIONS **************

(k) See or modify loci description

(l) See or modify recombination to vary

(n) Write datafile

(o) Exit

****************************************

Press letter to modify or see values

(m) & (n) Read and write a DATAFILE. You might already have a DATAFILE
on disk and want to make a few changes. You can read the old file by
choosing (m) and write a new file by choosing (n). The new file will
then contain both the changes you made and the other settings from
the old file. Here we will write a new file since we do not have an
old file to modify.

Now I have made all the changes needed to calculate the lod score,
and I will save the description to a DATAFILE which I name
sample.dat:

n

Enter output file name - a file by the same name will be overwritten!

Press only Enter to skip

sample.dat

********** PRESENT STATUS **************

(a) Number of loci               :    2

(c) Calculate Risk               :    N

(d) Mutation                     :    N

(e) Haplotype frequencies        :    N

(f) Locus Order                  :    1 2

(g) Interference                 :    N

(h) Recombination sex difference :    N

(j) Recombination values         :

0.000

*********** OTHER OPTIONS **************

(k) See or modify loci description

(l) See or modify recombination to vary

(n) Write datafile

(o) Exit

****************************************

Press letter to modify or see values

o (Now it’s time to exit)

yu@neyman:~/670soft %

yu@neyman:~/670soft % cat sample.dat

2 0 0 5  << NO. OF LOCI, RISK LOCUS, SEXLINKED (IF 1) PROGRAM
0 0.0 0.0 0 << MUT LOCUS, MUT MALE, MUT FEM, HAP FREQ (IF 1)
1  2
1   2  << AFFECTION, NO. OF ALLELES
0.50000  0.50000 << GENE FREQUENCIES
1 << NO. OF LIABILITY CLASSES
0 0  1.0000 << PENETRANCES
3   2  << ALLELE NUMBERS, NO. OF ALLELES
0.50000  0.50000 << GENE FREQUENCIES
0 0  << SEX DIFFERENCE, INTERFERENCE (IF 1 OR 2)
0.00000000 << RECOMBINATION VALUES
1 0.10000 0.50000 << REC VARIED, INCREMENT, FINISHING VALUE
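
Because each line of the DATAFILE carries a description after a << marker, the file is easy to inspect with a short script before it is handed to the next program. The Python sketch below is not part of LINKAGE. split_line is a hypothetical helper, and it assumes that everything after << is a comment, as the listing suggests. Reading the program code 5 as MLINK is an inference from choice (i) in preplink, not something documented in this tutorial.

```python
# Contents of sample.dat as written by preplink in this tutorial.
SAMPLE_DAT = """\
2 0 0 5  << NO. OF LOCI, RISK LOCUS, SEXLINKED (IF 1) PROGRAM
0 0.0 0.0 0 << MUT LOCUS, MUT MALE, MUT FEM, HAP FREQ (IF 1)
1  2
1   2  << AFFECTION, NO. OF ALLELES
0.50000  0.50000 << GENE FREQUENCIES
1 << NO. OF LIABILITY CLASSES
0 0  1.0000 << PENETRANCES
3   2  << ALLELE NUMBERS, NO. OF ALLELES
0.50000  0.50000 << GENE FREQUENCIES
0 0  << SEX DIFFERENCE, INTERFERENCE (IF 1 OR 2)
0.00000000 << RECOMBINATION VALUES
1 0.10000 0.50000 << REC VARIED, INCREMENT, FINISHING VALUE
"""


def split_line(line):
    """Split one DATAFILE line into its value tokens and its trailing comment."""
    values, _, comment = line.partition("<<")
    return values.split(), comment.strip()


rows = [split_line(line) for line in SAMPLE_DAT.splitlines() if line.strip()]

# First line: number of loci, risk locus, sex-linked flag, program code.
n_loci, risk_locus, sexlinked, program = (int(v) for v in rows[0][0])

# Last two lines: starting recombination value, then which value to vary,
# the increment and the finishing value.
start_theta = float(rows[-2][0][0])
rec_varied = int(rows[-1][0][0])
increment = float(rows[-1][0][1])
finish_theta = float(rows[-1][0][2])

print("loci:", n_loci, "program code:", program)
print("theta from", start_theta, "to", finish_theta, "in steps of", increment)
```

A quick check of this kind makes it easy to confirm that the starting value, increment and finishing value entered in preplink actually reached the file.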

Now copy sample.ped and sample.dat to pedfile.dat and datafile.dat,
respectively. The reason is that the next program, UNKNOWN, reads its
input from “pedfile.dat” and “datafile.dat” by default.

yu@neyman:~/670soft % cp sample.ped pedfile.dat

yu@neyman:~/670soft % cp sample.dat datafile.dat

Step 4: Invoke UNKNOWN

UNKNOWN is an auxiliary program. It infers possible genotypes for
individuals whose genotypes are unknown. It reads “pedfile.dat” and
“datafile.dat” in the current directory and produces temporary files
called ipedfile.dat and speedfile.dat as output. Check your current
directory after you invoke UNKNOWN.

yu@neyman:~/670soft % unknown

Program UNKNOWN version 5.23

The following maximum values are in effect:

20 loci

120 single locus genotypes

15 alleles at a single locus

500 individuals in one pedigree

3 marriage(s) for one male

3 quantitative factor(s) at a single locus

20 liability classes

17 binary codes at a single locus

Opening DATAFILE.DAT

Opening PEDFILE.DAT

YOU ARE USING LINKAGE (V5.23) WITH  2-POINT AUTOSOMAL DATA

Ped.  1

Ped.  2

yu@neyman:~/670soft %

Both ipedfile.dat and speedfile.dat will be used in the next step.
Leave them there! Each time you run unknown, you will get those two
files. If they already exist, they will be replaced.

Step 5: Invoke MLINK to finish the calculation.

The temporary files produced by UNKNOWN, together with datafile.dat,
are the input to MLINK.

Copyright (C) CEPH, University of Utah, and Columbia University 1990

The program constants are set to the following maxima:

7 loci in mapping problem (maxlocus)

15 alleles at a single locus (maxall)

19532 recombination probabilities (maxneed)

10000 maximum of censoring array (maxcensor)

256 haplotypes = n1 x n2 x ... where ni = current # alleles at locus i

32896 joint genotypes for a female

32896 joint genotypes for a male

500 individuals in all pedigrees combined (maxind)

50 pedigrees (maxped)

16 binary codes at a single locus (maxfact)

2 quantitative factor(s) at a single locus

10 liability classes

16 binary codes at a single locus

2.00 base scaling factor for likelihood (scale)

3.00 scale multiplier for each locus (scalemult)

0.00000 frequency for elimination of heterozygotes (minfreq)

YOU ARE USING LINKAGE (V5.1) WITH  2-POINT AUTOSOMAL DATA

Maxneed can be reduced to          7

-----------------------------------

-----------------------------------

THETAS  0.500

-----------------------------------

PEDIGREE |  LN LIKE  | LOG 10 LIKE

-----------------------------------

1   -11.090355    -4.816480

2    -8.317766    -3.612360

-----------------------------------

TOTALS      -19.408121    -8.428840

-2 LN(LIKE) =  3.88162421113569e+01 LOD SCORE =     0.000000

Maxcensor can be reduced to     -32749

-----------------------------------

-----------------------------------

THETAS  0.000

-----------------------------------

PEDIGREE |  LN LIKE  | LOG 10 LIKE

-----------------------------------

1 -100000000000000000000.000000 -43429448190325178368.000000

2    -6.931472    -3.010300

-----------------------------------

TOTALS    -100000000000000000000.000000 -43429448190325178368.000000

-2 LN(LIKE) =  2.00000000000000e+20 LOD SCORE = -43429448190325178368.000000

Maxcensor can be reduced to     -32749

-----------------------------------

-----------------------------------

THETAS  0.100

-----------------------------------

PEDIGREE |  LN LIKE  | LOG 10 LIKE

-----------------------------------

1   -13.133657    -5.703875

2    -7.246183    -3.146977

-----------------------------------

TOTALS      -20.379840    -8.850852

-2 LN(LIKE) =  4.07596798689245e+01 LOD SCORE =    -0.422012

Maxcensor can be reduced to     -32749

-----------------------------------

-----------------------------------

THETAS  0.200

-----------------------------------

PEDIGREE |  LN LIKE  | LOG 10 LIKE

-----------------------------------

1   -11.982929    -5.204120

2    -7.585398    -3.294297

-----------------------------------

TOTALS      -19.568327    -8.498417

-2 LN(LIKE) =  3.91366547344441e+01 LOD SCORE =    -0.069577

-----------------------------------

-----------------------------------

THETAS  0.300

-----------------------------------

PEDIGREE |  LN LIKE  | LOG 10 LIKE

-----------------------------------

1   -11.439062    -4.967921

2    -7.925724    -3.442098

-----------------------------------

TOTALS      -19.364786    -8.410020

-2 LN(LIKE) =  3.87295714843840e+01 LOD SCORE =     0.018820

-----------------------------------

-----------------------------------

THETAS  0.400

-----------------------------------

PEDIGREE |  LN LIKE  | LOG 10 LIKE

-----------------------------------

1   -11.171999    -4.851937

2    -8.204437    -3.563142

-----------------------------------

TOTALS      -19.376436    -8.415079

-2 LN(LIKE) =  3.87528727188240e+01 LOD SCORE =     0.013760

-----------------------------------

-----------------------------------

THETAS  0.500

-----------------------------------

PEDIGREE |  LN LIKE  | LOG 10 LIKE

-----------------------------------

1   -11.090355    -4.816480

2    -8.317766    -3.612360

-----------------------------------

TOTALS      -19.408121    -8.428840

-2 LN(LIKE) =  3.88162421113569e+01 LOD SCORE =     0.000000

yu@neyman:~/670soft %

You can see the results on the screen. Both likelihoods and lod scores
are calculated. A copy of the results is also automatically saved in
outfile.dat. Another file named stream.dat is also generated.
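
To see how the numbers in the output relate to each other, the lod scores can be recomputed from the TOTALS lines. The Python sketch below assumes that the lod score at a given theta is the total log10 likelihood at that theta minus the total at theta = 0.5, which is the relation the printed values follow. The names are illustrative. The theta = 0 block is left out because MLINK prints -1e20 there, which I read as a stand-in for the logarithm of a zero likelihood in pedigree 1; that reading is an assumption.

```python
import math

# TOTALS of the LOG 10 LIKE column, copied from the MLINK output above.
LOG10_LIKE = {
    0.1: -8.850852,
    0.2: -8.498417,
    0.3: -8.410020,
    0.4: -8.415079,
    0.5: -8.428840,
}

# LOD SCORE values printed by MLINK for the same thetas.
PRINTED_LOD = {0.1: -0.422012, 0.2: -0.069577, 0.3: 0.018820, 0.4: 0.013760, 0.5: 0.0}


def lod_scores(log10_like, null_theta=0.5):
    """Lod score at each theta relative to the no-linkage value theta = 0.5."""
    base = log10_like[null_theta]
    return {theta: value - base for theta, value in log10_like.items()}


lods = lod_scores(LOG10_LIKE)
best_theta = max(lods, key=lods.get)

# The LN LIKE and LOG 10 LIKE columns differ only by a factor of ln(10).
ln_total_at_half = -19.408121
log10_from_ln = ln_total_at_half / math.log(10)

for theta in sorted(lods):
    print(f"theta={theta:.1f}  lod={lods[theta]:+.6f}  printed={PRINTED_LOD[theta]:+.6f}")
print("largest lod on this grid is at theta =", best_theta)
```

On this grid the largest lod score is small and occurs at theta = 0.3, so these two sample pedigrees give essentially no evidence of linkage. The recomputed value at theta = 0.4 differs from the printed one in the last digit because the inputs are already rounded to six decimals.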