Stochastic Models in Molecular Evolution and Genetics

Stat 684

Spring 1999

TTh 9:25-10:40am

Instructor: Marek Kimmel

Office: DH 1083 Ext. 5255

kimmel@rice.edu


Designed as an intensive introductory course in modern stochastic models in molecular evolutionary biology and population genetics. Starting from statistical models describing primary and secondary structures of nucleic acids, the material will lead to the construction of phylogenetic trees reflecting the relatedness of nucleic acids in different organisms. Then, stochastic models for different types of mutations will be considered, followed by models of genetic drift, in particular the coalescent. Models of mutation and drift will be used to describe the evolutionary dynamics of neutral genetic loci. Applications include estimation of mutation rates of loci and relatedness of populations, mapping of disease genes, and inference concerning the past demography of populations.


I. Biological Sequences

Book: Durbin, Eddy, Krogh and Mitchison, "Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids", Cambridge University Press, Cambridge 1998

Probabilistic models are becoming increasingly important in analysing the huge amount of data being produced by large-scale DNA-sequencing efforts such as the Human Genome Project. For example, hidden Markov models are used for analysing biological sequences, linguistic-grammar-based probabilistic models for identifying RNA secondary structure, and evolutionary models for inferring phylogenies of sequences from different organisms. This book gives a unified, up-to-date and self-contained account, with a Bayesian slant, of such methods, and more generally of probabilistic methods of sequence analysis. Written by an interdisciplinary team of authors, it aims to be accessible to molecular biologists, computer scientists, and mathematicians with no formal knowledge of the other fields, and at the same time to present the state of the art in this new and highly important field.

  1. Pairwise sequence alignment
  2. Multiple alignments
  3. Hidden Markov models
  4. Hidden Markov models applied to biological sequences
  5. The Chomsky hierarchy of formal grammars
  6. RNA and stochastic context-free grammars
  7. Phylogenetic trees
  8. Phylogeny and alignment

II. Dynamics of Mutation, Genetic Drift and Recombination

No particular text will be used. A good reference for models of mutation is Li's "Molecular Evolution" (Sinauer Associates 1997) and for population genetics Hartl and Clark's "Introduction to Population Genetics" (Sinauer Associates, several editions exist).

  1. Models of mutation: Markov chain (MC) mutations, infinite allele model (IAM), infinite sites model (ISM), stepwise mutation models (SMMs)
  2. Genetic drift: Fisher-Wright model, Moran time-discrete model, Fisher-Wright-Moran time-continuous model, coalescence
  3. Neutral theory of evolution: background
  4. Neutral evolution: modeling and estimation for DNA sequence loci, microsatellite loci and single-nucleotide polymorphisms
  5. Recombination and mapping of genes of rare diseases
  6. Branching processes and genetic models: ancestral trees for branching processes, approximation of the Fisher-Wright model
  7. Demography and genetics: past population changes coded in DNA

Grading

Weekly homework (50% of the grade)

Two tests (50% of the grade)