Phylogenomics is a term coined by Eisen and Hanawalt (1999) for a methodology that draws on the information gained from whole-genome sequencing. The method involves several steps, with feedback loops for correcting or reinterpreting the data as each component of the analysis proceeds. Many bioinformaticians have used this methodology in a less formally stated way, but Eisen and Hanawalt define and illustrate it elegantly. The data can be any genetic information available from whole genomes, i.e., gene loss, gene gain, gene duplication, pathway presence or partial presence, and the correlation between species trees and gene trees. One criticism of phylogenomics has been that horizontal gene transfer is not sufficiently accounted for. We base our systems evolution approach on the phylogenomics outline, with several important additions, numbered 12–17.

Systems Evolution

| Component | How is it determined | Uses of this component |
| --- | --- | --- |
| Gene analysis | | |
| (1) Database of genes of interest | Personal choice, characterized genes | Similarity searches (2) |
| (2) Searching for homologs | BLAST, PSI-BLAST | Presence/absence (4); gene tree (7) |
| (3) Functional predictions | Overlay known functions of genes onto gene tree | Prediction of phenotypes (6), functional evolution |
| Genome analysis | | |
| (4) Gene presence/absence in species | Searches (2) of complete genome sequences. Some refinement from evolutionary analysis (7, 10) | Evolutionary analysis (8, 10) |
| (5) Correlated presence/absence | Analyze presence/absence (4) in different species | Functional predictions (3), pathway evolution (11) |
| (6) Phenotype predictions | Combine functional predictions (3), presence/absence (4) and pathway evolution (11) | Identify universal activities |
| Evolutionary analysis | | |
| (7) Gene trees | Set homology threshold for searches (2) and use phylogenetic analysis of all homologs | Presence/absence (4); identifying evolutionary events (10); functional predictions (3) |
| (8) EDPs (Evolutionary Distribution Patterns) | Overlay gene presence/absence (4) onto species tree | Identifying gene evolutionary events; pathway evolution |
| (9) Congruence | Compare gene tree (7) to species tree | |
| (10) Gene evolution events | Analysis of gene tree (7), congruence (9) and EDPs (8) | |
| (11) Pathway evolution | Integrate gene evolution (10), evolutionary distribution (8) and correlated presence/absence (5) | |
| Structure analysis | | |
| (12) Structural homology | Search for homology of structure independent of sequence information; integrate into EDPs (8). Uses THoR in the Genome Analysis Environment with the HOMSTRAD database | Retrieves conserved structural domains within proteins, which may give information that is more ancient than sequence conservation |
| Genome architecture analysis | | |
| (13) Synteny of gene cluster | Two or more genes in an operon-like structure; check for homologues (2, 4), gene duplication (5) and evolutionary distribution (8), and integrate | Determine evolution of operon structure; integrate with species tree (9) |
| (14) Regulation elements | Search for major transcription control elements for pathways and enzyme systems (2) | Further characterize the evolution of operon structure |
| (15) Endosymbiosis | Search for homologues in eukaryotes (2) | Identifies the gene pool present prior to mitochondrial/chloroplast endosymbiosis; characterizes metabolisms prior to the endosymbiotic event |
| Metabolic pathway evaluation | | |
| (16) MPP (most parsimonious pathway) | Uses all the information gathered (1–15) to create a set of trees that describe the phylogenetic reconstruction of the pathway given the data. Each tree is evaluated for the least number of events or steps that explain the data | Quantitatively identifies the most parsimonious explanation for the data as the most likely evolutionary reconstruction |
| Numerical modeling | | |
| (17) Reconstruct ecosystem-based models for a period in Earth’s history | Use the MPP reconstructions (16) of pathways and systems for mapping of geophysical and possible ecosystem reconstructions described by the rock record and the numerical modeling | Allows a system of checks and balances for correct identification of biological innovations important during the time period, and provides a system in which to modify the models incorporating astrobiological data |
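
Components 8 and 16 can be illustrated with a small sketch: overlay a gene's presence/absence on a species tree, count the minimum number of state changes needed to explain the pattern, and prefer the candidate tree that needs the fewest. The source does not say which algorithm scores "the least number of events", so Fitch parsimony is used here purely as a stand-in. The species names, tree topologies, presence/absence values and function names are all hypothetical. Fitch parsimony counts unweighted changes only and does not distinguish gain, loss, duplication or horizontal transfer.

```python
# Hypothetical species trees as nested tuples; leaves are species names.
TREE = ((("A", "B"), "C"), ("D", "E"))
ALT_TREE = ((("A", "D"), "C"), ("B", "E"))

# Hypothetical presence (1) / absence (0) of one gene in each species.
CLUSTERED = {"A": 1, "B": 1, "C": 0, "D": 0, "E": 0}
PATCHY = {"A": 1, "B": 0, "C": 0, "D": 1, "E": 0}


def fitch(node, states):
    """Return (candidate ancestral states, minimum changes) for a subtree."""
    if isinstance(node, str):  # leaf: the observed state, no changes yet
        return {states[node]}, 0
    left_set, left_cost = fitch(node[0], states)
    right_set, right_cost = fitch(node[1], states)
    common = left_set & right_set
    if common:  # children agree: no extra change needed at this node
        return common, left_cost + right_cost
    # children disagree: one change is required on a branch below this node
    return left_set | right_set, left_cost + right_cost + 1


def min_events(tree, states):
    """Minimum number of presence/absence changes on the given tree."""
    return fitch(tree, states)[1]


def most_parsimonious(trees, states):
    """Pick the candidate tree that explains the pattern with fewest changes."""
    return min(trees, key=lambda t: min_events(t, states))


if __name__ == "__main__":
    for name, pattern in (("clustered", CLUSTERED), ("patchy", PATCHY)):
        print(name, min_events(TREE, pattern), min_events(ALT_TREE, pattern))
```

A pattern that needs more changes on the species tree than on an alternative topology is the kind of incongruence that the congruence and gene evolution components (9, 10) would then have to interpret, for instance as independent gains, losses or horizontal transfer.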