Darren Soanes
Genetic Code
BLOSUM 62 Matrix
Phylogenetic analysis
Phylogenetic analysis programs take an alignment of protein sequences and attempt to produce a phylogenetic tree showing the evolutionary relationships between the sequences. The user can select an amino acid substitution matrix and the number of gamma rate categories; the program estimates the proportion of invariant sites. Programs use these parameters and the protein alignment to estimate the evolutionary distance between sequences, then calculate the topology and branch lengths of the final tree.
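As a rough illustration of estimating an evolutionary distance from an alignment, the sketch below computes the proportion of differing residues (p-distance) between two aligned sequences and applies a Poisson correction, d = -ln(1 - p). This is a deliberately simple stand-in: it ignores the substitution matrix, gamma rate categories and invariant sites that real programs use, and the function names are made up for this example.

```python
import math


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance d = -ln(1 - p), allowing for multiple hits at a site."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        raise ValueError("sequences are too divergent for this correction")
    return -math.log(1.0 - p)
```

The corrected distance is always at least as large as the p-distance, because some sites will have changed more than once.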
Distance Methods
Evolutionary distance is calculated for all pairs of taxa. UPGMA assumes the rate of substitution is constant. Least squares allows different rates of substitution in different branches. Minimum evolution (ME) chooses the topology with the smallest sum of branch lengths, but can take a long time to compute. The neighbour joining (NJ) method is a simplified version of ME and is much quicker.
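To make the simplest of these methods concrete, here is a minimal UPGMA sketch. It assumes the pairwise distances are supplied as a square symmetric matrix; the function name and the nested-tuple tree representation are choices made for this example only. At each step the two closest clusters are merged and the distance from the new cluster to every other cluster is the size-weighted average of the old distances.

```python
def upgma(names, matrix):
    """Build a rooted tree as nested tuples from a symmetric distance matrix.

    Returns (tree, merge_heights); each height is half the distance between
    the two clusters joined at that step, reflecting the constant-rate assumption.
    """
    # Active clusters: id -> (subtree, number of leaves)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    dist = {(i, j): matrix[i][j]
            for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    merge_heights = []
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of clusters
        merge_heights.append(dist[(i, j)] / 2.0)
        tree_i, n_i = clusters.pop(i)
        tree_j, n_j = clusters.pop(j)
        # Distance from the merged cluster to each remaining cluster
        new_dist = {}
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            new_dist[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        # Drop every distance that involved the two merged clusters
        dist = {pair: v for pair, v in dist.items()
                if i not in pair and j not in pair}
        dist.update(new_dist)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree, merge_heights
```

Neighbour joining follows a similar merge loop but picks the pair to join using a rate-corrected criterion, so it does not require a constant substitution rate.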
Maximum parsimony
For each topology, the smallest number of amino acid substitutions that could explain the evolutionary process is calculated. The topology that requires the fewest substitutions is chosen as the best one.
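One common way to count the minimum number of substitutions on a fixed topology is Fitch's algorithm, sketched below. The slides do not say which algorithm a given program uses, so treat this as one possible illustration. Trees are assumed to be nested 2-tuples of taxon names, the alignment a dict of equal-length sequences, and all function names are invented for the example.

```python
def fitch_score(tree, column):
    """Return (possible ancestral states, minimum substitutions) for one alignment column."""
    if isinstance(tree, str):  # leaf: its observed residue, no substitutions
        return {column[tree]}, 0
    left, right = tree
    left_states, left_cost = fitch_score(left, column)
    right_states, right_cost = fitch_score(right, column)
    shared = left_states & right_states
    if shared:
        # Children agree on at least one state: no extra substitution needed
        return shared, left_cost + right_cost
    # Children disagree: one substitution is required on this part of the tree
    return left_states | right_states, left_cost + right_cost + 1


def parsimony_length(tree, alignment):
    """Total minimum number of substitutions over all columns of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_score(tree, column)[1]
    return total


def best_topology(topologies, alignment):
    """Pick the candidate topology that needs the fewest substitutions."""
    return min(topologies, key=lambda t: parsimony_length(t, alignment))
```

In practice the number of possible topologies grows very quickly with the number of taxa, so real programs search heuristically rather than scoring every tree.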
Bootstrapping
Tests the reliability of a tree. The initial protein alignment is resampled (by sampling columns at random, with replacement). Tree construction is repeated for each resampled alignment. For each group of taxa in the original tree, the percentage of resampled trees that contain the same group is determined.
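The two bootstrap steps can be sketched as follows: build a pseudo-replicate alignment by drawing columns with replacement, and compute the percentage of replicate trees that contain each group from the original tree. The tree-building step itself is left out. Groups are assumed to be represented as frozensets of taxon names, and the function names are illustrative.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    # The same sampled column indices are applied to every sequence
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


def clade_support(original_clades, replicate_clade_sets):
    """Percentage of replicate trees that contain each clade of the original tree."""
    n = len(replicate_clade_sets)
    return {clade: 100.0 * sum(clade in rep for rep in replicate_clade_sets) / n
            for clade in original_clades}
```

A typical call would pass `random.Random(seed)` as `rng` so that the replicates are reproducible.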
Bayesian methods
A large number of trees with high likelihood is sampled. Posterior probabilities are calculated for different events of interest. The Markov chain Monte Carlo (MCMC) method is used to generate the samples of trees. MrBayes uses these methods.
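Sampling whole trees is too involved for a short example, so the sketch below applies the same idea to a single parameter: the distance between two sequences. It is a toy Metropolis sampler, not what MrBayes does. It assumes a simple Poisson model in which a site differs with probability 1 - exp(-d), an exponential prior on d, and illustrative default settings. The posterior probability of an event is then the fraction of samples in which the event holds.

```python
import math
import random


def log_posterior(d, n_sites, n_diff, prior_rate=1.0):
    """Log of (likelihood x prior), up to a constant, for distance d."""
    if d <= 0:
        return float("-inf")  # the prior gives zero probability to d <= 0
    log_lik = n_diff * math.log(1.0 - math.exp(-d)) - (n_sites - n_diff) * d
    return log_lik - prior_rate * d  # exponential prior (an assumption)


def metropolis(n_sites, n_diff, n_steps=20000, burn_in=2000, step=0.05, seed=1):
    """Draw samples of d from its posterior using a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    d = 0.5  # arbitrary starting value
    current = log_posterior(d, n_sites, n_diff)
    samples = []
    for i in range(n_steps):
        proposal = d + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, n_sites, n_diff)
        # Accept uphill moves always, downhill moves with probability equal to the ratio
        if candidate >= current or rng.random() < math.exp(candidate - current):
            d, current = proposal, candidate
        if i >= burn_in:
            samples.append(d)
    return samples


def posterior_probability(samples, event):
    """Fraction of posterior samples for which the event is true."""
    return sum(1 for s in samples if event(s)) / len(samples)
```

For example, `posterior_probability(samples, lambda d: d < 0.3)` estimates the posterior probability that the distance is below 0.3 under these assumptions.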
Taxon sampling
Take the initial protein sequence. Decide which range of species you are interested in. Use BLAST to find homologous sequences in databases, either the NCBI database or individual genome databases.
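If BLAST results are saved in the standard 12-column tabular format (`-outfmt 6` in BLAST+), candidate homologues can be shortlisted with a few lines of code. The sketch below keeps hits at or below an E-value cutoff. The cutoff of 1e-5 and the function name are assumptions made for illustration, not values taken from the slides.

```python
def filter_blast_hits(lines, max_evalue=1e-5):
    """Return (subject id, E-value) for tabular BLAST hits at or below max_evalue.

    Expects the default 12 tab-separated columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject_id = fields[1]
        evalue = float(fields[10])
        if evalue <= max_evalue:
            hits.append((subject_id, evalue))
    return hits
```

The function accepts any iterable of lines, so an open file handle can be passed directly.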
Creating tree
Take the alignment produced by Gblocks and use the program of choice to generate a tree (using the substitution model suggested by ModelGenerator and specifying the number of gamma rate categories; 4 is sufficient). Different programs use different file formats; use Readseq to convert between them. Use a tree viewing program to look at a graphical representation of the tree (TreeView, TreeDyn).
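To show what a format conversion involves, the sketch below converts an aligned FASTA file into sequential PHYLIP format, with names padded or truncated to 10 characters. It is a minimal illustration of the kind of job Readseq does, not a replacement for it, and the function names are made up for this example.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict of name -> sequence (name = first word of the header)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequences may span several lines
    return {n: "".join(parts) for n, parts in records.items()}


def to_phylip(records):
    """Write aligned sequences in sequential PHYLIP format."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    lines = [f" {len(records)} {lengths.pop()}"]  # header: number of taxa, number of sites
    for name, seq in records.items():
        lines.append(f"{name[:10]:<10}{seq}")  # name field is exactly 10 characters
    return "\n".join(lines) + "\n"
```

Truncating names to 10 characters can make two taxa indistinguishable, which is one of the file format problems that a dedicated converter helps to catch.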
Gene duplication
Paralogues
A, B and C are different species and are different paralogues of the same gene
Out-paralogues
In-paralogues
[Figure: tree diagram; labels: 30 proteins, ascomycetes, yeasts, 60 proteins, filamentous ascomycetes, oomycete, fungi, Phytophthora sojae, Aspergillus oryzae]
Summary
Phylogenetic methods can be used to analyse protein sequences and produce models of the evolution of a particular protein encoding gene. Comparison with a species tree can identify events such as duplication, gene loss and lateral gene transfer.
Workshop task
Look at the evolution of genes encoding two types of phosphoglycerate mutase (PGM) in fungi.
Cofactor-independent PGM (iPGM) has two bound Mn(II) ions at its active site.
3PG + Enzyme ⇌ PG + P-Enzyme ⇌ 2PG + Enzyme
Structure of iPGM
Structure of dPGM
Task
Use BLAST search to find PGM protein sequences in a sample of fungal species. Use these to create phylogenetic trees showing the evolution of genes encoding these enzymes.
Alignment (ClustalW)