The exercises use programs in the PHYLIP package (Felsenstein, 1995; Phylogeny Inference Package, ver. 3.5c. Seattle, Department of Genetics, University of Washington). For further information see http://evolution.genetics.washington.edu/phylip.html. During the course, the programs included with PHYLIP can be accessed in two ways: from the web interface at Institut Pasteur:
http://bioweb.pasteur.fr/seqanal/phylogeny/phylip-uk.html
or from the web interface at KVL implemented by Peter Sestoft: http://www.dina.kvl.dk/~sestoft/bsa/dinaws/phylogeny.html. The exercises are presented in relation to the Pasteur server. For larger jobs, output will be returned by email. Random number seeds should be of the form 4n + 1, e.g. 137. For each job submitted on the Pasteur server, follow the link that comes up; otherwise the data will be lost. Example: "From now, this files will remain accessible for 10 days at: http://bioweb.pasteur.fr/seqanal/tmp/neighbor/A41284110643970/".
You can also install PHYLIP on your PC, where it runs from the DOS command prompt. This is not used in the exercises but is mentioned for your information. To use this possibility, download the package from the Felsenstein homepage (address above) and proceed from the DOS prompt.
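The seed rule above (a number of the form 4n + 1) is easy to check before submitting a job. The following is only a small sketch; the function names are made up for illustration and are not part of PHYLIP or of either web server.

```python
def is_valid_seed(seed: int) -> bool:
    """Return True if seed is a positive integer of the form 4n + 1."""
    return seed > 0 and seed % 4 == 1


def next_valid_seed(seed: int) -> int:
    """Round an integer up to the nearest positive value of the form 4n + 1."""
    while not is_valid_seed(seed):
        seed += 1
    return seed


# 137 = 4 * 34 + 1, the example given in the text.
print(is_valid_seed(137))
print(next_valid_seed(138))
```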
The purpose of the exercise is to explore the programs implementing the parsimony, neighbour joining and maximum likelihood algorithms. The expected phylogeny of primates on the basis of morphological data is that man and chimpanzee are most closely related, with gorillas, orangutans and gibbons progressively more distantly related, in that order (Benton, 1997. Vertebrate Palaeontology 2nd ed. Chapman & Hall). The expected phylogeny can be described by the tree formula: gibbon(orang-utan(gorilla(chimpanzee, man))).
Parsimony. Proceed as in exercise 1 with the DNAPARS program. Write down the resulting tree, rooting it with gibbon. You will need the tree for the following comparison. Save the treefile on your local PC, e.g. on the desktop; you will need it later.
Neighbour joining. First generate a distance matrix with the DNADIST program using standard settings (Jukes and Cantor matrix). Study the output file and then use it as input to the neighbour joining program, NEIGHBOR. Write down the tree with gibbon as root. Save the treefile on your local PC, e.g. on the desktop; you will need it later. Also try the substitution model F84 with DNADIST and run NEIGHBOR again. Did it change anything, and why? Hint: transition/transversion bias.
Maximum likelihood. Use the fastDNAml program with standard settings. Repeat the analysis and observe whether the final log likelihood value is constant, and if not, why. Write down the branching order of the tree with the highest ln likelihood and compare it with those from the other two methods.
Bootstrap. Any of the analyses above can be supplemented by bootstrap analysis. In PHYLIP this is done as a three-step procedure. First, e.g. 100 bootstrap resamples are generated with the SEQBOOT program from the original sequence data set. Then all the resamples are analysed by one of the phylogenetic methods with the M (multiple) option set to 100. Finally, the consensus tree of the 100 trees is obtained with CONSENSE.
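To make the first bootstrap step concrete, the sketch below resamples alignment columns with replacement, which is the idea behind the resampling described above. It is not the SEQBOOT code: the function names are hypothetical, and the toy alignment uses only the first ten sites of the Exercise 2 sequences.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one replicate by drawing alignment columns with replacement."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


def make_replicates(alignment, n_replicates=100, seed=137):
    """Return n_replicates resampled alignments (seed chosen as 4n + 1)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# First ten sites of the Exercise 2 data set.
alignment = {
    "chimpanzee": "AAGCTTCACC",
    "gibbonxxxx": "AAGCTTTACA",
    "gorillaexx": "AAGCTTCACC",
    "homosapien": "AAGCTTCACC",
    "orangutang": "AAGCTTCACC",
}

replicates = make_replicates(alignment)
print(len(replicates), "replicates of", len(replicates[0]["chimpanzee"]), "sites")
```

Each replicate has the same number of sites as the original data; some columns appear several times and others not at all. Each replicate would then be analysed separately before the consensus step.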
Some of these steps can be bypassed by running PHYLIP on the Pasteur server. It is important to use the file Treefile, which contains all the tree formulas, as input for CONSENSE. Try to bootstrap the parsimony and the neighbour joining analyses. Start with the advanced forms of the two analyses and you will be guided through the analysis just by clicking. In the last step you need to choose CONSENSE. Write the bootstrap proportions on the trees you made under Parsimony and Neighbour joining (the maximum likelihood analysis can be bootstrapped but would be too slow for the exercise). You may only write down a given bootstrap value if the monophyletic group is recognized in the original tree. Compare the two trees made from parsimony and neighbour joining with the bootstrap values included. Where are the bootstrap proportions highest? Do high bootstrap values indicate high consistency of the method or just that the bootstrap approach is reproducing the original data better?
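The rule that a bootstrap value may only be written down for a group present in the original tree can be sketched as follows. Trees are represented as nested tuples rooted with gibbon; the four replicate trees are invented purely for illustration, and the function names are hypothetical.

```python
from collections import Counter


def leaves(tree):
    """Return the set of leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Return the leaf sets of all internal nodes below the root."""
    found = set()

    def visit(node, is_root):
        if isinstance(node, str):
            return
        if not is_root:
            found.add(leaves(node))
        for child in node:
            visit(child, False)

    visit(tree, True)
    return found


def bootstrap_support(original, replicates):
    """Percentage of replicate trees containing each group of the original tree."""
    counts = Counter()
    for rep in replicates:
        counts.update(clades(rep))
    return {group: 100.0 * counts[group] / len(replicates)
            for group in clades(original)}


# Expected tree from the text; the replicate trees are made up.
original = ("gibbon", ("orangutan", ("gorilla", ("chimpanzee", "man"))))
alternative = ("gibbon", ("orangutan", ("chimpanzee", ("gorilla", "man"))))
replicates = [original, original, original, alternative]

for group, value in bootstrap_support(original, replicates).items():
    print(sorted(group), value)
```

The group (gorilla, man) occurs in one replicate but is not reported, because it is not a group of the original tree.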
lpt1) or edit it further with a graphics programme. After saving as a Windows metafile, the file can be imported into Word or PowerPoint.
B. Treeview. This program is used to draw and manipulate trees. It might be your only option for branch swapping, i.e. rotation around a node; in this way the tree can be manipulated to illustrate results better. If the outgroup is located in the middle of the tree, the tree cannot be shown in a publication, and the outgroup needs to be rotated to either the top or the bottom of the tree. You have to install this program on the local PC. Locate the server by searching for "treeview" on Google.com, then download, unzip and install the program. The input is given as a file with the tree formula (((a, b)d)c)d, opened by clicking File > Open. Take the raw tree formula [(a,b)c)d; etc.] from exercise 2 and treat it further. Most important is the swap function. Save the tree (File > Save as graphic > Windows metafile). Insert the tree into Word or PowerPoint and treat the text further.
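Rotation around a node changes only the order in which the branches are drawn, not the relationships. The sketch below illustrates this on a tree held as nested tuples; it is not how Treeview works internally, and the helper names are hypothetical.

```python
def to_newick(tree):
    """Write a nested-tuple tree as a tree formula (without the final ';')."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


def leaf_set(tree):
    """Return the set of leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    result = frozenset()
    for child in tree:
        result |= leaf_set(child)
    return result


def rotate(tree, group):
    """Reverse the branch order at the node whose leaves equal `group`."""
    if isinstance(tree, str):
        return tree
    children = tuple(rotate(child, group) for child in tree)
    if leaf_set(tree) == frozenset(group):
        children = children[::-1]
    return children


tree = ((("a", "b"), "c"), "d")
rotated = rotate(tree, {"a", "b", "c"})
print(to_newick(tree) + ";")
print(to_newick(rotated) + ";")
```

Both formulas describe the same tree; only the drawing order at the node joining a, b and c differs. Rotating at the root in the same way moves an outgroup from the top of the drawing to the bottom.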
Exercise 2
5 846 chimpanzeeAAGCTTCACC gibbonxxxxAAGCTTTACA gorillaexxAAGCTTCACC homosapienAAGCTTCACC orangutangAAGCTTCACC CATTATTATT CCCTGCTATT CATTATTATT CATTACTATT CCCTACTGTT ATCATAATTC ATCATAATCC ATCATAATTC ATCATAATCC CTGCCTAGCA CTGCCTTGCA CTGCCTAGCA CTGCCTAGCA CTGCCTAGCA TCTCCCAAGG TATCTCGAGG TCTCTCAAGG TCTCTCAAGG GGCGCAATTA GGTGCAACCG GGCGCAGTTG GGCGCAGTCA GGCGCAACCA AACTCAAATT AACTCAAACT AACTCAAACT AACTCAAACT AACTCAAACT ACTTCAAACT GCTCCAAGCC ACTCCAAACC ACTTCAAACT TCCTCATAAT TCCTCATAAT TTCTTATAAT TTCTCATAAT CCCTCATGAT ATGAACGCAC ACGAACGAAC ACGAACGAAC ACGAACGCAC ACGAACGAAC CTACTCCCAC TTACTCCCAC CTACTCCCAC CTACTCCCAC CGCCCACGGA CGCCCACGGA TGCCCACGGA CGCCCACGGA TGCCCATGGA CCACAGTCGC TCACAGCCGC CCACAGCCGC TCACAGTCGC CCACAGCCGC TAATAGCCTT TGATAGCCTT TAATAGCCCT TAATAGCTTT CTTACATCCT CTAACCTCTT CTTACATCAT CTTACATCCT CTCACATCCT
ATCATAATCC TCTCTCAAGG CCTTCAAACT CTACTCCCCC TAATAGCCCT TTGATGACTC CTGATGACTC TTGATGACTT TTGATGACTT CTGATGACTT ATCTCCTAGG ACCTCCTAGG ACCTACTAGG ACCTACTGGG ACCTTCTAGG ACCACTCTCC ACTACTATTA ACCACCCTTT ATCACTCTCC ATCACCATCC CCTCTACATG CCTTTACATA CCTTTATATA CCTCTACATA TCTCTATATA ATAACATAAA AAAACATAAA CCAACATAAA ACAACATAAA ACAACATAAA CTATCCCCCA CTCTTCCCCC CTATCCCCCA CTATCCCCCA CTATCCCCCA CACCTCCTGT TACTCCCTGT CACCTCCTGT TTCCTCTTGT CGCCTACTGT ACAGAGGCTC ATAGAGGCTC ACAGAGGCTC ACAGAGGCTT ATAGGGCCCC ATTCATATCC ACTCACTATC ACTCATACCC ACTCATGCCC ACTCNTCACT ACAGCCATCC ACAGCTATCC ACAGCTATCC ACAGCTATCC ACAGCTATCC CTAGCAAGCC GCAGCAAGCC CTGGCAAGCC CTAGCAAGCC CTAGCAAGCC GGAACTCTCC TGAACTCTTC AGAGCTCTCC AGAACTCTCT AGAACTCTCC TACTCACAGG CACTCACCGG TACTTACAGG TACTTACAGG TACTAACAGG TTTACCACAA TTTATCATAA TTTACCACAA TTTACCACAA TTCACCACAA GCCCTCATTC ACCCTCACTC ACCCTCATTT ACCCTCATTC ACCTTCTTTC TCCTCCTTCT TCCTCCTCCT TCCTCCTCCT TTCTCCTCCT TCCTCCTCTT AAATATAGTT AAACATAGTT AAATATAGTT AAATATAGTT AAATATAGTT ACGACCCCTT GAAACCTCTT ACAACCCCTT ACGACCCCTT ACAACCCCTT CCATGCCTGA CCATGTATGA CCGTGCTTGA CCATGTCTAA CCATGTGTGA GTTGGTCTTA ATTGGTCTTA ATTGGTCTTA ATTGGTCTTA CTTGGTCTTA TCGCTAACCT TCGCTAACCT TCGCCAACCT TCGCTAACCT TCACTAACCT GTGCTAGTAA GTACTAATGG GTACTAGTAA GTGCTAGTAA GTACTAATAG ATTCAACATA GCTCAACGTA ATCTAACATA ACTCAACATA ACTCAACATA CACAATGAGG CACAACGAGG CACAATGAGG CACAATGAGG CACAACGAGG ACACGAGAAA ACACGAGAAA ACACGAGAAA ACACGAGAAA ACACGCGAAA ATCCCTCAAT AACCCTCAAC ATCCCTCAAC ATCCCTCAAC ATCCCTCAAC TAACCAAAAC TAATCAAAAC TAACCAAAAC TAACCAAAAC TAACCAAAAC ATTTACCGAG GCTTACCGAG ATTTACCGAG ATTTACCGAG ATTTACCGAG CAACATGGCT CAACATGGCT CAACATGGCT CAACATGGCT CAACATGGCT GGCCCCAAAA GGACCCAAAA GGACCCAAAA GGCCCCAAAA GGATCCAAAA CGCCCTACCC CGCCCTACCC CGCCTTACCC CGCCTTACCC TGCCCTACCA CCTCATTCTC CCTCCTTCTC CCACATTCTC CCACATTCTC CCATATTCTC CTAATCACAG CTAATCACGG CTAATCACAG CTAGTCACAG CTAATCACAA CTCACTCACC CACACTTACA CCCACTCACA CTCACTCACC TACACCCACA ATACTCTCAT ACATATTAAT ACATCCTCAT ACACCCTCAT ATACCCTCAT CCTGATATCA 
CCTAACATCA CCCGATATTA CCCGACATCA CCCAGCATCA ATCAGATTGT ATTAGATTGT ATCAGATTGT ATCAGATTGT ATTAGATTGT AAAGCTTATA AAAGCCCACA AAAGCTCGTA AAAGCTCACA AAAGCTCACA TTCTCAACTT TTCTCAACTT TTCTCAACTT TTCTCAACTT TTCTCAGCTT ATTTTGGTGC ATTTTGGTGC ATTTTGGTGC ATTTTGGTGC ATTTTGGTGC CCTACCATTA CCCACTATTA CCCACCATTA CCCACTATTA CCCACCATCA CTGATCAAAT CTGGGCAAAC CTGATCAAAT CTGATCAAAT TTGATCTAAC CCCTGTACTC CCCTATACTC CCCTGTACTC CCCTATACTC CCCTATACTC CACCACATTA CACCACATTA CACCACATCA CACCACATTA CACCACATCA ATTTTTACAC ACTTATGCAC ATTCATGCAC GTTCATACAC GCTCATACAC TCACTGGATT TTACTGGCTT TCACCGGGTT TTACCGGGTT TCGCTGGGTT GAATCTGACA GAATCTAACA GAATCTGATA GAATCTGACA GAATCTAATA AGAACTGCTA AGAACTGCTA AGAGCTGCTA AGAACTGCTA AGAACTGCTA TTAAAGGATA TTAAAGGATA TTAAAGGATA TTAAAGGATA TTAAAGGATA AACTCCAAAT AACTCCAAAT AACTCCAAAT AACTCCAAAT AACTCCAAAT
AAAAGTAATA AAAAGTAATA AAAAGTAATA AAAAGTAATA AAAAGTAACA TAATTCTCCC TAATTCCCCC TAATTCCCCC TAATTCCCCC TAATCCCCCC TATCCCCATT TACCCGCACT TACCCCCATT TACCCCCATT TACCCCCACT TTTCCCCACA ATTTCCCACA CTTCCCCACA CTTCCCCACA TATCCCAACA ACTGGCACTG ACTGACACTG GCTGACACTG ACTGACACTG ACTGATGCTG
ACCATGTATA GCAATGTACA ACTATGTACG ACCATGCACA GCCATGTTTA CATCCTCACC CATTACAGCC TATCCTTACC CATCCTTACC CATTACCGCT ATGTGAAATC ACGTAAAAAT ACGTAAAATC ATGTAAAATC ATGTAAAAAC ACAATATTCA ATAATATTCA ACAATATTTC ACAATATTCA ACAATATTTA AGCAACAACC AACTGCAACC AGCAACAACC AGCCACAACC AACAACCACC
CTACCATAAC CCACCATAGC CTACCATAAC CTACTATAAC CCACCATAAC ACCCTCATTA ACCCTTATTA ACCTTCATCA ACCCTCGTTA ACCCTCATTA CATTATCGCG GACCATTGCC TATCGTCGCA CATTGTCGCA GGCCATCGCA TATGCCTAGA TGTGCACAGA TATGCCTAGA TGTGCCTAGA TCTGCCTAGG CAAACAACCC CAAACGCTAG CAAACAATTC CAAACAACCC CAGACACTAC
CACCTTAACC CATTCTAACG CACCTTAGCC CACCCTAACC TGCCCTCACC ACCCTAACAA ACCCCAATAA ATCCTAACAA ACCCTAACAA ACCCCAACAA TCCACCTTTA TCTACCTTTA TCCACCTTTA TCCACCTTTA TCCGCCTTTA CCAAGAAGCT CCAAGAAACC CCAAGAAGCT CCAAGAAGTT ACAAGAAACC AGCTCTCCCT AACTCTCCCT AACTCTCCCT AGCTCTCCCT AACTCTCACT
CTAACTCCCT CTAACCTCCC CTAACTTCCT CTGACTTCCC TTAACTTCCC AAAAAACTCA AAAGAACTTA AAAAAGCTCA AAAAAACTCA AAAAAACCCA TCATTAGCCT TAATCAGCCT TCATCAGCCT TTATCAGTCT CTATCAGCCT ATTATCTCAA ATTATTTCAA ATTATCTCAA ATTATCTCGA ATCGTCACAA AAGCTT AAGCTT AAGCTT AAGCTT AAGCTT
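For the neighbour joining part of this exercise, the Jukes and Cantor distance between two aligned sequences can be sketched as d = -(3/4) ln(1 - (4/3) p), where p is the proportion of sites that differ. The code below is a simplified illustration and not the DNADIST implementation; in particular, skipping sites with characters other than A, C, G and T is an assumption made here.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, using only sites where both are A, C, G or T."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(seq_a, seq_b):
    """Jukes-Cantor corrected distance between two aligned sequences."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        raise ValueError("distance undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# First ten sites of chimpanzee and gibbon from the data above.
chimpanzee = "AAGCTTCACC"
gibbon = "AAGCTTTACA"
print(p_distance(chimpanzee, gibbon))
print(round(jukes_cantor(chimpanzee, gibbon), 4))
```

The corrected distance is larger than the raw proportion of differences because it allows for multiple substitutions at the same site.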
Exercise 3
5 149 chimpanzeeSFTGAIILIIAHGLTSSLLFCLANSNYERTHSRIIILSQGLQTLLPLIAF gibbonxxxxSFTGATVLIIAHGLTSSLLFCLANSNYERTHSRIIILSRGLQALLPLIAF gorillaexxSFTGAVVLIIAHGLTSSLLFCLANSNYERTHSRIIILSQGLQTLLPLIAL homosapienSFTGAVILIIAHGLTSSLLFCLANSNYERTHSRIIILSQGLQTLLPLIAF orangutangSFTGATTLMIAHGLTSSLLFCLANSNYERTHSRIIILSQGLQTLLPLIAL LLASLANLALPPTINLLGELSVLVTSFSSNTTLLLTGFNILITALYS LAASLANLALPPTINLLGELFVLMASFSANTTITLTGLNVLITALYS LLASLANLALPPTINLLGELSVLVTTFSSNTTLLLTGSNILITALYS LLASLANLALPPTINLLGELSVLVTTFSSNITLLLTGLNILVTALYS LLASLTNLALPPTINLLGELSVLIAIFSSNITILLTGLNILITTLYS LYMFTTTQGSLTHHINNIKPSFTRENTLIFLHLSPILLLSLNPDIITGF LYIFIITQGTLTHHIKNIKPSLTRENILILMHLFPLLLLTLNPNIITGF LYIFTTTQGPLTHHITNIKPSFTRENILIFMHLSPILLLSLNPDIITGF LYIFTTTQGSLTHHINNIKPSFTRENTLMFIHLSPILLLSLNPDIITGF LYIFTTTQGTPTHHINNIKPSFTRENTLMLIHLSPILLLSLNPSIIAGF TSC TPC TSC SSC AYC
Exercise 5.
>hinfatpD
AVIDVEFPQD AVPKVYDALK VESGLTLEVQ QQLGGGVVRC IALGTSDGLK RGLKVENTNN PIQVPVGTKT LGRIMNVLGE PIDEQGAIGE EERWAIHRSA PSYEEQSNST ELLETGIKVI DLICPFAKGG KVGLFGGAGV GKTVNMMELI RNIAIEHSGY SVFAGVGERT REGNDFYHEM KDSNVLDKVS LVYGQMNEPP GNRLRVALTG LTMAEKFRDE GRDVLFFVDN IYRYTLAGTE VSALLGRMPS AVGYQPTLAE EMGVLQERIT STKTGSITSV QAVYVPADDL TDPSPATTFA HLDSTVVLSR QIASLGIYPA VDPLDSTSRQ LDPLVVGQEH YDVARGVQGI LQRYKELKDI IAILGMDELS EEDKLVVARA RKIERFLSQP FFVAEVFTGS PGKYVTLKDT IRGFKGILDG EYDHIPEQAF Y >pmatpD AVIDVEFPQD AVPKVYDALN VETGLVLEVQ QQLGGGVVRC IAMGSSDGLK RGLSVTNTNN PISVPVGTKT LGRIMNVLGE PIDEQGEIGA EENWSIHRAP PSYEEQSNST ELLETGIKVI DLVCPFAKGG KVGLFGGAGV GKTVNMMELI RNIAIEHSGY SVFAGVGERT REGNDFYHEM KDSNVLDKVS LVYGQMNEPP GNRLRVALTG LTMAEKFRDE GRDVLFFVDN IYRYTLAGTE VSALLGRMPS AVGYQPTLAE EMGVLQERIT STKTGSITSV QAVYVPADDL TDPSPATTFA HLDSTVVLSR QIASLGIYPA VDPLESTSRQ LDPLVVGEEH YNVARGVQTT LQRYKELKDI IAILGMDELS EEDKLVVARA RKIERFLSQP FFVAEVFNGT PGKYVPLKET IRGFKGILDG EYDHIPEQAF Y >ecoliATPD MATGKIVQVIGAVVDVEFPQDAVPRVYDALEVQNGNERLVLEVQQQLGGGIVRTIAMGSSDGLRRGLDV K DLEHPIEVPVGKATLGRIMNVLGEPVDMKGEIGEEERWAIHRAAPSYEELSNSQELLETGIKVIDLMCP F AKGGKVGLFGGAGVGKTVNMMELIRNIAIEHSGYSVFAGVGERTREGNDFYHEMTDSNVIDKVSLVYGQ M NEPPGNRLRVALTGLTMAEKFRDEGRDVLLFVDNIYRYTLAGTEVSALLGRMPSAVGYQPTLAEEMGVL Q ERITSTKTGSITSVQAVYVPADDLTDPSPATTFAHLDATVVLSRQIASLGIYPAVDPLDSTSRQLDPLV V GQEHYDTARGVQSILQRYQELKDIIAILGMDELSEEDKLVVARARKIQRFLSQPFFVAEVFTGSPGKYV S LKDTIRGFKGIMEGEYDHLPEQAFYMVGSIEEAVEKAKKL >ecoliATPB MASENMTPQDYIGHHLNNLQLDLRTFSLVDPQNPPATFWTINIDSMFFSVVLGLLFLVLFRSVAKKATS G VPGKFQTAIELVIGFVNGSVKDMYHGKSKLIAPLALTIFVWVFLMNLMDLLPIDLLPYIAEHVLGLPAL R VVPSADVNVTLSMALGVFILILFYSIKMKGIGGFTKELTLQPFNHWAFIPVNLILEGVSLLSKPVSLGL R LFGNMYAGELIFILIAGLLPWWSQWILNVPWAIFHILIITLQAFIFMVLTIVYLSMASEEH >yersatpB MSASGEISTPRDYIGHHLNHLQLDLRTFELVNPHSTGPATFWTLNIDSLFFSVVLGLAFLLVFRKVAAS A TSGVPGKLQTAVELIIGFVDNSVRDMYHGKSKVIAPLALTVFVWVLLMNMMDLLPIDLLPYIGEHVFGL P ALRVVPTADVSITLSMALGVFILIIFYSIKMKGVGGFTKELTMQPFNHPIFIPVNLILEGVSLLSKPLS L 
GLRLFGNMYAGELIFILIAGLLPWWSQWMLSVPWAIFHILIITLQAFIFMVLTIVYLSMASEEH >hinfatpB MSGQTTSEYISHHLSFLKTGDGFWNVHIDTLFFSILAAVIFLFVFSRVGKKATTGVPGKMQCLVEIVVE W VNGIVKENFHGPRNVVAPLALTIFCWVFIMNAIDLIPVDFLPQFAGLFGIHYLRAVPTADISATLGMSI C VFFLILFYTIKSKGFKGLVKEYTLHPFNHWAFIPVNFILETVTLLAKPISLAFRLFGNMYAGELIFILI A VMYSANMAIAALGIPLHLAWAIFHILVITLQAFIFMMLTVVYLSIAYNKADH
Exercise 1
DNA parsimony algorithm, version 3.573c
Name         Sequences
----         ---------
seq1         AAG
seq2         ..A
seq3         GGA
seq4         .GA
One most parsimonious tree found:
          +--seq4
       +--3
    +--2  +--seq3
    !  !
  --1  +-----seq2
    !
    +--------seq1
remember: this is an unrooted tree!
requires a total of 3.000
steps in each site:
         0   1   2   3   4   5   6   7   8   9
     *-----------------------------------------
    0!       1   1   1
From    To     Any Steps?    State at upper node
                             ( . means same as in the node below it on tree)
         1                   AAR
  1      2       maybe       ..A
  2      3        yes        .G.
  3   seq4        no         ...
  3   seq3        yes        G..
  2   seq2        no         ...
  1   seq1       maybe       ..G
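The total of 3 steps in the output above can be checked by hand or with a small sketch of the Fitch counting procedure. This is an illustration, not the DNAPARS code. The dots in the sequence listing are read here as "same as seq1" (so seq2 = AAA and seq4 = AGA), and the unrooted tree is rooted on the branch to seq1 for counting, which does not affect the number of steps.

```python
def fitch_site(tree, states):
    """Return (possible states, steps) for one site on a rooted binary tree."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_site(left, states)
    right_set, right_steps = fitch_site(right, states)
    common = left_set & right_set
    if common:
        # The two branches can share a state: no extra step is needed.
        return common, left_steps + right_steps
    # No shared state: one substitution is required at this node.
    return left_set | right_set, left_steps + right_steps + 1


def parsimony_steps(tree, seqs):
    """Return the number of steps required at each site."""
    n_sites = len(next(iter(seqs.values())))
    return [fitch_site(tree, {name: seq[i] for name, seq in seqs.items()})[1]
            for i in range(n_sites)]


seqs = {"seq1": "AAG", "seq2": "AAA", "seq3": "GGA", "seq4": "AGA"}
tree = ("seq1", ("seq2", ("seq3", "seq4")))

steps = parsimony_steps(tree, seqs)
print(steps, sum(steps))
```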
Exercise 2a
DNA parsimony algorithm, version 3.573c
One most parsimonious tree found:
       +--------gorillaexx
    +--2
    !  !  +-----homosapien
    !  +--3
  --1     !  +--orangutang
    !     +--4
    !        +--gibbonxxxx
    !
    +-----------chimpanzee
remember: this is an unrooted tree!
requires a total of 330.000
Exercise 2b
Neighbor-Joining/UPGMA method version 3.573c
          +-------gibbonxxxx
       +--1
    +--2  +-----orangutang
    !  !
    !  +---gorillaexx
    !
  --3-homosapien
    !
    +--chimpanzee
remember: this is an unrooted tree!
Between        And            Length
-------        ---            ------
   3             2            0.00318
   2             1            0.03598
   1          gibbonxxxx      0.12602
   1          orangutang      0.09198
   2          gorillaexx      0.05777
   3          homosapien      0.04015
   3          chimpanzee      0.05195
Exercise 2c
Transition/transversion ratio = 2.000000
(Transition/transversion parameter = 1.653039)
       +-homosapien
    +--2
    !  !  +-----orangutang
    !  +--3
    !     +-------gibbonxxxx
    !
  --1---gorillaexx
    !
    +--chimpanzee
remember: this is an unrooted tree!
Ln Likelihood = -2514.48557
Examined 17 trees
Between        And            Length      Approx. Confidence Limits
-------        ---            ------      ------- ---------- ------
[Branch table not recoverable: only the "Between" column (1 2 2 3 3 1 1) and a ** flag on each of the seven branches survive.]
Exercise 2d
(Bootstrapping of parsimony)
Majority-rule and strict consensus tree program, version 3.573c
Species in order:
  gorillaexx
  homosapien
  orangutang
  gibbonxxxx
  chimpanzee
Sets included in the consensus tree
Set (species in order)     How many times out of 100.00
..**.                      100.00
.***.                       73.67
Sets NOT included in consensus tree:
Set (species in order)     How many times out of 100.00
..***                       20.17
.*..*                        6.17
CONSENSUS TREE:
the numbers at the forks indicate the number
of times the group consisting of the species
which are to the right of that fork occurred
among the trees, out of 100.00 trees
                 +----gibbonxxxx
            +-100.0
       +-73.7    +----orangutang
       !    !
  +-100.0   +---------homosapien
  !    !
  !    +--------------chimpanzee
  !
  +-------------------gorillaexx
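The dot-and-star sets in the CONSENSE output can be hard to read at first. Each character stands for one species, in the listed order, and a star marks membership of the group. A minimal sketch of the decoding follows; the function name is hypothetical, and the values are those shown in the output above.

```python
def decode_set(pattern, species):
    """Return the species marked with '*' in a CONSENSE set pattern."""
    if len(pattern) != len(species):
        raise ValueError("pattern length must match the number of species")
    return [name for mark, name in zip(pattern, species) if mark == "*"]


species = ["gorillaexx", "homosapien", "orangutang", "gibbonxxxx", "chimpanzee"]
included = {"..**.": 100.00, ".***.": 73.67}
not_included = {"..***": 20.17, ".*..*": 6.17}

for label, sets in (("included", included), ("not included", not_included)):
    for pattern, times in sets.items():
        print(label, pattern, decode_set(pattern, species), times)
```

Read this way, the group of orangutang and gibbonxxxx occurs in all bootstrap trees, while the group that also includes homosapien occurs 73.67 times out of 100.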