INTRODUCTION
General Information
TREECON is a software package developed primarily for the construction and drawing of phylogenetic trees based on evolutionary distances computed from nucleic acid and amino acid sequences. In distance methods, the evolutionary distance is computed for all pairs of sequences and a phylogenetic tree is inferred by considering the relationship between these distance values. Different algorithms are available to construct a phylogenetic tree from these evolutionary distances, and a number of them are implemented in TREECON. When estimating the evolutionary distances between sequences, it is preferable to correct for superimposed mutations, and several equations for this are implemented in the package. Programs for rooting unrooted evolutionary trees, for drawing the tree on the screen, and for saving the tree are also included, as well as several other tools.
TREECON is simple to use and requires very little prior knowledge of computers. The package should therefore be particularly suited to molecular biologists and evolutionists who want to build evolutionary trees from their own molecular data. Starting from a simple ASCII text file containing nucleic acid or amino acid sequences, with the gaps required for mutual alignment, one can produce publishable trees in a user-friendly and straightforward way.
TREECON for Windows has a standard MS-Windows interface including pull-down and pop-up menus, dialog boxes and scrollable lists. It is therefore assumed that users are familiar with the basic interface elements of MS-Windows. The program runs on IBM-compatible computers (80486 and higher) and requires the Microsoft Windows 3.x, Windows 95 or Windows NT operating system, a hard disk, a mouse and at least 8 Mbytes of RAM. The software package consists of several executables which are managed through a principal menu.
As dynamic memory allocation is used throughout the program, the size of the data is constrained only by the available memory. The main advantages of TREECON for Windows over the older DOS version of the program are device independence, the multitasking environment, and the possibility of displaying large trees containing hundreds of sequences. Furthermore, the standard Windows interface makes the package more user-friendly. TREECON for Windows costs $75 or a comparable amount in local currency. This fee mainly supports my work and the purchase of new computer hardware and software, and additionally defrays the costs of diskettes and mailing. The fee is paid only once and includes updates and new releases of the package.
Of course, there are still some deficiencies in the current version, but developing TREECON is only part of my interests and responsibilities. However, I will try to further improve the package and to add interesting features. All suggestions are very welcome. Since TREECON is improved continuously, it is impossible for me to inform users of every improvement. Please check the TREECON web site at http://biocwww.uia.ac.be/u/yvdp/treeconw.html for the latest information about TREECON for Windows.
Van de Peer, Y., De Wachter, R. (1994) TREECON for Windows: a software package for the construction and drawing of evolutionary trees for the Microsoft Windows environment. Comput. Applic. Biosci. 10, 569-570.
A second paper describes the implementation of our substitution rate calibration method, which considers the substitution rates of the individual nucleotides in a sequence alignment (see further) when computing evolutionary distances:
Van de Peer, Y., De Wachter, R. (1997) Construction of evolutionary distance trees with TREECON for Windows: accounting for variation in nucleotide substitution rate among sites. Comput. Applic. Biosci. 13, 227-230.
A more detailed paper announcing the DOS version of the program was previously published:
Van de Peer, Y., De Wachter, R. (1993) TREECON: a software package for the construction and drawing of evolutionary trees. Comput. Applic. Biosci. 9, 177-182.
Acknowledgements
I want to thank all the people in our research group for using and testing TREECON in their analyses and for their encouragement and stimulating conversations. Furthermore, I am especially grateful to Stefan Rensing of the University of Freiburg in Germany, who tested TREECON extensively, made many helpful suggestions and reported several bugs. Special thanks also to Peter De Rijk for his help with some programming problems and to Gert Van der Auwera for his help in drawing unrooted trees. James S. Farris is gratefully acknowledged for sharing his code to convert trees saved in the New Hampshire bracket format. I also want to thank all other individuals who made helpful suggestions, reported bugs, and stimulated me in this work. Last but not least, I would like to thank everyone who has purchased TREECON for Windows.
Yves Van de Peer
Note
Of course, a user manual is never complete. Some inaccuracies may exist while a few sections may be too brief or missing. Nevertheless, with some good will and a little bit of trial and error, it should be no problem to discover what the TREECON program does and what it does not.
Windows 7
It is not possible to run TREECON on Windows 7, neither natively nor in any compatibility or administrator mode. TREECON must instead be installed and run in a virtual Windows environment.
o By default, the TREECON program will be placed in the directory c:\treeconw\programs\
o The test files will be automatically placed in the directory c:\treeconw\data\
o Start TREECON for Windows by double-clicking the TREECON icon.
Windows 3.x
Since TREECON for Windows is compiled in 32-bit mode, it is necessary to install the Win32 extension first. Win32 is an operating-system extension to Windows 3.x that provides support for developing and running 32-bit Windows executables. 32-bit executables run faster, make use of all available RAM, and will run on both 16-bit and 32-bit versions of Windows and on future processors hosting Windows. Therefore, if Win32 is not installed on your computer, put the diskette labeled Win32 in the floppy drive and copy its content (a file named PW1118.EXE) to a directory on the hard disk. Run this self-extracting executable (e.g. by double-clicking it). It creates several new files, among them W32S125.EXE. Executing W32S125.EXE in turn creates several new files, among them setup.exe. Run setup.exe and installation of the Win32 extension will proceed automatically. When this is done, your computer is ready to run 32-bit applications.
Installation of TREECON for Windows is then very simple. Insert the floppy labeled TREECON for Windows in drive A, choose File|Run from the Program Manager and type a:\install. Installation of TREECON for Windows will then proceed automatically.
o By default, the TREECON program will be placed in the directory c:\treeconw\programs\
o The test files will be automatically placed in the directory c:\treeconw\data\
o Start TREECON for Windows by double-clicking the TREECON icon.
GETTING STARTED
When the TREECON program is started, the principal menu appears. In this main menu of TREECON, the following options are available, and are usually performed one after the other:
o Distance estimation
o Infer tree topology
o Root unrooted trees
o Draw phylogenetic trees
In the next chapters, the different steps involved in the construction of pairwise distance trees will be discussed. For the impatient ones, the next section briefly describes how to construct a first tree using the default options of the program.
o Start TREECON by double-clicking the TREECON icon
o Choose the Distance estimation option from the principal menu
o Choose Start distance estimation from the distance estimation menu
o Select and open the file test.seq under directory c:\treeconw\data\ (if TREECON was installed in c:\treeconw\programs\)
o Press OK in the Sequence type menu: Nucleic acid sequences and TREECON sequence format are the default values
o Press the select all button in the Select sequences menu. The names of 20 small ribosomal subunit RNA sequences will now be highlighted. Press OK
A set of sequences can be selected by pressing the left mouse-button while holding the Ctrl-key (as in the Windows file manager when selecting multiple files).
o Press the OK button in the Options menu. Distances will be computed by the Jukes and Cantor equation (see further)
o Press the OK button in the Job status menu when it says finished. All evolutionary distances have now been computed and we are ready to infer a tree topology
o Select Infer tree topology from the principal menu
o Choose Start inferring tree topology from the inferring tree topology menu
o Press the OK button in the Options menu. A neighbor-joining tree will be inferred
o Press the OK button in the Job status menu when it says finished. A tree topology has been inferred by neighbor-joining
o Select Root unrooted trees from the principal menu. Since neighbor-joining infers an unrooted tree topology, it is necessary to root the unrooted tree before we can display it on the screen
When a tree topology is inferred with clustering methods such as UPGMA or WPGMA, the rooting procedure should be skipped because these methods infer rooted tree topologies.
o Choose Start rooting unrooted trees from the root unrooted trees menu
o Press OK in the Options menu. The tree will be rooted with a single sequence that is a suitable outgroup to the other sequences.
o Select the last sequence in the list. This is the red alga Palmaria palmata
o Press the OK button in the Job status menu when it says finished. The tree is now rooted with Palmaria palmata
o Select Draw Phylogenetic tree from the principal menu
o In the TREECON drawing program, select File|Open|(new) tree. The tree should now be displayed on the screen.
To construct a tree with bootstrap values, repeat this procedure but select bootstrap analysis in the Options menu at each step of the process.
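As a rough illustration of what the distance estimation and bootstrap steps involve, the Python sketch below computes Jukes and Cantor distances for all pairs of aligned sequences and draws one bootstrap replicate by resampling alignment columns with replacement. It is not TREECON's own code: the function names are hypothetical, and skipping every position that has a gap in either sequence is an assumption made here for simplicity.

```python
import math
import random

GAP = "-"

def jukes_cantor(seq_a, seq_b):
    """Jukes and Cantor distance between two aligned nucleotide sequences.

    Assumption: positions with a gap in either sequence are ignored.
    """
    compared = differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == GAP or b == GAP:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    p = differences / compared  # observed fraction of differing positions
    if p >= 0.75:
        raise ValueError("distance is undefined for p >= 0.75")
    # Correction for superimposed substitutions: d = -3/4 ln(1 - 4p/3)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def distance_matrix(alignment):
    """Distances for all unordered pairs, keyed by (name_1, name_2)."""
    names = list(alignment)
    return {
        (x, y): jukes_cantor(alignment[x], alignment[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }

def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one replicate)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}

alignment = {
    "Homo sapiens":    "AGUCGAGUC---GCAGAAACGCAUGAC-GACCACAUUUU-CCUUGCAAAG",
    "Pan paniscus":    "AGUCGCGUCG--GCAGAAACGCAUGACGGACCACAUCAU-CCUUGCAAAG",
    "Gorilla gorilla": "AGUCGCGUCG--GCAGAUACGCAUCACGGAC-ACAUCAUCCCUCGCAGAG",
    "Pongo pigmaeus":  "AGUCGCGUCGAAGCAGA--CGCAUGACGGACCACAUCAUCCCUUGCAGAG",
}

distances = distance_matrix(alignment)
replicate = bootstrap_replicate(alignment, random.Random(1))
replicate_distances = distance_matrix(replicate)
```

In a bootstrap analysis, the resampling and the distance computation are repeated many times, a tree is inferred from each replicate, and the frequency with which each grouping recurs is reported on the final tree.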
example 1
50
Homo sapiens
AGUCGAGUC---GCAGAAACGCAUGAC-GACCACAUUUU-CCUUGCAAAG
Pan paniscus
AGUCGCGUCG--GCAGAAACGCAUGACGGACCACAUCAU-CCUUGCAAAG
Gorilla gorilla
AGUCGCGUCG--GCAGAUACGCAUCACGGAC-ACAUCAUCCCUCGCAGAG
Pongo pigmaeus
AGUCGCGUCGAAGCAGA--CGCAUGACGGACCACAUCAUCCCUUGCAGAG

example 2
50
Homo sapiens
AGUCGAGUC-- -GCAGAAAC GCAUGAC-GA CCACAUUUU- CCUUGCAAAG
Pan paniscus
AGUCGCGUCG- -GCAGAAAC GCAUGACGGA CCACAUCAU- CCUUGCAAAG
Gorilla gorilla
AGUCGCGUCG- -GCAGAUAC GCAUCACGGA C-ACAUCAUC CCUCGCAGAG
Pongo pigmaeus
AGUCGCGUCGA AGCAGA--C GCAUGACGGA CCACAUCAUC CCUUGCAGAG

example 3
50
Homo sapiens
AGUCGAGUC---GCAGAAAC
GCAUGAC-GACCACAUUUU-
CCUUGCAAAG
Pan paniscus
AGUCGCGUCG--GCAGAAAC
GCAUGACGGACCACAUCAU-
CCUUGCAAAG
Gorilla gorilla
AGUCGCGUCG--GCAGAUAC
GCAUCACGGAC-ACAUCAUC
CCUCGCAGAG
Pongo pigmaeus
AGUCGCGUCGAAGCAGA--C
GCAUGACGGACCACAUCAUC
CCUUGCAGAG

example 4 (RFLP/AFLP/RAPD data)
50
sample1
111101001110110110110111111111010111110111111111110111

-GACCACAUUUU-CCUUGCAAAG
GGACCACAUCAU-CCUUGCAAAG
GGAC-ACAUCAUCCCUCGCAGAG
GGACCACAUCAUCCCUUGCAGAG

example 2
4 50
Homo
Pan pani
Gorilla
Pongo

Examples of the PHYLIP non-interleaved (sequential) format

example 1
4 50
Homo      AGUCGAGUC---GCAGAAACGCAUGAC
-GACCACAUUUU-CCUUGCAAAG
Pan pani  AGUCGCGUCG--GCAGAAACGCAUGAC
GGACCACAUCAU-CCUUGCAAAG
Gorilla   AGUCGCGUCG--GCAGAUACGCAUCAC
GGAC-ACAUCAUCCCUCGCAGAG
Pongo     AGUCGCGUCGAAGCAGA--CGCAUGAC
GGACCACAUCAUCCCUUGCAGAG

example 2
4 50
Homo      (! ten characters of the species name MUST be present)
AGUCGAGUC---GCAGAAACGCAUGAC-
GACCACAUUUU-CCUUGCAAAG
Pan pani
AGUCGCGUCG--GCAGAAACGCAUGACG
GACCACAUCAU-CCUUGCAAAG
Gorilla
AGUCGCGUCG--GCAGAUACGCAUCACG
GAC-ACAUCAUCCCUCGCAGAG
Pongo
AGUCGCGUCGAAGCAGA--CGCAUGACG
GACCACAUCAUCCCUUGCAGAG
Once the input file is selected, you can choose the organisms (sequences) from which to construct the tree. To select some but not all sequences, hold down the Ctrl key while clicking, just as you would when selecting several files in the Windows file manager. A selection of sequences can also be saved to a file and retrieved later. If the sequence names are not listed properly after an input file is selected, the input file most probably contains a format error.
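Format errors of this kind can often be found before the file is loaded. The Python sketch below reads a file in the PHYLIP non-interleaved (sequential) format and reports the first inconsistency it meets. It is a minimal illustration rather than TREECON's own reader: the function name is hypothetical, and it assumes that each name occupies the first ten characters of its line and that blanks inside sequence lines are ignored.

```python
def read_phylip_sequential(text):
    """Parse PHYLIP sequential text into a {name: sequence} dictionary.

    Assumptions: names fill the first ten characters of the line that
    starts each entry; blanks inside sequence lines are ignored.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty input")
    header = lines[0].split()
    if len(header) < 2 or not all(h.isdigit() for h in header[:2]):
        raise ValueError("first line must give sequence count and length")
    n_seqs, length = int(header[0]), int(header[1])

    sequences = {}
    pos = 1
    for _ in range(n_seqs):
        if pos >= len(lines):
            raise ValueError("fewer sequences than announced in the header")
        name = lines[pos][:10].strip()
        residues = lines[pos][10:].replace(" ", "")
        pos += 1
        # Keep reading continuation lines until the sequence is complete.
        while len(residues) < length and pos < len(lines):
            residues += lines[pos].replace(" ", "")
            pos += 1
        if len(residues) != length:
            raise ValueError(
                f"{name!r}: expected {length} characters, found {len(residues)}"
            )
        sequences[name] = residues
    return sequences

rows = [
    ("Homo",     "AGUCGAGUC---GCAGAAACGCAUGAC", "-GACCACAUUUU-CCUUGCAAAG"),
    ("Pan pani", "AGUCGCGUCG--GCAGAAACGCAUGAC", "GGACCACAUCAU-CCUUGCAAAG"),
    ("Gorilla",  "AGUCGCGUCG--GCAGAUACGCAUCAC", "GGAC-ACAUCAUCCCUCGCAGAG"),
    ("Pongo",    "AGUCGCGUCGAAGCAGA--CGCAUGAC", "GGACCACAUCAUCCCUUGCAGAG"),
]
# Pad every name to ten characters, as the format requires.
sample = "4 50\n" + "\n".join(
    f"{name:<10}{first}\n{rest}" for name, first, rest in rows
)
parsed = read_phylip_sequential(sample)
```

A name shorter than ten characters that is not padded with blanks shifts the start of the sequence, which this check reports as a length mismatch.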