
TREECON for Windows user manual

INTRODUCTION
General Information
TREECON is a software package developed primarily for the construction and drawing of phylogenetic trees based on evolutionary distances computed from nucleic acid and amino acid sequences. In distance methods, the evolutionary distance is computed for all pairs of sequences and a phylogenetic tree is inferred by considering the relationship between these distance values. Different algorithms are available to construct a phylogenetic tree starting from these evolutionary distances, and a number of them are implemented in TREECON. In estimating the evolutionary distances between sequences it is preferable to correct for superimposed mutations, and several equations for this are implemented in the package. Programs for rooting unrooted evolutionary trees, for drawing the tree on the screen, and for saving the tree are also included, as well as several other tools.

TREECON is simple to use, and the prior knowledge about computers it requires is kept to an absolute minimum. The package should therefore be particularly suited to molecular biologists and evolutionists who want to build evolutionary trees based on their own molecular data. Starting from a simple ASCII text file containing nucleic acid or amino acid sequences, with the gaps required for mutual alignment, one can produce publishable trees in a user-friendly and straightforward way.

TREECON for Windows has a standard MS-Windows interface including pull-down and pop-up menus, dialog boxes and scrollable lists. It is therefore assumed that users are familiar with the basic interface elements of MS-Windows. The program runs on IBM-compatible computers (80486 and higher) and requires the Microsoft Windows(TM) 3.x, Windows 95 or Windows NT operating system, a hard disk, a mouse and at least 8 Mbytes of RAM. The software package consists of several executables which are managed through a principal menu. As dynamic memory allocation is used throughout the program, the size of the data is constrained only by the available memory.

The main advantages of TREECON for Windows over the older DOS version of the program are device independence, the multitasking environment, and the possibility of displaying large trees containing hundreds of sequences. Furthermore, the standard Windows interface makes the package more user-friendly.

TREECON for Windows costs $75 or a comparable amount in local currency. This fee is mainly to support my work and to buy new computer hardware and software, and additionally to defray the costs of diskettes and mailing expenses. The fee is paid only once and includes updates and new releases of the package.

Of course, there are still some deficiencies in the current version, but developing TREECON is only part of my interests and responsibilities. However, I will try to further improve the package and to add interesting features. All suggestions are very welcome. Since TREECON is improved continuously, it is impossible for me to inform users of every improvement. Please check the TREECON web site at http://biocwww.uia.ac.be/u/yvdp/treeconw.html for the latest information about TREECON for Windows.

References and citation


A paper has been written presenting TREECON for Windows. If you have used the program for the construction and/or drawing of the evolutionary trees in a paper that you have written, please cite one of the following references:

Van de Peer, Y., De Wachter, R. (1994) TREECON for Windows: a software package for the construction and drawing of evolutionary trees for the Microsoft Windows environment. Comput. Applic. Biosci. 10, 569-570.

A second paper describes the implementation of our substitution rate calibration method, which is a method that considers the substitution rates of the individual nucleotides in a sequence alignment (see further) in the computation of evolutionary distances:

Van de Peer, Y., De Wachter, R. (1997) Construction of evolutionary distance trees with TREECON for Windows: accounting for variation in nucleotide substitution rate among sites. Comput. Applic. Biosci. 13, 227-230.

A more detailed paper announcing the DOS version of the program was previously published:

Van de Peer, Y., De Wachter, R. (1993) TREECON: a software package for the construction and drawing of evolutionary trees. Comput. Applic. Biosci. 9, 177-182.

A reprint of any article in which TREECON is cited would be highly appreciated!

Acknowledgements
I want to thank all the people in our research group for using and testing TREECON in their analyses and for their encouragement and stimulating conversations. Furthermore, I am especially grateful to Stefan Rensing of the University of Freiburg in Germany, who tested TREECON extensively, made many helpful suggestions and reported several bugs. Special thanks also to Peter De Rijk for his help with some programming problems and to Gert Van der Auwera for his help in drawing unrooted trees. James S. Farris is gratefully acknowledged for sharing his code to convert trees saved in the New Hampshire bracket format. I also want to thank all other individuals who made helpful suggestions, reported bugs, and stimulated me in this work. Last but not least, I would like to thank everyone who has purchased TREECON for Windows.

Yves Van de Peer

Note
Of course, a user manual is never complete. Some inaccuracies may exist while a few sections may be too brief or missing. Nevertheless, with some good will and a little bit of trial and error, it should be no problem to discover what the TREECON program does and what it does not.

INSTALLATION OF TREECON FOR WINDOWS


At the moment, two different installation procedures are available. One is for IBM-compatible computers running Windows 95 or Windows NT (up to Windows XP); the other is for computers running the older Windows 3.1.

Windows 7
It is not possible to run TREECON on Windows 7, either natively or in any compatibility or administrator mode. It is necessary to install and run TREECON in a virtual Windows environment instead.

Windows 95, Windows NT and Windows XP


When Windows 95, Windows NT or Windows XP is used as the operating system, installation is very simple. Just insert the floppy in drive a, choose Add/Remove Programs from the Control Panel and select a:\setup. The Control Panel can be found via the Windows task bar under Start|Settings|Control Panel. Installation of TREECON for Windows will then proceed automatically.

o By default, the TREECON program will be placed in the directory c:\treeconw\programs\
o The test files will be automatically placed in the directory c:\treeconw\data\
o Start TREECON for Windows by double-clicking the TREECON icon.

Windows 3.x
Since TREECON for Windows is compiled in 32-bit mode, it is necessary to install the Win32 extension first. Win32 is an operating-system extension to Windows 3.x that provides support for developing and running 32-bit Windows executables. 32-bit executables run faster, make use of all available RAM, and will run on both 16- and 32-bit versions of Windows and on future processors hosting Windows.

Therefore, if Win32 is not installed on your computer, put the diskette labeled Win32 in the floppy drive and copy its content (a file named PW1118.EXE) to a directory on the hard disk. Run this self-extracting executable (e.g. by double-clicking it). Several new files will be created, among them a file named W32S125.EXE. Executing that file in turn creates several new files, among them setup.exe. Run setup.exe and installation of the Win32 extension will proceed automatically. When this is done, your computer is ready to run 32-bit applications.

Installation of TREECON for Windows is then very simple. Just insert the floppy labeled TREECON for Windows in drive a, choose File|Run from the Program Manager and type a:\install. Installation of TREECON for Windows will then proceed automatically.

o By default, the TREECON program will be placed in the directory c:\treeconw\programs\
o The test files will be automatically placed in the directory c:\treeconw\data\
o Start TREECON for Windows by double-clicking the TREECON icon.

GETTING STARTED
When the TREECON program is started, the principal menu appears. In this main menu of TREECON, the following options are available, and are usually performed one after the other:
o Distance estimation
o Infer tree topology
o Root unrooted trees
o Draw phylogenetic trees

Additionally, buttons are available for the following items:

o Tools
o Help
o About (TREECON)
o Quit (return to Windows)

In the next chapters, the different steps involved in the construction of pairwise distance trees will be discussed. For the impatient ones, the next section briefly describes how to construct a first tree using the default options of the program.

Making a first tree using the default options


The next example shows how to create a neighbor-joining tree:

o Start TREECON by double-clicking the TREECON icon
o Choose the Distance estimation option from the principal menu
o Choose Start distance estimation from the distance estimation menu
o Select and open the file test.seq under directory c:\treeconw\data\ (if TREECON was installed in c:\treeconw\programs\)
o Press OK in the Sequence type menu: Nucleic acid sequences and TREECON sequence format are the default values
o Press the select all button in the Select sequences menu. The names of 20 small ribosomal subunit RNA sequences will now be highlighted. Press OK

A set of sequences can be selected by pressing the left mouse-button while holding the Ctrl-key (as in the Windows file manager when selecting multiple files).

o Press the OK button in the Options menu. Distances will be computed by the Jukes and Cantor equation (see further)
o Press the OK button in the Job status menu when it says finished. All evolutionary distances have now been computed and we are ready to infer a tree topology
o Select Infer tree topology from the principal menu
o Choose Start inferring tree topology from the inferring tree topology menu
o Press the OK button in the Options menu. A neighbor-joining tree will be inferred
o Press the OK button in the Job status menu when it says finished. A tree topology has been inferred by neighbor-joining
o Select Root unrooted trees from the principal menu. Since neighbor-joining infers an unrooted tree topology, it is necessary to root the unrooted tree before we can display it on the screen

When a tree topology is inferred with clustering methods such as UPGMA or WPGMA, the rooting procedure should be skipped because these methods infer rooted tree topologies.

o Choose Start rooting unrooted trees from the root unrooted trees menu
o Press OK in the Options menu. The tree will be rooted with a single sequence that is a suitable outgroup to the other sequences
o Select the last sequence in the list. This is the red alga Palmaria palmata
o Press the OK button in the Job status menu when it says finished. The tree is now rooted with Palmaria palmata
o Select Draw phylogenetic trees from the principal menu
o In the TREECON drawing program, select File|Open|(new) tree. The tree should now be displayed on the screen

To construct a tree with bootstrap values, repeat this procedure but select bootstrap analysis in the Options menu at each step of the process.
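
The walkthrough above computes distances with the Jukes and Cantor equation. As a rough illustration of what such a correction for superimposed mutations does, the Python sketch below computes the observed fraction of differing positions p between two aligned sequences and applies the standard Jukes-Cantor formula d = -3/4 ln(1 - 4p/3). This is not TREECON's own code: the function names are hypothetical, and ignoring every position that contains a gap is an assumption made for this example only.

```python
import math

GAP = "-"


def p_distance(seq_a, seq_b):
    """Observed fraction of differing positions between two aligned sequences.

    Assumption for this sketch: positions where either sequence has a gap
    are skipped entirely.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == GAP or b == GAP:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no positions left to compare")
    return differing / compared


def jukes_cantor(p):
    """Jukes-Cantor distance for nucleotides: d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: the corrected distance is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Two sequences taken from the TREECON format examples in this manual.
homo = "AGUCGAGUC---GCAGAAACGCAUGAC-GACCACAUUUU-CCUUGCAAAG"
pan = "AGUCGCGUCG--GCAGAAACGCAUGACGGACCACAUCAU-CCUUGCAAAG"

p = p_distance(homo, pan)
d = jukes_cantor(p)
print(f"p = {p:.4f}, Jukes-Cantor d = {d:.4f}")
```

The corrected distance is always at least as large as the observed fraction p, and the gap between the two grows as the sequences become more divergent.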

INPUT FILE FORMATS


Since TREECON, like most other tree construction programs, starts from a set of aligned sequences, the input file format will be discussed first. The input file for the distance estimation module of TREECON is the file containing the aligned sequences in the case of nucleic and amino acid sequences, or the gel results in the case of RFLP/AFLP data. In the case of RFLP/AFLP data, the sequence is a row consisting of 1s and 0s representing the presence or absence of a band on the gel for that particular sample (see further). TREECON can handle several file formats:

TREECON input file format


The first line of the TREECON input file format should always contain the number of characters per sequence, e.g. the number of aligned nucleotides. From the second line on, the organisms and their sequences (or restriction fragment results) are listed sequentially. The name of each sequence should be written on a separate line and may contain no more than 40 characters. The format in which the sequence itself has to be written is very flexible (see examples): it may be written in one long stretch, divided over several lines, or in blocks separated by blanks. It is also allowed to write every sequence in a different format. The sequence may only contain sequence characters (letters, or 1s and 0s for RFLP/AFLP data), hyphens (representing gaps) and blanks. It is important that every sequence comprises the number of symbols mentioned in the first line of the input file. Characters may be written in upper- or lowercase. Blanks are not allowed at the end of the sequence!

The input file may contain as many as 2000 sequences. However, the maximum number of sequences that can be used to construct a tree is set to 1000. If this is not enough, please contact the author. As dynamic memory allocation is used, there is no limit on the length of the sequences, and the number of sequences that can be compared depends on the available memory.

Examples of the TREECON format
example 1

50
Homo sapiens
AGUCGAGUC---GCAGAAACGCAUGAC-GACCACAUUUU-CCUUGCAAAG
Pan paniscus
AGUCGCGUCG--GCAGAAACGCAUGACGGACCACAUCAU-CCUUGCAAAG
Gorilla gorilla
AGUCGCGUCG--GCAGAUACGCAUCACGGAC-ACAUCAUCCCUCGCAGAG
Pongo pigmaeus
AGUCGCGUCGAAGCAGA--CGCAUGACGGACCACAUCAUCCCUUGCAGAG

example 2
50
Homo sapiens
AGUCGAGUC-- -GCAGAAAC GCAUGAC-GA CCACAUUUU- CCUUGCAAAG
Pan paniscus
AGUCGCGUCG- -GCAGAAAC GCAUGACGGA CCACAUCAU- CCUUGCAAAG
Gorilla gorilla
AGUCGCGUCG- -GCAGAUAC GCAUCACGGA C-ACAUCAUC CCUCGCAGAG
Pongo pigmaeus
AGUCGCGUCGA AGCAGA--C GCAUGACGGA CCACAUCAUC CCUUGCAGAG

example 3
50
Homo sapiens
AGUCGAGUC---GCAGAAAC
GCAUGAC-GACCACAUUUU-
CCUUGCAAAG
Pan paniscus
AGUCGCGUCG--GCAGAAAC
GCAUGACGGACCACAUCAU-
CCUUGCAAAG
Gorilla gorilla
AGUCGCGUCG--GCAGAUAC
GCAUCACGGAC-ACAUCAUC
CCUCGCAGAG
Pongo pigmaeus
AGUCGCGUCGAAGCAGA--C
GCAUGACGGACCACAUCAUC
CCUUGCAGAG

example 4 (RFLP/AFLP/RAPD data)
50
sample1
111101001110110110110111111111010111110111111111110111
sample2
111101000110111110110111111110010111100111111111110110
sample3
111101111110110110111111111101010111110111111111010111
sample4
101101001110110110110111111001010111100111111111010111
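
The rules of the TREECON format described above (a character count on the first line, each name on its own line, then the sequence in any mix of lines and blank-separated blocks) can be sketched as a small reader. The Python below is only an illustration written from that description, not code from TREECON; the function name parse_treecon is hypothetical, and skipping empty lines is an assumption of this sketch.

```python
def parse_treecon(text):
    """Read TREECON-format text; return (n_chars, [(name, sequence), ...])."""
    # Assumption of this sketch: completely empty lines carry no information.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty input")
    n_chars = int(lines[0].split()[0])  # first line: characters per sequence
    records = []
    i = 1
    while i < len(lines):
        name = lines[i].strip()  # the name sits on a line of its own
        i += 1
        if len(name) > 40:
            raise ValueError(f"name longer than 40 characters: {name!r}")
        sequence = ""
        # Collect sequence lines, dropping blanks, until n_chars are read.
        while len(sequence) < n_chars and i < len(lines):
            sequence += "".join(lines[i].split())
            i += 1
        if len(sequence) != n_chars:
            raise ValueError(
                f"{name!r}: expected {n_chars} symbols, found {len(sequence)}"
            )
        records.append((name, sequence))
    return n_chars, records


# Each sequence deliberately uses a different layout.
sample = """50
Homo sapiens
AGUCGAGUC---GCAGAAACGCAUGAC-GACCACAUUUU-CCUUGCAAAG
Pan paniscus
AGUCGCGUCG --GCAGAAAC GCAUGACGGA CCACAUCAU- CCUUGCAAAG
Gorilla gorilla
AGUCGCGUCG--GCAGAUAC
GCAUCACGGAC-ACAUCAUC
CCUCGCAGAG
"""

n_chars, records = parse_treecon(sample)
for name, sequence in records:
    print(f"{name:<16} {len(sequence)} symbols")
```

Because a sequence ends as soon as the declared number of symbols has been read, a sequence that is too short or too long is reported as an error for the affected name, which is one way to locate the kind of format error mentioned in this manual.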

PHYLIP input file format


PHYLIP (PHYLogeny Inference Package) is the very well-known software package of Joe Felsenstein (Department of Genetics, University of Washington, Box 357360, Seattle, Washington 98195-7360, USA) for inferring phylogenies. In this package, two different file formats can be used, viz. interleaved (sequences are written in the form of a sequence alignment) and non-interleaved or sequential (sequences are written one after the other). In both file formats, the first line contains two numbers, namely the number of sequences and the number of alignment positions (see examples). The sequence names may not exceed 10 characters (but may include punctuation marks and blanks).

Examples of the PHYLIP interleaved format
example 1
4 50
Homo      AGUCGAGUC---GCAGAAACGCAUGAC
Pan pani  AGUCGCGUCG--GCAGAAACGCAUGAC
Gorilla   AGUCGCGUCG--GCAGAUACGCAUCAC
Pongo     AGUCGCGUCGAAGCAGA--CGCAUGAC

-GACCACAUUUU-CCUUGCAAAG
GGACCACAUCAU-CCUUGCAAAG
GGAC-ACAUCAUCCCUCGCAGAG
GGACCACAUCAUCCCUUGCAGAG

example 2
4 50
Homo      AGUCGAGUC---GCAGAAACGCAUGAC
Pan pani  AGUCGCGUCG--GCAGAAACGCAUGAC
Gorilla   AGUCGCGUCG--GCAGAUACGCAUCAC
Pongo     AGUCGCGUCGAAGCAGA--CGCAUGAC
-GACCACAUUUU-CCUUGCAAAG
GGACCACAUCAU-CCUUGCAAAG
GGAC-ACAUCAUCCCUCGCAGAG
GGACCACAUCAUCCCUUGCAGAG

Examples of the PHYLIP non-interleaved (sequential) format
example 1
4 50
Homo      AGUCGAGUC---GCAGAAACGCAUGAC
-GACCACAUUUU-CCUUGCAAAG
Pan pani  AGUCGCGUCG--GCAGAAACGCAUGAC
GGACCACAUCAU-CCUUGCAAAG
Gorilla   AGUCGCGUCG--GCAGAUACGCAUCAC
GGAC-ACAUCAUCCCUCGCAGAG
Pongo     AGUCGCGUCGAAGCAGA--CGCAUGAC
GGACCACAUCAUCCCUUGCAGAG

example 2
4 50
Homo      (! ten characters of the species name MUST be present)
AGUCGAGUC---GCAGAAACGCAUGAC-
GACCACAUUUU-CCUUGCAAAG
Pan pani
AGUCGCGUCG--GCAGAAACGCAUGACG
GACCACAUCAU-CCUUGCAAAG
Gorilla
AGUCGCGUCG--GCAGAUACGCAUCACG
GAC-ACAUCAUCCCUCGCAGAG
Pongo
AGUCGCGUCGAAGCAGA--CGCAUGACG
GACCACAUCAUCCCUUGCAGAG
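
To make the layout of the interleaved format concrete, here is a minimal Python sketch of a reader for it, written from the description above rather than taken from PHYLIP or TREECON. The function name parse_phylip_interleaved is hypothetical. The sketch assumes that the name occupies exactly the first ten columns of each line in the first block, that later blocks list the sequences in the same order, and that empty lines can be ignored.

```python
def parse_phylip_interleaved(text):
    """Read interleaved PHYLIP text; return a list of (name, sequence)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_seq, n_pos = (int(x) for x in lines[0].split()[:2])
    body = lines[1:]
    if len(body) < n_seq:
        raise ValueError("fewer lines than sequences announced in the header")
    names = []
    sequences = []
    # First block: ten columns of name, then the first part of the sequence.
    for ln in body[:n_seq]:
        names.append(ln[:10].strip())
        sequences.append("".join(ln[10:].split()))
    # Later blocks: sequence data only, cycling through the same order.
    for k, ln in enumerate(body[n_seq:]):
        sequences[k % n_seq] += "".join(ln.split())
    for name, seq in zip(names, sequences):
        if len(seq) != n_pos:
            raise ValueError(
                f"{name!r}: expected {n_pos} positions, found {len(seq)}"
            )
    return list(zip(names, sequences))


# Build the example so that every name is padded to exactly ten columns.
first_block = [
    ("Homo", "AGUCGAGUC---GCAGAAACGCAUGAC"),
    ("Pan pani", "AGUCGCGUCG--GCAGAAACGCAUGAC"),
    ("Gorilla", "AGUCGCGUCG--GCAGAUACGCAUCAC"),
    ("Pongo", "AGUCGCGUCGAAGCAGA--CGCAUGAC"),
]
second_block = [
    "-GACCACAUUUU-CCUUGCAAAG",
    "GGACCACAUCAU-CCUUGCAAAG",
    "GGAC-ACAUCAUCCCUCGCAGAG",
    "GGACCACAUCAUCCCUUGCAGAG",
]
sample = "\n".join(
    ["4 50"]
    + [f"{name:<10}{seq}" for name, seq in first_block]
    + [""]
    + second_block
)

alignment = parse_phylip_interleaved(sample)
for name, seq in alignment:
    print(f"{name:<10} {len(seq)} positions")
```

The ten-column name field is the reason the manual insists that ten characters of the name must always be present: a reader of this kind has no other way to tell where the name stops and the sequence begins.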

When the input file is selected, you can make a selection of the organisms (sequences) you want to construct a tree with. If you want to select some of the sequences but not all, hold down the Ctrl key while selecting organisms, just as you would when selecting several files in the file manager of Windows. It is also possible to save a selection of sequences to a file. This selection can then be retrieved afterwards. If the sequence names are not listed properly when selecting an input file, there is most probably a format error in the input file.
