Micro-Checker
THE UNIVERSITY OF HULL
M I C R O - C H E C K E R
Acknowledgements
Financial Support:
University of Hull Research Support Grant
C van Oosterhout, WF Hutchinson and DPM Wills
Acknowledgements:
We thank PW Shaw, E. D’Amato, R van Treuren, L-E Holm, AG Jones for
providing test data, and Prof. GR Carvalho, DW Weetman, B Hänfling, HR
Wilcock, A Gomez, G Adcock, N Mesquita and RA Case for many helpful
discussions.
Micro-Checker
An application designed to check microsatellite data for null alleles and
scoring errors.
Introduction
Microsatellites are a class of co-dominant DNA markers used widely in population and
evolutionary genetics. They are highly polymorphic and usually non-coding, and only
small amounts of tissue are required for sampling. Microsatellite loci are distributed
throughout the chromosomes, and each locus is identifiable by a particular sequence.
The sequence is made up of flanking regions, onto which primers can bind, and a
short motif repeated many times.
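It is during the processes of identification and isolation using primers and amplification by
polymerase chain reaction (PCR) that the following errors can occur:
• Null alleles – one or more alleles fail to amplify during PCR.
• Stuttering – slippage during PCR produces stutter products that can cause alleles to be mis-scored.
• Large allele dropout – large alleles do not amplify as efficiently as small alleles.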
The purpose of the application is to help researchers detect these errors when
interpreting microsatellite allele data. The application uses a Monte Carlo
simulation (bootstrap) method to generate the expected frequencies of homozygotes and of
heterozygote allele size differences. Hardy-Weinberg equilibrium theory is used to calculate
expected allele frequencies and the frequency of any null alleles detected.
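The idea of combining Hardy-Weinberg expectations with a Monte Carlo simulation can be sketched as follows. This is a minimal illustration, not the application's own code: the function names, the sample genotypes, the number of simulations and the fixed seed are all assumptions made for the example.

```python
import random
from collections import Counter

def allele_frequencies(genotypes):
    """Return {allele: frequency} from a list of (allele1, allele2) pairs."""
    counts = Counter(a for g in genotypes for a in g)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def expected_homozygote_fraction(freqs):
    """Hardy-Weinberg expected homozygosity: sum of squared allele frequencies."""
    return sum(p * p for p in freqs.values())

def simulate_homozygote_counts(genotypes, n_sim=1000, seed=1):
    """Pair alleles at random (with replacement) and count homozygotes per run."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    pool = [a for g in genotypes for a in g]
    n = len(genotypes)
    counts = []
    for _ in range(n_sim):
        sample = [(rng.choice(pool), rng.choice(pool)) for _ in range(n)]
        counts.append(sum(1 for a, b in sample if a == b))
    return counts

# Hypothetical genotypes (allele sizes in base pairs) for one locus.
genotypes = [(100, 100), (100, 104), (104, 108),
             (100, 108), (104, 104), (100, 104)]
freqs = allele_frequencies(genotypes)
expected = expected_homozygote_fraction(freqs) * len(genotypes)
simulated = simulate_homozygote_counts(genotypes)
```

The observed number of homozygotes can then be compared with `expected` and with the spread of values in `simulated`.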
Chapter 1
The Application
System Requirements
Important
Analysis of larger datasets may take some time, even if the computer being used
exceeds the recommended specification.
Installation
This initial version of the application is designed to run directly from the
folder in which it is placed after extraction from the zip file. This is to
avoid any installation problems with incompatible systems. Example files are
included and can be found in the DemoFiles folder.
Genepop files
Genepop files should be in the format shown below, with a three-digit number
representing the length of each allele. A file may contain one or more populations,
although populations are analysed one at a time. Note the use of ‘Pop’ as a separator
between different populations.
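As an illustration of the layout described above, the following sketch reads a small Genepop-style text with three-digit alleles. It assumes the usual Genepop conventions (a title line, then locus names, then ‘Pop’ separators, with each individual written as a name, a comma and one six-digit genotype per locus). The function name and the sample data are hypothetical.

```python
def parse_genepop(text):
    """Parse Genepop-style text into (title, loci, populations).

    Each population is a list of (individual_name, [(allele1, allele2), ...]).
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    title = lines[0]
    loci, populations = [], []
    in_populations = False
    for ln in lines[1:]:
        if ln.lower() == "pop":           # separator between populations
            populations.append([])
            in_populations = True
        elif not in_populations:          # locus names, one or more per line
            loci.extend(n.strip() for n in ln.split(",") if n.strip())
        else:
            name, _, data = ln.partition(",")
            genotypes = []
            for code in data.split():
                if len(code) != 6 or not code.isdigit():
                    raise ValueError(f"bad genotype {code!r} for {name.strip()}")
                # first three digits are one allele, last three the other
                genotypes.append((int(code[:3]), int(code[3:])))
            populations[-1].append((name.strip(), genotypes))
    return title, loci, populations

sample = """Example data (hypothetical)
LocusA
LocusB
Pop
ind1 , 100104 200200
ind2 , 104104 000000
Pop
ind3 , 100100 204208
"""
title, loci, populations = parse_genepop(sample)
```

Each population in the result can then be processed on its own, matching the way the application analyses populations one at a time.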
Excel files
Excel files should be in the format shown below. Note the use of ‘Pop’ as a
separator between different populations.
Important
Opening an Excel file will fail if, during preparation of the data, cells containing data
that are not required are only ‘Cleared’ and not ‘Deleted’.
Chapter 2
Using Micro-Checker
Important Note
Some of the functions described in this User Guide have been disabled. In particular,
the seed value required for the random number generator and the results text files are
not saved. Also, the option to save the DataGrid in Excel format is only enabled if the
operating system is found to be Microsoft Windows 2000 or Windows XP.
Browse to the folder containing the Micro-Checker files and double-click
the StartMicroChecker.exe file. Alternatively, right-click
StartMicroChecker.exe and select Open from the drop-down menu.
When the application starts, a folder is created on the computer’s ‘C’ drive
(C:\LogFiles). This folder is the default location for storing details of the
random number seed used each time the application analyses microsatellite data.
If the file being opened is an Excel file, the user is prompted for the name of the
required sheet (see Appendix A - Initial Analysis).
The default sheet name (Sheet1) is displayed; the user can accept it or supply an
alternative. Click OK to continue.
The DataGrid
Data from a Genepop or Excel file are displayed in a grid of cells much like a
spreadsheet. The data are displayed in a particular format in preparation for
analysis.
Multiple Populations
If a file containing more than one population is loaded, the Next Population
button is enabled. Click this button to load the next population into the
DataGrid.
Click the left and right arrows to display the currently loaded populations.
Enter the number of loci in the new population either by typing directly into the
text box or by using the up or down arrows to set the required number. Click OK
to continue.
Data can be entered from the keyboard or Cut/Copied and Pasted from either
Excel or existing rows in the DataGrid.
Note
Only complete rows of data in the correct format can be pasted into the DataGrid.
The data can be in the correct Excel format or the slightly longer DataGrid format.
Rows can be pasted into the DataGrid by clicking the Paste button on the toolbar,
by selecting Edit|Paste on the menu bar, or by pressing Ctrl-V. To select all rows in
the DataGrid, select Edit|Select All on the menu bar or press Ctrl-A.
Deleting Rows
To delete a row or rows, select the row(s) as detailed above, then click
Edit|Delete or press the Backspace or Delete key.
Use the left and right arrows to move between loci ensuring that all are
given a value.
Click on the ‘All’ button to set the repeat motif for all loci to the currently selected
value.
A warning is displayed if any values are found that are not positive integers.
Such values must be corrected; otherwise they are automatically omitted
when the application proceeds.
A text window opens on the left of the screen displaying the location of the faulty
data. Also displayed are any out of range values or any values with an inconsistent
modulus.
Out of range data or data with an inconsistent modulus will not prevent the
application from proceeding.
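The three kinds of check described above can be sketched as follows. This is only an illustration under stated assumptions: values arrive as text, ‘out of range’ means larger than a user-supplied maximum allele size, and ‘inconsistent modulus’ means that an allele size leaves a different remainder, when divided by the repeat motif length, from the most common remainder at that locus. The function name and the example values are hypothetical.

```python
from collections import Counter

def check_alleles(values, repeat_length, max_size):
    """Return the positions of problem values, grouped by type of problem."""
    report = {"not_positive_integer": [], "out_of_range": [],
              "inconsistent_modulus": []}
    sizes = []
    for index, raw in enumerate(values):
        text = str(raw).strip()
        # Anything that is not a whole number above zero is an error value.
        if not (text.isascii() and text.isdigit()) or int(text) == 0:
            report["not_positive_integer"].append(index)
            continue
        size = int(text)
        if size > max_size:
            report["out_of_range"].append(index)
        sizes.append((index, size))
    if sizes:
        # Most common remainder is taken as the expected one (an assumption).
        remainders = Counter(size % repeat_length for _, size in sizes)
        common = remainders.most_common(1)[0][0]
        report["inconsistent_modulus"] = [
            i for i, size in sizes if size % repeat_length != common
        ]
    return report

values = ["100", "104", "108", "abc", "-4", "105", "900"]
report = check_alleles(values, repeat_length=4, max_size=400)
```

In this example only the entries under `not_positive_integer` would need correcting before analysis; the other two lists are warnings.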
If no errors are found, analysis of the data proceeds (See Appendix A – Analysis
Flowchart).
Note
The generation of the results files and graphs may take some time if a large data set is
being analysed or an older computer is being used.
Click on the Save button or select File | Save As from the menu bar to display
the Save dialog. The currently loaded populations can then be saved in either
Genepop format as TXT or DAT files or in Excel format. The current
population can also be saved by copying and pasting the contents of the DataGrid
into another application (e.g. Microsoft Word or Excel).
Click on the Close button to close the current DataGrid. If the DataGrid
contains unsaved data, a dialog box will be displayed giving the option of saving
the current populations.
To exit the application, select the exit option from the menu bar or click the
close button in the top corner of the window. If the DataGrid contains unsaved data,
a dialog box will be displayed giving the option of saving the current populations.
Chapter 3
The Results
To view the graphs for another locus, click Window on the menu bar and select
the required locus.
Probabilities
The expected number of homozygotes and the probabilities of the observed
homozygote frequencies are displayed below the graphs on the left side.
Expected Homozygotes
The application calculates the expected homozygote frequency for each class
based on the heterozygote frequency for that class. The total number of expected
homozygotes is then calculated and compared to the observed number.
The probability for each homozygote class frequency is calculated using the
homozygote and heterozygote frequencies of each size class. The significance of
the combined probability is calculated and displayed. The probability for each
homozygote class frequency is also calculated from the mean rank position of the
observed value in the sorted simulated values.
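One way to turn a rank position into a probability is sketched below. It is an assumption about how such a calculation could work, not a description of the application's internals: the observed value is located in the sorted simulated values, ties are handled by averaging the lowest and highest possible positions, and the probability is taken as the fraction of simulated values ranked above that position. The function name and the numbers are hypothetical.

```python
from bisect import bisect_left, bisect_right

def rank_probability(observed, simulated):
    """Fraction of simulated values ranked above the observed value."""
    ordered = sorted(simulated)
    low = bisect_left(ordered, observed)    # first position the value could take
    high = bisect_right(ordered, observed)  # last position the value could take
    mean_rank = (low + high) / 2            # mean rank when there are ties
    return 1 - mean_rank / len(ordered)

# Hypothetical simulated homozygote counts for one size class.
simulated_counts = [1, 2, 2, 3, 3, 3, 4, 5, 6, 7]
p_high = rank_probability(6, simulated_counts)   # observed count near the top
p_mid = rank_probability(3, simulated_counts)    # observed count near the middle
```

A small value indicates that the observed homozygote count is higher than most of the simulated counts.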
Click on the left and right arrows to display the results for each locus.
If the estimated null allele frequency is significant (greater than 0.05), then the
button to display the adjusted genotypes is enabled.
Adjusted Genotypes
The number of homozygote genotypes in each size class is adjusted to reflect the
estimated ‘real’ numbers of homozygotes. A zero value is entered to replace one
of the homozygote allele values.
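The replacement step can be sketched as follows. How many homozygotes to adjust in each size class is decided by the application's correction algorithm, so here it is simply passed in as a hypothetical input (`n_to_adjust`). The function name is also an assumption. Genotypes are sorted by allele size before adjustment, so the output order differs from the input order.

```python
def adjust_homozygotes(genotypes, n_to_adjust):
    """Replace one allele with 0 in a given number of homozygotes per size class.

    genotypes   -- list of (allele1, allele2) pairs
    n_to_adjust -- {allele_size: number of homozygotes to convert}
    """
    remaining = dict(n_to_adjust)
    adjusted = []
    for a, b in sorted(genotypes):        # ordered by allele size
        if a == b and remaining.get(a, 0) > 0:
            adjusted.append((a, 0))       # 0 marks the presumed null allele
            remaining[a] -= 1
        else:
            adjusted.append((a, b))
    return adjusted

original = [(104, 104), (100, 100), (100, 104), (100, 100)]
adjusted = adjust_homozygotes(original, {100: 1})
```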
Select a correction algorithm from the drop-down list to compare results. Right-click
within the text area to display the text copying menu. If this is not available,
click on the Select All button, then press Ctrl-C to copy the data before pasting it
(Ctrl-V) into another application (e.g. a spreadsheet).
Note
The adjusted genotypes are ordered according to allele size, so the row numbers
do not correspond to the original sample numbers. Consequently, multi-locus
genotypic analysis cannot be performed with these data.
The estimated null allele frequency for each locus is compared to the null allele
frequencies obtained using methods by Chakraborty [Chakraborty et al 1992] and
Brookfield [Brookfield 1996].
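For orientation, the two comparison estimators are commonly written in terms of the observed heterozygosity (Ho) and the expected heterozygosity (He) at a locus. The sketch below uses those commonly cited forms, with Brookfield's first estimator standing in for the Brookfield method. Which variants the application itself implements is not stated here, so treat the function names and formulas as illustrative assumptions.

```python
def chakraborty_null_frequency(h_exp, h_obs):
    """Null allele frequency as commonly attributed to Chakraborty et al. (1992)."""
    return (h_exp - h_obs) / (h_exp + h_obs)

def brookfield_null_frequency(h_exp, h_obs):
    """Null allele frequency, Brookfield (1996) estimator 1."""
    return (h_exp - h_obs) / (1 + h_exp)

# Hypothetical heterozygosities for one locus showing a heterozygote deficit.
h_exp, h_obs = 0.8, 0.6
r_chakraborty = chakraborty_null_frequency(h_exp, h_obs)
r_brookfield = brookfield_null_frequency(h_exp, h_obs)
```

Both estimates are zero when observed and expected heterozygosity agree and grow as the heterozygote deficit increases.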
References
Brookfield JFY, 1996. A simple new method for estimating null allele frequency
from heterozygote deficiency. Mol. Ecol. 5: 453-455.
Bibliography
DeWoody JA, Avise JC, 2000. Microsatellite variation in marine, freshwater and
anadromous fishes compared with other animals. J. Fish Biol. 56: 461-473.
Ewen KR, Bahlo M, Treloar SA, Levinson DF, Mowry B, Barlow JW, Foote SJ,
2000. Identification and analysis of error types in high-throughput genotyping.
Am. J. Hum. Genet. 67: 727-736.
Hedrick PW, 2000. Genetics of Populations. 2nd Ed. Jones & Bartlett Publishers,
Sudbury, Massachusetts.
Jones AG, Stockwell CA, Walker D, Avise JC, 1998. The molecular basis of a
microsatellite null allele from the white sands pupfish. J. Hered. 89: 339-342.
Launey S, Hedgecock D, 2001. High genetic load in the Pacific oyster Crassostrea
gigas. Genetics 159: 255-265
Manaster CJ, Nanthakumar E, Morin PA, 1999. Detecting null alleles with
Vasarely Charts. Axys Pharmaceuticals. Institute of Electrical and Electronics
Engineers. Visualisation ’99 Conference Proceedings.
McGoldrick DJ, Hedgecock D, English LJ, Baoprasertkul P, Ward RD, 2000. The
transmission of microsatellite alleles in Australian and North American stocks of
the Pacific oyster (Crassostrea gigas): Selection and null alleles. J. Shellfish Res. 19:
779-788.
Rodzen JA, May B, 2002. Inheritance of microsatellite loci in the white sturgeon
(Acipenser transmontanus). Genome 45: 1064-1076.
Shaw PW, Pierce GJ, Boyle PR, 1999. Subtle population structuring within a
highly vagile marine invertebrate, the veined squid Loligo forbesi, demonstrated with
microsatellite DNA markers. Mol. Ecol. 8: 407-417.
Appendices
Appendix A
Initial Analysis
[Flowchart: Micro-Checker Initial Analysis. Steps shown: Start; input file; import file; check format (Is XLS? If not, Genepop format; format OK?); copy one population; validate (modify if not valid); repeat: copy one locus, count and store observed homozygotes and size differences; analyse data; results file.]
Analysis Flowchart
[Flowchart: Analysis. Decision nodes shown: Is the observed number of homozygotes greater than the maximum expected? Is there a general excess of homozygotes over most allele size classes? Is there a deficiency of heterozygote genotypes with alleles of one repeat unit difference? Is there an excess of large homozygote classes?]