
THE UNIVERSITY OF HULL
Department of Biological Sciences
Department of Computer Science

Micro-Checker User Guide

Dr. C. van Oosterhout
Dr. W.F. Hutchinson
D.P.M. Wills
P.F. Shipley

The University of Hull
Cottingham Road,
Hull HU6 7RX
2003-2005

Dr. C. van Oosterhout
NERC Research Fellow
Molecular Ecology & Fisheries Genetics Laboratory
Department of Biological Sciences
Telephone: +44 (0)1482 465505 / 466434
Switchboard: +44 (0)1482 346311
Fax: +44 (0)1482 465458
Email: C.van-Oosterhout@hull.ac.uk
Table of Contents

Table of Figures
Acknowledgements
Introduction

Chapter 1  The Application
   System Requirements
   Installation
   Input File Types and Format
   Genepop files
   Excel files

Chapter 2  Using Micro-Checker
   Starting the Application
   Using the Application
   Opening a file
   The DataGrid
   Multiple Populations
   Using a blank or empty DataGrid
   Using Cut, Copy and Paste
   Deleting Rows
   Selecting a repeat motif
   Setting the maximum expected allele size
   Checking the data
   Setting the confidence interval
   Analysing the data
   Setting the seed value
   Setting the results file location
   Saving the currently loaded populations
   Closing the current DataGrid
   Exiting from the application

Chapter 3  The Results
   Interpreting the Results
   Graph Results
   Show the DataGrid / Graphs
   Close all graphs
   Probabilities
   Expected Homozygotes
   Probabilities of Observed Homozygote Frequencies
   Results of the Analysis
   Estimating Allele Frequencies
   Adjusted Genotypes
   Comparing Null Allele Frequencies
   References
   Bibliography

Appendix A
   Initial Analysis
   Analysis Flowchart
M I C R O - C H E C K E R

Table of Figures

FIGURE 1 Genepop file format
FIGURE 2 Excel file format
FIGURE 3 Opening a file
FIGURE 4 Excel Sheet Dialog
FIGURE 5 Layout of the DataGrid
FIGURE 6 Population Navigation
FIGURE 7 Number of Loci in the Blank DataGrid
FIGURE 8 Selecting a Repeat Motif
FIGURE 9 Set all loci to the same Repeat Motif
FIGURE 10 Check or Analyse the data
FIGURE 11 Warning of Non-integer Data
FIGURE 12 Warning of Possible Faulty Data
FIGURE 13 Display of Possible Faulty Data
FIGURE 14 Setting the Confidence Interval
FIGURE 15 Options for Errors Found
FIGURE 16 Automatically Generated Seed Value
FIGURE 17 DataGrid Close Button
FIGURE 18 Homozygote Frequencies by Class Size
FIGURE 19 Frequency of Allele Differences (bp)
FIGURE 20 Probability of Observed Homozygote Frequencies
FIGURE 21 Analysis of Results
FIGURE 22 Estimation of Allele Frequencies
FIGURE 23 Adjustment of homozygote genotypes
FIGURE 24 Comparison of Estimated Null Allele Frequencies Using Four Algorithms


Acknowledgements

Financial support:
University of Hull Research Support Grant (C van Oosterhout, WF Hutchinson and DPM Wills)
NERC Research Fellowship (C van Oosterhout)

Acknowledgements:
We thank PW Shaw, E. D’Amato, R van Treuren, L-E Holm, AG Jones for
providing test data, and Prof. GR Carvalho, DW Weetman, B Hänfling, HR
Wilcock, A Gomez, G Adcock, N Mesquita and RA Case for many helpful
discussions.


Micro-Checker
An application designed to check microsatellite data for null alleles and
scoring errors.

Introduction

Microsatellites are a class of co-dominant DNA markers widely used in population and
evolutionary genetics. They are highly polymorphic and usually non-coding, and only
small amounts of tissue are required for sampling. Microsatellite loci are distributed
throughout the genome, and each locus is identified by its sequence: flanking regions, to
which PCR primers can bind, surrounding a short motif that is repeated in tandem many times.

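The following errors can arise when microsatellite alleles are amplified by the polymerase
chain reaction (PCR) using locus-specific primers and are then scored:

• Null alleles – one or more alleles fail to amplify during PCR, so heterozygotes carrying a
null allele are scored as homozygotes.

• Stuttering – slippage during PCR produces additional products that differ in size from the
true allele, typically by one or more repeat units, which can make alleles difficult to score.

• Large allele dropout – large alleles amplify less efficiently than small alleles, so some
heterozygotes are scored as homozygotes for the smaller allele.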

Micro-Checker is designed to help researchers detect these errors when interpreting
microsatellite genotype data. It uses a Monte Carlo simulation (bootstrap) method to
generate the expected frequencies of homozygotes and of heterozygote allele size
differences. Hardy-Weinberg equilibrium is assumed in order to estimate allele frequencies
and the frequency of any null alleles detected.
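The randomisation idea can be illustrated with a minimal Python sketch. It assumes that genotypes are stored as pairs of integer allele sizes and that "randomising the data" means pooling all alleles at a locus and re-pairing them at random (a permutation), which is one common way to simulate Hardy-Weinberg genotypes. The guide does not document Micro-Checker's exact procedure, and all function names here are hypothetical.

```python
import random
from collections import Counter


def observed_homozygotes(genotypes):
    """Count homozygotes per allele size in a list of (allele1, allele2) pairs."""
    return Counter(a1 for a1, a2 in genotypes if a1 == a2)


def simulate_homozygotes(genotypes, n_reps=1000, seed=1):
    """Pool every allele, re-pair the pool at random and count homozygotes
    per allele size; repeat n_reps times (hypothetical randomisation scheme)."""
    rng = random.Random(seed)
    pool = [allele for genotype in genotypes for allele in genotype]
    simulations = []
    for _ in range(n_reps):
        rng.shuffle(pool)
        pairs = zip(pool[0::2], pool[1::2])
        simulations.append(Counter(a for a, b in pairs if a == b))
    return simulations


def hwe_expected_homozygotes(genotypes):
    """Expected number of homozygotes under Hardy-Weinberg: N * sum(p_i ** 2)."""
    pool = [allele for genotype in genotypes for allele in genotype]
    counts = Counter(pool)
    return len(genotypes) * sum((c / len(pool)) ** 2 for c in counts.values())


genotypes = [(150, 150), (152, 152), (150, 152), (154, 154), (150, 154), (152, 154)]
sims = simulate_homozygotes(genotypes)
print(sum(observed_homozygotes(genotypes).values()))    # 3 observed homozygotes
print(round(hwe_expected_homozygotes(genotypes), 3))    # 2.0 expected under HWE
print(sum(sum(s.values()) for s in sims) / len(sims))   # mean simulated homozygotes
```

Because alleles are re-paired without replacement, the simulated mean for a small sample such as this one is a little lower than N·Σp_i²; the two converge as sample size grows.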

Chapter 1

The Application

System Requirements

The application has been tested with Microsoft® Windows 98 Second Edition, Windows
2000 and Windows XP. It is unlikely to work with Windows 95 or the original release of
Windows 98, but it may run under Windows Millennium Edition or Windows NT.

Component      Minimum requirement   Recommended
Processor      300 MHz               1.4 GHz
Memory (RAM)   64 MB                 256 MB
Hard disk      35 MB                 35 MB
Display        800 x 600             1024 x 768 or 1152 x 864

Important

Analysis of larger datasets may take some time, even if the computer being used
exceeds the recommended specification.

Installation

This initial version of the application runs directly from the folder into which it is
extracted from the zip file. No installation is required, which avoids installation problems
on incompatible systems. Example files are included in the DemoFiles folder.


Input File Types and Format

The application accepts as input either Genepop files saved as TXT or DAT text files, or
correctly formatted Excel files.

Genepop files
Genepop files should be in the format shown below, with a three-digit number
representing the length of each allele. A file may contain one or more populations,
although each population is analysed separately. Note the use of 'Pop' as a separator
between populations.
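Figure 1 is not reproduced here. As a rough illustration, the Python sketch below reads a small Genepop-style file, assuming the usual Genepop conventions: a title line, locus names (comma-separated or one per line), 'Pop' lines separating populations, and individual lines of the form `name , genotype genotype ...` in which each genotype is six digits (two three-digit allele sizes) and 000 marks missing data. `parse_genepop` is a hypothetical helper, not part of Micro-Checker.

```python
def parse_genepop(text):
    """Parse Genepop text with three-digit allele codes.

    Returns (loci, populations); each population is a list of
    (individual_name, [(allele1, allele2), ...]) tuples.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    loci, populations = [], []
    i = 1  # line 0 is the free-text title
    # Locus names precede the first 'Pop' line.
    while i < len(lines) and lines[i].lower() != "pop":
        loci.extend(name.strip() for name in lines[i].split(",") if name.strip())
        i += 1
    for line in lines[i:]:
        if line.lower() == "pop":  # separator: start a new population
            populations.append([])
            continue
        name, _, codes = line.partition(",")
        # Each six-digit code holds two three-digit allele sizes (000 = missing).
        genotypes = [(int(code[:3]), int(code[3:6])) for code in codes.split()]
        populations[-1].append((name.strip(), genotypes))
    return loci, populations


example = """Example data set
Loc1, Loc2
Pop
fish01 , 150152 098098
fish02 , 150150 000000
POP
fish03 , 152154 100098
"""
loci, pops = parse_genepop(example)
print(loci)        # ['Loc1', 'Loc2']
print(len(pops))   # 2
print(pops[0][0])  # ('fish01', [(150, 152), (98, 98)])
```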

FIGURE 1 Genepop file format


Excel files
Excel files should be in the format shown below. Note the use of ‘Pop’ as a
separator between different populations.

FIGURE 2 Excel file format

Important

Opening an Excel file will fail if, during preparation of the data, cells containing unwanted
data were only 'Cleared' rather than 'Deleted'.

Chapter 2

Using Micro-Checker
Important Note

Some of the functions described in this User Guide have been disabled. In particular,
the seed value required for the random number generator and the results text files are
not saved. Also, the option to save the DataGrid in Excel format is only enabled if the
operating system is found to be Microsoft Windows 2000 or Windows XP.

Starting the Application

Browse to the folder containing the Micro-Checker files and double-click the
StartMicroChecker.exe file. Alternatively, right-click StartMicroChecker.exe and
select Open from the pop-up menu.

When the application starts, it creates a folder on the computer's C: drive
(C:\LogFiles). This folder is the default location for storing the random number seed
used each time the application analyses microsatellite data.

Using the Application


Opening a file
To open a file, click the 'Open File' button on the toolbar or select
File | Open from the menu bar.


FIGURE 3 Opening a file

If the file being opened is an Excel file, the user is prompted for the name of the
required sheet (see Appendix A, Initial Analysis).

FIGURE 4 Excel Sheet Dialog

The default sheet name (Sheet1) is displayed; accept it or type an alternative, then
click OK to continue.

The DataGrid
Data from a Genepop or Excel file are displayed in a grid of cells much like a
spreadsheet. The data are displayed in a particular format in preparation for
analysis.


FIGURE 5 Layout of the DataGrid

Multiple Populations
If a file containing more than one population is loaded, the Next Population
button is enabled. Click this button to load the next population into the
DataGrid.

FIGURE 6 Population Navigation

Click the left and right arrows to display the currently loaded populations.

Using a blank or empty DataGrid


To open a blank DataGrid, click on the ‘Blank DataGrid’ button on the
toolbar or select File | New from the drop down menu.

FIGURE 7 Number of Loci in the Blank DataGrid


Enter the number of loci in the new population either by typing directly into the
text box or by using the up or down arrows to set the required number. Click OK
to continue.

Data can be entered from the keyboard, or cut or copied from Excel or from existing
rows in the DataGrid and then pasted in.

Note

Only complete rows of data in the correct format can be pasted into the DataGrid.
The data can be in the correct Excel format or the slightly longer DataGrid format.

Using Cut, Copy and Paste


Data can be copied from one part of the DataGrid to another. To cut or copy a row,
click at the left end of the row to highlight it, then click the Cut or Copy button on the
toolbar. Alternatively, select Edit | Cut or Edit | Copy on the menu bar, or press
Ctrl-X or Ctrl-C. To select multiple rows, select the first row and then hold down the
Ctrl key while clicking further rows. To select all rows in the DataGrid, select
Edit | Select All on the menu bar or press Ctrl-A. Pasted rows are always appended to
the end of the DataGrid: click the Paste button on the toolbar, select Edit | Paste on
the menu bar or press Ctrl-V.

Deleting Rows
To delete a row or rows, select the row(s) as detailed above, then click
Edit|Delete or press the Backspace or Delete key.


Selecting a repeat motif


Select a Repeat Motif from the drop-down list for each locus in the population.

Use the left and right arrows to move between loci, ensuring that every locus is
given a value.

FIGURE 8 Selecting a Repeat Motif

Click on the ‘All’ button to set the repeat motif for all loci to the currently selected
value.

FIGURE 9 Set all loci to the same Repeat Motif

Setting the maximum expected allele size


The Maximum Expected Allele Size is used to detect typographical errors and out-of-range
values. It can be changed if required; values above it only trigger a warning and do
not prevent the application from proceeding.

FIGURE 10 Check or Analyse the data


Checking the data


Clicking the Check button detects:

• Values that are not positive integers

• Out-of-range values or zero values

• Values with an inconsistent modulus based on the Repeat Motif

A warning is displayed if any values are found that are not positive integers.

FIGURE 11 Warning of Non-integer Data

Values of this kind must be corrected; otherwise they are automatically omitted when
the application proceeds.

A text window opens on the left of the screen displaying the location of the faulty
data. Also displayed are any out of range values or any values with an inconsistent
modulus.

FIGURE 12 Warning of Possible Faulty Data


FIGURE 13 Display of Possible Faulty Data

Out of range data or data with an inconsistent modulus will not prevent the
application from proceeding.
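The three checks can be sketched in Python as follows. This is an illustration under stated assumptions, not Micro-Checker's code: values arrive as text typed into the grid, `max_size` stands for the Maximum Expected Allele Size, and an "inconsistent modulus" is taken to mean an allele size whose remainder after division by the repeat length differs from the most common remainder at that locus. The function name `check_locus` is hypothetical.

```python
import re
from collections import Counter


def check_locus(values, repeat, max_size):
    """Return (index, value, problem) tuples for one locus (hypothetical checks)."""
    problems, sizes = [], []
    for i, raw in enumerate(values):
        text = str(raw).strip()
        if not re.fullmatch(r"[0-9]+", text):
            problems.append((i, raw, "not a positive integer"))
            continue
        size = int(text)
        if size == 0 or size > max_size:
            problems.append((i, raw, "zero or out of range"))
            continue
        sizes.append((i, size))
    # Assumption: the expected remainder is the most common one at this locus,
    # e.g. all alleles of a dinucleotide locus should be either odd or even.
    if sizes:
        usual = Counter(s % repeat for _, s in sizes).most_common(1)[0][0]
        problems += [(i, s, "inconsistent modulus") for i, s in sizes if s % repeat != usual]
    return problems


for problem in check_locus(["150", "152", "151", "abc", "600", "0", "154.5"], repeat=2, max_size=500):
    print(problem)
```

As in the guide, only the non-integer values would need correcting before analysis; the other flags are warnings.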

Setting the confidence interval


Select the required Confidence Interval for the Monte Carlo simulations from the
drop down list. The 100% item displays all values and the Bonferroni item
displays the Bonferroni (Dunn-Sidak) adjusted 95% confidence interval.
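For reference, the Dunn-Sidak adjustment chooses a per-comparison confidence level of (1 - α)^(1/k), so that the family-wise level over k comparisons stays at 1 - α (here 95%, α = 0.05). The Python sketch below assumes that k is the number of comparisons being made (for example, the number of homozygote size classes tested, which the guide does not specify) and reads the interval off the sorted simulated values by nearest rank; both helper names are hypothetical.

```python
import math


def sidak_level(family_alpha, k):
    """Per-comparison confidence level for a Dunn-Sidak adjustment over k comparisons."""
    return (1.0 - family_alpha) ** (1.0 / k)


def percentile_interval(simulated, level):
    """Two-sided interval covering `level` of the sorted simulated values (nearest rank)."""
    values = sorted(simulated)
    n = len(values)
    tail = (1.0 - level) / 2.0
    low = values[max(0, math.floor(tail * n))]
    high = values[min(n - 1, math.ceil((1.0 - tail) * n) - 1)]
    return low, high


print(round(sidak_level(0.05, 10), 5))            # 0.99488
print(percentile_interval(range(1, 101), 0.95))   # (3, 98)
print(percentile_interval(range(1, 101), 1.0))    # (1, 100), i.e. the '100%' option
```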

FIGURE 14 Setting the Confidence Interval

Analysing the data


Clicking the Analyse button first runs the same checks as the Check button. If any
faulty data are found, the details are displayed on the left of the screen as before, and
the user is given the option of:

• Correcting the faulty values

• Omitting the faulty values from the analysis

• Proceeding with the analysis including the suspect data


FIGURE 15 Options for Errors Found

If no errors are found, analysis of the data proceeds (see Appendix A, Analysis
Flowchart).

Setting the seed value


If no faulty values are found, analysis of the data continues and a seed value for the
random number generator is displayed in a dialog box. The seed is a positive integer
based on the number of seconds elapsed since 1 January 2003. It is recommended that
the user click OK to accept this value. The seed value is stored, together with the time
and date, in a text file (Seedlog.txt) in the LogFiles folder.
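A minimal Python sketch of a seed derived from the seconds elapsed since 1 January 2003, and of why reusing a seed reproduces an analysis. It assumes the computer's local clock and whole seconds; `default_seed` is a hypothetical name, and Python's `random.Random` stands in for whatever generator Micro-Checker actually uses.

```python
import datetime
import random

EPOCH = datetime.datetime(2003, 1, 1)  # reference date stated in the guide


def default_seed(now=None):
    """Whole seconds elapsed since 1 January 2003 (local time assumed)."""
    if now is None:
        now = datetime.datetime.now()
    return int((now - EPOCH).total_seconds())


seed = default_seed(datetime.datetime(2003, 1, 2, 0, 1, 0))
print(seed)  # 86460: one day plus one minute

# The same seed gives the same random sequence, hence an identical analysis.
first = random.Random(seed).sample(range(1000), 5)
second = random.Random(seed).sample(range(1000), 5)
print(first == second)  # True
```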

FIGURE 16 Automatically Generated Seed Value


To repeat an analysis under identical conditions, locate the seed value in the
Seedlog.txt file and enter it in the Random Number Seed dialog box.

Setting the results file location


When the seed value has been saved, the Save File dialog is displayed. The user
then enters a location for the results files. Analysis of the data is then completed
and the graph results displayed.

Note

The generation of the results files and graphs may take some time if a large data set is
being analysed or an older computer is being used.

Saving the currently loaded populations


If the results graphs are displayed, first click the Close All Graphs button.

Click on the Save button or select File | Save As from the menu bar to display
the Save dialog. The currently loaded populations can then be saved in either
Genepop format as TXT or DAT files or in Excel format. The current
population can also be saved by copying and pasting the contents of the DataGrid
into another application (e.g. Microsoft Word or Excel).

Closing the current DataGrid


FIGURE 17 DataGrid Close Button

Click on the Close button to close the current DataGrid. If the DataGrid
contains unsaved data, a dialog box will be displayed giving the option of saving
the current populations.

Exiting from the application


Click the Exit button to close the application. Alternatively, select File | Exit from
the menu bar or click the close button in the top right corner of the window. If the
DataGrid contains unsaved data, a dialog box offers the option of saving the current
populations.

Chapter 3

The Results

Interpreting the Results


Graph Results
Graphs are displayed in pairs for each locus: one shows the homozygote frequencies
and the other the frequencies of allele size differences. The observed frequency of each
homozygote class is compared with the Monte Carlo simulated homozygote frequencies.
The red bar represents the range of simulated values within the selected confidence
interval, with the mean value shown as a red circle. The observed value is shown as a
black cross.
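The allele size difference graph can be understood through a small Python sketch that tallies the size differences (in base pairs) between the two alleles of each heterozygote. It assumes integer allele sizes with 0 for missing data and counts heterozygotes only; `size_differences` is a hypothetical name. The Analysis Flowchart associates a deficiency of heterozygotes whose alleles differ by one repeat unit with stuttering.

```python
from collections import Counter


def size_differences(genotypes):
    """Count heterozygotes by absolute allele size difference in base pairs."""
    return Counter(
        abs(a1 - a2)
        for a1, a2 in genotypes
        if a1 != a2 and a1 > 0 and a2 > 0  # skip homozygotes and missing (0) alleles
    )


genotypes = [(150, 152), (152, 150), (150, 154), (150, 150), (0, 150)]
diffs = size_differences(genotypes)
print(sorted(diffs.items()))  # [(2, 2), (4, 1)]

repeat = 2  # dinucleotide repeat motif (example value)
one_unit = diffs[repeat] / sum(diffs.values())
print(f"share of one-repeat-unit heterozygotes: {one_unit:.2f}")  # 0.67
```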

Show the DataGrid / Graphs


Click on the Show DataGrid/Graphs button on the toolbar or click View |
Show/Hide DataGrid on the menu bar to toggle between the DataGrid view and
graph view.

To view the graphs for another locus, click Window on the menu bar and select
the required locus.

Close all graphs


To close all graphs, click on the Close All Graphs button on the tool bar or
click Tools|Close All Graphs on the menu bar.


FIGURE 18 Homozygote Frequencies by Class Size

FIGURE 19 Frequency of Allele Differences (bp)


Probabilities
The expected number of homozygotes and the probabilities of the observed
homozygote frequencies are displayed below the graphs on the left side.

Expected Homozygotes

The application calculates the expected homozygote frequency for each class
based on the heterozygote frequency for that class. The total number of expected
homozygotes is then calculated and compared to the observed number.

Probabilities of Observed Homozygote Frequencies

The probabilities of the observed homozygote frequencies are calculated and displayed.
Depending on the available data, it may not be possible to calculate probabilities for all
classes. Two methods are used to calculate the probabilities:

• The probability of each homozygote class frequency is calculated from the homozygote
and heterozygote frequencies of that size class, and the significance of the combined
probability is calculated and displayed.

• The probability of each homozygote class frequency is also calculated by comparing the
observed value with the mean rank position of that value among the sorted simulated
values.
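The rank-based method can be sketched as an empirical tail probability: the proportion of simulated values at least as large as the observed homozygote count, with ties counted as half so that the observed value sits at the mean rank of its tied values. The one-tailed direction (testing for an excess of homozygotes) and the tie rule are assumptions, and `rank_probability` is a hypothetical name.

```python
def rank_probability(observed, simulated):
    """One-tailed probability of a simulated value >= observed, ties counted as half."""
    simulated = list(simulated)
    greater = sum(1 for value in simulated if value > observed)
    ties = sum(1 for value in simulated if value == observed)
    return (greater + 0.5 * ties) / len(simulated)


simulated_counts = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6]
print(rank_probability(5, simulated_counts))  # 0.15
print(rank_probability(7, simulated_counts))  # 0.0
```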


FIGURE 20 Probability of Observed Homozygote Frequencies

Results of the Analysis


The results of the initial analysis are displayed below the graphs on the right side.

FIGURE 21 Analysis of Results


Estimating Allele Frequencies


Click the Estimate Allele Frequencies button or select Tools | Estimate
Frequencies on the menu bar to estimate the frequency of each allele class. The window
displays the observed frequency of each allele size class together with the allele
frequencies estimated by each of the four algorithms.

Click on the left and right arrows to display the results for each locus.

FIGURE 22 Estimation of Allele Frequencies

If the estimated null allele frequency exceeds the 0.05 threshold, the button that
displays the adjusted genotypes is enabled.

Adjusted Genotypes
The number of homozygote genotypes in each size class is adjusted to reflect the
estimated 'real' number of homozygotes. In each genotype reclassified as a
heterozygote carrying a null allele, a zero value replaces one of the two homozygote
allele values.
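One way to carry out such an adjustment is sketched below in Python. Under Hardy-Weinberg equilibrium with a null allele at frequency r, an apparent homozygote for allele i is either a true homozygote (relative weight p_i²) or a heterozygote carrying the null allele (relative weight 2·p_i·r), so a fraction 2r/(p_i + 2r) of the apparent homozygotes is recoded as (allele, 0). This is an illustrative assumption rather than a description of Micro-Checker's algorithm; the frequencies p_i and r are taken as inputs, and `adjust_homozygotes` is hypothetical.

```python
from collections import Counter


def adjust_homozygotes(genotypes, allele_freqs, r):
    """Recode part of each apparent homozygote class as (allele, 0) genotypes.

    allele_freqs maps each visible allele size to its estimated frequency p_i;
    r is the estimated null allele frequency.
    """
    # Heterozygotes and fully missing (0, 0) genotypes are kept unchanged.
    adjusted = [g for g in genotypes if g[0] != g[1] or g[0] == 0]
    homozygotes = Counter(a for a, b in genotypes if a == b and a != 0)
    for allele, n in homozygotes.items():
        p = allele_freqs[allele]
        n_null = round(n * 2 * r / (p + 2 * r))  # expected null heterozygotes
        adjusted += [(allele, 0)] * n_null + [(allele, allele)] * (n - n_null)
    return sorted(adjusted)  # ordered by allele size, as the guide notes


genotypes = [(150, 150)] * 10 + [(150, 152)] * 4
result = adjust_homozygotes(genotypes, {150: 0.4, 152: 0.2}, r=0.1)
print(result.count((150, 0)), result.count((150, 150)))  # 3 7
```

Sorting by allele size mirrors the guide's note that adjusted rows no longer line up with the original sample numbers.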


Select a correction algorithm from the drop-down list to compare results. Right-click
within the text area to display the text copying menu. If this menu is not available,
click the Select All button and press Ctrl-C to copy the data, then paste it (Ctrl-V)
into another application (e.g. a spreadsheet).

FIGURE 23 Adjustment of homozygote genotypes

Note

The adjusted genotypes are ordered by allele size, so the row numbers do not
correspond to the original sample numbers. Consequently, multi-locus genotypic
analyses cannot be performed with these data.


Comparing Null Allele Frequencies


Click on the Compare Null Alleles button or select Tools | Compare Loci on
the menu bar to display the estimated null allele frequencies for all loci in a
population.

FIGURE 24 Comparison of Estimated Null Allele Frequencies Using Four Algorithms

The estimated null allele frequency for each locus is compared to the null allele
frequencies obtained using methods by Chakraborty [Chakraborty et al 1992] and
Brookfield [Brookfield 1996].
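For comparison, these two estimators are often quoted in terms of observed (Ho) and expected (He) heterozygosity: Chakraborty et al. (1992) as r = (He - Ho)/(He + Ho), and Brookfield (1996, first estimator) as r = (He - Ho)/(1 + He). The Python sketch below implements those quoted formulas, with He taken as 1 - Σp_i² and complete genotypes assumed (no 0 alleles). Check the formulas against the cited papers before relying on them. The function names are hypothetical, and the other algorithms used by Micro-Checker are not shown.

```python
from collections import Counter


def heterozygosities(genotypes):
    """Observed (Ho) and expected (He = 1 - sum p_i^2) heterozygosity at one locus."""
    ho = sum(1 for a1, a2 in genotypes if a1 != a2) / len(genotypes)
    counts = Counter(a for g in genotypes for a in g)
    total = sum(counts.values())
    he = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return ho, he


def null_chakraborty(ho, he):
    """r = (He - Ho) / (He + Ho), as commonly quoted for Chakraborty et al. (1992)."""
    return (he - ho) / (he + ho)


def null_brookfield1(ho, he):
    """r = (He - Ho) / (1 + He), as commonly quoted for Brookfield (1996)."""
    return (he - ho) / (1.0 + he)


genotypes = [(150, 150), (152, 152), (150, 152), (154, 154), (150, 154), (152, 154)]
ho, he = heterozygosities(genotypes)
print(round(ho, 3), round(he, 3))           # 0.5 0.667
print(round(null_chakraborty(ho, he), 3))   # 0.143
print(round(null_brookfield1(ho, he), 3))   # 0.1
```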


References
Brookfield JFY, 1996. A simple new method for estimating null allele frequency from
heterozygote deficiency. Mol. Ecol. 5: 453-455.

Chakraborty R, De Andrade M, Daiger SP, Budowle B, 1992. Apparent heterozygote
deficiencies observed in DNA typing data and their implications in forensic applications.
Ann. Hum. Genet. 56: 45-47.

Bibliography
DeWoody JA, Avise JC, 2000. Microsatellite variation in marine, freshwater and
anadromous fishes compared with other animals. J. Fish Biol. 56: 461-473.

Dieringer D, Schlötterer C, 2003. Microsatellite analyser (MSA): a platform independent
analysis tool for large microsatellite data sets. Mol. Ecol. Notes 3: 167-169.

Ewen KR, Bahlo M, Treloar SA, Levinson DF, Mowry B, Barlow JW, Foote SJ,
2000. Identification and analysis of error types in high-throughput genotyping.
Am. J. Hum. Genet. 67: 727-736.

Gagneux P, Boesch C, Woodruff DS, 1997. Microsatellite scoring errors associated
with noninvasive genotyping based on nuclear DNA amplified from shed hair.
Mol. Ecol. 6: 861-868.

Hedrick PW, 2000. Genetics of Populations. 2nd Ed. Jones & Bartlett Publishers,
Sudbury, Massachusetts.

Holm LE, Loeschcke V, Bendixen C, 2001. Elucidation of the molecular basis of a null
allele in a rainbow trout microsatellite. Marine Biotech. 3: 555-560.

Jones AG, Stockwell CA, Walker D, Avise JC, 1998. The molecular basis of a
microsatellite null allele from the White Sands pupfish. J. Hered. 89: 339-342.

Launey S, Hedgecock D, 2001. High genetic load in the Pacific oyster Crassostrea
gigas. Genetics 159: 255-265.

Lehmann T, Hawley WA, Collins FH, 1996. An evaluation of evolutionary constraints
on microsatellite loci using null alleles. Genetics 144: 1155-1163.

Manaster CJ, Nanthakumar E, Morin PA, 1999. Detecting null alleles with
Vasarely Charts. Axys Pharmaceuticals. Institute of Electrical and Electronics
Engineers. Visualisation '99 Conference Proceedings.

McGoldrick DJ, Hedgecock D, English LJ, Baoprasertkul P, Ward RD, 2000. The
transmission of microsatellite alleles in Australian and North American stocks of
the Pacific oyster (Crassostrea gigas): selection and null alleles. J. Shellfish Res. 19:
779-788.

Pemberton JM, Bancroft DR, Barrett JA, 1995. Nonamplifying alleles at microsatellite
loci: a caution for parentage and population studies. Mol. Ecol. 4: 249-252.

Rodzen JA, May B, 2002. Inheritance of microsatellite loci in the white sturgeon
(Acipenser transmontanus). Genome 45: 1064-1076.

Shaw PW, Pierce GJ, Boyle PR, 1999. Subtle population structuring within a
highly vagile marine invertebrate, the veined squid Loligo forbesi, demonstrated with
microsatellite DNA markers. Mol. Ecol. 8: 407-417.

Van Treuren R, 1998. Estimating null allele frequencies at a microsatellite locus in
the oystercatcher (Haematopus ostralegus). Mol. Ecol. 7: 1413-1417.

Wattier R, Engel CR, Saumitou-Laprade P, Valero M, 1998. Short allele dominance as
a source of heterozygote deficiency at microsatellite loci: experimental evidence at the
dinucleotide locus Gv1CT in Gracilaria gracilis (Rhodophyta). Mol. Ecol. 7: 1569-1573.

Wright S, 1931. Evolution in Mendelian populations. Genetics 16: 97-159.

Appendices

Appendix A
Initial Analysis

[Flowchart: Micro-Checker initial analysis. Boxes recoverable from the original figure, in
approximate order: start; input file; import file; is the file XLS? (yes: check XLS format;
no: check Genepop format); format OK?; load all populations into a temporary store;
copy one population; validate the data (modify invalid data); copy one locus, repeated
for each locus; count observed homozygotes and size differences; randomise the data
and count simulated homozygotes and size differences, repeated N times; analyse data;
write the results file; display results; next population? (yes: copy the next population;
no: end).]


Analysis Flowchart

[Flowchart: decision tree for interpreting the results.
Starting question: is the observed number of homozygotes greater than the maximum
expected?
Further questions: is there a general excess of homozygotes over most allele size
classes? Is there an excess of large homozygote classes? Is there a deficiency of
heterozygote genotypes with alleles of one repeat unit difference?
Possible conclusions: may be large allele drop-out and/or deviation from panmixia;
indicates null allele; may indicate stuttering; indicates stuttering.]

Index

A
Adjusted Genotypes
Allele Frequencies
Allele size (maximum expected)
Analysing the data

B
Blank DataGrid

C
Checking the data
Close all graphs
Close Application
Close DataGrid
Compare Null Alleles
Confidence interval
Cut, Copy and Paste

D
DataGrid
DataGrid (blank)
DataGrid (closing)
Deleting Rows

E
Empty DataGrid
Estimating Allele Frequencies
Excel files
Exit Application
Expected Homozygotes

F
File format

G
Genepop files
Genotypes (adjusted)
Graph Results

I
Input File
Installation
Introduction

L
Large allele dropout

M
Maximum allele size
Multiple Populations

N
New DataGrid
Next Population
Null alleles

O
Observed Homozygote Frequencies
Opening a file
Operating System

P
Populations
Probabilities

R
Repeat motif
Results
Results file location

S
Seed value
Setting the confidence interval
Show the DataGrid / Graphs
Shutdown Application
Starting the Application
Stuttering
System Requirements
