
A Novel High-Throughput SNP Genotyping System Utilizing Capillary Electrophoresis Detection Platforms
H. Michael Wenz et al.*
Applied Biosystems, 850 Lincoln Centre Dr., Foster City, CA 94404, USA
ABSTRACT

We have developed a high-throughput SNP genotyping system that offers speed
and flexibility, produces reliable, high-quality data, and supports large-scale
genotyping projects for disease research, including association and linkage
analysis. The SNPlex Genotyping System is based on multiplex OLA/PCR and
capillary electrophoresis. The assay uses an optimized universal set of
ZipChute reagents for accurate read-out on high-throughput capillary
electrophoresis platforms. Genomic DNA is interrogated with multiplexed sets of
ligation probes that currently target up to 48 specific SNP loci in each
reaction. Such a multiplex reaction uses less than 1 ng of gDNA per SNP
genotype. A pair of universal PCR primers amplifies all ligation products in a
multiplex simultaneously. Amplicons containing internal universal cZipCode
oligonucleotide sequences are hybridized to a corresponding mix of universal
fluorescent ZipChute reagents. ZipChute reagents contain sequences
complementary to the cZipCode oligonucleotide sequences and exhibit unique
preoptimized mobilities during electrophoresis. Genotypes are determined from
the specifically hybridized ZipChute reagents, which are eluted, identified by
capillary electrophoresis, and then associated with their target SNPs using
GeneMapper Analysis Software. Using a 48-plex format with the Applied
Biosystems 3730xl instrument and commercially available robotics systems, one
can process approximately 1.5 million genotypes in 5 days. Statistics for 1,250
population-validated SNPs are presented, including the in silico assay design,
conversion rate, accuracy, and call rate.

INTRODUCTION

Genome-wide linkage and association studies involving thousands of DNA samples
in combination with multiple SNP loci require tens of thousands of genotypes
per project. Such large-scale studies necessitate a genotyping solution that
fulfills the following requirements: 1) low consumption of gDNA, 2) high
throughput, 3) flexible assay platform, 4) high call rate and accuracy, 5) ease
of use, and 6) low cost. We are developing the SNPlex System, a genotyping
technology with the potential to meet all these requirements. The assay uses a
multiplexed oligonucleotide ligation assay (OLA) on gDNA, followed by a
universal PCR reaction. The encoded genotype information is read using a set of
common ZipChute probes, which are hybridized to complementary sequences that
are part of genotype-specific amplicons. These ZipChute probes are eluted and
detected by electrophoretic separation on Applied Biosystems 3730xl DNA
Analyzers. This approach is an attractive alternative to existing genotyping
methodologies because it requires only three unlabeled probes per SNP, consumes
a low amount of gDNA, can be highly multiplexed, and uses widely available
capillary electrophoresis instruments.

MATERIALS AND METHODS

Fragmented gDNA is interrogated directly by a set of three unlabeled ligation
probes per SNP in a multiplex of 48 assays (Figure 1). Ligation probes are
designed using proprietary design software (Figure 3). After phosphorylation of
OLA probes and universal linkers, genotype-specific ligation and an enzymatic
clean-up are performed. PCR amplification using two universal PCR primers
follows. Biotinylated amplicons are bound to streptavidin-coated plates.
Single-stranded PCR products are interrogated by a set of universal ZipChute
probes. These probes are fluorescently labeled, have a unique sequence that is
complementary to a specific portion of the single-stranded PCR product, and
contain mobility modifiers (Figure 1). After elution, ZipChute probes are
electrophoretically separated on an Applied Biosystems 3730xl DNA Analyzer.
ZipChute probe pairs representing both alleles of a given SNP are arranged into
markers. Two bins for each marker represent the two alleles of the respective
SNP. The intensities of specific signals in each bin of a marker are
automatically converted into cluster plots using GeneMapper Analysis Software
version 3.5. Cluster plots, together with data that associate each universal
ZipChute pair with its individual SNP, are used to call genotypes automatically
(Figure 2).
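
The step from two bin intensities to a point on a cluster plot can be illustrated with a minimal Python sketch. This is not the GeneMapper algorithm, which the poster does not describe. The function names, the fixed angle thresholds (30 and 60 degrees), the minimum signal magnitude, and the sample intensities are all illustrative assumptions.

```python
import math

def to_polar(bin1, bin2):
    """Convert the two allele-bin intensities of a marker to (angle in degrees, magnitude)."""
    angle = math.degrees(math.atan2(bin2, bin1))
    magnitude = math.hypot(bin1, bin2)
    return angle, magnitude

def call_genotype(bin1, bin2, min_magnitude=100.0, low=30.0, high=60.0):
    """Call a genotype from one sample's bin intensities.

    Assumed rules (hypothetical, for illustration only): signals weaker than
    min_magnitude are no-calls; otherwise the angle decides the cluster.
    """
    angle, magnitude = to_polar(bin1, bin2)
    if magnitude < min_magnitude:
        return "no call"
    if angle < low:
        return "allele 1 homozygote"
    if angle > high:
        return "allele 2 homozygote"
    return "heterozygote"

# Hypothetical intensities (bin 1, bin 2) for one marker.
samples = {
    "DNA_01": (2000.0, 50.0),
    "DNA_02": (900.0, 1000.0),
    "DNA_03": (40.0, 1800.0),
    "NTC": (10.0, 12.0),
}

for name, (b1, b2) in samples.items():
    print(name, call_genotype(b1, b2))
```

A production caller would more likely fit clusters across all samples of a marker than apply fixed thresholds; the sketch only shows why a polar representation separates the three genotype classes by angle and weak signals by magnitude.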

Results

Figure 4. Cluster Plots for 8 Representative SNPs

A total of 1,250 high-quality SNPs were selected from the Applera Corp.
resequencing and discovery project. The SNP targets were submitted to the
automated assay and pool design pipeline (Figure 3). Successfully designed and
synthesized OLA probe sets were tested against a panel of 92 DNA samples
from Coriell Cell Repositories. Genotype data were compared to TaqMan
probe-based assays (Ranade et al., 2001) and sequencing data.

Figure 5. Assay Reproducibility

Polar plot representation for three SNPs, CV1163126, CV11918682, and
CV2059319, from the study described in Table 4. The three plots in A show, for
each SNP, data points for 92 DNAs. In B, four separate runs were combined into
one project and the data points for all four runs were overlaid, so each plot
shows 368 data points. In C, all 12 separate runs were combined into one
project and the data points for all 12 runs were overlaid, so each plot shows
1104 data points. No template controls (NTC) are plotted with their own
symbol. Data points marked with an x indicate a no call for that particular
DNA.

Figure 1. Simplified Diagram of the SNPlex System Assay

[Diagram. Probe labels: fw primer site, ZipCode1, ZipCode2, ASO1, ASO2, spacer,
ASO-Linker 1, ASO-Linker 2, LSO, rev primer site, LSO-Linker, gDNA, biotin,
streptavidin, fluorescent label, ZipChute probe, mobility modifier.]

Workflow shown in the diagram:
  Start: Prep genomic DNA
  Step 1: Activation (kinase probes and linkers; activate probes)
  Step 2: Ligation (perform OLA; allelic discrimination)
  Step 3: Purification (purify by enzymatic digestion)
  Step 4: Amplification (perform universal PCR; ligation product amplification)
  Step 5: Capture (capture on SA-plates; purification)
  Step 6: Hybridization (hybridize ZipChute set; anneal reporter probe)
  Step 7: Elution (wash, elute, and load on 3730; read-out on CE)
  Primary analysis and QC data; allele calling
  Time annotation on the diagram: ~16 hrs

Table 1. Summary of In Silico Probe Design Success for 1,250 SNP Loci Selected
from the Applera Genome Initiative

                                            Total   Percent
  SNPs submitted                             1250     100
  Successful designs                         1166      93.3
  Successful design for both strands (1)      762      61.0
  Successful design for A only (1)            404      32.3

(1) OLA probes were designed using a proprietary probe design algorithm. Both
DNA strands were initially targeted for probe design.
For 84 SNPs, both strands failed design due to common incompatible sequence
motifs, secondary structure, and repeat sequences within the genome.
Based on probe design rules, one strand is favored over the other based on
sequence composition and secondary structure. Our design filters indicated that
second strand synthesis should not be attempted.

Figure 2. The SNPlex System Detection

[Diagram. Step 1: Detection of ZipChute probes on the Applied Biosystems 3730xl
DNA Analyzer. Each ZipChute probe pair is representative of one SNP:
Marker (x48) = SNP identification; Bin1 + Bin2 = allele identification.
Step 2: Analysis. Step 3: Genotype clustering.]

Table 2. Automated Assay Pass Rate for Successfully Designed and Synthesized
Assays

                               Total   Percent
  Tested designs                1928     100.0
  Passed designs                1773      92.0
  Tested designs, strand A      1166     100.0
  Passed designs, strand A      1068      91.6
  Tested designs, strand B       762     100.0
  Passed designs, strand B       705      92.5
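
The percentages in Tables 1 and 2 follow directly from the reported counts. The short Python check below recomputes them from the numbers given in the two tables; the variable names and the `percent` helper are illustrative, not part of the SNPlex pipeline.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

# Table 1: in silico probe design
submitted = 1250
designed_both_strands = 762
designed_a_only = 404
designed = designed_both_strands + designed_a_only   # 1166 successful designs
design_failed = submitted - designed                 # 84 SNPs failed on both strands

# Table 2: automated assay pass rate
tested_a, passed_a = 1166, 1068
tested_b, passed_b = 762, 705
tested = tested_a + tested_b                         # 1928 tested designs
passed = passed_a + passed_b                         # 1773 passed designs

print("design success:", percent(designed, submitted))             # 93.3
print("both strands:  ", percent(designed_both_strands, submitted)) # 61.0
print("strand A only: ", percent(designed_a_only, submitted))       # 32.3
print("pass, overall: ", percent(passed, tested))                   # 92.0
print("pass, strand A:", percent(passed_a, tested_a))               # 91.6
print("pass, strand B:", percent(passed_b, tested_b))               # 92.5
```

The check also makes the table structure explicit: the 1928 tested designs are the 1166 strand A designs plus the 762 strand B designs, and the 84 SNPs that failed design on both strands are the difference between submitted and designed SNPs.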

References
High-Throughput Genotyping with Single Nucleotide Polymorphisms (2001). Ranade, K., Chang, M.-S., Ting, C.T., Pei, D., Hsiao, C.-F., Olivier, M., Pesich, R., Hebert, J., Chen, Y.-D., Dzau, V. J., Curb, D., Olshen, R., Risch, N., Cox, D. R. and Botstein, D. Genome Res., 11, 1262-1268.

Acknowledgements
*The author wishes to acknowledge the following groups at Applied Biosystems for their support of this project:
Genomic Applications R&D, Pilot Operations R&D, Bioinformatics, Global Oligo Operations, Product Test, Genotyping
Applications Marketing, Consumables Development and Manufacturing, and Analysis Software R&D.

Note

All designs that passed strand A and strand B were tested.

For Research Use Only. Not for use in diagnostic procedures.

The PCR process is covered by patents owned by Roche Molecular Systems, Inc. and F. Hoffmann-La Roche Ltd.
Applied Biosystems and GeneMapper are registered trademarks and AB (Design), Applera, SNPbrowser, SNPlex,
ZipChute, and ZipCode are trademarks of Applera Corporation or its subsidiaries in the US and/or certain other
countries.
TaqMan is a registered trademark of Roche Molecular Systems, Inc.

(Figure 2, Step 3) Genotype clustering with polar plotting and genotype calling, using GeneMapper Analysis Software version 3.5

Table 3. SNPlex System Concordance and Call Rate

                                            Percent
  Concordance between strands (1)            99.6%
  Concordance between duplicate DNAs (2)     99.9%
  Concordance with TaqMan assay (3)          99.2%
  Concordance with sequencing (4)            98.5%
  Mean call rate                             96.4%

(1) To determine concordance between strands, data for approximately 190,000
genotype calls were compared.
(2) Two identical DNA samples were placed on the DNA plate. Concordance was
determined based on 5574 genotype calls.
(3) Concordance with the TaqMan probe-based assay is based on 89 DNA samples
in common between genotyping by SNPlex and by the TaqMan probe-based assay.
The non-concordance is due to errors in SNPlex and in the TaqMan probe-based
assay, which are approximately equal. Thus, the SNPlex error rate is about half
of the 0.8% discordance, which corresponds to a real accuracy of about 99.6%.
(4) Concordance with sequencing is based on 14 DNA samples in common between
genotyping by SNPlex and sequencing.
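
The accuracy estimate in the TaqMan concordance footnote can be written out as arithmetic. The sketch below assumes, as the footnote does, that discordant calls are split about equally between the two assays; the 0.5 share and the variable names are illustrative.

```python
concordance_with_taqman = 99.2   # percent, from Table 3

# Discordant calls can come from either assay.
discordance = 100.0 - concordance_with_taqman

# Assumption stated in the footnote: both assays contribute about equally.
snplex_share = 0.5
snplex_error = discordance * snplex_share
estimated_accuracy = 100.0 - snplex_error

print(f"discordance:        {discordance:.1f}%")
print(f"SNPlex error rate:  {snplex_error:.1f}%")
print(f"estimated accuracy: {estimated_accuracy:.1f}%")
```

If the true split between the two assays differs from one half, the estimated SNPlex accuracy moves accordingly, between 99.2% (all errors from SNPlex) and 100% (all errors from the TaqMan assay).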

Figure 3. SNPlex System Assay Design Pipeline

[Diagram. Labels: Start Design Batch; myScience Web Portal; SNP sequence input
(flat-file or XML file); Input validation; Genome Assembly; SNP Specificity;
Assay Design; Multiplex Pool Design; Assay Information File (XML); Review
Design Report; Probe Sets (48-plex); Probe Sets Manufacturing.]

Experimental design. A set of SNPs was selected for which both sequencing
and TaqMan probe-based assay data were available. This set is a subset of SNPs
with genotype information from both the Applera Genome Initiative and the
TaqMan Assays-on-Demand SNP Genotyping Products. Selected assays were
required to have between 9 and 14 high-quality sequencing reads for DNA
samples in common between the re-sequencing data and the TaqMan data. A
further criterion was to select only SNPs whose minor allele frequencies,
determined by the 5' nuclease assay, were at least 0.10 in at least one of the
major populations. These assays were previously tested against a panel of 89
DNA samples from the Coriell Institute Cell Repository to obtain allele
frequencies.

Table 4. SNPlex System Call Rate, Concordance, and Precision

  Number of replicate runs   Call rate   Concordance   Precision
  12                         98.68%      99.79%        99.97%

One additional SNP set, consisting of 48 high-quality SNPs with minor allele
frequencies of typically >0.1 in four major populations (Caucasian, African
American, Japanese, Chinese), was selected from the Applera Corp. SNP discovery
project. This 48-plex SNP set was run through the whole SNPlex system 12
times. For the 46 SNPs that always passed the automated GeneMapper analysis
across all twelve runs, call rate, concordance, and precision were calculated.
Concordance was determined based on 34,891 genotypes that were in common
between SNPlex and the TaqMan Assays-on-Demand SNP Genotyping Products.
Of the 576 possible assays, 560 (97.2%) passed the automatic genotyping
pipeline. Of the 16 failed assays, one SNP failed in all twelve runs and a
second SNP failed in 4 out of 12 runs due to insufficient cluster separation.
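
The assay counts in this reproducibility study, and the data-point counts quoted for the Figure 5 overlays, can be recomputed from the study design. The Python check below uses only numbers stated in the text; the variable names are illustrative.

```python
snps_in_set = 48
runs = 12
dnas_per_run = 92

# Each SNP in each run counts as one assay.
possible_assays = snps_in_set * runs          # 576

# One SNP failed in all 12 runs, a second SNP failed in 4 of 12 runs.
failed_assays = 12 + 4                        # 16
passed_assays = possible_assays - failed_assays
pass_rate = round(100.0 * passed_assays / possible_assays, 1)

# SNPs that passed in every run (the two SNPs above are excluded).
always_passed_snps = snps_in_set - 2

# Data points per SNP plot in the Figure 5 overlays.
points_four_runs = dnas_per_run * 4
points_all_runs = dnas_per_run * runs

print(possible_assays, passed_assays, pass_rate)   # 576 560 97.2
print(always_passed_snps)                          # 46
print(points_four_runs, points_all_runs)           # 368 1104
```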

© 2004 Applied Biosystems. All rights reserved.

Conclusions
We show initial results of SNP genotyping achieved with the SNPlex
Genotyping System, an assay for the high-throughput determination
of SNP genotypes. Genotypes are detected with low consumption of
gDNA (~0.8 ng/genotype), high reproducibility and accuracy, and
high throughput on readily available capillary
electrophoresis-based read-out systems. In this study, which used
population-validated SNPs with a minor allele frequency of at
least 5% in at least one major population, the overall system
conversion rate was ~85% (93% in silico design conversion × 92%
assay conversion). The SNPbrowser software can be used to select
these SNPs and to display their position within LD and haplotype
blocks. Where possible, those SNPs can be used as backbone SNPs
for the required application. At a 48-plex level, one can
conceivably generate approximately 1.5 million genotypes in 5
days on an Applied Biosystems 3730xl system with commercially
available robotics systems. Genotypes are called automatically
using GeneMapper software, with the option of manual editing. The
system is currently configured at 48-plex per reaction; future
system enhancements include increased throughput through more
highly multiplexed reactions, such as 96- and 192-plex per
reaction.
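
The headline figures in the conclusions can be unpacked with simple arithmetic. The Python sketch below derives the number of multiplex reactions implied by the throughput claim, the approximate gDNA input per 48-plex reaction, and the overall conversion rate. It uses only the rounded values quoted above; the per-reaction gDNA figure is an inference from the per-genotype value, not a number stated in the poster.

```python
genotypes = 1_500_000        # approximate genotypes per 5-day period
days = 5
plex = 48                    # SNPs per reaction
ng_per_genotype = 0.8        # approximate gDNA consumption per genotype

# Each reaction yields one genotype per SNP in the multiplex.
reactions = genotypes / plex
reactions_per_day = reactions / days

# Inferred gDNA input for a full 48-plex reaction.
ng_per_reaction = round(ng_per_genotype * plex, 1)

# Overall conversion is the product of design and assay conversion.
design_conversion = 0.93
assay_conversion = 0.92
overall_conversion = round(100.0 * design_conversion * assay_conversion, 1)

print(f"reactions in {days} days: {reactions:.0f}")        # 31250
print(f"reactions per day:   {reactions_per_day:.0f}")     # 6250
print(f"gDNA per reaction:   {ng_per_reaction} ng")        # 38.4 ng
print(f"overall conversion:  {overall_conversion}%")       # 85.6%
```

The product of the two rounded conversion rates is about 85.6%, consistent with the ~85% quoted in the text.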
