
Analytical Biochemistry 391 (2009) 64–68

Contents lists available at ScienceDirect

Analytical Biochemistry
journal homepage: www.elsevier.com/locate/yabio

Comparative analysis of human saliva microbiome diversity by barcoded pyrosequencing and cloning approaches
Ivan Nasidze a,*, Dominique Quinque a, Jing Li b, Mingkun Li a, Kun Tang c, Mark Stoneking a
a Max Planck Institute for Evolutionary Anthropology, D-04103 Leipzig, Germany
b National Drug Screening Laboratory, China Pharmaceutical University, Nanjing City 21009, China
c Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China

Article info

Article history:
Received 9 March 2009
Available online 3 May 2009

Keywords:
Saliva
Microbiome diversity
Barcoded pyrosequencing
Cloning

Abstract

Metagenomic studies traditionally rely on cloning polymerase chain reaction (PCR) products and sequencing multiple clones. However, this approach is tedious and expensive, thereby limiting the range and scale of questions that can be addressed. Recent developments in DNA sequencing technologies enable a dramatic increase in throughput via parallel in-depth analysis of many samples with limited sample processing and lower costs. We directly compared the traditional cloning approach with a barcoded pyrosequencing method to see whether the latter accurately describes microbiome diversity in human saliva. Our results indicate that despite the shorter read lengths, the pyrosequencing approach provides a description of the human salivary microbiome that is in good agreement with results based on the traditional cloning and sequencing approach.

© 2009 Elsevier Inc. All rights reserved.

Analysis of 16S ribosomal RNA (rRNA)¹ sequence variation has revolutionized the study of bacterial diversity in various communities [1–12]. However, the traditional way to carry out such metagenomic analyses relies on cloning polymerase chain reaction (PCR) products and sequencing multiple clones [1–5], a process that is tedious and expensive. Therefore, metagenomic studies have been limited in the number of samples and number of sequences obtained per sample, and this in turn limits the range and scale of questions that can be addressed. Recent improvements in DNA sequencing technology (e.g., barcoded pyrosequencing on the Genome Sequencer FLX/454 Life Sciences platform) enable a dramatic increase in throughput via parallel in-depth analysis of many samples with limited sample processing and lower costs [13].

However, such advances generally come at the cost of shorter sequence read lengths, and this could potentially decrease taxonomic classification fidelity. Even though recent developments in pyrosequencing technology are enabling longer read lengths, it is still of interest to know whether shorter sequences limit the discriminatory power of each read and whether they are informative enough for metagenomic analyses. To date, this issue has been addressed by only a few studies that used in silico modeling, not a direct comparison between the pyrosequencing and cloning approaches, and concluded that the pyrosequencing approach would describe with reasonable accuracy the diversity of microbial communities within the mouse and human gut and the Guerrero Negro microbial mat ecosystems down to the genus level [6–8]. In addition, a few studies of microbiome diversity in environmental and human fecal samples have been carried out using the pyrosequencing approach alone [9–12] without any comparison with the traditional cloning approach. Here we directly compare the results of the barcoded pyrosequencing approach with a previous study of cloned sequences from the same samples [14] to assess the reliability and informativeness of the former.

Materials and methods

Samples

DNA extracts from saliva samples collected from 12 individuals, each from a different geographic location, were used in this study. The geographic locations of the sampled groups were described elsewhere [14]. The same DNA samples were used in a previous study of the human saliva microbiome in which a portion (500 bp) of the 16S rRNA gene was amplified and cloned and approximately 120 clones were sequenced from each individual [14].

* Corresponding author. Fax: +49 34 13550555.
E-mail address: nasidze@eva.mpg.de (I. Nasidze).
¹ Abbreviations used: rRNA, ribosomal RNA; PCR, polymerase chain reaction; R1, region 1; R2, region 2; C, cloning; RDPII, Ribosomal Database Project II.

0003-2697/$ - see front matter © 2009 Elsevier Inc. All rights reserved.
doi:10.1016/j.ab.2009.04.034
Comparative analysis by pyrosequencing and cloning / I. Nasidze et al. / Anal. Biochem. 391 (2009) 64–68 65

PCR amplification of the microbial 16S rRNA gene

We amplified two informative regions of the microbial 16S rRNA gene. The first, region 1 (R1), contains two variable regions, V1 and V2, which were previously shown to be more informative than other parts of the 16S rRNA gene in terms of the number of phylotypes detected [10]. We used the forward primer for V1 and the reverse primer for V2 [10], which together amplify a PCR product of approximately 350 bp containing V1 and V2. The second informative region, region 2 (R2), corresponds to the 16S rRNA gene fragment that was studied previously by the cloning approach [14], hereafter designated C. This region was chosen to enable direct comparisons between the pyrosequencing and cloning approaches. PCR conditions were as described previously [14].

Fig. 1. Distribution of sequence read lengths for the pyrosequencing approach.
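The read-length summary behind Fig. 1, together with the 200-bp cutoff applied before the Seqmatch search, can be illustrated with a minimal sketch. The function name `summarize_read_lengths` and the toy reads are hypothetical and are not data or software from the study; only the 200-bp cutoff is taken from the text.

```python
from statistics import mean

MIN_LENGTH = 200  # cutoff from the text: shorter reads were excluded


def summarize_read_lengths(reads, min_length=MIN_LENGTH):
    """Return (mean length, fraction of reads >= min_length, retained reads)."""
    if not reads:
        raise ValueError("no reads supplied")
    lengths = [len(read) for read in reads]
    kept = [read for read in reads if len(read) >= min_length]
    return mean(lengths), len(kept) / len(reads), kept


# Toy input (hypothetical sequences, not data from the study).
toy_reads = ["A" * 150, "C" * 210, "G" * 250, "T" * 230]
avg_length, fraction_kept, kept_reads = summarize_read_lengths(toy_reads)
```

On real data the same function would take the filtered reads as a list of strings; the retained reads are what would be passed on to genus assignment.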


Sequencing on the Genome Sequencer FLX platform

The PCR products from R1 and R2 for the 12 individuals were processed for parallel-tagged sequencing on the Genome Sequencer FLX platform, as described elsewhere [13]. Briefly, sample-specific barcode sequences were ligated to the PCR products and DNA concentrations were assessed on an Mx3005P system (Stratagene). Samples were then pooled in equimolar ratios to a total DNA amount of 440 ng. The pooled DNA was subsequently amplified in PCR mixture in oil emulsions and sequenced on 1 lane of a 16-lane PicoTiterPlate on a Genome Sequencer FLX/454 Life Sciences sequencer according to the manufacturer's protocol. The negative control was sequenced on an individual lane.

Data analysis

The initial sequence reads were filtered to remove artifactual sequence reads (i.e., reads containing two or more different tags, no tags, primers in the middle of sequence reads, or no primer sequence). The filtered sequences were then searched against the Ribosomal Database Project II (RDPII) database [15] using the online program Seqmatch (http://rdp.cme.msu.edu/seqmatch/seqmatch_intro.jsp) and a threshold setting of 90%, to assign a genus to each sequence. Diversity statistics and apportionment of variation based on the frequency distribution of genera within and between individuals were calculated with Arlequin 3.1 [16], whereas pairwise correlation analysis was carried out using Statistica 6.1 (StatSoft) [17]. Rarefaction analysis was carried out using Resampling Rarefaction 1.3 software (http://www.uga.edu/~strata/software/index.html).

Results and discussion

A total of 9989 sequence reads were obtained from the Genome Sequencer FLX machine. After filtering, 5825 sequences remained for analysis. The relatively large number of sequences removed during the filtering primarily reflects inefficiencies during the barcode tagging process that have since been improved (unpublished results). The average length of each read was 222 bp, and altogether more than 80.9% of the sequence reads were at least 200 bp (Fig. 1). Because sequences shorter than 200 bp give potentially unreliable results in comparisons with the RDPII database, they cannot be used by the Seqmatch program. Therefore, we excluded these sequences from further analysis, leaving 4598 sequence reads (1567 from R1 and 3031 from R2).

The results of comparing these 4598 sequences with the RDPII database are provided in Supplementary Table 1 (see Supplementary material) and illustrated in the heat plot in Fig. 2. Altogether, 99.2% of the sequences matched a previously identified genus and 0.8% were unknown (did not match any sequence in the database above the 90% threshold value). By comparison, the sequences from the cloning approach resulted in 98.5% matches, 0.4% unclassified sequences (matching a sequence in RDPII for which the genus has not been identified), and 1.1% unknown sequences (see Supplementary Table 1). In the following analyses, we focus only on those sequences that matched a known genus in the RDPII database. A total of 52 genera were detected in these 12 individuals by the cloning approach, based on 1429 sequences, whereas 68 genera were detected for R1 and 54 genera were detected for R2 (Table 1). Of the combined total of 89 genera detected in the 12 individuals across all three datasets (C, R1, and R2), 37.1% were shared among all three datasets, 21.3% were unique for R1, 7.9% were unique for R2, and 11.2% were unique for C (Fig. 3).

Overall, more genera were detected with pyrosequencing reads than with cloned PCR product sequences (Table 1). Is this because there were more pyrosequencing reads than cloned sequences (Table 1), or would we still detect more genera from pyrosequencing than from cloning even with the same number of sequences? To answer this question, we carried out a rarefaction analysis in which the number of bacterial genera detected was determined for randomly resampled subsets of sequences. The results of this analysis (Fig. 4) show that similar results are obtained for C and R2, which is to be expected given that they come from the same region of the 16S rRNA gene. However, consistently more genera are detected from R1 (Fig. 4). R1 contains variable regions V1 and V2, which were predicted in a previous in silico study to be the most informative regions for detecting microbiome diversity [10]. Hence, our results are consistent with this prediction.

Given the overall differences in genera detected among the C, R1, and R2 datasets, how strongly correlated are the different datasets within the same individual? To address this, we calculated correlation coefficients among the distribution of genera detected from the C, R1, and R2 datasets both within and between individuals (Fig. 5). In general, there are significant positive correlations among the C, R1, and R2 datasets both within and between individuals with one exception (discussed below). The significant correlations among datasets from different individuals are not unexpected given our previous finding that approximately 87% of

Fig. 2. Heat plot of the abundance of each bacterial genus in each individual, based on the partial 16S rRNA sequences. Each numbered horizontal row corresponds to a genus;
the genus name corresponding to each number can be found in Supplementary Table 1 (see Supplementary material). Each block of three columns, separated by vertical lines,
is an individual saliva sample. Column C in each block refers to the results obtained by the cloning approach, whereas columns R1 and R2 indicate results for region 1 and
region 2, respectively, obtained by the pyrosequencing approach. The abundance of each genus is indicated by the grayscale value according to the scale at the bottom of the
plot.

the variability in the saliva microbiome is shared among individuals [14]. Nevertheless, the average correlation coefficient among the three datasets within an individual was 0.732, whereas between individuals it was 0.441, which is significantly lower

Table 1
Numbers of sequence reads and genera detected by different approaches.

                    C               R1              R2            R1 + R2       C + R1 + R2
Individual      Reads  Genera   Reads  Genera   Reads  Genera   Reads  Genera   Reads  Genera
Germany           116      14     141      22     117      18     258      26     374      28
China             118      26     148      26     204      27     352      34     470      42
Philippines       115      16     119      18     342      23     461      27     576      30
Poland            118      15     164      17     271      21     435      27     553      29
Oakland           112      15     126      22     385      25     511      27     623      29
Turkey            120      20     160      23     187      20     347      26     467      28
Georgia           124      21     107      24     292      27     399      36     523      43
Bolivia           127      20      95      18     315      20     410      25     537      33
Argentina         115      16     102      12     271      17     373      22     488      26
Louisiana         102      19     138      20     298      23     436      29     538      38
Congo             124      15     151      11     137      12     288      17     412      22
South Africa      138      20     116      17     212      20     328      24     466      28
Total            1429      52    1567      68    3031      54    4598      79    6027      89
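As a consistency check on Table 1, the read counts can be summed per approach and compared with the Total row. The sketch below transcribes only the read columns; the variable and function names are illustrative. Note that the genera columns are not additive in the same way, since the Total row appears to count distinct genera across individuals rather than a sum.

```python
# Read counts per individual, transcribed from Table 1 as (C, R1, R2).
READS = {
    "Germany": (116, 141, 117),
    "China": (118, 148, 204),
    "Philippines": (115, 119, 342),
    "Poland": (118, 164, 271),
    "Oakland": (112, 126, 385),
    "Turkey": (120, 160, 187),
    "Georgia": (124, 107, 292),
    "Bolivia": (127, 95, 315),
    "Argentina": (115, 102, 271),
    "Louisiana": (102, 138, 298),
    "Congo": (124, 151, 137),
    "South Africa": (138, 116, 212),
}


def column_totals(reads):
    """Sum the read counts of each approach (C, R1, R2) over all individuals."""
    c_total = sum(counts[0] for counts in reads.values())
    r1_total = sum(counts[1] for counts in reads.values())
    r2_total = sum(counts[2] for counts in reads.values())
    return c_total, r1_total, r2_total


totals = column_totals(READS)
pyrosequencing_total = totals[1] + totals[2]  # R1 + R2
overall_total = sum(totals)                   # C + R1 + R2
```

The computed totals agree with the Total row of the table (1429, 1567, and 3031 reads, giving 4598 for R1 + R2 and 6027 overall).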

Fig. 4. Rarefaction analysis of the cloning and pyrosequencing approaches, indicating the number of phylotypes sampled as a function of the number of reads. The data points represent averages of 1000 randomized resamplings without replacement.
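The resampling procedure described in the Fig. 4 caption can be sketched as follows. This is an illustration of rarefaction by repeated subsampling without replacement, not the Resampling Rarefaction 1.3 software used in the study; the function name, the fixed random seed, and the toy genus labels are assumptions.

```python
import random


def rarefaction_curve(genus_labels, depths, n_resamplings=1000, seed=0):
    """Average number of distinct genera in random subsamples of each depth.

    genus_labels holds one genus label per sequence read. Subsamples are
    drawn without replacement, as stated in the figure caption.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    curve = {}
    for depth in depths:
        if depth > len(genus_labels):
            raise ValueError("depth exceeds the number of reads")
        total_genera = 0
        for _ in range(n_resamplings):
            subsample = rng.sample(genus_labels, depth)
            total_genera += len(set(subsample))
        curve[depth] = total_genera / n_resamplings
    return curve


# Toy input (hypothetical labels, not data from the study).
toy_labels = ["genus_A"] * 50 + ["genus_B"] * 30 + ["genus_C"] * 20
toy_curve = rarefaction_curve(toy_labels, depths=[1, 10, 100], n_resamplings=200)
```

Comparing such curves for C, R1, and R2 at the same depth separates the effect of sequencing effort from the effect of the region sequenced.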
Fig. 3. Distribution of shared and unique microbial genera for the cloning approach
(C) and the pyrosequencing method (R1 and R2).
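The partition of genera into shared and unique classes summarized in Fig. 3 amounts to set operations on the genera detected in each dataset. The following sketch uses hypothetical genus labels and a hypothetical function name; it shows the bookkeeping only and does not reproduce the study's genus lists.

```python
def partition_genera(c, r1, r2):
    """Count genera shared by all three datasets and unique to each one."""
    return {
        "total": len(c | r1 | r2),
        "shared_all": len(c & r1 & r2),
        "unique_C": len(c - r1 - r2),
        "unique_R1": len(r1 - c - r2),
        "unique_R2": len(r2 - c - r1),
    }


# Toy input (hypothetical labels, not data from the study).
toy_c = {"g1", "g2", "g3"}
toy_r1 = {"g1", "g2", "g4", "g5"}
toy_r2 = {"g1", "g3", "g6"}
toy_partition = partition_genera(toy_c, toy_r1, toy_r2)
```

Applied to the reported figures, 37.1%, 21.3%, 7.9%, and 11.2% of 89 genera correspond to roughly 33, 19, 7, and 10 genera, respectively; this is a back-calculation from the percentages rather than a count given in the text.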

(Mann–Whitney U test, P < 0.0001). Thus, there is a significantly higher correlation among datasets within an individual than between individuals, indicating that the pyrosequencing and cloning approaches are capturing the same information on the saliva microbiome.

The one exception to the general finding of significant positive correlations among datasets from different individuals involves the Congo. Here the datasets showed significant positive correlations within this individual but showed nonsignificant negative correlations with the datasets from the other individuals (Fig. 5). This is in accordance with our previous results, based on a larger sampling from each location [14], which showed that the Congo exhibited the largest differences from other locations. Thus, there is good concordance in this respect as well between the cloning and pyrosequencing approaches.

To further investigate the correspondence among the C, R1, and R2 datasets from different individuals, an analysis of molecular variance (AMOVA) was carried out. In this analysis, the total variation in the microbiome composition of individuals is decomposed into within-dataset, between-dataset (within-individual), and between-individual components. The results of the AMOVA showed that 86.0% of the total variation is shared among datasets and individuals, whereas only 5.3% is due to differences among datasets within the same individual and 8.7% is due to differences between individuals. The fact that most of the variation is shared among individuals is in good agreement with our previous observation based on a larger number of individuals [14]. Also, the fact that more of the variation is due to differences between individuals than to differences between datasets from the same individual is a further indication that the cloning and pyrosequencing datasets tend to agree with one another.

In conclusion, our results indicate that despite the shorter read lengths, the pyrosequencing approach provides a description of the human salivary microbiome that is in good agreement with results based on the traditional cloning and sequencing approach. Read lengths continue to increase, and costs continue to decrease, for the next-generation sequencing approaches, further enhancing their utility in metagenomic analyses. We anticipate that application of next-generation sequencing methods will significantly boost both the number of samples and the depth of sequencing attainable in metagenomic studies, thereby providing new insights into microbiome diversity.

Fig. 5. Pairwise correlation matrix between datasets both within individuals and between individuals. The correlation coefficient values are indicated by the grayscale value
according to the scale at the bottom of the matrix.
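The comparison of within-individual and between-individual correlations can be sketched in two steps: a correlation coefficient between two genus-frequency vectors, and a Mann–Whitney U statistic comparing the two groups of coefficients. The text does not state which correlation coefficient Statistica computed, so Pearson's r is an assumption here; the function names and toy values are hypothetical, and the sketch returns only the U statistic, not a P value.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length frequency vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / sqrt(sxx * syy)


def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: pairs with a > b count 1, ties count 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Toy input (hypothetical values, not data from the study).
toy_r = pearson([1, 2, 3, 4], [2, 4, 6, 8])
toy_within = [0.8, 0.7, 0.9]   # correlations among datasets within individuals
toy_between = [0.4, 0.5, 0.3]  # correlations between individuals
toy_u = mann_whitney_u(toy_within, toy_between)
```

A U statistic equal to the product of the two sample sizes, as in the toy data, means every within-individual coefficient exceeds every between-individual one; a significance test would additionally require the null distribution of U.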

Acknowledgments

We thank all individuals who kindly donated samples for this study, and we thank Jarek Bryk, Heather and Bryan Buckner, Daniel Corach, Anne Fischer, Michel Halbwax, Janet Kelso, Marina Nagveradze, Samra Sardas, Beth Trachtenberg, Rebecca Atencia, and Guido Valverde for assistance with sample collections. This research was funded by the Max Planck Society.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.ab.2009.04.034.

References

[1] J.A. Aas, B.J. Paster, L.N. Stokes, I. Olsen, F.E. Dewhirst, Defining the normal bacterial flora of the oral cavity, J. Clin. Microbiol. 43 (2005) 5721–5732.
[2] Z. Gao, C. Tseng, Z. Pei, M.J. Blaser, Molecular analysis of human forearm superficial skin bacterial biota, Proc. Natl. Acad. Sci. USA 104 (2007) 2927–2932.
[3] I. Kroes, P.W. Lepp, D.A. Relman, Bacterial diversity within the human subgingival crevice, Proc. Natl. Acad. Sci. USA 96 (1999) 14547–14552.
[4] L. Marchini, M.S. Campos, A.M. Silva, L.C. Paulino, F.G. Nobrega, Bacterial diversity in aphthous ulcers, Oral Microbiol. Immunol. 22 (2007) 225–231.
[5] M. Sakamoto, M. Umeda, Y. Benno, Molecular analysis of human oral microbiota, J. Periodontal Res. 40 (2005) 277–285.
[6] A.F. Andersson, M. Lindberg, H. Jakobsson, F. Bäckhed, P. Nyrén, L. Engstrand, Comparative analysis of human gut microbiota by barcoded pyrosequencing, PLoS ONE 7 (2008) e2836.
[7] Z. Liu, C. Lozupone, M. Hamady, F.D. Bushman, R. Knight, Short pyrosequencing reads suffice for accurate microbial community analysis, Nucleic Acids Res. 35 (2007) e120.
[8] Z. Liu, T.Z. DeSantis, G.L. Andersen, R. Knight, Accurate taxonomy assignments from 16S rRNA sequences produced by highly parallel pyrosequencers, Nucleic Acids Res. 36 (2008) 4845–4862.
[9] P.J. Turnbaugh, M. Hamady, T. Yatsunenko, B.L. Cantarel, A. Duncan, R.E. Ley, M.L. Sogin, W.J. Jones, B.A. Roe, J.P. Affourtit, M. Egholm, B. Henrissat, A.C. Heath, R. Knight, J.I. Gordon, A core gut microbiome in obese and lean twins, Nature 457 (2009) 480–484.
[10] A. Sundquist, S. Bigdeli, R. Jalili, M.L. Druzin, S. Waller, K.M. Pullen, Y.Y. El-Sayed, M.M. Taslimi, S. Batzoglou, M. Ronaghi, Bacterial flora typing with deep, targeted, chip-based pyrosequencing, BMC Microbiol. 7 (2007) 108.
[11] J.A. Huber, D.B.M. Welch, H.G. Morrison, S.M. Huse, P.R. Neal, D.A. Butterfield, M.L. Sogin, Microbial population structures in the deep marine biosphere, Science 318 (2007) 97–100.
[12] M.L. Sogin, H.G. Morrison, J.A. Huber, D.M. Welch, S.M. Huse, P.R. Neal, J.M. Arrieta, G.J. Herndl, Microbial diversity in the deep sea and the underexplored "rare biosphere", Proc. Natl. Acad. Sci. USA 103 (2006) 12115–12120.
[13] M. Meyer, U. Stenzel, M. Hofreiter, Parallel tagged sequencing on the 454 platform, Nat. Protoc. 3 (2008) 267–278.
[14] I. Nasidze, J. Li, D. Quinque, K. Tang, M. Stoneking, Global diversity in the human salivary microbiome, Genome Res. (2009) 636–643.
[15] J.R. Cole, B. Chai, R.J. Farris, Q. Wang, A.S. Kulam-Syed-Mohideen, D.M. McGarrell, A.M. Bandela, E. Cardenas, G.M. Garrity, J.M. Tiedje, The ribosomal database project (RDP-II): introducing myRDP space and quality controlled public data, Nucleic Acids Res. 35 (2007) D169–D172.
[16] S. Schneider, D. Roessli, L. Excoffier, Arlequin: A Software for Population Genetics Data Analysis (Version 2.000), University of Geneva, Genetics and Biometry Laboratory, 2000.
[17] Statistica Package 6.1, StatSoft, 1984–2004.
