Edited by Rudolf Jaenisch, Whitehead Institute, Cambridge, MA, and approved January 6, 2017 (received for review September 26, 2016)
GENETICS

Genome integrity of induced pluripotent stem cells (iPSCs) has been extensively studied in recent years, but it is still unclear whether iPSCs contain more genomic variations than cultured somatic cells. One important question is the origin of genomic variations detected in iPSCs: whether iPSC reprogramming induces such variations. Here, we undertook a unique approach by deriving fibroblast subclones and clonal iPSC lines from the same fibroblast population and applied next-generation sequencing to compare genomic variations in these lines. Targeted deep sequencing of parental fibroblasts revealed that most variants detected in clonal iPSCs and fibroblast subclones were rare variants inherited from the parental fibroblasts. Only a small number of variants remained undetectable in the parental fibroblasts, which were thus likely to be de novo. Importantly, the clonal iPSCs and fibroblast subclones contained comparable numbers of de novo variants. Collectively, our data suggest that iPSC reprogramming is not mutagenic.

iPSCs | fibroblasts | reprogramming | genomic variation | exome sequencing

Genomic integrity of human induced pluripotent stem cells (iPSCs) is an unresolved question and a crucial issue for iPSCs-based therapeutics. To understand the scope of acquired [...]

[...] to search for de novo CNVs in both fibroblast and iPSC clones, and all detected CNVs were subjected to validation using secondary methods. Our data show that iPSCs do not contain more genomic variations than the fibroblast subclones, suggesting that the iPSC reprogramming is not mutagenic. In addition, more than 90% of the putative new mutations in the daughter lines (both iPSCs and fibroblast subclones) preexist in the parental fibroblast population at very low frequencies.

Results
Establishment of Single-Cell Clonal Fibroblast and iPSC Lines. Clonal fibroblast lines and iPSC lines were derived from two independent skin fibroblast cultures (Fig. 1A). We will from here on refer to both fibroblast subclones and iPSC lines as daughter lines. Parental fibroblast line 1 (PF1) was obtained from a patient with familial platelet disorder (FPD), who carried a Y260X mutation in the RUNX1 gene (17), and the second parental fibroblast line (PF2) was obtained from a clinically healthy donor (CTRL-1) (18). Early passage fibroblasts (passage 4) were single-cell sorted into 96-well plates and clonally expanded to establish the subclones and to obtain sufficient amounts of genomic DNA (gDNA) for whole exome sequencing and SNP genotyping analyses (Fig. 1B). In total, we obtained 10 fibroblast subclones from PF1 and 5 fibroblast subclones from PF2. Generation of clonal iPSC lines from each starting skin fibroblast population was previously reported (17, 18). We obtained two clonal iPSC lines from PF1 and three clonal iPSC lines from PF2, analyzing both early and late passage iPSCs to compare accumulation of somatic mutations during long-term in vitro culture. Schematic representation of each daughter line and its relationship to the parental fibroblasts are depicted in Fig. 1.

Fig. 1. Overall experimental design. (A) Schematic of establishment of clonal fibroblast subclones and clonal iPSCs from pooled parental fibroblast populations. (B) Flowchart of experimental design to detect de novo mutations in daughter lines by sequencing and SNP array methods. [Panel labels: Clonally expanded fibroblast subclones (N=15); Clonal iPSC lines (N=5); Identify pre-existing and de novo variants.]

[...] reads with an average exome coverage rate of 85% for PF1 and 94% for PF2 sample sets (SI Appendix, Table S1). The average coverage depth was 88-fold and 63-fold for PF1 and PF2, respectively (PF1 range: 56 to 105; PF2 range: 54 to 71). To [...] fibroblasts using Shimmer (19). We identified a wide range of somatic variants in each daughter line, ranging from 39 to 162 putative new variants per line (Figs. 2A and 3A).

Variant frequency (VF) spectrums of PF1 and PF2 fibroblast subclones and iPSCs are depicted in Figs. 2B and 3B, respectively. [...] frequent variants (VF < 0.05, vertical lines in Figs. 2B and 3B) in daughter lines (8% of the variants detected in PF1 daughter lines and 2% of the variants for PF2 daughter lines), which may represent variants that were either de novo somatic mutations that occurred spontaneously during in vitro culture or preexisting variants with reduced fitness (Figs. 2B and 3B and SI Appendix, Tables [...]

[Fig. 2. (A) Putative new variants per daughter line (iPSCs; fibroblast subclones). (B) VF spectrum, vertical line at VF = 0.05. (C) Total variants = 450. (D) iPSCs vs. fibroblast subclones: 88, 81, 281.]

[...] parental fibroblasts (SI Appendix, Tables S2 and S3). Among the 450 variants from the PF1 cohort, 315 (70.0%) were present in only one daughter line, whereas 135 (30%) were shared by at least two daughter lines, including 4 variants that were shared by all daughter lines (Fig. 2C and SI Appendix, Table S4). Similarly, the PF2 samples had 224 variants (60.5%) that were unique to one daughter line and 146 variants (39.5%) shared by at least two
daughter lines, including 5 variants shared by all daughter lines (Fig. 3C and SI Appendix, Table S5). Interestingly, 18% of variants (81/450 for PF1 and 64/370 for PF2, respectively) were shared between iPSCs and fibroblast subclones in both sample sets (Figs. 2D and 3D). These shared variants, especially ones shared among all daughter lines, were most likely rare variants present in a subset of cells of the parental fibroblast population and did not show enrichment of any biological process (SI Appendix, Table S6). In parental cells, the variants were undetectable by WES, possibly due to low variant frequency; however, because each daughter line was clonally expanded, the variant frequency rises above the detection level in these daughter lines. Consistent with this hypothesis, the shared variants in daughter lines were identified as true heterozygotes with median VFs of 0.35 for PF1 and 0.48 for PF2.

We checked for differences in sequencing read depth between shared and unique variant sites to eliminate the possibility of lower coverage at the shared sites of the parental populations. We found that the median sequencing coverage was slightly higher at shared variant sites compared with the unique variant sites (PF1: 111 vs. 63; PF2: 57 vs. 49). Both unique and shared variants had roughly 50% exonic variants and VFs were similar among shared variants regardless of their genomic location (SI Appendix, Fig. S1 A–D for PF1 and Fig. S2 A–D for PF2).

[...] Variants in Parental Fibroblasts. To determine whether any of the putative new variants were present in the parental fibroblasts at low frequency, we performed targeted deep resequencing using Nimblegen custom capture kit at these variant sites. Approximately 90% of targeted variant sites, which included both the unique and [...] of 4,300 and 10,000 for PF1 and PF2, respectively (SI Appendix, Tables S7 and S8). The untargeted sites have either failed custom [...] read depth (<100).

Our deep targeted resequencing revealed that 90–95% of the putative de novo variants, including almost all shared variants, could be detected in parental cells at low frequency, with median VFs of 0.00758 and 0.00063 in PF1 and PF2, respectively (Figs. 4A and 5A). As expected, we observed much higher median VFs for [...] line (Fig. 4 B and C for PF1 and Fig. 5 B and C for PF2). The finding that the shared variants exist in parental cells at higher frequency is consistent with the fact that these variants have an [...] However, genes that contain the shared variants did not show [...]

iPSCs and Fibroblast Subclones Have Similar Numbers of de Novo Variants. Targeted deep sequencing revealed that only 45 and 18 putative de novo variants remained undetectable in PF1 and PF2 parental fibroblasts, respectively (SI Appendix, Tables S7 and S8). Importantly, the numbers of these likely de novo mutations were no longer different between the iPSCs and the fibroblast subclones (average of 3.7 per iPSC line vs. 4 per fibroblast subclone for PF1 and 2.4 per iPSC line vs. 2 per fibroblast subclone for PF2). Moreover, we observed no difference in median VFs between preexisting and de novo variants in daughter lines from both PF1 and PF2 (SI Appendix, Fig. S3 A and B).

Interestingly, we found that the transition (Ts) mutation type was more common for preexisting variants, whereas transversion (Tv) was more common for de novo variants (average Ts/Tv ratio of 1.5 for preexisting variants and 0.46 for de novo variants). For the PF1 cohort, most preexisting variants had either G > A or C > T changes, which are characteristic types of UV-induced mutations, whereas most de novo variants had either C > A or G > T changes (Fig. 4 D and E). The number of de novo variants for the PF2 cohort was too small to make this comparison (Fig. 5 D and E). These data suggest that the cells may have undergone different mechanisms to acquire somatic changes than the ones that are inherited from parental cells.

Majority of de Novo Variants Are Random Somatic Changes Present in Daughter Lines. We further characterized the de novo variants identified by targeted resequencing to investigate whether any variants had increased fitness that may lead to clonal selection of cells favoring survival or reprogramming into pluripotency. First, evaluating variants that are located in the coding region demonstrated that iPSC lines contained fewer mutations in the coding region (4 coding variants/5 iPSC lines = 0.8 new mutations per clone) than the fibroblast subclones (22 coding variants/15 subclones = 1.47 new mutations per clone), indicating no selective increase of mutations in the iPSCs (SI Appendix, Table S9). Secondly, all de novo variants were unique to one daughter line with the exception of one variant in the COL1A1 gene, which was shared by PF2 fibroblast subclones 1H06 and 1G04. We suspect that this shared variant is a preexisting variant in PF2 parental fibroblasts and was undetected during resequencing due to low targeted capture coverage at this site (375, compared with 10,000 average coverage [...]

Fig. 3. Somatic mutations detected by whole exome sequencing in parental fibroblast 2 (PF2) clonal iPSCs and fibroblast subclones. (A) Total numbers of putative de novo variants in individual daughter lines are shown in the bar graph. Orange bar indicates variants that are shared among PF2 daughter lines and unique variants to individual daughter lines are shaded in blue. Total number of variants for each daughter line is indicated on top of each bar. Variants from early and late passage iPSC lines are combined into one bar graph. (B) Variant frequency (VF) spectrum of putative de novo variants. If more than one daughter line shared the variant, maximum VF is used. Gray vertical line indicates VF = 0.05. (C) Venn diagram of de novo variants that are either unique (blue) or shared (yellow). (D) Venn diagram of number of de novo variants that are common between iPSCs and fibroblast subclones.
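The split between "unique" and "shared" variants used in the text and in the Venn diagrams can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: the function names, the input layout (one set of variant IDs per daughter line), and the toy sample names are assumptions made for the example. The percentages at the end use only the counts reported in the text.

```python
from collections import Counter


def classify_variants(calls):
    """Split variants into unique (one daughter line) and shared (two or more).

    `calls` is a hypothetical input layout: a dict mapping each daughter-line
    name to the set of variant IDs called in that line.
    """
    carriers = Counter()
    for variants in calls.values():
        carriers.update(variants)  # each line counts once per variant
    unique = {v for v, n in carriers.items() if n == 1}
    shared = {v for v, n in carriers.items() if n >= 2}
    return unique, shared


def percent(part, total):
    """Percentage of `part` in `total`, rounded to one decimal place."""
    return round(100.0 * part / total, 1)


# Toy data with made-up line names and variant IDs (not from the paper).
toy_calls = {
    "iPSC_a": {"v1", "v2", "v3"},
    "fib_b": {"v2", "v4"},
    "fib_c": {"v2", "v3", "v5"},
}
toy_unique, toy_shared = classify_variants(toy_calls)

# Counts reported in the text: PF1 has 315 unique and 135 shared of 450;
# PF2 has 224 unique and 146 shared of 370.
pf1_unique_pct = percent(315, 450)
pf2_unique_pct = percent(224, 370)
pf2_shared_pct = percent(146, 370)
```

Counting carriers per variant, rather than comparing lines pairwise, keeps the classification linear in the number of calls and extends naturally to questions such as "shared by all daughter lines" (carrier count equal to the number of lines).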
[...] pathways among the de novo variants. It should be noted that we did not observe any enrichment of any biological pathways among the preexisting variants either.

Collectively, these results suggest that these de novo variants present in the fibroblast subclones and clonal iPSCs were acquired randomly during in vitro culture. Of course it cannot be ruled out completely that these variants are still preexisting variants in the parental fibroblast cells at levels undetectable even with targeted [...]

Fig. 4. Variant frequencies and mutation types of preexisting and de novo [...] unique variants (B) and shared variants (C) in the parental fibroblasts. Distribution of variants for each mutation type for preexisting variants (D) and de novo variants (E). Pink bars indicate transition mutations, and gray bars indicate transversion mutations.

Fig. 5. Variant frequencies and mutation types of preexisting and de novo variants determined by targeted deep sequencing of PF2 samples. (A) Overall VF spectrum of preexisting variants in the parental fibroblasts. VF spectrum of unique variants (B) and shared variants (C) in the parental fibroblasts. Distribution of variants for each mutation type for preexisting variants (D) and de novo variants (E). Pink bars indicate transition mutations, and gray bars indicate transversion mutations.
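The transition/transversion (Ts/Tv) comparison summarized in Figs. 4 and 5 reduces to a simple count over substitution types. The sketch below shows one way to compute the ratio; the function name and the input layout (a dict from "REF>ALT" strings to counts) are assumptions for illustration, and the example counts are hypothetical, not the per-type SNV counts from the figures.

```python
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T);
# every other single-base substitution is a transversion.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
BASES = {"A", "C", "G", "T"}


def ts_tv_ratio(counts):
    """Return the Ts/Tv ratio for a dict such as {"C>T": 12, "G>T": 5}."""
    ts = 0
    tv = 0
    for change, n in counts.items():
        ref, alt = (part.strip() for part in change.split(">"))
        if ref not in BASES or alt not in BASES or ref == alt:
            raise ValueError(f"not a single-base substitution: {change!r}")
        if (ref, alt) in TRANSITIONS:
            ts += n
        else:
            tv += n
    if tv == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return ts / tv


# Hypothetical counts, chosen only to exercise the function.
example_counts = {"G>A": 5, "C>T": 3, "C>A": 2, "G>T": 2}
example_ratio = ts_tv_ratio(example_counts)  # 8 transitions / 4 transversions
```

Because the parser strips whitespace around the bases, it accepts both "G>A" and the spaced form "G > A" used in the running text.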
Evaluation of Putative Structural Variations as Preexisting Variations. We set out to test for the presence of putative CNVs in the parental fibroblasts and to validate identified CNVs in daughter lines by fluorescent in situ hybridization (FISH) for larger structural variations. Using BAC clones as FISH probes, we validated structural variations including the mosaic t(5;11)(q35;q13) in PF1-2A6 at the comparable frequency of 20 (SI Appendix, Fig. S5). However, our FISH results did not reveal the presence of any of these structural changes in the parental fibroblasts, which could be due to low sensitivity of the FISH method. Therefore, we further investigated the preexistence of these changes in the parental fibroblasts using digital droplet PCR (ddPCR), which allows absolute quantification of CNVs at 0.1% sensitivity of detection. We were able to successfully target six of the putative de novo CNV regions for PF1 daughter lines and for two putative de novo CNVs shared by PF2 fibroblast subclones 1H06 and 1G04, because these shared CNVs are likely to be preexisting variations (SI Appendix, Figs. S7 and S8). Among the targeted putative de novo CNVs, one amplification event on chr4: 86,288,423–86,420,622 was detected in the parental fibroblast cells (SI Appendix, Fig. S7) as well as in additional daughter cell lines. Because SNP arrays are not sensitive enough to distinguish copy number changes greater than two, it is not surprising that only the fibroblast subclone 2F12 was reported to have amplification on chr4: 86,288,423–86,420,622, as this sample had a CN = 5 compared with CN = 4 among other lines. Our finding indicates that at least a subset of putative somatic structural variations identified in our dataset were inherited from parental fibroblast cells, although we were not able to unmask very low-frequency CNVs in these cells. We also queried each putative CNV against the Database of Genomic Variants (dgv.tcag.ca/) to determine whether these CNVs have been reported previously. We found that all of the putative de novo CNVs have been previously reported except for one duplication event (chr1: 23,884,369–23,941,735) in PF1 fibroblast subclone 2F3. Based on these experimental findings and database searches, it is likely that most, if not all, of the putative CNVs are preexisting variations.

Discussion
In this study, we have taken a unique approach to compare the mutational history of iPSCs with fibroblast subclones derived from the same fibroblast population. This experimental design made it possible to directly assess whether the iPSC reprogramming process enhances mutagenesis, because the fibroblast subclones [...]

[...] de novo variants among the three iPSC lines (SI Appendix, Fig. S4). Our findings suggest the possibility that different somatic tissues may have varying levels of mosaic mutations. Gene set enrichment analyses showed that genes containing de novo or preexisting variants did not cluster in any specific functional pathway and there were no indications that they contributed to increased reproductive fitness. Therefore, it appears that the de novo and the preexisting variants were largely randomly occurring mutations.

We observed that the iPSC lines contained more putative new variants (before deep resequencing to detect their existence in the parental cells) than the fibroblast subclones (Figs. 2A and 3A). One possible reason for the increased mutation load in iPSCs is that iPSCs were collected at later passages compared with fibroblast subclones, increasing the chance for rare variants to rise in frequency. Because our WES analysis was limited to detect variants with minor variant frequency >0.001, any variants with a frequency of <0.001 will remain undetected using our WES approach. Alternatively, the differences in mutation load could simply be caused by chance, because the total numbers of cell lines were relatively small, especially for the iPSCs. Such differences may disappear if more cell lines were analyzed. Finally, the data may reflect true differences in the numbers of inherited variants between iPSCs and fibroblast subclones. Therefore, this is an issue that requires additional investigation in the future.

The majority of CNVs we detected were also found to be preexisting variations that were either present in the parental fibroblast cells or were previously reported to be present in the general population (14). We observed one difference in genomic stability of iPSCs where a chromosome 12 duplication occurred in the highest passage iPSC line, iPSC3 (n = 39). Several other studies have already reported high incidence of chromosome 12 duplications in iPSCs grown in culture for long term and have hypothesized that these iPSCs undergo high selective pressure during prolonged in vitro culture (10, 27–29).

Interestingly, we identified a mosaic translocation event, t(5;11)(q35;q13), that has been previously reported to be associated with acute myeloid leukemia (AML) subtype M2, in one fibroblast subclone where the parental fibroblast sample (PF1) was isolated from a patient with FPD harboring the Y260X mutation in RUNX1 (17, 30). However, RUNX1 mutations are not known to affect mutagenesis process or DNA repair, and our data do not show a significant difference between the numbers and types of mutations between daughter lines of PF1 and those from the healthy donor (PF2).
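The detection-limit argument made in the Discussion (rare parental variants escape WES but are recovered by deep targeted resequencing) can be illustrated with a back-of-the-envelope calculation. The sketch below is not part of the study's analysis. It assumes an idealized binomial sampling model with no sequencing error, and the threshold of three supporting reads is a hypothetical calling criterion chosen only for illustration. The depths and the median parental VF for PF2 are the values quoted in the text.

```python
from math import comb


def prob_at_least(k, depth, vf):
    """P(X >= k) for X ~ Binomial(depth, vf): the chance of seeing at least
    k variant-supporting reads at a site covered by `depth` reads when the
    true variant frequency is `vf` (idealized, error-free sampling)."""
    p_fewer = sum(
        comb(depth, i) * vf**i * (1.0 - vf) ** (depth - i) for i in range(k)
    )
    return 1.0 - p_fewer


MIN_ALT_READS = 3        # hypothetical calling threshold (assumption)
PF2_MEDIAN_VF = 0.00063  # median parental VF reported for PF2
WES_DEPTH = 63           # average WES coverage depth reported for PF2
TARGETED_DEPTH = 10_000  # targeted resequencing coverage reported for PF2

# Expected number of variant reads is simply depth * VF.
expected_wes = WES_DEPTH * PF2_MEDIAN_VF
expected_targeted = TARGETED_DEPTH * PF2_MEDIAN_VF

p_wes = prob_at_least(MIN_ALT_READS, WES_DEPTH, PF2_MEDIAN_VF)
p_targeted = prob_at_least(MIN_ALT_READS, TARGETED_DEPTH, PF2_MEDIAN_VF)
```

Under these assumptions a variant at the PF2 median frequency is expected to contribute well under one read at WES depth but about six reads at the targeted depth, which is consistent with the paper's observation that most putative new variants became detectable in parental cells only after deep resequencing.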
Discovery of Potential Somatic Mutations Using Whole Exome Sequencing. Whole exome sequence reads for each sample were aligned to the hg19 reference sequence using novoalign version 2.08.02 and PCR duplicates were removed using SAMtools (31). We ran Shimmer (19) to perform all-versus-all pairwise comparisons using the --minqual 20 and --testall options, and filtered out mutation predictions with a read depth of >750 in either sample or a q value of >0.05 as likely artifacts. This resulted in a list of 425 single nucleotide [...]

ACKNOWLEDGMENTS. We thank Dionyssia Clagett (Georgetown University) for establishing fibroblast lines from patients with FPD, Ms. Ursula Harper for performing short tandem repeat mapping for sample identity confirmation, members of the NIH Intramural Sequencing Center for their support on WES and targeted sequencing, and Julia Fekecs for expert design of Fig. 1. The research was supported by the Intramural Research Programs of National Human Genome Research Institute and National Heart, Lung, and Blood Institute, NIH.
1. Cheng L, et al.; NISC Comparative Sequencing Program (2012) Low incidence of DNA sequence variation in human induced pluripotent stem cells generated by nonintegrating plasmid expression. Cell Stem Cell 10(3):337–344.
2. Hussein SM, et al. (2011) Copy number variation and selection during reprogramming to pluripotency. Nature 471(7336):58–62.
3. Ji J, et al. (2012) Elevated coding mutation rate during the reprogramming of human somatic cells into induced pluripotent stem cells. Stem Cells 30(3):435–440.
4. Liu P, et al. (2014) Passage number is a major contributor to genomic structural variations in mouse iPSCs. Stem Cells 32(10):2657–2667.
5. Hamada M, Malureanu LA, Wijshake T, Zhou W, van Deursen JM (2012) Reprogramming to pluripotency can conceal somatic cell chromosomal instability. PLoS Genet 8(8):e1002913.
6. Rouhani FJ, et al. (2016) Mutational history of a human cell lineage from somatic to induced pluripotent stem cells. PLoS Genet 12(4):e1005932.
7. Gore A, et al. (2011) Somatic coding mutations in human induced pluripotent stem cells. Nature 471(7336):63–67.
8. Sugiura M, et al. (2014) Induced pluripotent stem cell generation-associated point mutations arise during the initial stages of the conversion of these cells. Stem Cell Rep 2(1):52–63.
9. Ruiz S, et al. (2013) Analysis of protein-coding mutations in hiPSCs and their possible role during somatic cell reprogramming. Nat Commun 4:1382.
10. Mayshar Y, et al. (2010) Identification and classification of chromosomal aberrations in human induced pluripotent stem cells. Cell Stem Cell 7(4):521–531.
11. Laurent LC, et al. (2011) Dynamic changes in the copy number of pluripotency and cell proliferation genes in human ESCs and iPSCs during reprogramming and time in culture. Cell Stem Cell 8(1):106–118.
12. Pasi CE, et al. (2011) Genomic instability in induced stem cells. Cell Death Differ 18(5):745–753.
13. Dekel-Naftali M, et al. (2012) Screening of human pluripotent stem cells using CGH and FISH reveals low-grade mosaic aneuploidy and a recurrent amplification of chromosome 1q. Eur J Hum Genet 20(12):1248–1255.
14. Ben-Yosef D, et al. (2013) Genomic analysis of hESC pedigrees identifies de novo mutations and enables determination of the timing and origin of mutational events. Cell Reports 4(6):1288–1302.
15. Young MA, et al. (2012) Background mutations in parental cells account for most of the genetic heterogeneity of induced pluripotent stem cells. Cell Stem Cell 10(5):570–582.
16. Bhutani K, et al. (2016) Whole-genome mutational burden analysis of three pluripotency induction methods. Nat Commun 7:10536.
17. Connelly JP, et al. (2014) Targeted correction of RUNX1 mutation in FPD patient-specific induced pluripotent stem cells rescues megakaryopoietic defects. Blood 124(12):1926–1930.
18. Winkler T, et al. (2013) Defective telomere elongation and hematopoiesis from telomerase-mutant aplastic anemia iPSCs. J Clin Invest 123(5):1952–1963.
19. Hansen NF, Gartner JJ, Mei L, Samuels Y, Mullikin JC (2013) Shimmer: Detection of genetic alterations in tumors using next-generation sequence data. Bioinformatics 29(12):1498–1503.
20. Quinlan AR, et al. (2011) Genome sequencing of mouse induced pluripotent stem cells reveals retroelement stability and infrequent DNA rearrangement during reprogramming. Cell Stem Cell 9(4):366–373.
21. Abyzov A, et al. (2012) Somatic copy number mosaicism in human skin revealed by induced pluripotent stem cells. Nature 492(7429):438–442.
22. Žilina O, et al. (2015) Somatic mosaicism for copy-neutral loss of heterozygosity and DNA copy number variations in the human genome. BMC Genomics 16:703.
23. Freed D, Stevens EL, Pevsner J (2014) Somatic mosaicism in the human genome. Genes (Basel) 5(4):1064–1094.
24. Vijg J (2014) Somatic mutations, genome mosaicism, cancer and aging. Curr Opin Genet Dev 26:141–149.
25. Machiela MJ, Chanock SJ (2013) Detectable clonal mosaicism in the human genome. Semin Hematol 50(4):348–359.
26. Szulwach KE, et al. (2015) Single-cell genetic analysis using automated microfluidics to resolve somatic mosaicism. PLoS One 10(8):e0135007.
27. Draper JS, et al. (2004) Recurrent gain of chromosomes 17q and 12 in cultured human embryonic stem cells. Nat Biotechnol 22(1):53–54.
28. Garitaonandia I, et al. (2015) Increased risk of genetic and epigenetic instability in human embryonic stem cells associated with specific culture conditions. PLoS One 10(2):e0118307.
29. Kang X, et al. (2015) Effects of integrating and non-integrating reprogramming methods on copy number variation and genomic stability of human induced pluripotent stem cells. PLoS One 10(7):e0131128.
30. de Oliveira FM, et al. (2007) Acute myeloid leukemia (AML-M2) with t(5;11)(q35;q13) and normal expression of cyclin D1. Cancer Genet Cytogenet 172(2):154–157.
31. Li H, et al.; 1000 Genome Project Data Processing Subgroup (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25(16):2078–2079.
32. Wang K, et al. (2007) PennCNV: An integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res 17(11):1665–1674.