
iPSCs and fibroblast subclones from the same fibroblast population contain comparable levels of sequence variations
Erika M. Kwon^a, John P. Connelly^a,1, Nancy F. Hansen^b, Frank X. Donovan^c, Thomas Winkler^d, Brian W. Davis^e, Halah Alkadi^a, Settara C. Chandrasekharappa^c, Cynthia E. Dunbar^d, James C. Mullikin^b, and Paul Liu^a,2
^aOncogenesis and Development Section, National Human Genome Research Institute, NIH, Bethesda, MD 20892; ^bComparative Genomics Unit, NIH Intramural Sequencing Center, National Human Genome Research Institute, NIH, Bethesda, MD 20892; ^cGenomics Core, National Human Genome Research Institute, NIH, Bethesda, MD 20892; ^dHematology Branch, National Heart Lung and Blood Institute, NIH, Bethesda, MD 20892; and ^eCancer Genetics and Comparative Genomics Branch, National Human Genome Research Institute, NIH, Bethesda, MD 20892

Edited by Rudolf Jaenisch, Whitehead Institute, Cambridge, MA, and approved January 6, 2017 (received for review September 26, 2016)

GENETICS

Genome integrity of induced pluripotent stem cells (iPSCs) has been extensively studied in recent years, but it is still unclear whether iPSCs contain more genomic variations than cultured somatic cells. One important question is the origin of genomic variations detected in iPSCs: whether iPSC reprogramming induces such variations. Here, we undertook a unique approach by deriving fibroblast subclones and clonal iPSC lines from the same fibroblast population and applied next-generation sequencing to compare genomic variations in these lines. Targeted deep sequencing of parental fibroblasts revealed that most variants detected in clonal iPSCs and fibroblast subclones were rare variants inherited from the parental fibroblasts. Only a small number of variants remained undetectable in the parental fibroblasts, which were thus likely to be de novo. Importantly, the clonal iPSCs and fibroblast subclones contained comparable numbers of de novo variants. Collectively, our data suggest that iPSC reprogramming is not mutagenic.

iPSCs | fibroblasts | reprogramming | genomic variation | exome sequencing

Significance

One important unsolved question in the stem cell field is whether induced pluripotent stem cells (iPSCs) carry more mutations than other cultured somatic cells because of the reprogramming process. In this work, we took a novel approach to interrogate the genome integrity of iPSCs by comparing the mutational load of clonal fibroblast lines and iPSC lines derived from the same parental fibroblast cells. Whole exome sequencing demonstrates that iPSCs and clonal fibroblasts have comparable numbers of new mutations relative to their parental fibroblasts. Deep, targeted resequencing also shows that more than 90% of these mutations are random, preexisting sequence variants in small subsets of the parental fibroblast population. Our data strongly suggest that the reprogramming process is not mutagenic.

Author contributions: E.M.K., J.P.C., S.C.C., and P.L. designed research; E.M.K., J.P.C., F.X.D., and H.A. performed research; T.W. contributed new reagents/analytic tools; E.M.K., N.F.H., F.X.D., B.W.D., S.C.C., C.E.D., J.C.M., and P.L. analyzed data; and E.M.K., C.E.D., and P.L. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Data deposition: The sequences reported in this paper have been deposited in the dbGaP database, www.ncbi.nlm.nih.gov/gap (accession no. phs001277.v1.p1).
1Present address: Genome Engineering and iPSC Center, Washington University, St. Louis, MO 63110.
2To whom correspondence should be addressed. Email: pliu@mail.nih.gov.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1616035114/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1616035114 PNAS Early Edition | 1 of 6

Genomic integrity of human induced pluripotent stem cells (iPSCs) is an unresolved question and a crucial issue for iPSC-based therapeutics. To understand the scope of acquired genetic changes that occur during iPSC generation, previous studies have compared single nucleotide variants (SNVs), copy number variations (CNVs), and chromosomal rearrangements in iPSCs to donor somatic cells or embryonic stem cells (ESCs) using various assays, including SNP array and next-generation sequencing methods (1–6). Whole genome or exome sequencing (WGS/WES) of parental and iPSC pairs has demonstrated that there are, on average, 6–12 coding SNVs per iPSC line (1, 7–9), with varying theories regarding when these SNVs arose in the iPSCs. Most reports attributed de novo SNVs or CNVs in iPSCs to reprogramming-induced stress or long-term in vitro culture (2, 3, 7, 8, 10–14). However, several studies also suggested that many of the de novo variants were either rare preexisting variations inherited from the founder source cells or benign variants, regardless of the reprogramming method used (1, 9, 15, 16).

To determine whether iPSCs are inherently more likely to accumulate mutations, and to further elucidate the origin of genomic variants present in iPSCs, we have undertaken a unique approach: we established clonal fibroblast subclones and iPSCs from the same fibroblast population and used next-generation sequencing to directly assess whether the reprogramming process leads to more mutations. De novo mutations, which were not detected in the starting pooled parental fibroblasts, were identified in each daughter cell line, both fibroblast subclones and iPSCs. These putative de novo mutation sites were then deep sequenced in the parental fibroblast population to determine whether they were present at low frequency as mosaic variants. In addition, high-resolution single nucleotide polymorphism (SNP) arrays were used to search for de novo CNVs in both fibroblast and iPSC clones, and all detected CNVs were subjected to validation using secondary methods. Our data show that iPSCs do not contain more genomic variations than the fibroblast subclones, suggesting that iPSC reprogramming is not mutagenic. In addition, more than 90% of the putative new mutations in the daughter lines (both iPSCs and fibroblast subclones) preexist in the parental fibroblast population at very low frequencies.

[Figure 1 schematic. (A) Pooled parental fibroblasts (N = 2) were either single-cell isolated and expanded into clonally expanded fibroblast subclones (N = 15) or reprogrammed with OSKM into clonal iPSC lines (N = 5); preexisting and new variants are marked. (B) gDNA from the PF1 and PF2 sample cohorts was analyzed by WES (identify variants with 0 reads in parental cells, then targeted deep resequencing to identify preexisting and de novo variants) and by SNP array (find unique CNVs in sublines, then FISH and ddPCR of CNV regions in parental cells and sublines).]
Fig. 1. Overall experimental design. (A) Schematic of establishment of clonal fibroblast subclones and clonal iPSCs from pooled parental fibroblast populations. (B) Flowchart of experimental design to detect de novo mutations in daughter lines by sequencing and SNP array methods.

Results

Establishment of Single-Cell Clonal Fibroblast and iPSC Lines. Clonal fibroblast lines and iPSC lines were derived from two independent skin fibroblast cultures (Fig. 1A). From here on, we refer to both fibroblast subclones and iPSC lines as daughter lines. Parental fibroblast line 1 (PF1) was obtained from a patient with familial platelet disorder (FPD), who carried a Y260X mutation in the RUNX1 gene (17), and the second parental fibroblast line (PF2) was obtained from a clinically healthy donor (CTRL-1) (18). Early passage fibroblasts (passage 4) were single-cell sorted into 96-well plates and clonally expanded to establish the subclones and to obtain sufficient amounts of genomic DNA (gDNA) for whole exome sequencing and SNP genotyping analyses (Fig. 1B). In
total, we obtained 10 fibroblast subclones from PF1 and 5 fibroblast subclones from PF2. Generation of clonal iPSC lines from each starting skin fibroblast population was previously reported (17, 18). We obtained two clonal iPSC lines from PF1 and three clonal iPSC lines from PF2, analyzing both early and late passage iPSCs to compare accumulation of somatic mutations during long-term in vitro culture. Each daughter line and its relationship to the parental fibroblasts are depicted schematically in Fig. 1.

Whole Exome Sequencing Discovered Putative New Variants in Daughter Lines. WES was conducted to detect genomic variants in iPSCs or fibroblast subclones that were not present in the parental fibroblast populations (Fig. 1B). We obtained high-quality sequence reads with an average exome coverage rate of 85% for PF1 and 94% for PF2 sample sets (SI Appendix, Table S1). The average coverage depth was 88-fold and 63-fold for PF1 and PF2, respectively (PF1 range: 56 to 105; PF2 range: 54 to 71). To identify somatic variations in individual daughter lines, we looked for variants that were completely absent in the respective parental fibroblasts using Shimmer (19). We identified a wide range of somatic variants in each daughter line, ranging from 39 to 162 putative new variants per line (Figs. 2A and 3A). Variant frequency (VF) spectra of PF1 and PF2 fibroblast subclones and iPSCs are depicted in Figs. 2B and 3B, respectively. Interestingly, most of the variants had VFs around 0.5 in the daughter lines, leading us to hypothesize that these variants were rare preexisting mutations in parental fibroblasts or resulted from increased postmutation fitness. On the other hand, there were low-frequency variants (VF < 0.05, vertical lines in Figs. 2B and 3B) in daughter lines (8% of the variants detected in PF1 daughter lines and 2% of the variants for PF2 daughter lines), which may represent either de novo somatic mutations that occurred spontaneously during in vitro culture or preexisting variants with reduced fitness (Figs. 2B and 3B and SI Appendix, Tables S2 and S3).

We compared mutation load between fibroblast subclones and clonal iPSCs and observed that iPSCs had a 1.3- to 1.7-fold increased number of putative new variants compared with fibroblast subclones (97 vs. 73 for PF1 and 111 vs. 66 for PF2; Figs. 2A and 3A). Differences in the mutation load of iPSCs and fibroblast subclones for both PF1 and PF2 were marginally significant (P = 0.048, two-tailed t test).

[Figure 2 panels: (A) bar graph of putative new variants, Variants (N), per PF1 daughter line (iPSCs and fibroblast subclones), split into shared and unique variants; (B) VF spectrum, Variants (N) vs. Frequency, with VF = 0.05 marked; (C) Venn diagram, total variants = 450 (315 unique, 135 shared); (D) Venn diagram of iPSCs vs. fibroblast subclones (88, 81, and 281 variants).]
Fig. 2. Somatic mutations detected by whole exome sequencing in parental fibroblast 1 (PF1) clonal iPSCs and fibroblast subclones. (A) Total numbers of putative de novo variants in individual daughter lines are shown in the bar graph. Orange bars indicate variants that are shared among PF1 daughter lines, and variants unique to individual daughter lines are shaded in blue. The total number of variants for each daughter line is indicated on top of each bar. (B) Variant frequency (VF) spectrum of putative de novo variants. If more than one daughter line shared the variant, the maximum VF is used. The gray vertical line indicates VF = 0.05. (C) Venn diagram of putative de novo variants that are either unique (blue) or shared (yellow). (D) Venn diagram of the number of de novo variants that are common between iPSCs and fibroblast subclones.

Shared Genomic Variants Among iPSCs and Fibroblast Subclones Derived from Skin Fibroblasts. A total of 450 (425 SNVs and 25 indels) and 370 (357 SNVs and 13 indels) variants were detected in the PF1 and PF2 daughter lines, respectively, but not in the parental fibroblasts (SI Appendix, Tables S2 and S3). Among the 450 variants from the PF1 cohort, 315 (70.0%) were present in only one daughter line, whereas 135 (30%) were shared by at least two daughter lines, including 4 variants that were shared by all daughter lines (Fig. 2C and SI Appendix, Table S4). Similarly, the PF2 samples had 224 variants (60.5%) that were unique to one daughter line and 146 variants (39.5%) shared by at least two



daughter lines, including 5 variants shared by all daughter lines (Fig. 3C and SI Appendix, Table S5). Interestingly, about 18% and 17% of variants (81/450 for PF1 and 64/370 for PF2, respectively) were shared between iPSCs and fibroblast subclones (Figs. 2D and 3D). These shared variants, especially ones shared among all daughter lines, were most likely rare variants present in a subset of cells of the parental fibroblast population, and they did not show enrichment of any biological process (SI Appendix, Table S6). In parental cells, the variants were undetectable by WES, possibly due to low variant frequency; however, because each daughter line was clonally expanded, the variant frequency rises above the detection level in these daughter lines. Consistent with this hypothesis, the shared variants in daughter lines were identified as true heterozygotes with median VFs of 0.35 for PF1 and 0.48 for PF2.

We checked for differences in sequencing read depth between shared and unique variant sites to rule out lower coverage at the shared sites in the parental populations. We found that the median sequencing coverage was slightly higher at shared variant sites than at unique variant sites (PF1: 111 vs. 63; PF2: 57 vs. 49). Both unique and shared variants comprised roughly 50% exonic variants, and VFs were similar among shared variants regardless of their genomic location (SI Appendix, Fig. S1 A–D for PF1 and Fig. S2 A–D for PF2).

[Figure 3 panels: (A) bar graph of shared and unique putative new variants, Variants (N), for iPSCs (iPSC2, iPSC3, iPSC26) and fibroblast subclones (1B04, 1G04, 1H06, 3A08, 3C03); (B) VF spectrum, Variants (N) vs. Frequency, with VF = 0.05 marked; (C) Venn diagram, total variants = 370 (224 unique, 146 shared); (D) Venn diagram of iPSCs vs. fibroblast subclones.]
Fig. 3. Somatic mutations detected by whole exome sequencing in parental fibroblast 2 (PF2) clonal iPSCs and fibroblast subclones. (A) Total numbers of putative de novo variants in individual daughter lines are shown in the bar graph. Orange bars indicate variants that are shared among PF2 daughter lines, and variants unique to individual daughter lines are shaded in blue. The total number of variants for each daughter line is indicated on top of each bar. Variants from early and late passage iPSC lines are combined into one bar. (B) Variant frequency (VF) spectrum of putative de novo variants. If more than one daughter line shared the variant, the maximum VF is used. The gray vertical line indicates VF = 0.05. (C) Venn diagram of de novo variants that are either unique (blue) or shared (yellow). (D) Venn diagram of the number of de novo variants that are common between iPSCs and fibroblast subclones.

Over 90% of the Putative New Variants Are Preexisting, Rare Variants in Parental Fibroblasts. To determine whether any of the putative new variants were present in the parental fibroblasts at low frequency, we performed targeted deep resequencing at these variant sites using a NimbleGen custom capture kit. Approximately 90% of targeted variant sites, which included both the unique and shared variants, were resequenced, with average sequence depths of 4,300 and 10,000 for PF1 and PF2, respectively (SI Appendix, Tables S7 and S8). The untargeted sites either failed custom capture design due to repeats in the target regions or had insufficient read depth (<100).

Our deep targeted resequencing revealed that 90–95% of the putative de novo variants, including almost all shared variants, could be detected in parental cells at low frequency, with median VFs of 0.00758 and 0.00063 in PF1 and PF2, respectively (Figs. 4A and 5A). As expected, we observed much higher median VFs for putative de novo variants that were shared by multiple daughter lines compared with variants that were unique to a single daughter line (Fig. 4 B and C for PF1 and Fig. 5 B and C for PF2). The finding that the shared variants exist in parental cells at higher frequency is consistent with the fact that these variants have an increased chance of being inherited by multiple daughter lines. Moreover, it is still possible that these variants provide a functional advantage, such as increased reproductive fitness in clonal growth. However, genes that contain the shared variants did not show enrichment for any biological processes.

iPSCs and Fibroblast Subclones Have Similar Numbers of de Novo Variants. Targeted deep sequencing revealed that only 45 and 18 putative de novo variants remained undetectable in PF1 and PF2 parental fibroblasts, respectively (SI Appendix, Tables S7 and S8). Importantly, the numbers of these likely de novo mutations no longer differed between the iPSCs and the fibroblast subclones (average of 3.7 per iPSC line vs. 4 per fibroblast subclone for PF1 and 2.4 per iPSC line vs. 2 per fibroblast subclone for PF2). Moreover, we observed no difference in median VFs between preexisting and de novo variants in daughter lines from both PF1 and PF2 (SI Appendix, Fig. S3 A and B).

Interestingly, we found that the transition (Ts) mutation type was more common for preexisting variants, whereas transversion (Tv) was more common for de novo variants (average Ts/Tv ratio of 1.5 for preexisting variants and 0.46 for de novo variants). For the PF1 cohort, most preexisting variants had either G > A or C > T changes, which are characteristic types of UV-induced mutations, whereas most de novo variants had either C > A or G > T changes (Fig. 4 D and E). The number of de novo variants for the PF2 cohort was too small to make this comparison (Fig. 5 D and E). These data suggest that the de novo somatic changes may have been acquired through different mechanisms than the variants inherited from parental cells.

Majority of de Novo Variants Are Random Somatic Changes Present in Daughter Lines. We further characterized the de novo variants identified by targeted resequencing to investigate whether any variants had increased fitness that may lead to clonal selection of cells favoring survival or reprogramming into pluripotency. First, evaluating variants located in the coding region demonstrated that iPSC lines contained fewer mutations in the coding region (4 coding variants/5 iPSC lines = 0.8 new mutations per clone) than the fibroblast subclones (22 coding variants/15 subclones = 1.47 new mutations per clone), indicating no selective increase of mutations in the iPSCs (SI Appendix, Table S9). Second, all de novo variants were unique to one daughter line, with the exception of one variant in the COL1A1 gene, which was shared by PF2 fibroblast subclones 1H06 and 1G04. We suspect that this shared variant is a preexisting variant in PF2 parental fibroblasts that was undetected during resequencing due to low targeted capture coverage at this site (375, compared with 10,000 average coverage



at other sites). Third, we performed gene enrichment analysis of de novo or inherited variants for both PF1 and PF2 using Gene Ontology (geneontology.org/) and DAVID (https://david.ncifcrf.gov/home.jsp) and observed no enrichment of any biological pathways among the de novo variants. It should be noted that we did not observe any enrichment of any biological pathways among the preexisting variants either.

Collectively, these results suggest that the de novo variants present in the fibroblast subclones and clonal iPSCs were acquired randomly during in vitro culture. Of course, it cannot be ruled out completely that these variants are still preexisting variants in the parental fibroblast cells at levels undetectable even with targeted deep resequencing (VF < 0.0001). We observed that more than half of the de novo variants have VFs > 0.4 in the daughter lines (SI Appendix, Fig. S3 A and B), which suggests that these variants originated during the very early passages of the daughter lines.

[Figure 4 panels: VF spectra, Variants (N) vs. Frequency, for preexisting (A), unique (B), and shared (C) variants in the parental fibroblasts; bar charts of Variants (N) by mutation type for preexisting (D) and de novo (E) variants.]
Fig. 4. Variant frequencies and mutation types of preexisting and de novo variants determined by targeted deep sequencing of PF1 samples. (A) Overall VF spectrum of preexisting variants in the parental fibroblasts. VF spectrum of unique variants (B) and shared variants (C) in the parental fibroblasts. Distribution of variants for each mutation type for preexisting variants (D) and de novo variants (E). Pink bars indicate transition mutations, and gray bars indicate transversion mutations.

[Figure 5 panels: VF spectra, Variants (N) vs. Frequency, for preexisting (A), unique (B), and shared (C) variants in the parental fibroblasts; bar charts of Variants (N) by mutation type for preexisting (D) and de novo (E) variants.]
Fig. 5. Variant frequencies and mutation types of preexisting and de novo variants determined by targeted deep sequencing of PF2 samples. (A) Overall VF spectrum of preexisting variants in the parental fibroblasts. VF spectrum of unique variants (B) and shared variants (C) in the parental fibroblasts. Distribution of variants for each mutation type for preexisting variants (D) and de novo variants (E). Pink bars indicate transition mutations, and gray bars indicate transversion mutations.

Copy Number Variation Analysis in the iPSCs and Fibroblast Subclones. Putative new CNVs in the daughter lines were investigated using high-density SNP arrays (Illumina HumanOmniExpressExome-8v1.1_B). The SNP call rates were greater than 99%, and all samples met the desired quality control criteria except for one fibroblast



subclone, 2B12, which was excluded from further analysis. Among the remaining lines, 55% of the daughter lines from PF1 (1/2 iPSCs and 5/9 fibroblast subclones) and 25% of the daughter lines from PF2 (1/3 iPSCs and 1/5 fibroblast subclones) had no new CNV, and iPSCs and fibroblast subclones had comparable numbers of putative new CNVs (one to three CNV events per clone) (SI Appendix, Table S10). Putative somatic copy number changes ranged in size from 18 kb to 800 kb, and the numbers of CNV gain or loss events were also comparable in iPSCs and fibroblast subclones. We observed one homozygous deletion event in a fibroblast subclone (2A6) containing a single gene, AUTS2. The rest were all heterozygous: three deletion and six duplication events were observed among the PF1 cohort, and six deletions and three duplications were observed among the PF2 cohort. Interestingly, two CNVs (deletion of chr1: 46,287,869–46,673,924 and duplication of chr2: 206,452,025–206,725,525) were shared between two fibroblast subclones, 1H06 and 1G04, suggesting that these are rare CNVs preexisting in parental somatic cells that were inherited by both subclones. Moreover, other larger putative structural variations were detected, including a mosaic chromosomal event in the fibroblast subclone 2A6 (20% mosaicism of t(5;11)(q35;q13), SI Appendix, Fig. S5) and a mosaic chromosome duplication at the highest passage (n = 39) of an iPSC line in our study (chr12:1–133,810,935, SI Appendix, Fig. S6).

Evaluation of Putative Structural Variations as Preexisting Variations. We set out to test for the presence of putative CNVs in the parental fibroblasts and to validate identified CNVs in daughter lines by fluorescent in situ hybridization (FISH) for larger structural variations. Using BAC clones as FISH probes, we validated structural variations including the mosaic t(5;11)(q35;q13) in PF1-2A6 at a comparable frequency of 20% (SI Appendix, Fig. S5). However, our FISH results did not reveal the presence of any of these structural changes in the parental fibroblasts, which could be due to the low sensitivity of the FISH method. Therefore, we further investigated the preexistence of these changes in the parental fibroblasts using droplet digital PCR (ddPCR), which allows absolute quantification of CNVs at 0.1% sensitivity of detection. We were able to successfully target six of the putative de novo CNV regions for PF1 daughter lines and the two putative de novo CNVs shared by PF2 fibroblast subclones 1H06 and 1G04, because these shared CNVs are likely to be preexisting variations (SI Appendix, Figs. S7 and S8). Among the targeted putative de novo CNVs, one amplification event on chr4: 86,288,423–86,420,622 was detected in the parental fibroblast cells (SI Appendix, Fig. S7) as well as in additional daughter cell lines. Because SNP arrays are not sensitive enough to distinguish copy number changes greater than two, it is not surprising that only the fibroblast subclone 2F12 was reported to have amplification on chr4: 86,288,423–86,420,622, as this sample had a CN = 5 compared with CN = 4 among other lines. Our finding indicates that at least a subset of putative somatic structural variations identified in our dataset were inherited from parental fibroblast cells, although we were not able to unmask very low-frequency CNVs in these cells. We also queried each putative CNV against the Database of Genomic Variants (dgv.tcag.ca/) to determine whether these CNVs have been reported previously. We found that all of the putative de novo CNVs have been previously reported except for one duplication event (chr1: 23,884,369–23,941,735) in PF1 fibroblast subclone 2F3. Based on these experimental findings and database searches, it is likely that most, if not all, of the putative CNVs are preexisting variations.

Discussion
In this study, we have taken a unique approach to compare the mutational history of iPSCs with fibroblast subclones derived from the same fibroblast population. This experimental design made it possible to directly assess whether the iPSC reprogramming process enhances mutagenesis, because the fibroblast subclones were derived similarly to the iPSCs, except for the reprogramming factor treatment. Our data therefore provide strong evidence that iPSC reprogramming is not mutagenic.

Our findings also provide strong support that a large majority of genetic variations present in iPSCs are preexisting mutations, which are likely to be present in a small clonal population of the parental somatic cells (1, 15, 20, 21). These rare mosaic variants existing in a pool of somatic cells are then overrepresented in a clonal iPSC line, leading to misclassification as de novo variants. Therefore, it is still not trivial to distinguish true de novo variants associated with the reprogramming process from rare mosaic variants inherited from parental somatic tissue.

Recent advances in genomics, especially in single-cell genomics, have revealed that genome mosaicism in human somatic tissues is a far more frequent phenomenon than once believed (22–26). In this study, we demonstrated a great degree of variation in genome mosaicism among different somatic tissue types. However, the sharing of putative de novo variants among daughter lines was a surprising finding, as we previously reported that among thousands of de novo variants discovered by WGS, clonal iPSC lines derived from primary bone marrow cells contained only unique variants (1). Reanalysis of our previously published WGS dataset of bone marrow-derived iPSC lines (accession no. SRA048525) with Shimmer using the same filtering criteria still found no overlapping de novo variants among the three iPSC lines (SI Appendix, Fig. S4). Our findings suggest the possibility that different somatic tissues may have varying levels of mosaic mutations. Gene set enrichment analyses showed that genes containing de novo or preexisting variants did not cluster in any specific functional pathway, and there were no indications that they contributed to increased reproductive fitness. Therefore, it appears that the de novo and the preexisting variants were largely randomly occurring mutations.

We observed that the iPSC lines contained more putative new variants (before deep resequencing to detect their existence in the parental cells) than the fibroblast subclones (Figs. 2A and 3A). One possible reason for the increased mutation load in iPSCs is that iPSCs were collected at later passages than fibroblast subclones, increasing the chance for rare variants to rise in frequency. Because our WES analysis was limited to detecting variants with a minor variant frequency >0.001, any variants with a frequency of <0.001 will remain undetected using our WES approach. Alternatively, the differences in mutation load could simply be caused by chance, because the total numbers of cell lines were relatively small, especially for the iPSCs. Such differences may disappear if more cell lines were analyzed. Finally, the data may reflect true differences in the numbers of inherited variants between iPSCs and fibroblast subclones. Therefore, this is an issue that requires additional investigation in the future.

The majority of CNVs we detected were also found to be preexisting variations that were either present in the parental fibroblast cells or were previously reported to be present in the general population (14). We observed one difference in the genomic stability of iPSCs, where a chromosome 12 duplication occurred in the highest passage iPSC line, iPSC3 (n = 39). Several other studies have already reported a high incidence of chromosome 12 duplications in iPSCs grown in long-term culture and have hypothesized that these iPSCs undergo high selective pressure during prolonged in vitro culture (10, 27–29).

Interestingly, we identified a mosaic translocation event, t(5;11)(q35;q13), that has been previously reported to be associated with acute myeloid leukemia (AML) subtype M2, in one fibroblast subclone whose parental fibroblast sample (PF1) was isolated from a patient with FPD harboring the Y260X mutation in RUNX1 (17, 30). However, RUNX1 mutations are not known to affect the mutagenesis process or DNA repair, and our data do not show a significant difference in the numbers and types of mutations between daughter lines of PF1 and those from the healthy donor (PF2).
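The distinction drawn above between preexisting mosaic variants and true de novo variants comes down to whether a putative new variant can be found at all in the deeply resequenced parental population. The Python sketch below illustrates that decision under loudly stated assumptions: the `ParentalSite` record and the `classify` and `median_vf` helpers are hypothetical names, read counts are assumed to be already restricted to base calls with phred quality of 20 or greater (as described in Materials and Methods), sites with fewer than 100 reads are set aside (mirroring the <100 read-depth cutoff mentioned for untargeted sites), and a single alt read is treated as evidence of a preexisting variant. A real analysis must also model sequencing error; the study used highest posterior density intervals from the R binom package for this, which the sketch does not reproduce.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class ParentalSite:
    """Hypothetical per-site read summary from parental deep resequencing."""

    key: str          # illustrative variant key, e.g. "chr1:12345:C>T"
    alt_reads: int    # alt-allele base calls with phred quality >= 20
    total_reads: int  # all base calls with phred quality >= 20


def classify(sites, min_depth=100, min_alt=1):
    """Split putative new variants into preexisting, de novo, or not evaluable.

    Assumption (not from the paper): `min_alt` alt reads at adequate depth
    count as presence in the parental population; sequencing error is ignored.
    """
    groups = {"preexisting": [], "de_novo": [], "not_evaluable": []}
    for site in sites:
        if site.total_reads < min_depth:
            groups["not_evaluable"].append(site)
        elif site.alt_reads >= min_alt:
            groups["preexisting"].append(site)
        else:
            groups["de_novo"].append(site)
    return groups


def median_vf(sites):
    """Median variant frequency (alt reads / total reads) across sites."""
    return median(site.alt_reads / site.total_reads for site in sites)


if __name__ == "__main__":
    # Invented read counts, for illustration only.
    demo = [
        ParentalSite("site_a", alt_reads=30, total_reads=4000),
        ParentalSite("site_b", alt_reads=0, total_reads=10000),
        ParentalSite("site_c", alt_reads=5, total_reads=80),
    ]
    groups = classify(demo)
    for label, members in groups.items():
        print(label, [s.key for s in members])
    print("median VF (preexisting):", median_vf(groups["preexisting"]))
```

With these invented counts, site_a is called preexisting at VF 0.0075, site_b has no alt reads and is called de novo, and site_c is set aside for low depth.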

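The Discussion notes that putative de novo variants were shared among daughter lines here but not in the earlier bone marrow-derived iPSC dataset. The shared-versus-unique tally behind the Results (315 of 450 PF1 variants in one daughter line and 135 in two or more; 224 and 146 of 370 for PF2) can be sketched as follows. This is an illustration, not the authors' pipeline: it assumes a hypothetical input that maps each variant key to the set of daughter lines in which the variant was called, and `split_unique_shared` and `percent` are illustrative names.

```python
from collections.abc import Mapping, Set


def split_unique_shared(calls: Mapping[str, Set[str]]) -> dict[str, int]:
    """Count variants called in exactly one daughter line vs. two or more.

    `calls` maps a variant key to the set of daughter lines carrying it.
    "shared_by_all" counts shared variants present in every line of the cohort.
    """
    all_lines = set().union(*calls.values())
    counts = {"unique": 0, "shared": 0, "shared_by_all": 0}
    for lines in calls.values():
        if len(lines) == 1:
            counts["unique"] += 1
        elif len(lines) >= 2:
            counts["shared"] += 1
            if lines == all_lines:
                counts["shared_by_all"] += 1
    return counts


def percent(part: int, total: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / total, 1)


if __name__ == "__main__":
    # Hypothetical calls across three illustrative daughter lines.
    demo = {
        "v1": {"line_A"},
        "v2": {"line_A", "line_B"},
        "v3": {"line_A", "line_B", "line_C"},
        "v4": {"line_C"},
    }
    print(split_unique_shared(demo))  # {'unique': 2, 'shared': 2, 'shared_by_all': 1}
    # Reproduce the percentages quoted in Results from the reported counts.
    print(percent(315, 450), percent(135, 450))  # 70.0 30.0
    print(percent(224, 370), percent(146, 370))  # 60.5 39.5
```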

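The detection-limit argument in the Discussion, and the low 375-fold capture coverage proposed to explain the missed COL1A1 variant, can be made concrete with a simple binomial model. The sketch below is illustrative only and is not an analysis from the paper: it assumes reads sample alleles independently, ignores sequencing, mapping, and capture error, and counts a variant as detected when at least `min_alt` alt reads are observed; `detection_probability` is a hypothetical helper. With a hypothetical VF of 0.001, the model gives roughly a 31% chance of seeing even one alt read at 375-fold coverage, versus near certainty at 10,000-fold.

```python
from math import comb


def detection_probability(vf: float, depth: int, min_alt: int = 1) -> float:
    """Probability of observing at least `min_alt` alt reads.

    Simplified binomial model: each of `depth` reads independently carries
    the alt allele with probability `vf`; sequencing error is ignored.
    """
    if not 0.0 <= vf <= 1.0:
        raise ValueError("vf must lie in [0, 1]")
    if depth < 0 or min_alt < 0:
        raise ValueError("depth and min_alt must be non-negative")
    # Probability of seeing fewer than `min_alt` alt reads (a miss).
    p_miss = sum(
        comb(depth, k) * vf**k * (1.0 - vf) ** (depth - k)
        for k in range(min(min_alt, depth + 1))
    )
    return 1.0 - p_miss


if __name__ == "__main__":
    # Hypothetical VF of 0.001 at the two coverages quoted in the text.
    for depth in (375, 10_000):
        print(depth, round(detection_probability(0.001, depth), 3))
    # 375 0.313
    # 10000 1.0
```

Raising `min_alt` above 1, as a real caller would do to suppress sequencing errors, lowers these probabilities further, which is why a rare variant at a poorly covered site can easily escape detection.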

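ddPCR, which the Results used to detect the chr4 amplification in parental cells and to distinguish CN = 5 from CN = 4, estimates copy number from the fraction of positive droplets. The sketch below shows the usual Poisson-corrected calculation for a target assay read against a two-copy reference assay in the same droplets. It is a generic illustration, not the software or assay design used in this study; the helper names and the droplet counts in the example are invented.

```python
from math import log


def copies_per_droplet(positive: int, total: int) -> float:
    """Poisson-corrected mean template copies per droplet, -ln(1 - p/n).

    The correction accounts for droplets that received more than one copy.
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("require total > 0 and 0 <= positive < total")
    return -log(1.0 - positive / total)


def copy_number(target_pos: int, ref_pos: int, total: int, ref_copies: int = 2) -> float:
    """Target copy number relative to a reference locus of known copy number."""
    ref_lambda = copies_per_droplet(ref_pos, total)
    if ref_lambda == 0.0:
        raise ValueError("reference assay has no positive droplets")
    return ref_copies * copies_per_droplet(target_pos, total) / ref_lambda


if __name__ == "__main__":
    # Invented counts: 20,000 droplets, reference positive in 4,000 of them.
    for target_positive in (4000, 7200, 8551):
        cn = copy_number(target_positive, 4000, 20000)
        print(target_positive, round(cn, 2))
```

With these invented counts the three targets come out at about 2, 4, and 5 copies. Note that the positive-droplet fraction grows sublinearly with copy number, so the Poisson correction matters most for exactly the high-copy states that SNP arrays struggle to separate.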
Collectively, our study supports the hypothesis that iPSCs do not have increased numbers of de novo mutations and that somatic mutations arise randomly in cells, with the degree of genomic variation differing greatly among parental tissue sources.

Materials and Methods
Fibroblast Single-Cell Expansion and iPSC Generation. Clonal populations arising from single cells were derived by FACS sorting early-passage fibroblasts (P5) into a 96-well plate, and clonal populations were collected at P9 for gDNA extraction. Two iPSC lines and three iPSC line pairs were derived from fibroblasts PF1 and PF2, respectively. PF1 was derived from a male patient with FPD (III-5 in ref. 17), and PF2 samples were derived from a clinically healthy male control donor (18). Full methods for reprogramming of these iPSC lines are described elsewhere (17, 18).

Whole Exome and Custom Targeted Capture Sequencing. The exome capture was performed according to Illumina's TruSeq Exome Enrichment Kit protocol. Targeted capture sequencing was performed according to Nimblegen's SeqCap EZ Choice protocol. Detailed methods are described in SI Appendix, SI Materials and Methods.

Discovery of Potential Somatic Mutations Using Whole Exome Sequencing. Whole exome sequence reads for each sample were aligned to the hg19 reference sequence using novoalign version 2.08.02, and PCR duplicates were removed using SAMtools (31). We ran Shimmer (19) to perform all-versus-all pairwise comparisons using the --minqual 20 and --testall options, and filtered out mutation predictions with a read depth of >750 in either sample or a q value of >0.05 as likely artifacts. This resulted in a list of 425 single nucleotide changes and 25 indels for PF1 and 357 single nucleotide changes and 13 indels for PF2 to be interrogated further by custom targeted capture and sequencing.

Variant Allele Frequency Calculation from Targeted Custom Capture Deep Sequencing. We aligned sequencing reads derived from the targeted custom-captured samples to hg19, and reads from multiple libraries were merged into a single BAM file for each sample. Allele counts for SNVs were computed from these alignments, including only base calls with a phred quality of 20 or greater, and highest posterior density (HPD) confidence intervals were determined using the CRAN binom package with a confidence level of 0.99999 (cran.r-project.org/web/packages/binom/).

Structural Variation Detection and Validation. CNV analysis was performed using Illumina's Human OmniExpressExome SNP array (958,178 SNPs) on 26 samples, including two parental fibroblast lines, 15 fibroblast subclones, eight hiPSC lines, and one technical duplicate of the PF1 parental fibroblast line. CNVs for all samples were identified with three independent calling algorithms: PennCNV (32), CNVPartition (Illumina), and Nexus v7 (Biodiscovery). Unique CNVs were validated using FISH and SKY for large variations (>500 kb), and ddPCR was used to validate smaller CNVs (SI Appendix, SI Materials and Methods, Tables S10 and S11, and Figs. S5–S8).

ACKNOWLEDGMENTS. We thank Dionyssia Clagett (Georgetown University) for establishing fibroblast lines from patients with FPD, Ms. Ursula Harper for performing short tandem repeat mapping for sample identity confirmation, members of the NIH Intramural Sequencing Center for their support on WES and targeted sequencing, and Julia Fekecs for expert design of Fig. 1. The research was supported by the Intramural Research Programs of the National Human Genome Research Institute and the National Heart, Lung, and Blood Institute, NIH.
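The artifact filter described under Discovery of Potential Somatic Mutations Using Whole Exome Sequencing reduces to a simple predicate over each pairwise call: a call is discarded if read depth exceeds 750 in either sample or if its q value exceeds 0.05. The Python sketch below illustrates that rule only. The `Prediction` record and its field names are hypothetical and do not reproduce Shimmer's actual output format, and merging passing calls by position across the all-versus-all comparisons is an assumption about how a candidate list could be assembled, not the authors' documented procedure.

```python
from dataclasses import dataclass

# HYPOTHETICAL record for one call from one pairwise Shimmer comparison.
# Field names are illustrative only; they are not Shimmer's output columns.
@dataclass(frozen=True)
class Prediction:
    chrom: str
    pos: int
    kind: str       # "snv" or "indel"
    depth_a: int    # read depth in the first sample of the pair
    depth_b: int    # read depth in the second sample of the pair
    q_value: float  # multiple-testing-adjusted significance of the call

MAX_DEPTH = 750  # depth > 750 in either sample -> likely artifact
MAX_Q = 0.05     # q value > 0.05 -> likely artifact


def is_likely_artifact(p: Prediction) -> bool:
    """Apply the two artifact criteria stated in the Methods."""
    return p.depth_a > MAX_DEPTH or p.depth_b > MAX_DEPTH or p.q_value > MAX_Q


def candidate_sites(predictions):
    """Union of passing sites across all pairwise comparisons (assumed step)."""
    return sorted({(p.chrom, p.pos, p.kind)
                   for p in predictions if not is_likely_artifact(p)})


def count_by_kind(sites):
    """Tally candidate sites by variant type."""
    counts = {}
    for _chrom, _pos, kind in sites:
        counts[kind] = counts.get(kind, 0) + 1
    return counts


# Toy input, not data from the study.
preds = [
    Prediction("chr1", 100, "snv", 120, 110, 0.01),   # passes
    Prediction("chr1", 100, "snv", 130, 90, 0.02),    # same site, another pair
    Prediction("chr2", 500, "snv", 900, 100, 0.001),  # too deep in one sample
    Prediction("chr3", 42, "indel", 80, 75, 0.20),    # q value too high
    Prediction("chr4", 7, "indel", 60, 70, 0.04),     # passes
]
print(count_by_kind(candidate_sites(preds)))  # {'snv': 1, 'indel': 1}
```

Note that the thresholds are strict inequalities, so a call with depth exactly 750 or q exactly 0.05 is kept, matching the ">750" and ">0.05" wording.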
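The allele-frequency step under Variant Allele Frequency Calculation from Targeted Custom Capture Deep Sequencing can be sketched as: drop base calls below phred 20, count reference and alternate alleles, and report the VAF with a 0.99999 highest-posterior-density interval. This is a minimal, dependency-free sketch under stated assumptions, not the authors' code and not the binom package itself. It assumes base calls are already available as hypothetical `(base, phred_quality)` pairs from the merged BAM, that depth is the sum of reference and alternate counts only, and that the posterior is Beta with a Jeffreys Beta(0.5, 0.5) prior (to our understanding, the default prior of the package's `binom.bayes` function; the paper does not state the prior). The HPD interval is approximated on a fine grid instead of being solved exactly.

```python
import math
from collections import Counter

MIN_BASEQ = 20  # only base calls with phred quality >= 20 are counted


def allele_counts(base_calls, min_q=MIN_BASEQ):
    """Count alleles at one SNV site from (base, phred_quality) pairs."""
    return Counter(base for base, q in base_calls if q >= min_q)


def beta_hpd(a, b, level=0.99999, grid=100_000):
    """Grid approximation of the highest-posterior-density interval of Beta(a, b).

    Assumes a unimodal or monotone density (a >= 1 or b >= 1), so the
    densest cells form one contiguous interval.
    """
    width = 1.0 / grid
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    cells = []
    for i in range(grid):
        p = (i + 0.5) * width  # cell midpoint; never exactly 0 or 1
        log_pdf = log_norm + (a - 1) * math.log(p) + (b - 1) * math.log1p(-p)
        cells.append((math.exp(log_pdf), i))
    total = sum(d for d, _ in cells)
    cells.sort(reverse=True)  # densest cells first
    mass, lo_i, hi_i = 0.0, grid, -1
    for d, i in cells:
        mass += d / total
        lo_i, hi_i = min(lo_i, i), max(hi_i, i)
        if mass >= level:
            break
    return lo_i * width, (hi_i + 1) * width


def vaf_with_hpd(base_calls, ref, alt, level=0.99999, prior=(0.5, 0.5)):
    """VAF = alt / (ref + alt) with an HPD interval from a Beta posterior."""
    counts = allele_counts(base_calls)
    k, n = counts[alt], counts[ref] + counts[alt]
    if n == 0:
        raise ValueError("no ref/alt base calls pass the quality filter")
    lo, hi = beta_hpd(k + prior[0], n - k + prior[1], level)
    return k / n, lo, hi


# Toy pileup: 500 ref and 500 alt calls at Q30, plus 100 alt calls at Q10
# that the quality filter discards.
calls = [("A", 30)] * 500 + [("G", 30)] * 500 + [("G", 10)] * 100
vaf, lo, hi = vaf_with_hpd(calls, ref="A", alt="G")
print(f"VAF={vaf:.3f}  HPD=({lo:.3f}, {hi:.3f})")
```

The very high confidence level widens each interval considerably, which makes it conservative to conclude that a low-frequency variant is truly absent from (or present in) a given sample.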

1. Cheng L, et al.; NISC Comparative Sequencing Program (2012) Low incidence of DNA sequence variation in human induced pluripotent stem cells generated by nonintegrating plasmid expression. Cell Stem Cell 10(3):337–344.
2. Hussein SM, et al. (2011) Copy number variation and selection during reprogramming to pluripotency. Nature 471(7336):58–62.
3. Ji J, et al. (2012) Elevated coding mutation rate during the reprogramming of human somatic cells into induced pluripotent stem cells. Stem Cells 30(3):435–440.
4. Liu P, et al. (2014) Passage number is a major contributor to genomic structural variations in mouse iPSCs. Stem Cells 32(10):2657–2667.
5. Hamada M, Malureanu LA, Wijshake T, Zhou W, van Deursen JM (2012) Reprogramming to pluripotency can conceal somatic cell chromosomal instability. PLoS Genet 8(8):e1002913.
6. Rouhani FJ, et al. (2016) Mutational history of a human cell lineage from somatic to induced pluripotent stem cells. PLoS Genet 12(4):e1005932.
7. Gore A, et al. (2011) Somatic coding mutations in human induced pluripotent stem cells. Nature 471(7336):63–67.
8. Sugiura M, et al. (2014) Induced pluripotent stem cell generation-associated point mutations arise during the initial stages of the conversion of these cells. Stem Cell Rep 2(1):52–63.
9. Ruiz S, et al. (2013) Analysis of protein-coding mutations in hiPSCs and their possible role during somatic cell reprogramming. Nat Commun 4:1382.
10. Mayshar Y, et al. (2010) Identification and classification of chromosomal aberrations in human induced pluripotent stem cells. Cell Stem Cell 7(4):521–531.
11. Laurent LC, et al. (2011) Dynamic changes in the copy number of pluripotency and cell proliferation genes in human ESCs and iPSCs during reprogramming and time in culture. Cell Stem Cell 8(1):106–118.
12. Pasi CE, et al. (2011) Genomic instability in induced stem cells. Cell Death Differ 18(5):745–753.
13. Dekel-Naftali M, et al. (2012) Screening of human pluripotent stem cells using CGH and FISH reveals low-grade mosaic aneuploidy and a recurrent amplification of chromosome 1q. Eur J Hum Genet 20(12):1248–1255.
14. Ben-Yosef D, et al. (2013) Genomic analysis of hESC pedigrees identifies de novo mutations and enables determination of the timing and origin of mutational events. Cell Reports 4(6):1288–1302.
15. Young MA, et al. (2012) Background mutations in parental cells account for most of the genetic heterogeneity of induced pluripotent stem cells. Cell Stem Cell 10(5):570–582.
16. Bhutani K, et al. (2016) Whole-genome mutational burden analysis of three pluripotency induction methods. Nat Commun 7:10536.
17. Connelly JP, et al. (2014) Targeted correction of RUNX1 mutation in FPD patient-specific induced pluripotent stem cells rescues megakaryopoietic defects. Blood 124(12):1926–1930.
18. Winkler T, et al. (2013) Defective telomere elongation and hematopoiesis from telomerase-mutant aplastic anemia iPSCs. J Clin Invest 123(5):1952–1963.
19. Hansen NF, Gartner JJ, Mei L, Samuels Y, Mullikin JC (2013) Shimmer: Detection of genetic alterations in tumors using next-generation sequence data. Bioinformatics 29(12):1498–1503.
20. Quinlan AR, et al. (2011) Genome sequencing of mouse induced pluripotent stem cells reveals retroelement stability and infrequent DNA rearrangement during reprogramming. Cell Stem Cell 9(4):366–373.
21. Abyzov A, et al. (2012) Somatic copy number mosaicism in human skin revealed by induced pluripotent stem cells. Nature 492(7429):438–442.
22. Žilina O, et al. (2015) Somatic mosaicism for copy-neutral loss of heterozygosity and DNA copy number variations in the human genome. BMC Genomics 16:703.
23. Freed D, Stevens EL, Pevsner J (2014) Somatic mosaicism in the human genome. Genes (Basel) 5(4):1064–1094.
24. Vijg J (2014) Somatic mutations, genome mosaicism, cancer and aging. Curr Opin Genet Dev 26:141–149.
25. Machiela MJ, Chanock SJ (2013) Detectable clonal mosaicism in the human genome. Semin Hematol 50(4):348–359.
26. Szulwach KE, et al. (2015) Single-cell genetic analysis using automated microfluidics to resolve somatic mosaicism. PLoS One 10(8):e0135007.
27. Draper JS, et al. (2004) Recurrent gain of chromosomes 17q and 12 in cultured human embryonic stem cells. Nat Biotechnol 22(1):53–54.
28. Garitaonandia I, et al. (2015) Increased risk of genetic and epigenetic instability in human embryonic stem cells associated with specific culture conditions. PLoS One 10(2):e0118307.
29. Kang X, et al. (2015) Effects of integrating and non-integrating reprogramming methods on copy number variation and genomic stability of human induced pluripotent stem cells. PLoS One 10(7):e0131128.
30. de Oliveira FM, et al. (2007) Acute myeloid leukemia (AML-M2) with t(5;11)(q35;q13) and normal expression of cyclin D1. Cancer Genet Cytogenet 172(2):154–157.
31. Li H, et al.; 1000 Genome Project Data Processing Subgroup (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25(16):2078–2079.
32. Wang K, et al. (2007) PennCNV: An integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res 17(11):1665–1674.

