COMMENTARY

Can the impact of human genetic variations be predicted?
Yuval Itan (a, 1) and Jean-Laurent Casanova (a, b, 1)

(a) St. Giles Laboratory of Human Genetics of Infectious Diseases, The Rockefeller University, New York, NY 10065; and (b) Howard Hughes Medical Institute, New York, NY

Author contributions: Y.I. and J.-L.C. wrote the paper.
The authors declare no conflict of interest.
See companion article on page E5189.
(1) To whom correspondence may be addressed. Email: yitan@rockefeller.edu or casanova@rockefeller.edu.

PNAS | September 15, 2015 | vol. 112 | no. 37 | 11426–11427 | www.pnas.org/cgi/doi/10.1073/pnas.1515057112

Compared with an arbitrary reference, the protein-coding sequence of any human genome contains about 20,000 single-nucleotide variants, most of which are heterozygous, and far fewer variants of other types. When their frequency is >1% in a given population, variants are designated as common. The others are rare, including some that appear to be private to the individual or kindred studied. In each individual, at most two variations can underlie a monogenic disorder. It has thus been hoped that computational methods would be able to prioritize these variants and point at a handful of candidate culprits in the exome of any patient, not only for diagnostic purposes but also, more ambitiously, for research purposes (1). This vision culminated in the utopia, or dystopia, of genetic medicine relying only on a dry laboratory to draw conclusions from genome sequencing (2). To work, such methods should show both a low false-positive (FP) prediction rate, to filter out irrelevant variants (the smaller the size of the haystack the better), and a low false-negative (FN) rate, to avoid filtering out the disease-causing mutations (the needle must stay in the haystack). Many “variant-level” approaches predict the biochemical impact of variants. Examples include sorting intolerant from tolerant (SIFT), which is based on protein sequence homology (3), polymorphism phenotyping v2 (PolyPhen-2), which is based on a combination of sequence conservation and biochemical properties of proteins and was trained on a set of known disease-causing mutations (4), and combined annotation-dependent depletion (CADD), which combines existing variant-level methods (including SIFT and PolyPhen-2) with an analysis of the impact of actual versus simulated human variants (5). In an important study, Miosge et al. show experimentally that the current software generates a low rate of FNs but a high rate of FPs (6).

Clinical genetic studies had occasionally shown that computational methods can lack sensitivity or specificity to predict the impact of variations (7). In a comprehensive study, Miosge et al. (6) studied homozygous mice after mutagenesis by N-ethyl-N-nitrosourea (ENU). They focused on 30 missense mutations in 23 immunity genes, each found in a single mouse substrain (8). These genes were selected because other mice, homozygous for null mutations in these genes, have a clear immunological phenotype. Among the 30 missense mutations, 20 were predicted to be damaging by most of six methods, including SIFT, PolyPhen-2, and CADD. The authors (6) found that only about four (20%) of the variants predicted to be damaging actually caused the in vivo phenotype of null alleles. None of those predicted to be benign showed a phenotype. This study thus reveals a gap between the performance of current variant-level methods in silico and experimental phenotypes in mice in vivo, with a lack of FNs but an excess of FPs (9, 10). One may argue, however, that the missense mutations predicted to be damaging could actually be hypomorphic or hypermorphic, and therefore not mimic the phenotype of null alleles. The authors (6) thus went one step further and measured in vitro the biochemical impact of all 2,314 possible missense mutations in human TP53, somatic null mutations of which are common in human cancer. They show that only 4% of the variants predicted to be benign were actually deleterious, but that 40% of the variants predicted to be deleterious had little or no biochemical impact. Prediction software thus generates a low rate of FNs but a high rate of FPs for TP53.

The few other human genes that were computationally and experimentally tested suggest that the conclusions drawn from the TP53 experiments may be generalizable (11, 12). It would be particularly interesting to test genes that are known to harbor both loss-of-function and gain-of-function lesions, and various shades of them, each corresponding to a distinctive phenotype. Perhaps in a not-too-distant future, all nonsynonymous mutations in all coding genes will be tested, each in appropriate cell types, taking advantage of induced pluripotent stem cell (iPSC) and CRISPR-Cas9 technologies. However, even under this optimistic scenario, one should not forget that an in vitro experiment tests only one of the many functions of a protein, and not in the conditions seen in vivo. For example, one may argue that some TP53 mutations that were neutral in the aforementioned assay may be pathogenic in vivo, driving malignancy or another phenotype.

Meanwhile, although the prediction software should reasonably aim at reducing the number of FPs, it is probably more important to ensure stringency regarding the number of FNs. The worst-case scenario is to miss the disease-causing mutation. The current methods appear to be satisfactory in that regard. To ensure that the FN rate is minimal, ideally null, one can test any prediction method against a set of known disease-causing mutations from curated databases, such as the Human Gene Mutation Database (13), HumDiv/HumVar (4), and ClinVar (14). As a matter of fact, several variant-level methods used these databases as training sets to differentiate deleterious from neutral alleles (for example, PolyPhen-2 and HumDiv/HumVar). Of course, it is difficult to devise software that would reduce both FPs and FNs, because rare, nonconservative mutations of highly conserved residues will often present high prediction scores, whether FPs or disease-causing mutations.

Miosge et al. (6) understandably focus on missense mutations, which are the most abundant coding nonsynonymous variations and currently pose the greatest prediction difficulties. Popular variant-level prediction methods, such as PolyPhen-2 and SIFT (3, 4), were designed to predict the impact of missense variants. However, it is also important to discuss the prediction value of variant-level approaches for other classes of variants. Up to 63,541 of the 135,953 (46.74%) Human Gene Mutation Database-curated disease-causing mutations are not missense (13). These variations are thought to have stronger functional consequences, because most are nonsense, start-loss, stop-loss, splicing mutations, and insertions or deletions that can be small or large in scale. However, one should be careful in predicting a deleterious impact of apparently severe mutations, as even a premature stop codon is not synonymous (so to speak) with a null allele. Indeed, there can be downstream reinitiation of translation if the stop is sufficiently upstream, or there can be stop codon read-through depending on the sequence in its vicinity. Or, if the stop is sufficiently downstream, the truncated protein may function well. Moreover, genes can display multiple isoforms, and entire coding exons may be partly or totally redundant, making stop codons anywhere inconsequential
for some or all protein functions. Moreover, exons carrying a frameshift may functionally rescue a splice variant that is normally out-of-frame (15). Finally, stop mutations in certain proteins may even paradoxically be gain-of-function (16). Conversely, apparently synonymous mutations in coding exons can be loss-of-function, not necessarily by interfering with the splicing process, and should not be dismissed blindly (17). Moreover, whole-genome sequencing not only improves the coverage of exome sequencing, but also provides an opportunity to discover disease-causing mutations elsewhere in the genome (18, 19). The CADD method filled these blanks by predicting the impact of most categories of genome variants (5). These predictions must, however, be taken with even greater caution than those of missense mutations. In particular, much needs to be learned about the biology of variations in nonprotein-coding genes and the intergenic space.

Fortunately, variant-level approaches are not the only computational tools available to predict the clinical impact of a variant. One can also take into consideration the properties of the genes themselves, in “gene-level” approaches. The human gene connectome (20), which measures the biological distance between genes, is helpful for selecting candidate genes. Other methods assess the population genetic properties of the genes, thereby estimating their relevance to human disease. The residual variation intolerance score (21) ranks human genes by their deviation from the genome-wide average number of nonsynonymous mutations found in genes with a similar amount of global mutational burden. It showed that known Mendelian disease-causing genes are less tolerant to coding variations than other genes. The de novo mutation excess method (22) compares the rate and nature of potential and observed de novo mutations per human gene. These two methods have distinct algorithms and complementary proposed uses. The methods, however, partly overlap because the information generated in both cases is inspired by studies of purifying selection. In the future, combining existing methods and tailoring new methods may also optimize prediction. For example, a method that would detect (and filter out) genes likely to harbor FPs would neatly complement existing methods, which are optimized to detect (and select) genes likely to harbor true positives. Importantly, a combination of variant-level and gene-level approaches is synergistic. A variant predicted to be damaging in a poorly polymorphic protein is likely to have the strongest phenotypic impact, whereas a variant predicted to be benign in a protein crippled with variations is unlikely to have a phenotypic impact (21).

During the in silico search for candidate variations in single patients, small series of patients, or larger populations, both variant-level and gene-level approaches complement other processes. Linkage analysis is beneficial in multiplex or consanguineous kindreds, and genetic homogeneity can be of invaluable help when two or more kindreds are studied. The comparison of the frequency of variants in a set of genes, in the patients versus the general population (or specific ethnic groups), has also more recently shown utility (23). More generally, knowledge of the prevalence of disease and estimates of clinical penetrance and genetic homogeneity are essential to make the best use of the prediction software, as they directly govern the frequency of the candidate mutant alleles (9, 10). Equally important, the mode of inheritance is a key factor when determining the relevance of a genotype for a phenotype. Regardless of the variant- and gene-level prediction scores, homozygous and compound heterozygous variants are stronger candidates for a recessive trait than any heterozygous variant. A recessive model with homozygous variations should be favored in patients born to consanguineous parents. In sporadic cases, de novo mutations are more likely to be causative than other heterozygous variants in a model of high clinical penetrance. It is often asserted that genome-wide approaches, in patients or populations, are unbiased and hypothesis-free. This is true from a physiological perspective, but not entirely correct, as a good genetic hypothesis is key to a successful endeavor. The mode of inheritance, level of genetic heterogeneity, and clinical penetrance are the three key hypotheses that will lead to success or failure, whether for diagnosis or research, and in the latter case whether for single-patient or for large-population genetic studies. Variant- and gene-level software will never compensate for a faulty genetic hypothesis.

Miosge et al. (6) assess the predictive power of variant-level methods for 30 missense mutations in 23 mouse immunity genes in vivo and for 2,314 missense mutations in human TP53 in vitro. This massive achievement suggests that current methods overestimate the impact of mutations. One can hope that variant- and gene-level approaches improve with time. However, this study reminds us that it will remain necessary to experimentally validate any computational prediction by functional assays (9, 10). Candidate mutant alleles must be tested individually, as well as cells carrying the biallelic genotypes. With the advent of iPSCs and CRISPR/Cas9 technologies, decisive experiments can now be conducted with human cells in vitro in fields other than hematology and immunology (10, 11). In that process, broad and profound knowledge of physiological and pathological processes is essential not only to select candidate genes and variations, but also to design ways to test them experimentally. Experiments remain necessary for each novel candidate mutation, and when a known disease-causing mutation is found in a patient with a different phenotype. In designing such experiments, the aim is to discover the genotype underlying a given phenotype. In anyone’s genome, there are multiple genotypes that are responsible for a variety of phenotypes. Establishing a causal relationship requires bridging a candidate genotype and the relevant phenotype by an experimentally proven thread of causes and consequences. This relies on a series of intermediate phenotypes, at the molecular, cellular, and organismal levels, connected via mechanisms that best ascertain the causal relationship between the variant(s) and the disease.

1 Goldstein DB, et al. (2013) Sequencing studies in human genetics: Design and interpretation. Nat Rev Genet 14(7):460–470.
2 Service RF (2013) Biology’s dry future. Science 342(6155):186–189.
3 Kumar P, Henikoff S, Ng PC (2009) Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc 4(7):1073–1081.
4 Adzhubei IA, et al. (2010) A method and server for predicting damaging missense mutations. Nat Methods 7(4):248–249.
5 Kircher M, et al. (2014) A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet 46(3):310–315.
6 Miosge LA, et al. (2015) Comparison of predicted and actual consequences of missense mutations. Proc Natl Acad Sci USA 112:E5189–E5198.
7 Jordan DM, et al.; Task Force for Neonatal Genomics (2015) Identification of cis-suppression of human disease mutations by comparative genomics. Nature 524(7564):225–229.
8 Andrews TD, et al. (2012) Massively parallel sequencing of the mouse exome to accurately identify rare, induced mutations: An immediate source for thousands of new mouse models. Open Biol 2(5):120061.
9 Chakravarti A, Clark AG, Mootha VK (2013) Distilling pathophysiology from complex disease genetics. Cell 155(1):21–26.
10 Casanova JL, Conley ME, Seligman SJ, Abel L, Notarangelo LD (2014) Guidelines for genetic studies in single patients: Lessons from primary immunodeficiencies. J Exp Med 211(11):2137–2149.
11 Findlay GM, Boyle EA, Hause RJ, Klein JC, Shendure J (2014) Saturation editing of genomic regions by multiplex homology-directed repair. Nature 513(7516):120–123.
12 Starita LM, et al. (2015) Massively parallel functional analysis of BRCA1 RING domain variants. Genetics 200(2):413–422.
13 Stenson PD, et al. (2014) The Human Gene Mutation Database: Building a comprehensive mutation repository for clinical and molecular genetics, diagnostic testing and personalized genomic medicine. Hum Genet 133(1):1–9.
14 Landrum MJ, et al. (2014) ClinVar: Public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res 42(Database issue):D980–D985.
15 Bolze A, et al. (2012) A mild form of SLC29A3 disorder: A frameshift deletion leads to the paradoxical translation of an otherwise noncoding mRNA splice variant. PLoS One 7(1):e29708.
16 Boisson B, Quartier P, Casanova JL (2015) Immunological loss-of-function due to genetic gain-of-function in humans: Autosomal dominance of the third kind. Curr Opin Immunol 32:90–105.
17 Sauna ZE, Kimchi-Sarfaty C (2011) Understanding the contribution of synonymous mutations to human disease. Nat Rev Genet 12(10):683–691.
18 Ng SB, et al. (2009) Targeted capture and massively parallel sequencing of 12 human exomes. Nature 461(7261):272–276.
19 Belkadi A, et al. (2015) Whole-genome sequencing is more powerful than whole-exome sequencing for detecting exome variants. Proc Natl Acad Sci USA 112(17):5473–5478.
20 Itan Y, et al. (2013) The human gene connectome as a map of short cuts for morbid allele discovery. Proc Natl Acad Sci USA 110(14):5558–5563.
21 Petrovski S, Wang Q, Heinzen EL, Allen AS, Goldstein DB (2013) Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet 9(8):e1003709.
22 Samocha KE, et al. (2014) A framework for the interpretation of de novo mutation in human disease. Nat Genet 46(9):944–950.
23 Bolze A, et al. (2013) Ribosomal protein SA haploinsufficiency in humans with isolated congenital asplenia. Science 340(6135):976–978.
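
Worked example (illustrative only). The FP/FN reasoning applied in this commentary to the mouse study of Miosge et al. (6) can be sketched as a small confusion-matrix calculation. The counts below are assumptions derived from the figures quoted in the text: 30 missense mutations, 20 predicted damaging, "about four" of those reproducing the null phenotype (taken here as exactly 4), and none of the remaining 10 predicted-benign variants showing a phenotype. The null-allele phenotype is treated as ground truth, which the commentary itself cautions may be too strict for hypomorphic or hypermorphic alleles. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of predictions versus experimental outcome (hypothetical helper)."""
    tp: int  # predicted damaging, phenotype observed
    fp: int  # predicted damaging, no phenotype observed
    fn: int  # predicted benign, phenotype observed
    tn: int  # predicted benign, no phenotype observed

    @staticmethod
    def _ratio(numerator: int, denominator: int) -> float:
        if denominator == 0:
            raise ValueError("rate is undefined when the denominator is zero")
        return numerator / denominator

    def sensitivity(self) -> float:
        # Fraction of truly damaging variants that were flagged (1 - FN rate).
        return self._ratio(self.tp, self.tp + self.fn)

    def specificity(self) -> float:
        # Fraction of truly neutral variants that were cleared (1 - FP rate).
        return self._ratio(self.tn, self.tn + self.fp)

    def precision(self) -> float:
        # Fraction of "damaging" calls that were experimentally confirmed.
        return self._ratio(self.tp, self.tp + self.fp)

    def false_discovery_rate(self) -> float:
        # Fraction of "damaging" calls that were not confirmed.
        return self._ratio(self.fp, self.tp + self.fp)


# Assumed counts for the 30 mouse missense mutations described in the text.
mouse = ConfusionCounts(tp=4, fp=16, fn=0, tn=10)

if __name__ == "__main__":
    print(f"sensitivity: {mouse.sensitivity():.2f}")
    print(f"specificity: {mouse.specificity():.2f}")
    print(f"precision:   {mouse.precision():.2f}")
    print(f"FDR:         {mouse.false_discovery_rate():.2f}")
```

Under these assumed counts, no causal variant is missed (no FNs), while most "damaging" calls are FPs, which is the pattern the commentary describes.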
