Nature Reviews Genetics - December 2013 PDF

RESEARCH HIGHLIGHTS
Nature Reviews Genetics | AOP, published online 22 October 2013; doi:10.1038/nrg3616
SMALL RNAS
Antiviral RNAi in mammals

The cleavage of viral double-stranded RNAs (dsRNAs) by host RNA interference (RNAi) machinery has been shown to be an important antiviral mechanism in various species, including in plants and insects. However, in mammals, it was unclear whether RNAi had a similar role or whether alternative antiviral mechanisms, such as the interferon response, predominate. Two new studies now provide strong support for an antiviral role for RNAi in mammals.
One challenge of characterizing the roles for RNAi in mammalian cells is that the loss of key components of the RNAi machinery, such as the deletion of Dicer1 or of all four Argonaute (AGO) genes, is generally incompatible with cell and organism viability.
Maillard et al. took advantage of mouse embryonic stem cells (mESCs), which can survive these manipulations. The authors infected wild-type mESCs with encephalomyocarditis virus (EMCV); this virus has a single-stranded RNA (ssRNA) genome but produces long dsRNAs as part of its infection cycle. High-throughput RNA sequencing revealed that such infection resulted in the production of virus-derived 21–23-nucleotide small interfering RNAs (siRNAs), which are characteristic of the cleavage of long dsRNAs by mammalian DICER1. Indeed, Dicer1–/– mESCs failed to produce these siRNAs. Furthermore, biochemical analyses showed that these viral-derived siRNAs became bound by AGO2. Overall, these findings provide support that viral RNAs are processed by mammalian RNAi pathways as a putative antiviral mechanism.
Interestingly, Maillard et al. also showed that the differentiation of mESCs substantially decreases the processing of viral dsRNA into siRNAs. Although the reasons for this are currently unclear, it suggests that mammalian RNAi might function as a context-dependent antiviral mechanism only in some cell types.
A further hindrance to efforts so far to identify the antiviral roles for RNAi is that viruses frequently encode viral suppressor of RNAi (VSR) proteins, which are likely to mask antiviral roles for RNAi. Although no VSR is known for EMCV, Maillard et al. and, in a separate study, Li et al. studied host responses to Nodamura virus (NoV); this virus encodes the dsRNA-binding B2 protein, which functions as a VSR. Similarly to EMCV, NoV has a ssRNA genome and produces long dsRNA during its life cycle.
Both groups used wild-type and B2-deficient NoV strains to determine whether the absence of the VSR protein could unmask an active role for RNAi in controlling viral infections. Maillard et al. infected mESCs in vitro, whereas Li et al. infected hamster fibroblasts in vitro and suckling mice in vivo. Both groups observed that only in the absence of the VSR were virus-derived siRNAs produced and infections effectively controlled. Further supporting a role for RNAi in this unmasked antiviral response, Li et al. found no evidence of alternative RNAi-independent antiviral pathways (such as the interferon response) from gene expression analyses in infected mice, and Maillard et al. crucially showed that viral accumulation was restored in mESCs lacking all four AGO proteins.
It will be interesting to assess the relative importance of RNAi and other antiviral responses in different contexts through different stages of mammalian development.
Darren J. Burgess
ORIGINAL RESEARCH PAPERS Maillard, P. V. et al. Antiviral RNA interference in mammalian cells. Science 342, 235–238 (2013) | Li, Y. et al. RNA interference functions as an antiviral immunity mechanism in mammals. Science 342, 231–234 (2013)
NATURE REVIEWS | GENETICS VOLUME 14 | DECEMBER 2013
© 2013 Macmillan Publishers Limited. All rights reserved

EVOLUTION
Proteins partner up in a vigorous relationship

The hybridization between two species can result in the offspring being fitter than the parents and can therefore have a role in the development of novel adaptive traits. Until recently, such hybrid vigour has been attributed to the resultant allelic variation and altered transcriptional networks. However, it is known that hybrid species are able to form interspecies chimeric protein complexes, and in a recent study, Piatkowska et al. show that these contribute to fitness effects.
The Saccharomyces sensu stricto yeasts are excellent models in which to study the effect of hybridization: they are a diverse monophyletic group that cross to form stable hybrids which inhabit new ecological niches. From this group of species, the authors crossed either Saccharomyces mikatae or Saccharomyces uvarum with a range of Saccharomyces cerevisiae strains that each carried an affinity-tagged subunit in one of six protein complexes that they had chosen to study. The authors used this tag to purify protein complexes and identified constituent subunits using mass spectrometry.
Four of the six complexes that were studied could form chimeric complexes (that is, complexes containing protein subunits from both parental strains). The authors attributed the lack of formation of interspecies proteins by the other two protein complexes to incompatible changes in protein interfaces or to stoichiometric imbalances.
Such chimeric protein complexes are postulated to be suboptimal owing to the divergent evolutionary history of the proteins. However, by engineering different combinations of the complex components in different genetic backgrounds, the authors were able to show that the chimeric complexes were able to confer positive fitness effects in relevant environmental niches. For example, some chimeric Trp2–Trp3 complexes, which are involved in the first step of tryptophan biosynthesis, were able to confer more rapid growth in medium that lacks tryptophan relative to either parental complex.
It will be interesting to see how commonly hybrid vigour is caused by the fitness changes that are conferred by chimeric protein complexes, and how this mechanism interacts with the other postulated causes of hybrid vigour.
Hannah Stower
ORIGINAL RESEARCH PAPER Piatkowska, E. M. et al. Chimeric protein complexes in hybrid species generate novel phenotypes. PLoS Genet. http://dx.doi.org/10.1371/journal.pgen.1003836 (2013)

Nature Reviews Genetics | AOP, published online 7 November 2013; doi:10.1038/nrg3618
GENE EXPRESSION
Controls and roles for trans-splicing

A few examples of trans-spliced RNAs (RNA sequences ligated from separate transcripts) have been experimentally validated and functionally characterized in various species. Trans-spliced RNAs are often inferred from transcriptome sequencing data sets, but dissecting the genuine trans-spliced RNAs from false positives resulting from technical artefacts remains a challenge. A new study reports a pipeline for prioritizing candidate trans-spliced RNAs from transcriptome sequencing data and uses this approach to identify a role for a trans-spliced non-coding RNA (ncRNA) in maintaining pluripotency in human cells.
Wu et al. carried out both long-read and short-read high-throughput transcriptome sequencing on the H9 human embryonic stem cell (hESC) line, and used equivalent publicly available sequencing data for the H1 hESC line. By aligning the long-read sequences to the reference human genome sequence, they identified 8,822 candidate trans-spliced RNAs. They then applied various bioinformatic filters to remove probable false positives, such as candidates that were not supported by short-read sequencing and by both cell lines, and those candidates with sequence features that were suggestive of artefactual generation. These filters eliminated ~99.9% of candidates; of the remaining nine high-priority candidates, four passed subsequent experimental validation.
All four validated trans-splicing events were intragenic; that is, they resulted in a mis-ordered arrangement of exons from the same gene. Sequencing and experimental data indicate that this splicing occurred through trans-splicing between two copies of the primary transcript rather than through, for example, intramolecular splicing to generate a circular mRNA.
Three of the trans-splicing events brought an alternative sequence upstream of a full-length coding region. These events were associated with a different expression pattern relative to the regular, colinear isoforms across a panel of differentiated cell types. Interestingly, the fourth trans-spliced RNA was a novel ncRNA, termed tsRMST. This ncRNA was found to be expressed at higher levels in pluripotent cells (both hESCs and human induced pluripotent stem cells (hiPSCs)) relative to differentiated cells. In support of a role for tsRMST in pluripotency, manipulation of tsRMST levels in hESCs showed that this ncRNA suppresses the expression of differentiation-associated genes. This function is mediated in part by recruiting the transcription factor NANOG and the repressive Polycomb protein SUZ12 to the promoters of these lineage-specific genes.
It will be interesting to use prioritization strategies in other cell types to determine how widespread trans-spliced RNAs are, and the scope of their biological functions.
Darren J. Burgess
ORIGINAL RESEARCH PAPER Wu, C. S. et al. Integrative transcriptome sequencing identifies trans-splicing events with important roles in human embryonic stem cell pluripotency. Genome Res. http://www.dx.doi.org/10.1101/gr.159483.113 (2013)
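
The filtering step described in this highlight can be illustrated with a minimal sketch. It is not the pipeline of Wu et al.; the record type, field names and toy data below are hypothetical, and each filter is reduced to a yes/no flag purely to show how successive criteria narrow a candidate list. The final function checks the arithmetic quoted above (8,822 candidates reduced to nine).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one candidate trans-spliced RNA."""
    name: str
    short_read_support: bool   # junction also seen in short-read data
    in_both_cell_lines: bool   # junction seen in both hESC lines
    artefact_like: bool        # sequence features suggesting an artefact


def prioritize(candidates):
    """Keep only candidates that pass all three illustrative filters."""
    return [
        c for c in candidates
        if c.short_read_support and c.in_both_cell_lines and not c.artefact_like
    ]


def fraction_removed(n_start, n_kept):
    """Fraction of the starting candidates eliminated by filtering."""
    return 1 - n_kept / n_start


# Toy input: only the first record passes every filter.
toy = [
    Candidate("cand_A", True, True, False),
    Candidate("cand_B", True, False, False),
    Candidate("cand_C", False, True, False),
    Candidate("cand_D", True, True, True),
]
kept = prioritize(toy)
print([c.name for c in kept])
print(f"{fraction_removed(8822, 9):.1%} of candidates removed")
```

With the figures reported in the highlight, nine survivors out of 8,822 corresponds to roughly 99.9% of candidates being removed, which matches the stated proportion.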

Nature Reviews Genetics | AOP, published online 12 November 2013
IN BRIEF
GENE REGULATION
Key role for translation in cell cycle control
Stumpf and colleagues used ribosome profiling — deep
sequencing of ribosome-protected mRNA fragments — to
investigate the role of translational regulation of gene
expression in the mammalian cell cycle. Working with human
cells, they found that there is pervasive translational regulation
of the expression of genes that have important roles in the cell
cycle. In addition, functionally related genes, including those
involved in metabolism, DNA repair and nuclear transport, were
found to be translationally co-regulated, which indicates that
translation mediates coordinated gene expression.
ORIGINAL RESEARCH PAPER Stumpf, C. R. et al. The translational landscape of the
mammalian cell cycle. Mol. Cell http://dx.doi.org/10.1016/j.molcel.2013.09.018 (2013)
SYNTHETIC BIOLOGY
Recoding bacterial genomes
In two new studies, Lajoie and colleagues recode and
expand the genetic code of the Escherichia coli genome by
incorporating non-standard amino acids (NSAAs). This allows the
production of novel proteins, which has potential applications
in areas such as biosafety, agriculture and medicine. In their
first study, Lajoie et al. replaced the UAG stop codon with the
synonymous UAA codon. They subsequently deleted the gene
that encodes release factor 1, which mediates translational
termination at UAG. This allowed them to re-introduce UAG
and re-assign its function from that of a stop codon to one that
incorporates chosen NSAAs. The resulting genetically recoded
organism had increased resistance to bacteriophage T7 and was
able to efficiently incorporate NSAAs. In their second study, the
authors expanded on this work by re-assigning 13 rare codons
in each of 42 highly expressed essential genes in 80 E. coli
strains. Although this recoding was successful, most strains with
recoded genes showed reduced fitness, which indicates that
combining several recoded genes into one genome may not be
feasible. Interestingly, they occasionally found that replacement
of synonymous codons, such as that of CUU with UUG, did not
produce the same effects as the native codon. Together, these
studies show that recoding the bacterial genome is feasible
and provide useful information for future genome-wide codon
re-assignment designs.
ORIGINAL RESEARCH PAPERS Lajoie, M. J. et al. Genomically recoded organisms expand
biological functions. Science 342, 357–360 (2013) | Lajoie, M. J. et al. Probing the limits of
genetic recoding in essential genes. Science 342, 361–363 (2013)
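
As a rough illustration of the first recoding step (replacing the UAG stop codon with the synonymous UAA codon), the sketch below rewrites a terminal stop codon in a single coding sequence. It is only a toy example under stated assumptions: the function name is hypothetical, sequences are written as DNA (so UAG appears as TAG), and it does not represent the genome-engineering methods used by Lajoie et al.

```python
def recode_terminal_stop(cds: str) -> str:
    """Return the coding sequence with a terminal TAG stop replaced by TAA.

    Assumes `cds` is an in-frame DNA coding sequence (hypothetical input
    format); sequences ending in any other codon are returned unchanged.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if cds.endswith("TAG"):
        return cds[:-3] + "TAA"
    return cds


print(recode_terminal_stop("ATGAAATAG"))     # terminal TAG is rewritten
print(recode_terminal_stop("ATGCTAGGGTGA"))  # out-of-frame TAG is left alone
```

Only the final codon is examined, so a TAG that happens to span two codons inside the reading frame is not touched.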
MOLECULAR EVOLUTION
Bypassing indels
Insertions and deletions (indels) are prone to being formed at
short sequence repeats owing to the misalignment of the DNA
strands during replication, and a similar mechanism allows them
to be bypassed in both transcription and translation. A recent
study has investigated how well indels are tolerated in evolution
as a result of transcriptional and translational bypassing. To
identify how frequently these mutations occur, the authors
mutated the Haemophilus aegyptius M.HaeIII gene in vitro using
an error-prone polymerase followed by sequencing. They then
expressed these mutated genes in Escherichia coli to ascertain the
effect of natural selection. They found that the longer the repeats
in M.HaeIII, the more frequently the mutations were formed and
maintained in the E. coli population and were hence tolerated.
ORIGINAL RESEARCH PAPER Rockah-Shmuel, L. et al. Correlated occurrence and
bypass of frame-shifting insertion-deletions (indels) to give functional proteins. PLoS Genet.
9, e1003882 (2013)

DEVELOPMENT
Seeing the pattern

Two studies provide new insights into the spatiotemporal patterns of
transcriptional regulation of gene expression in Drosophila melanogaster by
imaging RNA in live embryos. The in vivo method used by these studies allows
dynamic transcriptional events to be followed at resolutions that are not
possible using static methods.
To fluorescently label RNA in vivo both studies used the MS2–MCP (MS2
coat protein) system, which is based on the RNA-binding MCP from the
bacteriophage MS2 and its target, an RNA stem–loop repeat. The system
comprises two components: a transgene that consists of a reporter gene driven
by a promoter of interest and that contains MS2 stem–loop repeats, and a
fusion protein consisting of MCP and green fluorescent protein (MCP–GFP).
When the transgene is transcribed, MCP–GFP can bind to the stem–loops, thus
allowing mRNA to be visualized.
Both studies investigated the Bicoid (Bcd) transcription factor and one of its
downstream targets, the hunchback (hb) gene. Bcd functions at the beginning
of a cascade that specifies anterior patterning in D. melanogaster embryos,
which can be visualized by examining the expression of hb. However, the
specific transcriptional events that underlie patterning by Bcd have been
difficult to study using fixed cell methods, such as RNA-fluorescence in situ
hybridization (RNA-FISH).
Garcia and colleagues investigated the expression of Bcd-activated hb to
determine whether boundaries of gene expression are determined by the rate
of mRNA production in each cell. Specifically, they determined the absolute
number of actively transcribing polymerases in individual nuclei by measuring
the fluorescence signal and connecting it to single mRNA molecule counts
using RNA-FISH. They found that the formation of patterning boundaries
cannot be solely explained by the rate of RNA production in each nucleus.
Moreover, nuclei adopt either active or inactive transcriptional states, and it is
the combination of these two effects that leads to the formation of a pattern.
Lucas and colleagues looked at how the length of transcriptional activity
periods is important in establishing the hb boundary. They defined activity periods
by several parameters, including the intensity of the fluorescence signal from
MCP–GFP and the time that this signal persists. They propose that the
establishment of the hb boundary involves at least three processes: Bcd causes
strong and persistent hb expression in the anterior half of the embryo by
lengthening transcriptional activity periods; repression progressively
overcomes sporadic transcriptional activity in the posterior half of the embryo;
and finally, comparison with endogenous expression suggests that posterior
expression is also silenced very early on.
These studies show how in vivo imaging of gene regulation can provide new
insights into the basis of developmental processes.
Isabel Lokody
ORIGINAL RESEARCH PAPERS Garcia, H. G. et al. Quantitative imaging of transcription in living Drosophila embryos links polymerase activity to patterning. Curr. Biol. http://dx.doi.org/10.1016/j.cub.2013.08.054 (2013) | Lucas, T. et al. Live imaging of Bicoid-dependent transcription in Drosophila embryos. Curr. Biol. http://dx.doi.org/10.1016/j.cub.2013.08.053 (2013)

CLINICAL GENETICS
Exomes in the clinic

A new study provides the largest survey so far of the use of exome sequencing for clinical diagnosis. Yang et al. report results for the first 250 individuals to undergo clinical exome sequencing at the Baylor College of Medicine, Houston, Texas, USA, following referral by a physician. Most of the patients (222) were under 18 years of age, and most had undiagnosed disorders that involved neurological symptoms.
In 25% of cases, a positive diagnosis was made following exome sequencing, a proportion that is consistent with rates from smaller studies. This rate is higher than those for other types of genetic tests. Notably, before exome sequencing was ordered for the patients in this study, extensive efforts at diagnosis had been carried out, and in some cases this had taken longer and had cost more than whole-exome sequencing.
The authors suggest that the identification of new disease–gene associations and improvements in the detection of copy-number variants will increase the success rate of exome sequencing. However, they also note important limitations to the approach, for example, the possibility that many causal variants may lie in non-coding regions of the genome.
A much-discussed aspect of clinical exome sequencing is the finding of medically actionable variants other than those that are directly related to the phenotype under investigation. Such incidental findings were made for 30 of the 250 patients, which underlines the importance of considering when and how such findings are relayed to patients and their physicians.
Altogether, the findings reported in this study provide a useful basis for considering how exome sequencing should be most effectively applied as its use becomes more widespread.
Louisa Flintoft
ORIGINAL RESEARCH PAPER Yang, Y. et al. Clinical whole-exome sequencing for the diagnosis of Mendelian disorders. N. Engl. J. Med. http://dx.doi.org/10.1056/NEJMoa1306555 (2013)

Nature Reviews Genetics | AOP, published online 12 November 2013; doi:10.1038/nrg3621
HUMAN EVOLUTION
Reprogrammed cells dissect ape retrotransposition

Our ability to address the important question of how differences between humans and other primates have evolved is limited by the study systems that are available for doing so. A new study uses induced pluripotent stem cells (iPSCs) of primates to identify differential regulation of long interspersed element 1 (LINE-1) retrotransposons between humans and non-human primates (NHPs).
The characterization of biological processes in primates is hindered by the limited availability of primary cells and tissues, particularly those of embryonic origin. To circumvent this limitation Marchetto et al. used pluripotency factor genes to reprogramme fibroblasts from two chimpanzees and two bonobos into iPSCs. Comparisons with human embryonic stem cells and equivalently derived human iPSCs revealed similar gene expression profiles and in vitro differentiation properties across all these pluripotent cell lines.
Among the few differentially expressed genes, PIWI-like RNA-mediated gene silencing 2 (PIWIL2) and the cytidine deaminase gene APOBEC3B were upregulated in human iPSCs relative to NHP iPSCs. Both of these genes have known links to silencing LINE-1 elements in mammalian germlines to limit retrotransposition. By transfecting a plasmid reporter of LINE-1 activity into both human and NHP iPSCs, the authors indeed identified greater LINE-1 activity in NHP iPSCs. Furthermore, by manipulating the expression of PIWIL2 and APOBEC3B (through overexpression or knockdown) in these iPSCs, they showed that both APOBEC3B and PIWIL2 expression levels are causally linked to this differential LINE-1 activity between species. Although the mechanism by which APOBEC3B limits LINE-1 activity remains unclear, LINE-1-complementary PIWI-interacting RNAs (piRNAs) were found in human iPSCs, which is consistent with a piRNA-mediated role for PIWIL2 in controlling LINE-1 activity in human iPSCs.
Finally, the authors examined endogenous LINE-1 activity: higher levels of LINE-1 transcripts were found in NHP iPSCs than in human iPSCs, and genome sequence analyses revealed more chimpanzee-specific than human-specific LINE-1 insertions. Cumulatively, the data indicate a higher activity and mobility of LINE-1 elements in NHPs relative to humans.
It will be interesting to determine the extent to which differential activity of LINE-1 or other transposable elements contributes to germline mutation rates, inter-individual variability and adaptive potential among primates, and to explore how iPSCs from various species can facilitate future biological investigations.
Darren J. Burgess
ORIGINAL RESEARCH PAPER Marchetto, M. C. N. et al. Differential L1 regulation in pluripotent stem cells of humans and apes. Nature http://dx.doi.org/10.1038/nature12686 (2013)
FURTHER READING Prüfer, K. et al. The bonobo genome compared with the chimpanzee and human genomes. Nature 486, 527–531 (2012)

GENE REGULATION
From genetic variation to phenotype via chromatin

There is known to be some correlation among genetic variation, chromatin modifications, transcription factor binding, gene expression and phenotypes. However, the jury is still out on the strength of the links between these different organismal features and whether alterations in one directly cause alterations in the other. Four studies have now taken advantage of recently developed genome-wide approaches and resources in both humans and mice to establish that genetic variation strongly determines sites of transcription factor binding. This binding, in turn, results in altered histone modifications and enhancer choice, which lead to changes in both gene expression and the resultant phenotypes.
McVicker et al. and Kilpinen et al. took a similar approach, which was to carry out chromatin immunoprecipitation followed by sequencing (ChIP–seq) to survey histone modifications, transcription factor binding and actively transcribed sites as defined by the locations of RNA polymerase II. Kilpinen et al. analysed these features in lymphoblastoid cell lines (LCLs) from parent–offspring trios and from eight unrelated individuals in the 1,000 Genomes Project, whereas McVicker et al. surveyed LCLs from ten unrelated Yoruba individuals. Both studies found allelic specificity of histone modifications, which, importantly, was coordinated with transcription factor binding. This finding indicates that sequence-specific transcription factors may specify histone modifications, and that there is thus a sequence-specific component to the deposition of histone modifications. This allelic specificity leads, in turn, to changes in both enhancer choice and gene expression between different haplotypes, even in distal enhancers.
The approach of Kasowski et al. was to map histone modifications by ChIP–seq and gene expression by RNA sequencing in 19 individuals from the 1,000 Genomes Project. In addition, the locations of two general transcription factors were mapped. By dividing the genome into regions with different combinations of chromatin modifications, they found that there are extensive differences in enhancer states between individuals, as well as in transcription factor-binding sites that are associated with different chromatin states. However, they found that, to alter gene expression, multiple enhancers that are linked to a gene must have alterations in their histone modification states.
Heinz et al. also used natural genetic variation as their ‘in vivo mutagenesis screen’: they looked at differences in transcription factor binding and in histone modifications between the C57BL/6J and BALB/cJ strains of mice. They investigated the binding of the proposed lineage-defining transcription factors (LDTFs) PU.1 and CCAAT/enhancer-binding protein-α, as well as the signalling-responsive transcription factor NF-κB. They found that the binding of the proposed LDTFs is dependent on genetic variation, as are histone modifications, which suggests that these LDTFs determine histone modification patterns. Furthermore, the LDTFs seem to recruit each other, and they also seem to determine the binding of NF-κB in response to signals.
The overarching theme of these papers is that genetic variation may determine sites of transcription factor binding, which, in turn, specify histone modifications and enhancer choice; however, some combinational alterations are required for gene expression changes.
Hannah Stower
ORIGINAL RESEARCH PAPERS Heinz, S. et al. Effect of natural genetic variation on enhancer selection and function. Nature http://dx.doi.org/10.1038/nature12615 (2013) | Kasowski, M. et al. Extensive variation in chromatin states across humans. Science http://dx.doi.org/10.1126/science.1242510 (2013) | Kilpinen, H. et al. Coordinated effects of sequence variation on DNA binding, chromatin structure, and transcription. Science http://dx.doi.org/10.1126/science.1242463 (2013) | McVicker, G. et al. Identification of genetic variants that affect histone modifications in human cells. Science http://dx.doi.org/10.1126/science.1242429 (2013)

Nature Reviews Genetics | AOP, published online 12 November 2013
IN BRIEF
TECHNOLOGY
Amplifying single-cell cDNA without bias
Single-cell transcriptomics provides insights into the importance
of stochastic transcription and facilitates more complete
transcriptome characterization. Many current methods require
amplification of cDNA for the analysis of low copy-number
transcripts, which introduces bias. These authors optimize
the amplification of cDNA libraries that are immobilized on
beads. Current protocols include a step that digests primers
after cDNA amplification to avoid competition with the
cDNA for amplification. However, this step also degrades low
copy-number cDNA, which results in amplification bias. The new
protocol removes this step and washes the beads instead, which
allows amplification of low copy-number transcripts at high
efficiency with less bias.
ORIGINAL RESEARCH PAPER Huang, H. et al. Non-biased and efficient global amplification
of a single-cell cDNA library. Nucleic Acids Res. http://dx.doi.org/10.1093/nar/gkt965 (2013)
REGENERATION
Key role for polyploidization in wound healing
The Drosophila melanogaster abdominal epithelium provides
a good model for understanding how postmitotic diploid cells
contribute to repair upon tissue damage. Losick et al. show that,
after puncture wounds are made in this system, DNA replication
without cell division is induced in cells near the wound site, which
results in polyploidy. In addition, cells surrounding the wound
fuse to form multinucleate cells. The authors postulate that
polyploidization is required to restore the tissue mass that is lost
upon injury and that cell fusion speeds up re-epithelialization.
ORIGINAL RESEARCH PAPER Losick, V. P. et al. Polyploidization and cell fusion contribute
to wound healing in the adult Drosophila epithelium. Curr. Biol.
http://dx.doi.org/10.1016/j.cub.2013.09.029 (2013)
DEVELOPMENT
Enhancers ‘fine-tune’ face and skull shape
These authors identified >4,000 candidate enhancers that
are predicted to function in mouse craniofacial development.
For three of these candidates they showed that deletions
alter the expression of nearby genes that have known roles in
craniofacial development. Using micro-computed tomography
— a high-resolution three-dimensional imaging method — they
accurately measured the skulls of mice that carried these
mutations and found subtle but significant effects. These
findings have implications for understanding the genetic bases
of both normal and abnormal craniofacial morphology.
ORIGINAL RESEARCH PAPER Attanasio, C. et al. Fine tuning of craniofacial morphology
by distant-acting enhancers. Science 342, 1241006 (2013)
CANCER GENOMICS
Explaining aneuploidy patterns
Aneuploidy — the presence of an abnormal number of
chromosomes — is a common feature of cancer cells, but it
is unclear how specific patterns of aneuploidy arise. These
authors developed a computational method for identifying
candidate tumour suppressors and oncogenes on the basis of
mutation patterns in tumour samples. They found evidence that
there are many cancer-driving genes for which a continuum of
oncogenic potential exists, and they propose that the specific
combinations of these genes on chromosomes explain the
patterns of aneuploidy that arise in cancer.
ORIGINAL RESEARCH PAPER Davoli, T. et al. Cumulative haploinsufficiency and
triplosensitivity drive aneuploidy patterns and shape the cancer genome. Cell
http://dx.doi.org/10.1016/j.cell.2013.10.011 (2013)

REVIEWS
Genome dynamics during
experimental evolution
Jeffrey E. Barrick1,2 and Richard E. Lenski2,3
Abstract | Evolutionary changes in organismal traits may occur either gradually or
suddenly. However, until recently, there has been little direct information about how
phenotypic changes are related to the rate and the nature of the underlying genotypic
changes. Technological advances that facilitate whole-genome and whole-population
sequencing, coupled with experiments that ‘watch’ evolution in action, have brought new
precision to and insights into studies of mutation rates and genome evolution. In this
Review, we discuss the evolutionary forces and ecological processes that govern genome
dynamics in various laboratory systems in the context of relevant population genetic
theory, and we relate these findings to evolution in natural populations.
Mutation accumulation
A type of evolution experiment in which populations are deliberately forced through a bottleneck of one or a few breeding individuals, which allows non-lethal mutations to accumulate with little or no filtering by natural selection.

1 Department of Molecular Biosciences, Institute for Cellular and Molecular Biology, The University of Texas, Austin, Texas 78712, USA.
2 BEACON Center for the Study of Evolution in Action, Michigan State University, East Lansing, Michigan 48824, USA.
3 Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan 48824, USA.
e-mails: jbarrick@cm.utexas.edu; lenski@msu.edu
doi:10.1038/nrg3564
Published online 29 October 2013

Evolutionary and ecological questions that could previously be approached using only comparative or theoretical methods are increasingly amenable to direct study. In experimental evolution studies, populations of organisms are maintained in controlled environments in which changes in both genotype and phenotype can be monitored over timescales spanning many tens, hundreds or even thousands of generations1,2. Bringing evolution into the laboratory has several advantages, including both the ability to generate a ‘fossil’ record for later study and the ability to test the predictability of evolution across replicate populations. Studies of microorganisms also benefit from rapid generation times and the viability of frozen organisms, which can be revived either to allow an ancestor to compete head-to-head against its own descendants, or to ‘replay’ evolution that starts from various past states to investigate whether a particular outcome was contingent on some prior event.

How many, and what types of, genetic changes accumulate in evolving populations over time? The field of population genetics has developed a sophisticated mathematical framework for describing rates of evolutionary change in terms of the fundamental processes of mutation, recombination, genetic drift and natural selection3. This theory guides a general understanding of evolutionary regimes and dynamics, but specific outcomes in any given biological system may also crucially depend on the molecular details of a particular genome and on how it encodes metabolic, regulatory and developmental pathways. Both perspectives are necessary for unravelling contentious issues in evolutionary biology that are related to rates of sequence evolution, such as the relative importance of adaptive and non-adaptive processes and whether the predominant tempo of evolutionary change is gradual or episodic.

Recent advances in DNA sequencing technologies have now made it possible to identify genetic changes between ancestral and derived organisms on a whole-genome scale for any species4,5. We begin this Review by examining some of the previously hidden details that whole-genome and whole-population sequencing are revealing about evolution in even the simplest laboratory scenarios. We then discuss genetic dynamics in experiments that add back various components of the complexity that is present in the natural world. We primarily focus on asexual microbial systems in which most studies using extensive genome sequencing have been carried out so far. We also discuss multicellular eukaryotes and experiments in which sexual recombination has a role, as genomic data from these systems are increasingly becoming available.

Mutation rates
Most experimental evolution studies begin with clonal or inbred populations of a model organism so that there is a homogenous and well-characterized genetic starting point. Therefore, knowing the rates at which new mutations arise and lead to both genetic and phenotypic diversity in a population is useful for understanding evolutionary dynamics. Mutation accumulation experiments allow one to estimate the intrinsic rates and the effects of new mutations by repeatedly imposing population bottlenecks of one or a few randomly chosen breeding individuals to minimize selection that would otherwise favour
NATURE REVIEWS | GENETICS VOLUME 14 | DECEMBER 2013 | 827

REVIEWS
[Figure 1 panels: a, Single-cell bottlenecks (mutation accumulation); b, Continuous culture and c, Serial transfer (adaptive evolution). Lower panels plot population size (log scale, 1 to 10^10) against time (t), with intervals marked Δt.]
Figure 1 | Types of evolution experiments. There are three main ways that populations are propagated in evolution
experiments, and they all lead to different types of genetic dynamics. The mechanics of how populations are maintained
in each set-up are illustrated for microorganisms (top panels), and representative changes in population sizes over time
are also shown for each procedure (bottom panels). Analogous procedures exist for multicellular organisms, although
population sizes are generally much smaller. a | In mutation accumulation experiments, frequent and deliberate population
bottlenecks through one or a few randomly chosen breeding individuals are accomplished by picking colonies of
microorganisms that grow from single cells on agar plates. These bottlenecks purge genetic diversity and lead to the
fixation of arbitrary mutations without respect to their effects on fitness. b | In experiments using continuous culture,
populations are maintained in conditions that consist of a constant inflow of nutrients and an outflow of random
individuals and waste in a chemostat, which leads to adaptive evolution and genetic diversity in populations that
typically maintain a nearly constant size. c | In serial transfer experiments, a proportion of the population is periodically
transferred to fresh media and allowed to regrow until the limiting nutrient is exhausted. Such batch growth also leads
to adaptive evolution because ample genetic diversity is maintained through each transfer. Alternatively, transfers can
be made before nutrient depletion, thereby allowing perpetual population growth. A second, cryptic type of population
bottleneck occurs during adaptive evolution experiments (parts b and c) as a consequence of selective sweeps,
especially in asexual populations, that drive out competing lineages and thereby reduce genetic diversity.
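
The serial transfer procedure described in part c can be sketched numerically. The following is a minimal illustration, assuming idealized binary fission with no cell death and regrowth to the same final size in every cycle; the function names and the parameter values (a 100-fold dilution and a final size of 5 × 10^8 cells) are hypothetical choices for illustration and are not taken from the figure.

```python
import math

def serial_transfer_cycle(final_size, dilution_factor):
    """Return (bottleneck_size, generations_per_cycle) for one transfer cycle.

    Assumes the population regrows by binary fission to the same final size
    after every dilution, so regrowth needs log2(dilution_factor) doublings.
    """
    bottleneck_size = final_size / dilution_factor
    generations = math.log2(dilution_factor)
    return bottleneck_size, generations

def trajectory(final_size, dilution_factor, cycles):
    """Population size after each doubling across several transfer cycles."""
    sizes = []
    doublings = math.ceil(math.log2(dilution_factor))
    for _ in range(cycles):
        n = final_size / dilution_factor  # transfer a proportion to fresh medium
        sizes.append(n)
        for _ in range(doublings):
            n = min(2 * n, final_size)  # growth stops when the nutrient runs out
            sizes.append(n)
    return sizes

bottleneck, gens = serial_transfer_cycle(final_size=5e8, dilution_factor=100)
sizes = trajectory(final_size=5e8, dilution_factor=100, cycles=2)
print(f"bottleneck size: {bottleneck:.0f} cells")
print(f"generations per cycle: {gens:.2f}")
```

Under these assumptions the transfer bottleneck stays in the millions of cells, which is why ample genetic diversity survives each transfer, in contrast to the single-cell bottlenecks of part a.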
some variants6 (FIG. 1a). Under these specific conditions, one can simply count the number of genetic changes that are present in independently evolved genomes after a known number of generations to estimate the spontaneous mutation rate (BOX 1). Recently, classic long-term mutation accumulation studies with model organisms (including Saccharomyces cerevisiae 7, Arabidopsis thaliana8, Drosophila melanogaster 9 and Caenorhabditis elegans 10) have been revisited using whole-genome sequencing to measure mutation rates. New mutation accumulation studies of microorganisms have also been carried out with the specific aim of estimating mutation rates11–13.

The overarching conclusion of these experiments is that spontaneous mutation rates are usually very low. Mutation accumulation experiments with bacteria11–13 and single-celled eukaryotes7,13 typically find that the rate of single base mutations is of the order of 10^-10 to 10^-9 per base pair per replication. Given that the typical genome sizes in these organisms are of the order of 10^6 to 10^7 base pairs, these rates correspond to only one point mutation in every few hundred to several thousand cell divisions, which is in reasonable agreement with earlier estimates for DNA-based microorganisms from reporter-gene assays14. Rates of point mutations in multicellular eukaryotes8–10 are of the order of 0.05–1.0 per generation across the entire protein-coding portions of these genomes13,15, which is still fairly low given the much longer generation times and the multiple cell divisions in the germ line between generations in these organisms. Some types of mutations, such as insertions and deletions of one or a few bases, typically occur at a lower rate than single base changes but vary more between species and with sequence context 7. Other types of mutations, such as insertions of mobile DNA elements and large-scale chromosomal rearrangements, are more difficult to identify from short-read DNA sequencing data and have not yet been systematically examined in mutation accumulation experiments.

Population bottlenecks
Reductions in population size that typically also reduce genetic diversity. Bottlenecks can be deliberately imposed, such as in a mutation accumulation experiment. Cryptic bottlenecks also arise as a consequence of selective sweeps, especially in asexual populations, that drive out competing lineages and thus reduce genetic diversity.

Mutation rate
The rate at which new genetic mutations spontaneously occur during the replication and transmission of genetic information from parent to offspring.

Mutation rates can change over evolutionary time, so it is instructive to understand how both genetic and environmental factors affect these rates. In particular, hypermutator lineages that have increased mutation rates and highly biased mutational spectra may arise
Substitution rate
The rate at which new mutations accumulate in an evolving lineage over time, which typically depends on both the mutation rate and the effects of natural selection.

Biological fitness
A quantitative measure of the contribution of a specific organism or genotype to future generations owing to differential survival, reproduction or both, that is associated with its phenotype; fitness is often expressed relative to other organisms or genotypes.

Box 1 | Mutation rates versus substitution rates

[Box 1 figure: a, relative frequency of mutations by fitness effect (highly beneficial, slightly beneficial, neutral, slightly deleterious, highly deleterious, lethal); b, relative rate under mutation accumulation; c, relative rate under adaptive evolution. Legend: mutation rate, substitution rate.]
It is important to distinguish between the rate at which spontaneous mutations occur and the rate at which genetic
changes accumulate in a surviving lineage. The mutation rate reflects the probability of a change in genome sequence
between a parent and its offspring. It is the compound result of unrepaired DNA damage, polymerase errors, intragenomic
recombination events, movements of transposable elements and other molecular processes that introduce errors during
the transmission of genetic information. However, only those mutations in lineages that persist — typically in the face of
selection — contribute to the substitution rate that is measured by whole-genome sequencing. The failure to carefully
distinguish between these two types of rates is a persistent cause of confusion and misconceptions about whether
mutations are random. In the same vein, the frequency of a mutant allele in a population generally does not equal the rate
at which the corresponding mutational event occurs.
Mutations can be broadly categorized as beneficial, neutral, deleterious or lethal with respect to their effects on
biological fitness88. In various organisms, many or most new mutations are thought to be neutral or nearly neutral, and
deleterious mutations greatly outnumber beneficial mutations under most circumstances89 (see the figure, part a). Some
mutations may change the magnitude or even the sign of the fitness effects of other mutations — a phenomenon known
as epistasis90. Nevertheless, each new mutation in an evolving lineage can be classified into one of these categories
depending on its fitness effect at the time and in the genetic context in which it appears. In this Review, we discuss two
kinds of evolution experiments in which these different categories of mutations make different contributions to the
substitution rate.
In mutation accumulation experiments, populations are continually forced through a bottleneck of one or a few
breeding individuals, so the probability that any given mutation survives is essentially random and independent of its
fitness effect. Thus, all mutations, except lethal or extremely deleterious ones, accumulate at rates that are close to
their underlying mutation rates (shown as a dashed arc) in surviving lineages (see the figure, part b). The overall number
of mutations in these highly unfavourable categories is usually thought to be small, and it is therefore common to
equate substitution rates with mutation rates in mutation accumulation experiments, although this will slightly
underestimate the true mutation rate. The ultimate mutation accumulation experiment is to sequence large numbers of
parents and their offspring to avoid changes in environmental or genetic factors that might affect mutation rates during
a longer experiment91.
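
The counting estimate described above can be sketched as follows. This is a minimal illustration that equates the substitution rate with the mutation rate, as the text notes is common practice; the function names and the example counts (50 lines, 2,000 generations, a 4.6 × 10^6 base pair genome, 92 observed mutations) are hypothetical and are not data from any study cited here.

```python
def mutation_rate_per_bp(total_mutations, n_lines, generations, genome_size_bp):
    """Mutations per base pair per generation, pooled over all lines.

    Lethal and extremely deleterious mutations are never observed, so this
    slightly underestimates the true mutation rate.
    """
    return total_mutations / (n_lines * generations * genome_size_bp)

def divisions_per_mutation(rate_per_bp, genome_size_bp):
    """Expected number of cell divisions per point mutation in one genome."""
    return 1.0 / (rate_per_bp * genome_size_bp)

# Hypothetical experiment, for illustration only.
rate = mutation_rate_per_bp(total_mutations=92, n_lines=50,
                            generations=2000, genome_size_bp=4.6e6)
print(f"estimated rate: {rate:.1e} per bp per generation")
print(f"about one mutation per {divisions_per_mutation(rate, 4.6e6):.0f} divisions")

# Range implied by rates of 1e-10 to 1e-9 per bp and genomes of 1e6 to 1e7 bp.
fewest = divisions_per_mutation(1e-9, 1e7)
most = divisions_per_mutation(1e-10, 1e6)
```

The last two lines bracket the per-genome arithmetic for the orders of magnitude quoted in the main text for bacteria and single-celled eukaryotes.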
By contrast, in adaptive evolution experiments, beneficial mutations typically drive the genetic dynamics. The
substitution rate of beneficial mutations exceeds the actual mutation rate (shown as a dashed arc) for this category
because lineages with these rare mutations increase in frequency as they outcompete their ancestors and lineages with
other mutations (see the figure, part c). Conversely, deleterious mutations are under-represented in adaptive evolution
experiments because they are usually purged by selection, although slightly deleterious mutations can sometimes
hitchhike with beneficial ones. The nature of competition between genetically diverged lineages will affect the extent of
mutational diversity in a population, but the expected rate of accumulation of neutral mutations in any surviving lineage
will still equal the underlying mutation rate for this category.
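
The closing statement about neutral mutations follows from a standard population-genetic argument. As a sketch, assume a haploid asexual population of constant size N and a neutral mutation rate μ_0 per genome per generation (these symbols are introduced here for illustration and do not appear in the original text). Each generation N μ_0 new neutral mutations arise, and each is the eventual ancestor of the whole population with probability 1/N:

```latex
k_{\text{neutral}}
  = \underbrace{N\mu_0}_{\text{new neutral mutations per generation}}
    \times
    \underbrace{\frac{1}{N}}_{\text{fixation probability of each}}
  = \mu_0
```

Population size cancels, so the expected neutral substitution rate equals the neutral mutation rate regardless of how strongly selection acts on other, linked mutations.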
when mutations cause a loss of normal DNA repair or proofreading activities16. Mutation accumulation lineages that were derived from a Salmonella enterica subsp. enterica serovar Typhimurium hypermutator strain had a 30-fold increase in point mutation rates compared with a wild-type strain, and 91% of the resulting base substitutions were G:C→T:A transversions, a bias that is consistent with the misincorporation of oxidized guanine bases during DNA replication11. Another mutation accumulation study found that an Escherichia coli strain that is defective in mismatch repair had a 138-fold increase in mutation rate compared with wild type and had 70% A:T→G:C transitions12.

Chemical mutagens are the main environmental factor that has so far been examined with mutation accumulation experiments. At the extreme level of mutation that can be achieved in experimental evolution studies of mutagenized bacteriophage17,18 (for example, mutating ~1% of the bases in the bacteriophage T7 genome17), one can begin to ask questions about what sites in a genome must be unchanged to remain viable. RNA viruses, such as HIV, typically have high mutation rates of the order of one per genome per generation19, and experiments using mutagens have sought to test whether therapies that increase that rate further could lead to a mutational meltdown and the extinction of viral

populations in infected individuals20. More generally, we anticipate that the sequencing of mutation accumulation lines will be used to enable a more precise version of the Ames test 21 to define the mutagenicity of potential carcinogens in the near future.

It is important to caution that laboratory mutation accumulation experiments may not precisely match the mutation rates or spectra that are found under natural conditions, in which organisms are often nutritionally deprived or otherwise stressed. For example, bacteria in the human gastrointestinal tract are thought to achieve only approximately one generation per day in this complex mixture of nutrients and biotic interactions22, whereas bacteria in the soil probably undergo prolonged starvation and far fewer generations on average. Similarly, plants may be subject to extreme temperatures, damage from predation and increased ultraviolet radiation exposure in nature8. These and other environmental and genetic factors will probably be examined in future mutation accumulation experiments.

Adaptive evolution
In this section, we consider evolutionary dynamics in experiments in which differential survival and reproduction lead to the preferential accumulation of genetic variants that are better adapted to their environment. The simplest adaptive evolution experiments maintain populations that are derived from a single ancestral genotype in a uniform environment, such that selection pressures either remain constant or fluctuate in a controlled way. This situation can be achieved in continuous culture in which both the replenishment of resources and the removal of individuals happen at a constant rate (FIG. 1b) or by periodic serial transfer of a proportion of the population to a new microcosm with fresh resources (FIG. 1c). In adaptive evolution experiments, selection for mutations with beneficial effects drives the evolutionary dynamics (BOX 1). These dynamics are often visualized in terms of successive steps as populations ‘climb’ ridges and peaks in a fitness landscape23. Phenotypic evolution in a population may involve gradual optimization, discontinuous innovation (BOX 2) or perhaps some mixture of the two. For example, a discontinuous change, such as the ability to survive a previously lethal stress or to grow on a new resource, might be followed by a period of gradual refinement of that new ability.

Optimization regime. In the case of asexual organisms, the simplest situation occurs when the rate of appearance of beneficial mutations is low enough relative to both the fitness advantages of new beneficial mutations and the population size, such that there is effectively only one beneficial mutation present at a time. If this mutation survives stochastic loss by genetic drift while it is still rare, then it will begin a selective sweep, whereby its frequency increases until it reaches genetic fixation in the population (that is, the mutant completely replaces its ancestor), before another beneficial mutation becomes established (FIG. 2a). These dynamics have been called periodic selection after classic experiments that inferred sweeps of beneficial mutations in E. coli populations24.

Box 2 | Adaptive evolution: optimization versus innovation

When new beneficial mutations optimize the overall performance of existing genetic, metabolic and developmental networks during an adaptive evolution experiment, the fitness of organisms tends to gradually improve over time. In some experimental systems, interactions between the fitness effects of these mutations typically show diminishing-returns epistasis92–94; that is, each beneficial mutation increases fitness to a lesser extent in the presence of the other beneficial mutations than it would if it appeared alone in the ancestral genetic background (see the figure, part a). Typical beneficial mutations in the optimization regime modify gene expression levels, adjust regulatory interactions or alter metabolic fluxes. In some cases, these adjustments may be beneficial simply because they reduce the expression, and hence the energetic costs, of unused functions. The early beneficial mutations of largest effect are often in global control ‘hubs’ of networks, whereas later mutations often target the ‘spokes’ of specific pathways36,85,90,95. As these networks and pathways become more finely ‘tuned’ to a particular environment, it becomes more difficult to improve the overall system performance in a single mutational step. This form of gradual evolution is often associated with microevolutionary change.

More sudden and dramatic changes also sometimes occur during evolution experiments, particularly when beneficial mutations produce innovations that allow the organism to occupy a new ecological niche28 (see the figure, part b). This form of adaptive evolution may involve, for example, the generation of a new connection or activity in a cellular network. Innovations may arise from mutations that show all-or-none epistasis72; that is, several mutations may first occur that, on their own, have little or no effect on the trait, but this evolved genetic background is required for some ‘keystone’ mutation to produce the phenotypic novelty. Models of RNA folding and regulatory circuits have been used to investigate the abstract properties of these so-called ‘neutral networks’ and how they can promote the evolution of novelty96. This form of evolution, in which new phenotypes appear suddenly, is sometimes associated with macroevolutionary change, and it can give rise to new opportunities for further adaptation and diversification97.

Of course, multiple processes may be intertwined and their timescales may overlap. For example, some innovations may only be possible after a period of random exploration by mutation and genetic drift, as assumed in models of neutral networks. Alternatively, an innovation may have been enabled by earlier beneficial mutations that arose during an epoch of optimization for a different function, which is a kind of adaptive pre-adaptation. In this case, the large fitness gain obtained by co-opting these mutations for the innovation may dwarf the fitness gains during the earlier period of optimization. However, note that even during optimization, mutations of large effect can give a step-like appearance to fitness trajectories if they are measured with high resolution on short timescales98,99.

[Box 2 figure: a, Optimization; b, Innovation. Upper panels plot relative fitness against generations. Lower panels plot relative fitness by genotype (Anc, A, B, C, AB, BC, AC, ABC) under diminishing-returns epistasis (a) and all-or-none epistasis (b).]
A, B and C represent mutations that can occur alone or in combination in the genome. Anc, ancestor.
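
The two forms of epistasis described in Box 2 can be made concrete with a small calculation. The sketch below assumes a multiplicative null model for combining fitness effects, which is one common convention (an additive model on log fitness is another); the function name and all fitness values are hypothetical and chosen only for illustration.

```python
def epistasis(w_anc, w_a, w_b, w_ab):
    """Deviation of the double mutant from the multiplicative expectation.

    All fitness values are first expressed relative to the ancestor.
    A negative value for two beneficial mutations indicates diminishing
    returns; a positive value indicates that the pair does better than
    expected from the single mutants.
    """
    rel_a, rel_b, rel_ab = w_a / w_anc, w_b / w_anc, w_ab / w_anc
    return rel_ab - rel_a * rel_b

# Hypothetical relative fitness values, for illustration only.
# Optimization: A and B each help alone, but AB gains less than expected.
diminishing = epistasis(w_anc=1.0, w_a=1.10, w_b=1.08, w_ab=1.12)

# Innovation: neither mutation helps alone, but together they confer a benefit.
all_or_none = epistasis(w_anc=1.0, w_a=1.00, w_b=1.00, w_ab=1.30)

print(f"diminishing-returns example: {diminishing:+.3f}")
print(f"all-or-none example: {all_or_none:+.3f}")
```

In the first case the expected double-mutant fitness is 1.10 × 1.08 = 1.188, so an observed value of 1.12 gives negative epistasis; in the second case the single mutants are indistinguishable from the ancestor, so the entire benefit appears only in combination.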

However, genetic dynamics in evolution experiments rarely seem to be in this simple regime. Owing to the many possible routes for adaptation, the rate at which beneficial mutations appear is typically high enough that before one beneficial mutation can sweep to fixation, another appears in a separate lineage (FIG. 2b). In asexual populations, competition between these alternative beneficial mutations means that the rate at which any one of these mutations spreads through the population is slowed because it must displace fitter competitors rather than only its ancestor 25. This effect is called clonal interference, and the resulting genetic dynamics have been observed in several studies, most notably by deep sequencing entire yeast populations at frequent intervals to follow the frequencies of many new mutations26.

Genetic dynamics become even more complex when considering that neutral and deleterious mutations continually occur alongside the beneficial mutations discussed above. In large populations, deleterious mutations would rarely reach high frequency on their own, and neutral mutations would do so only over very long timescales. However, neutral and even deleterious mutations can rapidly hitchhike to prominence when they occur in the same genome as a beneficial mutation. The interplay of all of the factors discussed above can also give rise to apparently contradictory observations. For example, the rate of genomic change in an E. coli population in the long-term evolution experiment (LTEE) was surprisingly constant and clock-like (a signature that is sometimes taken as evidence of neutral evolution), even though most mutations that became fixed in the population were beneficial27 (BOX 3).

Innovation regime. Some experiments have observed qualitatively new, often ‘game-changing’ abilities that have the hallmarks of evolutionary innovations28. Some innovations may require only a single mutation. For example, whole-genome sequencing of experimentally evolved Myxococcus xanthus strains found that 14 mutations were substituted after 1,000 generations in a liquid medium, while social motility and the capability to form fruiting bodies were lost, but only one mutation was involved in the subsequent restoration of those functions29. That key mutation did not revert any of the previous mutations but occurred instead in a previously uncharacterized small RNA30. The experimentally evolved transition of a chimeric Ralstonia solanacearum strain from a plant pathogen into a symbiont that is able to colonize root nodules also required only a single mutation in hrpG, which encodes a protein that regulates the expression of several virulence factors31. It is also likely that some examples of yeast that evolved a new multicellular ‘snowflake’ phenotype needed only single adaptive mutations32.

The evolution of new metabolic capabilities was studied in several early experiments with microorganisms33. These organisms often gained the ability to use new compounds as nutrient sources by successive mutations in genes that activated their transcription under new conditions, increased their overall expression levels or altered their substrate specificities. In these examples, the relevant mutations each conferred a direct advantage in terms of using the new resource, but this need not be the case. The LTEE uses a glucose-limited medium, but citrate, another nutrient that E. coli generally cannot use, has also been present throughout the experiment. The ability to use this abundant but untapped resource evolved after 30,000 generations (~15 years) and in only one of the 12 replicate populations34. This innovation was difficult because it was contingent on one or more earlier ‘potentiating’ mutations that had to be present in the genetic background for the key ‘actualizing’ mutation to establish in the population. The key actualizing mutation arose by a chromosomal duplication event that ‘rewired’ gene expression by placing a new transcriptional promoter upstream of a previously silent citrate transporter gene to give the Cit+ phenotype (that is, the ability to use citrate as a nutrient source)35. The earlier potentiating mutations did not confer any immediate advantage with respect to using citrate, but they may have been beneficial with respect to growth on glucose. If so, it is possible that other populations in the LTEE have also become potentiated and might evolve the Cit+ phenotype, although no others have done so even after >50,000 generations.

Diminishing-returns epistasis
Interactions among mutations such that the combined effect of the mutations on fitness or on some other trait is less than that expected from their individual contributions.

All-or-none epistasis
Interactions among mutations such that an entire set of mutations is required to confer a fitness advantage or a new trait; no subset that lacks one of these mutations has the advantage or an intermediate form of the relevant trait.

Adaptive evolution
Evolution under conditions in which surviving organisms accumulate genetic changes that lead to a fitness advantage over their progenitors.

Fitness landscape
The visualization of the genotype-to-fitness mapping for an organism in which the height of a position on the map represents the fitness of that genotype and the location is a reduced-dimensional projection of possible genotypes. An evolutionary trajectory of genetic changes can be visualized as a ‘walk’ and adaptation as a ‘climb’ in the fitness landscape.

Selective sweep
The increase in the frequency of an advantageous allele in a population as it displaces ancestral and competitor alleles.

Genetic fixation
The point at which an allele has completely displaced ancestral and competitor alleles; that is, the allele is present in every surviving individual in the population.

Periodic selection
The phenomenon whereby successive beneficial mutations completely sweep through an evolving population. Other mutations that are linked, but are not beneficial, can hitchhike with the beneficial driver mutation.

Stressful environments. An important ‘dial’ that can be adjusted in experimental evolution studies is the strength of selection, particularly if the goal is to improve some phenotypic property. In one limit, selection may be so strong that it becomes a genetic screen in which only rare mutants with extreme, perhaps innovative, phenotypes can survive the stress (FIG. 2c). When selection is less stringent, more genetic diversity can usually be sustained, which allows more opportunities for populations to optimize by exploring alternative mutational paths. Indeed, divergent paths have been described in most evolution experiments with microorganisms, and these dynamics have been reconstructed in certain cases36. A potential disadvantage of a weak selection strategy, if the goal is to maximize phenotypic change, is that those mutations that confer the highest tolerance to stresses (such as organic solvent 37, high temperature38,39, radiation exposure40 and antibiotic pressures41) may not be favoured under this strategy. Conversely, if selection is too strong, then an evolving population might be driven towards a ‘quick fix’ (a local peak in the fitness landscape) that renders an even better solution less accessible. These concerns can potentially be balanced by constructing connected microenvironments with a gradient of conditions, such that organisms can colonize and exploit new resources in previously inhospitable regions by gaining new mutations that give them greater tolerance41.

The effect of clonal interference depends on how closely matched the most beneficial mutation that occurs in a population is to the next-most beneficial mutation. Stressful environments can sometimes help to separate the more beneficial mutations from the less beneficial mutations. For example, a study of bacteriophage ΦX174 manipulated how harsh the environment was by restricting CaCl2 (which is required for the efficient

[Figure 2 panels: Asexual reproduction: a, Periodic selection; b, Clonal interference; c, Strong selection. Sexual reproduction: d, Initially clonal population; e, Standing genetic diversity. Left-hand panels plot genotype frequency (0 to 100) and right-hand panels plot allele frequency (%), both against generations.]

Clonal interference
Competition between lineages that have different beneficial mutations in asexual populations, which slows the rate at which any particular allele fixes in the population relative to a freely recombining population.

Long-term evolution experiment
(LTEE). An experiment with Escherichia coli that has surpassed 25 years and 55,000 generations in duration.

Genetic background
The genotype of an organism; that is, its complete genome sequence or the alleles that distinguish it from other organisms.

Strength of selection
The benefit of accessible beneficial mutations relative to current mean population fitness. Under strong selection, sweeps of new genotypes generally occur more rapidly, and less diversity builds up in a population.

are deleterious than beneficial (BOX 1). However, the initial spontaneous mutation rate is typically low, as discussed above, so that the fitness cost of producing mutations at even a 100-fold higher rate is small. Furthermore, new hypermutator variants frequently arise because loss-of-function mutations in many genes give rise to this phenotype. Given the balance between their high rate of occurrence and small fitness costs, hypermutators might exist at frequencies of 10^-6 to 10^-4 in experimental populations of E. coli 46,47. However, the hypermutator subpopulation has increased evolvability because it has a much higher per-capita chance of producing the next highly beneficial mutation, or multiple beneficial mutations, than a non-mutator competitor (FIG. 3a). Thus, despite their slight fitness costs, hypermutators can sometimes hitchhike to fixation with the beneficial mutations that they generate48.

In the long term, increased mutation rates are not without evolutionary risk. Opportunities for mutations that greatly improve fitness will eventually run out in the optimization regime. It may then be beneficial to compensate for a hypermutator defect and to become less evolvable in order to lower the genetic load49 (FIG. 3b). Indeed, both mutation-rate scenarios have been observed in an E. coli population in the LTEE50. After thousands of generations, a mutation in the nucleoside triphosphate pyrophosphohydrolase gene (mutT) that caused a ∼150-fold increase in mutation rates spread through the population, presumably by hitchhiking with one or more beneficial mutations. Later, parallel

Figure 2 | Genetic dynamics in evolution experiments. Five scenarios are illustrated using Muller plots (left-hand panels), which show the frequencies of different genotypes over time as coloured segments69. As new mutations appear, they are linked with mutations that previously arose in their predecessors. When there is sexual reproduction, existing mutational variants may also be recombined to produce new genotypes (as indicated by white arrows pointing to multiply shaded regions with dashed boundaries). The frequencies of different alleles (that is, mutational variants) in the population, as would be measured by metagenomic sequencing, are also shown for each scenario (right-hand panels). a | During periodic selection, if the rate at which new beneficial mutations appear is low and the fitness benefit of each mutation is large, then only one mutation will usually sweep through the population at a time. These dynamics cause near step-like trajectories for fitness, phenotypic traits and the number of beneficial mutations that accumulate over time. Successive sweeps typically take longer, as the expected marginal benefit of a later mutation decreases if evolution is in the optimization regime. b | In clonal interference, if the supply rate of beneficial mutations is higher because either the population size or the overall mutation rate is increased, then multiple beneficial mutations may arise before one of them achieves fixation. In asexual populations, competition between the contending mutations slows their progress towards fixation, which allows time for additional beneficial mutations to occur and gives rise to more complex trajectories for both fitness and mutation number. c | If strong selection is periodically imposed, in ways that may even be lethal to most of the population (shown by dashed vertical lines), then only one or a few genotypes may persist, and they can quickly achieve fixation after this selection-induced bottleneck. This scenario can lead to large and sudden changes in a phenotype, such as resistance to an antibiotic or to stress. d | The scenario of sexual reproduction in an initially clonal population is shown. As new beneficial mutations arise, they can be recombined into the same genetic background, rather than only competing with one another as in asexual populations. Thus, sexual reproduction may lead to more rapid genetic evolution and adaptation. e | In the case of sexual reproduction with standing genetic diversity, shuffling of the genetic diversity that is initially present in a population may generate fitter genotypes at a faster rate than waiting for new beneficial mutations to arise. Even so, if many different combinations of existing alleles give similar benefits, then no one allele will necessarily sweep to
fixation on the timescale of the experiment. mutations in the adenine DNA glycosylase gene (mutY)
arose in independent lineages; these mutations approxi‑
mately halved the mutation rate, apparently by knock‑
attachment of ΦX174 to the E. coli host) and monitored ing out a mechanism by which misincorporations of
genome sequence diversity over time42. It showed that oxidized nucleotides during DNA replication (caused
clonal interference was more prevalent in benign envi‑ by the original mutT defect) were misrepaired. The
ronments, in which more beneficial mutations of similar resulting reduction in genetic load was estimated to be
effect were evidently available, and that this led to slower ~0.5%. This value seems to be similar in magnitude to
overall rates of genetic change. other beneficial mutations that drove adaptation late in
the LTEE, whereas some beneficial mutations that were
Second-order selection for evolvability substituted early in the experiment had much greater
When there is genetic diversity in evolving populations fitness effects27. In an experiment with yeast, several
for long periods of time, there is the opportunity for populations that started as hypermutators also evolved
selection to operate not only on the immediate effects lower mutation rates and, as a result, reduced genetic
of mutations or new combinations of alleles but also on loads51. In natural populations, comparative evidence
how those new genotypes differ in their capability to indicates that the complete reversion of a hypermutator
further evolve (that is, their evolvability). In particular, to an ancestral mutation rate can occur by horizontal
prior mutational steps in a path on the fitness landscape gene transfer of an intact gene from a non-mutator 52.
may affect evolvability in at least two main ways. They In addition to affecting mutation rates, hypermutators
Genetic load
may alter mutation rates (and/or recombination rates generally change the spectrum of different types of
The indirect fitness cost to an for organisms that are capable of sexual reproduction mutations, and those differences might also influence
organism caused by producing or horizontal gene transfer), or they may lead to dif‑ the evolvability of a lineage.
offspring with mutations that ferences in epistatic interactions with potential further
either reduce their fitness or
mutations (BOX 2). Genetic architecture. As fitness landscapes are com‑
are lethal.
plex and may have multiple peaks, some mutational
Genetic architecture Mutation rates. It is fairly common for some populations paths may lead to ‘dead ends’ with no, or at least fewer,
The properties of an organism, in adaptive evolution experiments to become dominated opportunities to further improve. In other cases, cer‑
including its metabolic, by hypermutators that have increased mutation rates43–45. tain mutations may open up new opportunities for evo‑
regulatory and developmental
pathways, that determine
How do hypermutators invade populations? The imme‑ lution that could not be accessed if other routes were
how new mutations affect diate effect of an increased mutation rate on fitness is taken. The term genetic architecture refers to how geno‑
phenotypes and fitness. invariably negative, on average, because more mutations types, and mutations that alter genotypes, map onto

changes in phenotypes and fitness; hence, differences in evolvability can result from mutations that change the genetic architecture of an organism (FIG. 3c,d). We have already discussed above how the evolution of citrate use depended on a potentiated genetic background, which could be said to have resulted in greater evolvability34,35 (FIG. 3c).

A more subtle change in evolvability involved two genotypes that competed early in the history of another E. coli population from the LTEE53. One of these genotypes prevailed despite having a significantly lower fitness. By replaying evolution many times from these two different starting points, it was demonstrated that the ‘eventual winners’ reproducibly gained more fitness over time than the ‘eventual losers’, such that this seemingly unexpected outcome in the original LTEE population was, in fact, the more likely outcome (FIG. 3d). Genome sequencing showed that the eventual winners often underwent subsequent mutations in the spoT gene, in which mutations were never observed in the eventual losers. The SpoT protein is a regulator of the stringent response. Reconstruction of this mutation in the two genetic backgrounds showed that it was highly beneficial in the winners but did not significantly affect the fitness of the losers (BOX 3). It remains to be determined exactly why the earlier mutations that distinguished the losers from the winners reduced their evolutionary potential, and how important such epistatic ‘cul‑de‑sacs’ are in natural populations.

Box 3 | Identifying adaptive mutations
In molecular evolution and comparative genomics studies, the ratio of non-synonymous to synonymous base substitutions (dN/dS) in a protein-coding gene is commonly used to test whether the gene has been subject to positive selection or negative selection (also known as purifying selection). There are rarely enough base substitutions in an evolution experiment to apply this test on a per-gene basis. However, because so few mutations accumulate, it is also highly unlikely that the same gene would change in several independently evolved genomes unless these variants had been enriched by selection. Therefore, such genetic parallelism provides a strong signal that the mutations were beneficial. Depending on the phenotypic effect that is required for adaptation, this parallelism may occur at the level of an individual nucleotide or codon, a specific gene, or some step in a particular metabolic or regulatory pathway27,63,72,100–102. More complex patterns of covariation — such as a mutation in either one gene or another, but not both, in multiple independently evolved genomes — can be used to identify new genes that are involved in the same pathway39,103.
However, hot spots that undergo unusually high rates of spontaneous mutations relative to the rest of a genome can also lead to genetic parallelism for mutations that are only slightly beneficial or not adaptive at all104–106. Therefore, the ‘gold standard’ for establishing whether a particular mutation is adaptive is to use either a genome-editing or a genetic exchange method to make an isogenic construct that differs from another strain by only the single mutation of interest. One can then either test for a change in a trait that is known to be related to fitness or compete the two organisms under the conditions that prevailed during the evolution experiment to determine whether the mutation is beneficial, neutral or deleterious.
In longer evolution experiments, after many mutations have accumulated, one must consider several potential complications when interpreting the results of these measurements. First, some mutations may be adaptive only in the context of the genetic background in which they arose, owing to interactions with mutations that occurred earlier in that lineage90. In this case, it might be cleaner to ‘deconstruct’ the mutation by removing it from an evolved genotype, in which one would then expect fitness to decrease. However, there is, again, the potential for interactions with mutations that evolved after the focal mutation to alter its measured fitness effect. Second, ecological interactions may affect fitness measurements. These often take the form of negative frequency dependence, in which a genotype has an advantage over some competitor only when it is rare in the population56. In other cases, non-transitive interactions may arise such that an evolved genotype is more fit than its immediate progenitor, but this genotype is less fit than some earlier ancestor that it never encountered because they were present at different times in the evolving population107.

Isogenic construct
An organism, produced in the laboratory using various genetic tools, that has defined genetic differences from a reference organism. It is used to study the effects on fitness and on other phenotypic traits of single mutations or combinations of mutations.

Niche construction
The production of a new resource or other ecological opportunity that is caused by the actions or evolution of organisms.

Metagenomic sequencing
The sequencing of DNA fragments that are randomly derived from a population containing a mixture of many genotypes.

Eco-evolutionary dynamics
In The Origin of Species54, Darwin memorably envisioned a “tangled bank” in which organisms were “dependent upon each other in so complex a manner” as an outcome of natural selection. To this point, we have generally ignored ecological interactions beyond ‘scramble’ competition for limiting resources. Even in simple laboratory environments, evolution can lead to niche construction55 that enables diverged lineages of organisms to stably coexist for long periods of time. Other experiments have examined evolution in environments with multiple resources or multiple interacting species. In both cases, metagenomic sequencing of DNA that was isolated from whole-population or whole-community samples, rather than from individual clones, is yielding new insights into the dynamic interactions between distinct ecotypes.

Multiple nutrients. The well-shaken flask environment of the LTEE nominally has a single niche, with a low concentration of glucose that limits growth. However, one of these E. coli populations gave rise to two distinct ecotypes, first noticed as small and large colony morphotypes after 6,000 generations, and these types coexisted for at least 30,000 generations56. The two types show negative frequency dependence, such that each type has a fitness advantage and can invade the other type when it is rare in the population57. In this case, the balance that leads to stable coexistence results from the large type growing faster on glucose and the small type having better growth on metabolic by-products58. The genetic and physiological bases of these differences are subject to ongoing investigation57,58.

Metagenomic sequencing of another population in the LTEE revealed more transient diversification59. Mutations in genes that are related to acetate use repeatedly arose, but they never persisted for more than a few thousand generations or reached high frequencies. Acetate is a by‑product excreted by E. coli during growth on glucose. As glucose runs out, acetate is taken up and used by cells. This cross-feeding opportunity for acetate ‘specialists’ suggests that other populations in the LTEE might also be on the cusp of evolving more complex ecologies.

In both of these cases from the LTEE, continued evolution after diversification feeds back on the ecological interactions and the stability of such interactions. In the case of the small and large polymorphism, the large types continually encroached on the niche occupied by the small type; if the small type had stopped evolving then it would have been driven to extinction57. In the case of acetate use, descendants that were derived from the main population apparently displaced the acetate specialists multiple times but repeatedly gave rise to new mutants that later reinvaded this niche59. Understanding

the effects of newly evolved ecologies on evolutionary dynamics is an area that is worthy of both theoretical and empirical investigation.

[Figure 3 panels. Mutation rates: a | Hypermutator; b | Antimutator. Genetic architecture: c | Accessible innovation; d | Antagonistic epistasis. Axes: fitness against genotype.]
Figure 3 | Second-order selection for evolvability. The success of a new mutation or a new combination of alleles may depend on its effect on either the rate or the fitness benefits of subsequent mutations, in addition to its immediate effect on fitness. Several scenarios are illustrated as alternative mutational paths in fitness landscapes. Genotypes are represented by circles; thick arrows represent initial mutations that generate a new genotype, and thin arrows represent subsequent mutational paths that are available to the genotypes; the mutation rate of a genotype is reflected by the number of thin arrows projecting from it. a | A fitness landscape favouring a hypermutator is shown. From a progenitor with a low ancestral mutation rate (blue circle), a variant that causes an increased mutation rate (green circle) can sometimes take over an asexual population because it has a higher per-capita probability of generating beneficial mutations. Access to these opportunities may outweigh the immediate fitness cost of an increased genetic load. b | A fitness landscape favouring an antimutator is shown. In the longer term, as a genotype (blue circle) evolves to approach a local optimum (the peak of the illustrated fitness landscape) and there are fewer beneficial mutations available, a genotype with a lower mutation rate (green circle) may evolve and be favoured because it has a reduced genetic load. c | A fitness landscape promoting an accessible innovation is shown. Starting from the same progenitor genotype (black circle), two mutants may have different probabilities of eventual success owing to differences in their evolvability. In this case, one mutation (green circle) makes it possible for a subsequent mutation to ‘invade’ an open niche (beige landscape), whereas the other mutation (blue circle) does not. d | A fitness landscape with antagonistic epistasis is shown. Even in the same niche, one beneficial mutation (blue circle) may constrain opportunities for further fitness gain more than another (green circle) because of antagonistic epistatic interactions. In essence, some beneficial mutations may lead to ‘cul‑de‑sacs’ in the fitness landscape, which allows other beneficial mutations that do not limit further adaptation to prevail, provided that they coexist for enough time. If evolution can be ‘replayed’ many times starting with the two different genotypes, then an over-representation or an under-representation of mutations in specific genes would provide a signature of such epistatic effects.

Negative frequency dependence
An allele (or a trait) that undergoes a decline in fitness as it becomes more common in a population. If the allele confers an advantage when it is rare but is disadvantageous when it is common, then a genetic polymorphism is stably maintained.

Complex ecologies also evolve in closed cultures in which the nutrients are exhausted and are not renewed. In these cultures, the viable cell population declines over time, as starvation takes its toll. However, not all genotypes die at the same rate, and the survivors are enriched for growth advantage in stationary phase (GASP) mutants60,61 that exploit by‑products of dead and dying cells. Mutations that confer a GASP phenotype have been identified in rpoS, which encodes the alternative ‘starvation’ σ-factor, and in other genes that encode high-affinity amino acid transporters; all of these mutations increase the ability of the cell to obtain and use amino acids for carbon and energy61. In addition, genomic analyses reveal frequent gene-amplification variants among the survivors61. It seems that copy-number variants are generated at a high rate during starvation, and those that result in extra copies of genes that encode products which prove useful during starvation can then proliferate.

Substantial diversity also evolves in glucose-limited continuous-culture chemostats62,63, even without the feast–famine seasonality in resource abundance that occurs in serial transfer studies. Metagenomic sequencing of populations that were propagated in chemostats at two different dilution rates found more genetic and phenotypic diversity in populations that were evolved at slow dilution rates than in those that were evolved at high dilution rates64, which possibly indicates a higher mutation rate at the lower growth rate, more distinct strategies for dealing with this challenge or some combination of the two. Diversification in this system is driven by regulatory mutations that lead to different balances in the trade-off between stress resistance and nutrient use, including rpoS mutations64.

Diversity can be encouraged experimentally by providing a mixture of nutrients to create multiple niches. Evolution experiments in which E. coli populations are serially propagated in a mixture of glucose and acetate reliably give rise to two strategies: glucose specialist ‘slow switchers’ that slowly shift from using glucose to acetate and ‘fast switchers’ that use acetate earlier than the slow switchers65. Metagenomic sequencing of many populations showed clear molecular signatures of parallel evolution in each ecotype and suggests that some mutations in one ecotype forced evolution in the other ecotype to follow certain mutational pathways66.

Spatial gradients. An alternative approach to generating multiple niches is to establish spatial heterogeneity that results in distinct microenvironments67,68. Pseudomonas fluorescens rapidly evolves into three ecotypes — recognizable by their distinct colony morphologies — that populate different physical zones within unshaken flasks owing to heterogeneity in the availability of oxygen68. Diversification also occurred when Burkholderia cenocepacia evolved under daily selection for both the ability to disperse and the ability to then colonize a new surface as a biofilm69. In this case, three distinct colony morphotypes also rapidly emerged. However, metagenomic sequencing revealed more complex genetic dynamics that were not apparent from the time course of phenotypic diversification. Evolved genotypes that have a ‘studded’ colony morphotype gave rise to new versions of all three ecotypes, which then drove lineages that had previously exploited other niches to extinction. In essence, the studded type seems to be a ‘source’ population that continuously generates new variants which periodically displace the current lineages in secondary niches that become evolutionary ‘sinks’ (FIG. 4a).

[Figure 4 panels. a | Multiple niches: genotype abundance over generations; negative frequency dependence, shown as fitness against genotype frequency. b | Host–parasite co-evolution: parasite and host genotype frequencies over generations, and parasite mutations over generations, for a constant host and an evolving host.]
Figure 4 | Ecological and co-evolutionary dynamics. Examples of genetic diversification and dynamics that are
driven by ecological interactions within and between species are shown. a | Multiple niches stabilize genetic diversity
that evolves within one species. A new lineage colonizes an open niche and increases the total population size, as
shown here when the red ecotype using the primary niche gives rise to the blue ecotype that expands into an open
niche. Negative frequency dependence in the fitness of these ecotypes allows their coexistence. In some cases, one
lineage may be a source population that can evolve to recolonize the other niche and displace the lineage that
previously occupied that niche, as shown here when the yellow ecotype from the primary niche gives rise to the pink
ecotype that invades the second niche, displacing the previous occupant. b | In host–parasite co-evolution
experiments, either hosts and parasites can be allowed to co-evolve over time or one partner — in this case, the host
— can be kept unchanged by continually replenishing its population from a non-evolving stock. Genetic and
phenotypic evolution typically occurs at higher rates when the partners co-evolve in response to one another —
a phenomenon that is often referred to as ‘Red Queen’ dynamics.
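The negative frequency dependence shown in panel a can be made concrete with a small numerical sketch. The Python below is illustrative only and is not taken from any of the studies cited here: it assumes a haploid, asexual population with two ecotypes, discrete generations, and a relative fitness for ecotype A that falls linearly as A becomes more common. The function names, the selection strength s = 0.1 and the equilibrium frequency p_eq = 0.2 are hypothetical choices.

```python
def next_frequency(p, s=0.1, p_eq=0.2):
    """Return the frequency of ecotype A after one generation of selection.

    p    : current frequency of ecotype A (ecotype B has frequency 1 - p)
    s    : hypothetical strength of frequency-dependent selection
    p_eq : hypothetical frequency at which both ecotypes are equally fit
    """
    w_a = 1.0 + s * (p_eq - p)  # A is fitter than B when rare, less fit when common
    w_b = 1.0                   # B is the reference type
    mean_w = p * w_a + (1.0 - p) * w_b
    return p * w_a / mean_w


def simulate(p0, generations=2000, s=0.1, p_eq=0.2):
    """Iterate the selection step from a starting frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p, s=s, p_eq=p_eq)
    return p


# Ecotype A invades when rare and declines when common.
from_rare = simulate(0.001)
from_common = simulate(0.999)
print(round(from_rare, 4), round(from_common, 4))
```

Under these assumptions, ecotype A increases from a starting frequency of 0.001 and decreases from 0.999, approaching p_eq in both cases, which is the signature of a stably maintained polymorphism.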
Co-evolution. Theory predicts, and comparative studies support, ‘Red Queen’ dynamics during host–parasite co-evolution, in which the rate of genome evolution is accelerated, especially in the genes that encode parasite-invasion or host-protection factors. Studies of P. fluorescens and its phage Φ2 have been used to examine the genome-wide dynamics of allele frequencies in replicate phage populations70. The experimental set-up allowed the investigation of an interesting contrast: in one treatment the host co-evolved with the parasite, whereas in another the parasite evolved while the host genotype was kept constant by repeatedly reviving the ancestral strain from a frozen stock (FIG. 4b). As predicted, the phage genome evolved faster in the co-evolution treatment than when the host was not allowed to co-evolve. By continually providing new opportunities for adaptive mutations of large effect to arise, co-evolution also leads to second-order selection for hypermutators in this system71.

Another study of host–pathogen co-evolution, using E. coli and a virulent (that is, exclusively lytic) variant of phage λ, integrates many of the concepts discussed above72. In some replicate populations, the phage evolved the ability to infect hosts through a new cell-surface receptor. This innovation was contingent on the bacterial population evolving along a certain mutational pathway — one that initially reduced, but did not eliminate, expression of the original receptor; it was also important that later host mutations did not result in the loss of the channel that the phage uses to cross the inner membrane of the cell. Moreover, the innovation in the phage required several prior mutations in the gene that encodes the tail fibre of the phage; these prior mutations apparently spread because they improved the ability of the

phage to infect through the original receptor. In short, the complex interplay between ecological interactions and genetic contingency determined the evolutionary outcomes in these co-evolving populations.

Sexual reproduction
Whole-genome studies of the genetic dynamics of adaptive evolution in multicellular animals and plants face different challenges compared with those in microorganisms. Most multicellular organisms can or must sexually reproduce and are diploid, so they have two different copies of each chromosome. Recombination, usually during meiosis, in which DNA sequences are exchanged between homologous chromosomes, breaks the linkage between mutations and their genetic background and produces new combinations of alleles. Many classical population genetics models assume recombination, and certain inferential procedures (for example, distinguishing beneficial driver mutations from linked hitchhikers) may prove to be easier in sexual systems.

In populations in which adaptation is driven by new beneficial mutations, recombination that brings those mutations together into the same genome may outpace the de novo appearance of successive beneficial mutations in any one lineage73. Perhaps at least partly for this reason, even bacteria have mechanisms — including conjugation, transduction and transformation — that allow parasexual recombination of alleles between lineages. In experimental E. coli populations in which new beneficial mutations drive adaptation, adding recombination has been shown to alleviate clonal interference and to accelerate adaptation under some circumstances74 (FIG. 2d). However, most experiments with multicellular organisms do not (and often cannot) begin with a clonal population and then wait for new beneficial mutations to arise. Instead, they usually begin with substantial genetic diversity, such as the diversity that typically exists as standing variation in natural populations. Under these conditions, the initial genotypic diversity and the new types that are generated by recombining alleles from across the genome are the primary sources of the genetic variation available for adaptation, at least in the short term (FIG. 2e).

The effects of these issues on the tempo and mode of genome evolution are just beginning to be explored in experiments with sexually reproducing multicellular organisms. Whole-genome sequencing of outbred D. melanogaster populations that were selected for accelerated development (that is, a shorter time from egg to adult) over 600 generations found that no alleles, neither those that were initially present nor those that arose de novo, had swept to fixation75. However, various parallel changes occurred in both the distribution of allele frequencies and the levels of homozygosity across each chromosome, which indicates that selection had repeatedly enriched certain variants. These ‘soft’ sweeps of existing alleles may have been incomplete because the experiment was too short or, alternatively, because different allelic combinations resulted in similar levels of phenotypic improvement. By analogy to clonal interference, the second possibility suggests a sort of ‘sexual interference’, in which alternative allelic combinations that produce similar benefits impede any given sweep to fixation and thereby maintain genetic diversity for longer than would otherwise be expected.

In another study, inbred lines of D. melanogaster were pooled, and their offspring evolved in increasingly hypoxic environments for >200 generations76. Individual flies from end-point populations could survive at low oxygen levels that were lethal to their ancestors. Genomic sequencing of the evolved populations found numerous apparently complete fixations of alleles, as indicated by the depletion of genetic diversity in some chromosomal regions. There are several potential explanations for the difference between this study and the one on accelerated development. One possibility is that the stronger selection for survival may have caused more extreme population bottlenecks. Another possibility is that hypoxia tolerance may involve fewer genetic loci than developmental time. A third possible explanation is that this experiment began with a set of inbred lines, whereas the experiment on developmental time used outbred lines that presumably harboured much more initial genetic diversity.

In an even longer experiment with D. melanogaster, flies were propagated for >50 years in complete darkness, and they have also been studied by whole-genome sequencing77. However, it is difficult to draw conclusions from these data because the experiment suffered an extinction of control lineages and because historical DNA samples are not available. In future studies, the planned preservation of time series of samples for genomic analysis should provide additional insights into the tempo and mode of genetic change in both animal and plant populations.

Perspectives
This Review ends by briefly commenting on other systems in which whole-genome sequencing is likely to be applied to understand evolutionary dynamics in the near future. First, more genetics, systems biology and synthetic biology studies may be unwitting evolution experiments than is commonly appreciated. In microbiology, there is a growing realization that strains that were previously believed to be isogenic are not — additional mutations beyond those that were deliberately introduced and studied have accumulated over their history78,79. Furthermore, one often introduces single genetic changes, or defined combinations of changes, in a reference genome in these types of studies. This genetic manipulation may be accomplished in various ways: by spontaneous mutation, by the use of a mutagen, by some means of genetic exchange, by genome editing technologies or by some combination of these strategies. These manipulations may, either occasionally or typically, cause unintended mutations in addition to the desired changes80–82. These considerations will also apply to the synthesis and large-scale editing of genomes83,84, in which mutations may occur and perhaps even be favoured during these iterative processes. Such secondary mutations may need to be removed to accurately infer the effects of the intended genetic manipulations.

At the same time, it will be very interesting to see what similarities and differences emerge between the dynamics of genome evolution in laboratory experiments and in natural settings, including medically relevant settings, such as during microbial infections and tumour progression. For example, one can sequence genomes of bacteria that are sampled at multiple points over the course of chronic infections, including samples stored in the past. This approach has recently been applied to Pseudomonas aeruginosa that was sampled over the course of multiple decades as the bacteria evolved in the airways of individuals with cystic fibrosis85. It has also been used to identify adaptive mutations that arose during a local outbreak of Burkholderia dolosa among people with cystic fibrosis86. To do so, whole-genome sequences were first used to reconstruct the transmission history between host individuals, and analyses were then carried out to identify genes in the pathogen that underwent parallel changes that implied adaptation to the host environment86. The identified genes include some that are related to therapeutic interventions (for example, antibiotic resistance genes) and host immune responses (for example, genes encoding cell-surface antigens), as well as other genes that were not previously known to be important for these infections. Elucidating genome dynamics has similarly proved to be crucial for understanding many observations regarding the progression of neoplastic tumours87. These types of studies will undoubtedly lead to important advances in identifying specific genes and mutations that contribute to disease and resistance to treatment. Future studies might also reveal the extent to which ecological interactions and differences in evolvability in these genetically diverse cell populations affect disease outcomes.
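The search for genes that changed in parallel across hosts, as described for the Burkholderia dolosa outbreak, can be illustrated with a minimal Python sketch. It only counts how many independent lineages carry a mutation in each gene. The function name, the `min_lineages` threshold, and the patient and gene labels are all hypothetical and are not taken from the cited study; a real analysis would also compare the counts with a null expectation that accounts for gene length and mutation rate.

```python
from collections import defaultdict


def genes_with_parallel_mutations(mutations, min_lineages=2):
    """Return {gene: number of independent lineages mutated in that gene}.

    `mutations` is an iterable of (lineage_id, gene) pairs. Several mutations
    in the same gene within one lineage count only once, because parallelism
    is about independent origins. Only genes hit in at least `min_lineages`
    lineages are kept (an arbitrary, illustrative threshold).
    """
    lineages_by_gene = defaultdict(set)
    for lineage, gene in mutations:
        lineages_by_gene[gene].add(lineage)

    hits = {
        gene: len(lineages)
        for gene, lineages in lineages_by_gene.items()
        if len(lineages) >= min_lineages
    }
    # Most widely shared genes first; ties broken alphabetically.
    return dict(sorted(hits.items(), key=lambda kv: (-kv[1], kv[0])))


# Made-up toy data: (patient lineage, mutated gene).
toy_mutations = [
    ("patient_A", "geneX"),
    ("patient_B", "geneX"),
    ("patient_C", "geneX"),
    ("patient_A", "geneY"),
    ("patient_A", "geneY"),  # second hit in the same lineage, not parallel
    ("patient_B", "geneZ"),
    ("patient_C", "geneZ"),
]

result = genes_with_parallel_mutations(toy_mutations)
print(result)
```

In this toy input, geneY is mutated twice but only within one lineage, so it is not reported as a candidate for parallel adaptation.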
1. Garland, T. & Rose, M. R. (eds) Experimental Evolution: Concepts, Methods, and Applications of Selection Experiments (Univ. of California Press, 2009).
2. Kawecki, T. J. et al. Experimental evolution. Trends Ecol. Evol. 27, 547–560 (2012).
3. Hartl, D. L. & Clark, A. G. Principles of Population Genetics (Sinauer Associates, Inc., 2007).
4. Mardis, E. R. Next-generation DNA sequencing methods. Annu. Rev. Genom. Hum. Genet. 9, 387–402 (2008).
5. Schadt, E. E., Turner, S. & Kasarskis, A. A window into third-generation sequencing. Hum. Mol. Genet. 19, R227–R240 (2010).
6. Halligan, D. L. & Keightley, P. D. Spontaneous mutation accumulation studies in evolutionary genetics. Annu. Rev. Ecol. Evol. Syst. 40, 151–172 (2009).
7. Lynch, M. et al. A genome-wide view of the spectrum of spontaneous mutations in yeast. Proc. Natl Acad. Sci. USA 105, 9272–9277 (2008).
8. Ossowski, S. et al. The rate and molecular spectrum of spontaneous mutations in Arabidopsis thaliana. Science 327, 92–94 (2010).
9. Keightley, P. D. et al. Analysis of the genome sequences of three Drosophila melanogaster spontaneous mutation accumulation lines. Genome Res. 19, 1195–1201 (2009).
10. Denver, D. R. et al. A genome-wide view of Caenorhabditis elegans base-substitution mutation processes. Proc. Natl Acad. Sci. USA 106, 16310–16314 (2009).
11. Lind, P. A. & Andersson, D. I. Whole-genome mutational biases in bacteria. Proc. Natl Acad. Sci. USA 105, 17878–17883 (2008).
12. Lee, H., Popodi, E., Tang, H. & Foster, P. L. Rate and molecular spectrum of spontaneous mutations in the bacterium Escherichia coli as determined by whole-genome sequencing. Proc. Natl Acad. Sci. USA 109, E2774–E2783 (2012).
13. Sung, W., Ackerman, M. S., Miller, S. F., Doak, T. G. & Lynch, M. Drift-barrier hypothesis and mutation-rate evolution. Proc. Natl Acad. Sci. USA 109, 18488–18492 (2012).
14. Drake, J. W. Spontaneous mutation. Annu. Rev. Genet. 25, 125–146 (1991).
15. Lynch, M. The Origins of Genome Architecture (Sinauer Associates, Inc., 2007).
16. Friedberg, E. C. et al. DNA Repair and Mutagenesis (American Society for Microbiology Press, 2006).
17. Bull, J. J., Badgett, M. R., Rokyta, D. & Molineux, I. J. Experimental evolution yields hundreds of mutations in a functional viral genome. J. Mol. Evol. 57, 241–248 (2003).
18. Domingo-Calap, P., Cuevas, J. M. & Sanjuán, R. The fitness effects of random mutations in single-stranded DNA and RNA bacteriophages. PLoS Genet. 5, e1000742 (2009).
19. Drake, J. W., Charlesworth, B., Charlesworth, D. & Crow, J. F. Rates of spontaneous mutation. Genetics 148, 1667–1686 (1998).
20. Springman, R., Keller, T., Molineux, I. J. & Bull, J. J. Evolution at a high imposed mutation rate: adaptation obscures the load in phage T7. Genetics 184, 221–232 (2010).
21. Ames, B. N., Durston, W. E., Yamasaki, E. & Lee, F. D. Carcinogens are mutagens: a simple test system combining liver homogenates for activation and bacteria for detection. Proc. Natl Acad. Sci. USA 70, 2281–2285 (1973).
22. Savageau, M. A. Escherichia coli habitats, cell types, and molecular mechanisms of gene control. Am. Nat. 122, 732–744 (1983).
23. Orr, H. A. Fitness and its role in evolutionary genetics. Nature Rev. Genet. 10, 531–539 (2009).
24. Atwood, K. C., Schneider, L. K. & Ryan, F. J. Periodic selection in Escherichia coli. Proc. Natl Acad. Sci. USA 37, 146–155 (1951).
    This study is a classic early demonstration of adaptive evolution in experimental populations of bacteria.
25. Fogle, C. A., Nagle, J. L. & Desai, M. M. Clonal interference, multiple mutations and adaptation in large asexual populations. Genetics 180, 2163–2173 (2008).
26. Lang, G. I. et al. Pervasive genetic hitchhiking and clonal interference in 40 evolving yeast populations. Nature 500, 571–574 (2013).
    This paper presents the most detailed analysis so far of the dynamics of mutations in asexual populations, including the effects of clonal interference, by metagenomic sequencing.
27. Barrick, J. E. et al. Genome evolution and adaptation in a long-term experiment with Escherichia coli. Nature 461, 1243–1247 (2009).
    This paper describes the first application of whole-genome sequencing to the LTEE and includes discussions of genetic parallelism, changes in mutation rates and evolution in the optimization regime.
28. Wagner, A. The Origins of Evolutionary Innovations (Oxford Univ. Press, 2011).
29. Velicer, G. J. et al. Comprehensive mutation identification in an evolved bacterial cooperator and its cheating ancestor. Proc. Natl Acad. Sci. USA 103, 8107–8112 (2006).
    This study is the first to use whole-genome sequencing to discover the genetic basis of an innovative change in the lifestyle of an organism that occurred during an evolution experiment.
30. Yu, Y.-T. N., Yuan, X. & Velicer, G. J. Adaptive evolution of an sRNA that controls Myxococcus development. Science 328, 993 (2010).
31. Marchetti, M. et al. Experimental evolution of a plant pathogen into a legume symbiont. PLoS Biol. 8, e1000280 (2010).
32. Ratcliff, W. C., Denison, R. F., Borrello, M. & Travisano, M. Experimental evolution of multicellularity. Proc. Natl Acad. Sci. USA 109, 1595–1600 (2012).
33. Mortlock, R. P. (ed.) Microorganisms as Model Systems for Studying Evolution (Plenum Press, 1984).
34. Blount, Z. D., Borland, C. Z. & Lenski, R. E. Historical contingency and the evolution of a key innovation in an experimental population of Escherichia coli. Proc. Natl Acad. Sci. USA 105, 7899–7906 (2008).
35. Blount, Z. D., Barrick, J. E., Davidson, C. J. & Lenski, R. E. Genomic analysis of a key innovation in an experimental Escherichia coli population. Nature 489, 513–518 (2012).
    This paper describes the genetic basis of the evolution of citrate use in the LTEE, including the potentiation, actualization and refinement stages of this innovation.
36. Conrad, T. M., Lewis, N. E. & Palsson, B. O. Microbial laboratory evolution in the era of genome-scale science. Mol. Syst. Biol. 7, 509 (2011).
37. Minty, J. J. et al. Evolution combined with genomic study elucidates genetic bases of isobutanol tolerance in Escherichia coli. Microb. Cell Fact. 10, 18 (2011).
38. Blaby, I. K. et al. Experimental evolution of a facultative thermophile from a mesophilic ancestor. Appl. Environ. Microbiol. 78, 144–155 (2012).
39. Tenaillon, O. et al. The molecular diversity of adaptive convergence. Science 335, 457–461 (2012).
    This study uses whole-genome sequencing of a large number of independently evolved populations to examine the diversity of alternative genetic pathways that lead to improved fitness at high temperature.
40. Harris, D. R. et al. Directed evolution of ionizing radiation resistance in Escherichia coli. J. Bacteriol. 191, 5240–5252 (2009).
41. Zhang, Q. et al. Acceleration of emergence of bacterial antibiotic resistance in connected microenvironments. Science 333, 1764–1767 (2011).
    This study shows that migration between populations living in environments with different selection strengths can speed up adaptation.
42. Pepin, K. M. & Wichman, H. A. Experimental evolution and genome sequencing reveal variation in levels of clonal interference in large populations of bacteriophage ΦX174. BMC Evol. Biol. 8, 85 (2008).
43. Sniegowski, P. D., Gerrish, P. J. & Lenski, R. E. Evolution of high mutation rates in experimental populations of E. coli. Nature 387, 703–705 (1997).
44. Brown, C. T. et al. Whole genome sequencing and phenotypic analysis of mutations found in Bacillus subtilis following evolution under relaxed selection for sporulation. Appl. Environ. Microbiol. 77, 6867–6877 (2011).
45. Aguilar, C. et al. Genetic changes during a laboratory adaptive evolution process that allowed fast growth in glucose to an Escherichia coli strain lacking the major glucose transport system. BMC Genomics 13, 385 (2012).
46. Mao, E. F., Lane, L., Lee, J. & Miller, J. H. Proliferation of mutators in a cell population. J. Bacteriol. 179, 417–422 (1997).
47. Lenski, R. E. Phenotypic and genomic evolution during a 20,000-generation experiment with the bacterium Escherichia coli. Plant Breed. Rev. 24, 225–265 (2004).
48. Desai, M. M. & Fisher, D. S. The balance between mutators and nonmutators in asexual populations. Genetics 188, 997–1014 (2011).
49. Sniegowski, P. D., Gerrish, P. J., Johnson, T. & Shaver, A. The evolution of mutation rates: separating causes from consequences. BioEssays 22, 1057–1066 (2000).

50. Wielgoss, S. et al. Mutation rate dynamics in a bacterial population reflect tension between adaptation and genetic load. Proc. Natl Acad. Sci. USA 110, 222–227 (2013).
51. McDonald, M. J., Hsieh, Y.-Y., Yu, Y.-H., Chang, S.-L. & Leu, J.-Y. The evolution of low mutation rates in experimental mutator populations of Saccharomyces cerevisiae. Curr. Biol. 22, 1235–1240 (2012).
52. Denamur, E. et al. Evolutionary implications of the frequent horizontal transfer of mismatch repair genes. Cell 103, 711–721 (2000).
53. Woods, R. J. et al. Second-order selection for evolvability in a large Escherichia coli population. Science 331, 1433–1436 (2011).
    This study uses replay experiments to demonstrate that antagonistic epistasis can lead some genotypes towards an adaptive cul-de-sac and eventual extinction.
54. Darwin, C. The Origin of Species by Means of Natural Selection (John Murray, 1859).
55. Laland, K. N., Odling-Smee, F. J. & Feldman, M. W. Evolutionary consequences of niche construction and their implications for ecology. Proc. Natl Acad. Sci. USA 96, 10242–10247 (1999).
56. Rozen, D. E., Schneider, D. & Lenski, R. E. Long-term experimental evolution in Escherichia coli. XIII. Phylogenetic history of a balanced polymorphism. J. Mol. Evol. 61, 171–180 (2005).
57. Le Gac, M., Plucain, J., Hindré, T., Lenski, R. E. & Schneider, D. Ecological and evolutionary dynamics of coexisting lineages during a long-term experiment with Escherichia coli. Proc. Natl Acad. Sci. USA 109, 9487–9492 (2012).
58. Rozen, D. E., Philippe, N., Arjan De Visser, J., Lenski, R. E. & Schneider, D. Death and cannibalism in a seasonal environment facilitate bacterial coexistence. Ecol. Lett. 12, 34–44 (2009).
59. Barrick, J. E. & Lenski, R. E. Genome-wide mutational diversity in an evolving population of Escherichia coli. Cold Spring Harbor Symp. Quant. Biol. 74, 119–129 (2009).
60. Zambrano, M. M., Siegele, D. A., Almirón, M., Tormo, A. & Kolter, R. Microbial competition: Escherichia coli mutants that take over stationary phase cultures. Science 259, 1757–1760 (1993).
61. Finkel, S. E. Long-term survival during stationary phase: evolution and the GASP phenotype. Nature Rev. Microbiol. 4, 113–120 (2006).
62. Treves, D. S., Manning, S. & Adams, J. Repeated evolution of an acetate-crossfeeding polymorphism in long-term populations of Escherichia coli. Mol. Biol. Evol. 15, 789–797 (1998).
63. Maharjan, R., Seeto, S., Notley-McRobb, L. & Ferenci, T. Clonal adaptive radiation in a constant environment. Science 313, 514–517 (2006).
64. Maharjan, R. P. et al. The multiplicity of divergence mechanisms in a single evolving population. Genome Biol. 13, R41 (2012).
    This paper uses whole-genome sequencing to uncover extensive genetic diversity in populations that are evolving in chemostats.
65. Saxer, G., Doebeli, M. & Travisano, M. The repeatability of adaptive radiation during long-term experimental evolution of Escherichia coli in a multiple nutrient environment. PLoS ONE 5, e14184 (2010).
66. Herron, M. D. & Doebeli, M. Parallel evolutionary dynamics of adaptive diversification in Escherichia coli. PLoS Biol. 11, e1001490 (2013).
    This study examines whole-population genetic dynamics in a system that diversified into two ecotypes which coexisted owing to different strategies for nutrient use.
67. Korona, R., Nakatsu, C. H., Forney, L. J. & Lenski, R. E. Evidence for multiple adaptive peaks from populations of bacteria evolving in a structured habitat. Proc. Natl Acad. Sci. USA 91, 9037–9041 (1994).
68. Rainey, P. B. & Travisano, M. Adaptive radiation in a heterogeneous environment. Nature 394, 69–72 (1998).
69. Traverse, C. C., Mayo-Smith, L. M., Poltak, S. R. & Cooper, V. S. Tangled bank of experimentally evolved Burkholderia biofilms reflects selection during chronic infections. Proc. Natl Acad. Sci. USA 110, E250–E259 (2013).
    This study characterizes the mutational pathways by which ecotypes from a primary niche evolved to displace ecotypes that were established in other niches.
70. Paterson, S. et al. Antagonistic coevolution accelerates molecular evolution. Nature 464, 275–278 (2010).
    This study uses whole-genome sequencing of a bacteriophage to observe Red Queen dynamics.
71. Pal, C., Maciá, M. D., Oliver, A., Schachar, I. & Buckling, A. Coevolution with viruses drives the evolution of bacterial mutation rates. Nature 450, 1079–1081 (2007).
72. Meyer, J. R. et al. Repeatability and contingency in the evolution of a key innovation in phage lambda. Science 335, 428–432 (2012).
    This paper characterizes multistep genetic pathways that lead to an innovative host–receptor shift in bacteriophage populations and shows that the shift is contingent on the evolutionary trajectory of the bacterial host populations.
73. Muller, H. Genetic aspects of sex. Am. Nat. 66, 118–138 (1932).
74. Cooper, T. F. Recombination speeds adaptation by reducing competition between beneficial mutations in populations of Escherichia coli. PLoS Biol. 5, e225 (2007).
    This study uses knowledge of the genetic targets of selection to show that plasmid-mediated recombination accelerates adaptive evolution by overcoming clonal interference.
75. Burke, M. K. et al. Genome-wide analysis of a long-term evolution experiment with Drosophila. Nature 467, 587–590 (2010).
    This paper presents one of the first analyses based on whole-genome sequencing to characterize experimentally evolved populations of a sexual multicellular organism.
76. Zhou, D. et al. Experimental selection of hypoxia-tolerant Drosophila melanogaster. Proc. Natl Acad. Sci. USA 108, 2349–2354 (2011).
    This paper uses extensive sequencing to identify targets of selection in another set of experimentally evolved populations of a sexual multicellular organism.
77. Izutsu, M. et al. Genome features of “Dark-fly”, a Drosophila line reared long-term in a dark environment. PLoS ONE 7, e33288 (2012).
78. Ferenci, T. et al. Genomic sequencing reveals regulatory mutations and recombinational events in the widely used MC4100 lineage of Escherichia coli K-12. J. Bacteriol. 191, 4025–4029 (2009).
79. Nahku, R. et al. Stock culture heterogeneity rather than new mutational variation complicates short-term cell physiology studies of Escherichia coli K-12 MG1655 in continuous culture. Microbiology 157, 2604–2610 (2011).
80. Hobman, J. L. et al. Comparative genomic hybridization detects secondary chromosomal deletions in Escherichia coli K-12 MG1655 mutants and highlights instability in the flhDC region. J. Bacteriol. 189, 8786–8792 (2007).
81. Pósfai, G. et al. Emergent properties of reduced-genome Escherichia coli. Science 312, 1044–1046 (2006).
82. Studier, F. W., Daegelen, P., Lenski, R. E., Maslov, S. & Kim, J. F. Understanding the differences between genome sequences of Escherichia coli B strains REL606 and BL21(DE3) and comparison of the E. coli B and K-12 genomes. J. Mol. Biol. 394, 653–680 (2009).
83. Gibson, D. G. et al. Creation of a bacterial cell controlled by a chemically synthesized genome. Science 329, 52–56 (2010).
84. Isaacs, F. J. et al. Precise manipulation of chromosomes in vivo enables genome-wide codon replacement. Science 333, 348–353 (2011).
85. Yang, L. et al. Evolutionary dynamics of bacteria in a human host environment. Proc. Natl Acad. Sci. USA 108, 7481–7486 (2011).
86. Lieberman, T. D. et al. Parallel bacterial evolution within multiple patients identifies candidate pathogenicity genes. Nature Genet. 43, 1275–1280 (2011).
    This study uses whole-genome sequencing to investigate adaptation of a bacterial pathogen during a multi-decade outbreak in a human population; although this is not based on an experiment, it relies on inferential approaches similar to those used in experimental evolution.
87. Sprouffske, K., Merlo, L. M. F., Gerrish, P. J., Maley, C. C. & Sniegowski, P. D. Cancer in light of experimental evolution. Curr. Biol. 22, R762–R771 (2012).
88. Loewe, L. & Hill, W. G. The population genetics of mutations: good, bad and indifferent. Phil. Trans. R. Soc. 365, 1153–1167 (2010).
89. Eyre-Walker, A. & Keightley, P. D. The distribution of fitness effects of new mutations. Nature Rev. Genet. 8, 610–618 (2007).
90. De Visser, J. A. G. M., Cooper, T. F. & Elena, S. F. The causes of epistasis. Phil. Trans. R. Soc. 278, 3617–3624 (2011).
91. Kondrashov, F. A. & Kondrashov, A. S. Measurements of spontaneous rates of mutations in the recent past and the near future. Phil. Trans. R. Soc. 365, 1169–1176 (2010).
92. Khan, A. I., Dinh, D. M., Schneider, D., Lenski, R. E. & Cooper, T. F. Negative epistasis between beneficial mutations in an evolving bacterial population. Science 332, 1193–1196 (2011).
93. Chou, H.-H., Chiu, H.-C., Delaney, N. F., Segrè, D. & Marx, C. J. Diminishing returns epistasis among beneficial mutations decelerates adaptation. Science 332, 1190–1192 (2011).
94. Kvitek, D. J. & Sherlock, G. Reciprocal sign epistasis between frequently experimentally evolved adaptive mutations causes a rugged fitness landscape. PLoS Genet. 7, e1002056 (2011).
95. Philippe, N., Crozat, E., Lenski, R. E. & Schneider, D. Evolution of global regulatory networks during a long-term experiment with Escherichia coli. BioEssays 29, 846–860 (2007).
96. Wagner, A. Neutralism and selectionism: a network-based reconciliation. Nature Rev. Genet. 9, 965–974 (2008).
97. Heard, S. & Hauser, D. Key evolutionary innovations and their ecological mechanisms. Histor. Biol. 10, 151–173 (1995).
98. Lenski, R. E. & Travisano, M. Dynamics of adaptation and diversification: a 10,000-generation experiment with bacterial populations. Proc. Natl Acad. Sci. USA 91, 6808–6814 (1994).
99. Carroll, S. M. & Marx, C. J. Evolution after introduction of a novel metabolic pathway consistently leads to restoration of wild-type physiology. PLoS Genet. 9, e1003427 (2013).
100. Wichman, H. A., Badgett, M. R., Scott, L. A., Boulianne, C. M. & Bull, J. J. Different trajectories of parallel evolution during viral adaptation. Science 285, 422–424 (1999).
    This paper is one of the first to use whole-genome sequencing to examine the nature of genetic changes during an adaptive evolution experiment.
101. Woods, R., Schneider, D., Winkworth, C. L., Riley, M. A. & Lenski, R. E. Tests of parallel molecular evolution in a long-term experiment with Escherichia coli. Proc. Natl Acad. Sci. USA 103, 9107–9112 (2006).
102. Bull, J. J. et al. Exceptional convergent evolution in a virus. Genetics 147, 1497–1507 (1997).
103. Crozat, E. et al. Parallel genetic and phenotypic evolution of DNA superhelicity in experimental populations of Escherichia coli. Mol. Biol. Evol. 27, 2113–2128 (2010).
104. Jerome, J. P. et al. Standing genetic variation in contingency loci drives the rapid adaptation of Campylobacter jejuni to a novel host. PLoS ONE 6, e16399 (2011).
105. Lee, M.-C. & Marx, C. J. Repeated, selection-driven genome reduction of accessory genes in experimental populations. PLoS Genet. 8, e1002651 (2012).
106. Cooper, V. S., Schneider, D., Blot, M. & Lenski, R. E. Mechanisms causing rapid and parallel losses of ribose catabolism in evolving populations of Escherichia coli B. J. Bacteriol. 183, 2834–2841 (2001).
107. Paquin, C. E. & Adams, J. Relative fitness can decrease in evolving asexual populations of S. cerevisiae. Nature 306, 368–370 (1983).

Acknowledgements
The authors thank the reviewers for their suggestions. They acknowledge support from the US National Institutes of Health (R00-GM087550 to J.E.B.), the US National Science Foundation (NSF; DEB-1019989 to R.E.L.) and the BEACON Center for the Study of Evolution in Action (NSF Cooperative Agreement DBI-0939454 to J.E.B. and R.E.L.).

Competing interests statement
The authors declare no competing interests.

FURTHER INFORMATION
LTEE: http://myxo.css.msu.edu/ecoli/
The US National Science Foundation BEACON Center for the Study of Evolution in Action: http://beacon-center.org/

REVIEWS
Evolution of crop species: genetics of domestication and diversification
Rachel S. Meyer1 and Michael D. Purugganan1,2
Abstract | Domestication is a good model for the study of evolutionary processes because
of the recent evolution of crop species (<12,000 years ago), the key role of selection in
their origins, and good archaeological and historical data on their spread and
diversification. Recent studies, such as quantitative trait locus mapping, genome-wide
association studies and whole-genome resequencing studies, have identified genes that
are associated with the initial domestication and subsequent diversification of crops.
Together, these studies reveal the functions of genes that are involved in the evolution of
crops that are under domestication, the types of mutations that occur during this process
and the parallelism of mutations that occur in the same pathways and proteins, as well as
the selective forces that are acting on these mutations and that are associated with
geographical adaptation of crop species.
Domestication has always been considered a unique form of biological evolution: a co-evolutionary interaction that leads to the establishment of new domesticated species, the growth and reproduction of which are mostly controlled for the benefit of another species. Domestication has been documented to have evolved at least five times in evolutionary history, and classic examples include the cultivation of fungal species by attine ants, ambrosia beetles and termites1. However, the most prolific domesticators are humans, who have domesticated hundreds of plant species (BOX 1; see Supplementary information S1 (table)) and animal2 species as sources of food and materials, and even for companionship and aesthetic value in the past 12,000 years. Crops, in particular, represent some of the most marked evolutionary transitions that are associated with domestication, which has prompted interest in their study since Darwin drew inspiration from domesticated species to illuminate genetic variation3, evolution and the power of selection4. Research on such crop evolutionary processes is also driven by its cultural and economic importance for humans.

The genetic architecture of crop domestication and the nature of selection in domesticated species have been major foci of molecular genetic studies over the past two decades. A large number of domestication genes (or domestication-related genes) have been identified and isolated through candidate gene studies, quantitative trait locus (QTL) mapping and cloning, genome-wide association studies (GWASs) and, more recently, whole-genome resequencing studies. In these genes, widespread footprints of selection have been identified in the genomes of maize, rice, sunflower and several millet species, which allow us to better understand the forces of both conscious selection and unconscious selection. Recent population-level molecular analyses also enable us to clarify the demographic histories of the domestication process itself (for example, the processes of domesticating rice5 and tomato6), which, together with expanded archaeological studies, can illuminate the origins and histories of crops7,8. Furthermore, the characterization of the mutations that lead to domestication gives an indication of the types of mutations and the functions of genes that are involved in the generation of domestication traits. Progress made in the past few decades now provides us with the foundation to examine patterns and processes that are associated with crop plant evolution, and to focus on the genetics of their domestication and diversification since the Neolithic period.

In this Review, we discuss the genetic architecture of crop plant domestication and investigate the evolutionary genomics of this important process. By compiling a list of known domestication and diversification genes, we discuss patterns of selection over the course of the domestication process and also examine the origin and spread of domestication alleles. Finally, we show how these molecular genetic insights have led to a more robust characterization of the evolutionary development of crop species.

Quantitative trait locus
(QTL). A genomic region with a gene (or multiple linked genes) that contains mutations which result in phenotypic variation in populations.

Genome-wide association studies
(GWASs). Studies that use linkage disequilibrium between dense, usually single-nucleotide polymorphism, markers across the genome to identify significant associations between genes (or genomic regions) and trait phenotypes.

1 Center for Genomics and Systems Biology, Department of Biology, 12 Waverly Place, New York University, New York 10003, USA.
2 Center for Genomics and Systems Biology, New York University Abu Dhabi Research Institute, Abu Dhabi, United Arab Emirates.
Correspondence to M.D.P.
e-mail: mp132@nyu.edu
doi:10.1038/nrg3605

Box 1 | Geographical and phylogenetic distribution of domestication genes and species

[Figure, part a: map of the origin and spread of maize, with dated localities (8,700, 6,200, 6,000, 3,000 and 2,100 years ago) and labels for the origin, polystichy, popcorn, larger cob, pod corn (tu1), Tga1, tb1, pbf, su1, Zagl1, Sh1-1, ra1, and Sh1-5.1 and Sh1-5.2. Key: segregating old mutations or traits; new mutations or traits. Part b: summary phylogeny of land plants, from gymnosperms through magnoliids, monocots and eudicots (including rosids and asterids), with boxes shaded by number of domesticated species (scale 0 to 33).]

Crop species are domesticated at particular locales, and their subsequent range expansion leads to wider geographical distribution. One could use the distribution of ancestral haplotypes to infer the origin and the spread of a particular domestication allele; for example, the rice haplotype that is ancestral to the glutinous allele is prevalent in Southeast Asia, which suggests that the phenotype originated in this region85. In special cases, one can directly examine the spread of alleles; for example, work on archaeological samples93 and ancient DNA samples94 of maize directly documents the accumulation, spread and fixation of mutations and their associated phenotypic evolution. Archaeological samples of ~8,700-year-old corn cobs have been recovered from southern Mexico, and samples from nearby areas dating 1,500 years later are the first to show a phenotype that is congruent with the teosinte glume architecture1 (Tga1) mutation which exposed the grains and prevented the detachment of the grain from the cob95,96. Studies of ancient DNA samples reveal that another 2,500 years later, in northeastern Mexico, teosinte branched1 (tb1) and prolamin-box binding factor (pbf) domestication alleles were fixed, whereas sugary1 (su1) was fixed in North America much later (see the figure, part a). tb1 and pbf altered meristem activity and kernel protein content, respectively94. Archaeological evidence shows that genes that are responsible for polystichy (that is, the whorled multiple-ranked maize ear that differs from the two-ranked inflorescence of teosinte) had already undergone selection as maize was brought into Peru ~6,200 years ago, and that these varieties retained the popcorn phenotype21. DNA sequence patterns in extant varieties indicate when mutations underwent selection: for mutations Sh1-5.1, Sh1-5.2, Sh1-1 and ramosa1 (ra1) selection occurred at the onset of domestication, and for Zagl1 and tunicate (tu1) it occurred during diversification.

The genetic changes that are seen in examples of food crop domestication have mostly been studied in the Poaceae family of grasses, but these represent only a small proportion of the total domesticated food crops. More than 160 plant families (mostly within the monocots and core eudicots, but also including many non-flowering plants and basal eudicots that are distant from model plants) have been found to contain domesticated species (see Supplementary information S1 (table)). Some families, such as Rosaceae, Malpighiaceae and Sapindaceae, contain the highest number of domesticated food crops, but few crops from these families have been studied to understand the genes and processes of domestication. Future foci in the topic of domestication will branch out to include non-model plant species from different environments that have different domestication traits. Part b of the figure shows a summary phylogeny97,98 of land plants and the distribution of 353 domesticated food crops as shaded boxes (see Supplementary information S1 (table)).
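The inference described in Box 1, in which the region where the ancestral haplotype is most prevalent is taken as a candidate origin of a domestication allele, can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name, the region labels and the haplotype labels are hypothetical, the samples are made up, and a real study would also have to account for sampling effort, gene flow and demographic history.

```python
from collections import Counter


def ancestral_haplotype_frequency(samples, ancestral):
    """Return {region: fraction of sampled accessions carrying `ancestral`}.

    `samples` is an iterable of (region, haplotype) pairs, one per accession.
    """
    totals = Counter()
    carriers = Counter()
    for region, haplotype in samples:
        totals[region] += 1
        if haplotype == ancestral:
            carriers[region] += 1
    return {region: carriers[region] / totals[region] for region in totals}


# Made-up toy data: (sampling region, haplotype at the locus of interest).
toy_samples = [
    ("region_1", "H_anc"),
    ("region_1", "H_anc"),
    ("region_1", "H_other"),
    ("region_1", "H_anc"),
    ("region_2", "H_other"),
    ("region_2", "H_anc"),
    ("region_2", "H_other"),
    ("region_2", "H_other"),
]

freqs = ancestral_haplotype_frequency(toy_samples, "H_anc")
# The region with the highest ancestral-haplotype frequency is the
# candidate origin under this simple heuristic.
candidate_origin = max(freqs, key=freqs.get)
print(freqs, candidate_origin)
```

Prevalence alone is suggestive rather than conclusive, which is why Box 1 pairs this kind of evidence with archaeological and ancient DNA samples where they exist.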


[Figure 1 panels: a | Stage 1: Onset of domestication. b | Stage 2: In situ increase in frequency of desirable alleles. c | Stage 3: Formation of cultivated populations that are adapted to new environments and local preferences. d | Stage 4: Deliberate breeding. Circles in the schematic are labelled W (wild populations) or C (cultivated populations).]
Figure 1 | The evolutionary stages of domestication and diversification. Plant exploitation involves harvesting and
stewardship over wild stands with favourable traits (pre-Stage 1). The top panel schematically shows the evolutionary
stages of crop plants, including the formation and the diversification of phenotypically distinct cultivated populations (C)
from wild populations (W). Each circle represents a population and those of different phenotypic characteristics are
shown in different shades of the same colour. Arrows represent the evolutionary establishment of derived populations
from ancestral populations. This is illustrated by the example of Zea mays in the bottom panel. a | After extended tending
of stands or the development and establishment of cultivation, selection occurs on the new crop in an agricultural
ecosystem, which leads to the onset of domestication (Stage 1). Teosinte (left) and reconstructed primitive maize (right)
are shown. Following Stage 1 is a crop diversification phase, which can encompass three non-exclusive stages. b | Stage 2
is the continuation of Stage 1 and involves the in situ amplification of populations with desirable alleles that lead to initial
increases in yield, as well as the selection of favourable crop phenotypes. Trait variation also increases. Corn varieties that
resemble those at its centre of origin (that is, the Mexican Highlands) are shown. c | As domesticated crops evolve and
spread from their initial geographical range (Stage 3), crop populations are adapted to new diversified environments
and local preferences. Pod corn (left) was selected for ceremonial use by Native Americans; popcorn (middle left) is
preferred in Peru; Italian red sweet corn (middle right) has also been selected; and dent corn (right), which is used to make
hominy and masa, is selected by Native Americans. d | Stage 4 is the deliberate breeding of crop varieties to maximize
yield, ease of farming, uniformity and quality. Uniform improved corn varieties are produced through modern deliberate
breeding efforts. Image in part a courtesy of J. Doebley, the University of Wisconsin–Madison, USA. Pod corn image in
part c is reproduced, with permission, from REF. 120 © (2012) US National Academy of Sciences.
Domestication and diversification
Plant domestication by humans encompasses a broad spectrum of evolutionary changes that may decrease the fitness of a plant in the wild but increase it under human exploitation, and complete dependence on humans for survival is considered the fullest extent of domestication. The domestication process under this broad definition can span a wide range of features in crop species evolution; for the purpose of this Review, we use domestication to refer to the onset or the initiation of the process of evolutionary divergence from the wild ancestral species. We use diversification to refer to the subsequent evolution of new varieties, including greater improvement in yield, adaptation or quality in crop species.

Conscious selection
The intentional choice, made by humans, of preferred phenotypes in cultivated plants for use and propagation.

Unconscious selection
Natural selection in crop species as a result of human cultivation practices and of growth in agro-ecological environments.

Stages of domestication and diversification. Human-associated plant domestication began ~12,000 years ago in the Middle East and the Fertile Crescent, and subsequently in different parts of the world: China, Mesoamerica, the Andes, Near Oceania (all ~10,000 years ago), sub-Saharan Africa (~8,000 years ago) and eastern North America (~6,000 years ago9). The evolution of crop plants began as human behavioural ecology changed from food gathering to cultivation as the primary mode of supplying plant food resources10. Domesticated plant species are found in 160 taxonomic families (BOX 1; see Supplementary information S1 (table)), with estimates that 2,500 species have undergone domestication11, and 250 species are considered as fully domesticated2,12. The evolutionary trajectory from wild species to crop species is a complex multi-staged process. Archaeological records suggest that there was a period of pre-domestication cultivation when humans first began deliberately planting or caring for wild stands that have favourable traits (pre-Stage 1)13; as human-associated cultivation reshaped the evolutionary trajectories of these species, they were transformed into domesticated species (Stage 1) (FIG. 1). Little is known about the pre-domestication stage; although the domestication process itself was previously thought to be rapid14, increasing numbers of studies suggest a protracted period for Stage 1 that could last as long as 2,000 years15.
The diversification phase that follows initial domestication (sometimes referred to as the improvement phase16) involves the spread and adaptation of the

domesticated species to different agro-ecological and cultural environments. This phase leads to phenotypic and genetic divergence among domesticated populations, and it can be thought of as having multiple stages that are associated with varying selective pressures17. Some key post-domestication stages may include in situ amplification of populations that have desirable alleles (Stage 2); adaptation of a domesticated species to different environments and human cultural practices that accompany geographical radiation (Stage 3); and deliberate breeding to maximize yield, ease of farming and quality (Stage 4) (FIG. 1). Stages 1–3 have previously been described on the basis of the domestication history of seed crops17; although these stages are often sequentially presented, they may occur simultaneously. Conscious and deliberate breeding of plants in Stage 4 has been practised as far back as 11,400 years ago (for example, the hybrid breeding of figs18), but many traits in crop species during this stage are associated with modern breeding methods (for example, the Green Revolution).

Domestication traits. Which traits were selected during domestication or post-domestication diversification stages can vary depending on the species, as well as on the nature and the number of domestication events (FIG. 2). Domestication phenotypes are, by definition, traits that are selected during the initial transformation and establishment of the new domesticated species from its wild ancestor (or ancestors); these phenotypes often include the loss of dormancy, increases in seed size and changes in reproductive shoot architecture (TABLE 1). These traits can arise through human preferences for ease of harvest, growth advantages under human propagation and/or survival in deforested or disturbed habitats17. Both conscious selection by early farmers and unconscious selection as a result of agricultural practices or environments19 are involved in the domestication process.

Diversification traits. Diversification traits among crop plant species can be even more varied (TABLE 1). They can be seen as variation in domesticated populations, as they result from crops that are adapting to fit specific uses, preferences and ecological growing conditions. For example, photoperiod sensitivity in wheat and barley arose as a phenotype when cultivars spread out of the Fertile Crescent20. Other traits, such as sticky or aromatic grains in rice and popcorn in maize21, were selected and maintained by specific cultures. For many, if not most, of these diversification traits, it is likely that they evolve under conscious selection. Adding to this complexity in inferring whether a trait has been selected during crop evolution is the fact that the functional use and the specific organs that are targeted for selective change can differ over time9; for example, the initial domestication of lettuce in Egypt involved selection for oilseed production, whereas current selective breeding efforts focus on leaf characteristics22.

Green Revolution
A series of research, breeding and technology transfer programmes in the mid-twentieth century that resulted in marked increases in agricultural productivity in developing countries.

Complementation
Introduction of a wild-type allele into a mutant individual, through either genetic crosses or transgenic methods, to confirm that a particular gene causes a specific phenotype.

Causative mutations
Mutations that lead to altered gene functions, which result in specific phenotypes.

Fixation
Increase in the frequency of an allelic variant until it is found in all individuals in a population.

Selective sweeps
Rapid increases in population frequencies of positively selected mutations and linked neutral mutations, which result in significant reductions in nucleotide diversity in localized regions of the genome that flank the selected mutations.

Introgression
Recurrent crossing that leads to the sharing of alleles between gene pools (which can be unidirectional), such as between domesticated and wild populations.

Genetic bottlenecks
Marked decreases in genetic diversity that are caused by reductions in effective population sizes.

Characterizing genetic architecture
Identifying domestication and diversification genes. Domestication or diversification genes have mostly been isolated through QTL fine-mapping studies and, more recently, by linkage disequilibrium mapping using GWASs, and transgenic or genetic complementation analyses are used to conclusively identify the relevant genes. This has primarily been undertaken in maize or rice, in which high-density genetic maps and molecular markers, as well as considerable genetic resources, allow thorough molecular characterization and high-resolution mapping.
Identifying causative mutations that lead to domestication or diversification phenotypes in these loci can be difficult. Few studies have used site-directed mutagenesis or transgenic complementation to directly test for the functional effects of specific mutations. However, in several studies, clear functional consequences of identified mutations in crop evolution genes, for example premature stop codons and insertions and deletions (indels), have led to the inference that they are the causative mutations (see Supplementary information S2,S3 (tables)).
As a result of uncertainties in the phenotypes that are associated with specific stages in the evolution of domesticated species, it may also be problematic to distinguish genes that underlie domestication from those that give rise to subsequent diversification traits. Many genes that underlie phenotypes which distinguish a domesticated species from its wild ancestor have been labelled as domestication genes, although, in many cases, there is no evidence that these phenotypes arose as a result of selection during the domestication process23.
We propose that a domestication gene should meet the following criteria. First, its function has been characterized and is known to underlie a trait (for example, a loss of seed dispersal and an increase in seed size) that is clearly associated with Stage 1 (that is, domestication) in the species of interest. Second, there is evidence of positive selection at that locus. Third, there should be complete or near-complete fixation of at least one causative mutation that is associated with the gene in all lineages from a single domestication event. Applying these criteria can prove difficult, as there may be multiple selective pressures that affect the same trait, domestication traits may be poorly characterized, and selection signatures can be difficult to detect. Moreover, soft selective sweeps on standing genetic variation rather than on new genetic variation, introgression or severe genetic bottlenecks can obscure the evolutionary and selective history of a locus. Thus, under our conservative criteria, we may not identify the full range of domestication genes in a crop species; nevertheless, these criteria can provide an initial appraisal of relevant genes that are associated with the origin of a crop species.
Diversification or improvement genes are selected after the domestication process in Stage 1 and are associated with Stages 2 to 4 (REF. 24). Determining that a gene is involved in diversification and not in domestication is aided by knowledge of the population structure of the domesticated species and by information on early cultivated forms from the archaeological record to delimit early evolving traits versus late evolving traits. Several loci, such as FW2.2 (also known as LOC101245309) in the Solanaceae25 and suppressor of

[Figure 2 image: four schematic panels (a–d). Labels in the image: Wild; Cultivated; Other species; Onset of domestication; Two onsets of domestication (panel c); Recent improvement; M.]
Figure 2 | Demographic models of crop domestication. The characterization of domestication in crop species is
dependent on understanding the initiation and the course of the domestication process. The width of the channels
represents population size and geographical range; M = Ne × m, which is the product of effective population size (Ne) and
the migration rate (m). a | Earlier models of domestication posited a single domestication event and suggested that
domestication occurred through strong selection and severe genetic bottlenecks in a small population of the wild
progenitor, which resulted in greater reproductive isolation between the wild species and the domesticated species111. As
more archaeological and molecular data are now available and the evolutionary histories of more crops are better known,
new general models for domestication have been proposed. New alternative demographic models of domestication
suggest that the extent of the genetic bottleneck in the early evolution of crop species is variable — severe during the
domestication of corn43, but minimal for that of apple112 and carrot113. Even after a domestication bottleneck, diversity can
recover during the improvement or diversification phase through processes such as introgression from wild relatives43.
Furthermore, strong reproductive isolation is not a necessary feature of domestication114, and repeated introgression
between crops and their wild progenitors or other related species has been suggested in more recent models. b | The
importance of introgression between cultivated and wild relatives is indicated in alternative single domestication models.
Many grain crops such as amaranths, common millet, foxtail millet, maize, pearl millet, rice and wheat, as well as many
fruit crops (for example, apple and tomato) and root crops (for example, carrot), are thought to have undergone a single
domestication event9. c | Alternatively, studies have also shown that multiple domestication events characterize the
history of a quarter of the world’s food crops9, in which one wild species undergoes domestication in different regions or
at different time points. This multiple domestication model is exemplified by barley, bottle gourd, coconut, common bean,
aubergine and sorghum. d | A third alternative single domestication model has been proposed, in which crops are
domesticated from interspecific hybridization followed by clonal propagation. This is especially common in tree crops115,
such as citrus and banana, but is also found in many short-lived species, such as peanut and strawberry.
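The scaled migration parameter in the Figure 2 legend, M = Ne × m, can be computed directly. The sketch below is illustrative only: the function name and all numerical values are assumptions chosen for the example, not estimates for any crop.

```python
def scaled_migration(effective_size: float, migration_rate: float) -> float:
    """Return M = Ne * m, the product named in the Figure 2 legend."""
    if effective_size <= 0:
        raise ValueError("effective population size must be positive")
    if not 0.0 <= migration_rate <= 1.0:
        raise ValueError("migration rate must lie between 0 and 1")
    return effective_size * migration_rate


# Hypothetical (Ne, m) pairs, for illustration only.
for ne, m in [(1_000, 0.0001), (1_000, 0.01), (50_000, 0.0001)]:
    print(f"Ne={ne}, m={m}: M={scaled_migration(ne, m):g}")
```

Because M is a product, a large population with a low migration rate can have the same M as a small population with a high migration rate.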
sessile spikelets1 (Sos1) in maize26, have been erroneously inferred to be domestication loci and are instead important in more recent diversification of cultivated species.

Genetic architecture of domestication and diversification. Despite the caveats described above, QTL mapping and genetic complementation analyses led to the isolation of the first domestication gene that has been characterized at the molecular level: the teosinte branched1 (tb1) locus27, which controls differences in shoot architecture between maize and its wild teosinte progenitor. The identification of maize tb1 as a domestication gene has been followed over the past two decades by the identification of numerous other domestication and diversification loci, most of which are in cereal crop species but with a few in non-grass species, such as beans, cole crops, grape, sunflower and tomato (see Supplementary information S2 (table)).

Table 1 | Commonly observed traits in crops* accompanying domestication (Stage 1) and diversification (Stages 2–4)
Stage 1 Stage 2 Stage 3 Stage 4
Seed crop • Larger seeds • More seeds • Reduced vernalization • Increased yield
• Resource allocation • Increased seed size variation • Reduced photoperiod • Increased abiotic stress tolerance
• Thinner seed coat, and • Pigment change sensitivity • Increased biotic stress tolerance
increased seed softening • Flavour change • Modified hormone • Improved eating quality
and ornamentation • Change in starch content sensitivity
• Inflorescence architecture • Non-shattering seeds‡ • Synchronized flowering
(including shape, number • Reduced germination time
and determinacy) inhibition • Shortened or extended
• Increased yield potential life cycle
and productivity • Dwarfism
• Loss of dormancy
• Determinate growth
Root and • Flavour change • Reduced toxicity • Hybridization using effect • Improved nutritional quality
Tuber • Resource allocation • Vegetative propagation and of heterosis • Improve multiplication ability
• Change in starch content reduced sexual propagation • Promotion of allogamy and rate
• Ability to thrive in modified • Abiotic stress tolerance • Increased yield
landscape • Biotic stress tolerance
• Reduced branching • Extended harvest season
Fruit • Flavour change • Increased fruit size variation • Improved pollination • Delayed ripening
• Resource allocation • Selfing breeding system success • Increased post-harvest quality
• Larger seed size • Reduced fruit shedding and delayed senescence
• Larger fruit size • Continuous fruiting • Increased yield
• Shortened life cycle • Increased abiotic stress tolerance
• Softer fruit • Increased resistance
• Attractiveness and even ripening
*Examples in annual or short-lived perennial fruits, roots and seeds are shown. Fewer general traits could be identified for less well-characterized crops, such as
leaf crops and long-lived perennial species, and these were therefore excluded. ‡A Stage 1 trait in some crop species.
QTL mapping studies (FIG. 3) were among the first attempts to dissect the genetic architecture of plant domestication and diversification, and such studies provided the initial steps to identifying specific genes that are involved. These early studies, which were mostly carried out in maize, rice and beans, indicated that only one or a few genes of large effect controlled many domestication traits28,29, although this pattern was not universal; for example, in foxtail millet, both tillering and axillary branching are controlled by many loci of small effect30. Many QTL studies have also demonstrated that multiple key domestication traits are controlled by the same regions of the genome31,32, which indicates that either pleiotropy or tight linkage among several loci may be an important attribute of the evolution of domesticated species.
The number of genes or QTLs that are thought to underlie traits of the domestication syndrome33 is difficult to estimate. In maize, QTLs for 9 domestication traits ranged in number from 6 to 26 (REF. 34), and in rice, 13 domestication traits were associated with 76 QTLs35. Loci that are thought to underlie the diversification traits of the photoperiod response and of flowering time vary in number among members of the Poaceae family: 25 QTLs and 4 hotspot genomic regions were observed in maize36, 16 in foxtail millet37 and 14 in rice38,39. QTL analyses have also identified clusters of mapped loci for the same trait32.
More recent GWASs have confirmed similar numbers and patterns of detectable associations. A GWAS in rice identified 80 loci for 14 agronomic traits40, and in sorghum, 14 loci have been identified for the inflorescence branch length trait41. Similar numbers were also recently reported in a GWAS of foxtail millet varieties, in which 512 loci were found to be associated with 47 agronomic traits42. Despite the large number of domestication and/or diversification loci that have been identified by QTL mapping and GWASs, these may all be underestimates; for example, one study in maize43 suggests that nearly 500 genomic regions, which are estimated to span up to 2,000 genes, show evidence of directional selection that is consistent with possible roles in domestication.

Domestication syndrome
The selection of traits that distinguish domesticated species from their wild progenitors; similar traits were often observed to occur in different crops, which led people to view them as a ‘syndrome’.

Functions and mutations
Biological functions of domestication and diversification genes. We compiled data on 60 genes that have been reported as domestication and/or diversification loci (see Supplementary information S2 (table)). We chose these 60 genes because they have been included in population genetic studies and/or have been functionally validated. We also included various genes that have been investigated using a wide range of approaches to support their roles in crop evolution. Although this list is by no means comprehensive, it illustrates the state of the field.
As the roles of these genes have not necessarily been delimited by previous investigators, we re-evaluated the role of these 60 genes and categorized them as domestication or diversification loci. Using our criteria to examine these 60 genes, 23 genes were determined as probable domestication genes that are associated with evolution in Stage 1 (FIG. 1; see Supplementary information S2,S3 (tables)), and 32 genes were more plausible as diversification genes or early crop improvement genes (Stages 2 or 3). Five genes seem to have undergone selection in both domestication and diversification.
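The three criteria proposed for a domestication gene, together with the domestication, diversification and dual categories used for the 60 compiled genes, can be expressed as a small decision rule. The following is a minimal sketch of one possible encoding, not the authors' actual procedure: the class, its field names, the "unresolved" fallback and the example loci are all hypothetical. Only the counts 23, 32 and 5 come from the text.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LocusEvidence:
    """Evidence flags for one locus; every field name here is hypothetical."""
    name: str
    underlies_stage1_trait: bool    # criterion 1: characterized function behind a Stage 1 trait
    positive_selection: bool        # criterion 2: evidence of positive selection at the locus
    causative_variant_fixed: bool   # criterion 3: (near-)complete fixation in all lineages
    selected_after_stage1: bool     # evidence of selection during Stages 2 to 4


def categorize(locus: LocusEvidence) -> str:
    """Apply the three criteria; 'unresolved' is an assumed fallback label."""
    meets_all = (locus.underlies_stage1_trait
                 and locus.positive_selection
                 and locus.causative_variant_fixed)
    if meets_all and locus.selected_after_stage1:
        return "both"
    if meets_all:
        return "domestication"
    if locus.selected_after_stage1:
        return "diversification"
    return "unresolved"


# Made-up loci, for illustration only.
loci = [
    LocusEvidence("locus_A", True, True, True, False),
    LocusEvidence("locus_B", False, True, False, True),
    LocusEvidence("locus_C", True, True, True, True),
    LocusEvidence("locus_D", True, False, False, False),
]
print(Counter(categorize(locus) for locus in loci))

# Category sizes reported in the text add up to the 60 compiled genes.
reported = {"domestication": 23, "diversification": 32, "both": 5}
print(sum(reported.values()))
```

Requiring all three criteria mirrors the conservative stance described in the text: a locus that fails any one of them is not labelled a domestication gene, even if it underlies a trait that distinguishes the crop from its wild ancestor.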

[Figure 3 image: two flow charts.]
a | Discovery of domestication mutations that alter rice bran colour through fine mapping
Cross (×) → Genetic mapping of the red bran trait provides a general location of the Rc locus → Fine mapping localizes Rc to a bHLH gene → Sequence analyses of 440 individual cultivars identify distinct Rc haplotypes, including a predominant 14-bp deletion
b | Candidate domestication-related genes discovered through resequencing
Whole-genome resequencing of many domesticated rice cultivars and Oryza rufipogon plants, followed by one of two routes:
Selective-sweep mapping: Identify regions with significantly reduced diversity or high FST in domesticated rice but not in wild rice → Identify alleles that are fixed in cultivars → Near-complete fixation found in an unannotated gene, 09G0547100
GWASs: Generate a SNP map and phenotype data → Identify SNPs that are associated with grain width in a mixed linear model → High minor allele frequencies found in two SNPs: one in a known grain width locus and one in an unknown gene
Figure 3 | From discovery to characterization of domestication genes. a | The process of discovering domestication
genes and their underlying mutations is shown, and bran colour in Oryza sativa (rice) is used as an example. Disruption of
the pigmentation pathway that leads to red bran colour occurred early in the domestication process for rice (Stage 1),
possibly because contaminants in stored grain could be more easily identified against a white background. Red bran rice
(that is, the Rc haplotype) and white bran rice (that is, the rc haplotype)79 are shown. The white bran cultivars Jefferson and
IR64 are crossed with the wild progenitor Oryza rufipogon (which has a red bran colour) in a fine-mapping study of the Rc
quantitative trait locus78, which localizes the Rc locus in mapping populations to a MYC-like basic helix–loop–helix (bHLH)
gene. Rc haplotypes in 440 individual rice cultivars82 have a 14‑base pair deletion in the protein-coding region of the
bHLH gene, which is the predominant mutation that disrupts gene function. Although this deletion is the predominant
genotype under fixation, some varieties underwent parallel selection for different mutations in the same gene. b | Candidate
domestication-related genes are identified by selective-sweep mapping and by genome-wide association studies (GWASs).
Candidate genes that are under artificial selection are identified by whole-genome resequencing of 50 rice cultivars and
their wild relatives at 15 times coverage116. Several regions (or genes) show reductions in diversity or high Wright fixation
index (FST) on chromosome 1 in O. sativa ssp. japonica (that is, a domesticated variety group) relative to O. rufipogon (that
is, the wild progenitor). The unannotated 09G0547100 gene is a candidate domestication-related gene because it shows a
strong selective sweep in japonica116. 09G0547100 encodes a putative auxin-induced protein. Alternatively, GWAS can be
carried out after resequencing. A GWAS in rice39 led to the discovery of two single-nucleotide polymorphisms (SNPs) that
are associated with grain width — a diversification trait — on chromosome 5 in a compressed mixed linear model. One
SNP is within qSW5, which is a locus that is known to have a role in grain width, whereas the other SNP has not been
previously studied with regard to grain width. The C→G SNP that is associated with the unknown gene has a minor allele
frequency of 0.21 (REF. 39). Image in part a is reproduced, with permission, from REF. 78 © (2007) John Wiley & Sons, Inc.
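Two quantities named in the Figure 3 legend, the Wright fixation index (FST) and the minor allele frequency, can be illustrated for a single biallelic SNP. The sketch below uses the textbook heterozygosity form FST = (HT − HS)/HT with two equally weighted populations; this is an assumption made for illustration, and the estimator actually used in REF. 116 may differ. Function names, allele counts and allele frequencies are hypothetical.

```python
from typing import Mapping


def minor_allele_frequency(allele_counts: Mapping[str, int]) -> float:
    """Frequency of the rarer allele at a biallelic site."""
    if len(allele_counts) != 2:
        raise ValueError("expected exactly two alleles")
    total = sum(allele_counts.values())
    if total <= 0:
        raise ValueError("allele counts must sum to a positive number")
    return min(allele_counts.values()) / total


def fst_two_populations(p1: float, p2: float) -> float:
    """FST = (HT - HS) / HT for one biallelic SNP, equal population weights.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    p_bar = (p1 + p2) / 2
    ht = 2 * p_bar * (1 - p_bar)                      # heterozygosity of the pooled population
    return 0.0 if ht == 0 else (ht - hs) / ht


# Hypothetical counts chosen to give a minor allele frequency of 0.21.
print(minor_allele_frequency({"C": 79, "G": 21}))

# Hypothetical frequencies: shared, strongly diverged, and alternately fixed.
for p_cultivated, p_wild in [(0.5, 0.5), (0.9, 0.1), (1.0, 0.0)]:
    print(p_cultivated, p_wild, round(fst_two_populations(p_cultivated, p_wild), 3))
```

An allele that is fixed in cultivars but rare in the wild progenitor pushes FST toward 1, which is why high-FST regions are treated as candidates in selective-sweep mapping.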
Genes that are thought to be involved in domestication (Stage 1) contribute to various traits (see Supplementary information S2 (table)). They regulate inflorescence development (Brassica oleracea CAL; common bean TFL1; and maize barren stalk1 (ba1), ramosa1 (ra1), tb1 and Zagl1), vegetative growth habit and height (maize tb1; and Oryza sativa PROG1 and LG1), seed pigment, seed size, casing, ornamentation (rice BH4; barley NUD; and maize teosinte glume architecture1 (Tga1) and prolamin-box binding factor (pbf)), seed retention (rice SH4-1; Sorghum bicolor SH1; and the wheat aspartic proteinase gene WAP2), nitrogen access and efficiency (O. sativa AMT1;1), and fruit flavour (strawberry NES1 and PINS) (see Supplementary information S2 (table)).
Diversification genes also contribute to a range of phenotypes, and evolutionary changes include fruit shape and size (tomato FW2.2, OVATE (also known as LOC543847) and calmodulin-binding protein SUN-like (SUN)), inflorescence architecture (barley VRS1; soybean TFL1B; and maize Sos1) (see Supplementary information S2 (table)), colour (the grape myb-related transcription factor genes MYBA1 and MYBA2; and Brassica rapa TT8) and starch composition traits (maize sugary1 (su1); and WAXY in multiple species). Traits for specific cultural practices and preferences, such as dwarfism (O. sativa SD1), fragrance (rice BADH2) and pod corn (maize MADS19 (m19)), were also selected. Moreover, genes that control flowering time diversity have been described (O. sativa HD1; barley ELF3; maize CCT (also known as LOC100281853); pearl millet MADS11; and strawberry KSN), and these genes are possibly associated with adaptation of crops to new environments (Stage 3).
There are numerous genes that are associated with recent breeding (Stage 4), which we have not enumerated. Nevertheless, some of these Stage 4 loci seem to have their origins in earlier stages in the crop evolutionary process; for example, in maize, the yellow endosperm1 (y1) gene that colours endosperm yellow was strongly selected for in the 1920s in the United States, but this mutation can be traced to localized selection by Native

Americans in early diversification44,45. Another example Mutational lesions in domestication and diversifi‑
is dwarfism in rice, particularly the reduction in culm cation genes can range from SNPs, indels, transposon
length that is mostly attributable to the semidwarf-1 gene insertions and gene duplications to large-scale chro‑
(SD1). Plants with mutations in this gene were bred dur‑ mosomal rearrangements (FIG. 4). Of the 60 genes we
ing the Green Revolution in the twentieth century, but examined (see Supplementary information S2 (table)),
evidence suggests that it was originally selected for by 35 genes had at least one causative SNP, 23 genes had
early Japanese farmers46,47. indels and 9 genes had a transposable element among the
causative mutations. For 4 of these 60 genes, a causative
Molecular functions of domestication and diversifica- mutation has not been reported.
tion genes. The isolation of genes that underlie domesti‑ Overall, most causal SNPs in domestication or diver‑
cation and diversification traits provides an opportunity sification genes were found to be nonsense mutations
to examine some of the characteristics of the loci that are or were found to occur in regulatory regions such as the
associated with the evolution of crop species. These loci promoter, which causes putative cis-regulatory changes
show a wide range of functions — from transcription that are usually shown by altered expression and that are
factors to metabolic enzymes — although many encode detected by PCR (FIG. 4; see Supplemenary information S2
similar enzymes or are involved in the same pathways (table)). Also common were genes with SNPs that
across species. produce altered, but presumably functional, proteins.
Mutations in regulatory genes, such as transcription Similarly, most indels formed null mutations either by
factors, are thought to underlie phenotypic changes inducing a translational frameshift or by inducing pre‑
that are associated with domestication (reviewed in mature truncations of the translated protein, whereas
REFS 48,49). Of the 60 genes that we examined and only rarely were cis-regulatory changes induced by an
that were reported to be involved in domestication or indel. Interestingly, 15% of the genes had transposable
diversification, 37 genes (~62%) encode transcription element insertions that had functional effects, which
factors, whereas 3 other genes encode transcription co- suggests that transposable elements have an important
regulators. Enzyme-encoding genes make up the second mutational role in domesticated plant genomes.
largest class of loci (14 genes), whereas the remaining Compared with SNPs and small indels, genomic
6 genes encode transporter proteins and ubiquitin ligase. changes that involve larger sequence alterations are less
Causative mutations in crop evolution loci have a commonly observed. Copy-number variants have been
range of functional effects (see Supplementary informa‑ observed only in the maize m19 gene or the tomato
tion S2 (table)). Many of these genes contain multiple SUN gene (see Supplementary information S2 (table)).
mutations that have functional consequences, which An even rarer type of observed genetic change is large
indicates that, during crop evolution, multiple muta‑ chromosomal rearrangements, as seen in RRS2 in barley,
tions that could be subject to selective pressures arise. in which the mutation is a genomic translocation that
Such mutations may be factors in the spread and modi‑ spans the domestication locus51.
fication of selected domestication and/or diversifica‑
tion phenotypes. On the basis of the genes that we have Processes of evolution
reviewed, nonsense mutations, premature truncations or Selection at the molecular level. Selection is a hallmark
other mutations that lead to null function (for example, of domestication and should leave molecular footprints
Nonsense mutations frameshifts and splicing defects) are the predominant in the genomes of crop species. The first domestication
Point mutations that transform type of causative change (38 of 60 genes). The next major gene that was isolated — the maize tb1 locus — has a
amino acid-encoding codons
into premature stop codons,
functional class of mutations are cis-regulatory mutations (26 of 60 genes) and, finally, missense mutations or other types of structural changes that alter protein function (10 of 60 genes).
These results suggest that both loss-of-function alleles and the alleles that alter gene expression are by far the most common types of functional changes that are observed during crop evolution. These types of alleles are likely to have large phenotypic effects, which is consistent with the marked phenotypic divergence that is observed during domestication and diversification3,4. A recent study in maize suggests that single-nucleotide polymorphisms (SNPs) that are associated with overall quantitative trait variation (~79%) are linked to gene regulatory regions within 5 kb upstream of protein-coding regions50. Thus, the pattern of mutations that we observe, particularly the preponderance of loss-of-function alleles in domestication and/or diversification loci, may be specific to crop evolutionary traits and may not be representative of overall causative variation in domesticated plant genomes.

60–90-kb selective sweep that occurred upstream of the 5′ end of the protein-coding region52. This sweep, which is defined as an extended region of low nucleotide diversity, spans the Hopscotch transposable element insertion (FIG. 4b) in the cis-regulatory region that regulates tb1 expression53. Early genome-scale surveys in maize suggested that as many as 2–4% of genes in this cereal crop species were under positive selection54, but recent work indicates that a much larger percentage (~7.6%) of the maize genome has been affected by domestication and diversification43.
Recent studies also reveal that selective sweeps are prevalent in the genomes of other crops, such as mungbean55, rice5,56 and tomato57,58. The largest crop genome resequencing study so far, in which the genomes of 1,529 wild and cultivated rice accessions were analysed, identified 55 selective sweeps, including those that are associated with the domestication genes BH4 (which causes a loss of hull colour) and SH4-1 (which causes a loss of seed shattering)59 that show fixation of causal mutational variants in cultivated samples5.

which result in the generation of truncated proteins.

cis-regulatory mutations
Mutations in linked, usually non-coding, portions of genes that alter levels and/or patterns of transcription of the linked gene.

Missense mutations
Point mutations that change the identities of encoded amino acids, which result in changes in protein sequences.

Nucleotide diversity
The number of single-nucleotide polymorphisms in a genomic region, usually estimated as the mean level of pairwise nucleotide divergence in a sample or a population.
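As a rough illustration of the glossary definition of nucleotide diversity, and of a selective sweep as an extended region of low diversity, the following Python sketch computes the mean pairwise difference per site for a set of aligned sequences and then scans the alignment in windows. The function names, the toy sequences and the window size are illustrative assumptions; they are not data or methods from the studies cited here.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site among aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(seqs, 2))
    # Count mismatched sites for every pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


def windowed_diversity(seqs, window, step):
    """Return (start, diversity) for each window along the alignment."""
    length = len(seqs[0])
    return [
        (start, nucleotide_diversity([s[start:start + window] for s in seqs]))
        for start in range(0, length - window + 1, step)
    ]


# Hypothetical toy alignment: the first 8 sites are identical in all
# samples (as expected inside a sweep); the last 8 sites vary.
toy_cultivated = [
    "ACGTACGTACGTACGT",
    "ACGTACGTACCTACGA",
    "ACGTACGTTCGTAGGT",
]

for start, pi in windowed_diversity(toy_cultivated, window=8, step=8):
    print(start, round(pi, 3))
```

In this toy case the first window has zero diversity and the second does not, which is the qualitative contrast that a sweep scan looks for. Real analyses work on far longer regions and compare wild and cultivated samples.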

[Figure 4 panels: a, Sorghum bicolor SH1 (CTGGT); b, Zea mays tb1 (Hopscotch); c, Z. mays Sh1-5.1 (translocation); d, Oryza sativa PROG1 (A→T); e, O. sativa qSH1 (T→G); f, Phaseolus vulgaris TFL1 (T→A). Each panel marks the ATG start codon; scale labels in the original image: ~4.9 kb, ~58 kb, 22.8 kb, 12 kb.]
Figure 4 | Types of mutations in crop domestication and diversification genes. For each crop, phenotypic changes that correspond to the mutations of wild varieties (left) versus hypothetical early domesticated varieties (right) are shown.
a | A deletion in the cis-regulatory region (that is, the promoter) of the Sorghum bicolor SH1 gene75 results in the non-shattering phenotype.
b | Insertion of the Hopscotch transposable element results in a cis-regulatory mutation in the Zea mays teosinte branched1 (tb1) gene, which leads to altered shoot architecture.
c | Translocation leads to the fusion of two exons from an unknown gene after exon 3 of the Z. mays Sh1-5.1 gene75, which results in the loss of the YABBY domain and a reduction in shattering.
d | A missense single-nucleotide polymorphism (SNP) in the Oryza sativa PROG1 gene76 results in erect growth in domesticated Asian rice varieties.
e | A SNP in the cis-regulatory region (that is, the promoter) of the O. sativa qSH1 gene117,118 results in the non-shattering phenotype.
f | A SNP in a splice site of the Phaseolus vulgaris TFL1 gene119 results in determinate inflorescences.
Several studies reveal that many genes that seem to underlie domestication phenotypes (such as the barley RRS2 locus that confers disease resistance and the pearl millet TB1 locus that confers reproductive meristem identity) show evidence of partial sweeps in which the causal alleles are not fixed within the species but are found at moderate frequencies30,60. Several factors can preclude allele fixation and maintain allelic diversity in domesticated populations; for example, the tomato LC and OVATE loci, which are both thought to confer the initial increase in fruit size during domestication, may cause seed sterility if both alleles are found together61. Another possibility is that multiple independent genes underlie a domestication trait, and that different genes lead to the selected phenotype in different crop populations.
Partial selective sweeps are also observed in diversification loci, in which culture-specific selection of desirable traits leads to fixation of alleles in varieties but not across the entire domesticated species. A classic example is the rice WAXY gene. Mutations that confer the sticky rice phenotype are prized by some East and Southeast Asian cultures. The mutation at the splice donor site of intron 1 of this gene is associated with a ~240-kb selective sweep, but this is mostly found in the temperate japonica variety group of rice that is popular in Japan and Korea (BOX 2).

Old versus new mutations. A major issue is whether the mutations that lead to domestication or diversification phenotypes are new mutations that arise near-contemporaneously with the onset of positive selection,

or old mutations that have a long history of segregation in populations before the advent of selection. Whether selection has affected old or new mutations has implications for both the nature of the selective sweeps and the dynamics of the evolution of crop species; for example, selective sweeps on standing variation (rather than on new mutations) are expected to leave a weaker signature of selection in the genome, which highlights the necessity to investigate gene polymorphisms in both wild and domesticated populations62–64.
Some domestication or diversification genes, such as the rice LG1 gene that is associated with a closed panicle trait65,66 or the SUN gene duplication in tomato that regulates organ shape61, seem to be novel alleles in domesticated cultivars that are absent in wild accessions. However, many domestication alleles occur in low to moderate frequencies in wild progenitor species. Although the presence of domestication alleles in wild populations could have resulted from crop-to-wild gene flow, several studies have indicated that some of these are indeed ancestral alleles found in the wild species that underwent positive selection in the derived crop. For example, the B. oleracea CAL gene encodes a MADS box transcription factor that regulates floral meristem development, and a nonsense mutation leads to the proliferation of floral meristems in domesticated cauliflower (B. oleracea ssp. botrytis) and broccoli (B. oleracea ssp. italica) (see Supplementary information S3 (table)). This mutation is either fixed or at high frequency in these domesticated subspecies, but it is also present at low frequency in wild B. oleracea. Other examples of possibly old mutations that are important in crop evolution include those in the tb1 and Zagl1 genes in maize67, the INTERMEDIUM-C (INT-C) gene68 and the PPD-H1A haplotype in barley69, and the LC gene in tomato61. This suggests that many domesticated traits arise not from new mutations but rather from mutations that are segregating in ancestral wild populations of crop species70.

Multiple mutations and parallelism at the molecular level. It is not uncommon to observe morphological homoplasy in nature71–73, which naturally leads to the question: does selection for particular phenotypes affect the same genes or distinct genes in different species? Domesticated species provide excellent models to study this question. Selective pressures across multiple independently evolved domesticated populations or species can act on the same traits, such as the loss of seed dispersal or increased seed size, and the ancestral states for these traits are well characterized for these domesticated taxa. Darwin used these ‘analogous variations’ to describe changes in parallel evolution4, and Vavilov developed the Law of Homologous Series74 through the study of domesticated plant species.
Parallelisms at the molecular level provide a basis for Darwin’s observations and for Vavilov’s Law. In a single species, there are cases of multiple mutations that cause the same domestication phenotype in cultivated species; these represent independent origins of the domestication trait. In S. bicolor, unique haplotypes of SH1 characterize each of the three separate origins of the loss of seed shattering in this species75. In this context, the discovery of independent mutations in domestication loci adds support to the hypothesis that multiple domestications of S. bicolor occurred.
Other domestication genes have also been shown to have multiple causal mutations, but in these cases it is generally believed that only one mutation is fixed and is associated with domestication, whereas other mutations are in low to moderate frequencies across the species. For example, the O. sativa PROG1 gene may have 10 non-synonymous SNPs and 6 indels in the protein-coding region, as well as 27 SNPs and 2 indels in the 5′ flanking region. However, a single A→T mutation that causes a threonine-to-serine change in the carboxyl terminus of the protein was shown to be sufficient to cause an erect plant habit by altering the binding properties of this transcription factor76. This is consistent with phylogenetic analyses of the PROG1 gene that support the monophyly of cultivars that have PROG1 alleles arising from a single population of the wild progenitor species Oryza rufipogon, which indicates that selection on this gene during domestication occurred once5. In addition, at least four other mutations in the promoter region have been proposed to regulate gene expression levels that result in intraspecific phenotypic variation77, and these may represent parallel modifier mutations that are fixed in smaller populations.

Parallel evolution
Independent evolution of the same trait in different species.

Box 2 | Glutinous grains: parallel evolution across species

A waxy endosperm results when starch in cereal crop grains has low or no amylose and contains greater amounts of amylopectin, which produces a sticky glutinous grain on boiling. Waxy grains are found among many domesticated cereals and pseudocereals99. The WAXY gene encodes the granule-bound starch synthase (GBSS) enzyme84,100. In rice, a G→A single-nucleotide polymorphism at the splice donor site of intron 1 is responsible for the reduction in GBSS activity, which leads to glutinous rice in some varieties. This mutation arose only once in glutinous rice varieties, possibly in mainland Southeast Asia, and spread to temperate japonica varieties that have reduced amylose levels in grains85,101. Results from studies on the WAXY gene in various species suggest that mutant phenotypes are rare in the wild and that many cultivar alleles probably arose through novel mutations102.
Several cultures are partial to sticky grains, and this phenotype has repeatedly evolved in different cereal crop species. In sub-Saharan Africa, sorghum (Sorghum bicolor) waxy mutants underwent selection during diversification99,103. Northeast Asian cultural preferences for sticky grains104 also seem to have driven parallel selection on the waxy mutants in numerous species. Subsequent to glutinous rice being incorporated into Japanese culture, the grain crop Job’s tears that has a waxy phenotype was domesticated105. In northern China, three mutations in the two copies of WAXY are found in tetraploid broomcorn millet, and these mutations probably underwent selection as this crop spread into East Asia86,106, where sticky rice already existed. In East Asia, mutations in WAXY also arose in foxtail millet107 and in barley108, and they were preferentially selected for in Japanese culture.
In the New World, sticky grain amaranths were used to make cakes as part of Aztec human sacrifice rituals in Mexico, where the domestication of both Amaranthus cruentus and Amaranthus hypochondriacus was thought to occur109. waxy mutants have also been selected in at least three Amaranthus spp. pseudocereals in Central and South America: Amaranthus caudatus in Peru and A. hypochondriacus in Mexico during domestication, and A. cruentus in Mexico during diversification110. A. hypochondriacus was domesticated after A. cruentus, and the waxy allele is nearly completely fixed in Mexican A. hypochondriacus cultivars, which suggests that it was a domestication gene in this species. There are many cases other than the example of WAXY, in which processing technology or cultural practices were adopted around a particular diversification mutation in one crop, and these innovations may have influenced selection for similar mutations in other new crops.
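The PROG1 case in the main text, in which a single A→T substitution produces a threonine-to-serine change, can be sketched with the standard genetic code. In the Python example below, the helper name `classify_point_mutation` and the codon ACC are hypothetical choices for illustration; the source does not state which threonine codon is affected.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_point_mutation(codon, position, new_base):
    """Return (old_aa, new_aa, kind) for a single-base change in a sense codon.

    kind is 'synonymous', 'missense' or 'nonsense' ('*' marks a stop codon).
    """
    codon = codon.upper()
    mutated = codon[:position] + new_base.upper() + codon[position + 1:]
    old_aa, new_aa = CODON_TABLE[codon], CODON_TABLE[mutated]
    if new_aa == old_aa:
        kind = "synonymous"
    elif new_aa == "*":
        kind = "nonsense"
    else:
        kind = "missense"
    return old_aa, new_aa, kind


# Assumed example codon: ACC (threonine). An A->T change at the first
# position gives TCC (serine), a missense mutation.
print(classify_point_mutation("ACC", 0, "T"))
# A change that creates a stop codon (TGG -> TGA) is a nonsense mutation.
print(classify_point_mutation("TGG", 2, "A"))
```

Any of the four threonine codons (ACA, ACC, ACG, ACT) becomes a serine codon when its first base changes from A to T, so the choice of ACC does not affect the outcome.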

Another example of multiple domestication mutations is in the domestication Rc gene (FIG. 3), which has three causal variants that contribute to regulatory changes in the production of anthocyanin in the rice grain. These three mutations are associated with the elimination of the dominant red pigment seed colour that is found in wild O. rufipogon. Only one mutation, a 14-base pair deletion in exon 7 that leads to a translational frameshift, is consistently found in all white-seeded domesticates and is absent in all wild accessions78. This suggests that this deletion is the only causal variant that is associated with domestication, whereas the two other variants seem to be diversification mutations. One of these variants is fixed only in japonica cultivars5, and the other is not fixed but actually leads to a light red (as opposed to white) grain colour that is prized in certain varieties79. Other mutations have also been found in the Rc locus80,81, albeit at very low frequencies, one of which restores the function of the RC protein to produce fully red seeds. The history of Rc variants82 suggests that, as rice cultivation spread, parallel selection towards an increase in colour diversity was applied to new mutations, as well as to alleles introgressed from other progenitor populations.
The same gene can also undergo parallel selection in multiple crop species and may be a recurring target of selection; for example, comparative genomics studies in the Poaceae family have shown the correspondence of QTLs for several independently selected domestication or diversification traits among genera83. The WAXY locus, which encodes the granule-bound starch synthase enzyme for amylose synthesis, is altered in rice84,85, broomcorn millet86, foxtail millet87 and three Amaranthus spp. pseudocereals88 to produce sticky grains (BOX 2). Other examples of parallel selection during diversification include the fruit-weight locus FW2.2 (REF. 25) in tomato, chilli pepper and aubergine; the orthologues of both the shattering gene SH3 and the Rc gene in Asian rice (O. sativa) and African rice (Oryza glaberrima)89; and tb1 orthologues in maize (tb1), pearl millet (TB1) and barley (INT-C)68. There are also examples of parallel selection for genes within the same gene families (see Supplementary information S2 (table)), such as the APETALA2 transcription factors SH1 in rice and the paralogous WAP2 gene in wheat, both of which reduce shattering by the same mechanism90,91.

Gene flow in domestication and diversification. In recent years, there has been a greater appreciation of the role of hybridization between domesticated species and their wild ancestors, or even between distinct populations, in the spread of domestication or diversification phenotypes (FIG. 2). The role of gene flow in the dynamics of domestication has been underscored by the idea that domestication, coupled with long-range movement of plants through human migrations and trade, is a prolonged process with cultivars and wild relatives occasionally occurring in sympatry; for example, a recent molecular study in rice suggests that it was domesticated once in China, which gave rise to the japonica variety group. Indica rice, a genetically distinct variety group, arose through subsequent hybridization of japonica with a putative proto-indica or O. rufipogon in South Asia56, which resulted in the introgression of domestication genes into indica. The rc allele that confers white pigmentation is an example of a domestication gene that spread into indica by hybridization from japonica5.
Diversification genes also spread to various varieties through hybridization as alleles move to new places and cultures. The BADH2 locus is responsible for aromatic rice; although there were multiple causative mutations that arose in japonica, a single mutation in the badh2.1 allele recombined into indica. This recombination resulted in fragrant indica cultivars that then continued to spread across several geographical regions92. The waxy splice site mutation originated in glutinous rice in tropical Southeast Asia, but subsequently moved into the low-amylose temperate japonica variety of Northeast Asia85 (FIG. 2).

Perspectives
With the continued interest in domesticated taxa that arises as a result of their agricultural value, there are now detailed analyses of the genetics of numerous crop species, which provide opportunities to examine general patterns and to infer the dynamics of the evolutionary processes that are associated with crop origins and diversification. We can begin to discern some general outlines regarding the genetics of the evolution of domesticated plant species. We do find that, as previously suggested49,70, many genes that underlie crop evolutionary traits are regulatory in nature, with either transcription factors or cofactors being the targets of selection and cis-regulatory mutations having a key role in evolutionary divergence. Most genes also have mutational lesions that lead to loss of function, including nonsense mutations or frameshift indels, which is consistent with the large phenotypic effects that are observed during crop evolution. Transposable element insertions, which have been thought to have a key role in plant evolution, also account for causative mutations in 15% of the domestication and diversification genes reviewed in this paper. Finally, many loci have more than one functional mutation that segregates in populations of crop species, which indicates that genes associated with crop domestication and diversification are subject to recurrent mutations that are possibly selective targets during evolution.
Although we can now begin to discern some general patterns of the molecular evolution of species, the challenge remains to obtain greater interspecific and intraspecific molecular genetic data, to use the information to develop and test more realistic models of origin and diversification, and to expand the research beyond the well-studied cereal crop domesticates. Researchers are now investigating the genetics of domestication in non-model crops and perennial crops, which increases our understanding of the domestication process and will probably lead to the discovery of novel domestication genes and evolutionary trajectories. Finally, we are making great advances in the understanding of how cultivation by ancestral farmers in the Neolithic period led to the origination and adaptation of new species with yields that are capable of sustaining human population growth.

Domestication provides a fascinating model for the study of evolution, and genetic and archaeological advances in the last decade have replaced simplistic ideas with more robust and complex models on the origin of crop species. We can now begin to see what lessons can be learnt in the quest to feed the world in the face of growing population pressures and changing climates.

1. Mueller, U. G., Gerardo, N. M., Aanen, D. K., Six, D. L. & Schultz, T. R. The evolution of agriculture in insects. Annu. Rev. Ecol. Evol. Syst. 36, 563–595 (2005).
2. Duarte, C. M., Marba, N. & Holmer, M. Rapid domestication of marine species. Science 316, 382–383 (2007).
3. Darwin, C. The Variation of Animals and Plants Under Domestication (John Murray, 1868).
4. Darwin, C. On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life (John Murray, 1859).
5. Huang, X. et al. A map of rice genome variation reveals the origin of cultivated rice. Nature 490, 497–501 (2012).
This paper uses more than 1,500 accessions in whole-genome resequencing and high-resolution mapping to reveal the origin of rice, and the selective sweeps and fixed domestication mutations.
6. Blanca, J. et al. Variation revealed by SNP genotyping and morphology provides insight into the origin of the tomato. PLoS ONE 7, e48198 (2012).
7. Fuller, D. Q. Biodiversity in Agriculture: Domestication, Evolution, and Sustainability. 110–135 (Cambridge Univ. Press, 2012).
8. Fuller, D. Q., Willcox, G. & Allaby, R. G. Early agricultural pathways: moving outside the ‘core area’ hypothesis in Southwest Asia. J. Exp. Bot. 63, 617–633 (2012).
9. Meyer, R. S., DuVal, A. E. & Jensen, H. R. Patterns and processes in crop domestication: an historical review and quantitative analysis of 203 global food crops. New Phytol. 196, 29–48 (2012).
This study provides an overview of the features of domesticated plant species.
10. Devore, I. & Lee, R. B. Man the Hunter (Aldine De Gruyter, 1999).
11. Dirzo, R. & Raven, P. H. Global state of biodiversity and loss. Annu. Rev. Environ. Resour. 28, 137–167 (2003).
12. Gepts, P. et al. (eds.) Biodiversity in Agriculture: Domestication, Evolution, and Sustainability (Cambridge Univ. Press, 2012).
13. Willcox, G. in Biodiversity in Agriculture: Domestication, Evolution, and Sustainability (eds Gepts, P. et al.) 92–109 (Cambridge Univ. Press, 2012).
14. Hillman, G. C. & Davies, M. S. Measured domestication rates in wild wheats and barley under primitive cultivation, and their archaeological implications. J. World Prehist. 4, 157–222 (1990).
15. Purugganan, M. D. & Fuller, D. Q. Archaeological data reveal slow rates of evolution during plant domestication. Evolution 65, 171–183 (2011).
16. Simmonds, N. W. Evolution of Crop Plants (Longman, 1976).
17. Fuller, D. Q. & Allaby, R. G. Seed dispersal and crop domestication: shattering, germination, and seasonality in evolution under cultivation. Annu. Plant Rev. 38, 238–295 (2009).
18. Kislev, M. E., Hartmann, A. & Bar-Yosef, O. Early domesticated fig in the Jordan Valley. Science 312, 1372–1374 (2006).
19. Zohary, D. Unconscious selection and the evolution of domesticated plants. Econ. Bot. 58, 5–10 (2004).
20. Knüpffer, H., Terentyeva, I., Hammer, K., Kovaleva, O. & Sato, K. Ecogeographical diversity – a Vavilovian approach. Dev. Plant Genet. Breed. 7, 53–76 (2003).
21. Grobman, A. et al. Preceramic maize from Paredones and Huaca Prieta, Peru. Proc. Natl Acad. Sci. USA 109, 1755–1759 (2012).
22. De Vries, I. M. Origin and domestication of Lactuca sativa L. Genet. Res. Crop Evol. 44, 165–174 (1997).
23. Fuller, D. Q. Contrasting patterns in crop domestication and domestication rates: recent archaeobotanical insights from the Old World. Ann. Bot. 100, 903–924 (2007).
24. Takeda, S. & Matsuoka, M. Genetic approaches to crop improvement: responding to environmental and population changes. Nature Rev. Genet. 9, 444–457 (2008).
25. Doganlar, S., Frary, A., Daunay, M. C., Lester, R. N. & Tanksley, S. D. Conservation of gene function in the Solanaceae as revealed by comparative mapping of domestication traits in eggplant. Genetics 161, 1713–1726 (2002).
26. Doebley, J., Stec, A. & Kent, B. Suppressor of sessile spikelets1 (Sos1): a dominant mutant affecting inflorescence development in maize. Am. J. Bot. 82, 571–577 (1995).
27. Doebley, J., Stec, A. & Hubbard, L. The evolution of apical dominance in maize. Nature 386, 485–488 (1997).
This paper reports the isolation of one of the first domestication genes.
28. Koinange, E. M. K., Singh, S. P. & Gepts, P. Genetic control of the domestication syndrome in common bean. Crop Sci. 36, 1037–1045 (1996).
29. Gepts, P. Crop domestication as a long-term selection experiment. Plant Breed. Rev. 24, 1–44 (2004).
30. Remigereau, M. S. et al. Cereal domestication and evolution of branching: evidence for soft selection in the Tb1 orthologue of pearl millet (Pennisetum glaucum [L.] R. Br.). PLoS ONE 6, e22404 (2011).
31. Poncet, V. et al. Comparative analysis of QTLs affecting domestication traits between two domesticated x wild pearl millet (Pennisetum glaucum L., Poaceae) crosses. Theor. Appl. Genet. 104, 965–975 (2002).
32. Cai, H. & Morishima, H. QTL clusters reflect character associations in wild and cultivated rice. Theor. Appl. Genet. 104, 1217–1228 (2002).
33. Hammer, K. Das domestikationssyndrom. Die Kulturpflanze 32, 11–34 (in German) (1984).
34. Doebley, J. & Stec, A. Inheritance of the morphological differences between maize and teosinte: comparison of results for two F2 populations. Genetics 134, 559–570 (1993).
This is one of the classic papers on QTL mapping of domestication traits.
35. Thomson, M. J. et al. Mapping quantitative trait loci for yield, yield components and morphological traits in an advanced backcross population between Oryza rufipogon and the Oryza sativa cultivar Jefferson. Theor. Appl. Genet. 107, 479–493 (2003).
36. Xu, J. et al. The genetic architecture of flowering time and photoperiod sensitivity in maize as revealed by QTL review and meta analysis. J. Integr. Plant Biol. 54, 358–373 (2012).
37. Mauro-Herrera, M. et al. Genetic control and comparative genomic analysis of flowering time in Setaria (Poaceae). G3 3, 283–295 (2013).
38. Yano, M. Genetic and molecular dissection of naturally occurring variation. Curr. Opin. Plant Biol. 4, 130–135 (2001).
39. Zhao, K. et al. Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa. Nature Commun. 2, 467 (2011).
40. Huang, X. et al. Genome-wide association studies of 14 agronomic traits in rice landraces. Nature Genet. 42, 961–967 (2010).
41. Morris, G. P. et al. Population genomic and genome-wide association studies of agroclimatic traits in sorghum. Proc. Natl Acad. Sci. USA 110, 453–458 (2013).
This is a large-scale study that identifies regions with selective sweeps that indicate potential diversification genomic regions.
42. Jia, G. et al. A haplotype map of genomic variations and genome-wide association studies of agronomic traits in foxtail millet (Setaria italica). Nature Genet. 45, 957–961 (2013).
43. Hufford, M. B. et al. Comparative population genomics of maize domestication and improvement. Nature Genet. 44, 808–811 (2012).
44. Palaisa, K., Morgante, M., Tingey, S. & Rafalski, A. Long-range patterns of diversity and linkage disequilibrium surrounding the maize Y1 gene are indicative of an asymmetric selective sweep. Proc. Natl Acad. Sci. USA 101, 9885–9890 (2004).
45. Benz, B., Perales, H. & Brush, S. Tzeltal and Tzotzil farmer knowledge and maize diversity in Chiapas, Mexico. Curr. Anthropol. 48, 289–300 (2007).
46. Asano, K. et al. Artificial selection for a Green Revolution gene during japonica rice domestication. Proc. Natl Acad. Sci. USA 108, 11034–11039 (2011).
47. Asano, K. et al. Genetic and molecular analysis of utility of sd1 alleles in rice breeding. Breed. Sci. 57, 53–58 (2007).
48. Olsen, K. M. & Wendel, J. F. A bountiful harvest: genomic insights into crop domestication phenotypes. Annu. Rev. Plant Biol. 64, 47–70 (2013).
49. Purugganan, M. D. & Fuller, D. Q. The nature of selection during plant domestication. Nature 457, 843–848 (2009).
50. Li, X. et al. Genic and non-genic contributions to natural variation of quantitative traits in maize. Genome Res. 22, 2436–2444 (2012).
51. Hanemann, A., Schweizer, G. F., Cossu, R., Wicker, T. & Roder, M. S. Fine mapping, physical mapping and development of diagnostic markers for the Rrs2 scald resistance gene in barley. Theor. Appl. Genet. 119, 1507–1522 (2009).
52. Clark, R. M., Linton, E., Messing, J. & Doebley, J. F. Pattern of diversity in the genomic region near the maize domestication gene tb1. Proc. Natl Acad. Sci. USA 101, 700–707 (2004).
53. Studer, A., Zhao, Q., Ross-Ibarra, J. & Doebley, J. Identification of a functional transposon insertion in the maize domestication gene tb1. Nature Genet. 43, 1160–1163 (2011).
This is a functional analysis of a cis-regulatory polymorphism that results in a domesticated trait.
54. Wright, S. I. et al. The effects of artificial selection on the maize genome. Science 308, 1310–1314 (2005).
This is one of the first systematic estimates of the number of genes in a domesticated plant genome that shows evidence of positive selection.
55. Isemura, T. et al. Construction of a genetic linkage map and genetic analysis of domestication related traits in mungbean (Vigna radiata). PLoS ONE 7, e41304 (2012).
56. Molina, J. et al. Molecular evidence for a single evolutionary origin of domesticated rice. Proc. Natl Acad. Sci. USA 108, 8351–8356 (2011).
57. Cong, B., Liu, J. & Tanksley, S. D. Natural alleles at a tomato fruit size quantitative trait locus differ by heterochronic regulatory mutations. Proc. Natl Acad. Sci. USA 99, 13606–13611 (2002).
58. Paran, I. & Van Der Knaap, E. Genetic and molecular regulation of fruit and plant domestication traits in tomato and pepper. J. Exp. Bot. 58, 3841–3852 (2007).
59. Li, C., Zhou, A. & Sang, T. Rice domestication by reducing shattering. Science 311, 1936–1939 (2006).
60. Fu, Y. B. Population-based resequencing analysis of wild and cultivated barley revealed weak domestication signal of selection and bottleneck in the Rrs2 scald resistance gene region. Genome 55, 93–104 (2012).
61. Rodriguez, G. R. et al. Distribution of SUN, OVATE, LC, and FAS in the tomato germplasm and the relationship to fruit shape diversity. Plant Physiol. 156, 275–285 (2011).
62. Teshima, K. M., Coop, G. & Przeworski, M. How reliable are empirical genomic scans for selective sweeps? Genome Res. 16, 702–712 (2006).
63. Ross-Ibarra, J., Morrell, P. L. & Gaut, B. S. Plant domestication, a unique opportunity to identify the genetic basis of adaptation. Proc. Natl Acad. Sci. USA 104, 8641–8648 (2007).
64. Innan, H. & Kim, Y. Detecting local adaptation using the joint sampling of polymorphism data in the parental and derived populations. Genetics 179, 1713–1720 (2008).
65. Ishii, T. et al. OsLG1 regulates a closed panicle trait in domesticated rice. Nature Genet. 45, 462–465 (2013).
This is an excellent example of fine mapping, identification of a selective sweep, and functional characterization and identification of a causative mutation that validates LG1 as a domestication gene.

66. Zhu, Z. et al. Genetic control of inflorescence architecture during rice domestication. Nature Commun. 4, 2200 (2013).
67. Weber, A. L. et al. The genetic architecture of complex traits in teosinte (Zea mays ssp. parviglumis): new evidence from association mapping. Genetics 180, 1221–1232 (2008).
68. Ramsay, L. et al. INTERMEDIUM-C, a modifier of lateral spikelet fertility in barley, is an ortholog of the maize domestication gene TEOSINTE BRANCHED1. Nature Genet. 43, 169–172 (2011).
69. Jones, H. et al. Population-based resequencing reveals that the flowering time adaptation of cultivated barley originated east of the Fertile Crescent. Mol. Biol. Evol. 25, 2211–2219 (2008).
70. Doebley, J. F., Gaut, B. S. & Smith, B. D. The molecular genetics of crop domestication. Cell 127, 1309–1321 (2006).
71. Hennig, W. Phylogenetic Systematics (Univ. of Illinois Press, 1966).
72. Wood, T. E., Burke, J. M. & Rieseberg, L. H. Parallel genotypic adaptation: when evolution repeats itself. Genetica 123, 157–170 (2005).
73. Ralph, P. & Coop, G. Parallel adaptation: one or many waves of advance of an advantageous allele. Genetics 186, 647–668 (2010).
74. Vavilov, N. I. The law of homologous series in variation. J. Genet. 12, 47–89 (1922).
This is a classic paper on the recurring traits that are seen among crops and their influence on the development of core evolutionary concepts.
75. Lin, Z. et al. Parallel domestication of the Shattering1 genes in cereals. Nature Genet. 44, 720–724 (2012).
87. Fukunaga, K., Kawase, M. & Kato, K. Structural variation in the Waxy gene and differentiation in foxtail millet [Setaria italica (L.) P. Beauv.]: implications for multiple origins of the waxy phenotype. Mol. Genet. Genom. 268, 214–222 (2002).
88. Park, Y. J., Nishikawa, T., Tomooka, N. & Nemoto, K. The molecular basis of mutations at the Waxy locus from Amaranthus caudatus L.: evolution of the waxy phenotype in three species of grain amaranth. Mol. Breed. 30, 511–520 (2012).
89. Gross, B. L., Steffen, F. T. & Olsen, K. M. The molecular basis of white pericarps in African domesticated rice: novel mutations at the Rc gene. J. Evol. Biol. 23, 2747–2753 (2010).
90. Hofmann, N. R. SHAT1, a new player in seed shattering of rice. Plant Cell 24, 839 (2012).
91. Zhou, Y. et al. Genetic control of seed shattering in rice by the APETALA2 transcription factor SHATTERING ABORTION1. Plant Cell 24, 1034–1048 (2012).
92. Kovach, M. J., Calingacion, M. N., Fitzgerald, M. A. & McCouch, S. R. The origin and evolution of fragrance in rice (Oryza sativa L.). Proc. Natl Acad. Sci. USA 106, 14444–14449 (2009).
93. Piperno, D. R., Ranere, A. J., Holst, I., Iriarte, J. & Dickau, R. Starch grain and phytolith evidence for early ninth millennium B.P. maize from the Central Balsas River Valley, Mexico. Proc. Natl Acad. Sci. USA 106, 5019–5024 (2009).
94. Jaenicke-Despres, V. et al. Early allelic selection in maize as revealed by ancient DNA. Science 302, 1206–1208 (2003).
95. Dorweiler, J. & Doebley, J. Developmental analysis of teosinte glume architecture1: a key locus in the evolution of maize (Poaceae). Am. J. Bot. 84, 1313 (1997).
108. Nakao, S. On waxy barleys in Japan. Seiken Jiho 4, 111–113 (in Japanese) (1950).
109. Sauer, J. D. The grain amaranths and their relatives: a revised taxonomic and geographic survey. Ann. Missouri Bot. Gard. 54, 103–137 (1967).
110. Jimenez, F. R. et al. Assessment of genetic diversity in Peruvian amaranth (Amaranthus caudatus and A. hybridus) germplasm using single nucleotide polymorphism markers. Crop Sci. 53, 532–541 (2013).
111. Haudry, A. et al. Grinding up wheat: a massive loss of nucleotide diversity since domestication. Mol. Biol. Evol. 24, 1506–1517 (2007).
112. Cornille, A. et al. New insight into the history of domesticated apple: secondary contribution of the European wild apple to the genome of cultivated varieties. PLoS Genet. 8, e1002703 (2012).
113. Iorizzo, M. et al. Genetic structure and domestication of carrot (Daucus carota subsp. sativus) (Apiaceae). Am. J. Bot. 100, 930–938 (2013).
114. Dempewolf, H., Hodgins, K. A., Rummell, S. E., Ellstrand, N. C. & Rieseberg, L. H. Reproductive isolation during domestication. Plant Cell 24, 2710–2717 (2012).
115. Miller, A. J. & Gross, B. L. From forest to field: perennial fruit crop domestication. Am. J. Bot. 98, 1389–1414 (2011).
This paper is an overview of the state of understanding about perennial crop domestication traits and demographic histories.
116. Xu, X. et al. Resequencing 50 accessions of cultivated and wild rice yields markers for identifying agronomically important genes. Nature Genet. 30, 105–111 (2011).
76. Jin, J. et al. Genetic control of rice plant architecture under domestication. Nature Genet. 40, 1365–1369
117. Konishi, S. et al. An SNP caused loss of seed shattering
(2008). 96. Wang, H. et al. The origin of the naked grains of maize. during rice domestication. Science 312, 1392–1396
77. Tan, L. et al. Control of a key transition from prostrate Nature. 436, 714–719 (2005). (2006).
to erect growth in rice domestication. Nature Genet. 97. Soltis, D. E. et al. Angiosperm phylogeny: 17 genes, This is a classic paper on the isolation of a gene for
40, 1360–1364 (2008). 640 taxa. Am. J. Bot. 98, 704–730 (2011). non-shattering, which is a major domestication trait.
78. Furukawa, T. et al. The Rc and Rd genes are involved 98. Stevens, P. F. Angiosperm Phylogeny Website, Version 118. Konishi, S., Ebana, K. & Izawa, T. Inference of the
in proanthocyanidin synthesis in rice pericarp. Plant J. 12. [online], http://www.mobot.org/MOBOT/research/ japonica rice domestication process from the
49, 91–102 (2007). APweb/ (2012). distribution of six functional nucleotide polymorphisms
79. Sweeney, M. T., Thomson, M. J., Pfeil, B. E. & 99. Kempton, J. H. Waxy endosperm in coix and sorghum. of domestication-related genes in various landraces
McCouch, S. Caught red-handed: Rc encodes a basic J. Hered. 12, 396–400 (1921). and modern cultivars. Plant Cell Physiol. 49,
helix-loop-helix protein conditioning red pericarp in 100. Sano, Y. Differential regulation of waxy gene 1283–1293 (2008).
rice. Plant Cell. 18, 283–294 (2006). expression in rice endosperm. Theor. Appl. Genet. 68, 119. Repinski, S. L., Kwak, M. & Gepts, P. The common
80. Brooks, S. A., Yan, W., Jackson, A. K. & Deren, C. W. 467–473 (1984). bean growth habit gene PvTFL1y is a functional
A natural mutation in rc reverts white-rice-pericarp to 101. Olsen, K. M. et al. Selection under domestication: homolog of Arabidopsis TFL1. Theor. Appl. Genet.
red and results in a new, dominant, wild-type allele: evidence for a sweep in the rice waxy genomic region. 124, 1539–1547 (2012).
Rc‑g. Theor. Appl. Genet. 117, 575–580 (2008). Genetics 173, 975–983 (2006). 120. Wingen, L. U. et al. Molecular genetic basis of pod
81. Gross, B. L. et al. Seeing red: the origin of grain This is an early molecular population genetic corn (Tunicate maize). Proc. Natl Acad. Sci. USA 109,
pigmentation in US weedy rice. Mol. Ecol. 19, analysis of a crop diversification gene. 7115–7120 (2012).
3380–3393 (2010). 102. Kilian, B. et al. Haplotype structure at seven barley
82. Sweeney, M. T. et al. Global dissemination of a single genes: relevance to gene pool bottlenecks, phylogeny Acknowledgements
mutation conferring white pericarp in rice. PLoS of ear type and site of barley domestication. Mol. The authors thank the members of the Purugganan labora-
Genetics 3, e133 (2007). Genet. Genom. 276, 230–241 (2006). tory for their feedback, particularly J. Flowers, A. Plessis and
This paper shows how introgression between 103. de Alencar Figueiredo, L. F. et al. Phylogeographic U. Rosas. R.S.M. is supported by a postdoctoral fellowship
populations leads to the spread of a domestication evidence of crop neodiversity in sorghum. Genetics from the US National Science Foundation Plant Genome
trait. 179, 997–1008 (2008). Research Program (NSF PGRP). Work on domestication in the
83. Paterson, A. H. et al. Convergent domestication 104. Sakamoto, S. in Redefining Nature: Ecology, Culture Purugganan laboratory is also funded by grants from NSF
of cereal crops by independent mutations at and Domestication (eds Ellen, R. & Fukui, K.) PGRP and the New York University Abu Dhabi Research
corresponding genetic loci. Science 269, 215–231 (Berg, 1996). Institute, United Arab Emirates.
1714–1718 (1995). 105. Hachiken, T. et al. Deletion commonly found in Waxy
84. Wang, Z. Y. et al. The amylose content in rice gene of Japanese and Korean cultivars of Job’s tears Competing interests statement
endosperm is related to the post-transcriptional (Coix lacryma-jobi L.). Mol. Breed. 30, 1747–1756 The authors declare no competing interests.
regulation of the waxy gene. Plant J. 7, 613–622 (2012).
(1995). 106. Araki, M., Numaoka, A., Kawase, M. & Fukunaga, K.
85. Olsen, K. M. & Purugganan, M. D. Molecular evidence Origin of waxy common millet, Panicum miliaceum L. FURTHER INFORMATION
on the origin and evolution of glutinous rice. Genetics in Japan. Genet. Res. Crop Evol. 59, 1303–1308 Plants for a future: http://pfaf.org
162, 941–950 (2002). (2012).
86. Hunt, H. V., Denyer, K., Packman, L. C., Jones, M. K. & 107. Kawase, M., Fukunaga, K. & Kato, K. Diverse origins of SUPPLEMENTARY INFORMATION
Howe, C. J. Molecular basis of the waxy endosperm waxy foxtail millet crops in East and Southeast Asia See online article: S1 (table) | S2 (table) | S3 (table)
starch phenotype in broomcorn millet (Panicum mediated by multiple transposable element insertions. ALL LINKS ARE ACTIVE IN THE ONLINE PDF
miliaceum L.). Mol. Biol. Evol. 27, 1478–1494 (2010). Mol. Genet. Genom. 274, 131–140 (2005).

REVIEWS
A new world of Polycombs: unexpected partnerships and emerging functions
Yuri B. Schwartz1 and Vincenzo Pirrotta2
Abstract | Polycomb group (PcG) proteins are epigenetic repressors that are essential for
the transcriptional control of cell differentiation and development. PcG-mediated
repression is associated with specific post-translational histone modifications and is
thought to involve both biochemical and physical modulation of chromatin structure.
Recent advances show that PcG complexes comprise a multiplicity of variants and are far
more biochemically diverse than previously thought. The importance of these new PcG
complexes for normal development and disease, their targeting mechanisms and their
shifting roles in the course of differentiation are now the subject of investigation and the
focus of this Review.
1 Department of Molecular Biology, Umeå University, Byggnad 6L, Norrlands University Hospital, 901 87 Umeå, Sweden.
2 Department of Molecular Biology and Biochemistry, Rutgers University, 604 Allison Road, Piscataway, New Jersey 08854, USA.
Correspondence to V.P. e-mail: Pirrotta@dls.rutgers.edu
doi:10.1038/nrg3603
Published online 12 November 2013

Homeotic genes
A set of related master transcription regulatory factors that regulate morphogenesis and tissue differentiation.

The classical view of the mechanisms of Polycomb group (PcG) proteins is based on genetic evidence from Drosophila melanogaster: genes were classified as belonging to the PcG on the basis of mutations that result in the derepression of D. melanogaster homeotic genes1,2. Complexes of PcG proteins are recruited to any given homeotic gene if that gene is transiently repressed by segmentation gene products, which are themselves governed by maternal positional cues. As a result, PcG complexes keep homeotic genes repressed in specific embryonic domains, and this repressed state is, in most cases, maintained for the rest of development3.
Work in mammalian and fly systems over the past 10 years has changed our perspective of this PcG paradigm. High-throughput genomic techniques have shown that, in addition to homeotic genes, hundreds, and perhaps thousands, of other genes are also regulated by PcG proteins. Many of these target genes encode transcription factors or morphogens that control key developmental processes. PcG-mediated repression of many of these genes is dynamic and can vary during development and differentiation, although the repressed state tends to be maintained from one cell cycle to the next. Therefore, a major question is how PcG proteins provide both the flexibility and versatility that are needed for different developmental targets. Recent advances in the biochemical characterization of PcG complexes have revealed a range of new components, which lead to a large number of variant PcG complexes. In addition, analyses of cancer-associated mutations have revealed the role of both overexpression and underexpression of some PcG complexes in oncogenesis.
This Review attempts to summarize what has been learned about the varieties of PcG complexes, the range of roles that they might have on chromatin and non-chromatin targets, and the ways in which they may be recruited to their targets. As is often the case, the technically challenging functional studies of PcG complexes lag behind their biochemical characterization. Therefore, we suggest that the reader take some of the emerging new roles of PcG complexes with caution, as they have yet to stand the test of time in this rapidly developing research field. We first review the classical (or canonical) model for the structure and function of PcG complexes, and we then discuss various novel Polycomb repressive complex 1 (PRC1)-related complexes and their possible roles in flies and mammals. We then focus on variant PRC2 complexes, before moving on to the problem of recruitment and concluding with a discussion of new discoveries on the role of PcG complexes in disease. When mammalian results are not further specified, they refer to both mouse and human data.

PcG complexes: the canonical view
Genetic and biochemical experiments in flies and mammals converged to give a molecular picture of the basic PcG-mediated repressive mechanism. Two principal multiprotein Polycomb repressive complexes PRC1 and PRC2 are recruited to PcG-target genes and collaborate to effect transcriptional repression. In D. melanogaster,

specific Polycomb response elements (PREs) have been identified at many of the target genes of PcG proteins; PREs are the binding sites to which PRC1 and PRC2 are recruited, often together with additional proteins that are thought to modulate repressive functions (BOX 1). Neither PRC1 nor PRC2 has DNA-binding components. Unlike most DNA-binding transcription factors, a key feature of PcG complexes in both flies and mammals is that, although they are present in all cells, whether they bind to a specific target gene depends on the prior history and the chromatin state of that gene.

Repressive functions of PcG complexes. The function of PcG complexes, which has been well demonstrated in plants, insects and vertebrates, is to suppress the expression of their target genes. Exactly how this is accomplished is less clear, but it is most likely that both PRC1 and PRC2 have repressive activities4. It is generally
Box 1 | The canonical Polycomb group complexes

PRC1
Polycomb repressive complex 1 (PRC1) has a core of four proteins122–124. In Drosophila melanogaster, these are Polycomb
(Pc), which contains a chromodomain that binds to trimethylated histone H3 lysine 27 (H3K27me3); Polyhomeotic (Ph),
which has two paralogues Polyhomeotic-proximal (Ph-p) and Polyhomeotic-distal (Ph-d); RING1, the product of Sex combs
extra (Sce); and Posterior sex combs (Psc), or the closely related Suppressor of zeste 2 (Su(z)2) (see the figure). RING1 and
Psc are structurally related and form a heterodimer, which promotes the E3 ubiquitin ligase activity of RING1 on histone
H2A125–127. The RING1–Psc heterodimer is the framework on which the core PRC1 complex is assembled. More loosely
associated with the core complex is Sex comb on midleg (Scm), a protein with two malignant brain tumour (MBT) repeats
and a sterile α-motif (SAM) domain, through which it is thought to interact with Ph122,128,129. Representations of the core
PRC1 and PRC2 complexes are shown in the figure. The areas of the circles that depict subunits of the D. melanogaster
complexes reflect the relative sizes of the corresponding proteins. The dashed outline of the Scm subunit indicates its weak
association with the PRC1 complex. The relative arrangement of the subunits reflects known direct associations.
Mammalian homologues have been discovered for each of the PRC1 proteins, and mammalian genomes have many
alternative paralogues for each (TABLE 1). Thus, mammals have RING1 and RING2, although RING2 predominates. The two
proteins seem to be interchangeable in at least some of the complexes, but this has not been systematically examined.
There are at least three Ph homologues (Polyhomeotic-like protein 1 (PHC1), PHC2 and PHC3), five Pc homologues
(chromobox protein homologue 2 (CBX2), CBX4, CBX6, CBX7 and CBX8), two Psc homologues (BMI1 and MEL18) and four
other Polycomb group RING finger proteins (PCGFs)15,19,20.
PRC2
PRC2 contains the Enhancer of zeste methyltransferase (E(z)) that monomethylates, dimethylates and trimethylates H3K27
(REFS 130–133). The methylation of H3K27 is essential for Polycomb group (PcG)-mediated repression and, in
D. melanogaster, the replacement of wild-type histone H3 with a Lys27Arg variant mimics the loss of E(z)134. D. melanogaster
E(z) is the core which binds to the WD40 domain of Extra sexcombs (Esc) (or of its close homologue Escl) and to Su(z)12,
both of which are essential for PRC2 activity because they interact with both the target and the surrounding nucleosomes
and receive inputs that regulate the methyltransferase activity91–95. The histone chaperone Caf1 binds to Su(z)12 and
contributes to the activity of PRC2 (see the figure). Mammalian PRC2 complexes contain the direct homologues EZH2 (or,
in some cases, EZH1), EED, SUZ12 and the Caf1 homologues histone-binding proteins RBBP4 and RBBP7. Although there is
only one EED gene, alternative transcription start sites result in several products that may give rise to different functions133.
An additional component, zinc-finger protein AEBP2 (the mammalian homologue of Jing in D. melanogaster), promotes the
stability of the complex and the binding to at least a subset of target sites135–137, but it is not essential for function.
Supporting components
Analyses of other D. melanogaster PcG genes showed that their products are not components of PRC1 and PRC2 but form
distinct accessory complexes. It is becoming clear that the binding and/or the repressive activities of PcG complexes result
from a multiplicity of fairly weak interactions that collectively constitute the robust repressive mechanism.
The Pho repressive complex (PhoRC) contains Pho (a DNA-binding protein that is homologous to the mammalian
transcriptional repressor protein YY1) and SFMBT (Scm-like with four MBT domains protein)138,139. As a sequence-specific
DNA-binding protein, Pho is thought to help the recruitment of PcG complexes to Polycomb response elements (PREs).
The mammalian YY1 has long been thought to interact with PcG complexes, but genomic binding profiles show little
overlap between YY1‑binding sites and PcG proteins in mammalian genomes139–141.
The Polycomb repressive deubiquitinase complex (PR-DUB) contains the Calypso ubiquitin carboxy-terminal hydrolase
and Additional sex combs (Asx). It has a specific H2A deubiquitinase activity that is paradoxically required for
PcG-mediated repression142, which suggests that the appropriate regulation of ubiquitylation is essential for PcG-mediated repression. Mammalian homologues of these proteins exist, but their role in PcG-mediated repression has not been established.
For more comprehensive reviews of canonical PcG complexes and their action, see REFS 144,145.
[Figure: core subunits of the D. melanogaster complexes. PRC1: Pc, Ph, Psc, RING1 and Scm. PRC2: E(z), Esc, Su(z)12, Caf1 and Jing.]


Table 1 | PRC1 and PRC2 core complex components in Drosophila melanogaster and humans
Drosophila melanogaster subunits | Characteristic domains | Homologous subunits in humans

Polycomb repressive complex 1 (PRC1)
E3 ubiquitin-protein ligase RING1 (also known as Sce) | RING | RING2 (also known as RING1B and RNF2) and RING1 (also known as RING1A and RNF1)
Posterior sex combs (Psc) and Suppressor of zeste 2 (Su(z)2) | RING | BMI1 (also known as PCGF4) and MEL18 (also known as PCGF2)
Polyhomeotic-proximal (Ph-p) and Polyhomeotic-distal (Ph-d) | Sterile α-motif (SAM) and zinc-finger | Polyhomeotic-like protein 1 (PHC1; also known as EDR1), PHC2 (also known as EDR2) and PHC3 (also known as EDR3)
Polycomb (Pc) | Chromodomain | Chromobox protein homologue 2 (CBX2), CBX4, CBX6, CBX7 and CBX8
Sex comb on midleg (Scm) | Malignant brain tumour (MBT), SAM and zinc-finger | Sex comb on midleg homologue 1 (SCMH1) and Sex comb on midleg-like protein 2 (SCML2)

Polycomb repressive complex 2 (PRC2)
Enhancer of zeste (E(z)) | SANT, CXC and SET (Su(var)3-9–Enhancer of zeste–Trithorax) | Enhancer of zeste homologue 2 (EZH2; also known as KMT6) and EZH1
Extra sex combs (Esc) and Extra sex combs-like (Escl) | WD40 | EED
Suppressor of zeste 12 (Su(z)12) | Zinc-finger and VEFS (VRN2–EMF2–FIS2–Su(z)12) box | SUZ12
Chromatin assembly factor 1 subunit Caf1 | WD40 | Histone-binding protein RBBP4 (also known as RBAP48) and RBBP7 (also known as RBAP46)
Jing | Zinc-finger | Zinc-finger protein AEBP2

considered that the histone H2A ubiquitylation produced by the E3 ubiquitin-protein ligases RING1 or RING2 components of PRC1 interferes with transcription elongation by RNA polymerase II5, but PcG-mediated repression has also been shown to prevent Pol II from forming the initiation complex6. It has also been claimed that PRC1 induces local chromatin condensation even in the absence of H2A ubiquitylation7,8. Repressive functions of PRC2 are less well characterized, but it is clear that histone H3 lysine 27 (H3K27) methylation by PRC2 prevents H3K27 acetylation, a modification that is associated with both the promoter and enhancer regions of active genes.
A genetic study of PRC1 functions in D. melanogaster showed that different PRC1-binding genes have different requirements. For some, repression requires all four core components of PRC1, whereas others are not affected by the absence of RING1 (the product of Sex combs extra (Sce); also known as dRING) or Polycomb (Pc) but are more dependent on the Polyhomeotic (Ph), Posterior sex combs (Psc) and Suppressor of zeste 2 (Su(z)2) components9 (TABLE 1). These results, taken at face value, suggest four main conclusions. First, the repressive activity associated with PRC1 is far more heterogeneous than expected. Second, the canonical PRC1 complex, at least in D. melanogaster, can be partially disassembled without necessarily losing all repressive function. Third, repression does not always require H2A ubiquitylation. Fourth, the repression of some genes in the absence of the Pc component, which binds to trimethylated H3K27 (H3K27me3), suggests that H3K27me3 is not specifically required in these cases. It is clear that some D. melanogaster genes bind to PRC1 in the absence of PRC2 or H3K27me3 (REF. 10). This last conclusion was also reached for mouse embryonic stem cells by comparing the genes that were derepressed by the knockout of the gene encoding the mouse RING2 protein with those that were derepressed by the knockout of embryonic ectoderm development (Eed) (hence the knockout of PRC2)4.
Clearly, despite two decades of intensive studies, many gaps remain in our understanding of how PRC1 and PRC2 effect transcriptional repression. Such repression most probably involves multiple mechanisms that interfere with productive gene expression.

RING2 complexes
The RING2 protein (also known as RING1B and RNF2) is considered the heart of the PRC1-mediated repressive mechanism. In the past few years, the nature and functions of RING2-containing complexes have been discovered to be far more diverse with the exuberant expansion in our knowledge of the range of complexes that differ in the number and variety of components (FIG. 1). Whether all of these new complexes function as epigenetic repressors remains an open question.

KDM2-containing complexes. In both mammals and flies, the RING2 protein and its activity as an H2A E3 ubiquityl transferase are crucial for the repression of HOX genes. However, a surprising discovery was that much of this activity does not reside in the canonical PRC1 complex. It was first reported in D. melanogaster that a complex called dRING-associated factors (dRAF), containing RING1, Psc and the histone H3K36 demethylase Kdm2, is in fact responsible for most of the H2A ubiquitylation11. Although the genomic distribution of dRAF is not available, a comparison of RING1,

Pc and Psc distributions indicates that few sites bind to both RING1 and Psc but not to Pc, which suggests that dRAF and PRC1 target the same genes.
In mammals, Kdm2 has two homologues, KDM2A and KDM2B. Both of these contain a zinc-finger-CxxC motif that binds to unmethylated CpG islands and removes the dimethylation or trimethylation mark of H3K36 that is widely distributed in mammalian chromatin12–14. Similarly to D. melanogaster Kdm2, mammalian KDM2B, but not KDM2A, forms a complex that includes RING2 and a Psc-related protein, Polycomb group RING finger protein 1 (PCGF1). The zinc-finger-CxxC motif of KDM2B targets this complex to a subset of unmethylated CpG islands that are bound by PRC1 and PRC2 (REFS 12–14), where it seems to be responsible for most of the H2AK119 ubiquitylation (H2AK119ub), at least in embryonic stem cells13. In addition, the mammalian analogue of dRAF incorporates either RYBP (RING1 and YY1-binding protein) or its close homologue YY1-associated factor 2 (YAF2) (see below), both of which greatly stimulate the ubiquityl ligase activity of the complex15. We may surmise that the D. melanogaster dRAF complex also contains the fly RYBP homologue.

Biological roles of mammalian RING2–KDM2B complexes. Complicating the function of the mammalian RING2–KDM2B complex is the fact that a large proportion of KDM2B probably has roles that are independent of RING2 and PCGF1, and binds to thousands of transcriptionally active unmethylated CpG-rich promoters12–14. A partial knockdown of KDM2B in mouse embryonic stem cells, in which both KDM2A and KDM2B are expressed12, leads to subtle but distinct defects in differentiation13,14. Thus, KDM2B-depleted mouse embryonic stem cells can proliferate as well as control cells, but the resulting embryoid bodies are denser and lack central cavities13. In addition, the KDM2B-depleted embryonic stem cells fail to differentiate in a monolayer culture13,14. These defects resemble those caused by the knockdown of RYBP16 and are accompanied by both the reduction of H2AK119ub levels and the derepression of some PcG-target genes12,13. Collectively, these observations suggest that, in embryonic stem cells, the RING2–KDM2B complex functions together with both PRC1 and PRC2 to repress genes that are important for development and differentiation. Consistent with this, the overexpression of KDM2B inhibits replicative senescence and immortalizes mouse embryonic fibroblasts17, in which KDM2B (but not KDM2A), together with canonical PcG complexes, represses cyclin-dependent kinase inhibitor 2A (Cdkn2a)18, which encodes two distinct proteins (ARF and INK4A (also known as p16)) that normally block cell cycle progression. Curiously, the histone demethylase activity of KDM2B seems to be dispensable for its function in mouse embryonic stem cells14 but is required for Cdkn2a repression in immortalized embryonic fibroblasts18, which indicates that the demethylation of H3K36 may be more important for repression in differentiated cells.

[Figure 1 panels: a | Canonical PRC1 (PRC1.2; PRC1.4). b | RING2–RYBP core complex. c | RING2–KDM2B complex (PRC1.1). d | RING2–L3MBTL2 complex (PRC1.6). e | RING2–FBRS complex (PRC1.3; PRC1.5).]
Figure 1 | Mammalian RING2 complexes. The assignment of different human proteins to complexes is primarily based on a biochemical purification study15, but other reports12,13 were also consulted. The areas of the circles reflect the relative sizes of the primary isoforms of their corresponding proteins as defined in the UniProt database. The subunits present in the canonical Polycomb repressive complex 1 (PRC1) are shown in red; names for variant complexes according to REF. 15 are shown in parentheses. a | A representative canonical PRC1 is shown. Some variants of this complex (such as PRC1.2 and PRC1.4) incorporate the related chromobox protein homologue (CBX) proteins and Polyhomeotic-like proteins (PHC), or MEL18 and E3 ubiquitin-protein ligase RING1, instead of BMI1 and RING2; see TABLE 1 for the full list of related proteins. The dashed outline of the Sex comb on midleg homologue 1 (SCMH1) subunit indicates its weak association with the PRC1 core components. b | Although the existence of RING2–RYBP (RING1 and YY1-binding protein; shown in blue) or its related RING1–RYBP (not shown) core components is strongly suggested by glycerol centrifugation analyses15, it remains to be seen whether this entity exists in vivo or whether it is a product of partial dissociation during biochemical purification. For the complexes shown in parts b, c and d, alternative complexes in which the RYBP subunit is substituted by the closely related YY1-associated factor 2 (YAF2) protein have also been purified but are not represented here. c | The subunits that are specific to RING2–KDM2B (lysine-specific demethylase 2B) complexes are shown in yellow. Among these subunits, only BCL-6 co-repressor (BCOR) is known to have a related variant protein BCL-6 corepressor-like protein 1 (BCORL1). d | The subunits that are specific to the RING2–L3MBTL2 (lethal(3)malignant brain tumour-like protein 2) complex are shown in green. Among these subunits, only histone deacetylase 1 (HDAC1) is known to be substituted in some instances by HDAC2. e | Both Polycomb group RING finger protein 3 (PCGF3) and PCGF5 can be incorporated into RING2–FBRS (probable fibrosin-1) and its variant complexes; components that are unique to these complexes are shown in purple. CSNK2A1, casein kinase 2, α1 polypeptide; USP7, ubiquitin carboxy-terminal hydrolase 7; WDR5, WD repeat-containing protein 5.

Box 2 | Possible new molecular roles of variant PRC1 complexes

Polycomb repressive complex 1 (PRC1) and its variant complexes that contain chromobox protein homologue 7 (CBX7) are predominant in embryonic stem cells and are required to maintain pluripotency. In some cases, complexes that contain other CBX variants may have specialized roles that are regulated by an interplay between the post-translational modification of CBX variants and binding to alternative non-coding RNAs (ncRNAs). In cultured cells, the Polycomb group (PcG) protein E3 SUMO-protein ligase CBX4 is methylated by the histone-lysine N-methyl transferase SUV39h at lysine 191 (REF. 143). This causes the binding of CBX4 to the ncRNA taurine upregulated 1 (TUG1), changes its chromodomain-binding preference from trimethylated histone H3 lysine 9 (H3K9me3) to H3K27me3 and represses its target genes. The demethylation of CBX4 by lysine-specific demethylase 4C (KDM4C) switches its association to a different ncRNA, metastasis-associated lung adenocarcinoma transcript 1 (MALAT1; also known as NEAT2), and its binding preference to H2A acetylated at K5 or K13, both of which are marks of transcriptional activity. This switch is accompanied by the nuclear relocation of the target genes with their associated CBX4 from the Polycomb foci, which is the location of CBX4 when they are repressed, to interchromatin granules, where transcriptional activity takes place. Furthermore, the unmethylated CBX4 sumoylates the growth regulator E2F1 that binds to growth-promoting genes which are subject to CBX4 regulation, a modification that seems to be necessary for their activation143.

PRC1-related complexes and beyond
A spate of recent publications12–16,19,20 has greatly expanded the range of RING2 complexes (or non-canonical PRC1) discovered in both humans and mice, and has placed the KDM2B complex in the framework of a much broader classification. RING1 (also known as RING1A and RNF1) can replace RING2 in at least some of these complexes but is much less abundant (FIG. 1). At least six alternatives are known for PCGF1, the heterodimeric partner of RING2. In addition to the well-known BMI1 (also known as PCGF4) and MEL18 (also known as PCGF2), PCGF1, PCGF3, PCGF5 and PCGF6 have also been found to associate with RING2. The RING2–PCGF heterodimer is catalytically competent as an E3 ubiquityl transferase and is the scaffold for the assembly of additional components21–23. The RING2–BMI1 or RING2–MEL18 dimers can further bind to one of five alternative chromobox protein homologue (CBX) components and to the remaining core subunits of the canonical PRC1 (BOX 1; FIG. 1).
The position occupied by CBX, together with the human homologues of Ph and Sex comb on midleg (Scm) components, can alternatively be occupied by RYBP or its close homologue YAF2 (REF. 15). Unlike CBX proteins, RYBP and YAF2 can form a complex with any RING2–PCGF combination (FIG. 1). RING2 complexes that contain RYBP or YAF2 have no chromodomain-containing CBX proteins, and their binding to chromatin sites is therefore thought to be independent of histone H3 methylation. The only exception from this rule is the RING2–L3MBTL2 (lethal(3)malignant brain tumour-like protein 2) class of complexes that harbour CBX3

embryonic stem cells, in which it is needed to maintain pluripotency. The level of CBX7 sharply drops upon differentiation, concomitant with an increase in CBX2 and CBX8 (REF. 25). The level of CBX7 is controlled both at the transcriptional level (activated by the pluripotency factor OCT4) and at the post-transcriptional level by microRNAs of the miR-125 and miR-181 families25. In turn, Cbx2 and Cbx8 genes are directly repressed by complexes that contain CBX7, which permits the coordinated switching between these variants. Similarly, BMI1 and MEL18 are, in some cases, exclusively present in different cell types. For example, fetal liver cells require BMI1 but not MEL18 (REF. 27). The dominant presence of one CBX or PCGF subunit in certain kinds of cells would, in principle, explain the lack of genetic redundancy and fit with the crucial role of CBX7 in maintaining pluripotency24–25, as well as with both haematological and neurological defects observed in Bmi1-mutant mice28. It should be noted that, although we know something about the tissue specificity of some of the paralogous components, we currently have little or no information about the tissue-specific roles of alternative PRC1-related complexes, and it is clear that multiple PRC1 variants are generally present in the same cell.
An attractive hypothesis is that the PRC1 variants have intrinsically different biochemical properties that may be used for targeting different subsets of genes and/or for context-dependent repression (BOX 2). Consistent with this hypothesis, the overexpression of different CBX subunits has different effects on the haematopoietic lineage26. Thus, the overexpression of CBX7, but not of CBX2, CBX4 and CBX8, induces self-renewal in multipotent cells but not in more differentiated progenitors. Recent genomic experiments suggest that this is due to the repression of a small set of genes that are specifically regulated by CBX7 in haematopoietic stem and progenitor cells26, but further studies are needed to confirm this.
These conclusions are put in a broader perspective when all possible RING2 or RING1 complexes are considered. Genomic profiling in human cells shows that the target genes of CBX-containing and RYBP-containing complexes are partially overlapping, which indicates that, although these alternative complexes may often function in parallel, they have independent recruiting mechanisms15. The binding sites of different PCGFs show different degrees of overlap. Thus, BMI1- and MEL18-binding sites are nearly identical and partially overlap with PCGF1-binding sites12,13 in mouse embryonic stem cells. However, there is little overlap with PCGF6-binding sites15, which is consistent with the idea that RING2–L3MBTL2 and its variant complexes are functionally distinct from other PcG complexes. The L3MBTL2 complexes are frequently found at genes

CpG islands
Vertebrate genomic regions of the order of 1 kb that are rich
in CpG dinucleotides; they (also known as HP1γ), the chromodomain of which that also bind to the cell cycle factors E2F6 and E2F4,
often lack 5‑methylcytosine recognizes both H3K9me2 and H3K9me3. and may co‑purify with these proteins29,30. When dif-
and frequently correspond to ferent RING1 or RING2-containing complexes bind to
promoter regions. Biological functions of alternative PRC1 and RING2– the same gene, it is not known whether the binding of
Embryoid bodies
RYBP complexes in mammals. The abundance of alter- different complexes occurs simultaneously, alternatively,
Three-dimensional aggregates native PRC1 subunits greatly varies between different at different stages of the cell cycle, or whether the binding of
of pluripotent stem cells. cell types24–26. Thus, CBX7 predominates in mouse one complex promotes or interferes with the binding

of another. Consistent with the variable presence of a CBX component, only a subset of target genes of the RING1 and RING2 complexes contains H3K27me3. By contrast, all variants of RING2 complexes studied are found at sites that are enriched for H2A ubiquitylation, although opinions differ on whether RYBP-containing complexes are more active in ubiquitylation15,16. The multiplicity of these parallel binding patterns is perplexing, but it may reflect stages in the process of recruitment or of gene silencing.

Variant PRC2 complexes

D. melanogaster and mammals both have their own assortment of variants of PRC2 core subunits (FIG. 2; TABLE 1). Their alternative use stems from differential expression of corresponding genes in specific tissues or at specific stages of development. For example, of the two mouse enhancer of zeste (E(z)) paralogues, the expression of enhancer of zeste homologue 2 (Ezh2; also known as Kmt6) predominates during early embryonic development and in embryonic stem cells31. Consistent with this, the loss of Ezh2 causes early embryonic lethality32. At later stages of development, however, Ezh1 is broadly expressed and is fully redundant with Ezh2 in tissues such as the postnatal skin, in which the relationship between the two was carefully investigated by conditional knockout experiments33. Mice that lack Ezh1 are phenotypically normal and fertile, which indicates that all vital EZH1 functions can be carried out by EZH2. The incorporation of a particular EED isoform results in the methylation of lysine 26 of a histone H1 isoform34.

Several additional proteins often associate with the core components of PRC2 in a mutually exclusive manner in both mice and D. melanogaster (FIG. 2). Untangling the relative contribution of these extended variant PRC2 complexes to PcG-mediated repression is complicated by the fact that, in addition to mediating extensive trimethylation of H3K27 at PcG-target genes, PRC2 is responsible for pervasive dimethylation of H3K27 throughout the transcriptionally inactive genome. H3K27me2 accounts for nearly 60% of all histone H3 in the genome and is probably accompanied by low levels of diffuse H3K27me3 which, when added up, may well account for much of the total genomic H3K27me3 (REFS 35–37). It is also possible that a basal level of H3K27me2 is a prerequisite for the timely onset of targeted PcG-mediated repression, thus connecting the two H3K27 methylation states.

Figure 2 | Alternative enhancer of zeste complexes. (Panels: PRC2–PHF1; PRC2–JARID2; non-PRC2.) The complexes are depicted such that the areas of the circles reflect the relative sizes of the primary isoforms of their corresponding proteins, as defined in the UniProt database. The core Polycomb repressive complex 2 (PRC2), which is stabilized by zinc-finger protein AEBP2, is shown in green. Interchangeable components PHF1 (PHD finger protein 1) and JARID2 (Jumonji, ARID domain-containing protein 2) are shown in orange and blue, respectively. Although multiple laboratories have purified core complexes of PRC2, there remains a possibility that this complex is a result of the partial dissociation of larger complexes during biochemical purification. Recent reports120,121 indicate that enhancer of zeste homologue 2 (EZH2) can methylate non-histone substrates independently of other PRC2 core subunits. Currently, we do not know whether this is done by EZH2 alone or, more probably, in complex with other proteins (shown in grey) that are yet to be identified. RBBP4, histone-binding protein RBBP4.

Paralogues
Genes that are originated by a duplication event within the genome.

Orthologues
Genes in different species that are originated from a single gene of the last common ancestor.

PRC2–PCL complex function. A portion of PRC2 core proteins co‑purifies with D. melanogaster Polycomblike (Pcl) or its mammalian orthologues PHD finger protein 1 (PHF1), PHF19 and MTF2 (REFS 38–44). Pcl is a ‘classical’ PcG protein, the loss of which results in the derepression of HOX genes in flies and enhances the effects caused by the partial loss of Pc (REF. 45). Consistent with its direct role in PcG-mediated repression, Pcl binds to D. melanogaster PREs39,46, and its mammalian homologues bind to PcG-target genes42,43,47,48. The loss of Pcl has little effect on global H3K27me2 levels39 but, reportedly, causes a major loss of H3K27me3 at PcG-target genes40,41,47, which is replaced by H3K27me2 (REF. 39). The incorporation of the mammalian Pcl homologue PHF1 subunit increases the efficiency of H3K27 trimethylation by PRC2 in vitro40. In addition, Pcl in flies (or PHF19 in mammals) may have a role in anchoring PRC2 at PcG-target genes39,43,48. Both the promotion of trimethylation and the binding of PRC2 depend on the Tudor domain of Pcl43,48. Interestingly, two recent studies have shown that, in mammalian homologues of Pcl, the Tudor domains specifically recognize H3K36me2 and H3K36me3 (REFS 42,43,49), which suggests that these proteins help to anchor PRC2 to partially active PcG-target genes and thereby allow their efficient re-silencing42,43. Curiously, although D. melanogaster Pcl has a similar effect on both PRC2 and H3K27 methylation as its mammalian counterparts, the Tudor domain of the fly Pcl has several amino acid differences that result in an atypical, incomplete aromatic cage50 and therefore does not bind to H3K36 regardless of its methylation state42,50. Thus, some property of Pcl other than recognition of H3K36 methylation is likely to be more important for PcG-mediated repression.

JARID2‑containing PRC2 complexes. A separate portion of mammalian PRC2 core components associates with Jumonji, ARID domain-containing protein 2 (JARID2; also known as JUMONJI)51–55, and a similar complex exists in flies56. Unlike Pcl, Jarid2 was not identified as a PcG gene in D. melanogaster genetic screens2,45. Although several groups have found that JARID2 forms a stable complex with the PRC2 core and promotes the binding of PRC2 to many PcG-target genes51–55, its effect on both PRC2 function and gene repression remains controversial. Thus, two studies suggest that the incorporation of JARID2 reduces PRC2 catalytic activity and that the loss of JARID2 leads to higher H3K27me3 levels
at PcG-target genes51,52, whereas two other studies report exactly the opposite53,54. The JARID2 JmjC domain, which is characteristic of histone demethylases of the JARID family, is probably catalytically inactive owing to crucial amino acid substitutions. Unfortunately, the phenotypes of a clean Jarid2 deletion mutant have not been described in mice or in flies, but loss-of-function gene-trap alleles of murine Jarid2 (REF. 57) show late embryonic lethality and defects in both neural tube fusion and cardiovascular development57,58. These phenotypes are milder than the early embryonic lethality caused by the loss of PRC2 core subunits, which suggests that JARID2 is dispensable for some aspects of PRC2 function.

The role of the PRC2–JARID2 complex is not restricted to PcG-target genes. A recent study59 shows that the murine PRC2–JARID2 complex methylates cardiac transcription factor GATA4 at lysine 299, which prevents its acetylation at the same position by the acetyltransferase p300 and impairs the ability of GATA4 to recruit p300 to its target genes. Importantly, PRC2‑dependent repression of the GATA4‑target gene Myh6 (myosin heavy chain 6, cardiac muscle-α) is not accompanied by PRC2 binding or H3K27 trimethylation, which indicates that GATA4 is methylated outside a chromatin context. This and other evidence supports the existence of a free pool of PRC2–JARID2 complexes that may also have a role in the pervasive H3K27 dimethylation of the genome or even contribute to the cytoplasmic PRC2 fraction that is reported to play a part in signal transduction60.

A different type of larger PRC2‑related complex was reported to contain NAD-dependent histone deacetylase Sir2 and the histone deacetylase sirtuin 1 (SIRT1) in D. melanogaster larvae and in human cancer cells, respectively61,62. The role of these complexes in PRC2 biology awaits investigation.

Targeting PcG-mediated repression

PREs in D. melanogaster. A crucial question for PcG mechanisms is how the complexes are recruited to specific genes, as the selection of target genes ultimately determines the function of the particular PcG complex. Here, the outlook has also been changing. Functional studies in D. melanogaster had shown that PREs, specific DNA elements that are a few hundred base pairs long63,64, were responsible for the recruitment of PcG complexes3,65,66. PREs can be tens of kilobases upstream or downstream of the target promoter, within introns or, in many cases, close to the transcription start site67. PREs are frequently enriched in consensus binding motifs for Pleiohomeotic (Pho), Trithorax-like (Trl; also known as GAF), Dorsal switch protein 1 (Dsp1) and other DNA-binding factors64,68,69 that may cooperate in the recruitment of PRC1 and PRC2 (FIG. 3a). However, no single DNA-binding protein so far identified is capable of recruiting PcG complexes to PREs.

Genetic data, as well as genomic binding and gene expression data, concur that PRC1 and PRC2 generally function together to produce the repressed state at target genes. This is widely taken to imply that PRC2 is recruited first and methylates H3K27, and that PRC1 then follows by affinity for the H3K27me3 mark. However, it is clear that, in D. melanogaster, the regions methylated by PRC2 are broad domains, whereas the binding of PRC1 is much more localized at PRE sites10,66. Nevertheless, the effective interaction of PRC1 with promoter regions is likely to require H3K27me3 to mediate looping, particularly if the PRE is distant from the promoter. To what extent H3K27me3 helps to recruit PRC1 (and its variant complexes) in mammals is less clear, but not all H3K27me3 domains are also binding sites for PRC1 (REF. 4), and mutation of the CBX chromodomain is reported to have little effect on CBX distribution70.

Recruitment to unmethylated CpG-rich DNA sequences. Most mammalian PcG-target genes bind to PcG complexes in close proximity to the transcription start site but over a broad region that does not suggest the presence of a specific recruiting sequence. Attempts to identify a mammalian PRE-like (PRE‑L) element have mostly failed apart from two notable exceptions. A sequence element PRE‑kr in the mouse Kreisler gene (also known as Mafb) recruits PRC1 well and PRC2 poorly71. A fragment from the human homeobox D (HOXD) cluster recruits PRC1 and PRC2 components and represses a reporter gene72. In a different approach, the analysis of bivalent domains (that is, domains containing both H3K27me3 and H3K4me3) in embryonic stem cells suggested that the domains that bind to both PRC1 and PRC2 corresponded well with CpG islands that lack both 5‑methylcytosine and activator-binding sites73. Tests showed that GC‑rich elements, even those derived from bacterial genomes, could indeed recruit PRC2 but not PRC1, the binding of which was identified by the presence of RING2 (REF. 74). Comparison across species and in either the presence or the absence of DNA methylation supports the idea that clusters of unmethylated CpGs that are unaccompanied by active transcription can recruit PcG complexes75.

Certain proteins that contain a zinc-finger-CxxC DNA-binding domain bind preferentially to unmethylated CpG islands76. One such protein is CXXC finger protein 1 (CXXC1; also known as CFP1), a component of the SET1 H3K4 methyltransferase complex, which accounts for the presence of H3K4me3 at CpG islands in embryonic stem cells. Two other CpG-binding proteins are the H3K36 demethylases KDM2A and KDM2B (REFS 12,13,77). As discussed above, KDM2B was found to be a component of a variant RING2 or RING1 complex and helps to recruit the complex to a subset of CpG islands (FIG. 3b). It remains unclear how to account for the binding to CpG islands of PRC2 or of other PRC1 variants in the observed distribution of PcG-mediated repression. JARID2 might help to recruit PRC2. A low initial level of binding to CpG islands could provide the opportunity for PcG complexes to colonize a large class of target genes and, when conditions are suitable, to establish a bivalent state or even a repressed state12.

Certain mammalian DNA-binding transcriptional regulators have been reported to recruit PRC complexes to their binding sites. In mice, PRC1 and PRC2 colocalize with subsets of sites that bind to the neuronal
inhibitor REST or to repressors of the SNAIL family and depend on these factors to repress the genes that are associated with those sites78,79.

Figure 3 | Targeting of Polycomb group complexes. a | In Drosophila melanogaster, Polycomb response elements (PREs) mediate the recruitment of all known Polycomb group (PcG) complexes, including Pho repressive complex (PhoRC) and Polycomb repressive deubiquitinase complex (PR‑DUB), which contribute to stabilizing the binding of Polycomb repressive complex 1 (PRC1) and PRC2. Although, with the exception of PhoRC, the precise DNA-binding determinants are not known, several are thought to contribute cooperatively. Note that PREs also recruit Trithorax (Trx), a histone methyltransferase that counteracts PcG-mediated repression, and such recruitment turns PREs into switchable memory elements. Shapes and colours of the complexes are coordinated to identify corresponding mammalian and fly homologues. b | The mammalian recruitment platform is probably modular. Experimental evidence indicates that the existence of PRE-like modules (PRE‑L) is sufficient for the recruitment of PRC1 and that CpG-rich modules can recruit PRC2 and E3 ubiquitin-protein ligase RING1–lysine-specific demethylase 2B (KDM2B) complexes. In addition, non-coding RNAs (ncRNAs) may help to recruit PRC1 and PRC2, but it is not known how ncRNAs target specific chromatin regions. We envision that various combinations of the two modules and/or ncRNAs are used at different target genes and that appropriate interactions turn the weak recruitment of any individual component into a robust targeting mechanism. Whether mixed-lineage leukaemia 1 (MLL1) and MLL2, the mammalian counterparts of Trx, are also concomitantly recruited is unknown.

Recruitment by non-coding RNA. An alternative, and apparently entirely independent, recruitment mechanism makes use of RNA molecules either as a scaffold to assemble complexes or as a targeting device. In several cases, compelling evidence has shown that non-coding RNAs (ncRNAs) bind to PcG complexes and that these RNAs are important for PcG-mediated regulation of some targets (FIG. 3b). The ncRNA HOX transcript antisense RNA (HOTAIR) from the human HOXC gene cluster binds to PRC2, as well as to the Co‑REST complex that contains the H3K4 demethylase KDM1A (also known as LSD1), and recruits them both in cis to HOXC genes and in trans to HOXD genes80,81. When overexpressed, HOTAIR also recruits PRC2 to many other genomic sites, which are often developmentally regulated genes, but the basis for such targeting is unclear82. However, in mice, deletion of the Hotair gene has no effect on PcG complex binding or on transcriptional regulation, which indicates a divergent or redundant function83. The ncRNA-based recruitment of PRC2 is essential at various stages in the establishment of mammalian X chromosome inactivation. A sequence contained in three overlapping transcripts (RepA, inactive X-specific transcripts (Xist) and X (inactive)-specific transcript, opposite strand (Tsix)) at the X inactivation centre binds to PRC2 and initiates the process that eventually spreads its binding in cis, together with PRC1, over large parts of the inactive X chromosome84. The ncRNA ANRIL from the human CDKN2A–CDKN2B (which encodes INK4B (also known as p15)) locus binds to a PRC1‑related complex that contains CBX7 and, together with H3K27me3, recruits the complex to the locus to promote cell cycle progression85.

The molecular details of the interactions of ncRNAs with either PRC1 or PRC2 are still unclear, but it is likely that they differ in different situations. The allele-specific recruitment, such as that involved in X inactivation or imprinted gene silencing, seems to be easier to understand if the PcG complexes bind to nascent ncRNA86. Less clear is the action in trans. In some cases, ncRNAs may recruit PcG complexes to homologous sequences; in other cases, ncRNAs may have a scaffolding function that brings together multiple chromatin regulators, but it is not known whether base pairing has a role in targeting. How pervasive the involvement of ncRNAs might be is currently unclear. Genome-wide screens for RNAs that bind to PcG complexes have been reported to yield thousands of RNA species87,88. It has also been claimed that, in mouse and human embryonic stem cells, short RNAs produced from the 5ʹ region of PcG-repressed genes bind to PRC2 and retain it at those genes, thus contributing to repression89. At this stage, it is probably unwise to assume that all RNA molecules that seem to associate with PcG complexes are in fact functionally involved in repression, but some of them clearly play a part.

Unrecruited activities. The most abundant product of PRC2 activity is not the H3K27me3 mark that is associated with PcG-mediated repression but H3K27me2, which is broadly distributed and accounts for 50–60%
of total nuclear histone H3 (REF. 35). The dimethylated state of H3K27 is ubiquitous and is depleted only at sites that contain H3K27me3 and at sites of transcriptional activity. The global dimethylation state must be attributed to a ‘hit-and-run’ activity of PRC2. It is accompanied by a low but measurable amount of trimethylation, the deposition of which is a much slower process. The role of this widespread methylation is debatable, but it is most likely to be important for the establishment of H3K27me3 domains and of other repressed chromatin domains90. Similarly, there are indications that a low global level of H2A ubiquitylation that is dependent on RING-containing complexes is also detectable. These low-level distributions might have little effect on their own, but they might be important as ‘seeds’ for the establishment of more targeted repressive activities if we imagine PcG-mediated repression to occur opportunistically as suggested for CpG islands12.

Regulation of PcG complexes

Much evidence suggests that the activity of PcG complexes is modulated at various levels. Several remarkable features of the core components of PRC2 allow its activity to be modulated by inputs from surrounding chromatin. Briefly, the presence of H3K4me3, H3K36me2 or H3K36me3 decreases the catalytic activity of PRC2 (REFS 91–93), whereas high nucleosome density and the presence of H3K27me2 or H3K27me3 stimulate its catalytic activity94,95, thus favouring the maintenance of the methylated state. PRC1 and its variant complexes also seem to be regulated. For example, the human CBX4 protein has been reported to function as an E3 SUMO transferase96,97. PRC1 components themselves are sumoylated through an interaction that is mediated by sterile α-motif (SAM) domains98, and the sumoylation of CBX4 seems to be important for its recruitment to target genes or for the stabilization of the repressive complexes99. Sumoylation of human BMI1 by CBX4 has also been found to be involved in the recruitment of PRC1 to sites of DNA damage100. Phosphorylation of various PRC1 components has been reported, but its functions in modulating PRC1 activities are poorly understood. Phosphorylation of human BMI1 by MAP kinase-activated protein kinase 3 results in the dissociation of BMI1 from its binding sites101, whereas phosphorylation of MEL18 does not preclude its binding to chromatin and, in fact, increases the ability of RING2–MEL18 complexes to ubiquitylate nucleosomal H2A (REF. 102).

PcG complexes and disease

PcG mechanisms modulate the expression of most genes that control differentiation, specify cell lineages in development and regulate morphogenesis. The loss of basic Polycomb functions results in early embryonic lethality1,2,32,103,104. Mutations in PcG proteins may alter the response of a PcG-target gene and result in disease. The two examples discussed below are remarkable because they illustrate both interesting functional properties of PRC2 (and its variant complexes) and the still puzzling fact that both hyperactivity and loss of activity of PRC2 can produce oncogenic disease.

Altered levels of PcG proteins have been linked to cancer. The best known example is BMI1 and its role in promoting B cell lymphomas105. The overexpression of BMI1 and a few other components of PRC1 was also found in other types of haematological neoplasms (reviewed in REFS 106,107), as well as in medulloblastoma108 and non-small-cell lung cancer109. The oncogenic function of BMI1 and other PRC1 components has mainly been attributed to their repression of the CDKN2A locus (FIG. 4a), which, when expressed, restricts cell proliferation, but the inappropriate repression of other tumour suppressor genes may also be involved110,111.

The overexpression of EZH2 and SUZ12 has been linked to haematological and other malignancies (reviewed in REF. 112). In addition, certain recurring mutations in the catalytic domain of EZH2 were found in some types of B cell lymphomas. These mutations alter the substrate preference and/or processivity, which leads to increased levels of total H3K27me3 in the nucleus113,114. A specific small-molecule inhibitor of PRC2 catalytic activity arrests proliferation of these cancer cells, which shows the causal role of the mutations and provides hope that, one day, such inhibitors may be used as a treatment for patients with ‘activating’ mutations in EZH2 (REFS 115,116).

Although the implication of PRC1 components in cancer is associated with their overexpression, surprisingly, the deletion of Ezh2 in mice was found to cause a high frequency of spontaneous γδT cell acute lymphoblastic leukaemia117. Added to this, two recent studies have shown that missense Lys27Met mutations in genes that encode the human histones H3.3 and H3.1 inhibit the genome-wide histone methyltransferase activity of PRC2 and occur frequently in paediatric brain cancers of diffuse intrinsic pontine glioma type118,119.

The apparent tumour suppressor role of PRC2 components but not of PRC1 components is unusual and indicates that functions of EZH2 and PRC2 outside the canonical PcG mechanism may be involved. Supporting this notion are the two recent studies of the role of EZH2 in castration-resistant prostate cancer and breast cancer cells. In one study120, castration-resistant prostate cancer cells overexpressed EZH2, which was hyperphosphorylated at serine 21, probably owing to increased levels of activated AKT kinase. Phosphorylated EZH2 associates with the androgen receptor, binds to its target genes and stimulates its transcriptional activity. Strikingly, this is not accompanied by the binding of other PRC2 core subunits or by increased H3K27me3 levels, but it does require the methyltransferase activity of EZH2, which directly or indirectly causes methylation of the androgen receptor. Taken together, these observations indicate that the phosphorylation of EZH2 can switch its function from a PcG repressor to a transcriptional co‑activator through a PRC2‑independent methylation of a non-histone protein (FIG. 4b). In a second study121, EZH2 monomethylated nuclear receptor RORα at lysine 38 independently of other PRC2 subunits. Methylated RORα is specifically recognized by DCAF1 (DDB1 and CUL4‑associated factor 1), which targets it for
degradation. In breast cancer cells, the levels of EZH2 and RORα are inversely correlated, and either the overexpression of RORα or the knockdown of DCAF1 reduces proliferation of MCF7 cells. These reports suggest that the well-known correlation between EZH2 overexpression and tumour aggressiveness is partly due to methylation-dependent degradation of tumour suppressor proteins such as RORα121. To conclude, the emerging evidence indicates that EZH2, when present at high levels, can methylate proteins other than histones independently of other PRC2 components. It remains to be seen whether PRC2‑independent E(Z) activity also has a role in untransformed cells and whether this requires the cellular E(Z) pool to exceed the levels of other PRC2 components.

Figure 4 | The roles of Polycomb group proteins in cancer. a | Overexpressed BMI1 and MYC cooperate in driving the proliferation of blood cancer cells. High levels of MYC increase cell proliferation, but they also activate the expression of ARF and INK4A (alternatively spliced isoforms encoded by the cyclin-dependent kinase inhibitor 2A (CDKN2A) locus), both of which trigger cellular senescence and counteract proliferation. When BMI1 is overexpressed together with MYC, it drives the recruitment of both Polycomb repressive complex 1 (PRC1; red circles) and PRC2 (green circles) to the CDKN2A locus, which leads to the repression of these genes and, consequently, to uncontrolled cell proliferation106. The genes encoding INK4B (also known as p15; encoded by CDKN2B), ARF and INK4A are contained in a short ~35 kb stretch of the human genome. Whereas CDKN2B has a physically distinct open reading frame, ARF and INK4A have different promoters but share the last two exons (black rectangles). Although the last two exons are common to both ARF and INK4A, the proteins are encoded by alternative open reading frames and bear no similarity. There is no evidence of interplay between MYC and Polycomb group (PcG) proteins in the regulation of CDKN2B, but all three genes of the CDKN2B–CDKN2A locus are targeted by PcG proteins in some cells. The non-coding RNA ANRIL (also known as CDKN2B antisense RNA 1; wavy arrow) is involved in the targeting of PcG proteins to CDKN2B. b | In castration-resistant prostate cancer cells, high levels of enhancer of zeste homologue 2 (EZH2) and activated AKT kinase (p-AKT) lead to the phosphorylation (P) of a proportion of EZH2 at serine 21 (white hexagon)118. Unphosphorylated EZH2 is incorporated into PRC2 (and its variant complexes) and participates in the repression of PcG-target genes, whereas phosphorylated EZH2 binds to androgen receptor, which leads to androgen receptor methylation (me) and the stimulation of transcriptional activity of androgen receptor-target genes. Dashed arrows indicate the enzymatic actions of p-AKT and EZH2. AEBP2, zinc-finger protein AEBP2; RBBP4, histone-binding protein RBBP4.

Conclusions

Recent advances in the biochemical characterization of mammalian RING2, RING1 and E(Z) complexes, and in the genome-wide mapping of the binding sites of these complexes, have revealed an unexpected diversity. Some of these complexes are clearly involved in PcG-mediated repression, whereas the function of others remains to be determined. The biochemical studies should now be followed by in‑depth genetic and genomic experiments to probe the functional roles of each of the RING2, RING1 and E(Z) complexes, to investigate the poorly understood importance of the numerous variants, as well as to understand how their functions differ or complement one another and their differential role in different tissues or processes. In this case, the D. melanogaster model is likely to be a useful starting point owing to the lower redundancy of the PcG protein family, the smaller genome size and the availability of genetic tools.

Important questions to be tackled concern the functional role of the H2AK119ub mark and how it contributes to transcriptional repression. We need to learn more about the role of the pervasive H3K27 dimethylation of the transcriptionally inactive genome. Equally exciting is the idea of non-histone substrates that E(Z) homologues may methylate independently of other PRC2 core subunits. Several reports have suggested that certain forms of PcG proteins have a surprising role in activating transcription. Finally, but no less importantly, we need to understand the timing of the alternative RING2 and RING1 or PRC2 complexes at a given gene during the cell cycle and their relationships. We might then finally be in a position to chart a dynamic picture of PcG-mediated regulation, in which the turnover of both PcG complexes and histone marks yields epigenetically stable transcriptional repression, and to relate this picture to the roles of some PcG members beyond their classical repressive function.
1. Jürgens, G. A group of genes controlling the spatial expression of the bithorax complex in Drosophila. Nature 316, 153–155 (1985).
2. Gaytán de Ayala Alonso, A. et al. Genetic screen identifies novel Polycomb group genes in Drosophila. Genetics 176, 2099–2108 (2007).
3. Poux, S., Kostic, C. & Pirrotta, V. Hunchback-independent silencing of late Ubx enhancers by a Polycomb group response element. EMBO J. 15, 4713–4722 (1996).
4. Leeb, M. et al. Polycomb complexes act redundantly to repress genomic repeats and genes. Genes Dev. 24, 265–276 (2010).
5. Zhou, W. et al. Histone H2A monoubiquitination represses transcription by inhibiting RNA polymerase II transcriptional elongation. Mol. Cell 29, 69–80 (2008).
6. Dellino, G. I. et al. Polycomb silencing blocks transcription initiation. Mol. Cell 13, 887–893 (2004).

7. Grau, D. J. et al. Compaction of chromatin by diverse Polycomb group proteins requires localized regions of high charge. Genes Dev. 25, 2210–2221 (2011).
8. Eskeland, R. et al. Ring1B compacts chromatin structure and represses gene expression independent of histone ubiquitination. Mol. Cell 38, 452–464 (2010).
9. Gutiérrez, L. et al. The role of the histone H2A ubiquitinase Sce in Polycomb repression. Development 139, 117–127 (2012).
10. Schwartz, Y. B. et al. Genome-wide analysis of Polycomb targets in Drosophila melanogaster. Nature Genet. 38, 700–705 (2006).
11. Lagarou, A. et al. dKDM2 couples histone H2A ubiquitylation to histone H3 demethylation during Polycomb group silencing. Genes Dev. 22, 2799–2810 (2008).
    This study gives the first evidence of alternative protein complexes that contain core PRC1 components.
12. Farcas, A. M. et al. KDM2B links the Polycomb repressive complex 1 (PRC1) to recognition of CpG islands. eLife 1, e00205 (2012).
13. Wu, X., Johansen, J. V. & Helin, K. Fbxl10/Kdm2b recruits Polycomb repressive complex 1 to CpG islands and regulates H2A ubiquitylation. Mol. Cell 49, 1134–1146 (2013).
14. He, J. et al. Kdm2b maintains murine embryonic stem cell status by recruiting PRC1 complex to CpG islands of developmental genes. Nature Cell Biol. 15, 373–384 (2013).
    References 12–14 show that the zinc-finger-CxxC DNA-binding domain of KDM2B recruits a variant RING2 or RING1 complex to unmethylated CpG, thereby contributing to PcG-mediated repression in mouse embryonic stem cells.
15. Gao, Z. et al. PCGF homologs, CBX proteins, and RYBP define functionally distinct PRC1 family complexes. Mol. Cell 45, 344–356 (2012).
    This key paper provides a comprehensive description of variant RING1 and RING2 protein complexes.
16. Tavares, L. et al. RYBP–PRC1 complexes mediate H2A ubiquitylation at Polycomb target sites independently of PRC2 and H3K27me3. Cell 148, 664–678 (2012).
17. Pfau, R. et al. Members of a family of JmjC domain-containing oncoproteins immortalize embryonic fibroblasts via a JmjC domain-dependent process. Proc. Natl Acad. Sci. USA 105, 1907–1912 (2008).
18. Tzatsos, A., Pfau, R., Kampranis, S. C. & Tsichlis, P. N. Ndy1/KDM2B immortalizes mouse embryonic fibroblasts by repressing the Ink4a/Arf locus. Proc. Natl Acad. Sci. USA 106, 2641–2646 (2009).
19. Wang, R. et al. Polycomb group targeting through different binding partners of RING1B C-terminal domain. Structure 18, 966–975 (2010).
20. Vandamme, J., Völkel, P., Rosnoblet, C., Le Faou, P. & Angrand, P.-O. Interaction proteomics analysis of Polycomb proteins defines distinct PRC1 complexes in mammalian cells. Mol. Cell. Proteomics 10, M110.002642 (2011).
21. Cao, R., Tsukada, Y.-I. & Zhang, Y. Role of Bmi-1 and Ring1A in H2A ubiquitylation and Hox gene silencing. Mol. Cell 20, 845–854 (2005).
22. Wei, J., Zhai, L., Xu, J. & Wang, H. Role of Bmi1 in H2A ubiquitylation and hox gene silencing. J. Biol. Chem. 281, 22537–22544 (2006).
23. Wu, X. et al. Cooperation between EZH2, NSPc1-mediated histone H2A ubiquitination and Dnmt1 in HOX gene silencing. Nucleic Acids Res. 36, 3590–3599 (2008).
24. Morey, L. et al. Nonoverlapping functions of the Polycomb group Cbx family of proteins in embryonic stem cells. Cell Stem Cell 10, 47–62 (2012).
25. O’Loghlen, A. et al. MicroRNA regulation of Cbx7 mediates a switch of Polycomb orthologs during ESC differentiation. Cell Stem Cell 10, 33–46 (2012).
26. Klauke, K. et al. Polycomb Cbx family members mediate the balance between haematopoietic stem cell self-renewal and differentiation. Nature Cell Biol. 15, 353–362 (2013).
27. Iwama, A. et al. Enhanced self-renewal of hematopoietic stem cells mediated by the Polycomb gene product Bmi-1. Immunity 21, 843–851 (2004).
28. van der Lugt, N. M. et al. Posterior transformation, neurological abnormalities, and severe hematopoietic defects in mice with a targeted deletion of the bmi-1 proto-oncogene. Genes Dev. 8, 757–769 (1994).
29. Trojer, P. et al. L3MBTL2 protein acts in concert with PcG protein-mediated monoubiquitination of H2A to establish a repressive chromatin structure. Mol. Cell 42, 438–450 (2011).
30. Ogawa, H., Ishiguro, K., Gaubatz, S., Livingston, D. M. & Nakatani, Y. A complex with chromatin modifiers that occupies E2F- and Myc-responsive genes in G0 cells. Science 296, 1132–1136 (2002).
31. Shen, X. et al. EZH1 mediates methylation on histone H3 lysine 27 and complements EZH2 in maintaining stem cell identity and executing pluripotency. Mol. Cell 32, 491–502 (2008).
32. O’Carroll, D. et al. The Polycomb-group gene Ezh2 is required for early mouse development. Mol. Cell. Biol. 21, 4330–4336 (2001).
33. Ezhkova, E. et al. EZH1 and EZH2 cogovern histone H3K27 trimethylation and are essential for hair follicle homeostasis and wound repair. Genes Dev. 25, 485–498 (2011).
34. Kuzmichev, A., Jenuwein, T., Tempst, P. & Reinberg, D. Different Ezh2-containing complexes target methylation of histone H1 or nucleosomal histone H3. Mol. Cell 14, 183–193 (2004).
35. Peters, A. H. F. M. et al. Partitioning and plasticity of repressive histone methylation states in mammalian chromatin. Mol. Cell 12, 1577–1589 (2003).
36. Jung, H. R., Pasini, D., Helin, K. & Jensen, O. N. Quantitative mass spectrometry of histones H3.2 and H3.3 in Suz12-deficient mouse embryonic stem cells reveals distinct, dynamic post-translational modifications at lys-27 and lys-36. Mol. Cell. Proteomics 9, 838–850 (2010).
37. Voigt, P. et al. Asymmetrically modified nucleosomes. Cell 151, 181–193 (2012).
38. Tie, F., Prasad-Sinha, J., Birve, A., Rasmuson-Lestander, Å. & Harte, P. J. A 1-megadalton ESC/E(Z) complex from Drosophila that contains Polycomblike and RPD3. Mol. Cell. Biol. 23, 3352–3362 (2003).
39. Nekrasov, M. et al. Pcl–PRC2 is needed to generate high levels of H3-K27 trimethylation at Polycomb target genes. EMBO J. 26, 4078–4088 (2007).
40. Sarma, K., Margueron, R., Ivanov, A., Pirrotta, V. & Reinberg, D. Ezh2 requires PHF1 to efficiently catalyze H3 lysine 27 trimethylation in vivo. Mol. Cell. Biol. 28, 2718–2731 (2008).
41. Cao, R. et al. Role of hPHF1 in H3K27 methylation and Hox gene silencing. Mol. Cell. Biol. 28, 1862–1872 (2008).
42. Ballaré, C. et al. Phf19 links methylated Lys36 of histone H3 to regulation of Polycomb activity. Nature Struct. Mol. Biol. 19, 1257–1265 (2012).
43. Cai, L. et al. An H3K36 methylation-engaging Tudor motif of Polycomb-like proteins mediates PRC2 complex targeting. Mol. Cell 49, 571–582 (2013).
44. Savla, U., Benes, J., Zhang, J. & Jones, R. S. Recruitment of Drosophila Polycomb-group proteins by Polycomblike, a component of a novel protein complex in larvae. Development 135, 813–817 (2008).
    References 39–44 combine the current knowledge of the features and roles of PRC2–PCL complexes.
45. Duncan, I. M. Polycomblike: a gene that appears to be required for the normal expression of the Bithorax and Antennapedia gene complexes of Drosophila melanogaster. Genetics 102, 49–70 (1982).
46. Papp, B. & Müller, J. Histone trimethylation and the maintenance of transcriptional ON and OFF states by trxG and PcG proteins. Genes Dev. 20, 2041–2054 (2006).
47. Walker, E. et al. Polycomb-like 2 associates with PRC2 and regulates transcriptional networks during mouse embryonic stem cell self-renewal and differentiation. Cell Stem Cell 6, 153–166 (2010).
48. Hunkapiller, J. et al. Polycomb-Like 3 promotes Polycomb repressive complex 2 binding to CpG Islands and embryonic stem cell self-renewal. PLoS Genet. 8, e1002576 (2012).
49. Musselman, C. A. et al. Molecular basis for H3K36me3 recognition by the Tudor domain of PHF1. Nature Struct. Mol. Biol. 19, 1266–1272 (2012).
50. Friberg, A., Oddone, A., Klymenko, T., Müller, J. & Sattler, M. Structure of an atypical Tudor domain in the Drosophila Polycomblike protein. Protein Sci. 19, 1906–1916 (2010).
51. Shen, X. et al. Jumonji modulates Polycomb activity and self-renewal versus differentiation of stem cells. Cell 139, 1303–1314 (2009).
52. Peng, J. C. et al. Jarid2/Jumonji coordinates control of PRC2 enzymatic activity and target gene occupancy in pluripotent cells. Cell 139, 1290–1302 (2009).
53. Pasini, D. et al. JARID2 regulates binding of the Polycomb repressive complex 2 to target genes in ES cells. Nature 464, 306–310 (2010).
54. Li, G. et al. Jarid2 and PRC2, partners in regulating gene expression. Genes Dev. 24, 368–380 (2010).
55. Landeira, D. et al. Jarid2 is a PRC2 component in embryonic stem cells required for multi-lineage differentiation and recruitment of PRC1 and RNA Polymerase II to developmental regulators. Nature Cell Biol. 12, 618–624 (2010).
56. Herz, H.-M. et al. Polycomb repressive complex 2-dependent and -independent functions of Jarid2 in transcriptional regulation in Drosophila. Mol. Cell. Biol. 32, 1683–1693 (2012).
    References 51–56 outline the properties of PRC2–JARID2 complexes.
57. Takeuchi, T. et al. Gene trap capture of a novel mouse gene, Jumonji, required for neural tube formation. Genes Dev. 9, 1211–1222 (1995).
58. Lee, Y. et al. Jumonji, a nuclear protein that is necessary for normal heart development. Circ. Res. 86, 932–938 (2000).
59. He, A. et al. PRC2 directly methylates GATA4 and represses its transcriptional activity. Genes Dev. 26, 37–42 (2012).
60. Su, I.-h. et al. Polycomb group protein Ezh2 controls actin polymerization and cell signaling. Cell 121, 425–436 (2005).
61. Furuyama, T., Banerjee, R., Breen, T. R. & Harte, P. J. SIR2 is required for Polycomb silencing and is associated with an E(z) histone methyltransferase complex. Curr. Biol. 14, 1812–1821 (2004).
62. Kuzmichev, A. et al. Composition and histone substrates of Polycomb repressive group complexes change during cellular differentiation. Proc. Natl Acad. Sci. USA 102, 1859–1864 (2005).
63. Chan, C.-S., Rastelli, L. & Pirrotta, V. A Polycomb response element in the Ubx gene that determines an epigenetically inherited state of repression. EMBO J. 13, 2553–2564 (1994).
64. Müller, J. & Kassis, J. A. Polycomb response elements and targeting of Polycomb group proteins in Drosophila. Curr. Opin. Genet. Dev. 16, 476–484 (2006).
65. Horard, B., Tatout, C., Poux, S. & Pirrotta, V. Structure of a Polycomb response element and in vitro binding of Polycomb group complexes containing GAGA factor. Mol. Cell. Biol. 20, 3187–3197 (2000).
66. Kahn, T. G., Schwartz, Y. B., Dellino, G. I. & Pirrotta, V. Polycomb complexes and the propagation of the methylation mark at the Drosophila Ubx gene. J. Biol. Chem. 281, 29064–29075 (2006).
67. Kharchenko, P. V. et al. Comprehensive analysis of the chromatin landscape in Drosophila melanogaster. Nature 471, 480–485 (2011).
68. Hodgson, J. W., Argiropoulos, B. & Brock, H. W. Site-specific recognition of a 70-base-pair element containing d(GA)(n) repeats mediates bithoraxoid Polycomb group response element-dependent silencing. Mol. Cell. Biol. 21, 4528–4543 (2001).
69. Dejardin, J. et al. Recruitment of Drosophila Polycomb group proteins to chromatin by DSP1. Nature 434, 533–538 (2005).
70. Ren, X., Vincenz, C. & Kerppola, T. K. Changes in the distributions and dynamics of Polycomb repressive complexes during embryonic stem cell differentiation. Mol. Cell. Biol. 28, 2884–2895 (2008).
71. Sing, A. et al. A vertebrate Polycomb response element governs segmentation of the posterior hindbrain. Cell 138, 885–897 (2009).
    This paper is the first report of the mammalian PRE-like element. In contrast to D. melanogaster PREs, this element can recruit PRC1 but not PRC2.
72. Woo, C. J., Kharchenko, P. V., Daheron, L., Park, P. J. & Kingston, R. E. A region of the human HOXD cluster that confers Polycomb-group responsiveness. Cell 140, 99–110 (2010).
73. Ku, M. et al. Genomewide analysis of PRC1 and PRC2 occupancy identifies two classes of bivalent domains. PLoS Genet. 4, e1000242 (2008).
74. Mendenhall, E. M. et al. GC-rich sequence elements recruit PRC2 in mammalian ES cells. PLoS Genet. 6, e1001244 (2010).
75. Lynch, M. D. et al. An interspecies analysis reveals a key role for unmethylated CpG dinucleotides in vertebrate Polycomb complex recruitment. EMBO J. 31, 317–329 (2012).
76. Thomson, J. P. et al. CpG islands influence chromatin structure via the CpG-binding protein Cfp1. Nature 464, 1082–1086 (2010).

77. Blackledge, N. P. et al. CpG islands recruit a histone H3 lysine 36 demethylase. Mol. Cell 38, 179–190 (2010).
78. Dietrich, N. et al. REST-mediated recruitment of Polycomb repressor complexes in mammalian cells. PLoS Genet. 8, e1002494 (2012).
79. Arnold, P. et al. Modeling of epigenome dynamics identifies transcription factors that mediate Polycomb targeting. Genome Res. 23, 60–73 (2013).
80. Rinn, J. L. et al. Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 129, 1311–1323 (2007).
81. Tsai, M.-C. et al. Long noncoding RNA as modular scaffold of histone modification complexes. Science 329, 689–693 (2010).
82. Gupta, R. A. et al. Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature 464, 1071–1076 (2010).
83. Schorderet, P. & Duboule, D. Structural and functional differences in the long non-coding RNA Hotair in mouse and human. PLoS Genet. 7, e1002071 (2011).
84. Zhao, J., Sun, B. K., Erwin, J. A., Song, J.-J. & Lee, J. T. Polycomb proteins targeted by a short repeat RNA to the mouse X chromosome. Science 322, 750–756 (2008).
85. Yap, K. L. et al. Molecular interplay of the noncoding RNA ANRIL and methylated histone H3 lysine 27 by Polycomb CBX7 in transcriptional silencing of INK4a. Mol. Cell 38, 662–667 (2010).
86. Lee, J. T. Epigenetic regulation by long noncoding RNAs. Science 338, 1435–1439 (2012).
87. Khalil, A. M. et al. Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc. Natl Acad. Sci. USA 106, 11667–11672 (2009).
88. Zhao, J. et al. Genome-wide identification of Polycomb-associated RNAs by RIP–seq. Mol. Cell 40, 939–953 (2010).
89. Kanhere, A. et al. Short RNAs are transcribed from repressed Polycomb target genes and interact with Polycomb repressive complex-2. Mol. Cell 38, 675–688 (2010).
90. Ebert, A. et al. Su(var) genes regulate the balance between euchromatin and heterochromatin in Drosophila. Genes Dev. 18, 2973–2983 (2004).
91. Ketel, C. S. et al. Subunit contributions to histone methyltransferase activities of fly and worm Polycomb group complexes. Mol. Cell. Biol. 25, 6857–6868 (2005).
92. Schmitges, F. W. et al. Histone methylation by PRC2 is inhibited by active chromatin marks. Mol. Cell 42, 330–341 (2011).
93. Yuan, W. et al. H3K36 methylation antagonizes PRC2-mediated H3K27 methylation. J. Biol. Chem. 286, 7983–7989 (2011).
94. Yuan, W. et al. Dense chromatin activates Polycomb repressive complex 2 to regulate H3 lysine 27 methylation. Science 337, 971–975 (2012).
95. Margueron, R. et al. Role of the Polycomb protein EED in the propagation of repressive histone marks. Nature 461, 762–767 (2009).
    References 92–95 report the effects of pre-existing H3K27, H3K36 and H3K4 methylation marks and of nucleosome density on the catalytic activity of PRC2.
96. Kagey, M. H., Melhuish, T. A. & Wotton, D. The Polycomb protein Pc2 is a SUMO E3. Cell 113, 127–137 (2003).
97. Kagey, M. H., Melhuish, T. A., Powers, S. E. & Wotton, D. Multiple activities contribute to Pc2 E3 function. EMBO J. 24, 108–119 (2005).
98. Zhang, H. et al. SUMO modification is required for in vivo Hox gene regulation by the Caenorhabditis elegans Polycomb group protein SOP-2. Nature Genet. 36, 507–511 (2004).
99. Kang, X. et al. SUMO-specific protease 2 is essential for suppression of Polycomb group protein-mediated gene silencing during embryonic development. Mol. Cell 38, 191–201 (2010).
100. Ismail, I. H. et al. CBX4-mediated SUMO modification regulates BMI1 recruitment at sites of DNA damage. Nucleic Acids Res. 40, 5497–5510 (2012).
101. Voncken, J. W. et al. MAPKAP Kinase 3pK phosphorylates and regulates chromatin association of the Polycomb group protein Bmi1. J. Biol. Chem. 280, 5178–5187 (2005).
102. Elderkin, S. et al. A phosphorylated form of Mel-18 targets the Ring1B histone H2A ubiquitin ligase to chromatin. Mol. Cell 28, 107–120 (2007).
103. Schumacher, A., Faust, C. & Magnuson, T. Positional cloning of a global regulator of anterior-posterior patterning in mice. Nature 383, 250–253 (1996).
104. Voncken, J. W. et al. Rnf2 (Ring1b) deficiency causes gastrulation arrest and cell cycle inhibition. Proc. Natl Acad. Sci. USA 100, 2468–2473 (2003).
105. van Lohuizen, M. et al. Identification of cooperating oncogenes in Eμ-myc transgenic mice by provirus tagging. Cell 65, 737–752 (1991).
106. Sparmann, A. & van Lohuizen, M. Polycomb silencers control cell fate, development and cancer. Nature Rev. Cancer 6, 846–856 (2006).
107. Radulovic, V., de Haan, G. & Klauke, K. Polycomb-group proteins in hematopoietic stem cell regulation and hematopoietic neoplasms. Leukemia 27, 523–533 (2013).
108. Leung, C. et al. Bmi1 is essential for cerebellar development and is overexpressed in human medulloblastomas. Nature 428, 337–341 (2004).
109. Vonlanthen, S. et al. The bmi-1 oncoprotein is differentially expressed in non-small cell lung cancer and correlates with INK4A–ARF locus expression. Br. J. Cancer 84, 1372–1376 (2001).
110. Bruggeman, S. W. et al. Bmi1 controls tumor development in an Ink4a/Arf-independent manner in a mouse model for glioma. Cancer Cell 12, 328–341 (2007).
111. Gargiulo, G. et al. In vivo RNAi screen for bmi1 targets identifies TGF-β/BMP–ER stress pathways as key regulators of neural- and malignant glioma-stem cell homeostasis. Cancer Cell 23, 660–676 (2013).
112. Simon, J. A. & Lange, C. A. Roles of the Ezh2 histone methyltransferase in cancer epigenetics. Mut. Res. 647, 21–29 (2008).
113. Sneeringer, C. J. et al. Coordinated activities of wild-type plus mutant EZH2 drive tumor-associated hypertrimethylation of lysine 27 on histone H3 (H3K27) in human B-cell lymphomas. Proc. Natl Acad. Sci. USA 107, 20980–20985 (2010).
114. McCabe, M. T. et al. Mutation of A677 in histone methyltransferase EZH2 in human B-cell lymphoma promotes hypertrimethylation of histone H3 on lysine 27 (H3K27). Proc. Natl Acad. Sci. USA 109, 2989–2994 (2012).
115. McCabe, M. T. et al. EZH2 inhibition as a therapeutic strategy for lymphoma with EZH2-activating mutations. Nature 492, 108–112 (2012).
116. Knutson, S. K. et al. A selective inhibitor of EZH2 blocks H3K27 methylation and kills mutant lymphoma cells. Nature Chem. Biol. 8, 890–896 (2012).
    References 115 and 116 show that treatment with small-molecule inhibitors is a promising therapeutic strategy for lymphomas that have catalytically hyperactive variants of EZH2.
117. Simon, C. et al. A key role for EZH2 and associated genes in mouse and human adult T-cell acute leukemia. Genes Dev. 26, 651–656 (2012).
118. Chan, K. M. et al. The histone H3.3K27M mutation in pediatric glioma reprograms H3K27 methylation and gene expression. Genes Dev. 27, 985–990 (2013).
119. Lewis, P. W. et al. Inhibition of PRC2 activity by a gain-of-function H3 mutation found in pediatric glioblastoma. Science 340, 857–861 (2013).
120. Xu, K. et al. EZH2 oncogenic activity in castration-resistant prostate cancer cells is Polycomb-independent. Science 338, 1465–1469 (2012).
    This paper reports a surprising PRC2-independent role of EZH2 in the proliferation of castration-resistant prostate cancer cells.
121. Lee, J. M. et al. EZH2 generates a methyl degron that is recognized by the DCAF1/DDB1/CUL4 E3-ubiquitin ligase complex. Mol. Cell 48, 572–586 (2012).
122. Shao, Z. et al. Stabilization of chromatin structure by PRC1, a Polycomb complex. Cell 98, 37–46 (1999).
123. Francis, N. J., Saurin, A. J., Shao, Z. & Kingston, R. E. Reconstitution of a functional core Polycomb repressive complex. Mol. Cell 8, 545–556 (2001).
124. Saurin, A. J., Shao, Z., Erdjument-Bromage, H., Tempst, P. & Kingston, R. E. A Drosophila Polycomb group complex includes Zeste and dTAFII proteins. Nature 412, 655–660 (2001).
125. Buchwald, G. et al. Structure and E3-ligase activity of the Ring–Ring complex of Polycomb proteins Bmi1 and Ring1b. EMBO J. 25, 2465–2474 (2006).
126. de Napoles, M. et al. Polycomb group proteins Ring1A/B link ubiquitylation of histone H2A to heritable gene silencing and X inactivation. Dev. Cell 7, 663–676 (2004).
127. Wang, H. et al. Role of histone H2A ubiquitination in Polycomb silencing. Nature 431, 873–878 (2004).
128. Bornemann, D., Miller, E. & Simon, J. The Drosophila Polycomb group gene Sex comb on midleg (Scm) encodes a zinc finger protein with similarity to polyhomeotic protein. Development 122, 1621–1630 (1996).
129. Peterson, A. J. et al. A domain shared by the Polycomb group proteins Scm and Ph mediates heterotypic and homotypic interactions. Mol. Cell. Biol. 17, 6683–6692 (1997).
130. Cao, R. et al. Role of histone H3 lysine 27 methylation in Polycomb-group silencing. Science 298, 1039–1043 (2002).
131. Czermin, B. et al. Drosophila Enhancer of Zeste/ESC complexes have a histone H3 methyltransferase activity that marks chromosomal Polycomb sites. Cell 111, 185–196 (2002).
132. Müller, J. et al. Histone methyltransferase activity of a Drosophila Polycomb group repressor complex. Cell 111, 197–208 (2002).
133. Kuzmichev, A., Nishioka, K., Erdjument-Bromage, H., Tempst, P. & Reinberg, D. Histone methyltransferase activity associated with a human multiprotein complex containing the Enhancer of zeste protein. Genes Dev. 22, 2893–2905 (2002).
134. Pengelly, A. R., Copur, Ö., Jäckle, H., Herzig, A. & Müller, J. A histone mutant reproduces the phenotype caused by loss of histone-modifying factor Polycomb. Science 339, 698–699 (2013).
135. Cao, R. & Zhang, Y. SUZ12 is required for both the histone methyltransferase activity and the silencing function of the EED–EZH2 complex. Mol. Cell 15, 57–67 (2004).
136. Kim, H., Kang, K. & Kim, J. AEBP2 as a potential targeting protein for Polycomb repression complex PRC2. Nucleic Acids Res. 37, 2940–2950 (2009).
137. Ciferri, C. et al. Molecular architecture of human Polycomb repressive complex 2. eLife 1, e00005 (2012).
138. Klymenko, T. et al. A Polycomb group protein complex with sequence-specific DNA-binding and selective methyl-lysine-binding activities. Genes Dev. 20, 1110–1122 (2006).
139. Oktaba, K. et al. Dynamic regulation by Polycomb group protein complexes controls pattern formation and the cell cycle in Drosophila. Dev. Cell 15, 877–889 (2008).
140. Mendenhall, E. M. et al. GC-rich sequence elements recruit PRC2 in mammalian ES cells. PLoS Genet. 6, e1001244 (2010).
    This study shows that GC-rich elements that lack transcription activator-binding sites can autonomously recruit mammalian PRC2, even when these elements are derived from bacterial genomes.
141. Vella, P., Barozzi, I., Cuomo, A., Bonaldi, T. & Pasini, D. Yin Yang 1 extends the Myc-related transcription factors network in embryonic stem cells. Nucleic Acids Res. 40, 3403–3418 (2012).
142. Scheuermann, J. C. et al. Histone H2A deubiquitinase activity of the Polycomb repressive complex PR–DUB. Nature 465, 243–247 (2010).
143. Yang, L. et al. ncRNA- and Pc2 methylation-dependent gene relocation between nuclear structures mediates gene activation programs. Cell 147, 773–788 (2011).
144. Schuettengruber, B., Chourrout, D., Vervoort, M., Leblanc, B. & Cavalli, G. Genome regulation by Polycomb and Trithorax proteins. Cell 128, 735–745 (2007).
145. Schwartz, Y. B. & Pirrotta, V. Polycomb silencing mechanisms and the management of genomic programmes. Nature Rev. Genet. 8, 9–22 (2007).

Acknowledgements
The work in the Y.B.S. laboratory is supported by grants from the Swedish Research Council, Carl Tryggers Foundation, Kempestiftelserna, Erik Philip-Sörensens Stiftelse and European Network of Excellence EpiGeneSys. The research of V.P. is supported by the US National Institutes of Health.

Competing interests statement
The authors declare no competing interests.

REVIEWS
High-resolution network biology: connecting sequence with function
Colm J. Ryan¹,², Peter Cimermančič³, Zachary A. Szpiech³, Andrej Sali³–⁵, Ryan D. Hernandez³,⁵,⁶ and Nevan J. Krogan⁵,⁷,⁸
Abstract | Proteins are not monolithic entities; rather, they can contain multiple domains
that mediate distinct interactions, and their functionality can be regulated through
post-translational modifications at multiple distinct sites. Traditionally, network biology
has ignored such properties of proteins and has instead examined either the physical
interactions of whole proteins or the consequences of removing entire genes. In this Review,
we discuss experimental and computational methods to increase the resolution of protein–
protein, genetic and drug–gene interaction studies to the domain and residue levels. Such
work will be crucial for using interaction networks to connect sequence and structural
information, and to understand the biological consequences of disease-associated
mutations, which will hopefully lead to more effective therapeutic strategies.
¹School of Medicine and Medical Science, University College Dublin.
²Complex and Adaptive Systems Laboratory, University College Dublin, Dublin 4, Ireland.
³Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, USA.
⁴Department of Pharmaceutical Chemistry, University of California, San Francisco, USA.
⁵California Institute for Quantitative Biosciences (QB3), University of California, San Francisco, USA.
⁶Institute for Human Genetics, University of California, San Francisco, USA.
⁷Department of Cellular and Molecular Pharmacology, University of California, San Francisco, California 94158, USA.
⁸Gladstone Institutes, San Francisco, California 94158, USA.
Correspondence to N.J.K. e-mail: nevan.krogan@ucsf.edu
doi:10.1038/nrg3574
Published online 7 November 2013

A central challenge in biology is to understand how genotypes map to phenotypes. This mapping is complicated by the fact that genes, and their protein products, do not function independently of each other. Perturbations of multiple distinct genes can result in similar phenotypes (known as locus heterogeneity), whereas some phenotypes may only be observed in the presence of combinatorial perturbations (a type of epistasis¹). Consequently, in order to understand the genotype-to-phenotype problem, it would be beneficial to study proteins and genes in a network context. Since the turn of the century, high-throughput interaction mapping has emerged as an extremely useful approach for providing this context. Large-scale networks have been generated in various model organisms, documenting which proteins physically interact, which gene pairs functionally interact and which genes functionally interact with specific drugs (TABLE 1). These networks have been enormously valuable both for understanding the function of individual genes² and for elucidating the organizing principles of biological systems³ (FIG. 1). However, a necessary limitation of most large-scale network biology screens is that they treat proteins and genes as simple monolithic nodes in a network. In reality, most proteins are composed of multiple domains and peptide motifs that can bind to distinct partners⁴. Furthermore, their activity and cellular localization can be dynamically regulated by various post-translational modifications (PTMs)⁵. Such structural features of proteins are generally ignored by protein interaction screens, in which the typical reported result is of the form ‘protein A interacts with proteins B and C’. Similarly, genetic and drug–gene (chemogenetic) interaction screens typically report on the consequences of removing a gene altogether rather than on the effect of mutating specific residues. In isolation, these approaches can be used to assign function to whole genes or proteins but not to specific regions or residues. However, mapping at such a high resolution is necessary for understanding how protein structure relates to function. It is also necessary for elucidating how different mutations of the same gene may result in different phenotypic outcomes, which is particularly important in the context of understanding the consequences of genome sequence variation. Indeed, mutations that result in the complete loss of a gene or severe truncation of a protein are much rarer than those that alter a single nucleotide or residue⁶ (BOX 1).

In this Review, we provide a description of three types of network biology screens: protein–protein, genetic and drug–gene interactions (FIG. 1). These three network types provide complementary views of the same cellular components. Protein–protein interactions are used to identify the pathways and complexes of the cell, and drug–gene interactions can be used to identify the activities that a pathway is involved in, whereas genetic interactions primarily report on the functional dependencies within and between pathways. We show how the approaches used to map these interactions are

Table 1 | Interaction networks in selected model organisms and in humans

Species | Network type* | Details | Refs
Saccharomyces cerevisiae | Y2H | ~3,000 interactions; ~2,000 proteins | 14
Saccharomyces cerevisiae | AP–MS | ~7,000 interactions; ~2,700 proteins | 16
Saccharomyces cerevisiae | AP–MS | ~500 complexes; ~2,700 proteins | 17
Saccharomyces cerevisiae | Drug–gene | ~6,000 genes; ~400 drugs or conditions | 152
Saccharomyces cerevisiae | Genetic | ~5.4 million measured interactions; ~4,500 genes | 40
Schizosaccharomyces pombe | Genetic | ~1.6 million measured interactions; ~2,400 genes | 38
Schizosaccharomyces pombe | Drug–gene | ~440 genes; 21 drugs or conditions | 64
Schizosaccharomyces pombe | Drug–gene | ~2,500 genes; 6 drugs or conditions | 153
Caenorhabditis elegans | Genetic | ~65,000 measured interactions; ~162 genes | 50
Caenorhabditis elegans | Y2H | ~3,800 interactions; ~2,600 proteins | 154
Drosophila melanogaster | AP–MS | ~550 complexes; ~5,000 proteins | 155
Drosophila melanogaster | Y2H | ~4,800 filtered interactions; ~4,700 proteins | 156
Drosophila melanogaster | Genetic | ~30,000 measured interactions; 93 genes | 46
Drosophila melanogaster | Genetic | ~17,000 measured interactions; ~500 genes | 157
Escherichia coli | AP–MS | ~6,000 interactions; ~1,800 proteins | 158
Escherichia coli | Genetic | ~235,000 measured interactions; ~820 genes | 39
Escherichia coli | Drug–gene | ~4,000 genes; 324 drugs or conditions | 36
Homo sapiens | Fractionation–mass spectrometry | ~14,000 interactions; ~3,000 proteins | 159
Homo sapiens | Drug–gene | 70 genes; 87 drugs | 145
Homo sapiens | Genetic | 878 validated interactions; 12 genes, each tested for interactions using genome-wide RNA interference | 47
Homo sapiens | Genetic | Pairwise genetic interactions among a set of 60 genes through double knockdown using RNA interference | 45

AP–MS, affinity purification–mass spectrometry; Y2H, yeast two-hybrid. *In cases in which multiple networks of the same type were available for a single species, details of the largest network are provided. For cases in which no network was clearly larger, both networks are included.
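
As an illustration only, and not part of the Review, the contrast between treating proteins as monolithic nodes and resolving interactions to the domain level can be sketched in a few lines of Python. The protein names A, B and C echo the ‘protein A interacts with proteins B and C’ form of result mentioned in the text; the domain labels, the edge list and all function names are hypothetical and chosen purely for demonstration.

```python
from collections import defaultdict

# Hypothetical domain-resolved interactions. Each edge records which region
# of each protein mediates the contact. All names here are invented.
DOMAIN_EDGES = [
    (("A", "SH3"), ("B", "proline-rich motif")),
    (("A", "kinase"), ("C", "substrate loop")),
    (("A", "kinase"), ("B", "activation loop")),
]


def collapse_to_proteins(domain_edges):
    """Discard domain labels, giving the conventional whole-protein network."""
    return {frozenset((left[0], right[0])) for left, right in domain_edges}


def partners_by_domain(domain_edges, protein):
    """Group the partners of `protein` by the domain mediating each contact."""
    partners = defaultdict(set)
    for left, right in domain_edges:
        # Consider the edge from both ends so that edge direction is irrelevant.
        for (p, dom), (q, _) in ((left, right), (right, left)):
            if p == protein:
                partners[dom].add(q)
    return dict(partners)


def partners_lost(domain_edges, protein, mutated_domain):
    """Partners that contact `protein` only through `mutated_domain`."""
    by_domain = partners_by_domain(domain_edges, protein)
    lost = set(by_domain.get(mutated_domain, set()))
    for dom, others in by_domain.items():
        if dom != mutated_domain:
            lost -= others  # still reachable through another domain
    return lost


if __name__ == "__main__":
    print(collapse_to_proteins(DOMAIN_EDGES))
    print(partners_by_domain(DOMAIN_EDGES, "A"))
    print(partners_lost(DOMAIN_EDGES, "A", "kinase"))
```

In this toy network the whole-protein view records only that A binds B and C, whereas the domain-resolved view suggests that disrupting the hypothetical kinase domain of A would remove the contact with C but leave B reachable through the SH3 domain. This is the kind of distinction between different mutations of the same gene that a monolithic node cannot express.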
Epistasis
A phenomenon whereby the phenotype associated with a mutation is altered by the presence or absence of additional mutations.

Domains
Distinct functional or structural regions of a protein, which can fold independently of the rest of the protein. A protein may contain several domains, and the same domain may be present in different proteins.

Post-translational modifications
(PTMs). The chemical modifications of a protein after its translation, which can change the enzymatic activity, subcellular localization or interaction partners of the protein.

being extended to identify the parts of proteins that are responsible for specific interactions and to investigate how different mutations of the same protein can result in different functional consequences (BOX 2). Finally, we aim to place into context these high-resolution network biology approaches as a way to ultimately connect sequence with structure.

For brevity, we do not discuss gene regulatory networks, such as those derived from chromatin immunoprecipitation followed by sequencing experiments or from gene expression studies (reviewed in REFS 7,8), or the determination and modelling of dynamic signalling networks (reviewed in REFS 9–11). Moreover, we do not discuss the identification of enzyme–substrate relationships, such as those between kinases and their targets12. Similarly, we do not address approaches that seek to understand the functional changes that are caused by sequence variation in non-coding regions, as these have recently been reviewed elsewhere13.

Interaction network primer
Protein–protein interactions. Experimental methods for detecting protein–protein interactions can be generally grouped into two distinct categories: those that seek to identify direct ‘binary’ interactions, such as yeast two-hybrid (Y2H)14 and protein complementation methods15, and those that identify co‑complex associations, such as the affinity purification–mass spectrometry (AP–MS) approach16,17 (FIG. 1a). All of the methods for detecting protein–protein interactions have different strengths and weaknesses, and it should be noted that no high-throughput approach has perfect specificity and sensitivity18. Initial reports suggested that the results of AP–MS experiments are of higher accuracy and higher reproducibility than Y2H methods19,20. However, subsequent analyses argue that this is a bias of the methods that were used to assess quality and that, in reality, both approaches favour the detection of different but complementary types of interactions14. For example, Y2H can identify transient and low-affinity interactions, whereas AP–MS can identify more stable indirect interactions, such as those that occur between proteins that belong to the same complex but that do not directly bind to each other14 (FIG. 1a). Large-scale interaction networks that use both approaches have been generated in various model organisms (TABLE 1). These networks are augmented by

extensive literature curation efforts21,22, in which individual interactions from low-throughput experiments are manually identified from the literature. Interactions from both low- and high-throughput experiments are stored in databases23,24, which allow researchers to investigate many of the interactions that have been reported for a protein of interest. Computational methods have also been extensively used to predict protein–protein interactions using various sequence25, structure26 and genomic data27. Protein–protein interaction networks can be used to assign functionality to uncharacterized proteins through ‘guilt‑by‑association’, which essentially predicts the function of a protein on the basis of the function of its interacting partners (reviewed in REFS 2,28).

[Figure 1 panel labels. a | Protein–protein interactions (binary; co-complex). b | Genetic interactions (negative genetic; positive genetic; same pathway). c | Drug–gene interactions (drug inhibition; negative drug–gene; positive drug–gene; same pathway). d | Profile similarity (rows A–E; columns 1–19).]

Figure 1 | Interaction networks. a | In protein–protein interactions, protein A interacts with proteins B, C, D and E, either directly (top panel) or within a complex (bottom panel). b | In genetic interactions, genes A and B operate in a parallel pathway to genes C and D, whereas genes E and F operate in a linear pathway or complex. c | In drug–gene (chemogenetic) interactions, genes A and B operate in parallel to a pathway (involving genes C and D) that is inhibited by drug G. Gene E works in a linear pathway with gene F that is inhibited by drug H. d | Profile similarity is shown. Rows represent genes and columns represent either genes (for genetic interaction screens) or drugs (for drug–gene interaction screens). Coloured squares display negative (blue), positive (yellow) or neutral (black) interaction scores. Genes A, B and C all have similar interaction profiles, which suggests that they function in the same pathway or complex. In an analogous manner, genes D and E have similar interaction profiles, which suggests that they function together. The tree on the right indicates a hierarchical clustering of the profiles.

Deletion libraries
Sets of mutant strains, each of which has a single gene removed. The removed gene is typically replaced with an antibiotic-resistant marker to allow easy selection in genetic experiments.

Genetic interactions. Genetic or epistatic interactions report on functional interactions between mutations and are identified when combinations of mutations produce a different phenotype than that expected from the phenotypes of individual mutations29. Although higher-order interactions have been measured30,31, for practical reasons experimental studies usually focus on interactions between pairs of mutations. Typically, genetic interactions are assigned a quantitative score on the basis of how the growth of a double mutant differs from that expected based on the growth of each of the two single mutants32. Negative interactions are identified when the growth is worse than expected and are typically interpreted as revealing redundant or parallel pathways. Positive interactions are detected when the growth is better than expected and often identify factors that function in linear pathways29 (FIG. 1b). The most extreme example of a negative genetic interaction is synthetic lethality, in which individually mutating two genes results in a viable organism, but mutating the two genes in combination results in cell death. Both in yeast and in bacteria, high-throughput genetic strategies have been developed to create strains that contain pairs of mutations33–37, and large-scale genetic interaction screens have been carried out using comprehensive gene deletion libraries38–41. Strategies have also been developed to disrupt either the expression or the stability of essential genes, in which a gene deletion would result in the loss of viability42–44. In metazoans, RNA interference (RNAi)-based approaches are more commonly used to simultaneously target two genes, and both cell growth45–49 and whole-organism growth50,51 have been used as phenotypes.

As with protein–protein interactions, high-throughput efforts to map genetic interactions are complemented by literature curation22, and the results are stored in centralized databases24. Compared with protein–protein interactions, efforts to predict genetic interactions computationally have been relatively limited; most methods focus on extending existing networks52,53 rather than on the de novo prediction of interactions. Nevertheless, there are a few examples of the de novo prediction approach54–56.

The set of partners that a gene interacts with (or, in the case of quantitative screens, the set of scores for these interactions) is known as an interaction profile. The inhibition of genes that function in the same pathway or complex tends to result in similar genetic interaction profiles; that is, they interact with the same sets of genes in the same way (FIG. 1d). Although it is possible to predict the function of a gene from that of its genetic interaction partners, especially using positive genetic interactions, it is more common to predict gene function using genetic interaction profile similarity29.

Drug–gene interactions. Chemogenetic interactions report on the functional interactions between genes and drugs. They are conceptually similar to genetic interactions, and experimental screens for detecting such interactions are carried out in a similar manner. However, rather than simultaneously perturbing two genes, a single gene is perturbed in the presence of a compound. As

with genetic interactions, they can be either negative, in which the combined effect of perturbing a gene in the presence of a drug is more severe than expected, or positive, in which the combined effect is less severe than expected. Similarly to genetic interactions, these can be interpreted as perturbing parallel or linear pathways, respectively (FIG. 1c). However, owing to both off-target and nonspecific effects, such interpretations may be an oversimplification. It is important to note that a drug–gene interaction does not imply that the drug physically binds to the protein product of that gene. Rather, the phenotype that is associated with the perturbation of the gene is modified by the presence of the drug. This can be because the drug directly binds to the encoded protein, but it is more frequently because the drug induces a cellular state in which the requirement for the protein is altered. For example, many genes that are involved in DNA damage repair show negative interactions with the DNA-damaging agent methyl methanesulphonate (MMS)57, not because they directly bind to MMS but because their functionality becomes more important in the presence of the induced DNA damage. Gene function can be either directly inferred from interactions with a specific drug (for example, interaction with MMS could indicate that the gene functions in the DNA damage response) or indirectly inferred through profile similarity, as genes in the same pathway tend to interact with the same drugs (FIG. 1d).

Network integration. Each of the three interaction types discussed above provides mostly orthogonal information about the same cellular components. Consequently, by integrating multiple network types it is possible to obtain insights that are not obvious from analysing a single network in isolation. As these integrative approaches have been reviewed elsewhere58–60, we mention only a single example of integrating each pair of networks here. Protein–protein and genetic interaction data have been integrated by various groups to identify functional modules; that is, sets of proteins that are physically connected and that show similar genetic interaction profiles. In addition to improving the identification of known complexes, this approach has revealed pairs of complexes that are linked by either all negative or all positive genetic interactions, which suggest parallel and linear dependencies, respectively58,61. Similarly, others have integrated protein complexes with chemogenetic interactions to identify conditionally essential complexes62 (the members of which all show negative interactions with a particular drug), which suggests that the function of the entire complex is required in the presence of that drug. Finally, genetic and chemogenetic interaction profiles have been successfully integrated to improve the identification of drug targets63,64.

High-resolution protein–protein interactions
Identifying which parts of a protein are responsible for different interactions is an important step towards predicting how its function will be affected by different mutations, as well as for understanding how a single protein can carry out multiple different functions.

Box 1 | How sequence variation has an effect on proteins

High-throughput sequencing has facilitated the rapid collection of genetic variation across human genomes. This includes both germline variation that is heritable and somatic mutations that occur in certain cell lineages (for example, as precursors to cancer126). Coupled with technologies that enrich DNA samples for protein-coding regions of the genome, exome sequencing has provided a wealth of information about genetic variants that potentially affect protein function. For somatic mutations in cancer, the range of effects is highly diverse and dependent on both cancer type and exposure to carcinogens (for example, tobacco smoke in lung cancers and ultraviolet radiation in skin cancers)126. For germline mutations, most functional coding variation is rare127 and is specific to single populations128. To illustrate the effect of sequence variation on proteins, we summarize the distribution of single-nucleotide variants across three possible categories: nonsense, missense and synonymous. Numerous computational techniques have recently been developed to predict the functional significance of amino acid substitutions; here, we use the PolyPhen‑2 program to categorize missense variants as ‘benign’, ‘possibly damaging’ or ‘probably damaging’ (REF. 118).

Large-scale sequencing efforts, such as the 1,000 Genomes Project (TGP)129, have amassed a tremendous amount of data by sequencing thousands of individuals and have had an early emphasis on exome sequencing. If mutations were completely random, we would then expect nonsense and missense mutations to collectively make up ~72% of all coding variants observed, with a substantial fraction of these probably affecting protein function (see the figure, part a). This is nearly the case for the rarest of variants in the TGP (global frequency <0.1%), for which ~63% of variants are either nonsense or missense. However, purifying selection is an efficient evolutionary force that purges deleterious variation or that at least restricts it from reaching high frequency. Thus, almost all common amino acid variation is predicted to have no functional effect.

In addition to nucleotide substitutions, short insertions and deletions (indels) can also affect protein function. Frameshift indels (that is, indels with lengths that are not multiples of three) may be particularly deleterious, as they can have downstream effects during translation. The signature of purifying selection that operates against frameshift indels shows that the percentage of indels in the TGP129 that alter the reading frame of a protein decreases as the global allele frequency increases, from 66% for the rarest indels to 42% for the most common indels (see the figure, part b).

[Box 1 figure. a | Percentage of variants (y axis, 0–100) that are synonymous, missense (benign, possibly damaging or probably damaging) or nonsense, shown for random coding variants and for TGP exome variants binned by allele frequency (<0.001, 0.001–0.01, 0.01–0.10, >0.10). b | Percentage of indels (y axis, 0–100) that are non-frameshift or frameshift, shown for TGP exome indels binned by allele frequency (<0.001, 0.001–0.01, 0.01–0.10, >0.10).]
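The random-mutation expectation in Box 1 can be illustrated with a small enumeration. The sketch below classifies every possible single-nucleotide substitution in the 61 sense codons of the standard genetic code as synonymous, missense or nonsense. It assumes that all sense codons are equally frequent and that all substitutions are equally likely; both are simplifying assumptions made here and are not the model behind the ~72% figure quoted in Box 1, which is not described in this section. Under these assumptions the sketch gives about 76% nonsense plus missense, so it should be read as an order-of-magnitude check rather than a reproduction of the published value. The function name `classify_substitutions` is illustrative only.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons are ordered
# TTT, TTC, TTA, TTG, TCT, ... with the third base varying fastest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitutions():
    """Count single-nucleotide substitutions in sense codons by effect.

    Assumption (not from the source): every sense codon and every
    substitution is weighted equally.
    """
    counts = {"synonymous": 0, "missense": 0, "nonsense": 0}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # start only from codons that encode an amino acid
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                new_aa = CODON_TABLE[mutant]
                if new_aa == "*":
                    counts["nonsense"] += 1
                elif new_aa == aa:
                    counts["synonymous"] += 1
                else:
                    counts["missense"] += 1
    return counts


if __name__ == "__main__":
    counts = classify_substitutions()
    total = sum(counts.values())
    for kind, n in counts.items():
        print(f"{kind}: {n} ({100 * n / total:.1f}%)")
```

Weighting codons by their observed usage, or substitutions by a transition/transversion bias, would shift these proportions; either refinement can be added by replacing the unit increments with weights.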


Box 2 | Understanding the consequences of mutations using network biology

[Box 2 figure. Columns: a | wild type; b | gene deletion or gene knockdown, nonsense mutation or out-of-frame indel, structure-destabilizing mutation; c | missense mutation 1; d | missense mutation 2. Rows: genotype; protein–protein interaction; genetic interaction; drug–gene interaction. Key: protein, gene, drug; positive genetic or drug–gene interaction; negative genetic or drug–gene interaction.]

Traditionally, network biologists have used wild-type proteins to interrogate protein–protein interaction networks (see the figure, part a) and either complete gene deletion or gene knockdown to investigate genetic and drug–gene interaction networks (see the figure, part b). These approaches illuminate the global functions and interactions of proteins, but they do not provide much information about which parts of the protein are responsible for different interactions and how the function of a protein will be altered by different mutations. Some types of mutations, such as extreme truncations that result from nonsense mutations or complete structure disruption that results from frameshift insertions and deletions (indels), may be adequately modelled by the gene-knockout or gene-knockdown approach (see the figure, part b). However, other types of mutations, including missense mutations that do not destabilize the structure, require more detailed analyses (see the figure, parts c and d). The different screening approaches may offer different insights into the consequences of a single mutation; for example, missense mutation 1 results in no apparent change to the protein–protein interaction network but leads to unique genetic and drug–gene interactions. Combining the three approaches may offer a more direct insight into how genotype maps to phenotype; for example, missense mutation 2 results in the loss of a physical interaction with protein P, a positive genetic interaction with the gene coding for protein P and an increased sensitivity to drug D.

Exome sequencing
The targeted sequencing of only known protein-coding regions.

Nonsense
Pertaining to a mutation that changes an amino acid codon to a stop codon.

Missense
Pertaining to a mutation that changes the encoded amino acid.

Synonymous
Pertaining to a mutation that does not change the encoded amino acid.

Domain–domain interactions. One strategy to narrow down the region of a protein that is responsible for specific interactions is to screen multiple fragments of the same protein65–70. By comparing which fragments of a protein successfully interact with a given partner, it may be possible to determine which regions are responsible for that specific interaction (FIG. 2a). For example, a fragment-based Y2H approach was used to create a protein–protein interaction network for 749 factors that are involved in Caenorhabditis elegans early embryogenesis65, including members of the nuclear pore and the centrosome. An average of 40 different bait protein fragments for each of these 749 genes was screened against a library of full-length prey proteins. This approach was shown to be more sensitive than screening full-length proteins (that is, more interactions were identified) and was not associated with an obvious loss of specificity (that is, the interactions did not seem to be

[Figure 2 panel labels. a | Fragments of NPP‑9 (residues 1–860) that interact with RAN‑1, with RANBP1 domains marked. b | Interactions of CED‑9 with EGL‑1, CED‑4, F25F8.1 and SPD‑5; labelled CED‑9 residues Arg77, Asp79, Gly82, Phe100, Pro106, Gln110 and Val200; CED‑4 binding site and putative SPD‑5 binding site. c | Genetic interaction profiles of POL30 alleles (pol30‑8, pol30‑79 and POL30‑DAmP) and related genes; group labels include MRX, HIR‑C, H3 and H4, DNA damage and checkpoint pathway, H3K56, chromatid cohesion and CAF1 (Cac2, Msi1, Rlf2). d | Genetic interaction profiles of the RNA polymerase II mutants Rpb1‑F1086S and Rpb2‑G977S; column groups are tRNA modification pathway, DNA damage response, spindle checkpoint, dynein and dynactin, and prefoldin. S score scale: –7.0 to 7.0, from negative interactions (synthetically sick or lethal) to positive interactions (suppressive or epistatic).]
870 | DECEMBER 2013 | VOLUME 14 Nature Reviews | Genetics

www.nature.com/reviews/genetics

enriched for false positives). Owing to the high number of fragments screened for each protein, the authors were able to identify the minimal region of interaction (the smallest region shared by all fragments for which the interaction was observed) for many protein–protein interactions. Furthermore, they could identify multiple minimal regions of interaction on a single protein that corresponded to distinct interaction interfaces (FIG. 2a). Only a limited number of proteins were identified using exclusively full-length fragments, and these proteins were, on average, quite short in length; this led the authors to suggest that, in these cases, the entire protein consists of a single globular domain that cannot fold when truncated.

The value of this approach in the context of human disease was demonstrated in a study of the Huntington’s disease-associated protein huntingtin (HTT)67. A Y2H screen that was carried out with multiple fragments of the human HTT protein identified various novel interaction partners. Follow‑on studies showed that one of these proteins, ARF GTPase-activating protein GIT1, influenced HTT aggregation, a phenotype that is linked to disease progression. Both GIT1 and HTT are large multidomain proteins, but by screening multiple fragments of each protein, the authors were able to narrow down the putative interacting regions to the amino terminus of HTT and the carboxyl terminus of GIT1.

Figure 2 | High-resolution physical and functional interactions. a | In a Caenorhabditis elegans study, the interactions of NPP‑9 with GTP-binding nuclear protein RAN‑1 were studied to a resolution at the domain level by screening a high number of fragments for each protein using a yeast two-hybrid approach65. Shown on the right of the figure are minimal regions of interactions that have been identified on NPP‑9. Each horizontal line corresponds to a fragment which interacts with RAN‑1; the red RAN-binding protein 1 (RANBP1) boxes correspond to the locations of the known RAN-binding domains. b | ‘Edgetic’ interactions of apoptosis regulator CED‑9 were identified in a C. elegans study that was based on a reverse yeast two-hybrid system78. Mutations that specifically perturb the interaction of CED‑9 with CED‑4 (blue star), with SPD‑5 (red star) or with both (purple star) are shown. None of these mutations disrupt interactions of CED‑9 with EGL‑1 or F25F8.1. Mutated residues on the CED‑9 structure are highlighted; the CED‑4 binding site confirmed from this study and a potential binding site for SPD‑5 are also shown. c | In a Saccharomyces cerevisiae study, different alleles of POL30 show different genetic interaction profiles; the tree to the right of the profile indicates a hierarchical clustering of the profiles. pol30‑79 behaves in a similar way to pol30‑DAmP (an allele with decreased mRNA expression), which suggests that it affects the core function of the protein. pol30‑8 has similar behaviour to components of the chromatin assembly factor 1 (CAF1) complex, which suggests that both the specific interaction and the common function of Pol30 and this complex are perturbed. Shown on the right are the positions of the mutated residues of the Pol30 complex, as well as the subunits of the CAF1 complex. d | Mutations of residues that are on distinct subunits of RNA polymerase II but that are proximal in three-dimensional space show similar genetic interaction profiles in S. cerevisiae112. Highlighted on the Pol II structure are the locations of mutations to the Rpb1 subunit (purple) and the Rpb2 subunit (green). These mutations show negative genetic interactions with genes that are involved in the tRNA modification pathway and the DNA damage response, and positive genetic interactions with genes that are involved in the spindle checkpoint and the prefoldin complex. H3K56, histone H3 lysine 56; MRX, the S. cerevisiae homologue of the mammalian MRE11–RAD50–NBS1 (MRN) DNA damage repair complex. Part a is modified, with permission, from REF. 65 © (2008) Elsevier Science. Part b is modified, with permission, from REF. 78 © (2009) Macmillan Publishers Ltd. All rights reserved. Part c is modified, with permission, from REF. 29 © (2010) Elsevier Science.

Forward genetics
The classical genetics approach, in which the genotypes that are associated with particular phenotypes are identified.

Reverse genetics
The inverse approach to forward genetics, in which phenotypes that are associated with a particular genotype are identified. Such approaches are exemplified by studies of knockout mutants.

Alleles
Multiple forms of a gene that occur at a specific locus.

Reverse Y2H
(Reverse yeast two-hybrid). A genetic strategy to select against specific protein–protein interactions.

Edgetic perturbations. An alternative approach to identify regions of proteins that are responsible for particular interactions is to identify ‘edgetic’ mutations that alter some, but not all, of the interactions of a protein71–78. Analogous to forward genetics and reverse genetics, this approach can be used in two different ways: ‘forward’ edgetics, the goal of which is to identify the interactions that are perturbed by a specific mutation of interest; and ‘reverse’ edgetics, in which mutations that perturb specific interactions are identified. Such approaches can be especially informative when integrated with structural models of the protein of interest: by mapping the mutated residues that perturb a specific interaction onto the three-dimensional structure of the protein, the regions that are important for the interacting interface can be inferred.

In a pioneering forward edgetics study, 35 different mutants of the Saccharomyces cerevisiae actin protein were screened using a Y2H approach71. This revealed three distinct types of mutations: those that disrupted all, none or specific interactions. The use of this approach to aid structural studies was shown by mapping onto the actin structure the locations of mutations that disrupted its interaction with a specific protein, which revealed a putative binding site. The use of forward edgetics in the context of human disease was highlighted in a systematic study of five proteins that are associated with distinct Mendelian diseases77. These proteins were selected because they each had multiple distinct disease-associated in‑frame mutations and numerous known protein interactions. Twenty-nine alleles of these five proteins were screened using the Y2H method to identify any perturbed interactions. Only 5 of these alleles resulted in the loss of all interactions, whereas 16 of these resulted in the loss of specific interactions, which indicates that their associated disease phenotypes may be caused by a loss of specific protein–protein interactions rather than by a total loss of functionality.

A novel reverse edgetics experimental strategy78 based on the reverse Y2H system was used to identify edgetic mutations of CED‑9, the C. elegans orthologue of the human oncoprotein B cell lymphoma-2 (BCL‑2)78 (FIG. 2b). A library of full-length mutant alleles encoding CED‑9 was used to identify mutants that perturbed interactions with any of four identified protein interaction partners. This approach identified 72 distinct alleles, 30 of which disrupted all interactions (that is, they were non-edgetic), and the rest perturbed a specific subset of interactions (that is, they were edgetic). By mapping onto the CED‑9 structure the mutated residues resulting from these alleles, the authors found that edgetic residues are preferentially located in accessible regions, which suggests that they perturb specific interfaces, whereas non-edgetic residues were more likely to be found in the core, where they may destabilize the structure of the protein. Furthermore, in human cells edgetic mutants could be expressed at levels that are similar to those of wild-type CED‑9, whereas the non-edgetic mutants were expressed at much lower levels, suggesting that they encode unstable proteins. As in the study of actin mutants in yeast, mapping the location of residues

that are associated with edgetic perturbations onto the CED‑9 structure suggested putative binding sites for specific interactions. For an interacting pair for which a co‑crystal structure was available (CED‑4–CED‑9), this putative binding site agreed well with the known interaction interface (FIG. 2b).

By design, most studies of edgetic protein–protein interactions have focused on identifying mutations that lead to the loss of interactions. An initial set of proteins that interact with the wild-type protein of interest is often experimentally derived, and further analyses are then carried out to identify which of these interactions are lost by a specific mutation. However, it is likely that many mutations result in a gain of function, which leads to new interactions with previously unidentified partners. Investigating these mutations in the context of cancer is likely to be of particular interest, as many cancer-associated mutations are believed to confer gain-of-function effects; for example, the tumour suppressor p53 is one of the most commonly mutated proteins in human cancer. Unusually for a tumour suppressor, most cancer-associated mutations of this protein are missense mutations rather than protein truncations or a complete gene loss. Several of these mutations result in both the loss of tumour suppression functionality and the gain of protein–protein interactions79. Specifically, mutant forms of p53 can interact with various transcription factors, which potentially results in substantially altered transcriptional regulation. There are other examples in the literature of a single disease-associated mutation causing both loss and gain of specific protein–protein interactions that result in changes in functionality80.

Computational approaches
Experimentally mapping interactions of multiple variants of each protein (for example, protein fragments or point mutants) is inherently more costly than screening a single variant. Consequently, an attractive alternative method for determining the domains and residues that are involved in specific interactions may be to compute a three-dimensional model of the corresponding macromolecular assembly by integrating existing protein–protein interaction networks with additional information81–88. Depending on the quantity and quality of the available information, such integrative modelling can map interactions at the resolution of subunits, domains or even individual residues (BOX 3). For example, both an interaction map and the localization of 456 constituent proteins in the yeast nuclear pore complex were determined by modelling that was based on low-resolution information from multiple sources, including affinity purification of protein subcomplexes, sedimentation analysis and electron microscopy89. Similarly, the 26S proteasome structure, which was determined from an electron microscopy map of the whole assembly, from proteomic information and from the subunit comparative models, revealed both the localization and interacting interfaces at the resolution of individual domains and even residues90. When atomic structures of individual constituent proteins are available, even sparse and low-resolution data on the quaternary structure can lead to an atomic model of interactions, as shown by the structure of the bacterial type II pilus system, the subunits of which were assembled from sparse NMR data91. Integrative structure determination makes it easy to take advantage of all data, which results in models that are generally more accurate, precise and complete than those that are based on any individual data set92.

Alternatively, more coarse-grained approaches may be used; for example, interactions from multiple databases were integrated with additional functional data to create a high-confidence protein–protein interaction network in S. cerevisiae83. The protein family interactions (iPfam) database93, which contains interactions between pairs of Pfam94 domains that are supported by at least one representative structure in the Protein Data Bank95, was used to identify a set of structurally characterized domain–domain interactions. The protein–protein interaction network was then filtered to include only interactions between proteins containing iPfam domains that were known to interact. This ‘structurally resolved’ interactome allowed the authors to distinguish between ‘simultaneously possible’ interactions, in which protein A interacts with proteins B and C through distinct domains, and ‘mutually exclusive’ interactions, in which protein A interacts with B and C using the same domain83. Notably, many of the previously observed relationships between network topology and other genomic features could be better explained using structural properties. For example, the authors elaborated on a previous observation that ‘hubs’ (proteins that are involved in many interactions) are more likely to be essential than random proteins, and they identified that hubs with multiple interaction interfaces are twice as likely to be essential as hubs with only a single interaction interface.

The same approach was recently used to create a structurally resolved human protein–protein interaction map onto which disease-associated mutations could be mapped87. Importantly, there was an enrichment of these mutations on protein–protein interaction interfaces, which again suggests that many diseases are the result of perturbed protein–protein interactions (that is, edgetic perturbations). Furthermore, in cases in which multiple mutations are found on the same protein, mutations of different interacting interfaces were significantly more likely to be associated with different diseases than mutations that affect the same interface. Finally, mutations that affect two distinct interacting proteins on their corresponding interacting domains are more likely to cause the same disease than mutations on domains that do not mediate their interaction87.

A disadvantage of these structurally resolved approaches is that they require a known three-dimensional structure onto which interacting domains can be mapped; such information is only available for a small proportion of domain–domain pairs. One alternative approach is to take an experimentally determined protein–protein interaction network and a list of the domains in each protein, and to use this information to predict the domain pairs that are most likely to be responsible for each interaction82,84–86. No structural information is used in this approach; instead, statistical

REVIEWS
Box 3 | Integrative structure determination of macromolecular assemblies

The most detailed information about interactions between proteins is provided by three-dimensional structures of macromolecular assemblies. These structures generally contribute to our understanding about how the assemblies function and how they evolved, as well as how to control and possibly to modify their functions. Unfortunately, it is often difficult to solve these structures by traditional methods of structural biology, such as X-ray crystallography, NMR spectroscopy and electron microscopy. The reasons for this include the size, flexibility, transient nature, and compositional and structural heterogeneity of the assemblies, as well as the need for pure samples of sufficient quantity. To overcome these problems, integrative or hybrid approaches that combine data from multiple methods through computation were developed (reviewed in REFS 113,130,131). The resolution of the resulting hybrid structural models ranges from low, specifying only the positions of the protein subunits, to high, specifying the positions of each atom.

The integrative approach iterates through four stages: gathering structural information from as many sources as possible; defining how to represent and evaluate models on the basis of the available data; finding models that are consistent with the data; and analysing the input data as well as the output models. As integrative models are computed from all available data, they are often more accurate, precise and complete than those produced by traditional methods. Integrative modelling encourages the finding of all models that fit the data, not only one such model. At least in principle, it also facilitates the assessment of the data and the models. Finally, integrative modelling can provide feedback to guide future experiments, so that maximum model improvement is achieved for minimal effort.

Various structures have been solved by integrative approaches, including those of the bacterial type II pilus91, chromatin segments132 and the yeast nuclear pore complex89. We illustrate the integrative approach by its application to the regulatory particle of the 26S proteasome, which consists of 19 different protein subunits90 (see the figure). Structural information was first gathered, including atomic models of subunits or their domains that were either determined by X-ray crystallography or computed by comparative modelling based on known homologous structures; the shape of the regulatory particle that was defined by a cryo-electron microscopy map at 8.4 Å resolution; the positions of two subunits (Rpn10 and Rpn13) that were pinpointed in the cryo-electron microscopy electron-density maps of proteasomes without these two subunits; the proximities between pairs and larger subsets of subunits that were defined in publicly available protein–protein interaction data, including those from large-scale screens; and the proximities between specific residues across protein interfaces that were defined by residue-specific inter-subunit crosslinks. Next, all relative positions and orientations of subunits that minimally violated the data were found by a sophisticated structural sampling algorithm133. It turned out that a single cluster of solutions satisfied most of the data, thus providing a structural model of the 26S proteasome. This model was used to rationalize and to predict several aspects of the 26S proteasome function. In addition to the assessment of the model based on structural data that were not used in its calculation, the model was most convincingly validated by a completely independent structure determination based on cryo-electron microscopy maps for the entire regulatory particle and several of its subcomplexes113.

[Figure. Panels: spatial data (X-ray crystallography, comparative modelling, assembly density, localization of Rpn10 and Rpn13, protein–protein interactions, residue-specific crosslinking); subunit representation; enumeration of structures on a discrete grid; model scoring (number of models plotted against number of violated restraints); analysis and cluster analysis of best-scoring models; molecular architecture of the 26S proteasome.]
Figure is modified, with permission, from REF. 90 © (2012) US National Academy of Sciences.
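
The "enumerate, then score by violated restraints" idea described in this box can be illustrated with a deliberately tiny toy. The sketch below is not the sampling algorithm of REF. 133 or the proteasome model of REF. 90: the subunit names, the two-dimensional grid, the distance cut-off and the function names are all assumptions made for illustration. It places each hypothetical subunit on a distinct grid point, counts how many pairwise proximity restraints each placement violates, and keeps every placement that ties for the lowest count (reflecting the point that all models fitting the data should be found, not only one).

```python
from itertools import permutations, product
from math import dist


def count_violations(placement, restraints, max_dist):
    """Count proximity restraints whose two subunits lie farther apart than max_dist."""
    return sum(
        1 for a, b in restraints if dist(placement[a], placement[b]) > max_dist
    )


def enumerate_models(subunits, grid_size, restraints, max_dist=1.0):
    """Place every subunit on a distinct point of a square grid and score each model.

    Returns a list of (number_of_violated_restraints, placement) tuples.
    The exhaustive search is only feasible for toy-sized inputs.
    """
    points = list(product(range(grid_size), repeat=2))
    models = []
    for combo in permutations(points, len(subunits)):
        placement = dict(zip(subunits, combo))
        score = count_violations(placement, restraints, max_dist)
        models.append((score, placement))
    return models


def best_models(models):
    """Return the lowest violation count and all placements that achieve it."""
    best = min(score for score, _ in models)
    return best, [placement for score, placement in models if score == best]


# Hypothetical input: three subunits and two pairwise proximity restraints.
subunits = ["A", "B", "C"]
restraints = [("A", "B"), ("B", "C")]
models = enumerate_models(subunits, grid_size=2, restraints=restraints)
best_score, best = best_models(models)
print(len(models), best_score, len(best))
```

On this 2 × 2 toy grid, 8 of the 24 placements satisfy both restraints. Real integrative modelling replaces the exhaustive loop with a sampling procedure and combines many restraint types with different weights.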


or machine-learning methods are applied to compare the number of observed interactions between proteins that share a given domain pair with that expected from the overall frequency of each domain in the network. In principle, because of their greater coverage, these approaches could have an advantage over others that require three-dimensional structural information. However, their accuracy is difficult to assess, and there is a poor overlap between different prediction methods even when the same input data were provided96.

In addition to the identification of interactions between pairs of globular domains, the computational identification of interactions between globular peptide-recognition domains and short peptides is of great interest. Computational methods for predicting these interactions, which are of particular importance for cellular signalling4, have recently been reviewed elsewhere97.

An interesting approach is to invert the problem: rather than interpreting protein–protein interactions using structural information, it is possible to use the structural information to predict such interactions. A recent method showed that, by using a lenient threshold for structural similarity and by integrating orthogonal functional information about proteins, it is possible to computationally predict protein–protein interactions on a proteome-wide scale26. The predicted interactions were shown, at least for binary interactions in yeast, to be of similar accuracy to experimental methods. In addition to predicting protein–protein interactions for yeast and humans, this approach provides a crude model of the residues and domains that are involved in these interactions.

High-resolution (chemo)genetic interactions
Although the approaches detailed above may be used to identify the protein–protein interactions that are perturbed by a specific mutation, they cannot assess how these mutations affect specific cellular phenotypes of interest. Genetic and chemogenetic interaction screens may be used to close this gap between understanding the proximal mechanistic consequences of mutations and how the mutations ultimately affect specific phenotypes. Furthermore, in the case of a single protein that has several distinct functions in different pathways, genetic and chemogenetic interactions can be used to identify the specific parts of the protein that carry out these functions even when no change to the physical interaction network is detected. Currently, screens that analyse the genetic and chemogenetic interaction profiles of multiple alleles of the same gene have been limited and almost exclusively carried out in yeast; see BOX 4 for a discussion of some studies using human systems. Some yeast screens have used multiple mutant alleles of the same essential gene that either are temperature sensitive or have reduced mRNA expression40,43. Although such approaches are valuable for exploring the functionality of essential genes, they usually address the problem at the whole-protein level and do not identify the parts of proteins that are responsible for specific interactions or functions. However, a few studies have shown that genetic and chemogenetic interaction profiling can be extremely useful in this context98–100, particularly for investigating the functional consequences of PTMs98–102 that may not be detectable in the protein–protein interaction network. This can be achieved by mutating the specific residues that are subject to PTM, such that they mimic either their modified or unmodified state. By comparing the profiles of mutants with different modification status, it is possible to identify functional interactions that change owing to specific modifications.

This approach was used to identify the functional interactions that are mediated by the phosphorylation of Ies4, a subunit of the S. cerevisiae INO80 chromatin-remodelling complex100. Five serine residues that are differentially phosphorylated in response to MMS treatment were identified on this protein. Two mutant forms of the protein were created, such that all five serine residues mimicked either their phosphorylated or unphosphorylated state. No changes to the protein–protein interaction network could be detected as a result of these mutations, but a genetic interaction screen revealed subtle differences in the behaviour of these two mutants. Only the phosphomimetic mutant showed positive genetic interactions with genes that are involved in the DNA damage checkpoint, which suggests that the phosphorylation of Ies4 mediates its role in this process, a prediction that was ultimately confirmed100. A similar approach was used to investigate the role of acetylation in regulating the functions of the histone variant H2A.Z in S. cerevisiae101. The amino-terminal tail of this histone contains four lysine residues that can be acetylated, which potentially regulate its function. A series of mutants were created: mutants that were singly mutated (that is, only one of the four lysine residues was mutated), mutants that were singly acetylatable (that is, three of the four lysine residues were mutated), or mutants that were completely unacetylatable (that is, all four lysine residues were mutated). Genetic interaction screening revealed that mutants that were singly mutated or singly acetylatable showed few interactions and behaved in a similar way to wild-type controls. However, the completely unacetylatable mutant recapitulated a subset of the genetic interactions that are associated with a complete deletion of the H2A.Z gene, which suggests that the modifiable lysine residues are internally redundant and that the protein can correctly function if any one of these lysine residues is acetylated. These results indicate that genetic interaction screening can be used not only to identify the consequences of mutating specific residues but also to identify how combinatorial perturbations of the same gene can result in unique outcomes. Genetic interaction screening has also been used to study H2A.Z using truncations of varying lengths in S. cerevisiae102 or using targeted mutation in Schizosaccharomyces pombe103.

Histone
A family of proteins that package DNA into nucleosomes. They consist of a globular domain and a tail that is subject to extensive post-translational modifications.

Edgetic interactions. A genetic interaction study104 that mirrors the earlier experiments using Y2H screening of actin point mutants71 identified gene deletion mutants that interact with haploinsufficient actin in S. cerevisiae. Six different point mutants of the actin gene were then tested for genetic interactions with these gene

deletion mutants, which revealed that different point mutants interacted with different subsets of these partners. Furthermore, an analysis of the actin structure revealed that the mutations of two residues that were close together on the protein surface resulted in similar genetic interaction profiles, which suggests that genetic interaction profiles could be integrated with structural models to identify structure–function relationships.
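
The comparison described above, between how similar two point mutants' genetic interaction profiles are and how close the mutated residues sit in the structure, can be sketched in a few lines. This is a minimal illustration only: Pearson correlation is assumed as the similarity measure (the cited studies may have used other metrics), and the mutant names, interaction scores and coordinates are invented for the example.

```python
from itertools import combinations
from math import dist, sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length score vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def similarity_vs_distance(profiles, coords):
    """For every pair of mutants, report profile correlation and residue distance.

    profiles: mutant name -> genetic interaction scores against the same partners
    coords:   mutant name -> (x, y, z) position of the mutated residue
    """
    rows = []
    for m1, m2 in combinations(sorted(profiles), 2):
        rows.append(
            (m1, m2, pearson(profiles[m1], profiles[m2]), dist(coords[m1], coords[m2]))
        )
    return rows


# Hypothetical data: mutA and mutB are neighbours on the surface, mutC is distant.
profiles = {
    "mutA": [-2.0, 0.5, 1.0, -0.5],
    "mutB": [-1.8, 0.4, 1.2, -0.6],
    "mutC": [0.9, -1.5, -0.2, 1.1],
}
coords = {"mutA": (0, 0, 0), "mutB": (3, 4, 0), "mutC": (30, 40, 0)}

for m1, m2, r, d in similarity_vs_distance(profiles, coords):
    print(f"{m1}-{m2}: correlation {r:+.2f}, distance {d:.1f}")
```

In this invented example the two neighbouring mutants have strongly correlated profiles and the distant mutant does not, which is the pattern the text describes for actin. With real data, missing scores and measurement noise would need to be handled before computing correlations.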
Box 4 | Genetic and drug–gene interactions in human systems

Genetic interactions and disease
A major challenge in cancer therapeutics is to kill tumour cells without harming other cells in the body. One means to
achieve this is to exploit the genetic changes that distinguish cancer cells from normal cells and that may leave them
vulnerable to targeted treatments134. In this context, the identification of drugs or genes that show a strong negative
interaction (that is, synthetic lethality) with a specific oncogenic mutation has been a high priority135. Consequently,
many of the early drug–gene (chemogenetic) and genetic interaction screens in mammalian cells were carried out not
to dissect gene function but to identify potential targeted therapeutics.
To this end, various groups have used RNA interference (RNAi)-based approaches to identify genes that are only
essential in specific cancer cell lines136,137. By screening enough different cell lines, classical forward genetics approaches
may be used to identify statistical associations between specific mutations and their sensitivity to the knockdown of
specific genes. Such studies are promising but are complicated by the genetic heterogeneity of different cell lines. In the
presence of multiple mutations, which may themselves genetically interact, a clear relationship between genotype and
phenotype may be difficult to ascertain.
An alternative approach is to screen ‘isogenic’ cell lines that differ only by the mutation of a particular gene of
interest138–140. The use of this approach was demonstrated in a study that identified genes which selectively inhibit growth
in the presence of a specific activating point mutation (G13D) in the KRAS oncoprotein139. A genome-wide RNAi screen
identified hundreds of candidate genetic interaction partners for the mutated KRAS, which were significantly enriched
for components of the mitotic machinery. Subsequent analyses revealed that cells expressing KRAS‑G13D showed
increased sensitivity to an inhibitor of mitotic spindle function, which suggests that the KRAS oncogene causes increased
mitotic stress. Furthermore, specific inhibition of PLK1, which is a mitotic kinase that was shown to genetically interact
with KRAS‑G13D, resulted in reduced tumour growth in a mouse model. This result highlights the value of investigating
genetic interactions that are associated with specific cancer mutations, both for the identification of potential
therapeutic targets and for an improved understanding of the oncogenic state. Similar RNAi-based approaches have
recently been applied to the understanding of other diseases; for example, a recent RNAi screen identified genes that
suppress the phenotype associated with cells that express a fragment of the mutant form of huntingtin (HTT).
Interestingly, rather than identifying genes that inhibit cell growth, the authors identified genes that suppress caspase 3
activity, which is typically enhanced in cells that express the mutant HTT fragment141.
Drug–gene interactions and disease
As with genetic interactions, drug–gene interactions in cancer cell lines are of great interest owing to their potential
therapeutic implications. Various large-scale studies have screened hundreds of cancer cell lines against drug
libraries142–144. When combined with genotypic information, either standard forward genetics or more advanced
machine-learning approaches can be used to try to associate specific mutations with sensitivity to specific drugs. As with
genetic interaction screening, the genetic heterogeneity of different cancer cell lines can make the association of
specific drugs with specific mutations difficult; for example, one recent study noted that “single gene–drug associations
were only rarely able to explain the range of drug sensitivities observed across cell lines for any given drug” (REF. 143).
Again, the alternative is to use isogenic cell lines to interrogate the drug sensitivity of a specific mutation of interest145,146.
Such an approach was used to assess the ability of ~24,000 compounds to selectively kill 6 different tumorigenic cell lines
but not their isogenic non-tumorigenic counterparts146. A drug–gene interaction screen was recently carried out to
identify genes that, when inhibited, increase the sensitivity of KRAS-mutant cell lines to a specific drug (an inhibitor of
MEK, which is a protein in the mitogen-activated protein kinase effector pathway of KRAS signalling)147. Such hybrid
approaches may be more commonly used in the future to identify drug–gene interactions that are selectively lethal in the
presence of specific cancer-associated mutations.
Recent developments and future directions
Experimental methods have recently been developed to improve both the throughput and the accuracy of genetic
interaction mapping in mammalian cells45,47–49. In contrast to the studies of genetic interactions in cancer, in which a single
query mutation is screened for interactions with a large RNAi library, these methods allow comprehensive analyses of all
pairwise interactions between hundreds or thousands of genes. They have been used to carry out screens that are
analogous to those using gene deletions in yeast, which were used to identify the dependencies between genes that are
involved in chromatin regulation47–49 and ricin susceptibility45. Their reliance on RNAi knockdowns means that, at present,
they will be primarily used to address interactions that are associated with whole genes, rather than with specific alleles
of interest. However, recent improvements in genome editing and engineering, such as the transcription activator-like
effector nucleases (TALEN) and the clustered regularly interspaced short palindromic repeat (CRISPR)–Cas9
systems148–151, allow the rapid introduction of mutations of interest into human cells. By combining this technology with
RNAi-based genetic interaction screening and high-throughput chemogenetic interaction screening, it should be
possible to create interaction profiles for many different mutations of the same gene, analogous to the studies that have
been carried out in yeast. This will enable us to analyse structure–function relationships of human proteins at high
resolution and also to investigate the functional consequences of different mutations to the same disease-associated
protein (for example, KRAS).
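
The statistical association between mutation status and knockdown sensitivity across a panel of cell lines, mentioned earlier in this box, can be sketched with a one-sided Fisher's exact test. This is an assumed, generic choice of test and not necessarily the statistic used in the cited screens; the cell-line records and function names below are hypothetical.

```python
from math import comb


def contingency(cell_lines):
    """Build a 2x2 table from (is_mutant, is_sensitive) records.

    Returns (a, b, c, d):
      a = mutant and sensitive,    b = mutant and resistant,
      c = wild type and sensitive, d = wild type and resistant.
    """
    a = b = c = d = 0
    for is_mutant, is_sensitive in cell_lines:
        if is_mutant and is_sensitive:
            a += 1
        elif is_mutant:
            b += 1
        elif is_sensitive:
            c += 1
        else:
            d += 1
    return a, b, c, d


def fisher_one_sided(a, b, c, d):
    """Probability of seeing at least `a` sensitive mutants by chance.

    Sums the hypergeometric probabilities of all tables with the same
    margins and a mutant-and-sensitive count of `a` or more.
    """
    mutants = a + b
    sensitive = a + c
    total = a + b + c + d
    denominator = comb(total, mutants)
    p_value = 0.0
    for k in range(a, min(mutants, sensitive) + 1):
        p_value += comb(sensitive, k) * comb(total - sensitive, mutants - k) / denominator
    return p_value


# Hypothetical panel: 5 mutant lines (4 sensitive) and 5 wild-type lines (1 sensitive).
panel = [(True, True)] * 4 + [(True, False)] + [(False, True)] + [(False, False)] * 4
table = contingency(panel)
print(table, fisher_one_sided(*table))
```

As the box notes, such associations are complicated by the genetic heterogeneity of cell lines: other mutations that co-occur with the one of interest can produce or mask an association, which is part of the motivation for isogenic comparisons.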

In a striking example from a quantitative genetic interaction screen focused on factors that are involved in chromosome biology in S. cerevisiae105, different alleles of POL30 (the gene product of which is also known as Pcna) showed radically different genetic interaction profiles29 (FIG. 2c). POL30 is a multifunctional essential gene that is involved in both chromatin assembly, and DNA replication and repair. For example, the pol30-79 allele generated a similar interaction profile to pol30-DAmP (that is, a mutant of Pol30 with decreased abundance by mRNA perturbation), which suggests that the pol30-79 mutation has a general destabilizing effect on the protein. Another allele, pol30-8, elicited a profile that is similar to that observed for deletions of members of the chromatin assembly factor 1 (CAF1) complex. Furthermore, this allele showed strong positive genetic interactions with members of the CAF1 complex, which suggests that these mutations perturbed the same functional pathway. Indeed, Pol30 and CAF1 physically interact, and a pol30-8 mutant has previously been shown to severely weaken this interaction106 but not its interactions with other factors107. Thus, genetic interaction screening can be used to investigate allele-specific edgetic perturbations.

Comprehensive structure–function analyses. Although most studies have focused on a small number of alleles of a given gene, a few have highlighted the effectiveness of screening large numbers of alleles of a single gene, especially when the results are integrated with structural protein models. Histone proteins, in particular, have been a primary focus for such interaction screens. Indeed, various groups have created libraries of histone alleles, in which specific residues (for example, those on the protein surface and those that are subject to PTMs) have been systematically mutated, to facilitate the screening for drug sensitivity and other functional analyses108–110. In perhaps the most comprehensive functional analysis of individual proteins that has so far been carried out, a library of 486 alleles of the S. cerevisiae H3 and H4 histones was created108. Each residue of the two proteins was mutated one at a time: alanine residues were systematically mutated to serine, whereas all other residues were mutated to alanine. Remarkably, only ~10% of their point-mutant strains were completely inviable despite the very high interspecies sequence conservation. Each of the mutants was screened for growth defects in 14 different conditions, including 5 drug treatments, which allowed the fine-grained association of regions of the protein structure to different functions. For example, mutants that were sensitive to 6-azauracil, a compound that is associated with defects in transcriptional elongation, were over-represented on the lateral surface of the histones, which is the region that interacts with DNA. The HistoneHits database111 is an example of the integration of results from multiple mutant phenotyping screens of the same protein. This database provides a central repository of mutant–phenotype associations for histones. In addition to assessing the agreement of phenotype–residue associations across studies, it facilitates meta-analyses of these associations; for example, mutations of residues in the lateral surface of histones typically show more phenotypes than mutations elsewhere in the structure. Similarly, residues that are subject to PTMs have significantly more phenotypes than other residues. By contrast, mutations of residues in histone tails show significantly fewer phenotypes111.

A recent study genetically targeted a well-characterized, structurally defined protein machine: RNA polymerase II. To this end, 53 different point mutants in five evolutionarily conserved subunits of S. cerevisiae Pol II were identified and subjected to quantitative genetic interaction profiling112. The resulting point-mutant epistatic miniarray profile (pE-MAP) allowed the assignment of function to individual residues of this complex by comparing profiles that are associated with the point mutants of these residues with the existing profiles of gene deletion mutants. It uncovered a remarkable coordination of many processes that Pol II is involved in, including start-site selection, transcriptional elongation and mRNA splicing. Furthermore, it facilitated the discovery of new transcription factors and offered insights into how they function. Interestingly, there was a striking correlation between the similarity of genetic interaction profiles and the proximity of the corresponding residues in three-dimensional space, even when proximal residues were in different protein subunits (FIG. 2d). It will be of great interest to determine whether quantitative genetic data, which are often simply based on colony size, can be used for the structural modelling of macromolecular machinery89,113, especially those that are not biochemically tractable (such as membrane-associated complexes).

The creation of mutant libraries in which a single codon is edited is valuable but is also costly and time consuming. An alternative approach is to develop a strain in which the chromosomal copy of the gene to be mutated can be easily disabled, for example, by placing it under the control of a galactose-regulated promoter. Plasmids that contain mutated copies of the gene can then be introduced and their fitness assessed. Recently, by combining deep sequencing with a competitive growth assay114, such an approach was used to measure the fitness of every possible point mutant of ubiquitin in yeast115. By expanding this approach to measure fitness in the presence of different drugs or additional mutations, it should be possible to create high-resolution chemogenetic and genetic interaction profiles for all point mutants of any yeast gene.

Pleiotropic
Pertaining to a gene that is associated with multiple distinct phenotypes.

Computational prediction of allele-specific interactions. Surprisingly, even in yeast there are few examples of computational methods to predict the phenotypic consequences (including increased sensitivity to drugs or to gene inhibition) of specific point mutations. Current approaches can predict, with reasonable accuracy, the phenotypes that are associated with the complete loss of gene function116,117. However, to our knowledge there are no computational tools that will either predict cases in which different mutations of the same gene result in different phenotypic consequences or predict which of the many phenotypes associated with a pleiotropic gene are likely to be altered by a specific mutation. There are

various techniques that rely on either structural information or evolutionary conservation to predict whether a mutation will have a significant effect on fitness (such as the PolyPhen-2 program118,119 (BOX 1)). However, these approaches typically annotate mutations as either neutral or deleterious, but they do not state which phenotypes the mutation will alter. In light of the results presented above, the limitations of such a classification become apparent. Histone proteins and the components of RNAPII are among the most highly conserved of all eukaryotic proteins (the sequence identity of histones between humans and yeast ranges from 63% to 92%111), and consequently, one might reasonably expect most mutations in these proteins to be deleterious. However, only ~10% of the histone point mutants are completely inviable, whereas the remainder show severe growth defects only under specific conditions.

Conclusions
In this Review, we have highlighted computational and experimental approaches that seek to characterize the interactions and functions of genes and proteins at the resolution of domains or residues. For most species, even highly studied model organisms, the extant interaction networks are far from complete even at the protein–protein or genetic level. Consequently, it would seem to be overly ambitious to experimentally screen multiple variants of every gene or protein, at least using current approaches. Such high-resolution analyses are likely to be reserved for proteins that are of particular interest, either owing to their high conservation across species (such as histones and actin) or because of their association with a particular disease (such as KRAS (BOX 4) and huntingtin). For other proteins, it is likely that single alleles will be screened and computational methods will be required to characterize the functions of their sequence and structure in greater detail.

The mapping of host–pathogen interaction networks is emerging as an important approach for understanding how pathogens hijack the machinery of the host cell120,121. So far, these studies have primarily focused on the interactions between the host and a specific viral or bacterial strain. However, given the rapidity with which viruses can evolve, it will be extremely informative to see how different alleles of the same viral proteins can result in changes to their interaction networks122.

As the available data on allele-specific interactions increase, it will be important to have centralized databases to store this information. Current interaction databases that integrate studies from multiple laboratories and from multiple organisms tend to document interactions between whole proteins or whole genes123, whereas databases that report allelic interactions tend to be associated with the screens from a specific laboratory105,124 or from a specific protein family111. A centralized database would facilitate the types of meta-analyses that have been carried out for standard interaction networks, and it would also provide training data for computational approaches to predict the network consequences of specific mutations.

The integrated analyses of protein–protein interaction networks with genetic interaction networks have revealed features that are not evident from studying either type of network in isolation, and such analyses have offered an improved understanding of the relationship between these two networks58,61. One major goal is to extract from these networks mechanistic insights about the function of individual pathways, protein complexes, proteins and even individual domains or residues in these proteins125. As discussed above, structural information is often used to help to interpret such networks; however, in the future, it will be of great interest to determine how these types of data can ultimately be harnessed to inform structural studies, especially those involving protein machines that are mutated in different disease states, which have been uncovered through large-scale genomic studies. For these mutated machines, a deeper understanding at both the biophysical and structural levels may be needed to truly understand the underlying biology behind these detrimental effects. Ultimately, the information from an integrated pipeline, from sequence to systems to structure, will be crucial in helping to develop targeted therapeutic strategies that could be genetic, chemical or biological in nature.
1. Phillips, P. C. Epistasis – the essential role of gene interactions in the structure and evolution of genetic systems. Nature Rev. Genet. 9, 855–867 (2008).
2. Sharan, R., Ulitsky, I. & Shamir, R. Network-based prediction of protein function. Mol. Syst. Biol. 3, 88 (2007).
3. Barabasi, A. L. Scale-free networks: a decade and beyond. Science 325, 412–413 (2009).
4. Pawson, T. & Nash, P. Assembly of cell regulatory systems through protein interaction domains. Science 300, 445–452 (2003).
5. Beltrao, P. et al. Systematic functional prioritization of protein posttranslational modifications. Cell 150, 413–425 (2012).
6. Abecasis, G. R. et al. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010).
7. Furey, T. S. ChIP–seq and beyond: new and improved methodologies to detect and characterize protein–DNA interactions. Nature Rev. Genet. 13, 840–852 (2012).
9. Kholodenko, B. N., Hancock, J. F. & Kolch, W. Signalling ballet in space and time. Nature Rev. Mol. Cell Biol. 11, 414–426 (2010).
10. Choudhary, C. & Mann, M. Decoding signalling networks by mass spectrometry-based proteomics. Nature Rev. Mol. Cell Biol. 11, 427–439 (2010).
11. Ideker, T. & Krogan, N. J. Differential network biology. Mol. Syst. Biol. 8, 565 (2012).
12. Linding, R. et al. Systematic discovery of in vivo phosphorylation networks. Cell 129, 1415–1426 (2007).
13. Ward, L. D. & Kellis, M. Interpreting noncoding genetic variation in complex traits and human disease. Nature Biotech. 30, 1095–1106 (2012).
14. Yu, H. et al. High-quality binary protein interaction map of the yeast interactome network. Science 322, 104–110 (2008).
15. Tarassov, K. et al. An in vivo map of the yeast protein interactome. Science 320, 1465–1470 (2008).
16. Krogan, N. J. et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 440, 637–643 (2006).
18. Wodak, S. J., Pu, S., Vlasblom, J. & Seraphin, B. Challenges and rewards of interaction proteomics. Mol. Cell Proteom. 8, 3–18 (2009).
19. von Mering, C. et al. Comparative assessment of large-scale data sets of protein–protein interactions. Nature 417, 399–403 (2002).
20. Bader, G. D. & Hogue, C. W. Analyzing yeast protein–protein interaction data obtained from different sources. Nature Biotech. 20, 991–997 (2002).
21. Cusick, M. E. et al. Literature-curated protein interaction datasets. Nature Methods 6, 39–46 (2009).
22. Dolinski, K., Chatr-Aryamontri, A. & Tyers, M. Systematic curation of protein and genetic interaction data for computable biology. BMC Biol. 11, 43 (2013).
23. Licata, L. et al. MINT, the molecular interaction database: 2012 update. Nucleic Acids Res. 40, D857–D861 (2012).
24. Chatr-Aryamontri, A. et al. The BioGRID interaction database: 2013 update. Nucleic Acids Res. 41, D816–D823 (2013).
8. Karlebach, G. & Shamir, R. Modelling and analysis of 17. Gavin, A. C. et al. Proteome survey reveals modularity 25. Gomez, S. M., Noble, W. S. & Rzhetsky, A. Learning to
gene regulatory networks. Nature Rev. Mol. Cell Biol. of the yeast cell machinery. Nature 440, 631–636 predict protein–protein interactions from protein
9, 770–780 (2008). (2006). sequences. Bioinformatics 19, 1875–1881 (2003).

26. Zhang, Q. C. et al. Structure-based prediction of protein–protein interactions on a genome-wide scale. Nature 490, 556–560 (2012).
27. Jansen, R. et al. A Bayesian networks approach for predicting protein–protein interactions from genomic data. Science 302, 449–453 (2003).
28. Wang, P. I. & Marcotte, E. M. It’s the machine that matters: predicting gene function and phenotype from protein networks. J. Proteom. 73, 2277–2289 (2010).
29. Beltrao, P., Cagney, G. & Krogan, N. J. Quantitative genetic interactions reveal biological modularity. Cell 141, 739–745 (2010).
30. Tong, A. H. et al. Global mapping of the yeast genetic interaction network. Science 303, 808–813 (2004).
31. Haber, J. E. et al. Systematic triple-mutant analysis uncovers functional connectivity between pathways involved in chromosome regulation. Cell Rep. 3, 2168–2178 (2013).
32. Collins, S. R., Roguev, A. & Krogan, N. J. Quantitative genetic interaction mapping using the E‑MAP approach. Methods Enzymol. 470, 205–231 (2010).
33. Dixon, S. J. et al. Significant conservation of synthetic lethal genetic interaction networks between distantly related eukaryotes. Proc. Natl Acad. Sci. USA 105, 16653–16658 (2008).
34. Butland, G. et al. eSGA: E. coli synthetic genetic array analysis. Nature Methods 5, 789–795 (2008).
35. Roguev, A., Wiren, M., Weissman, J. S. & Krogan, N. J. High-throughput genetic interaction mapping in the fission yeast Schizosaccharomyces pombe. Nature Methods 4, 861–866 (2007).
36. Typas, A. et al. High-throughput, quantitative analyses of genetic interactions in E. coli. Nature Methods 5, 781–787 (2008).
37. Tong, A. H. et al. Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 294, 2364–2368 (2001).
38. Ryan, C. J. et al. Hierarchical modularity and the evolution of genetic interactomes across species. Mol. Cell 46, 691–704 (2012).
39. Babu, M. et al. Genetic interaction maps in Escherichia coli reveal functional crosstalk among cell envelope biogenesis pathways. PLoS Genet. 7, e1002377 (2011).
40. Costanzo, M. et al. The genetic landscape of a cell. Science 327, 425–431 (2010).
41. Roguev, A. et al. Conservation and rewiring of functional modules revealed by an epistasis map in fission yeast. Science 322, 405–410 (2008).
42. Breslow, D. K. et al. A comprehensive strategy enabling high-resolution functional analysis of the yeast genome. Nature Methods 5, 711–718 (2008).
43. Davierwala, A. P. et al. The synthetic genetic interaction spectrum of essential genes. Nature Genet. 37, 1147–1152 (2005).
44. Mnaimneh, S. et al. Exploration of essential gene functions via titratable promoter alleles. Cell 118, 31–44 (2004).
45. Bassik, M. C. et al. A systematic mammalian genetic interaction map reveals pathways underlying ricin susceptibility. Cell 152, 909–922 (2013).
46. Horn, T. et al. Mapping of signaling networks through synthetic genetic interaction analysis by RNAi. Nature Methods 8, 341–346 (2011).
47. Lin, Y. Y. et al. Functional dissection of lysine deacetylases reveals that HDAC1 and p300 regulate AMPK. Nature 482, 251–255 (2012).
48. Roguev, A. et al. Quantitative genetic-interaction mapping in mammalian cells. Nature Methods 10, 432–437 (2013).
49. Laufer, C., Fischer, B., Billmann, M., Huber, W. & Boutros, M. Mapping genetic interactions in human cancer cells with RNAi and multiparametric phenotyping. Nature Methods 10, 427–431 (2013).
50. Lehner, B., Crombie, C., Tischler, J., Fortunato, A. & Fraser, A. G. Systematic mapping of genetic interactions in Caenorhabditis elegans identifies common modifiers of diverse signaling pathways. Nature Genet. 38, 896–903 (2006).
51. Byrne, A. B. et al. A global analysis of genetic interactions in Caenorhabditis elegans. J. Biol. 6, 8 (2007).
52. Ryan, C., Greene, D., Cagney, G. & Cunningham, P. Missing value imputation for epistatic MAPs. BMC Bioinformatics 11, 197 (2010).
53. Wong, S. L. et al. Combining biological networks to predict genetic interactions. Proc. Natl Acad. Sci. USA 101, 15682–15687 (2004).
54. Lu, X., Kensche, P. R., Huynen, M. A. & Notebaart, R. A. Genome evolution predicts genetic interactions in protein complexes and reveals cancer drug targets. Nature Commun. 4, 2124 (2013).
55. Pandey, G. et al. An integrative multi-network and multi-classifier approach to predict genetic interactions. PLoS Comput. Biol. 6, e1000928 (2010).
56. Folger, O. et al. Predicting selective drug targets in cancer through metabolic networks. Mol. Syst. Biol. 7, 501 (2011).
57. Chang, M., Bellaoui, M., Boone, C. & Brown, G. W. A genome-wide screen for methyl methanesulfonate-sensitive mutants reveals genes required for S phase progression in the presence of DNA damage. Proc. Natl Acad. Sci. USA 99, 16934–16939 (2002).
58. Beyer, A., Bandyopadhyay, S. & Ideker, T. Integrating physical and genetic maps: from genomes to interaction networks. Nature Rev. Genet. 8, 699–710 (2007).
59. Sharan, R. & Ideker, T. Modeling cellular machinery through biological network comparison. Nature Biotech. 24, 427–433 (2006).
60. Mitra, K., Carvunis, A. R., Ramesh, S. K. & Ideker, T. Integrative approaches for finding modular structure in biological networks. Nature Rev. Genet. 14, 719–732 (2013).
61. Bandyopadhyay, S., Kelley, R., Krogan, N. J. & Ideker, T. Functional maps of protein complexes from quantitative genetic interaction data. PLoS Comput. Biol. 4, e1000065 (2008).
62. Hillenmeyer, M. E. et al. Systematic analysis of genome-wide fitness data in yeast reveals novel gene function and drug action. Genome Biol. 11, R30 (2010).
63. Parsons, A. B. et al. Integration of chemical–genetic and genetic interaction data links bioactive compounds to cellular target pathways. Nature Biotech. 22, 62–69 (2004).
64. Kapitzky, L. et al. Cross-species chemogenomic profiling reveals evolutionarily conserved drug mode of action. Mol. Syst. Biol. 6, 451 (2010).
65. Boxem, M. et al. A protein domain-based interactome network for C. elegans early embryogenesis. Cell 134, 534–545 (2008).
    This is a large-scale fragment-based protein–protein interaction screen that identifies the minimal regions of interaction for many interactions.
66. Fromont-Racine, M., Rain, J. C. & Legrain, P. Toward a functional analysis of the yeast genome through exhaustive two-hybrid screens. Nature Genet. 16, 277–282 (1997).
67. Goehler, H. et al. A protein interaction network links GIT1, an enhancer of huntingtin aggregation, to Huntington’s disease. Mol. Cell 15, 853–865 (2004).
68. Guglielmi, B. et al. A high resolution protein interaction map of the yeast Mediator complex. Nucleic Acids Res. 32, 5379–5391 (2004).
69. LaCount, D. J. et al. A protein interaction network of the malaria parasite Plasmodium falciparum. Nature 438, 103–107 (2005).
70. Rain, J. C. et al. The protein–protein interaction map of Helicobacter pylori. Nature 409, 211–215 (2001).
71. Amberg, D. C., Basart, E. & Botstein, D. Defining protein interactions with yeast actin in vivo. Nature Struct. Biol. 2, 28–35 (1995).
    This is a pioneering study that highlights the use of integrating structural models with edgetic protein–protein interaction mapping.
72. Charloteaux, B. et al. Protein–protein interactions and networks: forward and reverse edgetics. Methods Mol. Biol. 759, 197–213 (2011).
73. Leanna, C. A. & Hannink, M. The reverse two-hybrid system: a genetic scheme for selection against specific protein/protein interactions. Nucleic Acids Res. 24, 3341–3347 (1996).
74. Shih, H. M. et al. A positive genetic selection for disrupting protein–protein interactions: identification of CREB mutations that prevent association with the coactivator CBP. Proc. Natl Acad. Sci. USA 93, 13896–13901 (1996).
75. Vidal, M., Brachmann, R. K., Fattaey, A., Harlow, E. & Boeke, J. D. Reverse two-hybrid and one-hybrid systems to detect dissociation of protein–protein and DNA–protein interactions. Proc. Natl Acad. Sci. USA 93, 10315–10320 (1996).
76. Walhout, A. J. M. et al. Protein interaction mapping in C. elegans using proteins involved in vulval development. Science 287, 116–122 (2000).
77. Zhong, Q. et al. Edgetic perturbation models of human inherited disorders. Mol. Syst. Biol. 5, 321 (2009).
78. Dreze, M. et al. ‘Edgetic’ perturbation of a C. elegans BCL2 ortholog. Nature Methods 6, 843–849 (2009).
79. Oren, M. & Rotter, V. Mutant p53 gain‑of‑function in cancer. Cold Spring Harb. Perspect. Biol. 2, a001107 (2010).
80. Lim, J. et al. Opposing effects of polyglutamine expansion on native protein complexes contribute to SCA1. Nature 452, 713–718 (2008).
81. Aloy, P. et al. Structure-based assembly of protein complexes in yeast. Science 303, 2026–2029 (2004).
82. Deng, M., Mehta, S., Sun, F. & Chen, T. Inferring domain–domain interactions from protein–protein interactions. Genome Res. 12, 1540–1548 (2002).
83. Kim, P. M., Lu, L. J., Xia, Y. & Gerstein, M. B. Relating three-dimensional structures to protein networks provides evolutionary insights. Science 314, 1938–1941 (2006).
84. Prieto, C. & De Las Rivas, J. Structural domain–domain interactions: assessment and comparison with protein–protein interaction data to improve the interactome. Proteins 78, 109–117 (2010).
85. Riley, R., Lee, C., Sabatti, C. & Eisenberg, D. Inferring protein domain interactions from databases of interacting proteins. Genome Biol. 6, R89 (2005).
86. Wang, H. et al. InSite: a computational method for identifying protein–protein interaction binding sites on a proteome-wide scale. Genome Biol. 8, R192 (2007).
87. Wang, X. et al. Three-dimensional reconstruction of protein networks provides insight into human genetic disease. Nature Biotech. 30, 159–164 (2012).
    This study integrates high-throughput protein–protein interactions with three-dimensional structures of interacting interfaces to interpret human disease-associated mutations.
88. Schuster-Bockler, B. & Bateman, A. Protein interactions in human genetic diseases. Genome Biol. 9, R9 (2008).
89. Alber, F. et al. The molecular architecture of the nuclear pore complex. Nature 450, 695–701 (2007).
    This is a landmark example of the use of integrative approaches to determine the structure of a complex macromolecule, in this case the nuclear pore complex that consists of 30 distinct proteins.
90. Lasker, K. et al. Molecular architecture of the 26S proteasome holocomplex determined by an integrative approach. Proc. Natl Acad. Sci. USA 109, 1380–1387 (2012).
91. Campos, M., Nilges, M., Cisneros, D. A. & Francetic, O. Detailed structural and assembly model of the type II secretion pilus from sparse data. Proc. Natl Acad. Sci. USA 107, 13081–13086 (2010).
92. Ward, A. B., Sali, A. & Wilson, I. A. Biochemistry. Integrative structural biology. Science 339, 913–915 (2013).
93. Finn, R. D., Marshall, M. & Bateman, A. iPfam: visualization of protein–protein interactions in PDB at domain and amino acid resolutions. Bioinformatics 21, 410–412 (2005).
94. Punta, M. et al. The Pfam protein families database. Nucleic Acids Res. 40, D290–301 (2012).
95. Berman, H. M. et al. The protein data bank. Nucleic Acids Res. 28, 235–242 (2000).
96. Yellaboina, S., Tasneem, A., Zaykin, D. V., Raghavachari, B. & Jothi, R. DOMINE: a comprehensive collection of known and predicted domain–domain interactions. Nucleic Acids Res. 39, D730–735 (2011).
97. Reimand, J., Hui, S., Jain, S., Law, B. & Bader, G. D. Domain-mediated protein interaction prediction: from genome to network. FEBS Lett. 586, 2751–2763 (2012).
98. Charles, G. M. et al. Site-specific acetylation mark on an essential chromatin-remodeling complex promotes resistance to replication stress. Proc. Natl Acad. Sci. USA 108, 10620–10625 (2011).
99. Fuchs, S. M., Kizer, K. O., Braberg, H., Krogan, N. J. & Strahl, B. D. RNA polymerase II carboxyl-terminal domain phosphorylation regulates protein stability of the Set2 methyltransferase and histone H3 di- and trimethylation at lysine 36. J. Biol. Chem. 287, 3249–3256 (2012).
100. Morrison, A. J. et al. Mec1/Tel1 phosphorylation of the INO80 chromatin remodeling complex influences DNA damage checkpoint responses. Cell 130, 499–511 (2007).
101. Mehta, M. et al. Individual lysine acetylations on the N terminus of Saccharomyces cerevisiae H2A.Z are highly but not differentially regulated. J. Biol. Chem. 285, 39855–39865 (2010).

102. Wang, A. Y., Aristizabal, M. J., Ryan, C., Krogan, N. J. & Kobor, M. S. Key functional regions in the histone variant H2A.Z C‑terminal docking domain. Mol. Cell. Biol. 31, 3871–3884 (2011).
103. Kim, H. S. et al. An acetylated form of histone H2A.Z regulates chromosome architecture in Schizosaccharomyces pombe. Nature Struct. Mol. Biol. 16, 1286–1293 (2009).
104. Haarer, B., Viggiano, S., Hibbs, M. A., Troyanskaya, O. G. & Amberg, D. C. Modeling complex genetic interactions in a simple eukaryotic genome: actin displays a rich spectrum of complex haploinsufficiencies. Genes Dev. 21, 148–159 (2007).
105. Collins, S. R. et al. Functional dissection of protein complexes involved in yeast chromosome biology using a genetic interaction map. Nature 446, 806–810 (2007).
106. Zhang, Z., Shibahara, K. & Stillman, B. PCNA connects DNA replication to epigenetic inheritance in yeast. Nature 408, 221–225 (2000).
107. Ayyagari, R., Impellizzeri, K. J., Yoder, B. L., Gary, S. L. & Burgers, P. M. A mutational analysis of the yeast proliferating cell nuclear antigen indicates distinct roles in DNA replication and DNA repair. Mol. Cell. Biol. 15, 4420–4429 (1995).
108. Dai, J. et al. Probing nucleosome function: a highly versatile library of synthetic histone H3 and H4 mutants. Cell 134, 1066–1078 (2008).
    This paper describes both the systematic mutation of every individual residue of two histone proteins and the use of drug sensitivity screening to assess the functional effects of these mutations.
109. Matsubara, K., Sano, N., Umehara, T. & Horikoshi, M. Global analysis of functional surfaces of core histones with comprehensive point mutants. Genes Cells 12, 13–33 (2007).
110. Nakanishi, S. et al. A comprehensive library of histone mutants identifies nucleosomal residues required for H3K4 methylation. Nature Struct. Mol. Biol. 15, 881–888 (2008).
111. Huang, H. L. et al. HistoneHits: a database for histone mutations and their phenotypes. Genome Res. 19, 674–681 (2009).
    This paper reports a database that focuses on a specific protein family (histones) and that integrates the results of phenotyping screens of point mutants from several laboratories. It provides an interactive structure on which residues that are associated with specific phenotypes can be highlighted.
112. Braberg, H. et al. From structure to systems: high-resolution, quantitative genetic analysis of RNA polymerase II. Cell 154, 775–788 (2013).
    This study reports the functional dissection of RNA polymerase II by genetic interaction profiling of point mutants from multiple distinct subunits; it shows that the mutation of residues that are on distinct subunits but that are close together in the three-dimensional structure have similar genetic interaction profiles.
113. Alber, F., Forster, F., Korkin, D., Topf, M. & Sali, A. Integrating diverse data for structure determination of macromolecular assemblies. Annu. Rev. Biochem. 77, 443–477 (2008).
114. Hietpas, R., Roscoe, B., Jiang, L. & Bolon, D. N. Fitness analyses of all possible point mutations for regions of genes in yeast. Nature Protoc. 7, 1382–1396 (2012).
115. Roscoe, B. P., Thayer, K. M., Zeldovich, K. B., Fushman, D. & Bolon, D. N. Analyses of the effects of all ubiquitin point mutants on yeast growth rate. J. Mol. Biol. 425, 1363–1377 (2013).
116. McGary, K. L., Lee, I. & Marcotte, E. M. Broad network-based predictability of Saccharomyces cerevisiae gene loss‑of‑function phenotypes. Genome Biol. 8, R258 (2007).
117. Lee, I. et al. Predicting genetic modifier loci using functional gene networks. Genome Res. 20, 1143–1153 (2010).
118. Adzhubei, I. A. et al. A method and server for predicting damaging missense mutations. Nature Methods 7, 248–249 (2010).
119. Kumar, P., Henikoff, S. & Ng, P. C. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nature Protoc. 4, 1073–1081 (2009).
120. Jager, S. et al. Global landscape of HIV-human protein complexes. Nature 481, 365–370 (2012).
121. Shapira, S. D. et al. A physical and regulatory map of host-influenza interactions reveals pathways in H1N1 infection. Cell 139, 1255–1267 (2009).
122. Neveu, G. et al. Comparative analysis of virus–host interactomes with a mammalian high-throughput protein complementation assay based on Gaussia princeps luciferase. Methods 58, 349–359 (2012).
123. Stark, C. et al. The BioGRID interaction database: 2011 update. Nucleic Acids Res. 39, D698–D704 (2011).
124. Koh, J. L. et al. DRYGIN: a database of quantitative genetic interaction networks in yeast. Nucleic Acids Res. 38, D502–507 (2010).
125. Fraser, J. S., Gross, J. D. & Krogan, N. J. From systems to structure: bridging networks and mechanism. Mol. Cell 49, 222–231 (2013).
126. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
127. Maher, M. C., Uricchio, L. H., Torgerson, D. G. & Hernandez, R. D. Population genetics of rare variants and complex diseases. Hum. Hered. 74, 118–128 (2012).
128. Gravel, S. et al. Demographic history and rare allele sharing among human populations. Proc. Natl Acad. Sci. USA 108, 11983–11988 (2011).
129. Abecasis, G. R. et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
130. Lander, G. C., Saibil, H. R. & Nogales, E. Go hybrid: EM, crystallography, and beyond. Curr. Opin. Struct. Biol. 22, 627–635 (2012).
131. Russel, D. et al. Putting the pieces together: integrative modeling platform software for structure determination of macromolecular assemblies. PLoS Biol. 10, e1001244 (2012).
132. Bau, D. et al. The three-dimensional folding of the α-globin gene domain reveals formation of chromatin globules. Nature Struct. Mol. Biol. 18, 107–114 (2011).
133. Lasker, K., Topf, M., Sali, A. & Wolfson, H. J. Inferential optimization for simultaneous fitting of multiple components into a CryoEM map of their assembly. J. Mol. Biol. 388, 180–194 (2009).
134. Kaelin, W. G. Jr. The concept of synthetic lethality in the context of anticancer therapy. Nature Rev. Cancer 5, 689–698 (2005).
135. Ashworth, A., Lord, C. J. & Reis-Filho, J. S. Genetic interactions in cancer progression and treatment. Cell 145, 30–38 (2011).
136. Barbie, D. A. et al. Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature 462, 108–112 (2009).
137. Cheung, H. W. et al. Systematic investigation of genetic vulnerabilities across cancer cell lines reveals lineage-specific dependencies in ovarian cancer. Proc. Natl Acad. Sci. USA 108, 12372–12377 (2011).
138. Krastev, D. B. et al. A systematic RNAi synthetic interaction screen reveals a link between p53 and snoRNP assembly. Nature Cell Biol. 13, 809–818 (2011).
139. Luo, J. et al. A genome-wide RNAi screen identifies multiple synthetic lethal interactions with the Ras oncogene. Cell 137, 835–848 (2009).
    This is a genome-wide RNAi screen of isogenic human cell lines to identify genes that are synthetically lethal with a specific oncogenic mutation.
140. Wang, Y. et al. Critical role for transcriptional repressor Snail2 in transformation by oncogenic RAS in colorectal carcinoma cells. Oncogene 29, 4658–4670 (2010).
141. Miller, J. P. et al. A genome-scale RNA-interference screen identifies RRAS signaling as a pathologic feature of Huntington’s disease. PLoS Genet. 8, e1003042 (2012).
142. Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012).
143. Garnett, M. J. et al. Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature 483, 570–575 (2012).
144. McDermott, U. et al. Identification of genotype-correlated sensitivity to selective kinase inhibitors by using high-throughput tumor cell line profiling. Proc. Natl Acad. Sci. USA 104, 19936–19941 (2007).
145. Muellner, M. K. et al. A chemical–genetic screen reveals a mechanism of resistance to PI3K inhibitors in cancer. Nature Chem. Biol. 7, 787–793 (2011).
146. Dolma, S., Lessnick, S. L., Hahn, W. C. & Stockwell, B. R. Identification of genotype-selective antitumor agents using synthetic lethal chemical screening in engineered human tumor cells. Cancer Cell 3, 285–296 (2003).
147. Corcoran, R. B. et al. Synthetic lethal interaction of combined BCL‑XL and MEK inhibition promotes tumor regressions in KRAS mutant cancer models. Cancer Cell 23, 121–128 (2013).
148. Ding, Q. et al. A TALEN genome-editing system for generating human stem cell-based disease models. Cell Stem Cell 12, 238–251 (2013).
149. Hockemeyer, D. et al. Genetic engineering of human pluripotent cells using TALE nucleases. Nature Biotech. 29, 731–734 (2011).
150. Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819–823 (2013).
151. Mali, P. et al. RNA-guided human genome engineering via Cas9. Science 339, 823–826 (2013).
152. Hillenmeyer, M. E. et al. The chemical genomic portrait of yeast: uncovering a phenotype for all genes. Science 320, 362–365 (2008).
153. Han, T. X., Xu, X. Y., Zhang, M. J., Peng, X. & Du, L. L. Global fitness profiling of fission yeast deletion strains by barcode sequencing. Genome Biol. 11, R60 (2010).
154. Simonis, N. et al. Empirically controlled mapping of the Caenorhabditis elegans protein–protein interactome network. Nature Methods 6, 47–54 (2009).
155. Guruharsha, K. G. et al. A protein complex network of Drosophila melanogaster. Cell 147, 690–703 (2011).
156. Giot, L. et al. A protein interaction map of Drosophila melanogaster. Science 302, 1727–1736 (2003).
157. Bakal, C. et al. Phosphorylation networks regulating JNK activity in diverse genetic backgrounds. Science 322, 453–456 (2008).
158. Hu, P. et al. Global functional atlas of Escherichia coli encompassing previously uncharacterized proteins. PLoS Biol. 7, e96 (2009).
159. Havugimana, P. C. et al. A census of human soluble protein complexes. Cell 150, 1068–1081 (2012).
160. Stelzl, U. et al. A human protein–protein interaction network: a resource for annotating the proteome. Cell 122, 957–968 (2005).
161. Rual, J. F. et al. Towards a proteome-scale map of the human protein–protein interaction network. Nature 437, 1173–1178 (2005).

Acknowledgements
The authors thank G. Cagney, D. Fitzpatrick and C. Maher for their comments and feedback. They also thank M. Shales and H. Braberg for suggestions and assistance with figures and K. Lasker for assistance with Box 3. C.J.R. is supported by ICON Plc and the University College Dublin Newman Fellowship Programme; P.C. is supported by a Howard Hughes Predoctoral Fellowship; A.S. is supported by the National Institutes of Health (R01 GM083960, U54 RR022220, U54 GM094662, P01 AI091575, and U01 GM098256); N.J.K. is supported by the US National Institutes of Health (P50 GM082250, R01 GM084448, P01 AI090935, P50 GM081879, R01 GM098101, R01 GM084279 and P01 AI091575) and the Defense Advanced Research Projects Agency (DARPA‑10‑93‑Prophecy-PA‑008).

Competing interests statement
The authors declare no competing interests.

FURTHER INFORMATION
BioGRID: http://thebiogrid.org
DRYGIN: http://drygin.ccbr.utoronto.ca/
HistoneHits: http://histonehits.org
iPfam: http://ipfam.sanger.ac.uk/
Krogan lab interactome database: http://interactome-cmp.ucsf.edu/
Yeast fitness database: http://fitdb.stanford.edu/

REVIEWS
NON-CODING RNA
Gene regulation by antisense transcription
Vicent Pelechano1 and Lars M. Steinmetz1–3
Abstract | Antisense transcription, which was initially considered by many as
transcriptional noise, is increasingly being recognized as an important regulator of gene
expression. It is widespread among all kingdoms of life and has been shown to influence —
either through the act of transcription or through the non-coding RNA that is produced —
almost all stages of gene expression, from transcription and translation to RNA
degradation. Antisense transcription can function as a fast evolving regulatory switch and
a modular scaffold for protein complexes, and it can ‘rewire’ regulatory networks. The
genomic arrangement of antisense RNAs opposite sense genes indicates that they might
be part of self-regulatory circuits that allow genes to regulate their own expression.
1 Genome Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany.
2 Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA.
3 Stanford Genome Technology Center, 855 California Ave, Palo Alto, California 94305, USA.
Correspondence to L.M.S. e‑mail: larsms@embl.de
doi:10.1038/nrg3594
Published online 12 November 2013

In recent years, our view of RNA has markedly changed, from regarding these molecules solely as intermediates of genetic information to appreciating their variety of functions that are independent of their protein-coding potential1–7. One of the best-characterized non-coding RNAs (ncRNAs) that mediate gene regulation is X-inactive specific transcript (XIST), which has a key role in mammalian X chromosome inactivation2. More recently, the development of high-throughput approaches has revealed pervasive transcription in all genomes that have been investigated so far. This phenomenon produces numerous types of previously unknown ncRNAs and challenges our traditional definitions of genes and functional regions of the genome8–11. Some classes of short ncRNAs (<200 nucleotides in length) are already accepted as fundamental players in gene regulation; these include small interfering RNAs (siRNAs), microRNAs (miRNAs) and PIWI-interacting RNAs (piRNAs) (reviewed in REF. 12). However, we have only begun to understand the functions of the vast majority of long ncRNAs (>200 nucleotides in length)10,11,13,14.
In this Review, we focus on antisense transcripts, which are a class of long ncRNAs. Antisense transcripts are transcribed from the strand opposite to that of the sense transcript of either protein-coding or non-protein-coding genes. Here, we refer to the originally annotated transcript as the sense transcript and the more recently identified one on the opposite strand as the antisense transcript. The study of gene regulation by antisense transcription is particularly intriguing, as the genomic arrangement of sense and antisense transcripts immediately indicates that they may act on each other. Antisense transcripts have previously been reviewed for bacteria15,16, plants17 and humans18. Therefore, we go beyond particular species or taxonomic groups to discuss the diverse biological roles of antisense transcription, as well as its implications for gene regulation, genome architecture and evolution.
We begin by discussing the characteristics of antisense transcripts and how they can be identified, and review the advances and challenges in the genome-wide characterization and functional annotation of these transcripts. We then describe different mechanisms of gene regulation by antisense transcription and the biological effects of such regulation. We discuss how antisense RNAs may have advantages over other gene regulators (such as transcription factors) for integrating multiple kinds of regulatory signals, establishing on–off (that is, bistable) switches and even ‘rewiring’ gene regulatory networks. We end by discussing the evolutionary implications of antisense transcription and its consequences for genome organization.

Characteristics and expression
Antisense transcripts were initially discovered in bacteria more than 30 years ago19; soon after this, examples were found in eukaryotes20. Only with the introduction of genomic approaches less than 10 years ago did it become apparent that antisense transcripts are widespread throughout the genomes of a range of species21–23. Notably, more than 30% of annotated transcripts in humans have antisense transcription24. However, antisense transcripts are generally low in abundance and are, on average, more than 10‑fold lower in abundance than sense expression24.

In contrast to protein-coding mRNAs, which accumulate in the cytoplasm25, antisense transcripts preferentially accumulate in the nucleus26. However, some antisense transcripts have been found to be associated with chromatin27,28 and in a range of distinct locations, including the mitochondria and the cytoplasm25.
The expression of some antisense transcripts is linked to the activity of neighbouring genes29,30, whereas many others have distinct expression patterns during different processes, such as cellular differentiation and cancer progression31, in different environmental conditions or on different genetic backgrounds29. Although the apparently low fidelity of transcription initiation suggests that some antisense transcripts arise from transcriptional noise32, it is clear that many others carry out specific functions2–7,33. Before discussing the functional potential of antisense transcripts, it is necessary to understand how they are generated and their intrinsic characteristics.

Expression of antisense transcripts. Antisense transcripts arise from promoters, and their expression is often subject to similar regulation as for other genes. They can arise from independent promoters, bidirectional promoters of divergent transcription units30,34–37 or cryptic promoters38–41 (BOX 1). In gene-dense regions, promoter bidirectionality can give rise to a large proportion of antisense transcripts; for example, in yeast, most antisense transcripts seem to originate from bidirectional promoters34,35. Promoter bidirectionality, which, until recently, was considered exceptional, has been found to be widespread in species that range from yeast34,35 to humans30,36,37, although the degree of bidirectionality is species-dependent; for example, low levels of bidirectionality are observed in Drosophila melanogaster42. The bidirectional activity of each promoter is influenced by other factors such as the three-dimensional organization of chromatin43 and the density of polyadenylation signals that surround the promoter44,45. Finally, some
Box 1 | Classification of antisense transcripts

Antisense transcripts can be classified according to different criteria, such as their origin, genomic orientation,
mode of action, length, stability and even the species in which they are expressed. These transcripts have been
found to originate from independent promoters, shared bidirectional promoters or cryptic promoters that are
situated within genes (see the figure). According to their orientation with respect to sense genes, they can be
further classified as head‑to‑head, tail‑to‑tail or internal (that is, when they are fully covered by the sense transcripts).
Antisense transcripts can exert their function locally, distally, in cis or in trans, and they can also function in multiple
subcellular compartments. Cis-acting mechanisms of these transcripts can act either locally (for example, in
promoter–gene interactions) or distally (for example, in enhancer–gene interactions). Trans-acting mechanisms can
also act either locally (for example, antisense transcripts affecting the allele from which they originated and/or any
additional allele) or distally (for example, antisense transcripts affecting other genes). Moreover, antisense
transcripts can be classified into short (<200 nucleotides) and long (>200 nucleotides) non-coding RNAs (ncRNAs),
and stable or unstable RNAs.
Short ncRNAs are accepted as fundamental players in gene regulation. Although they are widespread among
eukaryotes, relevant differences exist among species; for example, PIWI-interacting RNAs (piRNAs) are found in
animals but not in plants or fungi (reviewed in REF. 12). In this Review, we focus on the much less studied long
antisense ncRNAs. Species-specific differences in mechanisms of action might be expected when these mechanisms
depend on an accessory machinery, such as the RNA interference machinery, that is not present in all species. As an
example, the pairing of sense–antisense transcripts and their consequent degradation by RNase III in Gram-positive
bacteria are not seen in Gram-negative bacteria, which suggests a different processing pattern of double-stranded
RNAs108. Similarly, any effect of an antisense transcript that is mediated by DNA methylation75,76 is not expected to
function in Saccharomyces cerevisiae, in which the appropriate DNA methylation machinery is lacking125. However,
mechanisms of action that are based on the general and highly conserved transcription machinery (for example,
transcriptional interference by chromatin modifications) are more likely to be conserved across species. TABLE 1
provides several examples of antisense transcripts with mechanisms that are found in multiple species. So far, few
direct comparative studies have been done. It would be interesting to carry out systematic comparative studies that
focus on the commonalities and differences of each particular mechanism between species.

[Box 1 figure. Labels: Cryptic promoter; Shared bidirectional promoter; Independent promoter; Sense; Antisense; DNA; Cis-acting; Trans-acting; Nucleus; Cytoplasm; AAAAAA.]

Transcriptional noise
Random fluctuations that are intrinsic to the gene expression process and that cause differences in the levels of specific RNAs among cells in a clonal population.

Cryptic promoters
Weak promoters, the use of which is associated with disruption of chromatin structure. Transcripts produced from such promoters often have unknown functions.

Head-to‑head
Pertaining to two transcripts that are divergently oriented and have overlapping 5′ regions.

Tail‑to‑tail
Pertaining to two transcripts that are convergently oriented and have overlapping 3′ regions.
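
The orientation classes described in Box 1 (head-to-head, tail-to-tail and internal) and the 200-nucleotide length threshold can be illustrated with a minimal Python sketch. It is not taken from the Review: the `Transcript` class, the function names and the coordinate convention (inclusive start and end on a single chromosome) are assumptions made for illustration, as are the extra labels "non-overlapping" and "unclassified" for pairs that Box 1 does not describe. Box 1 does not say how a transcript of exactly 200 nucleotides is classified; the sketch treats it as long.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """A transcript as a genomic interval (hypothetical minimal model)."""
    start: int   # leftmost coordinate, inclusive
    end: int     # rightmost coordinate, inclusive
    strand: str  # "+" or "-"

    @property
    def five_prime(self) -> int:
        # Transcription starts at the left end on "+" and at the right end on "-".
        return self.start if self.strand == "+" else self.end

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def orientation(sense: Transcript, antisense: Transcript) -> str:
    """Classify an antisense transcript relative to its sense transcript."""
    if sense.strand == antisense.strand:
        raise ValueError("sense and antisense must lie on opposite strands")
    if antisense.end < sense.start or sense.end < antisense.start:
        return "non-overlapping"  # label not used in Box 1 (assumption)
    if sense.start <= antisense.start and antisense.end <= sense.end:
        return "internal"  # fully covered by the sense transcript
    if antisense.start <= sense.start and sense.end <= antisense.end:
        return "unclassified"  # antisense spans the whole sense transcript
    # Partial overlap: exactly one end of the sense transcript is covered.
    if antisense.start <= sense.five_prime <= antisense.end:
        return "head-to-head"  # divergent, overlapping 5' regions
    return "tail-to-tail"      # convergent, overlapping 3' regions


def length_class(transcript: Transcript, threshold: int = 200) -> str:
    """Short (<200 nt) versus long ncRNA; exactly 200 nt is treated as long."""
    return "short" if transcript.length < threshold else "long"
```

For a sense transcript `Transcript(100, 500, "+")`, an antisense transcript at `Transcript(50, 200, "-")` is classified as head-to-head, one at `Transcript(400, 700, "-")` as tail-to-tail and one at `Transcript(200, 300, "-")` as internal.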


antisense transcripts originate from cryptic promoters that are situated within the transcribed region of their sense gene38–40 or even from the termination region of the sense gene46.
The expression of antisense transcripts is also subject to regulation at the level of RNA stability; for example, many antisense transcripts in budding yeast are cryptic unstable transcripts, which are targeted for early degradation by the nuclear exosome34,35. This control is used in meiosis in budding yeast during which the nuclear exosome is downregulated47. Other degradation pathways also affect antisense expression; for example, one class of antisense ncRNAs called XRN1‑sensitive unstable transcripts (XUTs) are specific targets of the XRN1 cytoplasmic 5′→3′ RNA exonuclease48, whereas other long ncRNAs are controlled by 5′ decapping activity49.
In addition to the generation of antisense transcripts by RNA polymerases using DNA as a template, they can also originate from RNA-dependent RNA polymerase (RdRP) activity50,51. Specifically, it has been proposed that, in humans, some antisense transcripts that contain non-genomically encoded polyuracil stretches are generated using mRNAs as templates52. However, further research is needed to determine the extent and relevance of RdRP activity in vivo and to understand the biological implications of such RNAs.

Structure of antisense transcripts. Aside from their antisense orientation, antisense transcripts do not have unique biochemical features. In general, they lack protein-coding potential, as their sequence is constrained by the overlapping sense transcript. However, there are many examples of pairs of sense and antisense transcripts that only partially overlap and that are both functional mRNAs with protein-coding activity53–55. In general, the untranslated regions (UTRs) of these mRNAs, which are devoid of coding potential, overlap with neighbouring genes to mediate functions that are similar to those of other ncRNAs11,16,53. Such antisense transcripts are especially common in organisms that have compact genomes16,54,55; for example, in budding yeast, up to 40% of genes have antisense transcripts that terminate within their 3′ UTRs54.
Independent of protein-coding activity, antisense transcripts are similar to other long ncRNAs in that they can contain specific domains that interact with DNA, RNA or proteins10,11. The intrinsic flexibility of RNA molecules to evolutionarily rearrange their sequence has led to the suggestion that antisense transcripts, and long ncRNAs in general, can form flexible modular scaffolds in which different domains that interact with DNA, RNA or proteins are combined to form specific functional complexes10,11,56. For example, antisense transcripts can recognize their reverse-complementary sequence in RNA or DNA and can also carry protein-binding domains to modulate gene expression2,3,57.

Identification and functional analyses
High-throughput approaches have become instrumental for both the identification and the functional characterization of antisense transcripts. Nonetheless, these approaches are limited by the overlap of antisense transcripts with the sense gene, the low expression levels of these transcripts24,25,29,35 and their limited evolutionary conservation58–63.

Identification of antisense transcripts. The initial discovery of pervasive antisense transcription21–23 was met with justifiable scepticism64. The identification of such transcripts is technically challenging, as it requires strand-specific approaches65,66 (BOX 2). Therefore, many studies initially mistakenly identified antisense transcripts as sense transcripts. Antisense transcripts and their structure can be directly studied either using strand-specific quantification of RNA abundance29,35,48,54,55,65,66 or by capturing the process of active transcription using techniques such as global run-on sequencing (GRO-seq)37 and native elongating transcript sequencing (NET-seq)67 (BOX 2). Antisense expression can also be indirectly detected through its consequences on chromatin modification states. These indirect measures are independent of the stability or the abundance of antisense transcripts, as they measure the effects of the process of antisense transcription rather than the transcripts themselves. The analyses of chromatin signatures have been powerful for detecting long intergenic ncRNAs68, but for antisense transcripts, the effectiveness of this approach is limited by the lack of strand specificity of chromatin modification and the generally higher expression levels of the sense transcript.

Functional analyses of antisense transcripts. The function of an antisense transcript can be mediated by either the transcript itself or the act of its transcription. Additionally, it is possible to distinguish between functional effects that are exerted by antisense expression in cis (that is, those that affect alleles on the DNA strand from which they are produced, usually locally) and those that are exerted in trans (that is, those that affect alleles on different DNA strands) (BOX 1). In cases of antisense effects in trans, the interpretation is usually that antisense transcription exerts its effects through the RNA molecules that are transcribed. By contrast, antisense effects in cis are often assumed to be due to the act of antisense transcription. However, neither assumption is strictly correct. Regions of antisense transcription can interact with other loci through the three-dimensional organization of chromatin, which can mediate trans effects; similarly, antisense transcripts can remain at the locations of their synthesis (such as through stalled polymerases, R‑loops or triple helices), which allows the RNA to exert its function in cis. Although many examples of trans-acting antisense transcripts have been described3,4,69,70, the fact that both antisense and sense transcripts are transcribed from the same region suggests that antisense transcripts function more frequently in cis than other ncRNAs that commonly function in trans10.
A classic approach to determine the function of a gene is to perturb its expression, followed by phenotypic analysis. However, the genomic arrangement of antisense transcripts makes it difficult to perturb antisense expression without also affecting sense expression,

and this approach has only succeeded in isolated examples29,69. The fact that most antisense transcripts function in cis also makes it difficult to use genome-wide approaches, such as systematic knockdown56 and overexpression, to identify putative functions of these transcripts. Approaches that are based on guilt‑by‑association, in which transcripts are linked to specific biological processes on the basis of common expression patterns across cell types and tissues10,68, are difficult to apply to antisense transcripts because their expression can be affected by the sense transcript29. To attempt to disentangle the causes and consequences of antisense transcription, the dynamic analysis of transcription following environmental perturbations has proven promising38,71. These studies involve measuring genome-wide gene expression levels at different time points and comparing the transcriptional responses of cells with and without the key components that are involved in gene regulation by antisense transcription (for example, the histone methyltransferase SET1 (REF. 71) and the histone deacetylase SET3 (REF. 38); see below) to determine the function of these key components. Another interesting approach is to use allele-specific measurement of gene expression to determine whether an antisense transcript can function in trans to affect a distant sense allele72. At the moment, however, the detailed molecular dissection of specific cases has been the most fruitful way of understanding the functional consequences of gene regulation by antisense transcription.

Mechanisms of gene regulation
The orientation, stability, subcellular localization and inherent features, such as sequence or secondary structure, of antisense transcripts can all affect their mechanisms of action. Antisense transcripts, or the act of their transcription, can affect almost all stages of the gene expression process. Here, we discuss the different steps of gene expression that antisense expression affects, including transcriptional initiation, co‑transcriptional processes and post-transcriptional processes (TABLE 1). In addition, it should be noted that, similarly to how siRNAs repress gene expression in both the nucleus and the cytoplasm (reviewed in REFS 9,12), antisense transcripts can also simultaneously function at different stages of the gene expression process. For example, the antisense transcript to the Ty1 retrotransposon in budding yeast silences Ty1 transcription in trans through chromatin modification70 and simultaneously controls its retrotransposition post-transcriptionally73.

Effects on transcription initiation. Antisense expression can affect transcription initiation through transcriptional interference, in which one act of transcription negatively affects a second one in cis74. This has been shown to occur by promoter competition (that is, when the assembly of the transcription machinery at one promoter physically prevents the assembly at the second one), by occlusion of binding sites due to the passage of RNA polymerase or even by chromatin or DNA modifications74.
In particular, antisense expression has been shown to regulate transcription initiation by affecting DNA methylation, the process by which specific cytosines are methylated (for example, at CpG islands in mammalian promoters), which usually leads to their long-term repression75. One example is the repression of the haemoglobin α1 gene (HBA1) in patients with a class of α‑thalassemia76. In this case, an aberrant LUC7L (putative RNA-binding protein Luc7‑like) transcript runs antisense into the HBA1 locus and methylates its promoter CpG island, which silences HBA1 expression and consequently causes disease (FIG. 1a). Antisense transcription has also been implicated in gene imprinting in mice77, in which the transcription of an antisense transcript

CpG islands
Genomic regions that contain a high frequency of CG dinucleotides; they are often associated with mammalian promoters and are targets of cytosine methylation.

Gene imprinting
An epigenetic process by which the expression of each allele of a gene depends on its parent of origin; for example, on whether it is the paternal or maternal allele.

Box 2 | Genomic techniques for studying antisense expression

Since the discovery of the widespread transcription of non-coding RNAs (ncRNAs), there has been much debate about how much of this transcription is real and how much is simply a result of experimental artefacts64. Initial studies had difficulty in distinguishing between bona fide antisense transcripts and artefacts that are derived from their overlapping sense transcripts, which are usually expressed at higher levels. In recent years, several approaches have been developed to measure strand-specific transcription and to minimize experimental artefacts. Furthermore, new genome-wide techniques are delivering promising insights into RNA localization, single-cell expression and chromatin binding.

Transcript abundance
Most techniques that are used for measuring RNA abundance (for example, quantitative PCR or RNA sequencing) require the production of cDNA molecules by reverse transcription as their first step. However, the inherent ability of reverse transcriptase to use either RNA or DNA as a template can result in the production of artefactual double-stranded cDNA, leading to false-positive identification of antisense transcripts. This can be solved by adding actinomycin D65, a drug that specifically inhibits the DNA-dependent DNA polymerase activity of reverse transcriptase. Many strand-specific protocols that minimize the false-positive identification of antisense transcripts have been developed65,66. In the future, the direct sequencing of RNA molecules24 promises to solve problems that are derived from sample preparation by eliminating the need to produce cDNA.

Strand-specific measurements of RNA polymerases
Chromatin immunoprecipitation is a tool that is commonly used to study the transcription machinery, but as it enriches for double-stranded DNA fragments that are associated with proteins, it lacks strand-specificity. Alternative methodologies can be used to map both the position and the orientation of RNA polymerase. Global run-on sequencing (GRO-seq)37 measures the presence and orientation of active polymerases that are capable of run‑on elongation, whereas native elongating transcript sequencing (NET-seq)67 allows the measurement of nascent transcripts from engaged RNA polymerases and is independent of their ability to elongate. In both approaches, the RNAs produced by these polymerases are sequenced, thereby achieving strand-specificity.

Subcellular localization
The application of strand-specific techniques to RNAs that are derived from different subcellular compartments25 (for example, the cytoplasm and the nucleus) helps to define different subpopulations of antisense transcripts, and thereby provides clues about their potential mechanisms of action (for example, those affecting translation are expected to be present in the cytoplasm).

Single-cell studies
Single-molecule fluorescence in situ hybridization approaches115 allow individual RNA molecules to be measured. As these methods are applied to single cells, they are also informative about transcriptional noise. Analyses of transcriptional noise will be further supported by the refinement of single-cell transcriptomics approaches.

Chromatin binding of ncRNAs
Genome-wide mapping of chromatin-binding sites of ncRNA molecules27,28 will help to expand the identification of their targets and to characterize their modes of action.
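
The strand-specific quantification of transcript abundance described in Box 2 can be illustrated with a minimal Python sketch that counts reads over a gene separately for the sense and antisense strands. It is a sketch under stated assumptions, not a protocol from the Review: the `Gene` and `Read` types and both function names are hypothetical, reads are reduced to a single position, and each read's strand is assumed to be that of the original RNA molecule (some library protocols report the reverse complement, in which case the strand must be flipped first).

```python
from typing import Iterable, NamedTuple, Optional, Tuple


class Gene(NamedTuple):
    start: int   # leftmost coordinate, inclusive
    end: int     # rightmost coordinate, inclusive
    strand: str  # "+" or "-"


class Read(NamedTuple):
    position: int  # single mapped coordinate (simplification)
    strand: str    # strand of the original RNA molecule (assumption)


def count_sense_antisense(gene: Gene, reads: Iterable[Read]) -> Tuple[int, int]:
    """Return (sense, antisense) read counts for reads mapping within the gene."""
    sense = antisense = 0
    for read in reads:
        if not gene.start <= read.position <= gene.end:
            continue  # read lies outside the gene
        if read.strand == gene.strand:
            sense += 1
        else:
            antisense += 1
    return sense, antisense


def sense_to_antisense_ratio(sense: int, antisense: int) -> Optional[float]:
    """Fold difference of sense over antisense counts; None if no antisense reads."""
    return sense / antisense if antisense else None
```

Without strand information, both kinds of read would be pooled into a single count for the locus, which is why antisense transcripts that are expressed at much lower levels than the overlapping sense transcript are easy to miss or to misassign.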

Table 1 | Examples of functional antisense transcription across all kingdoms of life

Mechanism of action | Antisense locus | Effects | Species | Refs
DNA methylation | LUC7L | Methylates HBA1 promoter CpG island, which represses its expression | Humans | 76
DNA methylation | Airn | Regulates Igf2r imprinting by DNA methylation | Mice | 77,78
Chromatin modifications | XIST and TSIX | Inactivates X chromosome gene expression | Mammals | 2
Chromatin modifications | ANRIL | Represses the tumour suppressor locus CDKN2B–CDKN2A by both histone H3 lysine 27 (H3K27) methylation and DNA methylation | Humans | 57,80
Chromatin modifications | BDNF‑AS | Represses BDNF by histone modification | Mammals | 81
Chromatin modifications | HOTAIR | Silences the HOXD locus in trans by the recruitment of Polycomb proteins | Humans | 3
Chromatin modifications | COOLAIR | Represses FLC sense gene by H3K4 demethylation and recruits Polycomb proteins, which increase H3K27me3 levels | Plants | 85,86
Chromatin modifications | COLDAIR | Antisense to COOLAIR; represses FLC sense gene by the recruitment of Polycomb proteins | Plants | 88
Chromatin modifications | AS to PHO84 | Represses PHO84 by histone deacetylation both in cis and in trans | S. cerevisiae | 4,69
Chromatin modifications | RTL | Silences transcription of the Ty1 retrotransposon in trans through chromatin modification and post-transcriptionally controls its retrotransposition | S. cerevisiae | 70,73
Transcriptional interference | RME2 | Represses IME4 by transcriptional interference in cis and functions after transcription initiation of IME4 | S. cerevisiae | 5,99
Isoform variation | ZEB2‑AS | Induces exon skipping in ZEB2, which produces an alternative isoform that has increased translation efficiency | Humans | 7
Translation efficiency | AS to Uchl1 | Increases translation efficiency of Uchl1 using a SINEB2 domain | Mice | 33
Translation efficiency | SymR | Decreases translation efficiency of SymE by competing with binding of the 30S ribosome | Enterobacteria | 6
RNA stability | BACE1‑AS | Increases stability of BACE1 by masking a microRNA-binding site | Humans | 105,106
RNA stability | WDR83 and DHPS | Increase their mutual stability by forming a duplex within their 3′ untranslated regions | Humans | 53

Airn, antisense Igf2r RNA; ANRIL, antisense non-coding RNA in the INK4 locus; AS, antisense; BACE1, β‑site APP-cleaving enzyme 1; BDNF, brain-derived neurotrophic factor; COLDAIR, COLD-ASSISTED INTRONIC NON-CODING RNA; COOLAIR, COLD-INDUCED LONG ANTISENSE INTRAGENIC RNA; DHPS, deoxyhypusine synthase; FLC, FLOWERING LOCUS C; HBA1, haemoglobin α1; HOTAIR, HOX transcript antisense RNA; HOXD, homeobox D; Igf2r, insulin-like growth factor 2 receptor; IME4, inducer of meiosis; LUC7L, putative RNA-binding protein Luc7‑like 1; PHO84, phosphate metabolism; RME2, regulator of meiosis 2; RTL, antisense to LTR (long terminal repeat); S. cerevisiae, Saccharomyces cerevisiae; SymE, SOS-induced yjiW gene with similarity to MazE; TSIX, XIST antisense RNA; Uchl1, ubiquitin carboxy-terminal hydrolase L1; WDR83, WD repeat domain 83; XIST, X inactive-specific transcript; ZEB2, zinc-finger E‑box-binding homeobox 2.
Airn (antisense to insulin-like growth factor 2 receptor (Igf2r)), and not the Airn transcript itself, represses Igf2r by both transcriptional interference and DNA methylation78. However, antisense transcription can also have activating effects by protecting promoters from de novo methylation18 through R‑loop formation, which involves DNA–RNA hybrids, during transcription79.
Antisense expression can also control transcription initiation by affecting histone modifications. A classic example is mammalian X chromosome inactivation, in which the long ncRNA XIST spreads over one copy of the X chromosome and recruits repressive chromatin-remodelling complexes, such as Polycomb repressive complex 2 (PRC2). In mice, the action of Xist is antagonized in cis by its own antisense transcript, X (inactive)-specific transcript, opposite strand (Tsix)2. Whereas XIST and TSIX affect the whole chromosome, other antisense transcripts silence specific loci. For example, ANRIL (antisense ncRNA in the INK4 locus; also known as CDKN2B‑AS1), the expression of which is increased in prostate cancer, mediates the specific repression of the tumour suppressor locus CDKN2B–CDKN2A57, which encodes p15 (also known as INK4B), p14 (also known as ARF) and p16 (also known as INK4A). Specifically, the nascent antisense transcript recruits PRC2 in cis, which induces histone H3 lysine 27 methylation (H3K27me) and thus represses transcription from this locus57 (FIG. 1b). Interestingly, heterochromatin formation that is induced by ANRIL also leads to promoter DNA methylation after cellular differentiation80. The specific inhibition of cis-repressing antisense transcripts also holds promise as a therapeutic tool to increase the expression of specific target genes81,82. Specifically, it has been shown in mammals that the

[Figure 1 panels. a | Humans: aberrant transcript, LUC7L, HBA1, DNA methylation. b | Humans: ANRIL, RNA pol, PRC2, CDKN2B, CDKN2A, H3K27me, histone modification, DNA methylation. c | Humans: HOTAIR, HOXC, HOXD, PRC2, H3K27me, histone modification. d | Budding yeast: cryptic promoter, H3K4me2, Set3, histone modification.]
Figure 1 | Effects of antisense expression on transcription initiation. a | Aberrant transcriptional extension of the LUC7L
(putative RNA-binding protein Luc7‑like) locus produces an antisense transcript that overlaps with the haemoglobin α1
gene (HBA1), which methylates the HBA1 promoter and represses its expression76. b | The nascent antisense transcript
ANRIL (antisense non-coding RNA in the INK4 locus) recruits Polycomb repressive complex 2 (PRC2) in cis, which induces
histone H3 lysine 27 methylation (H3K27me). This represses transcription of the tumour suppressor locus CDKN2B–
CDKN2A57 and also leads to long-term promoter DNA methylation at this locus80. c | HOX transcript antisense RNA
(HOTAIR) silences the homeobox D (HOXD) locus in trans through PRC2 recruitment3. d | Antisense transcripts that
originate from internal cryptic promoters can modify the chromatin of their associated sense promoters by depositing
H3K4me2, thereby modulating the binding of the histone deacetylase Set3 and gene expression dynamics38.
targeted degradation of BDNF‑AS, the antisense transcript of brain-derived neurotrophic factor (BDNF), increases BDNF expression, probably by decreasing the repressive chromatin marks that are deposited by the antisense transcript81.
Antisense transcripts can also mediate chromatin modifications in trans. The best-known example is HOX transcript antisense RNA (HOTAIR) in mammals, which is an antisense transcript to the homeobox C (HOXC) locus and a predictor of both metastasis and death when it is expressed at high levels in primary breast tumours83. HOTAIR silences the HOXD locus in trans through the recruitment of Polycomb proteins3 (FIG. 1c). In fact, guidance of Polycomb proteins by antisense transcripts is likely to be common because PRC2 directly interacts with more than 3,000 antisense transcripts28, which could target Polycomb proteins to specific genomic locations where they could subsequently function27,28. Thus, antisense transcripts can provide sequence specificity by interacting either with DNA or with the nascent transcript of the sense strand and by serving as a scaffold for the chromatin-modifying machinery10,11,84. This is similar to how siRNAs in Schizosaccharomyces pombe use nascent transcripts that originate near the centromere as assembly platforms to guide heterochromatin formation12.
Chromatin modifications mediated by antisense transcripts that suppress transcription initiation are not restricted to animals. In plants, COLD-INDUCED LONG ANTISENSE INTRAGENIC RNA (COOLAIR), which is a set of antisense transcripts to FLOWERING LOCUS C (FLC), uses two chromatin modification pathways to repress FLC expression. One of these pathways involves the use of a proximal polyadenylation site upon cold treatment, which increases H3K27me3 levels through the function of Polycomb proteins85. The other pathway involves specific RNA-binding proteins that promote the use of the proximal polyadenylation site, which causes local histone H3K4 demethylation86. In this case, multiple ncRNA-based mechanisms interact, in that COOLAIR transcription is limited by the specific stabilization of an R‑loop over its promoter87 and that another transcript, COLD-ASSISTED INTRONIC NON-CODING RNA (COLDAIR; antisense to COOLAIR), can also recruit Polycomb proteins over the FLC locus88. In budding yeast, the antisense transcript to PHO84 (which encodes a phosphate transporter) is induced in response to chronological ageing, which causes repression of the sense gene by histone deacetylation both in cis and in trans69. Moreover, in budding yeast, the silencing of the GAL1–GAL10 locus (the protein products of which are involved in galactose metabolism) by antisense transcription and the resulting activity of the histone methyltransferase Set1 have been extensively studied89,90. This mechanism has been shown to operate genome wide48: hundreds of XUTs have been shown to silence their sense counterparts through Set1 (REF. 48). However, despite a few clear examples of trans-acting XUTs, recent work suggests that XRN1 could also function as a transcription factor in the nucleus91. These and other well-characterized examples suggest that antisense-mediated repression of sense transcription through histone modification is common (reviewed in REFS 9,84).

Nascent transcript
An RNA molecule that results from ongoing transcription and that is still associated with DNA through the RNA polymerase.

The act of antisense transcription, rather than the produced transcript, has also been shown to induce chromatin modifications, which are deposited during transcription and subsequently regulate the expression of the modified regions38,71,92. For example, in budding yeast, transcription of antisense units that arise from internal cryptic promoters directly modifies the chromatin of the associated sense genes, which delays their transcription initiation38,89,90 (FIG. 1d).
A less investigated mechanism for the repression of transcription initiation is the formation of triple helices between transcripts and DNA93. We currently lack reliable estimates of the prevalence of this mechanism in antisense-mediated regulation, but several cases have been described for sense transcripts. For example, the human dihydrofolate reductase gene (DHFR) produces a sense transcript that overlaps the promoter and 5′ region of DHFR. Consequently, this RNA represses the main promoter both in cis and in trans by forming a stable triple RNA–DNA helix94.

Co‑transcriptional effects of antisense expression. Antisense expression can regulate gene expression after transcription initiation by transcriptional interference that occurs co‑transcriptionally. This effect can be mediated by direct collision of RNA polymerases, by ‘sitting-duck’ interference (that is, when an elongating polymerase removes another that is already bound to its promoter) or by one RNA polymerase acting as a ‘roadblock’ for other incoming elongating polymerases74.
If a DNA region is simultaneously transcribed in both directions, this leads to a collision of the transcription machinery (FIG. 2a). Although phage polymerases that transcribe opposite DNA strands are able to bypass each other in vitro95, this is not the case for more complex bacterial96 or eukaryotic97 RNA polymerases. Transcriptional interference by direct polymerase collision is most likely when there are two strong convergent transcription units, as it is unlikely for two weak transcription units to be simultaneously transcribed. However, polymerase pausing can increase transcriptional interference, even for weakly transcribed units, by extending the time of polymerase occupancy98. An example of transcriptional interference that functions after transcription initiation is the repression of the IME4 locus (which encodes a key regulator of meiosis) in budding yeast by its antisense transcript regulator of meiosis 2 (RME2)5,99. In this case, a 450‑bp internal region of the IME4 gene is necessary for antisense-mediated repression, which suggests that antisense-mediated transcriptional interference blocks the elongation, but not the initiation, of the IME4 transcript99 (FIG. 2b).
Antisense transcription can also regulate which transcript isoforms are produced by the sense gene. For example, antisense expression can affect mRNA splicing by masking specific splice sites and preventing their processing. A well-known example in humans is the zinc-

5′ intron that contains an internal ribosome entry site on the ZEB2 mRNA7. This does not change the abundance of the ZEB2 mRNA but increases its translation efficiency (FIG. 2c). Throughout metazoan evolution, genes that produce multiple spliced isoforms are associated with antisense transcription, which indicates that antisense-mediated regulation could be a common mechanism to control alternative splicing100. Antisense-mediated exon skipping has also been exploited therapeutically to change the levels of alternatively spliced isoforms or to restore disrupted open reading frames101.
Antisense expression can also lead to alternative transcript isoforms by mechanisms that are independent of splicing. For example, transcriptional interference by antisense enhancer RNAs that are expressed in mouse embryonic stem cells during specific differentiation stages can lead to the appearance of shorter sense transcript isoforms with alternative termination sites102 (FIG. 2d). In bacteria, a similar mechanism called transcription attenuation has been shown to affect the length of sense mRNAs. This phenomenon induces premature termination of the sense transcript103 through the interaction between sense and antisense transcripts, which also allows differential regulation of genes in a single operon103.

Post-transcriptional effects of antisense expression. Finally, antisense expression can regulate the post-transcriptional ‘life’ of a sense mRNA (FIG. 3). This effect can be indirectly exerted, as in the case of ZEB2 above, in which antisense expression controls translation efficiency by affecting the produced transcript isoform7 (FIG. 2c). In this section, however, we focus on direct post-transcriptional effects of antisense transcripts. These effects are potentially faster than the mechanisms described above, as they act on mRNA molecules that are already present in the cell and are not affected by the lag between a change in transcription rate and the establishment of a new mRNA concentration level. One limitation of direct post-transcriptional regulation is that it requires both the sense and the antisense RNA molecules to be simultaneously present in the same cell. This is a limitation in organisms such as yeast, in which genes are expressed, on average, at a level of only one mRNA molecule per gene per cell104 and in which levels of antisense transcripts are even lower. Notably, when antisense transcription affects sense expression through the chromatin-mediated mechanisms that are discussed above, it is not necessary for the antisense and sense transcripts to be present in the same cell at the same time; steady state levels of antisense transcripts also become irrelevant, as such effects are maintained by chromatin modifications.
The direct post-transcriptional effects of antisense–sense transcript interactions that have been described are diverse and include both positive and negative effects on translation and mRNA stability.
An example of an activating effect on translation is the mouse ubiquitin carboxy-terminal hydrolase L1

Polymerase pausing
A process in which an RNA polymerase temporarily halts elongation while remaining associated with DNA. It is associated with transcriptional regulation after
finger E-box-binding homeobox 2 gene (ZEB2), which (Uchl1) antisense transcript, which increases the trans-
initiation and is particularly encodes a transcriptional repressor of E‑cadherin. Its lation of Uchl1 (REF. 33). Specifically, the antisense tran-
frequent in metazoans. antisense transcript prevents the processing of a large script binds to the 5′ region of the sense transcript, and

REVIEWS
a b Budding yeast
RNA polymerase RNA pol

collision STOP
RNA pol RNA pol IME4
RNA pol
RME2
c Humans ZEB2
ZEB2-AS1
or
IRES IRES
Splicing interference
Splicing
IRES
Low translation
Efficient translation
d Mice Short isoform
Different
differentiation
states
Long isoform
Figure 2 | Co‑transcriptional effects of antisense transcription. a | Head‑to‑head transcription can lead to RNA
polymerase collision96,97. b | Transcriptional interference of IME4 (which encodes a key regulator of Nature Reviews
meiosis) | Genetics
by its antisense
gene regulator of meiosis 2 (RME2)5 requires the presence of an internal sequence in a specific orientation. This supports a
model in which RME2 blocks polymerase elongation at the IME4 locus but not initiation of its transcription99. c | The
transcription of ZEB2 antisense RNA 1 (ZEB2‑AS1) prevents the processing of a 5′ intron that contains an internal ribosome
entry site (IRES)7. A sequence in the 5′ untranslated region of the ZEB2 mRNA limits ribosome scanning, such that only the
presence of this IRES in the final product allows efficient ZEB2 translation. d | Transcriptional interference by antisense
transcripts can limit the length of the sense transcript and lead to the production of shorter sense transcript isoforms102. By
contrast, when the level of antisense transcript expression is low (dashed line), the long sense isoform can be produced.
the SINEB2 domain on the antisense molecule then increases Uchl1 translation efficiency (FIG. 3a). In addition, nuclear–cytoplasmic shuttling of the antisense transcript regulates the efficiency of Uchl1 translation33. Although this is currently an isolated example, it suggests a modular mechanism of antisense transcript function, in which one element recognizes the target mRNA molecule and other elements (in this case, the SINEB2 domain) affect its post-transcriptional behaviour. Antisense transcripts can mediate not only activating effects but also repressive effects on translation, as in the case of SymE, which encodes an enterobacterial toxin that is induced by SOS (that is, an inducible DNA repair system)6. Its antisense transcript SymR binds to the 5′ end of the SymE transcript, where it blocks the binding site of the 30S ribosomal subunit, thereby inhibiting SymE translation (FIG. 3b).
Antisense expression can also affect the stability of target mRNAs. Antisense transcripts have been shown to increase the stability of their target sense mRNAs by masking specific sites that would otherwise lead to mRNA degradation. One example, which comes from humans, is that of the antisense transcript to the β-site APP-cleaving enzyme 1 gene (BACE1), which encodes β‑secretase 1, an enzyme that has a central role in the progression of Alzheimer’s disease. The antisense transcript forms an RNA duplex with the sense mRNA105, and this duplex masks a binding site for the miRNA miR‑485‑5p, which consequently suppresses miRNA-induced decay and translational repression of BACE1 (REF. 106) (FIG. 3c). This case is especially noteworthy, as it illustrates the competition between two different kinds of regulatory RNAs (that is, miRNAs and antisense transcripts) to ‘fine-tune’ gene expression levels, and suggests a role for antisense transcripts in directly binding to miRNAs and acting as miRNA sponges107.
Regulation of gene expression by antisense expression can also occur between two convergent protein-coding RNAs. For example, in humans, the stabilities of the WD repeat domain 83 mRNA (WDR83) and the deoxyhypusine synthase mRNA (DHPS) are increased by the formation of an RNA duplex that consists of their 3′ UTRs53.
Sense–antisense transcript pairing can also have negative effects on mRNA stability; for example, in Gram-positive bacteria, double-stranded RNAs that are formed by the genome-wide pairing of sense–antisense transcripts are degraded by RNase III108. In this case, it has been postulated that the presence of antisense transcripts imposes a threshold, so that only highly expressed transcripts will escape degradation, whereas transcripts that are expressed at lower levels (that is, cryptic transcripts) will pair with their antisense transcripts and will consequently be immediately targeted for degradation108 (FIG. 3d).

miRNA sponges
RNA molecules that have multiple binding sites for specific microRNAs (miRNAs); they are therefore able to function as decoys to sequester miRNAs and prevent them from binding to their targets.

[Figure 3 panel labels: Translation: a, Mice: Uchl1, AS Uchl1, SINEB2; nucleus, cytoplasm; positive. b, E. coli: SymE, SymR; ribosome entry, 30S, 50S; negative. mRNA stability: c, Humans: BACE1, BACE1-AS; miRNA-mediated degradation. d, Gram-positive bacteria: low sense expression, high sense expression; RNase III degradation.]
Figure 3 | Post-transcriptional effects of antisense transcription. a | The 5′ region of an antisense transcript to the ubiquitin carboxy-terminal hydrolase L1 gene (AS Uchl1) recognizes its sense transcript and increases the translation efficiency of Uchl1, an effect that depends on its SINEB2 domain33. b | In Escherichia coli, the antisense SymR transcript competes with the 30S ribosome subunit for the binding to SymE (the sense transcript), consequently decreasing the translation efficiency of SymE6. c | An antisense transcript to the β-site APP-cleaving enzyme 1 gene (BACE1-AS) forms an RNA duplex with the sense transcript (BACE1); this masks the binding site of the microRNA miR‑485‑5p and, consequently, suppresses the miRNA-mediated degradation and translational repression of BACE1 (REF. 106). d | Spurious low-abundance sense transcripts (blue) produced in Gram-positive bacteria can form double-stranded RNAs by pairing with their corresponding antisense transcripts (red), which are also constitutively expressed at low levels. These double-stranded RNAs are consequently processed by RNase III to produce pervasive short RNAs108. Only highly expressed sense transcripts surpass the threshold of antisense-mediated degradation.

Biological relevance
So far, we have discussed different mechanisms of action for antisense transcription, which can affect different steps of the gene expression process and can even function simultaneously at multiple steps. However, the regulation of gene expression by antisense transcripts does not exist in isolation; it is integrated with other mechanisms to achieve complex regulatory effects. Here, we conceptualize different ways that cells integrate antisense transcription and focus on the biological advantages of gene regulation by antisense transcription compared with that by protein regulators.

[Figure 4 panel labels: a, TF; GAL80, SUR7. b, Off state, on state. c, Expression level against sense induction, with antisense and without antisense; low or transient stimulus. d, Rewiring regulatory networks; TF.]
Figure 4 | Biological implications of antisense expression. a | Antisense transcripts can transmit regulatory signals to neighbouring promoters29. Upon galactose induction in budding yeast, the same transcription factor (TF)-mediated activation pathway induces the expression of GAL80 and an antisense transcript that originates bidirectionally from its promoter. This antisense transcript runs upstream into the promoter of SUR7 and represses its activity, thereby spreading the GAL80 regulatory signal to both genes. b | Sense–antisense pairs can be regarded as self-regulatory circuits, in which the unit can be in either an on state (that is, the sense gene (blue) is expressed (solid arrows) and the antisense gene (red) is repressed (dashed arrows)) or an off state (that is, the target sense gene is repressed and the antisense gene is expressed). c | The presence of antisense transcripts (red) can induce a threshold-dependent (that is, ultrasensitive117) on–off switch on sense-gene regulation (blue). When the gene does not have an antisense transcript, its expression follows different kinetics (grey). d | Antisense or non-coding RNA transcription can ‘rewire’ regulatory networks, which inverts the final effect of a transcription factor. A transcription factor that activates gene expression (upper panel) can also behave as a repressor (lower panel) if it activates the expression of a non-coding RNA (red) that has repressive effects on the downstream gene (blue). Part a is modified, with permission, from REF. 29 © (2011) Macmillan Publishers Ltd. All rights reserved.
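The switch-like behaviours described for FIG. 4b and FIG. 4c can be made concrete with a toy numerical model. The sketch below is not taken from the cited studies. It assumes (i) a stoichiometric titration rule, in which each antisense transcript neutralizes one sense transcript, to mimic the threshold in panel c, and (ii) a symmetric mutual-repression model with Hill-type repression to mimic the on and off states in panel b. The function names, the Hill coefficient and all parameter values are hypothetical and were chosen only for illustration.

```python
"""Toy models of antisense-mediated switches (illustrative assumptions only)."""


def free_sense(sense_production: float, antisense_production: float) -> float:
    """Threshold model (panel c): sense output after 1:1 titration by antisense.

    Assumes every antisense transcript neutralizes exactly one sense
    transcript, so sense output appears only above the antisense level.
    """
    return max(0.0, sense_production - antisense_production)


def simulate_pair(sense0: float, antisense0: float, alpha: float = 10.0,
                  hill: float = 2.0, dt: float = 0.01, steps: int = 5000):
    """Mutual-repression model (panel b), integrated with forward Euler.

    d(sense)/dt     = alpha / (1 + antisense**hill) - sense
    d(antisense)/dt = alpha / (1 + sense**hill)     - antisense
    alpha (maximal synthesis rate) and hill are hypothetical parameters.
    """
    sense, antisense = sense0, antisense0
    for _ in range(steps):
        d_sense = alpha / (1.0 + antisense ** hill) - sense
        d_antisense = alpha / (1.0 + sense ** hill) - antisense
        sense += dt * d_sense
        antisense += dt * d_antisense
    return sense, antisense


if __name__ == "__main__":
    # Threshold: stimuli at or below the antisense level give no sense output.
    for stimulus in (0.5, 1.0, 2.0, 4.0):
        print(stimulus, free_sense(stimulus, antisense_production=1.0))

    # Bistability: the same locus settles into an 'on' or an 'off' state
    # depending on which transcript starts with a slight advantage.
    print("on state :", simulate_pair(sense0=2.1, antisense0=1.9))
    print("off state:", simulate_pair(sense0=1.9, antisense0=2.1))
```

In this sketch the two starting points differ only slightly, yet the pair ends in opposite states, which is the qualitative property of a bistable switch; the titration rule likewise gives no sense output until the stimulus exceeds the antisense level. Real loci need not follow either simplification.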
Antisense transcripts as regulatory hubs. Owing to the ability of transcription factors to recognize short DNA sequences that are present in the promoters of their target genes, they are generally better suited than antisense transcripts for globally coordinating the expression of groups of genes. However, antisense transcription can also coordinate gene expression of multiple genes both in cis and in trans. In addition, antisense transcripts are established as hubs for gene regulation by the ability of antisense-mediated regulation to integrate diverse types of regulatory signals, including both transcriptional and post-transcriptional ones, and to function at multiple steps of the gene expression process. One of the factors that allow antisense transcripts to integrate these diverse signals is the intrinsic modular flexibility of ncRNAs10,11. An illustrative example is HOTAIR, which regulates gene expression both by binding to diverse loci across the genome27,28 and by recruiting chromatin-modifying machinery3. Antisense transcripts, such as HOTAIR, can also function as scaffolds for chromatin-associated complexes. Most components of the chromatin-modifying machinery lack sequence specificity, but RNAs can bind to specific sequences and therefore recruit and scaffold chromatin-associated complexes into larger functional units10–12,56,84. In addition, ncRNAs can scaffold complexes that are bound to different regions of the genome, thus bringing them together. Therefore, another attractive hypothesis is the role of antisense transcripts in remodelling three-dimensional chromatin structure, as described for the human HOXA distal transcript antisense RNA (HOTTIP)109.
Another way in which antisense transcripts can act as regulatory hubs is by locally spreading regulatory signals to neighbouring genes. This has been observed in antisense transcripts that originate from bidirectional promoters in organisms with compact genomes, such as in budding yeast29 and fission yeast110. For example, upon galactose induction in budding yeast, the antisense transcript that originates bidirectionally from the GAL80 promoter runs upstream into the promoter of SUR7, thereby repressing its activity. In this manner, the regulatory signals that impinge on the GAL80 promoter are spread to the promoter of SUR7 (REF. 29) (FIG. 4a). Such local crosstalk also occurs in bacteria, in which some protein-coding transcripts have long UTRs that silence the expression of neighbouring operons encoding opposing cellular functions16. Local transcriptional crosstalk is not limited to organisms with compact genomes and can also affect larger genomic regions. For example, in mice, upon growth factor stimulation,
‘ripples’ of transcription that originate from a target promoter spread across 100‑kb regions. This induces histone acetylation and the coordinated upregulation of both coding and non-coding neighbouring genes111.

Sense–antisense pairs as self-regulatory circuits. The precise locus specificity that is afforded by nucleotide sequence complementarity allows antisense transcripts, or the act of their transcription, to have specific effects on their targets. As sense and antisense transcription units are reciprocally complementary, in principle, they can mutually affect one another to establish self-regulatory circuits between sense and antisense expression. In the case of mutual repression (FIG. 4b), the effects can be subtle when antisense expression slightly modulates the expression of the sense gene (that is, fine-tuning its expression). Alternatively, there might be more drastic effects, in which the pair can be in either an ‘on’ state (in which the sense gene is expressed and the antisense gene is repressed) or an ‘off’ state (in which the target sense gene is repressed and the antisense gene is expressed); that is, the pair can form a bistable switch. It is important to note that these switches are established independently for each locus. Thus, cases in which antisense and sense expression are positively correlated24,29 in experiments using cell populations are compatible with alternative expression from different cells or alleles.
Studies have shown that the repressive effect of antisense transcripts is sufficient to establish switch-like behaviour of the sense gene. For example, budding yeast seems to use the equilibrium between sense and antisense transcripts to increase both the dynamic range of gene expression and the cell‑to‑cell variability in levels of protein expression29. In the case of SUR7 expression, its dynamic range is expanded in the lower range by antisense transcription that reduces basal or leaky levels of sense expression in the off state, thus increasing the range between minimum and maximum expression29. The ability of antisense transcription to regulate multiple levels of gene expression could further enhance this response. In fact, systems that involve both transcriptional and post-transcriptional antisense-mediated regulation have been shown to achieve more efficient gene repression, in which any transcriptional leakage is blocked post-transcriptionally112. Thus, antisense transcripts provide a more robust (that is, less noisy) sense-gene repression in the off state than transcription factor-based mechanisms.
In the on state, when the sense transcript is expressed, antisense-mediated regulation is expected to lead to noisier expression of the sense gene. This is due to the low level of antisense transcripts that are present in the on state, which makes them sensitive to transcriptional bursting113. Cell‑to‑cell variability in gene expression is likely to be increased in the on state, both because even low levels of antisense expression increase transcriptional noise and because sense–antisense equilibrium is established independently for each locus within individual cells29. Such variability, which can be achieved even at low levels of antisense transcription, could be important for adaptation. It allows cells in the same populations to respond differently to identical environmental stimuli, which could be advantageous for some cells in the population114. This increased variability is expected to be particularly relevant for stress-related genes, which are in fact enriched for the presence of antisense transcripts in budding yeast29. As recently confirmed in the case of PHO84 in budding yeast, anticorrelated sense and antisense expression4,69 can actually be due to exclusive expression of either transcript among single cells in a population115, which provides an opportunity for antisense-mediated regulation to contribute to cell‑to‑cell phenotypic variability.

Dynamic range
The range of expression levels between the minimum expression level of a gene in its basal or repressed state and its maximum expression level upon full activation.

Transcriptional bursting
A stochastic process in which a promoter changes from an inactive state to an active or open state that allows the production of multiple RNAs in a short period of time, before returning to the inactive state.

Effects on the kinetics of transcriptional regulation. Antisense expression not only alters the abundance of sense transcripts but also affects the kinetics of the transition between differential gene expression states116. In general, regulation by antisense transcripts is faster than that by transcription factors, especially for antisense transcripts that function post-transcriptionally112. Additionally, the observed equilibrium between sense and antisense transcription (FIG. 4b) supports a model in which antisense expression induces a threshold-dependent (that is, ultrasensitive117) on–off switch for sense expression29 (FIG. 4c). Specifically, the activation of the sense transcript needs to be high enough to oppose the repressive (that is, buffering) effect of the antisense transcript before the equilibrium can be altered and an increase in sense expression can be produced. This antisense-dependent ultrasensitivity allows buffering against low levels of activating stimuli, such as transient spurious activation signals, and also enables non-linear gene expression responses. For example, in Synechocystis spp. cyanobacteria, the presence of an antisense transcript (isrR) allows cells to ignore transient stimuli118. Only upon continued stimulation does the production of the sense iron-stress chlorophyll-binding protein transcript (isiA) overcome the repressive degradation effect of the antisense isrR transcript, which results in an accumulation of the sense transcript and thereby allows the bacteria to adapt to iron stress conditions118,119 (FIG. 4c). Additionally, upon removal of the stimulus, a system that is based on antisense regulation can rapidly recover back to its basal state, especially if the antisense transcript is quickly degraded113.

Rewiring regulatory networks. Antisense transcripts can function in conjunction with protein regulators to modify their effects. For example, cells can use ncRNA expression to invert the final effect of other transcriptional regulators. This ncRNA-mediated regulation not only allows the coordinated expression of neighbouring genes, in which an antisense transcript originating from a bidirectional promoter can spread signals that regulate sense expression29 (FIG. 4a), but can also function as a simple mechanism to rewire regulatory networks (FIG. 4d). An example of this rewiring is the transcription of a long ncRNA IRT1 that overlaps the IME1 promoter, which is a key regulator of sporulation in budding yeast92. The transcription factor Rme1 activates the transcription of IRT1, which arises from a promoter that is upstream
of the IME1 promoter, but this activating upstream signal is transformed into a repressive one by the ncRNA, which silences the downstream IME1 promoter. Although IRT1 overlaps the IME1 promoter on the same strand, antisense transcripts can mediate similar effects; for example, both sense and antisense transcripts that are repressed by Set3C (which is a histone deacetylase that recognizes H3K4me) regulate the expression of their neighbouring genes in a similar manner38. Thus, the rewiring of regulatory networks by both antisense transcripts and ncRNAs could be a general mechanism for switching transcription factor functions between activators and repressors.

Rapid evolution of antisense-mediated regulation
Regulation by antisense transcripts has potential advantages over regulation by transcription factors because it allows rapid evolution. The sequences of ncRNAs are constrained by factors such as RNA secondary structure, genomic position and expression level; collectively, the sequences that encode these ncRNAs accumulate fewer substitutions than neighbouring neutral sequences61. In addition, both the presence and regulation of sense–antisense units show evolutionary conservation58–60. However, the appearance of new ncRNAs is less evolutionarily constrained than strategies that are dependent on proteins because it does not involve the modification of protein-coding regions. This allows a rapid generation of antisense transcripts and contributes to the evolution of regulatory circuitries. In the simplest case, in which the act of transcription is the main regulatory function of an antisense transcript, no evolutionary limitations exist with respect to the RNA sequence itself or to its final abundance. Additionally, antisense-mediated regulation does not require any factors that are not already intrinsic to sense expression; thus, evolutionarily, it could have provided a regulatory mechanism that did not require the invention of new machinery.
Antisense transcripts in bacteria appear rapidly on an evolutionary timescale62,63. This high evolvability of ncRNAs is also seen in higher eukaryotes, as shown in a study of closely related mammals61. These studies have found that the gain or loss of gene loci is more frequent for non-protein-coding genes than for protein-coding ones. In fact, the rate of evolutionary turnover for antisense transcripts is similar to those reported for transcription factor-binding events120 or for other regulatory sequences121. It has been suggested that rapid evolution of non-coding regulatory mechanisms allows rapid adaptation61. However, this rapid evolution also implies that the same antisense transcript cannot be easily studied across different species. In addition, although ncRNAs or antisense transcripts may not initially encode proteins, they can provide ‘raw material’ for evolution to act on to give rise to proteins. Therefore, although we do not know the proportion of ncRNAs with regulatory functionality, some are likely to provide fitness advantages in the future. Along these lines, both ncRNAs and antisense transcripts have been proposed to be proto-genes that generate a pool of transcripts encoding short polypeptides, which acts as a reservoir for new genes. This represents an evolutionary continuum between antisense transcripts or other ncRNAs and protein-coding genes122.
Antisense transcripts may also participate in shaping the evolution of genome architecture. Namely, they provide an intrinsic mechanism for the regulation of gene expression, in which not only the promoter and its terminator, but also other neighbouring regions and transcripts, affect gene expression through ncRNAs that invade a gene boundary29. In addition, when considering groups of neighbouring loci, both the interleaved organization of the transcriptome25,35,55 and the transcriptional crosstalk between loci29 constrain the evolution of genome architecture. In particular, gene shuffling could be evolutionarily restricted in regions that support beneficial interactions between loci, whereas in other regions, new rearrangements could give rise to novel interaction networks that are mediated by ncRNAs and antisense transcripts.

Conclusions
Research over the past few years shows that antisense transcription is pervasive and, in many cases, it regulates gene expression at multiple stages, from the control of chromatin state to the modulation of the post-transcriptional life of mRNAs. When studying a particular gene locus, its overlapping antisense and ncRNA transcription should be taken into account to improve experimental designs that are aimed at understanding its regulation and phenotypic consequences. However, to deepen our understanding of antisense transcription and to fully characterize the different mechanisms of action used by cells to regulate gene expression, many technical and conceptual developments are required.
From a technical point of view, both detailed biochemical dissection of antisense-mediated regulation and additional genome-wide functional studies will be instrumental. To distinguish between effects caused by the act of transcription and those caused by the produced transcripts, it will be necessary to disentangle the measurement of RNA levels35 from its transcription and degradation37,67 (BOX 2). This, together with the application of methods such as chromatin immunoprecipitation followed by sequencing (ChIP–seq), will improve our mechanistic understanding of cis-acting antisense transcripts. The study of specific subcellular populations of transcripts25 will help to differentiate among co‑transcriptional, post-transcriptional or chromatin-mediated effects. Experimental designs that involve time course analyses will be instrumental for determining the dynamics of the non-coding transcriptome and will thus be critical for distinguishing between the causes and consequences of antisense expression. Furthermore, the application of new single-cell115 and transcriptomic55 technologies promises a clearer picture of both the function and structure of antisense transcripts, as the variability of sense–antisense expression among different cells of the same population might obscure the actual effects of antisense expression in individual cells. Carrying out comparative studies across species will help to understand how the different antisense-mediated mechanisms
evolved and the extent to which they are conserved. Finally, the development and application of new techniques, such as the genome-wide mapping of specific binding of RNAs to chromatin27,28 and the identification of proteins that are bound to each RNA molecule123, will open new avenues for studying trans effects of ncRNAs.
As our understanding of the process of antisense expression improves, the use of antisense transcripts as biotechnological tools to control gene expression is also likely to increase. Such studies will not only offer new genetic tools, such as transcriptional gene silencing produced by convergent sense and antisense transcripts124, but will also reveal new mechanisms that cells might be using to control gene expression.
In conclusion, we believe that the field of gene regulation by antisense transcription, and by ncRNAs more generally, will continue to expand and will provide important advances in our understanding of gene regulation, crosstalk between neighbouring loci and the evolution of genome architecture.
1. Eddy, S. R. Non-coding RNA genes and the modern RNA world. Nature Rev. Genet. 2, 919–929 (2001).
2. Pontier, D. B. & Gribnau, J. Xist regulation and function explored. Hum. Genet. 130, 223–236 (2011).
3. Rinn, J. L. et al. Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 129, 1311–1323 (2007).
4. Camblong, J., Iglesias, N., Fickentscher, C., Dieppois, G. & Stutz, F. Antisense RNA stabilization induces transcriptional gene silencing via histone deacetylation in S. cerevisiae. Cell 131, 706–717 (2007).
5. Hongay, C. F., Grisafi, P. L., Galitski, T. & Fink, G. R. Antisense transcription controls cell fate in Saccharomyces cerevisiae. Cell 127, 735–745 (2006).
6. Kawano, M., Aravind, L. & Storz, G. An antisense RNA controls synthesis of an SOS-induced toxin evolved from an antitoxin. Mol. Microbiol. 64, 738–754 (2007).
7. Beltran, M. et al. A natural antisense transcript regulates Zeb2/Sip1 gene expression during Snail1‑induced epithelial-mesenchymal transition. Genes Dev. 22, 756–769 (2008).
8. Wei, W., Pelechano, V., Jarvelin, A. I. & Steinmetz, L. M. Functional consequences of bidirectional promoters. Trends Genet. 27, 267–276 (2011).
9. Wery, M., Kwapisz, M. & Morillon, A. Noncoding RNAs in gene regulation. Wiley Interdiscip. Rev. Syst. Biol. Med. 3, 728–738 (2011).
10. Guttman, M. & Rinn, J. L. Modular regulatory principles of large non-coding RNAs. Nature 482, 339–346 (2012).
11. Mercer, T. R. & Mattick, J. S. Structure and function of long noncoding RNAs in epigenetic regulation. Nature Struct. Mol. Biol. 20, 300–307 (2013).
12. Moazed, D. Small RNAs in transcriptional gene silencing and genome defence. Nature 457, 413–420 (2009).
13. Mattick, J. S. The genetic signatures of noncoding RNAs. PLoS Genet. 5, e1000459 (2009).
14. Esteller, M. Non-coding RNAs in human disease. Nature Rev. Genet. 12, 861–874 (2011).
24. Ozsolak, F. et al. Comprehensive polyadenylation site maps in yeast and human reveal pervasive alternative polyadenylation. Cell 143, 1018–1029 (2010).
25. Djebali, S. et al. Landscape of transcription in human cells. Nature 489, 101–108 (2012).
26. Derrien, T. et al. The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res. 22, 1775–1789 (2012).
27. Chu, C., Qu, K., Zhong, F. L., Artandi, S. E. & Chang, H. Y. Genomic maps of long noncoding RNA occupancy reveal principles of RNA–chromatin interactions. Mol. Cell 44, 667–678 (2011).
28. Zhao, J. et al. Genome-wide identification of polycomb-associated RNAs by RIP–seq. Mol. Cell 40, 939–953 (2010).
29. Xu, Z. et al. Antisense expression increases gene expression variability and locus interdependency. Mol. Systems Biol. 7, 468 (2011).
    This paper shows that the expression of antisense transcripts in budding yeast can function as an on–off switch, which increases gene expression variability and mediates the spread of transcriptional signals between neighbouring genes.
30. Sigova, A. A. et al. Divergent transcription of long noncoding RNA/mRNA gene pairs in embryonic stem cells. Proc. Natl Acad. Sci. USA 110, 2876–2881 (2013).
31. Hung, T. et al. Extensive and coordinated transcription of noncoding RNAs within cell-cycle promoters. Nature Genet. 43, 621–629 (2011).
32. Struhl, K. Transcriptional noise and the fidelity of initiation by RNA polymerase II. Nature Struct. Mol. Biol. 14, 103–105 (2007).
33. Carrieri, C. et al. Long non-coding antisense RNA controls Uchl1 translation through an embedded SINEB2 repeat. Nature 491, 454–457 (2012).
    This study illustrates an example of antisense transcript modularity, in which one region of an antisense transcript recognizes its target mRNA, and another region of the same molecule increases the efficiency of translation of the target mRNA.
42. Core, L. J. et al. Defining the status of RNA polymerase at promoters. Cell Rep. 2, 1025–1035 (2012).
43. Tan-Wong, S. M. et al. Gene loops enhance transcriptional directionality. Science 338, 671–675 (2012).
44. Almada, A. E., Wu, X., Kriz, A. J., Burge, C. B. & Sharp, P. A. Promoter directionality is controlled by U1 snRNP and polyadenylation signals. Nature 499, 360–363 (2013).
45. Ntini, E. et al. Polyadenylation site-induced decay of upstream transcripts enforces promoter directionality. Nature Struct. Mol. Biol. 20, 923–928 (2013).
46. Murray, S. C. et al. A pre-initiation complex at the 3′‑end of genes drives antisense transcription independent of divergent sense transcription. Nucleic Acids Res. 40, 2432–2444 (2012).
47. Lardenois, A. et al. Execution of the meiotic noncoding RNA expression program and the onset of gametogenesis in yeast require the conserved exosome subunit Rrp6. Proc. Natl Acad. Sci. USA 108, 1058–1063 (2011).
48. van Dijk, E. L. et al. XUTs are a class of Xrn1‑sensitive antisense regulatory non-coding RNA in yeast. Nature 475, 114–117 (2011).
49. Geisler, S., Lojek, L., Khalil, A. M., Baker, K. E. & Coller, J. Decapping of long noncoding RNAs regulates inducible genes. Mol. Cell 45, 279–291 (2012).
50. Lehmann, E., Brueckner, F. & Cramer, P. Molecular basis of RNA-dependent RNA polymerase II activity. Nature 450, 445–449 (2007).
51. Wagner, S. D. et al. RNA polymerase II acts as an RNA-dependent RNA polymerase to extend and destabilize a non-coding RNA. EMBO J. 32, 781–790 (2013).
52. Kapranov, P. et al. New class of gene-termini-associated human RNAs suggests a novel RNA copying mechanism. Nature 466, 642–646 (2010).
53. Su, W. Y. et al. Bidirectional regulation between WDR83 and its natural antisense transcript DHPS in gastric cancer. Cell Res. 22, 1374–1389 (2012).
54. Wilkening, S. et al. An efficient method for genome-wide polyadenylation site mapping and RNA quantification. Nucleic Acids Res. 41, e65 (2013).
15. Georg, J. & Hess, W. R. cis-antisense RNA, another level of gene regulation in bacteria. Microbiol. Mol.
34. Neil, H. et al. Widespread bidirectional promoters are the major source of cryptic transcripts in yeast. Nature
Biol. Rev. 75, 286–300 (2011). 457, 1038–1042 (2009). 55. Pelechano, V., Wei, W. & Steinmetz, L. M.
16. Sesto, N., Wurtzel, O., Archambaud, C., Sorek, R. & 35. Xu, Z. et al. Bidirectional promoters generate Extensive transcriptional heterogeneity revealed by
Cossart, P. The excludon: a new concept in bacterial pervasive transcription in yeast. Nature 457, isoform profiling. Nature 497, 127–131 (2013).
antisense RNA-mediated gene regulation. Nature Rev. 1033–1037 (2009). This paper presents a strand-specific, genome-wide
Microbiol. 11, 75–82 (2013). 36. Seila, A. C. et al. Divergent transcription from active method — transcript isoform sequencing (TIF-seq)
17. Ietswaart, R., Wu, Z. & Dean, C. Flowering time promoters. Science 322, 1849–1851 (2008). — to map the boundaries of capped and
control: another window to the connection between 37. Core, L. J., Waterfall, J. J. & Lis, J. T. Nascent RNA polyadenylated RNAs, which allows the precise
antisense RNA and chromatin. Trends Genet. 28, sequencing reveals widespread pausing and divergent structure and the overlap between sense and
445–453 (2012). initiation at human promoters. Science 322, antisense transcripts to be determined.
18. Guil, S. & Esteller, M. cis-acting noncoding RNAs: 1845–1848 (2008). 56. Guttman, M. et al. lincRNAs act in the circuitry
friends and foes. Nature Struct. Mol. Biol. 19, This paper presents a strand-specific, genome-wide controlling pluripotency and differentiation. Nature
1068–1075 (2012). method — GRO-seq — to evaluate the presence of 477, 295–300 (2011).
19. Wagner, E. G. & Simons, R. W. Antisense RNA control active elongating polymerases. 57. Yap, K. L. et al. Molecular interplay of the noncoding
in bacteria, phages, and plasmids. Annu. Rev. 38. Kim, T., Xu, Z., Clauder-Munster, S., Steinmetz, L. M. RNA ANRIL and methylated histone H3 lysine 27 by
Microbiol. 48, 713–742 (1994). & Buratowski, S. Set3 HDAC mediates effects of polycomb CBX7 in transcriptional silencing of INK4a.
20. Vanhee-Brossollet, C. & Vaquero, C. Do natural overlapping noncoding transcription on gene induction Mol. Cell 38, 662–674 (2010).
antisense transcripts make sense in eukaryotes? Gene kinetics. Cell 150, 1158–1169 (2012). 58. Yassour, M. et al. Strand-specific RNA sequencing
211, 1–9 (1998). 39. Carrozza, M. J. et al. Histone H3 methylation by Set2 reveals extensive regulated long antisense transcripts
21. Katayama, S. et al. Antisense transcription in the directs deacetylation of coding regions by Rpd3S to that are conserved across yeast species. Genome Biol.
mammalian transcriptome. Science 309, 1564–1566 suppress spurious intragenic transcription. Cell 123, 11, R87 (2010).
(2005). 581–592 (2005). 59. Rhind, N. et al. Comparative functional genomics of
22. David, L. et al. A high-resolution map of transcription 40. Kaplan, C. D., Laprade, L. & Winston, F. Transcription the fission yeasts. Science 332, 930–936 (2011).
in the yeast genome. Proc. Natl Acad. Sci. USA 103, elongation factors repress transcription initiation from 60. Goodman, A. J., Daugharthy, E. R. & Kim, J. Pervasive
5320–5325 (2006). cryptic sites. Science 301, 1096–1099 (2003). antisense transcription is evolutionarily conserved in
23. Kampa, D. et al. Novel RNAs identified from an 41. Whitehouse, I., Rando, O. J., Delrow, J. & budding yeast. Mol. Biol. Evol. 30, 409–421 (2012).
in‑depth analysis of the transcriptome of human Tsukiyama, T. Chromatin remodelling at promoters 61. Kutter, C. et al. Rapid turnover of long noncoding
chromosomes 21 and 22. Genome Res. 14, 331–342 suppresses antisense transcription. Nature 450, RNAs and the evolution of gene expression. PLoS
(2004). 1031–1035 (2007). Genet. 8, e1002841 (2012).

REVIEWS
62. Raghavan, R., Sloan, D. B. & Ochman, H. Antisense transcription is pervasive but rarely conserved in enteric bacteria. mBio 3, e00156-12 (2012).
63. Nicolas, P. et al. Condition-dependent transcriptome reveals high-level regulatory architecture in Bacillus subtilis. Science 335, 1103–1106 (2012).
64. Johnson, J. M., Edwards, S., Shoemaker, D. & Schadt, E. E. Dark matter in the genome: evidence of widespread transcription detected by microarray tiling experiments. Trends Genet. 21, 93–102 (2005).
65. Perocchi, F., Xu, Z., Clauder-Munster, S. & Steinmetz, L. M. Antisense artifacts in transcriptome microarray experiments are resolved by actinomycin D. Nucleic Acids Res. 35, e128 (2007).
    This study establishes that the use of actinomycin D during reverse transcription eliminates the artifactual detection of antisense transcripts.
66. Levin, J. Z. et al. Comprehensive comparative analysis of strand-specific RNA sequencing methods. Nature Methods 7, 709–715 (2010).
67. Churchman, L. S. & Weissman, J. S. Nascent transcript sequencing visualizes transcription at nucleotide resolution. Nature 469, 368–373 (2011).
    This paper presents a strand-specific, genome-wide method (NET-seq) to evaluate the presence of nascent transcripts associated with RNA polymerases.
68. Guttman, M. et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458, 223–227 (2009).
69. Camblong, J. et al. trans-acting antisense RNAs mediate transcriptional gene cosuppression in S. cerevisiae. Genes Dev. 23, 1534–1545 (2009).
70. Berretta, J., Pinskaya, M. & Morillon, A. A cryptic unstable transcript mediates transcriptional trans-silencing of the Ty1 retrotransposon in S. cerevisiae. Genes Dev. 22, 615–626 (2008).
71. Margaritis, T. et al. Two distinct repressive mechanisms for histone 3 lysine 4 methylation through promoting 3ʹ-end antisense transcription. PLoS Genet. 8, e1002952 (2012).
72. Gagneur, J. et al. Genome-wide allele- and strand-specific expression profiling. Mol. Syst. Biol. 5, 274 (2009).
73. Matsuda, E. & Garfinkel, D. J. Posttranslational interference of Ty1 retrotransposition by antisense RNAs. Proc. Natl Acad. Sci. USA 106, 15657–15662 (2009).
74. Shearwin, K. E., Callen, B. P. & Egan, J. B. Transcriptional interference – a crash course. Trends Genet. 21, 339–345 (2005).
75. Lister, R. et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315–322 (2009).
76. Tufarelli, C. et al. Transcription of antisense RNA leading to gene silencing and methylation as a novel cause of human genetic disease. Nature Genet. 34, 157–165 (2003).
77. Lyle, R. et al. The imprinted antisense RNA at the Igf2r locus overlaps but does not imprint Mas1. Nature Genet. 25, 19–21 (2000).
78. Latos, P. A. et al. Airn transcriptional overlap, but not its lncRNA products, induces imprinted Igf2r silencing. Science 338, 1469–1472 (2012).
79. Ginno, P. A., Lott, P. L., Christensen, H. C., Korf, I. & Chedin, F. R-loop formation is a distinctive characteristic of unmethylated human CpG island promoters. Mol. Cell 45, 814–825 (2012).
80. Yu, W. et al. Epigenetic silencing of tumour suppressor gene p15 by its antisense RNA. Nature 451, 202–206 (2008).
81. Modarresi, F. et al. Inhibition of natural antisense transcripts in vivo results in gene-specific transcriptional upregulation. Nature Biotech. 30, 453–459 (2012).
82. Wahlestedt, C. Targeting long non-coding RNA to therapeutically upregulate gene expression. Nature Rev. Drug Discov. 12, 433–446 (2013).
83. Gupta, R. A. et al. Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature 464, 1071–1076 (2010).
84. Magistri, M., Faghihi, M. A., St Laurent, G., 3rd & Wahlestedt, C. Regulation of chromatin structure by long noncoding RNAs: focus on natural antisense transcripts. Trends Genet. 28, 389–396 (2012).
85. Swiezewski, S., Liu, F., Magusin, A. & Dean, C. Cold-induced silencing by long antisense transcripts of an Arabidopsis Polycomb target. Nature 462, 799–802 (2009).
86. Liu, F., Marquardt, S., Lister, C., Swiezewski, S. & Dean, C. Targeted 3ʹ processing of antisense transcripts triggers Arabidopsis FLC chromatin silencing. Science 327, 94–97 (2010).
87. Sun, Q., Csorba, T., Skourti-Stathaki, K., Proudfoot, N. J. & Dean, C. R-loop stabilization represses antisense transcription at the Arabidopsis FLC locus. Science 340, 619–621 (2013).
88. Heo, J. B. & Sung, S. Vernalization-mediated epigenetic silencing by a long intronic noncoding RNA. Science 331, 76–79 (2011).
89. Pinskaya, M., Gourvennec, S. & Morillon, A. H3 lysine 4 di- and tri-methylation deposited by cryptic transcription attenuates promoter activation. EMBO J. 28, 1697–1707 (2009).
90. Houseley, J., Rubbi, L., Grunstein, M., Tollervey, D. & Vogelauer, M. A ncRNA modulates histone modification and mRNA induction in the yeast GAL gene cluster. Mol. Cell 32, 685–695 (2008).
91. Haimovich, G. et al. Gene expression is circular: factors for mRNA degradation also foster mRNA synthesis. Cell 153, 1000–1011 (2013).
92. van Werven, F. J. et al. Transcription of two long noncoding RNAs mediates mating-type control of gametogenesis in budding yeast. Cell 150, 1170–1181 (2012).
    This paper reveals how the act of transcription of an ncRNA is used to rewire a regulatory network, which changes the final effect of a transcription factor.
93. Buske, F. A., Mattick, J. S. & Bailey, T. L. Potential in vivo roles of nucleic acid triple-helices. RNA Biol. 8, 427–439 (2011).
94. Martianov, I., Ramadass, A., Serra Barros, A., Chow, N. & Akoulitchev, A. Repression of the human dihydrofolate reductase gene by a non-coding interfering transcript. Nature 445, 666–670 (2007).
95. Ma, N. & McAllister, W. T. In a head-on collision, two RNA polymerases approaching one another on the same DNA may pass by one another. J. Mol. Biol. 391, 808–812 (2009).
96. Crampton, N., Bonass, W. A., Kirkham, J., Rivetti, C. & Thomson, N. H. Collision events between RNA polymerases in convergent transcription studied by atomic force microscopy. Nucleic Acids Res. 34, 5416–5425 (2006).
97. Hobson, D. J., Wei, W., Steinmetz, L. M. & Svejstrup, J. Q. RNA polymerase II collision interrupts convergent transcription. Mol. Cell 48, 365–374 (2012).
98. Palmer, A. C., Ahlgren-Berg, A., Egan, J. B., Dodd, I. B. & Shearwin, K. E. Potent transcriptional interference by pausing of RNA polymerases over a downstream promoter. Mol. Cell 34, 545–555 (2009).
99. Gelfand, B. et al. Regulated antisense transcription controls expression of cell-type-specific genes in yeast. Mol. Cell. Biol. 31, 1701–1709 (2011).
100. Morrissy, A. S., Griffith, M. & Marra, M. A. Extensive relationship between antisense transcription and alternative splicing in the human genome. Genome Res. 21, 1203–1212 (2011).
101. Aartsma-Rus, A. & van Ommen, G. J. Antisense-mediated exon skipping: a versatile tool with therapeutic and research applications. RNA 13, 1609–1624 (2007).
102. Onodera, C. S. et al. Gene isoform specificity through enhancer-associated antisense transcription. PLoS ONE 7, e43511 (2012).
103. Stork, M., Di Lorenzo, M., Welch, T. J. & Crosa, J. H. Transcription termination within the iron transport-biosynthesis operon of Vibrio anguillarum requires an antisense RNA. J. Bacteriol. 189, 3479–3488 (2007).
104. Miura, F. et al. Absolute quantification of the budding yeast transcriptome by means of competitive PCR between genomic and complementary DNAs. BMC Genomics 9, 574 (2008).
105. Faghihi, M. A. et al. Expression of a noncoding RNA is elevated in Alzheimer’s disease and drives rapid feed-forward regulation of β-secretase. Nature Med. 14, 723–730 (2008).
106. Faghihi, M. A. et al. Evidence for natural antisense transcript-mediated inhibition of microRNA function. Genome Biol. 11, R56 (2010).
107. Ebert, M. S. & Sharp, P. A. Emerging roles for natural microRNA sponges. Curr. Biol. 20, R858–861 (2010).
108. Lasa, I. et al. Genome-wide antisense transcription drives mRNA processing in bacteria. Proc. Natl Acad. Sci. USA 108, 20172–20177 (2011).
    This study shows that the genome-wide generation of short RNAs is achieved by RNase III endoribonuclease activity in Gram-positive bacteria through the digestion of overlapping sense–antisense transcript pairs.
109. Wang, K. C. et al. A long noncoding RNA maintains active chromatin to coordinate homeotic gene expression. Nature 472, 120–124 (2011).
110. Bitton, D. A. et al. Programmed fluctuations in sense/antisense transcript ratios drive sexual differentiation in S. pombe. Mol. Systems Biol. 7, 559 (2011).
111. Ebisuya, M., Yamamoto, T., Nakajima, M. & Nishida, E. Ripples from neighbouring transcription. Nature Cell Biol. 10, 1106–1113 (2008).
112. Shimoni, Y. et al. Regulation of gene expression by small non-coding RNAs: a quantitative view. Mol. Syst. Biol. 3, 138 (2007).
113. Mehta, P., Goyal, S. & Wingreen, N. S. A quantitative comparison of sRNA-based and protein-based gene regulation. Mol. Syst. Biol. 4, 221 (2008).
114. Lopez-Maury, L., Marguerat, S. & Bahler, J. Tuning gene expression to changing environments: from rapid responses to evolutionary adaptation. Nature Rev. Genet. 9, 583–593 (2008).
115. Castelnuovo, M. et al. Bimodal expression of PHO84 is modulated by early termination of antisense transcription. Nature Struct. Mol. Biol. 20, 851–858 (2013).
116. Uhler, J. P., Hertel, C. & Svejstrup, J. Q. A role for noncoding transcription in activation of the yeast PHO5 gene. Proc. Natl Acad. Sci. USA 104, 8011–8016 (2007).
117. Koshland, D. E. Jr., Goldbeter, A. & Stock, J. B. Amplification and adaptation in regulatory and sensory systems. Science 217, 220–225 (1982).
118. Legewie, S., Dienst, D., Wilde, A., Herzel, H. & Axmann, I. M. Small RNAs establish delays and temporal thresholds in gene expression. Biophys. J. 95, 3232–3238 (2008).
119. Duhring, U., Axmann, I. M., Hess, W. R. & Wilde, A. An internal antisense RNA regulates expression of the photosynthesis gene isiA. Proc. Natl Acad. Sci. USA 103, 7054–7058 (2006).
120. Schmidt, D. et al. Five-vertebrate ChIP–seq reveals the evolutionary dynamics of transcription factor binding. Science 328, 1036–1040 (2010).
121. Meader, S., Ponting, C. P. & Lunter, G. Massive turnover of functional sequence in human and other mammalian genomes. Genome Res. 20, 1335–1343 (2010).
122. Carvunis, A. R. et al. Proto-genes and de novo gene birth. Nature 487, 370–374 (2012).
    This paper presents an evolutionary model by which functional genes evolve de novo from spurious translation of putative ncRNAs.
123. Castello, A. et al. Insights into RNA biology from an atlas of mammalian mRNA-binding proteins. Cell 149, 1393–1406 (2012).
124. Gullerova, M. & Proudfoot, N. J. Convergent transcription induces transcriptional gene silencing in fission yeast and mammalian cells. Nature Struct. Mol. Biol. 19, 1193–1201 (2012).
125. Proffitt, J. H., Davie, J. R., Swinton, D. & Hattman, S. 5-methylcytosine is not detectable in Saccharomyces cerevisiae DNA. Mol. Cell. Biol. 4, 985–988 (1984).

Acknowledgements
The authors thank R. Aiyar, A.I. Järvelin, J. Zaugg, W. Wei and the members of the Steinmetz laboratory for their discussions and critical comments on the manuscript. L.M.S. acknowledges support by the Deutsche Forschungsgemeinschaft and the US National Institutes of Health.

Competing interests statement
The authors declare no competing interests.

CORRESPONDENCE

A commentary on Pitfalls of predicting complex traits from SNPs

Gustavo de los Campos and Daniel A. Sorensen

In their recent Opinion article (Pitfalls of predicting complex traits from SNPs. Nature Rev. Genet. 14, 507–515 (2013))¹, Wray and co-authors discuss prediction of complex traits using single-nucleotide polymorphisms (SNPs). We would like to further elaborate and qualify some topics.

As stated by Wray and co-authors¹, knowing the proportion of variance of a trait that is explained by regression on markers in the population (h²_M) is relevant because, in principle, h²_M represents the maximum prediction accuracy (R²_TST) that is achievable in testing (TST) data if marker effects were known². Following one study³, Wray and co-authors¹ suggest estimating h²_M using a ratio of variance components that are inferred from a G-BLUP analysis (h²_G-BLUP). However, the realized proportions of allele sharing at markers and at causal loci can be very different⁴ owing to, for example, imperfect marker–causal loci linkage disequilibrium (LD). Consequently, the marker-based model may largely misrepresent the data-generating process; this is exacerbated with unrelated individuals⁵. Under these conditions, it is not clear that the finite sample estimate of h²_G-BLUP is an unbiased estimate of h²_M (REF. 5); consequently, it is not obvious that R²_TST can achieve values equal to the finite sample estimate of h²_G-BLUP. In a recent article⁵, we studied the R²_TST of G-BLUP and its relationship with h²_G-BLUP. We show analytically that mis-specification of the training–testing (TRN–TST) genomic relationships (owing to, for example, imperfect marker–causal loci LD) can impose a large-sample upper bound on R²_TST that is considerably lower than the finite sample estimate of h²_G-BLUP. The same study⁵ also presents simulation scenarios with nominally unrelated individuals, where R²_TST can be extremely low in situations with markedly different h²_G-BLUP, suggesting a tenuous relationship between h²_G-BLUP and R²_TST, even with moderately large TRN samples.

Assessment of prediction accuracy
In the models discussed by Wray and co-authors¹, R²_TST is expected to be zero when TRN and TST samples are statistically independent⁵. Therefore, we disagree with the statement “problems occur in the validation stage, when data are not fully independent from those in the discovery phase” (REF. 1). We agree with Wray and co-authors that estimates of R²_TST can be biased if the TST sample is not representative of the population in which predictions will be used. But we cannot reconcile this with the general advice of eliminating individuals in the TST sample based on predetermined thresholds for SNP-based relationships. Each prediction problem has its own level of accuracy, and proper representation may or may not involve realized relationships above such thresholds.

Wray and co-authors¹ discuss problems due to stratification in TST samples and, as practical advice, suggest including principal component covariates. One study⁶ shows that inclusion of principal components as fixed effects in a G-BLUP analysis³,⁷ leads to a procedure with undesirable statistical properties. The same study⁶ provides statistically sound methods to quantify the relative contribution of each marker-derived principal component to estimates of variances and predictions of genetic values. Because principal components are linear functions of genotypes, removing their effects will, by construction, remove genetic signal that is potentially captured by markers. In general, unless the underlying causes of the signals that are captured by a principal-component analysis can be unambiguously interpreted, it is not clear that ‘correcting’ for their effects will mitigate the problems emerging from having a non-representative TST sample.

Gustavo de los Campos is at the Section on Statistical Genetics, Biostatistics Department, University of Alabama at Birmingham, Birmingham, Alabama 35294, USA.
Daniel A. Sorensen is at the Department of Molecular Biology and Genetics, Faculty of Science and Technology, Aarhus University, PB 50, DK-8830 Tjele, Denmark.
e-mails: gcampos@uab.edu; daniel.alberto.sorensen@gmail.com
doi:10.1038/nrg3457-c1
Published online 18 November 2013

1. Wray, N. R. et al. Pitfalls of predicting complex traits from SNPs. Nature Rev. Genet. 14, 507–515 (2013).
2. Goddard, M. Genomic selection: prediction of accuracy and maximisation of long term response. Genetica 136, 245–257 (2009).
3. Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nature Genet. 42, 565–569 (2010).
4. Hill, W. G. & Weir, B. S. Variation in actual relationship as a consequence of Mendelian sampling and linkage. Genet. Res. 93, 47–64 (2011).
5. de los Campos, G., Vazquez, A. I., Fernando, R., Klimentidis, Y. C. & Sorensen, D. Prediction of complex human traits using the genomic best linear unbiased predictor. PLoS Genet. 9, e1003608 (2013).
6. Janss, L., de los Campos, G., Sheehan, N. & Sorensen, D. A. Inferences from genomic models in stratified populations. Genetics 192, 693–704 (2012).
7. Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).

Acknowledgements
G.d.l.C. acknowledges financial support from the US National Institutes of Health grants R01GM099992 and R01GM101219.

Competing interests statement
The authors declare no competing interests.
NATURE REVIEWS | GENETICS www.nature.com/reviews/genetics

CORRESPONDENCE

Author reply to A commentary on Pitfalls of predicting complex traits from SNPs

Naomi R. Wray, Jian Yang, Ben J. Hayes, Alkes L. Price, Michael E. Goddard and Peter M. Visscher

Following our recent Opinion article (Pitfalls of predicting complex traits from SNPs. Nature Rev. Genet. 14, 507–515 (2013))¹, we received correspondence by de los Campos and Sorensen (A commentary on Pitfalls of predicting complex traits from SNPs. Nature Rev. Genet. 14, 894 (2013))². We thank them for their comments, which follow their recent work³. de los Campos and Sorensen agree that maximum prediction accuracy depends on h²_M, which is defined as the variance explained by genotyped markers in the population. They claim that estimates of h²_M in a finite sample (h²_G-BLUP or h²_G) may overestimate h²_M, and that this is exacerbated for unrelated individuals. We respond by showing how and why we disagree with these claims.

h²_G and h²_G-BLUP are estimates of the same parameter from equivalent models⁴–⁷ and so, for the same data set, they must have the same value. Both measure the proportion of the phenotypic variance that is explained by the markers. This proportion depends on linkage disequilibrium (LD) between the single-nucleotide polymorphisms (SNPs) and causal variants (also known as quantitative trait loci (QTLs)). If the LD is imperfect, then h²_M will be less than the conventional heritability (h²), which is the proportion of variance explained by all causal variants. The extent of LD depends on the relatedness of the sample of individuals used. If closely related individuals are included in the sample, there is long-range LD generated even between SNPs and QTLs on different chromosomes. Thus, inclusion of close relatives increases h²_M and its estimates. Usually, the parameter we wish to estimate is the h²_M among individuals who are no more closely related than randomly sampled individuals from the population⁸.

de los Campos and Sorensen state that the accuracy of prediction (R²_TST) does not approach h²_M even in an infinite sample. This is incorrect. R²_TST depends on two factors: h²_M and the accuracy with which the marker effects are estimated⁴,⁹. If the marker effects are estimated with no error, then R²_TST = h²_M. In practice, the accuracy of estimating SNP effects is usually low in humans, and this also explains the low R²_TST that is often reported. Their recent study³ claims that “the estimated h²_G did not provide a good indication of prediction R²”. In their simulations of unrelated individuals (GEN cohort; h² = 0.8), they state that “when [non-causal] markers were used we observed only a small extent of missing heritability [h²_G = 0.737, versus h²_G = 0.773 for causal markers] but the reduction in R² due to use of markers that were in imperfect LD with causal loci was dramatic [R² = 0.071, versus R² = 0.517 for causal markers]”. Even though the number of causal loci was the same, the number of markers differed: 300,000, corresponding to M = 60,000 independent markers versus M = 5,000 in the causal set. The following equation¹ (where N_d is the sample size in the discovery sample) demonstrates that R² decreases with higher M (which increases the variance of the estimated genetic relationships).

$$R^2 = \frac{h^2_M}{1 + \frac{M}{N_d\, h^2_M}\,(1 - R^2)}$$

de los Campos and Sorensen say that R²_TST is zero if the training and testing data sets are independent. This is a distracting statement because individuals within a species are always related to some degree. They also question our focus on the prediction accuracy that can be obtained in an independent validation sample. We disagree with the opinion of de los Campos and Sorensen that the prediction accuracy that can be obtained in a non-independent validation sample is a quantity of equal interest.

Naomi R. Wray, Jian Yang and Peter M. Visscher are at The Queensland Brain Institute, The University of Queensland, QBI Building, St Lucia, Queensland 4071, Australia.
Jian Yang and Peter M. Visscher are at The University of Queensland Diamantina Institute, Level 7, 37 Kent Street, Translational Research Institute, Woolloongabba, Queensland 4102, Australia.
Ben J. Hayes and Mike E. Goddard are at the Biosciences Research Division, Department of Primary Industries, GPO Box 4440, Melbourne, Victoria 3001, Australia.
Ben J. Hayes is at the Dairy Futures Cooperative Research Centre, AgriBio, Centre for AgriBioscience, 5 Ring Road, La Trobe University, Bundoora, Victoria 3083, Australia; and La Trobe University, Bundoora, Victoria 3086, Australia.
Alkes L. Price is at the Department of Epidemiology, Harvard School of Public Health, 677 Huntington Avenue, Boston, Massachusetts 02115, USA; the Department of Biostatistics, Harvard School of Public Health, 655 Huntington Avenue, Boston, Massachusetts 02115, USA; the Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA; and the Program in Molecular and Genetic Epidemiology, Harvard School of Public Health, 655 Huntington Avenue, Boston, Massachusetts 02115, USA.
Mike E. Goddard is at the Faculty of Land and Food Resources, University of Melbourne, Melbourne, Victoria 3010, Australia.
Correspondence to P.M.V.
e-mail: peter.visscher@uq.edu.au
doi:10.1038/nrg3457-c2
Published online 18 November 2013

1. Wray, N. R. et al. Pitfalls of predicting complex traits from SNPs. Nature Rev. Genet. 14, 507–515 (2013).
2. de los Campos, G. & Sorensen, D. A. A commentary on Pitfalls of predicting complex traits from SNPs. Nature Rev. Genet. 14, 894 (2013).
3. de los Campos, G., Vazquez, A. I., Fernando, R., Klimentidis, Y. C. & Sorensen, D. Prediction of complex human traits using the genomic best linear unbiased predictor. PLoS Genet. 9, e1003608 (2013).
4. Goddard, M. E., Wray, N. R., Verbyla, K. L. & Visscher, P. M. Estimating effects and making predictions from genome-wide marker data. Statist. Sci. 24, 517–529 (2009).
5. Habier, D., Fernando, R. L. & Dekkers, J. C. The impact of genetic relationship information on genome-assisted breeding values. Genetics 177, 2389–2397 (2007).
6. VanRaden, P. M. Efficient methods to compute genomic predictions. J. Dairy Sci. 91, 4414–4423 (2008).
7. Goddard, M. E. Genomic selection: prediction of accuracy and maximisation of long term response. Genetica 136, 245–257 (2009).
8. Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nature Genet. 42, 565–569 (2010).
9. Goddard, M. E., Hayes, B. J. & Meuwissen, T. H. Using the genomic relationship matrix to predict the accuracy of genomic selection. J. Anim. Breed. Genet. 128, 409–421 (2011).
10. Makowsky, R. et al. Beyond missing heritability: prediction of complex traits. PLoS Genet. 7, e1002051 (2011).
11. Janss, L., de los Campos, G., Sheehan, N. & Sorensen, D. Inferences from genomic models in stratified populations. Genetics 192, 693–704 (2012).

Competing interests statement
The authors declare no competing interests.
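
As a rough numerical illustration of the equation quoted in the authors' reply, R² = h²_M / (1 + (M / (N_d h²_M))(1 − R²)), the Python sketch below solves it for R², which appears on both sides. Writing a = M / (N_d h²_M), the equation rearranges to the quadratic a·R⁴ − (1 + a)·R² + h²_M = 0 in R², and the root that does not exceed h²_M is returned. The function name `expected_r2`, the discovery sample size N_d = 5,000 and the use of h²_M = 0.8 for both marker sets are assumptions made only for illustration; the reply does not give N_d, and 0.8 is the simulated h², not a reported h²_M.

```python
import math


def expected_r2(h2_m: float, m: float, n_d: float) -> float:
    """Solve R2 = h2_m / (1 + (m / (n_d * h2_m)) * (1 - R2)) for R2.

    With a = m / (n_d * h2_m), the equation becomes
    a*R2**2 - (1 + a)*R2 + h2_m = 0; the smaller root (<= h2_m) is returned.
    """
    if not 0.0 < h2_m <= 1.0:
        raise ValueError("h2_m must lie in (0, 1]")
    if m <= 0 or n_d <= 0:
        raise ValueError("m and n_d must be positive")
    a = m / (n_d * h2_m)
    # Discriminant is non-negative whenever h2_m <= 1.
    disc = (1.0 + a) ** 2 - 4.0 * a * h2_m
    # Numerically stable form of the smaller quadratic root.
    return 2.0 * h2_m / ((1.0 + a) + math.sqrt(disc))


if __name__ == "__main__":
    H2_M = 0.8   # illustrative stand-in, see the note above
    N_D = 5_000  # hypothetical discovery sample size
    for m in (5_000, 60_000):  # values of M quoted in the reply
        print(f"M = {m:>6}: R2 = {expected_r2(H2_M, m, N_D):.3f}")
```

Under these assumed inputs, the larger M gives the smaller R², which is the qualitative point the reply makes; the specific numbers depend entirely on the assumed N_d and h²_M.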

NATURE REVIEWS | GENETICS VOLUME 14 | DECEMBER 2013

© 2013 Macmillan Publishers Limited. All rights reserved

Such chimeric protein complexes

Proteins partner up in a owing to the divergent evolutionary

GENE EXPRESSION the regular, colinear isoforms across a

Integrative transcriptome sequencing identifies

Seeing the pattern

human iPSCs, and genome sequence

Reprogrammed cells dissect

ape retrotransposition ity of LINE-1 elements in NHPs

From genetic variation to binding. This finding indicates that

phenotype via chromatin may specify histone modifications, and

[Figure: panel groups "Mutation accumulation" and "Adaptive evolution"; panels: a, Single-cell bottlenecks; b, Continuous culture; c, Serial transfer]

Substitution rate Box 1 | Mutation rates versus substitution rates

Biological fitness Slightly beneficial

Box 2 | Adaptive evolution: optimization versus innovation

continuous innovation (BOX 2) or perhaps some mixture

and the population size, such that there is effectively only

[Figure: axis label "Allele frequency (%)"]