
PROGRESS

APPLICATIONS OF NEXT-GENERATION SEQUENCING

Protein-RNA interactions: new genomic technologies and perspectives

Julian König, Kathi Zarnack, Nicholas M. Luscombe and Jernej Ule

Abstract | RNA-binding proteins are key players in the regulation of gene expression. In this Progress article, we discuss state-of-the-art technologies that can be used to study individual RNA-binding proteins or large complexes such as the ribosome. We also describe how these approaches can be used to study interactions with different types of RNAs, including nascent transcripts, mRNAs, microRNAs and ribosomal RNAs, in order to investigate transcription, RNA processing and translation. Finally, we highlight current challenges in data analysis and the future steps that are needed to obtain a quantitative and high-resolution picture of protein-RNA interactions on a genome-wide scale.

During and after transcription, RNAs are subject to multiple processing and regulatory steps that are coordinated by RNA-binding proteins (RBPs)1,2. Therefore, to understand the fate and function of RNA molecules, a key task is to map protein-RNA interactions and to determine their effects on the transcriptome. In recent years, there has been great progress in the field of ribonomics, which uses genome-wide tools to study how the interactions of RNAs and proteins modulate co-transcriptional and post-transcriptional regulation of gene expression. The first ribonomic approaches combined RNA immunoprecipitation with differential display or microarray analysis (RIP-chip) to identify RNAs that are bound by specific RBPs3-5. However, these methods were limited to stable ribonucleoprotein particles (RNPs) and were prone to detecting nonspecific interactions6. Moreover, the resulting data were of low resolution, as the binding site in the co-purified RNA molecule remained unresolved. Therefore, defining precise RBP binding sites and reducing the number of false positives have been primary challenges for experimental ribonomics. To identify the positions of protein-RNA interactions with a higher resolution and specificity, a method known as ultraviolet (UV) crosslinking and immunoprecipitation (CLIP) was developed7,8. CLIP combines UV crosslinking of RBPs to their cognate RNA molecules with rigorous purification schemes. Recently, CLIP has been coupled to high-throughput sequencing, which has allowed comprehensive genome-wide studies. In addition, CLIP-related techniques have been developed to determine RNA interactions with larger complexes such as the ribosome. With these developments at hand, we are now entering an exciting era of broad applications of ribonomic methods. In this Progress article, we describe these recent advances in ribonomic techniques. We also introduce approaches for data analysis, highlight the major challenges in this field and conclude with an outlook on future developments.

CLIP: landscapes of RNA binding

Most RBPs recognize short, degenerate RNA motifs, and therefore they often bind at several sites on most RNAs. Thus, it is not sufficient to determine whether a protein interacts with a particular RNA, but it is important to define the full landscape of interactions of the protein with the RNA. CLIP is a state-of-the-art technology that allows users to define these RNA landscapes9. Here, we describe the basic concepts of CLIP and introduce recent developments.

The beginnings of CLIP. CLIP relies on the principle that precise and stringent mapping of binding sites is achieved by preserving the in vivo protein-RNA interactions by irradiation of living cells or tissue with ultraviolet C (UVC) light7,8. The UVC light induces the formation of covalent crosslinks only at sites of direct contact between proteins and RNA. On cell lysis, the protein-RNA complex is immunoprecipitated with an antibody that is specific for the protein of interest (FIG. 1). If no antibody is available, the RBP can alternatively be fused to an epitope tag, which is then expressed as a transgene for affinity purification10-12. The co-purified RNA molecules are reverse-transcribed and amplified with the aid of 5′ and 3′ adaptors. In the original CLIP protocol, individual clones of the resulting cDNAs were subjected to Sanger sequencing. The resulting sequences were then mapped to the reference genome to reveal the sites of protein binding within the corresponding transcripts.

The accuracy of CLIP was demonstrated in a study of NOVA-dependent splicing regulation in the brain7. These first CLIP experiments revealed how NOVA binds at different positions to silence or to enhance exon inclusion. Other applications of CLIP have included uncovering a role for the heterogeneous nuclear ribonucleoprotein HNRNPA1 in microRNA (miRNA) processing13 and identifying actively transported transcripts in fungal filaments14.

CLIP goes high-throughput. To map protein-RNA binding sites more comprehensively, Licatalosi and co-workers15 replaced the Sanger method with high-throughput sequencing, which enables millions of sequences to be determined in a single run. This approach, which is known as high-throughput sequencing of CLIP cDNA library (HITS-CLIP or CLIP-seq16), provides more comprehensive binding information (FIG. 1). The power of coupling CLIP and high-throughput sequencing was first demonstrated by an analysis of NOVA-dependent RNA processing in the brain15. The greater sequencing depth compared with Sanger sequencing provided new insights into the NOVA-dependent splicing regulation and also led to the discovery

NATURE REVIEWS | GENETICS VOLUME 13 | FEBRUARY 2012 | 77

2012 Macmillan Publishers Limited. All rights reserved



[Figure 1: schematic of three workflows shown side by side. PAR-CLIP (left): cells are fed 4-thiouridine and irradiated with UV at 365 nm. HITS-CLIP (centre) and iCLIP (right): irradiation with UV at 254 nm. Shared steps: lysis, immunoprecipitation of crosslinked protein-RNA complexes, 3′ RNA adaptor ligation, and proteinase K digestion, which leaves a polypeptide at the crosslink nucleotide. PAR-CLIP and HITS-CLIP: 5′ RNA adaptor ligation, reverse transcription (PAR-CLIP: transition or read-through; HITS-CLIP: deletion or mutation, or read-through), PCR, high-throughput sequencing. iCLIP: reverse transcription (truncation) with a primer containing two cleavable adapter regions and a barcode, circularization, linearization and PCR, high-throughput sequencing.]
Figure 1 | Comparison of HITS-CLIP and its latest variants, PAR-CLIP and iCLIP. For high-throughput sequencing of RNA isolated by ultraviolet (UV) crosslinking and immunoprecipitation (HITS-CLIP), living cells or tissue samples are irradiated with UV light at a wavelength of 254 nm (shown in the centre of the figure). This induces the formation of covalent crosslinks between proteins and RNA, which are restricted to sites of direct contact. The cells are then lysed and RNA is partially digested to an approximate length of 30-50 nucleotides. Next, the protein-RNA complex is immunoprecipitated with an antibody that is specific for the protein of interest. After stringent washing, the RNA is radioactively labelled and an adaptor is ligated to the 3′ end of the RNA. Further purification is achieved through denaturing gel electrophoresis and transfer to a nitrocellulose membrane, which removes nonspecific RNAs. The radioactive label on the RNA is used to guide the excision of the protein-RNA complex from the membrane. The protein is then removed from the RNA by proteinase K digestion. An adaptor is ligated to the 5′ end, the RNA is reverse transcribed, and the resulting cDNAs are PCR amplified with primers that are complementary to the 5′ and 3′ adaptor regions. The resulting cDNA library is subjected to high-throughput sequencing. For photoactivatable ribonucleoside-enhanced CLIP (PAR-CLIP) (shown on the left of the figure), cells are fed with 4-thiouridine, which becomes incorporated into newly transcribed RNA. This allows crosslinking with UV light at 365 nm. During reverse transcription, the nucleoside analogue causes a base transition that can be used to pinpoint the crosslinked nucleotide. In the individual nucleotide resolution CLIP (iCLIP) protocol (shown on the right of the figure), crosslinking is carried out as in HITS-CLIP at 254 nm. However, in order to capture cDNAs that truncate at the peptide that remains at the crosslinked nucleotide after proteinase K digestion, the 5′ adaptor is added after reverse transcription. This is achieved through priming reverse transcription with an oligonucleotide that contains the 3′, as well as the 5′, adaptor region followed by circularization of the generated cDNAs.
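The iCLIP read-out described in this caption can be illustrated with a short sketch. The Python below is not taken from the article or from any published pipeline: the `Read` record, its 0-based inclusive coordinates and all function names are assumptions made for illustration. It first collapses reads that share a genomic location and a random barcode (the barcode carried in the reverse transcription primer), treating them as PCR duplicates, and then reports the crosslink position as one nucleotide upstream of the cDNA truncation site, on the assumption that the first sequenced nucleotide of each read marks that site.

```python
from collections import Counter, namedtuple

# Hypothetical read record: 0-based inclusive genomic coordinates of the
# aligned read, the strand of the RNA, and the random barcode ("randomer").
Read = namedtuple("Read", ["chrom", "start", "end", "strand", "randomer"])


def collapse_pcr_duplicates(reads):
    """Keep one read per (location, strand, randomer) combination.

    Reads with the same location and the same randomer are treated as PCR
    duplicates; the same location with a different randomer counts as an
    independent cDNA.
    """
    seen = set()
    unique = []
    for read in reads:
        if read not in seen:  # namedtuples compare on all fields
            seen.add(read)
            unique.append(read)
    return unique


def crosslink_position(read):
    """Return the position one nucleotide upstream of the read start.

    Assumption: the read begins at the cDNA truncation site, so on the plus
    strand the crosslink is at start - 1 and on the minus strand at end + 1.
    """
    if read.strand == "+":
        return read.start - 1
    return read.end + 1


def crosslink_counts(reads):
    """Count unique cDNAs per crosslink position."""
    counts = Counter()
    for read in collapse_pcr_duplicates(reads):
        counts[(read.chrom, crosslink_position(read), read.strand)] += 1
    return counts


if __name__ == "__main__":
    example = [
        Read("chr1", 100, 130, "+", "ACG"),
        Read("chr1", 100, 130, "+", "ACG"),  # PCR duplicate of the first read
        Read("chr1", 100, 130, "+", "TTG"),  # same location, new randomer
        Read("chr1", 200, 240, "-", "ACG"),
    ]
    print(crosslink_counts(example))
```

Keying duplicates on the full alignment plus the barcode is the simplest reading of "same genomic location"; a real analysis might key on the read start only, or tolerate sequencing errors in the barcode.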


of an unexpected role of NOVA in 3′ end processing. HITS-CLIP is also being used to study the tripartite complex between the Argonaute proteins, miRNAs and their target transcripts17-19. Although the direct pairing of an miRNA with its target mRNA cannot yet be deduced from these data, the detection of Argonaute binding sites in both miRNAs and mRNAs enabled the discovery of endogenous mRNA target sites. Recently, an important step was made towards direct monitoring of inter-RNA interactions within tripartite complexes: a method called crosslinking, ligation and sequencing of hybrids (CLASH) was developed, which exploits the formation of intermolecular RNA ligation events. As a proof of principle, this method was used to map in vivo RNA-RNA contact sites of small nucleolar RNAs (snoRNAs) with precursor ribosomal RNAs (pre-rRNAs) during ribosome biogenesis20.

Advancing towards nucleotide resolution. In the traditional CLIP protocol, the resolution of binding site detection mostly corresponds to the length of the fragmented RNAs. However, Granneman and colleagues11 showed that crosslink-induced point mutations and deletions can be used to identify the crosslink sites of RBPs within snoRNAs11. Recently, two approaches introduced new strategies that are based on modified crosslinking or library-preparation protocols to identify the crosslink sites on a genome-wide scale12,21.

In the photoactivatable ribonucleoside-enhanced CLIP (PAR-CLIP) approach12, photoactivatable nucleotide analogues such as 4-thiouridine (4SU) or 6-thioguanosine (6SG) are used (FIG. 1), which can be efficiently crosslinked with ultraviolet A (UVA) light (at a wavelength of 365 nm). The nucleotide analogues are readily taken up by cells and become incorporated into newly synthesized transcripts. Importantly, they lead to a base transition at the crosslink site during reverse transcription. Therefore, mutation analysis of the resulting cDNA sequences can be used to pinpoint crosslink sites at nucleotide resolution (discussed below). This method was successful in identifying crosslink sites of pumilio homologue 2 (PUM2), quaking (QKI), insulin-like growth factor 2 mRNA-binding protein 1 (IGF2BP1), IGF2BP2 and IGF2BP3, the Argonaute proteins and HUR (also known as ELAVL1) in HEK293 cells12,22. Similarly, analysis of point mutations and deletions was used for the genome-wide identification of crosslink sites from HITS-CLIP data22,23.

An alternative method for achieving nucleotide resolution is known as individual nucleotide resolution CLIP (iCLIP)21. This method is based on the concept that reverse transcription can stop at nucleotides that are crosslinked to the peptides that remain after proteinase K digestion24. However, the truncated cDNAs that are produced would lack the 5′ adaptor region that is required for PCR amplification and would be lost during the standard CLIP library preparation. To capture these truncated cDNAs, iCLIP uses an alternative strategy for adaptor ligation and reverse transcription, replacing one of the intermolecular RNA ligation steps with an intramolecular cDNA circularization (FIG. 1). Importantly, sequencing the truncated cDNAs provides direct identification of the crosslink position, which is located one nucleotide upstream of the truncation site. As a demonstration of this method, iCLIP was used to resolve the footprint of adjacent HNRNPC binding sites within uridine tracts21.

The goal: quantitative CLIP analysis. Owing to the small amount of starting material and the numerous steps at which material can be lost, the number of CLIP cDNAs generated from crosslinked RNA is an important limiting factor. As a consequence, the resulting cDNA libraries rarely contain the full range of RNA binding sites. An additional concern is that, owing to biases in the PCR amplification step, libraries can result in thousands of sequences that originated from a single cDNA. This can lead to data of limited complexity and informational content and can distort the quantitative analysis of protein-RNA interactions. One way to avoid amplification artefacts is to count identical sequences only once; however, this approach reduces the dynamic range of the resulting data. A more sophisticated way to control for library complexity is to use a randomized sequence in the adaptor or reverse transcription primer. This sequence, which is referred to as a randomer, a degenerate or a random barcode, can be used to discriminate independent cDNAs from PCR duplicates17,21,25. For example, two sequences that map to the same genomic location and that share an identical randomer are treated as PCR duplicates, whereas they are identified as two unique cDNAs if they possess different randomers. It was recently shown that this approach can also improve the accuracy of other high-throughput sequencing methods25, although random barcoding may also have its limitations.

Considerations and applications for CLIP. All CLIP variants (HITS-CLIP, PAR-CLIP and iCLIP) produce data of high quality and precision. However, the library preparation protocols for these techniques require a large number of enzymatic steps that potentially affect binding site detection. For example, it is important to optimize the conditions of partial RNase digestion, as overdigestion can decrease the number of identified sites22. Furthermore, it has been shown that the use of different RNA ligases can influence the cloning of short RNAs26, and it remains to be seen how the choice of ligase influences the different CLIP protocols.

The crosslinking efficiency with UVC (HITS-CLIP and iCLIP) or UVA (PAR-CLIP) varies for different proteins, and the optimal protocol needs to be experimentally determined individually for the protein of interest22. However, the application of PAR-CLIP is currently limited to cultured cells that can efficiently incorporate nucleoside analogues. Conversely, the timing of nucleoside application provides the opportunity to restrict crosslinking to transcripts that were newly synthesized, promising new insights into RBP binding to nascent transcripts. CLIP has already been used to study RBP binding to diverse types of transcripts, including introns, mRNAs, miRNAs, snoRNAs and rRNAs11,15,17. Moreover, it should be possible to use CLIP technologies in any living organism, and they have already been used in yeast, fungi, worms and mammals7,11,14,18.

Studying larger RNP complexes

CLIP technologies determine the direct contacts between individual RBPs and their cognate RNAs. In some cases, however, it is desirable to investigate the interactions of larger complexes with RNAs. Two recent approaches use the purification of intact ribosomes or RNA polymerase complexes to monitor translation and transcription on a genome-wide scale.

Footprinting the ribosome. Through an approach that is termed ribosome profiling, the Weissman laboratory provided high-resolution analysis of translation on a genome-wide scale27 (FIG. 2a). This was achieved by stalling ribosomes on the transcripts using cycloheximide treatment, followed by cell lysis, RNase treatment and purification of the RNA fragments that were protected by the ribosome in vivo. The fragments were then subjected to circularization-based library preparation and high-throughput sequencing. The resulting


ribosome footprints were used to obtain quantitative information about translation rates and ribosome density within transcripts. Notably, the resolution of the data is precise enough to gain information about the translated reading frame. However, it is important to note that this information was not obtained at the level of individual codons, but was inferred from an average signal over the complete open reading frame. Initially developed in yeast and used to study translational changes during the stress response27, ribosome profiling was later adapted for use with mammalian cell lines28. In this system, research using ribosome profiling suggested that miRNAs predominantly function through the destabilization of target transcripts, rather than by silencing translation28.

Chasing the RNA polymerase. A genome-wide view of transcription can be obtained from high-throughput sequencing of DNA fragments that are crosslinked to RNA polymerase II (Pol II), a technique that is known as Pol II chromatin immunoprecipitation followed by sequencing (ChIP-seq), or from approaches that are based on nuclear run-on, such as global run-on sequencing (GRO-seq)29. In order to monitor transcriptional states of unperturbed cells with a high resolution and strand specificity, the Weissman laboratory developed an approach to study Pol II binding to nascent RNAs, which is referred to as native elongating transcript sequencing (NET-seq)30. Without prior crosslinking, this technique combines Pol II affinity purification with sequencing of the 3′ ends of the co-purified RNAs, and so provides insights into transcription at single-nucleotide resolution (FIG. 2b). This approach is feasible owing to the high stability of the ternary complex that is formed between Pol II, the transcribed DNA and the nascent RNA. Churchman and Weissman30 exploited the strand information in the data to reveal a link between histone H4 acetylation and antisense transcription at promoters. In addition, the study investigated Pol II backtracking and nucleosome-induced pausing, which reflects the broad range of applications for NET-seq30. This technique promises to be a valuable tool for researchers who are interested in all aspects of transcription.

[Figure 2: schematic of two workflows. a | Ribosome profiling: stall ribosomes with cycloheximide and lyse cells; RNase treatment; purify ribosomes with protected RNA fragments (~28 nucleotides); polyadenylation; reverse transcription with a primer containing a T-tract and adaptor regions; circularization; linearization and PCR; high-throughput sequencing. b | NET-seq: freeze and lyse cells; immunoprecipitation of the tripartite Pol II, DNA and RNA complex; purify RNA; 3′ RNA adaptor ligation; reverse transcription with a primer containing two cleavable adaptor regions; circularization; linearization and PCR; high-throughput sequencing.]

Figure 2 | Ribonomic methods to study transcription and translation. a | For ribosome profiling, ribosomes are stalled on the translated RNAs through cycloheximide treatment. After cell lysis, the RNA that is not covered by the ribosomes is degraded with RNase. The ribosomes are then purified together with the protected RNA fragments. The RNA fragments are then polyadenylated, which allows priming of reverse transcription and circularization-based cDNA library preparation. b | For native elongating transcript sequencing (NET-seq), cells are flash-frozen and lysed. The tripartite complex of RNA polymerase II (Pol II), DNA and nascent RNA is immunopurified. The nascent RNAs are separated and an adaptor is ligated to their 3′ ends. Upon reverse transcription, cDNA libraries are prepared using a circularization-based approach. High-throughput sequencing of these libraries provides information about the position of Pol II at nucleotide resolution.

Data analysis and interpretation

The large amounts of data generated by ribonomic approaches require considerable computational efforts for biological interpretation. The first level of analysis is genomic mapping of the sequence reads, followed by


a second level of clustering and normalization to identify highly occupied binding sites. At the third level, the binding sites are integrated with functional information in order to deduce general regulatory principles. We discuss these different layers of data analysis and interpretation below.

Mapping the sequence reads. Fast and efficient alignment algorithms such as Bowtie31 or Burrows-Wheeler Aligner (BWA)32 are standard tools for mapping high-throughput sequencing reads to the genome. However, if RBPs bind mature RNAs, the cDNA sequences often span exon-exon junctions. Therefore, mapping of sequence reads that are produced by CLIP approaches or ribosome profiling should ideally include either the use of splicing-aware algorithms such as TopHat33 or direct alignment to processed transcripts. Another challenge is the mapping of sequences to genes that are present in multiple copies in the genome, such as small nuclear RNAs (snRNAs), rRNAs and snoRNAs. One solution is to use non-redundant databases that offer consensus sequences for multi-copy genes11. Another option is to allow mapping to multiple positions in the genome, but care needs to be taken when interpreting such data. To account for sequencing errors and crosslink-induced point mutations11,12, it can be advantageous to allow one or more mismatches in the alignments. To capture crosslink-induced deletions, algorithms such as Novoalign, segemehl34 or genomic short-read nucleotide alignment program (GSNAP)35, which allow gapped alignments, should be used11,23. A valuable resource is the dedicated servers and databases that are available for the mapping and analyses of CLIP data that are generated by the different protocols36-38.

Identification of binding sites. The high stringency of library preparation achieved with the different CLIP approaches is documented by the low number of nonspecific reads in control experiments, which use knockout tissue, omit the antibody or omit UV crosslinking. Thus, with proper purification of the protein-RNA complex, the vast majority of CLIP reads represent protein-RNA interaction sites. The occupancy of the RBP at these sites varies considerably; binding sites with low occupancy usually outnumber highly occupied binding sites. Importantly, highly occupied binding sites appear as clusters of reads when the CLIP library is of sufficient complexity. Approaches for identifying such read clusters involve the analysis of replicate experiments to ensure that binding at a given site is reproducible15 or the calculation of significant enrichment over the background signal in surrounding areas on the same gene16,21,22. In this context, it is important to keep in mind that CLIP read counts are not necessarily a direct measure of RBP affinity, as they can be affected by other factors, such as the half-life of the bound RNA region or the crosslinking efficiency of a given sequence.

In addition to read-cluster identification, several different approaches have been implemented that directly identify the nucleotide that is crosslinked (FIG. 3a). It is important to note that the crosslinked nucleotide may not always reside within the binding site of the protein, regardless of which CLIP technology is used. For example, binding motifs of NOVA are mainly enriched in the sequences immediately surrounding, but not including, the crosslinked nucleotide23. The potential for such shifts has to be considered when using the position of crosslink nucleotides to investigate the sequence of the RNA motifs that are required for the high-affinity protein binding. Common tools for identifying binding motifs such as multiple EM for motif elicitation (MEME) and PhyloGibbs are complemented by approaches that search for the enrichment of certain k-mers in the vicinity of read clusters or crosslink nucleotides10,16,39,40.

Integrating functional information. Several CLIP studies indicate that many RBPs show high-affinity binding to thousands of different positions in the transcriptome. Therefore, it is likely that only a subset of the interactions is associated with specific functions. In order to identify functional interactions, the physical maps of protein-RNA interactions can be integrated with other genome-wide data sets that provide functional information about the RBP. For example, the integration of binding data with information from splice-junction microarrays or RNA-seq can be used to generate RNA maps, demonstrating position-dependent splicing regulation by RBPs41,42. So far, such maps have been successfully applied to determine the functional binding sites of several splicing regulators, including NOVA, FOX2 and HNRNPC15,16,21,41. Similarly, it will be interesting to study the concerted binding of different RBPs in more detail. For example, recent PAR-CLIP studies indicated that HUR binding sites in the 3′ UTR are enriched in the vicinity of Argonaute-miRNA complexes in the same region43,44. Finally, a promising future direction will be to combine CLIP with other emerging ribonomic assays, such as ribosome profiling, which would allow the direct monitoring of the effect of RBP binding on translation.

Glossary

Argonaute proteins
Core components of the RNA-mediated silencing pathways. They provide the platform for target mRNA recognition by small non-coding RNAs and harbour the catalytic activity for mRNA cleavage.

Differential display
A PCR-based approach that was used to study differences in RNA populations. It has now been superseded by microarray and RNA sequencing approaches.

Global run-on sequencing
(GRO-seq). A technique that combines nuclear run-on assays with high-throughput sequencing to obtain genome-wide information about active transcription.

Heterogeneous nuclear ribonucleoprotein
(HNRNP). The core protein components of heterogeneous nuclear ribonucleoprotein particles that associate with all nascent transcripts. They are involved in diverse aspects of post-transcriptional regulation.

k-mers
Nucleic acid sequences with a number of nucleotides of length k.

NOVA
A regulator of a biologically coherent set of RNAs important for synaptic function. It is involved in the neurological disorder paraneoplastic opsoclonus myoclonus ataxia.

Ribonomics
The genome-scale study of protein-RNA interactions and their functional consequences.

Ribonucleoprotein particles
(RNPs). Complexes consisting of protein and RNA components.

Small nuclear RNAs
(snRNAs). A class of non-coding RNAs that are found in the nucleus of eukaryotic cells and that constitute core components of all subunits of the spliceosome.

Small nucleolar RNAs
(snoRNAs). A class of small non-coding RNAs that are involved in guiding chemical modifications of other RNAs, such as ribosomal or transfer RNAs.

Future directions

To date, CLIP studies have mainly been used for qualitative descriptions of RBP binding, and the generation of reliable quantitative information on RBP binding remains a major challenge. We expect to see further improvements in CLIP library preparation that will increase the complexity of cDNA libraries and allow better


[Figure 3: schematic in two panels. a | Identification of binding sites: mapped CLIP reads on a reference sequence form one high-occupancy binding site and several low-occupancy binding sites; the crosslink nucleotide is marked by U-to-C transitions (PAR-CLIP), deletion sites (HITS-CLIP) or cDNA truncations (iCLIP). b | Normalization to control for transcript abundance: mapped CLIP reads and RNA-seq coverage are shown for site A in gene A (low expression) and site B in gene B (high expression); normalization uses (1) CLIP data or (2) RNA-seq data, giving occupancy (site A) > occupancy (site B).]

Figure 3 | Identification of binding sites and normalization. a | High-affinity binding sites appear as clusters of ultraviolet crosslinking and immunoprecipitation (CLIP) reads. In photoactivatable ribonucleoside-enhanced CLIP (PAR-CLIP), the crosslink nucleotide can be identified through U-to-C transitions, and in high-throughput sequencing of RNA isolated by CLIP (HITS-CLIP) through deletion sites. In individual nucleotide resolution CLIP (iCLIP), the crosslink nucleotide is located one nucleotide upstream of the truncation sites. b | A schematic description of different normalization strategies to correct for transcript abundance is shown. Normalization can be carried out (step 1) based on the overall protein binding within the transcript or (step 2) by incorporating external information on transcript abundance using methods such as RNA sequencing (RNA-seq).
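The two normalization strategies in panel b can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure of any cited study: the function names, the dictionary-based inputs and the pseudocount are all hypothetical. Strategy 1 divides each site's read count by the average CLIP count of the sites in the same transcript; strategy 2 divides it by an external abundance estimate such as an RNA-seq expression value.

```python
from collections import defaultdict


def normalize_by_transcript_clip(site_counts, site_to_transcript):
    """Strategy 1: divide each site count by the mean CLIP count of all
    sites in the same transcript (hypothetical helper)."""
    totals = defaultdict(float)
    n_sites = defaultdict(int)
    for site, count in site_counts.items():
        transcript = site_to_transcript[site]
        totals[transcript] += count
        n_sites[transcript] += 1

    normalized = {}
    for site, count in site_counts.items():
        transcript = site_to_transcript[site]
        mean = totals[transcript] / n_sites[transcript]
        # A transcript whose sites all have zero reads carries no signal.
        normalized[site] = count / mean if mean > 0 else 0.0
    return normalized


def normalize_by_rnaseq(site_counts, site_to_transcript, expression,
                        pseudocount=1.0):
    """Strategy 2: divide each site count by the transcript abundance.

    `expression` maps transcript to an abundance value; the pseudocount is
    an assumption added here to avoid division by zero.
    """
    return {
        site: count / (expression[site_to_transcript[site]] + pseudocount)
        for site, count in site_counts.items()
    }


if __name__ == "__main__":
    counts = {"A": 10, "A2": 2, "B": 20, "B2": 20}
    transcripts = {"A": "geneA", "A2": "geneA", "B": "geneB", "B2": "geneB"}
    abundance = {"geneA": 5.0, "geneB": 50.0}
    print(normalize_by_transcript_clip(counts, transcripts))
    print(normalize_by_rnaseq(counts, transcripts, abundance))
```

In the toy input, site B has more raw reads than site A but lies in a far more abundant transcript, so both strategies rank site A above site B, which is the situation the panel depicts.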

quantification of RBP binding to individual RNA sites. It is also clear that read counts depend on the expression level of the corresponding transcript. Therefore, normalization of CLIP data will be required before binding sites can be compared across the full transcriptome (FIG. 3b). This could be achieved, for example, by normalizing to the average CLIP count within the transcript or by using expression information obtained from RNA-seq experiments. Using RNA-seq has proved to be useful for analyses of ribosome profiling data28, and was also recommended by a recent study comparing several CLIP normalization strategies22. However, normalization to total CLIP counts within transcripts might be more applicable to nuclear RBPs that bind pre-mRNAs, because these are not efficiently quantified by standard RNA-seq techniques. In addition, bioinformatic approaches need to be developed to account for the effects of local sequence environment on the efficiency of protein-RNA crosslinking. First efforts in this direction have been made22,23, but more analyses are needed to fully understand the sequence biases at the crosslink sites that have been identified by the different CLIP protocols.

In summary, the time is ripe to take CLIP from a qualitative assay to a quantitative tool. A potential advance in the near future would be the combination of CLIP with single-molecule RNA sequencing45, which could monitor stalling at the crosslink nucleotide in real time. Finally, in parallel with the experimental advances, sophisticated computational analysis methods will need to be developed: for example, to model combinatorial RNA binding of multiple RBPs. These advances will take us closer to the goal of obtaining a complete picture of the diverse protein-RNA complexes in the cell.

Julian König and Jernej Ule are at the Medical Research Council Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, UK.

Kathi Zarnack and Nicholas M. Luscombe are at the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK.

Nicholas M. Luscombe is also at the Okinawa Institute for Science and Technology Graduate University, 1919-1 Tancha, Onna-son, Kunigami-gun, Okinawa 904-0495, Japan.

Correspondence to J.U.
e-mail: jule@mrc-lmb.cam.ac.uk
doi:10.1038/nrg3141
Corrected online 31 January 2012


1. Moore, M.J. From birth to death: the complex lives of eukaryotic mRNAs. Science 309, 1514–1518 (2005).
2. Keene, J.D. RNA regulons: coordination of post-transcriptional events. Nature Rev. Genet. 8, 533–543 (2007).
3. Trifillis, P., Day, N. & Kiledjian, M. Finding the right RNA: identification of cellular mRNA substrates for RNA-binding proteins. RNA 5, 1071–1082 (1999).
4. Brooks, S.A. & Rigby, W.F. Characterization of the mRNA ligands bound by the RNA binding protein hnRNP A2 utilizing a novel in vivo technique. Nucleic Acids Res. 28, e49 (2000).
5. Tenenbaum, S.A., Carson, C.C., Lager, P.J. & Keene, J.D. Identifying mRNA subsets in messenger ribonucleoprotein complexes by using cDNA arrays. Proc. Natl Acad. Sci. USA 97, 14085–14090 (2000).
6. Mili, S. & Steitz, J.A. Evidence for reassociation of RNA-binding proteins after cell lysis: implications for the interpretation of immunoprecipitation analyses. RNA 10, 1692–1694 (2004).
7. Ule, J. et al. CLIP identifies NOVA-regulated RNA networks in the brain. Science 302, 1212–1215 (2003).
8. Ule, J., Jensen, K., Mele, A. & Darnell, R.B. CLIP: A method for identifying protein–RNA interaction sites in living cells. Methods 37, 376–386 (2005).
9. Darnell, R.B. HITS-CLIP: panoramic views of protein-RNA regulation in living cells. Wiley Interdiscip. Rev. RNA 1, 266–286 (2010).
10. Wang, Z. et al. iCLIP predicts the dual splicing effects of TIA-RNA interactions. PLoS Biol. 8, e1000530 (2010).
11. Granneman, S., Kudla, G., Petfalski, E. & Tollervey, D. Identification of protein binding sites on U3 snoRNA and pre-rRNA by UV cross-linking and high-throughput analysis of cDNAs. Proc. Natl Acad. Sci. USA 106, 9613–9618 (2009).
12. Hafner, M. et al. Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell 141, 129–141 (2010).
13. Guil, S. & Caceres, J.F. The multifunctional RNA-binding protein hnRNP A1 is required for processing of miR-18a. Nature Struct. Mol. Biol. 14, 591–596 (2007).
14. König, J. et al. The fungal RNA-binding protein Rrm4 mediates long-distance transport of ubi1 and rho3 mRNAs. EMBO J. 28, 1855–1866 (2009).
15. Licatalosi, D.D. et al. HITS-CLIP yields genome-wide insights into brain alternative RNA processing. Nature 456, 464–469 (2008).
16. Yeo, G.W. et al. An RNA code for the FOX2 splicing regulator revealed by mapping RNA–protein interactions in stem cells. Nature Struct. Mol. Biol. 16, 130–137 (2009).
17. Chi, S.W., Zang, J.B., Mele, A. & Darnell, R.B. Argonaute HITS-CLIP decodes microRNA–mRNA interaction maps. Nature 460, 479–486 (2009).
18. Zisoulis, D.G. et al. Comprehensive discovery of endogenous Argonaute binding sites in Caenorhabditis elegans. Nature Struct. Mol. Biol. 17, 173–179 (2010).
19. Leung, A.K. et al. Genome-wide identification of Ago2 binding sites from mouse embryonic stem cells with and without mature microRNAs. Nature Struct. Mol. Biol. 18, 237–244 (2011).
20. Kudla, G., Granneman, S., Hahn, D., Beggs, J.D. & Tollervey, D. Cross-linking, ligation, and sequencing of hybrids reveals RNA–RNA interactions in yeast. Proc. Natl Acad. Sci. USA 108, 10010–10015 (2011).
21. König, J. et al. iCLIP reveals the function of hnRNP particles in splicing at individual nucleotide resolution. Nature Struct. Mol. Biol. 17, 909–915 (2010).
22. Kishore, S. et al. A quantitative analysis of CLIP methods for identifying binding sites of RNA-binding proteins. Nature Methods 8, 559–564 (2011).
23. Zhang, C. & Darnell, R.B. Mapping in vivo protein–RNA interactions at single-nucleotide resolution from HITS-CLIP data. Nature Biotechnol. 29, 607–614 (2011).
24. Urlaub, H., Hartmuth, K. & Lührmann, R. A two-tracked approach to analyze RNA-protein crosslinking sites in native, nonlabeled small nuclear ribonucleoprotein particles. Methods 26, 170–181 (2002).
25. Kivioja, T. et al. Counting absolute numbers of molecules using unique molecular identifiers. Nature Methods 20 Nov 2011 (doi:10.1038/nmeth.1778).
26. Hafner, M. et al. RNA-ligase-dependent biases in miRNA representation in deep-sequenced small RNA cDNA libraries. RNA 17, 1697–1712 (2011).
27. Ingolia, N.T., Ghaemmaghami, S., Newman, J.R. & Weissman, J.S. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324, 218–223 (2009).
28. Guo, H., Ingolia, N.T., Weissman, J.S. & Bartel, D.P. Mammalian microRNAs predominantly act to decrease target mRNA levels. Nature 466, 835–840 (2010).
29. Fuda, N.J., Ardehali, M.B. & Lis, J.T. Defining mechanisms that regulate RNA polymerase II transcription in vivo. Nature 461, 186–192 (2009).
30. Churchman, L.S. & Weissman, J.S. Nascent transcript sequencing visualizes transcription at nucleotide resolution. Nature 469, 368–373 (2011).
31. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
32. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
33. Trapnell, C., Pachter, L. & Salzberg, S.L. TopHat: discovering splice junctions with RNA-seq. Bioinformatics 25, 1105–1111 (2009).
34. Hoffmann, S. et al. Fast mapping of short sequences with mismatches, insertions and deletions using index structures. PLoS Comput. Biol. 5, e1000502 (2009).
35. Wu, T.D. & Nacu, S. Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics 26, 873–881 (2010).
36. Khorshid, M., Rodak, C. & Zavolan, M. CLIPZ: a database and analysis environment for experimentally determined binding sites of RNA-binding proteins. Nucleic Acids Res. 39, D245–D252 (2011).
37. Corcoran, D.L. et al. PARalyzer: Definition of RNA binding sites from PAR-CLIP short-read sequence data. Genome Biol. 12, R79 (2011).
38. Yang, J.H. et al. starBase: a database for exploring microRNA-mRNA interaction maps from Argonaute CLIP-seq and Degradome-seq data. Nucleic Acids Res. 39, D202–D209 (2011).
39. Bailey, T.L. et al. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 37, W202–W208 (2009).
40. Siddharthan, R., Siggia, E.D. & van Nimwegen, E. PhyloGibbs: a Gibbs sampling motif finder that incorporates phylogeny. PLoS Comput. Biol. 1, e67 (2005).
41. Ule, J. et al. An RNA map predicting NOVA-dependent splicing regulation. Nature 444, 580–586 (2006).
42. Witten, J.T. & Ule, J. Understanding splicing regulation through RNA splicing maps. Trends Genet. 27, 89–97 (2011).
43. Lebedeva, S. et al. Transcriptome-wide analysis of regulatory interactions of the RNA-binding protein HuR. Mol. Cell 43, 340–352 (2011).
44. Mukherjee, N. et al. Integrative regulatory mapping indicates that the RNA-binding protein HuR couples pre-mRNA processing and mRNA stability. Mol. Cell 43, 327–339 (2011).
45. Schadt, E.E., Turner, S. & Kasarskis, A. A window into third-generation sequencing. Hum. Mol. Genet. 19, R227–R240 (2010).

Acknowledgments
This work was supported by the Medical Research Council, the European Molecular Biology Laboratory (grant number U105185858), the European Research Council (206726-CLIP) and by a Human Frontiers Science Program Long-Term fellowship and an EMBL EIPOD fellowship to J.K. and K.Z., respectively.

Competing interests statement
The authors declare no competing financial interests.

FURTHER INFORMATION
Nicholas M. Luscombe's homepage: http://www.ebi.ac.uk/~luscombe
Jernej Ule's homepage: http://www2.mrc-lmb.cam.ac.uk/groups/jule
CLIP forum: http://megazord.rockefeller.edu/public/forum
CLIPZ: http://www.clipz.unibas.ch
GSNAP: http://share.gene.com/gmap
iCLIP questions and answers: http://goo.gl/4tSci
iCount pipeline: http://icount.biolab.si
Novoalign: http://www.novocraft.com/main/index.php
Segemehl: http://www.bioinf.uni-leipzig.de/Software/segemehl
starBase: http://starbase.sysu.edu.cn
Uwe Ohler's Research Group PARalyzer (PAR-CLIP data analyzer): http://www.genome.duke.edu/labs/ohler/research/PARalyzer

NATURE REVIEWS | GENETICS VOLUME 13 | FEBRUARY 2012 | 83

