Rickard Sandberg
Assistant Professor
Ludwig Institute for Cancer Research
Department of Cell and Molecular Biology
Karolinska Institutet
zygote
blastocyst
muscle cells
kidney cells
hair cells
hippocampal neuron
mRNA-seq protocol
Isolate polyA+ RNA
polyA+ RNAs
rRNA- RNAs (rRNA-depleted)
short RNAs (e.g. miRNAs)
Ribosome footprint sequencing
GRO-Seq (Global Run On sequencing)
CLIP-Seq (RNA-protein interactions)
non-RNA applications: ChIP-Seq, DNase hypersensitive sites, ...
Wang et al. 2009 Nat Rev Gen
Wednesday, March 16, 2011
[Figure: RNA-seq read density (log10(reads)) and microarray signal (log2(probe intensity)) across gene SLC25A3 in testes, liver, skeletal muscle and heart; transcripts AK074759, BC011574 and AK092689; exon labels 3A and 3B.]
Wang*, Sandberg* et al. 2008 Nature Mortazavi et al. 2008 Nat Methods
Strand-specific RNA-Seq
Many protocols exist:
RNA ligation preserves strand information (Illumina short RNA kit)
Incorporation of dUTP in second-strand synthesis, followed by uracil-N-glycosylase treatment (Parkhomchuk et al. NAR 2009)
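The two protocol families above differ in which strand the sequenced read represents. The sketch below shows how the strand of the original RNA could be inferred from the alignment strand of a read. It assumes standard library designs: in dUTP libraries read 1 is antisense to the RNA, and in ligation-based libraries read 1 is in the same sense as the RNA. The function name `infer_rna_strand` and its parameters are hypothetical and not taken from any specific tool.

```python
def infer_rna_strand(read_strand: str, is_read2: bool = False,
                     protocol: str = "dutp") -> str:
    """Return the genomic strand ('+' or '-') of the RNA a read came from.

    read_strand: strand the read aligned to.
    is_read2:    True for the second mate of a paired-end fragment.
    protocol:    'dutp' or 'ligation' (assumed library designs, see above).
    """
    if read_strand not in ("+", "-"):
        raise ValueError("read_strand must be '+' or '-'")
    flip = {"+": "-", "-": "+"}

    if protocol == "dutp":
        # The dUTP-marked second strand is degraded, so read 1 is assumed
        # to be antisense to the RNA and read 2 in the same sense.
        return read_strand if is_read2 else flip[read_strand]
    if protocol == "ligation":
        # Adapters are ligated directly to the RNA, so read 1 is assumed
        # to be in the same sense as the RNA and read 2 antisense.
        return flip[read_strand] if is_read2 else read_strand
    raise ValueError("protocol must be 'dutp' or 'ligation'")


if __name__ == "__main__":
    for protocol in ("dutp", "ligation"):
        print(protocol, "read 1 on '+' comes from RNA strand",
              infer_rna_strand("+", protocol=protocol))
```

Adapter and kit details vary, so the orientation assumed here should be checked against the actual library before use.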
Digital RNA counting methods
Sequencing
ACGCG...
TCGAG...
AGGTA...
CCGTG...
CTGCG...
Biol. Replicates
MAQC samples
UHR (cell line mix)
Brain
Spiked-in RNAs
<10M reads
~20M reads
ES
TS
XEN
EpiSC
Nanog
6525
20
263
Cdx2
124
6256
Sox17
11
9814
99
Sox3
151
1234
796
Shh
Ihh
12
107
17
Dhh
10
212
575
80
0.05 RPKM
1 RPKM
[Figure 4, Ramsköld et al.: fraction of reads along transcripts (length in nucleotides, 0 to 3000) for brain, muscle and liver; number of reads by position on chromosome 19 (7931000 to 7933000) in brain and UHR; Solexa vs. TaqMan values (TaqMan expression relative to POLR2A), r=0.80, n=465; Fisher's exact test: p=10^-27.]
TandemUTR genes
Correlation
Correlation
Brain UHR p-value UHR p-value
Full transcript
Without 3'UTR
Without 5'UTR
Coding region
Internal exons
Constitutive exons
Truncated at first
predicted poly(A) site
0.78
0.79
0.70
0.81
0.81
0.0007
0.77
0.79
0.04
0.81
0.81
0.001
0.81
0.81
0.9
0.79
0.78
0.6
0.79
0.80
0.03
Differential expression
[Figure, panel b: distributions plotted against reads per kilobase and million mappable reads (log scale).]
The density of reads in exons and introns of Ensembl genes with one annotated isoform, for 10 human tissue samples. The density is an average across all introns or exons of a gene.
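The unit on the axis, reads per kilobase and million mappable reads (RPKM), normalizes a raw read count by feature length and by sequencing depth. A minimal sketch of the calculation follows. The function name `rpkm` and the example numbers are hypothetical and chosen only for illustration.

```python
def rpkm(read_count: float, length_nt: float, total_mappable_reads: float) -> float:
    """Reads per kilobase of feature and million mappable reads.

    read_count:           reads mapped to the feature (gene, exon or intron)
    length_nt:            feature length in nucleotides
    total_mappable_reads: total mappable reads in the sample
    """
    if length_nt <= 0 or total_mappable_reads <= 0:
        raise ValueError("length and total read count must be positive")
    # Divide by length in kilobases (length_nt / 1e3) and by depth in
    # millions of reads (total / 1e6); the two factors combine to 1e9.
    return read_count * 1e9 / (length_nt * total_mappable_reads)


if __name__ == "__main__":
    # Hypothetical example: 500 reads on a 2000 nt transcript
    # in a sample with 20 million mappable reads.
    print(rpkm(500, 2000, 20_000_000))
```

Because the value is scaled by both length and depth, it can be compared between genes within a sample and, with the usual caveats, between samples.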
11-13,000 genes per tissue
absolute expression levels
[Figure annotations: 80%, 92%.]
[Figure: axis labels "Number of detected genes" vs. "Millions of reads used", and "Number of genes in all" vs. "Number of samples".]
The number of shared genes is sensitive to the expression level used for detection, but is 5-10 times higher than microarray- and SAGE-based estimates.
[Figure: fraction of non-ubiquitous genes (0.0 to 1.0) for extracellular, membrane, intracellular and nucleus.]
Ramsköld D., et al. PLoS Comp Biol 2009
[Figure: portion of mRNA pool (0% to 100%) vs. number of most expressed genes (10 to 10000), for human tissues (brain, colon, fat, liver, breast, lymph node, testes, heart, muscle) and mouse tissues (brain, liver, muscle).]
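The curve of mRNA pool portion against the number of most expressed genes can be computed by ranking genes by expression and taking a cumulative sum. The sketch below assumes expression values proportional to transcript abundance. The function name `pool_fraction_of_top_genes` and the four gene values are hypothetical.

```python
def pool_fraction_of_top_genes(expression: dict, top_n: int) -> float:
    """Fraction of the total expression contributed by the top_n genes.

    expression: mapping of gene name to a non-negative expression value.
    top_n:      number of most highly expressed genes to include.
    """
    if top_n < 0:
        raise ValueError("top_n must be non-negative")
    # Rank genes from highest to lowest expression.
    values = sorted(expression.values(), reverse=True)
    total = sum(values)
    if total <= 0:
        raise ValueError("total expression must be positive")
    # Sum the top_n values and express them as a share of the whole pool.
    return sum(values[:top_n]) / total


if __name__ == "__main__":
    # Hypothetical expression values for illustration only.
    example = {"geneA": 600.0, "geneB": 300.0, "geneC": 60.0, "geneD": 40.0}
    for n in (1, 2, 4):
        print(n, pool_fraction_of_top_genes(example, n))
```

A tissue dominated by a few highly expressed genes gives a curve that rises steeply at small gene numbers, while a more complex tissue rises more slowly.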
[Figure: read coverage of 5'UTR and 3'UTR relative to the coding region (0.0 to 1.0) in brain, liver and muscle; expression-weighted UTR length estimates (5'UTR, CDS, 3'UTR; length in nucleotides, 0 to 2000) for all genes, tissue-specific genes, extracellular genes and plasma membrane genes in brain, liver and muscle; relative number of tissue-specific genes.]
RNA-Sequencing:
Transcriptome Reconstruction
Transcript reconstruction
Alternative Promoters
Alternative Splice Sites
Mutually Exclusive Exons
Skipped Exons
Alternative Polyadenylation
[Diagram labels: Extens., Core, MXE1, MXE2, 5' Exon, SE, 3' Exon, pA]
Soluble: inhibition of apoptosis
Membrane-bound: apoptosis
Low frequencies / High frequencies
[Diagram exon labels: 7, 5, 2v, 3, 2]
Ramanathan et al. 1999
Ability to detect alternative isoforms will depend on read coverage
[Figure: fraction of genes vs. no. of reads (log10); series: multi-exon genes, alternatively spliced genes, sigmoid fit to observed, alternatively spliced genes (subsampled); isoform 1 and isoform 2.]
Wang*, Sandberg* et al. 2008 Nature
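One way to see why detection of alternative isoforms depends on read coverage is a simple binomial model. This is an illustrative assumption and not necessarily the model used in the cited analysis. If a fraction `minor_fraction` of the informative reads at a splicing event comes from the minor isoform, the chance of seeing at least `min_reads` such reads grows with the number of reads covering the event. All names below are hypothetical.

```python
from math import comb


def detection_probability(n_reads: int, minor_fraction: float,
                          min_reads: int = 1) -> float:
    """Probability of observing at least min_reads minor-isoform reads.

    Assumes each of the n_reads informative reads independently comes
    from the minor isoform with probability minor_fraction.
    """
    if not 0.0 <= minor_fraction <= 1.0:
        raise ValueError("minor_fraction must be between 0 and 1")
    if n_reads < 0 or min_reads < 0:
        raise ValueError("read counts must be non-negative")
    # Probability of seeing fewer than min_reads minor-isoform reads.
    p_too_few = sum(
        comb(n_reads, i)
        * minor_fraction ** i
        * (1.0 - minor_fraction) ** (n_reads - i)
        for i in range(min(min_reads, n_reads + 1))
    )
    # Clamp to guard against tiny negative values from rounding.
    return max(0.0, 1.0 - p_too_few)


if __name__ == "__main__":
    # Hypothetical minor isoform fraction of 10%, at increasing coverage.
    for n in (10, 30, 100, 300):
        print(n, round(detection_probability(n, 0.10, min_reads=2), 4))
```

Under this model, genes with few reads will appear to have a single isoform even when a minor isoform is present, so estimates of the extent of alternative splicing need to account for coverage.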
[Figure: fraction of genes with AS (top bin) and estimated fraction of genes with AS vs. minimum minor isoform fraction, for testes, liver, skeletal muscle, heart and ES; isoform 1 and isoform 2.]
Extent of Alternative Splicing: controls
[Figure: fraction of genes (0.0 to 1.0) vs. no. of reads (log10).]
[Figure: axis labels "Switch score" and "Mean phastCons score"; event types SE and MXE.]
from Expression
to Regulation, RNA-maps
[Figure: SE switch score (high, medium, low) by position relative to splice junction; UGCAUG (Fox) motif; tissue-biased inclusion and tissue-biased exclusion, log p-value by switch score bin, for adipose, brain, breast, cerebellum, colon, heart, liver, lymph node, skeletal muscle and testes; constitutive exon vs. skipped exon.]
Licatalosi D., et al. Nature 2008
Conclusions
LICR, Stockholm
Jonas Muhr
Questions?