

(Adapted from Fullwood M. 2009)

Promise: large improvements in human and animal healthcare and production
Contemporary study:
- Genomes are read out as linear sequences
- In the cell, complex interactions and mechanisms transduce DNA information into biological function
- Conventional DNA sequencing explores genetic elements and structure
Practical challenges:
- High sequencing costs and low throughput
- Limited in-depth analysis of genomic elements

Next Generation Sequencers

Next (or second) generation sequencers came onto the scene in the early 2000s
General characteristics:
- Amplification of genetic material by PCR
- Ligation of amplified material to a solid surface
- Sequence of the target genetic material is determined by sequencing-by-synthesis (using labelled nucleotides or pyrosequencing for detection) or sequencing-by-ligation
- Sequencing is done in a massively parallel fashion and sequence information is captured by a computer

Next Generation Sequencing

- DNA is fragmented
- Adaptors are ligated to the fragments
- Several possible protocols yield an array of PCR colonies:
  - Emulsion PCR
  - Bridge PCR
- Fluorescently tagged nucleotides
- Cyclic readout by imaging the array

(Shendure, 2008)

Next-Generation Sequencing Workflow

Illumina, Roche 454 or ABI SOLiD?

Applications of Next-Generation Sequencing

Next Generation Sequencing Technology

- Transforming the field of genomic science
- Reads DNA templates in a highly parallel manner to generate massive amounts of sequencing data
- Read length for each DNA template is short compared with Sanger capillary sequencing instruments
- The massively parallel, short-read strategy of DNA sequencing opens new ways of interrogating human and animal genomes
- Short read lengths limit the biological inferences that can be drawn

Focus on overcoming the limitation of short tags for genome-wide analysis

Strategies for overcoming limitations of NGSTs

Paired-end tag (PET) sequencing is one such strategy: it improves DNA sequencing efficiency and widens the range of biological questions that short reads can address

Sequencing-based methods for understanding genetic elements in genomes

(Fullwood et al., 2009)

Short Tag Sequencing

Sequencing only a short stretch of DNA information, typically less than 100 bp. The NGS platforms generate only short tag sequences, typically 16-50 bp. Ideally in the future, tags would be 50-100 bp.

EST (expressed sequence tag)

- ESTs are short DNA sequences corresponding to a fragment of a complementary DNA (cDNA) molecule, which may be expressed in a cell at a given time
- A short DNA sequence (a tag) from a cDNA clone (hence it is expressed)
mRNA: 5′ - 5′ UTR - protein coding - 3′ UTR - AAAA - 3′
Duplex inserts in cDNA clones

- ESTs are sequences from each end of the cDNA inserts
- A UniGene cluster is a group of overlapping ESTs, likely from one gene

Sequence-tagged site (STS)

- Short (200 to 500 base pair) DNA sequence that has a single occurrence in the genome and whose location and base sequence are known
- Easily detected by PCR
- Useful for constructing genetic and physical maps from sequence data
- When STS loci contain genetic polymorphisms (e.g. SSLPs, SNPs), they become valuable genetic markers such as microsatellites (SSRs, STMS or SSRPs), SCARs, CAPS and ISSRs; such loci can be used to distinguish individuals
- Used in shotgun sequencing to aid sequence assembly
- Very helpful for detecting microdeletions in some genes

SAGE (Serial Analysis of Gene Expression)

A method for preparation of short tags (13 bp), mostly for cDNA analysis to profile transcriptomes

Variants of SAGE include LongSAGE (20 bp) and SuperSAGE (25 bp)
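A minimal sketch of how a SAGE-style tag could be pulled from a cDNA sequence and counted. It assumes an anchoring site of CATG (the NlaIII recognition site commonly used in SAGE, not stated in these slides) and takes the tag from the bases immediately 3′ of the last such site. Whether a quoted tag length includes the anchoring site varies between descriptions; here it does not. The function names and the toy sequence are hypothetical.

```python
from collections import Counter

def sage_tag(cdna: str, site: str = "CATG", tag_len: int = 13):
    """Return the tag just 3' of the last anchoring site, or None if unavailable."""
    i = cdna.rfind(site)
    if i == -1:
        return None  # no anchoring site in this transcript
    start = i + len(site)
    tag = cdna[start:start + tag_len]
    # Discard tags truncated by the end of the sequence.
    return tag if len(tag) == tag_len else None

def tag_profile(cdnas, **kwargs) -> Counter:
    """Count tags across a collection of cDNA sequences (a toy expression profile)."""
    tags = (sage_tag(c, **kwargs) for c in cdnas)
    return Counter(t for t in tags if t is not None)

# Toy cDNA with two CATG sites; only the 3'-most one defines the tag.
cdna = "GG" + "CATG" + "A" * 13 + "CATG" + "ACGTTGCAACGTT" + "AAAA"
profile = tag_profile([cdna, cdna, "TTTTTTTT"])
```

Passing `tag_len=20` or `tag_len=25` would mimic the longer tags of the LongSAGE and SuperSAGE variants under the same assumptions.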

MPSS (massively parallel signature sequencing)

- A short-tag approach similar to SAGE, but using a ligation-based sequencing method to profile transcriptomes
- A means of determining the abundance of RNA species
- Unique tags are added to cDNAs
- Tags are hybridized to oligonucleotides on microbeads

CAGE (Cap analysis of gene expression)

To identify transcription start sites (TSSs) and their promoters, 5′-end-specific signature sequences are required for better annotation of expression profiles.
- A method using the cap-trapper method to retain 5′-intact transcripts and a SAGE-like approach to extract 5′ tags (20 bp)
- Variants include 5′ LongSAGE

Schematic procedure of the CAGE protocol

(Shiraki et al., 2003)

Paired End vs. Unpaired Reads

Millions of reads are generated. Repetitive regions within the genome cause the reads to be mapped to multiple locations. Polymorphism in a read can cause it to be mapped to a wrong location. Discarding ambiguous reads can reduce coverage
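The mapping ambiguity described above can be illustrated with a toy example. This is a sketch under simplifying assumptions: exact string matching stands in for a real aligner, both reads are given on the forward strand, and the genome, reads and span limits are made up for illustration.

```python
def find_all(genome: str, read: str) -> list[int]:
    """Return every 0-based start position where `read` matches exactly."""
    hits, start = [], genome.find(read)
    while start != -1:
        hits.append(start)
        start = genome.find(read, start + 1)
    return hits

def place_pair(genome, read1, read2, min_span, max_span):
    """Keep only (pos1, pos2) placements whose outer span fits the library size."""
    placements = []
    for p1 in find_all(genome, read1):
        for p2 in find_all(genome, read2):
            span = p2 + len(read2) - p1  # distance from start of read1 to end of read2
            if min_span <= span <= max_span:
                placements.append((p1, p2))
    return placements

# Toy genome: the repeat occurs twice, flanked by unique sequence.
REPEAT = "ACGTACGT"
genome = "TTGACCA" + REPEAT + "GGATCCTAGC" + REPEAT + "CATTGAGA"

single_hits = find_all(genome, REPEAT)                        # ambiguous on its own
paired_hits = place_pair(genome, REPEAT, "CATTGAGA", 10, 20)  # resolved by the mate
```

On its own the repeat read has two candidate positions; requiring its mate to land within the expected span leaves a single placement.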

Paired End Ditag Method

This method covalently links the 5′ tag and 3′ tag of a DNA fragment into a ditag structure for sequencing analysis, thus combining the benefits of the cost-effective SAGE and the linkage information from paired-end sequencing
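As a rough in silico picture of the ditag idea, the sketch below keeps only the two ends of a fragment and joins them. The 18 bp default tag length is an assumption chosen for illustration, and the function names are hypothetical; the real procedure is enzymatic, not a string operation.

```python
def make_ditag(fragment: str, tag_len: int = 18) -> str:
    """Join the first and last `tag_len` bases of a fragment into one ditag."""
    if len(fragment) < 2 * tag_len:
        raise ValueError("fragment is shorter than two tags")
    return fragment[:tag_len] + fragment[-tag_len:]

def split_ditag(ditag: str) -> tuple[str, str]:
    """Recover the (5' tag, 3' tag) pair from an even-length ditag."""
    if len(ditag) % 2:
        raise ValueError("ditag length must be even")
    half = len(ditag) // 2
    return ditag[:half], ditag[half:]

# Toy fragment: distinct ends around a long interior that is never sequenced.
fragment = "A" * 18 + "C" * 100 + "G" * 18
ditag = make_ditag(fragment)
tag5, tag3 = split_ditag(ditag)
```

The ditag is 36 bp long regardless of the fragment length, which is where the cost saving comes from: only the ends are sequenced, yet they remain linked.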

Sequencing-based methods for understanding genetic elements in genomes

(Fullwood et al., 2009)

PET Converged from 2 technological concepts

- Paired-end sequencing
- Short tag sequencing

(Hong, 1981)

Schematic view of PET methodology

- MmeI (type IIS restriction enzyme): cuts 18/20 bp downstream
- EcoP15I (type III restriction enzyme): cuts 25/27 bp downstream
- Recircularization yields tag-vector-tag or tag-linker-tag constructs

Tags of 106 bp size unbalanced with tags under 15 bp

(Fullwood et al., 2009)

Cloning Method
Benefit: preserves the original flcDNA/ChIP DNA fragments as library clones
Limitations:
- Construction process is long (2-4 weeks)
- Technically challenging

Cloning-free Method

- Straightforward
- Avoids biases related to cloning

Applications of PET

(Fullwood et al., 2009)

RNA-PET for Transcriptome Studies

Demarcation of first and last exons
- Defines the TSSs and polyadenylation sites (PASs)
- Shows connectivity between the two sites

Limitation: no information on internal exons
Unique feature: ability to detect unconventional fusion genes

- The early version of RNA-PET was a cloning-based method, i.e. GIS (Gene Identification Signature)-PET analysis
- It demonstrated high-throughput identification of fusion genes
- Now, cloning-free methods are available
- Alternatively, perform shotgun paired-end sequencing of cDNA templates
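One way to picture fusion-gene detection with RNA-PET: if the 5′ tag and the 3′ tag of one transcript map inside two different annotated genes, the pair is a candidate fusion. The sketch below assumes tags are already mapped to (chromosome, position) and uses a made-up annotation with placeholder gene names; real pipelines also need to filter mapping and library artifacts.

```python
def gene_at(annotation, chrom, pos):
    """Return the name of the first annotated gene containing (chrom, pos), else None."""
    for name, (g_chrom, start, end) in annotation.items():
        if g_chrom == chrom and start <= pos < end:
            return name
    return None

def fusion_candidate(annotation, tag5, tag3):
    """Return (5' gene, 3' gene) if the two tags fall in different genes, else None."""
    g5 = gene_at(annotation, *tag5)
    g3 = gene_at(annotation, *tag3)
    if g5 is not None and g3 is not None and g5 != g3:
        return (g5, g3)
    return None

# Hypothetical annotation: gene name -> (chromosome, start, end), half-open.
annotation = {
    "GENE_A": ("chr1", 1000, 5000),
    "GENE_B": ("chr7", 20000, 30000),
}
normal = fusion_candidate(annotation, ("chr1", 1010), ("chr1", 4990))
fused = fusion_candidate(annotation, ("chr1", 1010), ("chr7", 29950))
```

Because only the transcript ends are examined, this view says nothing about which internal exons are joined, consistent with the limitation noted above.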

ChIP-PET: identifying regulatory and epigenetic elements

- A sequencing-based approach to characterize ChIP-enriched DNA fragments
- ChIP-PET provides linked 5′ and 3′ sequences for ChIP-enriched DNA molecules. These are mapped to the reference genome so that the complete ChIP DNA fragment can be inferred from the genome sequence between the 5′ and 3′ tags, and the enriched transcription factor binding site (TFBS) can be determined
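The inference step can be sketched as follows: each mapped PET defines a fragment interval between its 5′ and 3′ tags, and the region where the most fragments overlap is a candidate binding site. This is an illustrative sweep over interval endpoints, assuming half-open (start, end) coordinates on one chromosome; it is not the scoring used by any specific ChIP-PET pipeline, and the coordinates are invented.

```python
def max_overlap_region(fragments):
    """Return (start, end, depth) of the first deepest overlap among half-open intervals."""
    events = []
    for start, end in fragments:
        events.append((start, 1))   # fragment opens
        events.append((end, -1))    # fragment closes
    # At equal positions, closes (-1) sort before opens (+1),
    # so fragments that merely touch are not counted as overlapping.
    events.sort()
    depth = best_depth = 0
    best_start = best_end = None
    for i, (pos, delta) in enumerate(events):
        depth += delta
        if delta == 1 and depth > best_depth:
            best_depth = depth
            best_start = pos
            best_end = events[i + 1][0]  # this depth holds until the next event
    return best_start, best_end, best_depth

# Fragments inferred from four PETs: three overlap, one lies elsewhere.
fragments = [(100, 400), (250, 600), (300, 550), (900, 1200)]
site = max_overlap_region(fragments)
```

Here three fragments share the interval 300-400, so that window is reported with depth 3, while the isolated fragment contributes nothing.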

ChIA-PET: identifying chromatin interactions

- A linker sequence is introduced at the junction of two DNA fragments during nuclear proximity ligation, building connectivity between DNA fragments that are tethered together by protein factors
- All linker-connected ligation products can be extracted as tag-linker-tag constructs, which can be analyzed by ultra-high-throughput PET sequencing
- When mapped to the reference genome, the ChIA-PET sequences reveal the relationship between two DNA fragments in chromatin interactions captured by chromatin proximity ligation
- ChIA-PET has the potential to be an unbiased genome-wide approach for de novo detection of chromatin interactions
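A simple way to summarize mapped ChIA-PET data is to count how many PETs link each pair of genomic bins; pairs supported by several PETs are candidate interactions. The sketch below assumes fixed-size bins and already-mapped tags. The bin size, function name and coordinates are illustrative assumptions, and real analyses also need to separate genuine interactions from ligation noise.

```python
from collections import Counter

def interaction_counts(pets, bin_size=1000):
    """Count PETs linking each unordered pair of (chromosome, bin) anchors."""
    counts = Counter()
    for (chrom1, pos1), (chrom2, pos2) in pets:
        a = (chrom1, pos1 // bin_size)
        b = (chrom2, pos2 // bin_size)
        if a == b:
            continue  # both tags in the same bin: no distal link to report
        counts[tuple(sorted((a, b)))] += 1  # order-independent key
    return counts

pets = [
    (("chr1", 1200), ("chr1", 50300)),
    (("chr1", 50900), ("chr1", 1700)),   # same two bins, tags in reverse order
    (("chr1", 1500), ("chr1", 1600)),    # both tags in one bin, skipped
    (("chr2", 10), ("chr1", 1999)),      # inter-chromosomal link
]
counts = interaction_counts(pets)
```

Sorting the two anchors makes the count independent of which tag was read first, so the first two PETs support the same interaction.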

DNA-PET: Genome Structure Analysis

- Ideal method for sequencing and assembling genomes as well as studying genome structural variations
- First demonstrated in resequencing an evolved E. coli genome using the polony sequencing-by-ligation method
- Applied to map cancer genome rearrangements
- DNA-PET could become a vital part of personal genomics for personalized medicine

Detection of structural variations: insertions, deletions, duplications, inversions, translocations
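How a mapped PET hints at a structural variation can be sketched with a few rules on chromosome, strand and span. The convention assumed here is that both tags of a concordant ditag map to the same strand, separated by roughly the library insert size. The class, field names and thresholds are hypothetical, and duplications are not handled because they need additional orientation logic.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MappedPET:
    chrom5: str
    pos5: int
    strand5: str
    chrom3: str
    pos3: int
    strand3: str

def classify(pet: MappedPET, min_span: int, max_span: int) -> str:
    """Label a PET as concordant or as a candidate structural variation."""
    if pet.chrom5 != pet.chrom3:
        return "translocation"
    if pet.strand5 != pet.strand3:
        return "inversion"
    span = abs(pet.pos3 - pet.pos5)
    if span > max_span:
        return "deletion"   # tags farther apart on the reference than in the sample
    if span < min_span:
        return "insertion"  # tags closer on the reference than in the sample
    return "concordant"

# Toy library with an expected span of 9-11 kb.
labels = [
    classify(MappedPET("chr1", 1000, "+", "chr1", 11000, "+"), 9000, 11000),
    classify(MappedPET("chr1", 1000, "+", "chr1", 31000, "+"), 9000, 11000),
    classify(MappedPET("chr1", 1000, "+", "chr1", 4000, "+"), 9000, 11000),
    classify(MappedPET("chr1", 1000, "+", "chr1", 11000, "-"), 9000, 11000),
    classify(MappedPET("chr1", 1000, "+", "chr5", 11000, "+"), 9000, 11000),
]
```

A single discordant PET is only a hint; in practice several independent PETs supporting the same event would be required before calling a variation.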

Future of PET Technology

Limitations:
- Requires more starting material
- Involves more molecular manipulations
- Certain DNA portions are not recovered

- Versatile; readily adapts to new sequencing technologies
- Couples methods for asking biological questions with NGS

The future is bright.