Illumina HiSeq X
1.8 Tbp (3 billion reads) in ~3 days (as of 11/6/2014)
November 2014
The Genome Access Course
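As a rough back-of-the-envelope check on the HiSeq X figures above, the sketch below converts the run yield into average bases per counted read and an approximate fold coverage. The 3.1 Gbp human genome size and the function names are assumptions added for illustration; they do not come from the slide.

```python
# Back-of-the-envelope arithmetic for a sequencing run.
# Yield and read count are the slide's figures; the genome size is an assumption.

RUN_YIELD_BP = 1.8e12      # 1.8 Tbp per run (from the slide)
RUN_READS = 3e9            # 3 billion reads per run (from the slide)
HUMAN_GENOME_BP = 3.1e9    # ASSUMPTION: approximate haploid human genome size


def bases_per_read(total_bases: float, n_reads: float) -> float:
    """Average number of bases contributed by each counted read."""
    return total_bases / n_reads


def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Average number of times each genome position is sequenced."""
    return total_bases / genome_size


if __name__ == "__main__":
    print("bases per read:", bases_per_read(RUN_YIELD_BP, RUN_READS))
    print("fold coverage of one human genome:",
          round(fold_coverage(RUN_YIELD_BP, HUMAN_GENOME_BP)))
```

Dividing the fold coverage by a target depth per sample (for example 30x) gives a rough idea of how many genomes one run could cover.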
Whole Genome Shotgun Sequencing
Randomly Fragment Genomic DNA
Sequence Fragments
...ATCCGTAAATGGGCTGATACTACTAATGCCAAACTGTACTAGTCCTG...
Contiguous Sequence (Contig)
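The idea of rebuilding a contig from overlapping fragments can be sketched with a toy greedy assembler. This is a minimal illustration under stated assumptions, not how production assemblers work: the fragments are error-free, all come from the same strand, and the function names and the `min_len` parameter are hypothetical. The test sequence is the one shown on the slide, without the leading and trailing ellipses.

```python
from typing import List


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of fragments with the longest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        # Discard fragments that are fully contained in another fragment.
        frags = [f for f in frags
                 if not any(f != g and f in g for g in frags)]
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: several contigs are left
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags)
                 if idx not in (best_i, best_j)] + [merged]
    return frags


# Sequence shown on the slide; the three slices stand in for random fragments.
genome = "ATCCGTAAATGGGCTGATACTACTAATGCCAAACTGTACTAGTCCTG"
reads = [genome[24:47], genome[0:20], genome[12:32]]
contigs = greedy_assemble(reads)
```

With these three overlapping fragments the assembler returns a single contig equal to the original sequence. Real data adds sequencing errors, repeats, and both strands, which is why dedicated assemblers are used in practice.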
Sequence cDNA Fragments
• Genome sequencing
– Novel genomes
– Resequencing
• Transcriptome sequencing (RNA-seq)
– Characterize transcripts with or without reference genome
    • Typical length
    • Short (microRNAs, …)
– Find differentially expressed transcripts
• Other
– Methyl-seq
– ChIP-seq
Illumina Sequencing
DNA Sample → Construct Library → Cluster Generation in Flow Cell → Sequencing by Synthesis
200-500 bp
Mate-Pair Reads - 5’ and 3’
2-5 kbp
Quality (ASCII character for each base)
Files are so big that they are split into chunks of 40 million reads per file
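To make the "ASCII character for each base" encoding concrete, the sketch below reads four-line FASTQ records and converts a quality string into numeric Phred scores. It assumes the Phred+33 offset (older Illumina pipelines used +64, so check your files), and the record in the usage example is made up for illustration. The function names are hypothetical.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality_string) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or a blank line)
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual  # strip the leading '@'


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Convert each ASCII quality character to its Phred score (ASSUMES Phred+33)."""
    return [ord(ch) - offset for ch in qual]


def error_probability(q: int) -> float:
    """Probability that a base call with Phred score q is wrong: 10^(-q/10)."""
    return 10 ** (-q / 10)


# Hypothetical record, for illustration only.
example = io.StringIO("@read1\nATCCG\n+\n<<<+;\n")
for read_id, seq, qual in read_fastq(example):
    print(read_id, seq, phred_scores(qual))
```

A Phred score of 10 corresponds to a 1 in 10 chance of a wrong base call, 20 to 1 in 100, and 30 to 1 in 1000.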
FASTQ (_R1.txt) + FASTQ (_R2.txt) → Align Reads to Genome
Unbiased Reads
Biased Reads
Trimmomatic
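One of the ideas behind quality trimming is a sliding window: scan the read from the 5' end and cut it where the average quality inside the window first drops below a threshold. The sketch below is a simplified illustration of that idea only; it is not Trimmomatic's actual implementation, and the function name, the window size of 4, and the threshold of 15 are illustrative assumptions.

```python
from typing import List, Tuple


def sliding_window_trim(seq: str, quals: List[int],
                        window: int = 4,
                        min_avg: float = 15.0) -> Tuple[str, List[int]]:
    """Cut the read at the start of the first window whose mean quality < min_avg.

    `quals` holds one numeric Phred score per base of `seq`.
    Reads shorter than the window are judged as a single window.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    n_windows = max(len(seq) - window + 1, 1)
    for start in range(n_windows):
        win = quals[start:start + window]
        if win and sum(win) / len(win) < min_avg:
            return seq[:start], quals[:start]
    return seq, quals  # no window fell below the threshold


# Hypothetical read whose quality collapses near the 3' end.
trimmed_seq, trimmed_quals = sliding_window_trim(
    "ACGTACGTAC", [30, 30, 30, 30, 30, 30, 10, 5, 5, 5])
```

In this example the window starting at the sixth base averages 12.5, so the read is cut to its first five bases. For paired-end data, both files must be trimmed together so that mates stay in the same order.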
FASTQ (_R1.txt) + FASTQ (_R2.txt) → Align Reads to Genome
SAM File → samtools → BAM File, PileUp File → Call Variants → …
Pileup output file
chr1 272 T 24 ,.$.....,,.,.,...,,,.,..^+. <<<+;<<<<<<<<<<<=<;<;7<&
chr1 273 T 23 ,.....,,.,.,...,,,.,..A <<<;<<<<<<<<<3<=<<<;<<+
chr1 274 T 23 ,.$....,,.,.,...,,,.,... 7<7;<;<<<<<<<<<=<;<;<<6
chr1 275 A 23 ,$....,,.,.,...,,,.,...^l. <+;9*<<<<<<<<<=<<:;<<<<
chr1 276 G 22 TTTTTTTTTTTTTTTTTTTTTTT 33;+<<7=7<<7<&<<1;<<6<
chr1 277 T 22 ....,,.,.,.C.,,,.,..G. +7<;<<<<<<<&<=<<:;<<&<
chr1 278 G 23 ....,,.,.,...,,,.,....^k. %38*<<;<7<<7<=<<<;<<<<<
chr1 279 C 23 A..T,,.,.,...,,,.,..... ;75&<<<<<<<<<=<<<9<<:<<
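The read-bases column of the pileup lines above can be tallied with a small parser. In that column `.` and `,` mean a match to the reference on the forward and reverse strand, a letter means a mismatch, `^` marks a read start and is followed by one mapping-quality character, `$` marks a read end, and `+N`/`-N` introduce an insertion or deletion of N bases. The sketch below is a minimal reading of that format; the function names are hypothetical, and it assumes whitespace-separated columns as shown in the listing.

```python
from collections import Counter
from typing import Tuple


def count_pileup_bases(bases: str) -> Counter:
    """Tally reference matches, mismatches and indels in a pileup base string."""
    counts: Counter = Counter()
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":            # read start: skip '^' and the mapping-quality char
            i += 2
        elif ch == "$":          # read end marker
            i += 1
        elif ch in "+-":         # indel: sign, length digits, then the indel bases
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            length = int(bases[i + 1:j]) if j > i + 1 else 0
            counts["ins" if ch == "+" else "del"] += 1
            i = j + length
        else:
            if ch in ".,":
                counts["ref"] += 1       # matches the reference base
            elif ch == "*":
                counts["*"] += 1         # placeholder for a deleted base
            else:
                counts[ch.upper()] += 1  # mismatch, strand ignored
            i += 1
    return counts


def parse_pileup_line(line: str) -> Tuple[str, int, str, int, Counter, str]:
    """Split one pileup line into (chrom, pos, ref, depth, base counts, quals)."""
    chrom, pos, ref, depth, bases, quals = line.split()[:6]
    return chrom, int(pos), ref, int(depth), count_pileup_bases(bases), quals


# Two lines taken from the listing above.
line_272 = "chr1 272 T 24 ,.$.....,,.,.,...,,,.,..^+. <<<+;<<<<<<<<<<<=<;<;7<&"
line_279 = "chr1 279 C 23 A..T,,.,.,...,,,.,..... ;75&<<<<<<<<<=<<<9<<:<<"
print(parse_pileup_line(line_272)[4])
print(parse_pileup_line(line_279)[4])
```

Note that in the first line the `+` after `^` is a mapping-quality character, not the start of an insertion, which is why the parser consumes the character following `^` before looking at anything else. A position where most reads carry the same non-reference letter is a candidate variant.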
Galaxy (http://main.g2.bx.psu.edu)
SRA Formatted Files → SRA ToolKit → FASTQ Files
Automatically Forward FASTQ Files to Galaxy
NCBI BioProject
FASTQ Files