NCBI Databases
January 2008
The National Center for
Biotechnology Information
Total 116,236,515
What is GenBank?
NCBI’s Primary Sequence
[Figure: primary sequence database diagram. Labels: NIG, EMBL, SRS, getentry; Submissions, Updates.]
GenBank: NCBI’s Primary Sequence Database
Record
SOURCE Prunus persica (peach)
ORGANISM Prunus persica
Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
Spermatophyta; Magnoliophyta; eudicotyledons; core eudicotyledons;
rosids; eurosids I; Rosales; Rosaceae; Amygdaloideae; Prunus.
REFERENCE 1 (bases 1 to 2540)
AUTHORS Bassett,C.L., Artlip,T.S. and Callahan,A.M.
TITLE Characterization of the peach homologue of the ethylene receptor,
PpETR1, reveals some unusual features regarding transcript
processing
Header
JOURNAL Planta 215 (4), 679-688 (2002)
PUBMED 12172852
REFERENCE 2 (bases 1 to 2540)
AUTHORS Bassett,C.B., Artlip,T.S. and Nickerson,M.L.
TITLE Direct Submission
JOURNAL Submitted (29-JAN-1999) Appalachian Fruit Research Station,
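The header excerpt above can be read programmatically. The following is a minimal sketch, assuming the flattened layout shown here, where each field starts with an upper-case keyword and continuation lines carry no keyword. The names `KEYWORDS`, `parse_header`, and `lineage` are hypothetical helpers for illustration, not part of any NCBI tool, and the keyword set covers only the fields visible in the excerpt.

```python
# Keywords visible in the excerpt; real GenBank flat files define more.
KEYWORDS = {"SOURCE", "ORGANISM", "REFERENCE", "AUTHORS",
            "TITLE", "JOURNAL", "PUBMED"}


def parse_header(lines):
    """Group header lines into (keyword, [text lines]) pairs.

    A line whose first token is a known keyword starts a new field;
    any other non-empty line is treated as a continuation.
    """
    fields = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        first, _, rest = text.partition(" ")
        if first in KEYWORDS:
            fields.append((first, [rest.strip()]))
        elif fields:
            fields[-1][1].append(text)
    return fields


def lineage(fields):
    """Return the taxonomic lineage listed under ORGANISM."""
    for key, parts in fields:
        if key == "ORGANISM":
            # parts[0] is the organism name; the rest is the lineage,
            # separated by semicolons and ended by a period.
            joined = " ".join(parts[1:]).rstrip(".")
            return [t.strip() for t in joined.split(";") if t.strip()]
    return []


excerpt = [
    "SOURCE Prunus persica (peach)",
    "ORGANISM Prunus persica",
    "Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;",
    "Spermatophyta; Magnoliophyta; eudicotyledons; core eudicotyledons;",
    "rosids; eurosids I; Rosales; Rosaceae; Amygdaloideae; Prunus.",
    "PUBMED 12172852",
]
header_fields = parse_header(excerpt)
taxa = lineage(header_fields)
```

Here `taxa` starts at the broadest rank (`Eukaryota`) and ends at the genus (`Prunus`). A production parser would rely on the fixed column layout of the flat file rather than a keyword list.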
ACCESSION U07418
VERSION U07418.1 GI:466461

Accession
•Stable
•Reportable
•Universal
Version
•Tracks changes in sequence
GI number
•NCBI internal use
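The relationship between accession, version, and GI number can be made concrete by splitting the VERSION line into its parts. This is a minimal sketch that assumes the line has the form shown above (`VERSION accession.version GI:number`); the function name `parse_version_line` and the regular expression are illustrative choices, not an NCBI API.

```python
import re

# Accession (letters, optional underscore, digits), a dot, the version
# number, and an optional GI number.
VERSION_RE = re.compile(
    r"^VERSION\s+([A-Z]+_?\d+)\.(\d+)(?:\s+GI:(\d+))?\s*$"
)


def parse_version_line(line):
    """Split a VERSION line into accession, version, and GI number."""
    match = VERSION_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a VERSION line: {line!r}")
    accession, version, gi = match.groups()
    return {
        "accession": accession,             # stable identifier
        "version": int(version),            # increments when the sequence changes
        "gi": int(gi) if gi else None,      # NCBI internal identifier
    }


parsed = parse_version_line("VERSION U07418.1 GI:466461")
```

The accession stays the same across sequence updates, while the number after the dot is what changes when the sequence itself is revised.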
well annotated
• >600 Projects
• >600 Taxa
  – 423 bacteria
  – 186 eukaryotes
    • 62 fungi
    • 87 animals
    • 5 flowering plants
Mammalian WGS
PIR 21,713
PRF 12,079
PDB 110,035
(PAT Division 920,869)
Total 18,971,426
Etc.
Protein Sequences from Structures
[Figure: sequence data flow between databases. Labels: GenBank, updated ONLY by submitters; UniGene, Algorithms; updated continually by NCBI.]
RefSeq: NCBI’s Derivative Sequence Database
ftp://ftp.ncbi.nih.gov/refseq/release/
Genomes: Two Paths
Curated mRNA
Curated Protein
Curated non-coding RNA
Predicted mRNA
Predicted Protein
Predicted non-coding RNA
Contig
WGS Supercontig
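The record categories listed above correspond to RefSeq accession prefixes. The sketch below maps the standard prefixes to those categories; the prefixes are not stated in the text above and are supplied here from general RefSeq conventions, and the names `REFSEQ_PREFIXES` and `classify_refseq` are hypothetical.

```python
# Standard RefSeq accession prefixes, matched to the categories above.
REFSEQ_PREFIXES = {
    "NM_": "Curated mRNA",
    "NP_": "Curated Protein",
    "NR_": "Curated non-coding RNA",
    "XM_": "Predicted mRNA",
    "XP_": "Predicted Protein",
    "XR_": "Predicted non-coding RNA",
    "NT_": "Contig",
    "NW_": "WGS Supercontig",
    # Not in the list above: NC_ marks a complete genomic molecule,
    # such as a chromosome.
    "NC_": "Complete genomic molecule",
}


def classify_refseq(accession):
    """Return the record category for a RefSeq accession, or 'Unknown'."""
    return REFSEQ_PREFIXES.get(accession[:3].upper(), "Unknown")


kinds = {acc: classify_refseq(acc)
         for acc in ("NM_000249", "NT_022517", "NC_000003")}
```

An accession without one of these prefixes, such as a GenBank accession like `U07418`, is reported as `Unknown` by this sketch.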
Two Paths to RefSeq
NM_000249 NM_116983
NT_022517
(36974983..37032341) NC_003075 Transcript
NC_000003
(37009983..37067341)
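The two coordinate ranges above can be compared with simple arithmetic. This sketch assumes that the ranges are 1-based and inclusive, and that the first range belongs to NT_022517 and the second to NC_000003; the slide layout does not make that pairing explicit. The function name `span_length` is illustrative.

```python
def span_length(start, end):
    """Length of a 1-based, inclusive coordinate range."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


# Assumed pairing of accession and range.
contig_span = (36974983, 37032341)      # NT_022517
chromosome_span = (37009983, 37067341)  # NC_000003

contig_len = span_length(*contig_span)
chromosome_len = span_length(*chromosome_span)

# Shift between the two coordinate systems.
offset = chromosome_span[0] - contig_span[0]
```

Both ranges cover 57,359 bases, and the chromosome coordinates are shifted by 35,000 relative to the contig coordinates, which is consistent with the same region being placed on two different reference sequences.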
RefSeq
GenBank
Sequences
RefSeq Benefits
Other
WGS GenBank UniGene
Transcript
RefSeq
Contig
BAC
RefSeq
Transcript
NCBI Field Guide
Expressed Sequences
UniGene
GEO
NCBI Expressed Sequences
5’ EST hits
3’ EST hits
Chordates
UniGene
Fungi et al.
Invertebrates
Gene Catalog: X. tropicalis
Uncharacterized ESTs
Associating Sequences: Human Thrombin
For each protein chain, locate SSEs (secondary structure elements),
SH2
SH2
TyrKC
SH3
Cn3D
NCBI’s SNP Database