
Human Genome Project

Presented by: Vaishali Gade & Sandhya Singh

What is a Genome?
All of the DNA found within each of the cells of an organism. Eukaryotic genomes can be subdivided into the nuclear genome and the mitochondrial genome. THE GENOME: A BLUEPRINT OF LIFE.

Human Genome and its complexity


It consists of about 3 billion base pairs containing an estimated 50,000 to 100,000 genes, arranged on 23 pairs of chromosomes.
About 3% is coding region and 97% is non-coding region, which includes a) repeat sequences and b) promoters and other transcriptional regulatory sequences.
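Using the round figures above, the split between coding and non-coding DNA works out roughly as follows (a back-of-the-envelope estimate from the slide's percentages, not a measured value):

```latex
\begin{aligned}
\text{coding}     &\approx 0.03 \times 3\times10^{9}\ \text{bp} = 9\times10^{7}\ \text{bp} \\
\text{non-coding} &\approx 0.97 \times 3\times10^{9}\ \text{bp} = 2.91\times10^{9}\ \text{bp}
\end{aligned}
```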

Human Genome Project


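Analysis of the human genome and mapping of the exact location of every human gene was undertaken by two efforts: the public project of the NCHGR (National Center for Human Genome Research) within the National Institutes of Health together with the DOE (Department of Energy) in the US, and the private company Celera Genomics.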

Project Status
Since the human genome is not yet completely sequenced, we have a rough draft with many gaps still to be filled. Because human genome sequencing is carried out on such a large scale, the sequences are kept in the HTGS division of GenBank, where they mature from phase 0 to phase 3 (unfinished to finished). The draft genome is largely without annotations.
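The phase numbers behave like a small ordered type, which makes them easy to model in code. In the sketch below, the one-line descriptions in the comments paraphrase GenBank's HTG phase definitions and should be checked against NCBI's documentation; the class name `HTGSPhase` and the `advance` helper are hypothetical.

```python
from enum import IntEnum


class HTGSPhase(IntEnum):
    """Maturation phases of a high-throughput genomic sequence record (paraphrased)."""
    PHASE_0 = 0  # one- to few-pass reads of a single clone, not yet assembled into contigs
    PHASE_1 = 1  # unfinished: contigs separated by gaps, order and orientation unknown
    PHASE_2 = 2  # unfinished: contigs ordered and oriented, gaps may remain
    PHASE_3 = 3  # finished: no gaps

    @property
    def is_finished(self) -> bool:
        return self is HTGSPhase.PHASE_3


def advance(phase: HTGSPhase) -> HTGSPhase:
    """Move a record to the next phase; a finished record stays finished."""
    return phase if phase.is_finished else HTGSPhase(phase + 1)
```

Using `IntEnum` keeps the phases comparable (`PHASE_1 < PHASE_3`), which matches the idea of a one-way maturation from unfinished to finished.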

Goals
To produce a single continuous sequence for each of the 24 human chromosomes. To delineate the exact location of every human gene. Gene annotation using comparative genomics. Development of efficient algorithms for comparing large, genome-scale sequences.
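Producing one continuous sequence per chromosome is at heart an assembly problem: overlapping fragments are merged until they form a single contig. A minimal greedy sketch is shown below, assuming error-free reads from one strand and ignoring repeats (both unrealistic for the human genome); real assemblers use far more elaborate methods, and the helper names are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (at least min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of a suffix of a
        if start == -1:
            return 0
        if b.startswith(a[start:]):  # earliest match gives the longest overlap
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)] + [merged]
    return reads


contigs = greedy_assemble(["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"])
```

When gaps remain (no overlap between some fragments), the function returns several contigs, which mirrors the "rough draft with many gaps" situation described earlier.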

Significance
Better understanding of how organisms evolved. Identification of unique and common segments between different species at the genome level. A shift of research from a single gene and its product to the web of interactions among genes and pathways, and to how outside influences affect the whole system. Revolutionizing medicine by curing diseases at the molecular level. Will synthetic life ever be possible?

Genome Databases
SEQUENCE DATABASES: The vast amount of data produced by the HGP is stored in the High Throughput Genomic Sequence (HTGS) division of the international nucleotide sequence databases: NCBI, EMBL and DDBJ.
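Records downloaded from NCBI, EMBL or DDBJ are commonly exported as FASTA text. A minimal reader is sketched below, assuming plain FASTA with `>` header lines and no validation of the sequence alphabet; `read_fasta` is a hypothetical helper, not a tool provided by these databases.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; text before the first '>' line is ignored."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


example = io.StringIO(">seq1 example\nACGT\nAC\n>seq2\nGGG\n")
records = list(read_fasta(example))
```

Because it is a generator, the reader streams records one at a time instead of loading a whole chromosome-scale file into memory.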

Genome Database
GENOMIC VIEWER DATABASES: Some sites offer a mixture of genome viewers and web-searchable datasets that allow analysis of human genome sequences without the need to run complex software locally: NCBI, Ensembl, UCSC, GDB.

Query System
RETRIEVAL PROGRAMS: These are designed to take advantage of the connections captured by different data models. Entrez at NCBI: the Nucleotide division retrieves sequences, and the Genome division gives an on-the-fly display of selected regions of large genomes. SRS at EBI.
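Entrez is also reachable programmatically through NCBI's E-utilities. The sketch below only builds an `efetch` request URL and does not download anything; the helper name and the example accession are illustrative choices, and NCBI's usage guidelines (rate limits, API keys) apply to real requests.

```python
from urllib.parse import urlencode

EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str, db: str = "nuccore", rettype: str = "fasta") -> str:
    """Build an E-utilities efetch URL that would return one record as plain text."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EUTILS_EFETCH}?{urlencode(params)}"


url = efetch_url("NM_000546")  # example accession; fetching it requires network access
```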

Query System
ENSEMBL: a joint project of EMBL-EBI and the Sanger Institute that provides both sequences and an on-the-fly display by chromosome number. UCSC: gives an on-the-fly display of a specified region of the vast human genome. Note: the loose network of sites (NCBI, Ensembl, UCSC) will probably coalesce into a more coordinated network of sites offering informative web pages and resources.
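A region for on-the-fly display is typically typed as `chrom:start-end`, with commas allowed in the numbers (for example in the UCSC browser's position box). The parser below is a hypothetical sketch assuming only that syntax; the browsers also accept gene names and other search terms, which it does not handle.

```python
import re

REGION = re.compile(r"^(?P<chrom>[\w.]+):(?P<start>[\d,]+)-(?P<end>[\d,]+)$")


def parse_region(text: str) -> tuple[str, int, int]:
    """Parse 'chrom:start-end' (commas allowed) into (chrom, start, end)."""
    m = REGION.match(text.strip())
    if m is None:
        raise ValueError(f"not a region string: {text!r}")
    start = int(m["start"].replace(",", ""))
    end = int(m["end"].replace(",", ""))
    if start > end:
        raise ValueError("start must not exceed end")
    return m["chrom"], start, end


chrom, start, end = parse_region("chr7:127,471,196-127,495,720")
span = end - start + 1  # length, treating both ends as inclusive positions
```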

Query System
GDB (Genome Database) is the official repository for genomic mapping data created by the Human Genome Project. It contains three types of human objects: genomic segments, maps and variations. GDB provides a full-featured query interface to its database.

Data are not information without interpretation, which is knowledge without understanding.

Gene Finding Strategies


Content-based methods: frequency of occurrence of codons, periodicity of repeats, compositional complexity. Site-based methods: presence or absence of a specific sequence or pattern, e.g. donor and acceptor splice sites.
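A minimal sketch of one signal from each family follows: codon counts for the content-based side, and an exact dinucleotide scan for the site-based side, relying on the GT...AG rule that most introns follow. Real gene finders score such signals statistically (for example with weight matrices) rather than by exact matching; the function names are hypothetical.

```python
from collections import Counter


def codon_usage(seq: str, frame: int = 0) -> Counter:
    """Content-based signal: count the codons read in one frame."""
    seq = seq.upper()
    return Counter(seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))


def find_sites(seq: str, motif: str) -> list[int]:
    """Site-based signal: 0-based positions where motif occurs (overlaps allowed)."""
    seq, motif = seq.upper(), motif.upper()
    return [i for i in range(len(seq) - len(motif) + 1) if seq[i:i + len(motif)] == motif]


# Most introns start with GT (donor side) and end with AG (acceptor side).
donors = find_sites("AAGTAAAG", "GT")     # candidate donor positions
acceptors = find_sites("AAGTAAAG", "AG")  # candidate acceptor positions
```

Exact matching of two bases produces many false candidates in real DNA, which is why site-based signals are usually combined with content-based ones.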

Gene Finding Strategies


Comparative methods: based on sequence homology. Limitations: 1) most newly discovered genes do not have gene products that match anything in the protein databases; 2) the modular nature of proteins.

Software Used In Gene Recognition


Many web-based programs are freely available. Typical features: Unix-based versions, neural networks trained on a basic set of known sequences, graphical output. Examples: GRAIL http://compbio.ornl.gov/tools/index.shtml and GENSCAN http://genes.mit.edu/GENSCAN.html

Limitations of Using a Single Method


Sensitivity to G+C content. The learning process involved in neural networks. Dependence on the type of input sequence. It is better to run several methods on a single input sequence, as GeneMachine does with MZEF, Genscan, Grail2, Fgenes, etc.
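Two ideas from this slide can be sketched directly: measuring the G+C content a predictor is sensitive to, and combining several predictors' calls on the same input. The per-base majority vote below is a hypothetical illustration of combining methods; it is not how GeneMachine itself merges or presents predictions, and both function names are assumptions.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def consensus_mask(predictions: list[list[bool]], min_votes: int) -> list[bool]:
    """Mark a base as coding if at least min_votes predictors call it coding."""
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must cover the same sequence")
    return [sum(p[i] for p in predictions) >= min_votes for i in range(length)]


# Three hypothetical predictors, each giving a per-base coding/non-coding call.
calls = [[True, True, False], [True, False, False], [False, True, True]]
agreed = consensus_mask(calls, min_votes=2)
```

Raising `min_votes` trades sensitivity for specificity: fewer bases are called coding, but those that are have more independent support.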

Sequence Comparison
Generally, sequence determines structure and structure determines function. By studying sequence similarity, we hope to find correlations between our sequence and other sequences with known structure and function. This approach is often successful; however, many molecules with low sequence similarity still share similar structures or functions.
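Sequence similarity is usually quantified by aligning two sequences. The sketch below implements Needleman-Wunsch global alignment with a simple scoring scheme (match +1, mismatch -1, gap -1); these scores are an illustrative assumption, whereas real protein comparisons normally use substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j, out_a, out_b = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


result = needleman_wunsch("ACGT", "AGT")
```

The full score table uses memory proportional to the product of the two lengths, which is fine for genes but not for genome-scale comparisons.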

The omics Series


- Genomics: gene identification and characterization
- Transcriptomics: expression profiles of mRNA
- Proteomics: functions and interactions of proteins
- Structural genomics: large-scale structure determination
- Cellinomics: metabolic pathways, cell-cell interactions
- Pharmacogenomics

Fundamental Dogma
Although a few databases already exist to distribute molecular information, the post-genomic era will need many more to collect, manage and publish the coming flood of new findings.

Genome Analysis
Database searching, sequence alignment, protein classification, gene finding, functional genomics, phylogenetic inference.

Computational Molecular Biology


Applying quantitative and computational approaches to problems in molecular biology: database sequence search, detection of homology, multiple alignment, identification of patterns, phylogenetic analysis.
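Fast database search tools avoid aligning the query against every sequence by first looking up short exact words (k-mers) shared with the database. The sketch below shows only that seeding idea, ranking database entries by shared k-mer hits; tools such as BLAST add scored neighbourhood words and alignment extension on top, and the helper names here are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(database: dict[str, str], k: int) -> dict[str, list[tuple[str, int]]]:
    """Map every k-mer to the (sequence name, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in database.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((name, pos))
    return dict(index)


def rank_by_shared_kmers(query: str, index: dict, k: int) -> list[tuple[str, int]]:
    """Count k-mer hits per database sequence, highest count first."""
    hits = defaultdict(int)
    for qpos in range(len(query) - k + 1):
        for name, _pos in index.get(query[qpos:qpos + k], []):
            hits[name] += 1
    return sorted(hits.items(), key=lambda item: item[1], reverse=True)


db = {"s1": "ACGTACGT", "s2": "TTTTTTTT"}
ranking = rank_by_shared_kmers("CGTA", build_kmer_index(db, 3), 3)
```

The index is built once and reused for every query, which is where the speed-up over exhaustive alignment comes from; the stored positions would be the starting points for extension.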

Computational Structural Biology


Applying quantitative and computational approaches to problems in structural biology: secondary structure prediction, tertiary structure prediction, transmembrane helix detection, drug design.
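One classic approach to transmembrane helix detection is a hydropathy plot: average a hydrophobicity scale over a sliding window and flag strongly hydrophobic stretches. The sketch below uses the Kyte-Doolittle scale with a 19-residue window and a 1.6 threshold, the values commonly quoted for this method; modern predictors are considerably more sophisticated, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each window; assumes only the 20 standard one-letter codes."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [sum(scores[i:i + window]) / window for i in range(len(scores) - window + 1)]


def tm_window_starts(protein: str, window: int = 19, threshold: float = 1.6) -> list[int]:
    """0-based start positions of windows hydrophobic enough to suggest a membrane helix."""
    return [i for i, h in enumerate(hydropathy_profile(protein, window)) if h > threshold]


candidate = "D" * 10 + "L" * 19 + "D" * 10  # hydrophobic core flanked by charged residues
starts = tm_window_starts(candidate)
```

A window of about 19 residues roughly matches the length of an alpha helix that spans a lipid bilayer, which is why that value is used here.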

Human Genome Analysis


With the recent progress in rapid, genome-scale sequencing, gene identification and annotation of complete genomes have become the limiting steps in most genome projects. Accurate identification of genes in prokaryotes and unicellular eukaryotes can be achieved by homology to known genes in other species. Statistical methods: GeneMark, Glimmer.
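GeneMark and Glimmer score candidate genes with Markov models of coding DNA (GeneMark with periodic, frame-dependent models; Glimmer with interpolated models). The sketch below is deliberately much simpler: one first-order Markov chain for coding DNA and one for non-coding DNA, compared as a log-likelihood ratio. The toy training sequences and helper names are hypothetical.

```python
import math

ALPHABET = "ACGT"


def train_markov(seqs: list[str], pseudocount: float = 1.0) -> dict[str, dict[str, float]]:
    """Estimate first-order transition probabilities P(next base | current base)."""
    counts = {x: {y: pseudocount for y in ALPHABET} for x in ALPHABET}
    for seq in seqs:  # assumes uppercase A/C/G/T only
        for x, y in zip(seq, seq[1:]):
            counts[x][y] += 1
    return {x: {y: counts[x][y] / sum(counts[x].values()) for y in ALPHABET} for x in ALPHABET}


def log_odds(seq: str, coding: dict, noncoding: dict) -> float:
    """Positive scores favour the coding model, negative scores the non-coding one."""
    return sum(math.log(coding[x][y] / noncoding[x][y]) for x, y in zip(seq, seq[1:]))


coding_model = train_markov(["ATGATGATG"])    # toy training data
noncoding_model = train_markov(["TTTTTTTT"])  # toy training data
```

Pseudocounts keep unseen transitions from having zero probability, which would otherwise make the logarithm undefined.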

Human Genome Analysis


Accuracy is much poorer for multicellular eukaryotes, especially humans:
- Gene finding is an order of magnitude more difficult because of large and complex intron regions and alternative splicing.
- Statistical methods: Genscan, Gene
- Statistical analysis plus homology: PROCRUSTES
- mRNA sequences and homology with closely related genomes
Manual adjustment is often required as the last step.
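Alternative splicing is one reason eukaryotic gene prediction is harder: the same genomic sequence can yield several transcripts depending on which exons are kept. The toy sketch below assumes the exon coordinates are already known (which is exactly what a gene finder has to infer) and uses a hypothetical `splice` helper.

```python
def splice(genomic: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments (0-based, end-exclusive) in genomic order to form a transcript."""
    return "".join(genomic[start:end] for start, end in sorted(exons))


gene = "AAAGGGCCCTTT"
isoform_a = splice(gene, [(0, 3), (6, 9)])          # skips the middle segment
isoform_b = splice(gene, [(0, 3), (3, 6), (6, 9)])  # includes it
```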

Challenges of Human Genome Analysis


Large size of the DNA sequences to be aligned, i.e. memory and speed. Occurrence of both short and long insertions and deletions. Large-scale changes such as tandem repeats and large-scale reversals. High degree of divergence in the third position of codons.
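The memory problem can be eased because a global alignment score only needs the previous row of the dynamic-programming table. The sketch below computes the Needleman-Wunsch score in memory proportional to the shorter dimension it iterates over (the length of `b`); Hirschberg's algorithm uses this same idea as a subroutine to recover the full alignment in linear space. The scoring values (+1, -1, -1) are an illustrative assumption.

```python
def nw_score_linear(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score keeping only two rows of the DP table."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr  # the older row is discarded here
    return prev[-1]
```

Time is still proportional to the product of the sequence lengths, so speed remains a challenge even when memory is not.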

Conclusion
The working draft of the human genome provides us with all the letters of the alphabet; now it is the bioinformaticians' turn to decode the language it encodes, taking advantage of complete genomes as well as advances in IT.

Challenges for the Post-Genomic Era


Scientific challenges, algorithmic challenges, computational challenges.
GENOMES -> GENE PRODUCTS -> STRUCTURE AND FUNCTION -> PATHWAYS AND PHYSIOLOGY -> POPULATION AND EVOLUTION -> ECOSYSTEM

TOOLS FOR HUMAN GENOME ANALYSIS


Clusters of Orthologous Groups (COGs) for analysis of whole genomes, e.g. COGnitor. Tools for sequence analysis, used for efficient data mining, e.g. Vecscan, ORF Finder. Software for genetic analysis, e.g. FASTLINK.
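The core of an ORF finder can be sketched in a few lines: scan all six reading frames (three per strand) for stretches that run from an ATG start codon to a stop codon. This is a simplified illustration, not NCBI ORF Finder's actual implementation; the minimum length of 100 codons is an arbitrary assumption, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def _orfs_one_strand(seq: str, min_codons: int) -> list[tuple[int, int, int]]:
    """ATG-to-stop ORFs in the three frames of one strand; end is exclusive and includes the stop."""
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop gives the longest ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    found.append((start, i + 3, frame))
                start = None
    return found


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, int, int]]:
    """Six-frame scan; minus-strand coordinates refer to the reverse-complemented sequence."""
    seq = seq.upper()
    plus = [("+",) + orf for orf in _orfs_one_strand(seq, min_codons)]
    minus = [("-",) + orf for orf in _orfs_one_strand(reverse_complement(seq), min_codons)]
    return plus + minus
```

Usage: `find_orfs("ATGAAATAG", min_codons=1)` reports one three-codon ORF on the plus strand in frame 0.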
