What is bioinformatics?

“Bioinformatics”
• Before the era of bioinformatics, only two ways of performing biological experiments were available:
• Within a living organism (in vivo) or in an artificial environment (in vitro, from the Latin for "in glass")
• Taking the analogy further, bioinformatics is in silico biology, named for the silicon chips on which microprocessors are built
• More specifically, we can define bioinformatics as the computational branch of molecular biology

“Bioinformatics”
• Bioinformatics arose from the interaction of biology and computer science
• Uses computer databases and computer algorithms (an algorithm is a series of steps followed to solve logical and mathematical problems by computer) to analyze proteins, genes, and the complete collection of deoxyribonucleic acid (DNA) that makes up an organism (the genome)
• Used for solving biological problems
– data problems: representation (graphics), storage and retrieval (databases), analysis (e.g. statistics)
– biology problems: sequence analysis, structure or function prediction, data mining, etc.
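As a minimal sketch of what an in silico analysis can look like, the Python snippet below treats a DNA sequence as a plain string and applies two simple algorithms to it: GC content and reverse complement. The sequence and the function names (`gc_content`, `reverse_complement`) are made up for illustration and are not part of the lecture material.

```python
# Illustrative only: a toy DNA sequence, not taken from any database.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def gc_content(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A<->T, G<->C)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


if __name__ == "__main__":
    dna = "ATGGCCATTGTAATGGGCCGC"  # hypothetical example sequence
    print(f"length: {len(dna)}")
    print(f"GC content: {gc_content(dna):.2f}")
    print(f"reverse complement: {reverse_complement(dna)}")
```

Each function is "a series of steps" in the sense used above: the same operations done by hand on paper would be slow and error-prone for sequences of realistic length.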
The Commercial Market
• The current bioinformatics market is worth $300 million / year
• Prediction: increase by $2 billion / year in 5-6 years
• ~50 bioinformatics companies
But how did bioinformatics arise?
• A huge amount of molecular information was being generated
[Cartoon: Drew Sheneman, The Newark Star Ledger, New Jersey]
Pre-Computer Era:
• Molecular sequences were assembled, analysed, and compared by:
• Manually writing them on pieces of paper,
• Taping them side by side on laboratory walls,
• And/or moving them around for optimal alignment (now called pattern matching)
• You can imagine how labour-intensive this may be

Solution
• Setting up computers and algorithms that allow:
• Accessing, processing, storing, sharing, retrieving, and visualizing data, and more
• Bioinformatics got its start in the 1960s (the term itself dates to about 1970)
• Margaret Oakley Dayhoff, working on an IBM 7090 computer, created:
• The first protein database
• The first program for sequence assembly

Significance of Bioinformatics
• Molecular medicine
– identifying the genetic components of various conditions, which helps in diagnosis based on sequence or gene expression
– determining the appropriate gene therapy
• Pharmacogenomics
– developing highly targeted drugs
• Toxicogenomics
– elucidating which genes are affected by various chemicals

Focus of Bioinformatics

Approaches to Bioinformatics
• The first perspective or approach to bioinformatics is the cell:
• Molecular biology focuses mainly on individual genes, messenger RNA (mRNA) transcripts, and proteins
• But: the focus of bioinformatics is the complete collection of DNA (the genome), RNA (the transcriptome), and protein sequences (the proteome) that have been collected
• genome = the complete set of genes present in a cell or organism; transcriptome = the complete set of messenger RNAs expressed in a single cell or set of cells; proteome = the complete set of proteins that is or can be expressed in a cell, tissue, or organism
• Bioinformatics can help answer biological questions that can be approached from levels ranging from:
• Cellular phenotype: the distinct components of the multiple cellular processes, involving gene and protein expression, that together give a cell its particular morphology and function

Approaches to Bioinformatics
• The second perspective focuses on individual organisms:
• Here bioinformatics tools can be applied to describe changes through developmental time, changes across body regions, and changes in a variety of physiological or pathological states
• For example, gene expression varies in disease states or in response to a variety of signals, both internal and environmental
• Expressed genes and proteins are derived from different tissues and conditions
• DNA microarrays measure the expression of thousands of genes in biological samples

Second approach to Bioinformatics

Approaches to Bioinformatics
• The third perspective focuses on multiple species (the tree of life):
• Here the focus is on the variation that occurs between species and among members of a species, from which we can deduce the evolutionary history of life on Earth
• Uses comparative genomics, in which genomes are compared

Third approach to Bioinformatics

Databases

Molecular Databases
• Publicly available databanks now contain billions of nucleotides of DNA sequence data collected from thousands of different organisms
• > 300 other publicly available databases pertaining to molecular biology
• GenBank: > 61 million sequence entries, > 65 billion bases
• UniProtKB / Swiss-Prot: > 277 thousand protein sequence entries, > 100 million amino acids
• Protein Data Bank: 45,632 protein (and related) structures
• * all numbers current as of about 9/07

Thursday: Chapter 2
• Access to sequence data and literature information
• Read before class
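Returning to the Molecular Databases figures, a rough back-of-the-envelope calculation gives a feel for the average length of an entry. This is only a sketch: the counts on the slide are lower bounds (">"), so the results are approximate, and the dictionary layout and the `mean_length` helper are assumptions made for illustration.

```python
# Approximate figures from the slide (about 9/07); all are lower bounds.
DATABASES = {
    "GenBank": {"entries": 61_000_000, "residues": 65_000_000_000},
    "UniProtKB/Swiss-Prot": {"entries": 277_000, "residues": 100_000_000},
}


def mean_length(entries: int, residues: int) -> float:
    """Average number of residues (bases or amino acids) per entry."""
    return residues / entries


if __name__ == "__main__":
    for name, stats in DATABASES.items():
        avg = mean_length(stats["entries"], stats["residues"])
        print(f"{name}: roughly {avg:.0f} residues per entry")
```

With these inputs the averages come out near 1,000 bases per GenBank entry and a few hundred amino acids per Swiss-Prot entry.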