





Learning Outcomes

After reading this, students should be able to:

Understand and apply the basic principles that underpin computer science.

Demonstrate a range of practical software development and other computing skills in accord with best modern engineering practice.

Pursue deeper study in specialist subjects and have an acquaintance with current research interests.

Demonstrate analytical and problem-solving skills, enabling them to apply their knowledge in a wide variety of situations.

Show self-motivation and the ability to work independently as well as being effective team members.

Demonstrate a range of transferable skills including communication, self-organization, and basic mathematical and logical reasoning.

Appreciate the wider context of computer science in society, in industry, and in academia.

To begin my introduction to this chapter, I would like to ask you to erase everything that you have learned and heard about Bioinformatics. Although you may think this is not a good way of going about it, it will overall be helpful in attaining my goal. You may think this will make it harder for me to teach, or that I am attempting to pollute your mind with my views, and with false information. However, I give you my word that I will do my best to do none of that. My purpose is to educate you. I want to give you everything you need to make an educated decision, based on the facts, and from the point of view of everyone and everything. I will do this by giving you scientific information about each and every topic, views of scientists, any relevant comments from religious perspectives, and opinions from the general population. This way, all your thoughts and questions on any topic will be clearly answered.

“Begin at the beginning and go on till you come to the end, then stop.” This, of course, is a good strategy to stick to if you know where to begin. If you are a complete novice in a field, the best learning method, in my view, is to ask the hows, whys, whats, wheres and whos (the five W’s) of the topic.

Well ……

Let us begin with

What is Bioinformatics?

To understand the basic idea of bioinformatics, one might think of a written language. The text you are reading consists of a series of letters, words, sentences, and paragraphs. If you did not know the meanings of the words and the rules of the language, this page would just be a collection of meaningless symbols. Similarly, the first time scientists saw gene and protein sequences, they saw a string of symbols with no clear meaning in terms of biological function. But now, bioinformatics is showing us many things about what sequences mean. Using bioinformatics, sequences are being used to reveal relationships among different life forms that we could not find out any other way. Bioinformatics is revealing the rules and meaning of a language that is new to human beings but in fact is billions of years old: the “Language of Life.”

Bioinformatics is an important part of modern biology because it gives scientists useful, powerful ways to look at their data.

It’s one thing to have several DNA sequences from different organisms written down on a piece of paper, but it’s quite another to have those sequences available in computer databases and to be able to use computers to compare how similar those sequences are, investigate what functions the DNA sequences might have, etc. Another important point is that the number of available DNA sequences is growing exponentially, so bioinformatics work is becoming

There are several definitions of bioinformatics; here I give my own.

Bioinformatics is the synergy of biology and informatics, which means:

Bioinformatics > Bio + Informatics

Biology: the science that studies living organisms.

Informatics: in short, the application of computer science and information science to the management and processing of data, information, and knowledge.

In biology we generally study living organisms in vivo (in the body of a living organism, as in many laboratory experiments) or in vitro (outside a living organism; literally “in glass,” i.e., in a test tube or in the laboratory), and both approaches consume a great deal of time. We are all aware of how information technology can reduce the time needed in other applications, for example e-business. So the question is: why can we not computerize our biological techniques and gain the same advantage? The answer is that we can, by working in silico (a process that is completed entirely by the use of a computer).

As you know, biological data is massive: nucleic acid (DNA/RNA) and protein sequences, structures, functions, pathways and interactions, all of which have to be stored, retrieved and analyzed properly. Bioinformatics has evolved into a full-fledged multidisciplinary subject that integrates developments in information and computer technology as applied to biotechnology and the biological sciences. Bioinformatics uses computer software tools for database creation, data management, data warehousing, data mining and global communication networking. Bioinformatics is the recording, annotation, storage, analysis, and searching/retrieval of nucleic acid sequence (genes and RNAs), protein sequence and structural information. This includes databases of the sequences and structural information as well as methods to access, search, visualize and retrieve the information.

Functional genomics, biomolecular structure, proteome analysis, cell metabolism, biodiversity, downstream processing in chemical engineering, and drug and vaccine design are some of the areas in which bioinformatics is an integral component.

Bioinformatics concerns the creation and maintenance of databases of biological information whereby researchers can both access existing information and submit new entries.

The most pressing tasks in bioinformatics involve the analysis of sequence information. Computational biology is the name given to this process, and it involves the following:

Finding the genes and coding sequences, promoter regions, repeat regions, etc. in the DNA sequences of various organisms.

Developing methods to predict the structure and/or function of newly discovered proteins and structural RNA sequences.

Clustering protein sequences into families of related sequences and the development of protein models.

Aligning similar proteins and generating phylogenetic trees to examine evolutionary relationships.
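
To make the first task in the list above more concrete, here is a minimal sketch in Python of scanning a DNA string for open reading frames (ORFs): stretches that begin with the start codon ATG and run to the first in-frame stop codon. The function name `find_orfs`, the `min_codons` parameter and the example sequence are assumptions made purely for illustration. Real gene finders also search the reverse strand and rely on statistical models, which this sketch does not attempt.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for each ORF on the forward strand.

    start and end are 0-based, end-exclusive positions in `dna`.
    Each reported ORF includes its start and stop codons.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):  # the three forward reading frames
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] != START_CODON:
                i += 3
                continue
            # Walk codon by codon until the first in-frame stop codon.
            for j in range(i + 3, len(dna) - 2, 3):
                if dna[j:j + 3] in STOP_CODONS:
                    end = j + 3
                    if (end - i) // 3 >= min_codons:
                        orfs.append((i, end, dna[i:end]))
                    i = end  # resume scanning after this ORF
                    break
            else:
                i += 3  # no stop codon found in this frame
    return orfs


if __name__ == "__main__":
    for start, end, seq in find_orfs("CCATGGCCTAAGG"):
        print(start, end, seq)
```

Calling `find_orfs` on a longer sequence returns every forward-strand ORF it finds; raising `min_codons` filters out very short hits.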

Sub-disciplines within Bioinformatics

There are three important sub-disciplines within bioinformatics involving computational biology:

The development of new algorithms and statistics with which to assess relationships among members of large data sets;

The analysis and interpretation of various types of data including nucleotide and amino acid sequences, protein domains, and protein structures; and

The development and implementation of tools that enable efficient access and management of different types of information.

Activities in Bioinformatics

We can split the activities in bioinformatics into two areas:

The organization of biological data

The analysis of biological data

Organization

The creation of databases of biological information

The maintenance of these databases

Today, we are sequencing tens of millions of bases a year and undertaking to sequence whole organism genomes. The growth of the sequence databases is an unbroken exponential.

The size of the EMBL nucleotide database as of release 42 (March 1995) is a staggering 262,000,000 bases.

The most important databases today in biology are probably the protein sequence databases.

Analysis

The most pressing tasks in bioinformatics involve the analysis of sequence information. Computational biology is the name given to this process, and it involves the following:

Development of methods to predict the structure and/or function of newly discovered proteins and structural RNA sequences.

Clustering protein sequences into families of related sequences and the development of protein models.

Aligning similar proteins and generating phylogenetic trees to examine evolutionary relationships.
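
As a small illustration of what comparing aligned sequences means in practice, the sketch below computes the percent identity of two sequences that have already been aligned to the same length, with "-" marking gaps. The function name `percent_identity` and the example sequences are assumptions for illustration only. Producing the alignment itself, and building trees from many such comparisons, requires dedicated algorithms that are not shown here.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns in which both sequences carry the same residue.

    Both inputs must already be aligned (equal length, '-' for gaps).
    Columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # nothing to compare in this column
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


if __name__ == "__main__":
    print(percent_identity("ACGT", "ACGA"))
    print(percent_identity("AC-T", "ACGT"))
```

A column with a gap in only one sequence counts as a mismatch in this sketch; other conventions exist, so the choice should be stated whenever identity values are reported.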

I hope you now have some idea of what bioinformatics is. I will now try to define it, although that is difficult because bioinformatics is a multidisciplinary concept. Is there only one definition of bioinformatics? Absolutely not. Bioinformatics is a bright new field, as the lack of a standard definition for the word shows. I give you a few definitions; all certainly have a high degree of validity.

Definition of Bioinformatics

Roughly, bioinformatics describes any use of computers to handle biological information. In practice the definition used by most people is narrower; bioinformatics to them is a synonym for “computational molecular biology”: the use of computers to characterize the molecular components of living things.

The Appropriate Definition

“Classical” Bioinformatics

“The mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information.”

“The Loose” Definition

There are other fields-for example medical imaging / image analysis which might be considered part of bioinformatics. There is also a whole other discipline of biologically inspired computation; genetic algorithms, Artificial Intelligence, neural networks. Often these areas interact in strange ways. Neural networks, inspired by crude models of the functioning of nerve cells in the brain, are used in a program called PHD to predict, surprisingly accurately, the secondary structures of proteins from their primary sequences. What almost all bioinformatics has in common is the processing of large amounts of biologically derived information, whether DNA sequences or breast X-rays.

“We should not think all biological computing is bioinformatics, e.g. mathematical modeling is not bioinformatics, even when connected with biology-related problems. In my opinion, bioinformatics has to do with management and the subsequent use of biological information, particular genetic information.”

Even though the three terms bioinformatics, computational biology and bioinformation infrastructure are often used interchangeably, broadly the three may be defined as follows:



1. Bioinformatics: refers to database-like activities, involving persistent sets of data that are maintained in a consistent state over essentially indefinite periods of time;

2. Computational biology: encompasses the use of algorithmic tools to facilitate biological analyses; while

3. Bioinformation infrastructure: comprises the entire collective of information management systems, analysis tools and communication networks supporting biology. Thus, the latter may be viewed as a computational scaffold of the former two.

We can define bioinformatics as the study of information content and information flow in biological systems and processes. It has evolved to serve as the bridge between observations (data) in diverse biologically related disciplines and the derivations of understanding (information) about how the systems or processes function, and subsequently the application (knowledge). A more pragmatic definition in the case of diseases is the understanding of dysfunction (diagnostics) and the subsequent applications of the knowledge for therapeutics and prognosis.

Definitions of Bioinformatics

It is the application of computer technology to the management and analysis of biological data. The result is that computers are being used to gather, store, analyze and merge biological data.

The application of computer technology to the management of biological information. Specifically, it is the science of developing computer databases and algorithms to facilitate and expedite biological research, particularly in genomics.

The study of the application of computer and statistical techniques to the management of biological information. In genome projects, bioinformatics includes the development of methods to search databases quickly, to analyze DNA sequence information, and to predict protein sequence and structure from DNA sequence data.

The analysis of biological information using computers and statistical techniques; the science of developing and utilizing computer databases and algorithms to accelerate and enhance biological research.

The science of informatics as applied to biological research. Informatics is the management and analysis of data using advanced computing techniques.

(Computational biology.) This term has no clear definition. It involves the analysis and interpretation of data and the development of algorithms and statistics. The term was coined to encompass computer applications in biological sciences but is now used to mean rather different things, from artificial intelligence and robotics to genome analysis. The term was originally applied to the computational manipulation and analysis of biological sequence data (DNA and/or protein), but now tends also to be used to embrace the manipulation and analysis of 3D structural data.

The use of computers to handle biological information. The term is often used to describe computational molecular biology: the use of computers to store, search and characterize the genetic code of genes, the proteins linked to each gene and their associated functions.

The application of computational techniques to the management and analysis of biological information.

The science that uses advanced computing techniques for management and analysis of biological data. Bioinformatics is particularly important as an adjunct to genomic research, which generates a large amount of complex data, involving billions of individual DNA building blocks, and tens of thousands of genes. (SNP consortium).

The science of managing and analyzing biological data using advanced computing techniques. Especially important in analyzing genomic research data.

The use of computers in solving information problems in the life sciences. It mainly involves the creation of extensive electronic databases on genomes, protein sequences etc. Also involves techniques such as three-dimensional modeling of biomolecules and biological systems.

Computational or algorithmic approaches to the analysis and integration of genomic, proteomic, or chemical data residing in databases. Bioinformatics includes applications for the analysis of DNA and protein sequence patterns and similarities, tools for t

An interdisciplinary area at the intersection of biological, computer, and information sciences necessary to manage, process, and understand large amounts of data, for instance from the sequencing of the human genome, or from large databases containing information about plants and animals for use in discovering and developing new drugs.

A scientific discipline that comprises all aspects of the gathering, storing, handling, analyzing, interpreting and spreading of biological information. Involves powerful computers and innovative programs, which handle vast amounts of coding information on genes and proteins from genomics programmes. Comprises the development and application of computational algorithms for the purpose of analysis, interpretation, and prediction of data for the design of experiments in the biosciences. [cub]

An emerging science that applies computer and database technology to biological data. Used extensively in genomics, proteomics (the study of proteins and their interactions), and combinatorial chemistry to track ever-growing amounts of information.

The discipline of obtaining information about genomic or protein sequence data. This may involve similarity searches of databases, comparing your unidentified sequence to the sequences in a database, or making predictions about the sequence based on current knowledge of similar sequences. Databases are frequently made publicly available through the internet, or locally at your institution.

The acquisition, storage, arrangement, analysis, display and communication of information related to the biology of living things, generally assisted by the use of computers.

The science of informatics as applied to biological research. Informatics is the management and analysis of data using advanced computing techniques. Bioinformatics is particularly important as an adjunct to genomics research, because of the large amount of complex data this research generates.

The discipline of using computers to address information problems in the life sciences; it involves the creation of electronic databases on genomes, protein sequences, etc.

The science dealing with the classification, storage, retrieval, and analysis of genomic information.

The field of study that relates to the collection, organization and analysis of large amounts of biological data using networks of computers and databases (usually with reference to the genome project and DNA, RNA or protein sequence or structure information).

The collection, organization and analysis of large amounts of biological data, using networks of computers and databases.

Information about human and other animal genes and related biological structures and processes

The management and analysis of data from biological research.

The use of IT to acquire, store, manage and analyze any type of biological data. Today’s accelerated progress in genetic research is possible, in part, because of this combination of biology, powerful algorithm tools and immense

The application of computer technology to biology; a combination of techniques and models in statistical, computational, and life sciences to understand the significance of biological data.

Bioinformatics is an interdisciplinary research area that is the interface between the biological and computational sciences. The ultimate goal of bioinformatics is to uncover the wealth of biological information hidden in the mass of data and obtain a clearer insight into the fundamental biology of organisms. This new knowledge could have profound impacts on fields as varied as human health, agriculture, the environment, energy and biotechnology.

Well, students, I hope you have a good idea of the field after seeing so many definitions. There are two roles in bioinformatics that you can take up: bioinformaticist or bioinformatician. I will now give you a clear distinction between the two, and it is up to you to choose where you want to shine.

A bioinformaticist is an expert who not only knows how to use bioinformatics tools, but also knows how to write interfaces for effective use of the tools.

A bioinformatician, on the other hand, is a trained individual who only knows how to use bioinformatics tools, without a deeper understanding.

Thus, a bioinformaticist is to *.omics as a mechanical engineer is to an automobile. A bioinformatician is to *.omics as a technician is to an automobile.

Now we will see why bioinformatics is so important and what needs it serves.

Why is Bioinformatics Important?

The Need for Bioinformatics:

Whole Genome Analyses and Sequences

Experimental Analyses involving Thousands of Genes simultaneously

DNA Chips and Array Analyses


Expression Arrays

Comparative Analyses between Species and Strains

Proteomics: ‘Proteome’ of an Organism

2D gels, Mass Spec

Medical applications: Genetic Disease


Pharmaceutical and Biotech Industry

Forensic applications

Agricultural applications

The greatest challenge facing the molecular biology community today is to make sense of the wealth of data that has been produced by the genome sequencing projects. Traditionally, molecular biology research was carried out entirely at the experimental laboratory bench, but the huge increase in the scale of data being produced in this genomic era has created a need to incorporate computers into the research process. Sequence generation, and its subsequent storage, interpretation and analysis, are entirely computer-dependent tasks. However, the molecular biology of an organism is a very complex issue, with research being carried out at different levels including the genome, proteome, transcriptome and metabolome levels. Following on from the explosion in the volume of genomic data, similar increases in data have been observed in the fields of proteomics, transcriptomics and metabolomics. The first challenge facing the bioinformatics community today is the intelligent and efficient storage of this mass of data. It is then their responsibility to provide easy and reliable access to this data. The data itself is meaningless before analysis, and the sheer volume present makes it impossible for even a trained biologist to begin to interpret it manually. Therefore, incisive computer tools must be developed to allow the extraction of meaningful biological information.

There are three central biological processes around which bioinformatics tools must be developed:

DNA sequence determines protein sequence

Protein sequence determines protein structure

Protein structure determines protein function

The integration of information learned about these key biological processes should allow us to achieve the long-term goal of the complete understanding of the biology of organisms.
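
The first of these three processes, DNA sequence determining protein sequence, is simple enough to sketch in a few lines of Python. The example below translates a coding DNA string into one-letter amino acid codes using the standard genetic code. The names `CODON_TABLE` and `translate` are assumptions chosen for illustration, and the sketch assumes the input is the coding strand starting in frame. Predicting structure from sequence, and function from structure, is far harder and is not attempted here.

```python
# Standard genetic code, listed with bases in the order T, C, A, G
# at each codon position; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate coding-strand DNA, in frame, stopping at the first stop codon.

    Codons containing letters other than A, C, G, T become 'X'.
    Trailing bases that do not fill a whole codon are ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))
```

Building the table from two short strings keeps the sketch compact; a production tool would normally take the table from a maintained library so that alternative genetic codes can be selected.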

Now we will look at a few challenges of bioinformatics, which show how important the field is.

Challenges of Bioinformatics

Precise, predictive model of transcription initiation and termination: ability to predict where and when transcription will occur in a genome



Precise, predictive model of RNA splicing/alternative splicing: ability to predict the splicing pattern of any primary transcript in any tissue

Precise, quantitative models of signal transduction pathways: ability to predict cellular responses to external stimuli

Determining effective protein:DNA, protein:RNA and protein:protein recognition codes

Accurate ab initio protein structure prediction

Rational design of small molecule inhibitors of proteins

Mechanistic understanding of protein evolution: understanding exactly how new protein functions evolve

Mechanistic understanding of speciation: molecular details of how speciation occurs

Continued development of effective gene ontologies - systematic ways to describe the functions of any gene or protein

Education: development of appropriate bioinformatics curricula for secondary, undergraduate and graduate education. This challenge is especially important in India: bioinformatics is a mix of biology and informatics, and as of now we do not have a full-fledged course that covers both biology and information technology. We need to introduce such courses from the school level, because bioinformatics requires a person who is strong in both biology and computer science.

History of Bioinformatics

Bioinformatics encompasses the use of tools and techniques from three separate disciplines: molecular biology (the source of the data to be analyzed), computer science (which supplies the hardware for running analyses and the networks to communicate the results), and the data analysis algorithms, which strictly define bioinformatics. For this reason, the editors have decided to incorporate events from these areas into a brief history of the field.

1933: A new technique, electrophoresis, is introduced by Tiselius for separating proteins in solution.

1951: Pauling and Corey propose the structure for the alpha helix and beta-sheet (Proc. Natl. Acad. Sci. USA, 37: 205-211, 1951; Proc. Natl. Acad. Sci. USA, 37: 729-740, 1951).

1953: Watson and Crick propose the double helix model for DNA based on x-ray data obtained by Franklin and Wilkins (Nature, 171: 737-738, 1953).

1954: Perutz’s group develops heavy atom methods to solve the phase problem in protein crystallography.

1955: F. Sanger announces the sequence of the first protein to be analyzed, bovine insulin.

1958: The first integrated circuit is constructed by Jack Kilby at Texas Instruments. The Advanced Research Projects Agency (ARPA) is formed in the US.

1968: Packet-switching network protocols are presented to ARPA.

1969: The ARPANET is created by linking computers at Stanford, UCSB, The University of Utah and UCLA.

1970: The details of the Needleman-Wunsch algorithm for sequence comparison are published

1971: Ray Tomlinson (BBN) invents the email program

1972: The first recombinant DNA molecule is created by Paul Berg and his group

1973: The Brookhaven Protein Data Bank is announced (Acta. Cryst. B, 1973, 29: 1746).

Robert Metcalfe receives his Ph.D. from Harvard University. His thesis describes Ethernet

1974: Vint Cerf and Robert Kahn develop the concept of connecting networks of computers into an “internet” and develop the Transmission Control Protocol (TCP).

Charles Goldfarb invents SGML (Standard Generalized Markup Language).

1975: Microsoft Corporation is founded by Bill Gates and Paul Allen.

Two-dimensional electrophoresis, where separation of proteins on SDS polyacrylamide gel is combined with separation according to isoelectric points, is announced by P. H. O’Farrell (J. Biol. Chem., 250: 4007-4021, 1975).

E. M. Southern published the experimental details for the Southern Blot technique of specific sequences of DNA (J. Mol. Biol., 98: 503-517, 1975).

1976: The Unix-To-Unix Copy Protocol (UUCP) is developed at Bell Labs.

1977: The full description of the Brookhaven PDB (http://www.pdb.bnl.gov) is published (Bernstein, F.C.; Koetzle, T.F.; Williams, G.J.B.; Meyer, E.F.; Brice, M.D.; Rodgers, J.R.; Kennard, O.; Shimanouchi, T.; Tasumi, M.J.; J. Mol. Biol., 1977, 112: 535).

Allan Maxam and Walter Gilbert (Harvard) and Frederick Sanger (U.K. Medical Research Council), report methods for sequencing DNA.

1978: The first Usenet connection is established between Duke and the University of North Carolina at Chapel Hill by Tom Truscott, Jim Ellis and Steve Bellovin.

1980: The first complete genome sequence for an organism (ΦX174) is published. The genome consists of 5,386 base pairs which code for nine proteins.

Wüthrich et al. publish a paper detailing the use of multidimensional NMR for protein structure determination (Kumar, A.; Ernst, R.R.; Wüthrich, K.; Biochem. Biophys. Res. Comm., 1980, 95: 1).

IntelliGenetics, Inc. founded in California. Their primary product is the IntelliGenetics Suite of programs for DNA and protein sequence analysis.

1981: The Smith-Waterman algorithm for sequence alignment is published.

IBM introduces its Personal Computer to the market.

1982: Genetics Computer Group (GCG) is created as a part of the University of Wisconsin Biotechnology Center. The company’s primary product is The Wisconsin Suite of molecular biology tools.



1983: The Compact Disk (CD) is launched.

Name servers are developed at the University of Wisconsin

1984: Jon Postel’s Domain Name System (DNS) is placed online.

The Macintosh is announced by Apple Computer.

1985: The FASTP algorithm is published.

The PCR reaction is described by Kary Mullis and co-workers.

1986: The term “Genomics” appears for the first time to describe the scientific discipline of mapping, sequencing, and analyzing genes. The term was coined by Thomas Roderick as a name for the new journal.

Amoco Technology Corporation acquires IntelliGenetics.

NSFnet debuts.

The SWISS-PROT database is created by the Department of Medical Biochemistry of the University of Geneva and the European Molecular Biology Laboratory (EMBL).

1987: The use of yeast artificial chromosomes (YAC) is described (David T. Burke, et al., Science, 236: 806-812).

The physical map of E. coli is published (Y. Kohara, et al., Cell 51: 319-337).

Perl (Practical Extraction and Report Language) is released by Larry Wall.

1988: The National Center for Biotechnology Information (NCBI) is established at the National Library of Medicine.

The Human Genome Initiative is started (Commission on Life Sciences, National Research Council. Mapping and Sequencing the Human Genome, National Academy Press: Washington, D.C.).

The FASTA algorithm for sequence comparison is published by Pearson and Lipman.

A new program, an Internet computer virus designed by a student, infects 6,000 military computers in the US.

1989: The Genetics Computer Group (GCG) becomes a private company.

Oxford Molecular Group, Ltd. (OMG) founded in Oxford, UK by Anthony Marchington, David Ricketts, James Hiddleston, Anthony Rees, and W. Graham Richards. Primary products: Anaconda, Asp, Cameleon and others (molecular modeling, drug design, protein design).

1990: The BLAST program (Altschul, et al.) is implemented.

Molecular Applications Group is founded in California by Michael Levitt and Chris Lee. Their primary products are Look and SegMod which are used for molecular modeling and protein design.

InforMax is founded in Bethesda, MD. The company’s products address sequence analysis, database and data management, searching, publication graphics, clone construction, mapping and primer design.

The HTTP 1.0 specification is published. Tim Berners-Lee publishes the first HTML document.

1991: The research institute in Geneva (CERN) announces the creation of the protocols which make-up the World Wide Web.

Linus Torvalds announces a Unix-Like operating system which later becomes Linux.

The creation and use of expressed sequence tags (ESTs) is described (J. Craig Venter, et al., Science, 252: 1651-1656).

Incyte Pharmaceuticals, a genomics company headquartered in Palo Alto California, is formed.

Myriad Genetics, Inc. is founded in Utah. The company’s goal is to lead in the discovery of major common human disease genes and their related pathways. The Company has discovered and sequenced, with its academic collaborators, the following major genes: BRCA1, BRCA2, CHD1, MMAC1, MMSC1, MMSC2, CtIP, p16, p19, and MTS2.

1992: Human Genome Sciences, Gaithersburg Maryland, is formed by William Haseltine.

The Institute for Genomic Research (TIGR) is established by Craig Venter.

Genome Therapeutics announces its incorporation.

Mel Simon and coworkers announce the use of BACs for cloning.

1993: CuraGen Corporation is formed in New Haven, CT.

Affymetrix begins independent operations in Santa Clara, California

Compugen begins operations in Israel.

InterNIC is created by the National Science Foundation.

1994: Netscape Communications Corporation is founded and releases Navigator, the commercial version of NCSA’s Mosaic.

Gene Logic is formed in Maryland.

The PRINTS database of protein motifs is published by Attwood and Beck.

Oxford Molecular Group acquires IntelliGenetics.

1995: Microsoft releases version 1.0 of Internet Explorer.

Sun releases version 1.0 of Java. Sun and Netscape release version 1.0 of JavaScript

Version 1.0 of Apache is released.

The Haemophilus influenzae genome (1.8 Mb) is sequenced.

The Mycoplasma genitalium genome is sequenced.

1996: The working draft for XML is released by W3C.

Oxford Molecular Group acquires the MacVector product from Eastman Kodak.

The genome for Saccharomyces cerevisiae (baker’s yeast, 12.1 Mb) is sequenced.

The Prosite database is reported by Bairoch, et al.

Affymetrix produces the first commercial DNA chips.

Structural Bioinformatics, Inc. founded in San Diego, CA

1997: The genome for E. coli (4.7 Mbp) is published.

Oxford Molecular Group acquires the Genetics Computer Group.

LION bioscience AG founded as an integrated genomics company with strong focus on bioinformatics. The company is built from IP out of the European Molecular Biology Laboratory (EMBL), the European Bioinformatics Institute (EBI), the German Cancer Research Center (DKFZ), and the University of Heidelberg.

Paradigm Genetics Inc., a company focussed on the application of genomic technologies to enhance worldwide food and fiber production, is founded in Research Triangle Park, NC.

deCode genetics publishes a paper that described the location of the FET1 gene, which is responsible for familial essential tremor, on chromosome 13 (Nature Genetics

1998: The genomes for Caenorhabditis elegans and baker’s yeast are published.

The Swiss Institute of Bioinformatics is established as a non-profit foundation.

Craig Venter forms Celera in Rockville, Maryland.

PE Informatics was formed as a Center of Excellence within PE Biosystems. This center brings together and leverages the complementary expertise of PE Nelson and Molecular Informatics, to further complement the genetic instrumentation expertise of Applied Biosystems.

Inpharmatica, a new Genomics and Bioinformatics company, is established by University College London, the Wolfson Institute for Biomedical Research, five leading scientists from major British academic centers and Unibio Limited.

GeneFormatics, a company dedicated to the analysis and prediction of protein structure and function, is formed in San Diego.

Molecular Simulations Inc. is acquired by Pharmacopeia

1999: deCode genetics maps the gene linked to pre-eclampsia as a locus on chromosome 2p13.

2000: The genome for Pseudomonas aeruginosa (6.3 Mbp) is published.

The A. thaliana genome (100 Mb) is sequenced.

The D. melanogaster genome (180Mb) is sequenced.

Pharmacopeia acquires Oxford Molecular Group.

2001: The human genome (3,000 Mbp) is published.

2002: Structural Bioinformatics and GeneFormatics merge.

2004: The draft genome sequence of the Brown Norway laboratory rat, Rattus norvegicus, is completed by the Rat Genome Sequencing Project Consortium. The paper appears in the April 1 edition of Nature.

Scope of Bioinformatics

Bioinformatics has evolved into a full-fledged scientific discipline over the last decade. The definition of Bioinformatics is not restricted to computational molecular biology and computational structural biology. Bioinformatics uses advances in computer science, information science, computer and information technology, and communication technology to solve complex problems in the life sciences, such as comparative genomics, structural genomics, transcriptomics, proteomics, cellomics and metabolic pathway engineering, and particularly in biotechnology. Developments in these fields have direct implications for healthcare, medicine, the discovery of next-generation drugs, the development of agricultural products, renewable energy, environmental protection and so on.

Data capture, data warehousing and data mining have become major issues for biotechnologists and biological scientists due to the sudden growth in quantitative data in biology, such as complete genomes of biological species including the human genome, protein sequences, protein 3-D structures, metabolic pathway databases, cell line and hybridoma information, and biodiversity-related information. Advancements in information technology, particularly the Internet, are being used to gather and access the ever-increasing information in biology and biotechnology. Functional genomics, proteomics, discovery of new drugs and vaccines, molecular diagnostic kits and pharmacogenomics are some of the areas in which bioinformatics has become an integral part of research and development. Knowledge of multimedia databases, and of tools for data analysis and for modeling molecules and biological systems on computer workstations as well as in a network environment, has become essential for any student of Bioinformatics.

Bioinformatics, a multidisciplinary area, has grown so much that it is now divided into molecular bioinformatics, organal bioinformatics and species bioinformatics. Issues related to biodiversity and the environment, the cloning of higher animals such as Dolly and Polly, and tissue culture and cloning of plants have shown that Bioinformatics is not only a support branch of science but also a subject that directs the future course of research in biotechnology and the life sciences. The importance and usefulness of Bioinformatics has been realized in the last few years by many industries. Therefore, large Bioinformatics R&D divisions are being established in many pharmaceutical companies, biotechnology companies and even in other conventional industries dealing with biologicals. Bioinformatics is thus rated as the number one career in the field of biosciences.

The need for trained manpower in this area is sharply on the rise, but there are very few institutions in the world where such training is provided. In short, Bioinformatics deals with database creation, data analysis and modeling. Data capture is done not only from printed material but also from network resources. Databases in biology are generally in multimedia form, organized in a relational database model. Modeling is done not only on single biological molecules but also on multiple systems, thus requiring the use of high-performance computing systems.

Potential of Bioinformatics

The potential of bioinformatics in the identification of useful genes leading to the development of new gene products, drug discovery and drug development has led to a paradigm shift in biology and biotechnology: these fields are becoming more and more computationally intensive. The new paradigm now emerging is that all the genes will be known “in the sense of being resident in database available electronically”, and the starting point of a biological investigation will be theoretical: a scientist will begin with a theoretical conjecture and only then turn to experiment to follow or test the hypothesis. With a much deeper understanding of biological processes at the molecular level, Bioinformatics scientists have developed new techniques to analyze genes on an industrial scale, resulting in a new area of science known as ‘Genomics’.

The shift from gene biology to genomics has resulted in the development of strategies, from lab techniques to computer programs, to analyze whole batches of genes at once. Genomics is revolutionizing drug development, gene therapy, and our entire approach to health care and human medicine.

Genomic discoveries are being translated into practical biomedical results through Bioinformatics applications. Work on proteomics and genomics will continue using highly sophisticated software tools and data networks that can carry multimedia databases. Thus, research will focus on the development of multimedia databases in various areas of life sciences and biotechnology. There will be an urgent need for the development of software tools for data mining, analysis and modeling, and downstream processing. Security of data, data transfer and data compression, and automatic checks on data accuracy and correctness will also be major research areas of bioinformatics. The use of virtual reality in drug design, metabolic pathway design and unicellular organism design, paving the way to the design and modification of unicellular organisms, will be among the challenges that Bioinformatics scientists and specialists have to tackle. It has now been universally recognized that Bioinformatics is the key to the new, grand, data-intensive molecular biology that will take us into the 21st century.

Some questions will definitely arise in your mind, and I should clarify them here and now. I will try to anticipate those questions and give you answers.

How does bioinformatics help biologists?

Biology in the 21st century is being transformed from a purely lab-based science into an information science as well. Major advances in molecular biology over the past few years, including the ever-growing body of genomic data, have led to a large amount of biological information that is difficult for the scientific community to decipher. Bioinformatics is all about retrieving, organizing, analyzing and storing data, which requires the help of computational methods. It provides easy access to information and offers methods for extracting only the information specifically requested by biologists. Therefore, the field of bioinformatics has evolved such that its most pressing task is the analysis and interpretation of various types of data, including nucleotide and amino acid sequences, protein domains, and protein structures. This ultimately helps biologists obtain a comprehensive picture of cellular activities and base their research on how these are altered under various conditions.

Does bioinformatics pertain only to data mining?

No, it does not. It involves a lot more intensive research and analysis of the huge data that is unmanageable otherwise. A bioinformatician is usually involved in areas such as biological tool development – using neural networks, genetic algorithms; comparative genomics, functional genomics, structural genomics, database development and management, integration of various fields of life sciences to develop systems biology tools, phylogenetic analysis, proteomic studies, and the like.

What are the areas of research in bioinformatics?

Research in bioinformatics includes the development and implementation of tools that enable efficient access to various types of information, which should be usable as well as manageable. Another area of research is the development of algorithms for predicting different kinds of biological features, such as genes, protein functions, protein structures and domains, and for assessing the relationships among large data sets. It is now very common for a scientist to conduct vast numbers of database searches to formulate hypotheses and to design large-scale experiments. The areas of genomics and proteomics have come a long way thanks to inputs from bioinformatics analysis.

I close this lesson by giving a few definitions of the terminology that we came across.

Language of Science

Genome All the genetic material in the chromosomes of a particular organism; its size is generally given as the total number of base pairs. One complete set of genes in an organism (a haploid set). Except for occasional unrepaired damage to its DNA (= mutations), the genome is fixed.

Proteome All of the proteins produced by a given species, just as the genome is the totality of the genetic information possessed by that species. The complete profile of proteins expressed in a given tissue, cell or biological system at a given time.

Two Popular Definitions

All the proteins that can be synthesized by the cell. (The original definition.)

All the proteins synthesized by a particular cell at a particular time.

Transcriptome The entire set of messenger RNA expressed while building, running and maintaining an organism. The transcriptome is all the mRNA transcribed from genes within a given genome.

Metabolome All the metabolic machinery, e.g., small metabolites (like the intermediates in glycolysis and cellular respiration) and nucleotides, present in a cell at a given time. Varies with the differentiated state of the cell and its current activities.

Genomics The comprehensive study of whole sets of genes and their interactions rather than single genes or proteins.

Proteomics The study of the full set of proteins encoded by a genome

Transcriptomics The generation and studies of complete mRNA expression profiles




Is the computing of emergent properties of biological systems, such as development and biological clocks, and the inference of kinetic models of DNA, RNA, and proteins.

Data Mining An information extraction activity whose goal is to discover hidden facts contained in databases. Using a combination of machine learning, statistical analysis, modeling techniques and database technology, data mining finds patterns and subtle relationships in data and infers rules that allow the prediction of future results. Typical applications include market segmentation, customer profiling, fraud detection, evaluation of retail promotions, and credit risk analysis.

Phylogeny Evolutionary relationships within and between taxonomic levels, particularly the patterns of lines of descent.

Phylogenetics The taxonomical classification of organisms based on their degree of evolutionary relatedness.

Phylogenetic tree A variety of dendrogram (diagram) in which organisms are shown arranged on branches that link them according to their relatedness and evolutionary descent.

A dendrogram is a ‘tree-like’ diagram that summarizes the process of clustering. Similar cases are joined by links whose position in the diagram is determined by the level of similarity between the cases.
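To see how a dendrogram comes out of clustering, here is a minimal sketch in Python. It is not a real phylogenetics method: it assumes four made-up, equal-length DNA strings, measures distance simply as the number of mismatched positions, and always joins the two closest clusters first (single linkage). The names `hamming`, `single_linkage` and `toy` are hypothetical and chosen only for this illustration.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def single_linkage(seqs):
    """Repeatedly merge the two closest clusters; return the merge history.

    seqs maps a name to a sequence. Each history entry is
    (cluster_a, cluster_b, distance), where a cluster is a tuple of names.
    """
    clusters = [(name,) for name in sorted(seqs)]
    history = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            # Single linkage: cluster distance = distance of the closest pair.
            d = min(hamming(seqs[a], seqs[b]) for a in c1 for b in c2)
            if best is None or d < best[0]:
                best = (d, c1, c2)
        d, c1, c2 = best
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(c1 + c2)
        history.append((c1, c2, d))
    return history


# Made-up sequences, invented for this example only.
toy = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "ACGAACCA",
    "D": "TTCAACCT",
}
merges = single_linkage(toy)
for left, right, dist in merges:
    print(left, "+", right, "joined at distance", dist)
```

Each printed line corresponds to one link of the dendrogram: the lower the distance at which two groups are joined, the more similar they are. Real phylogenetic trees are built with more careful distance measures and methods.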

Protein Domains In the light of structural and biochemical evidence which has accumulated over recent years it has become increasingly clear that the traditional view that ‘polypeptide = protein’ is inadequate to describe some naturally occurring polypeptides. In particular, it can be shown that different regions along a single polypeptide chain can act as independent units, to the extent that they can be excised from the chain, and still be shown to fold correctly and often still exhibit biological activity. These independent regions are termed domains.

Protein Motifs A conserved element of a protein sequence alignment that usually correlates with a particular function. Motifs are generated from a local multiple protein sequence alignment corresponding to a region whose function or structure is known. It is sufficient that it is conserved, and is hence likely to be predictive of any subsequent occurrence of such a structural/functional region in any other novel protein sequence.

mRNA Messenger RNA.

Mutagen An agent that increases the rate of mutations in an organism.

Mutation An inheritable change of a gene, which includes genetic (point or single base) changes, from one allelic form to another; or larger scale alterations such as chromosomal deletions or rearrangements.
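To make the protein motif entry above a little more concrete, the following sketch scans a protein sequence for one well-known PROSITE-style pattern, N-{P}-[ST]-{P} (the N-glycosylation site pattern), by rewriting it as a regular expression. The protein fragment and the function name `find_motif` are assumptions made up for this illustration; a match only suggests a possible site and does not prove function.

```python
import re

# PROSITE-style pattern N-{P}-[ST]-{P} written as a regular expression:
# N, then any residue except P, then S or T, then any residue except P.
# The lookahead lets overlapping occurrences be reported as well.
MOTIF = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(protein):
    """Return (1-based start position, matched residues) for every hit."""
    return [(m.start() + 1, m.group(1)) for m in MOTIF.finditer(protein)]


# Hypothetical protein fragment, invented for this example.
fragment = "MKNGSAPNPTLNASQ"
hits = find_motif(fragment)
print(hits)
```

Note that the stretch `NPTL` in the fragment is not reported, because the residue after N is a proline, which the pattern excludes.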



Algorithm A finite set of step-by-step instructions for a problem-solving or computational procedure, especially one that can be implemented by a computer.

Pharmacogenomics The science of understanding the correlation between an individual patient’s genetic make-up (genotype) and their response to drug treatment. Some drugs work well in some patient populations and not as well in others. Studying the genetic basis of patient response to therapeutics allows drug developers to more effectively design therapeutic treatments.

Molecular Dynamics The study of intramolecular conformations and molecular motions, using computational simulations. Calculations simulating the motion of each atom in a molecular system at a fixed energy, fixed temperature, or with controlled temperature changes.

Biodiversity The variety and variability among living organisms and the ecosystems in which they occur. Biodiversity includes the number of different items and their relative frequencies; these items are organized at many levels, ranging from complete ecosystems to the biochemical structures that are the molecular basis of heredity. Thus, biodiversity encompasses expressions of the relative abundances of different ecosystems, species, and genes.

In other words, biodiversity, or biological diversity, is the term for the variety of life and the natural processes of which living things are a part. This includes the living organisms and the genetic differences between them and the communities in which they occur. The concept of biodiversity represents the ways that life is organized and interacts on our planet. These interactions can take place on scales ranging from the smallest, at the chromosome level, to organisms, ecosystems, and even to entire landscapes.

SNP Single Nucleotide Polymorphism. When comparing the same sequence from two individuals, there can often be single base pair changes. These can be useful genetic markers.
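The comparison described in the SNP entry can be sketched in a few lines of Python. This is a simplified illustration that assumes the two sequences are already aligned and of equal length; the sequences and the function name `single_base_differences` are made up. Strictly speaking, a difference between two individuals is only a candidate SNP until it is seen to recur in a population.

```python
def single_base_differences(seq_a, seq_b):
    """List (1-based position, base in seq_a, base in seq_b) for each mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch assumes pre-aligned, equal-length sequences")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


# Hypothetical sequences of the same region from two individuals.
individual_1 = "ATGCCGTAAGCT"
individual_2 = "ATGCCATAAGTT"
diffs = single_base_differences(individual_1, individual_2)
print(diffs)
```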


EMBL European Molecular Biology Laboratory. The EMBL Nucleotide Sequence Database is a comprehensive database of DNA and RNA sequences. The database is produced in collaboration with GenBank and the DNA Data Bank of Japan (DDBJ).

DNA chips / DNA Microarrays This technology promises to monitor the whole genome on a single chip so that researchers can have a better picture of the interactions among thousands of genes simultaneously. Arrays can also be made on standard blotting membranes, and can be created by hand or by using robotics to deposit the samples. In general, arrays are described as macroarrays or microarrays, the difference being the size of the sample spots.

Forensic The branch of science that employs scientific technology to assist in the determination of facts in the courts of law



RNA Splicing/Alternative Splicing Alternative RNA splicing operates in multicellular organisms to generate rich proteomic diversity and to regulate the appearance of tissue-specific mRNA transcripts. Yet there is a limited understanding of these complex mechanisms, and how they respond to physiological inputs and developmental cues.

Ab Initio A Latin phrase which means “from the beginning”. It refers to molecular orbital calculations that use all the molecular orbitals in a calculation, not just the valence electron orbitals.

Absolute configuration The way, in 3-dimensional space, in which 4 different substituents are arranged around a chiral carbon. This can only be determined by X-ray crystallography. However, other compounds that can be related to one of known configuration, by syntheses in which there are no changes at a chiral centre, can also be assigned an absolute configuration.

Ontology Ontology is the study of being, and it encompasses everything involved with the beings within humans, the process of becoming our beings fully, and relationships between degrees of being and the ontological worlds they create. Ontological refers to anything that has to do with the real self. For example, ontologically sensitive people are sensitive to the real selves within themselves and within others

Speciation The process by which one or more populations of a species become genetically different enough to form a new species. The process often requires populations to be isolated for a long period of time.

Computer Science The systematic study of computing systems and computation. The body of knowledge resulting from this discipline contains theories for understanding computing systems and methods; design methodology, algorithms, and tools; methods for the testing of concepts; methods of analysis and verification; and knowledge representation and implementation. Or, briefly: the study of the implementation, organization, and application of computer software and hardware resources.

Information Science Pure and applied science involving the collection, organization, and management of information

Information Technology Acquisition, processing, storage and dissemination of all types of information using computer technology and telecommunication systems.

Comparative Genomics The study of human genetics by comparisons with model organisms such as mice, the fruit fly, and the bacterium E. coli. The comparison of genomes and of distinct individuals within a genome. Comparative genomics makes possible the application of information gained from a simple genome to a more complex genome, and is the basis for the understanding of genetic variation amo

Structural Genomics The branch of genomics that determines the three-dimensional structures of proteins.


Cellomics In the abstract, one could visualize the combination of different molecules in a particular cell at an instant of time as a cellular state. The set of all states that a particular cell could enter is known as the cellome; cellomics is the study of a particular cellome.

Metabolic Pathway Engineering

It offers

Higher yields and productivities

Fewer side-products

Good stereospecificity

Optimal conditions for biological reactions

Conversion of a cheap raw material to a high-value product

Novel products and processes

Environmentally friendly production methods

Biotechnology The simplest definition of biotechnology is “applied biology.” The application of biological knowledge and techniques to develop products. It may be further defined as the use of living organisms to make a product or run a process. By this definition, the classic techniques used for plant and animal breeding, fermentation and enzyme purification would be considered biotechnology. Some people use the term only to refer to newer tools of genetic science. In this context, biotechnology may be defined as the use of biotechnical methods to modify the genetic materials of living cells so they will produce new substances or perform new functions. Examples include recombinant DNA technology, in which a copy of a piece of DNA containing one or a few genes is transferred between organisms or “recombined” within an organism.

Organal Bioinformatics Collecting, storing, retrieving and analyzing data about organs is called organal bioinformatics. (An organ is a part of the body that consists of different types of tissue and that performs a particular function; examples include the kidneys, heart and brain.)

Data warehousing A data warehouse is a collection of data gathered and organized so that it can easily be analyzed, extracted, synthesized, and otherwise used for the purposes of further understanding the data. It may be contrasted with data that is gathered to meet immediate business objectives such as order and payment transactions, although this data would also usually become part of a data warehouse.

Sequence One can use this word as a verb or as a noun. A sequence (noun) is a string of bases in DNA or a string of amino acids in a protein. Also, to sequence (verb) means the experimental process of determining the order of the bases in a DNA fragment or the order of amino acids in a protein.



Pathway A system of proteins that work together. For example, a pathway could include protein A, which sends a signal to protein B, which sends a signal to protein C, and so on until a biological effect occurs. Both p53 and pRB are members of pathways in human cells.


Promoter A DNA site to which RNA polymerase will bind and initiate transcription. A DNA sequence that is located in front of a gene and controls gene expression. Promoters are required for binding of RNA polymerase to initiate transcription.

Repeat The distance from the center of one motif or pattern to the center of the next.

Gene The fundamental physical and functional unit of heredity. A gene is an ordered sequence of nucleotides located in a particular position on a particular chromosome that encodes a specific functional product (i.e., a protein or RNA molecule).

CDS Coding sequence. This is the portion of an mRNA or genomic sequence that codes for a protein sequence.
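How a CDS "codes for" a protein can be shown with a short translation sketch. It assumes the standard genetic code, an input that starts in the correct reading frame, and a made-up sequence; the names `CODON_TABLE` and `translate_cds` are chosen only for this example. Real tools also handle ambiguous bases and alternative genetic codes, which this sketch does not.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering
# of the first, second and third codon positions. "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_cds(cds):
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    cds = cds.upper().replace("U", "T")  # accept mRNA (U) as well as DNA (T)
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of three")
    protein = []
    for pos in range(0, len(cds), 3):
        amino_acid = CODON_TABLE[cds[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical five-codon CDS: ATG GCC AAG TGG TAA
print(translate_cds("ATGGCCAAGTGGTAA"))
```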


Clustering The use of multiple computers and storage devices to create what seems to be a single system. Clustering is often used to increase a system’s availability and for load balancing on highly trafficked Web sites.

Case Study

Use of Bioinformatics In Development Of Peptide Vaccine

Traditional vaccines, particularly those against viruses, have limitations such as the difficulty of ascertaining complete loss of pathogenic potential, the need for multiple doses, short-lived immunity and higher cost. Among the new approaches recommended, recombinant DNA vaccines and peptide vaccines have high potential and are being tried in several laboratories. As a case study, a peptide vaccine against Japanese encephalitis (JE) virus is being developed. The method used includes multiple alignment of the sequences of the envelope glycoprotein (Egp) of flaviviruses that are closely related to JE virus, such as MVE virus, WN virus, Kun virus, Den virus, and YF virus. An antigen program was developed using in-house algorithms. This program is used to predict the antigenic determinants of each of these proteins. One of the consensus determinants, 155-163, was synthesized in the laboratory, and polyclonal antibodies were raised against the peptide. This antigenic determinant 155-166 was proven to be not only antigenic but also a virus-neutralizing epitope. Molecular modeling studies were carried out, and the stable conformation was predicted by adding flanking peptides to the N-terminal and C-terminal regions. These molecular modeling studies were carried out on the 149-163, 151-163, 155-163, 155-167, 149-167 and 157-167 peptides. The conformationally stable peptide was attached to a previously predicted T-helper cell binding peptide. Its immune-protecting capacity was also checked in the laboratory to confirm that the peptide can be used as a potential vaccine.

Further, knowledge-based homology modeling studies were carried out to predict the 3-D structure of the Egp of JE virus. The template used is the structure of the Egp of TBE virus. Conformational epitopes are predicted using this model of the Egp. Thus it has been shown that the use of Bioinformatics tools and techniques not only reduces the time required to identify a candidate peptide vaccine but also provides insight into the structure-function relationship of the virus protein.
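The in-house antigen program used in this case study is not described here, so the following is only a stand-in to give you a feel for what "predicting antigenic determinants" can look like computationally. It uses a different, classic idea: the Hopp-Woods hydrophilicity scale averaged over a sliding window, on the assumption that strongly hydrophilic stretches are more likely to be exposed on the protein surface. The toy sequence, the window length of 6 and the function names are assumptions for illustration, and the scale values are given as they are commonly tabulated.

```python
# Hopp-Woods hydrophilicity values (higher = more hydrophilic),
# as commonly tabulated; used here as a stand-in for the in-house method.
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def window_scores(protein, window=6):
    """Average hydrophilicity of every window: (1-based start, peptide, score)."""
    if len(protein) < window:
        raise ValueError("protein is shorter than the window")
    results = []
    for start in range(len(protein) - window + 1):
        peptide = protein[start:start + window]
        score = sum(HOPP_WOODS[aa] for aa in peptide) / window
        results.append((start + 1, peptide, score))
    return results


def most_hydrophilic(protein, window=6):
    """Return the window with the highest average hydrophilicity."""
    return max(window_scores(protein, window), key=lambda hit: hit[2])


# Toy sequence (not from any real virus): a charged stretch between
# two hydrophobic runs.
toy_protein = "LLLLLLKDEKRDLLLLLL"
best = most_hydrophilic(toy_protein)
print(best)
```

A prediction of this kind only proposes candidate peptides. As in the case study, candidates still have to be synthesized and tested in the laboratory.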

Testing your knowledge:

This section lets you test how much you have understood from this lesson.

Try to answer the Following Questions

1. What is Bioinformatics?

2. What is the scope of Bioinformatics?

3. What are the challenges of Bioinformatics?

4. How old is bioinformatics?

5. Are there any standards in Bioinformatics?

6. Can you give only one exact definition to Bioinformatics?

7. How can we tackle tasks in bioinformatics?

“The scientist is not a person who gives the right answers; he is one who asks the right questions”