
MOSTI

The Malaysian Technology Roadmap for Bioinformatics

September 2007
TABLE OF CONTENTS

1. INTRODUCTION
1.1 BIOTECHNOLOGY
1.2 BIOINFORMATICS
1.3 OVERVIEW OF THE GLOBAL BIOINFORMATICS INDUSTRY

2. KEY PROCESSES IN DEVELOPING THE ROADMAP
2.1 PRIORITISATION PROCESS

3. BIOINFORMATICS FRAMEWORK
3.1 GOVERNANCE
3.2 COMMON ENABLING TOOLS
3.3 BIOINFORMATICS APPLICATIONS
3.4 SYSTEMS BIOLOGY
3.5 STRUCTURAL BIOINFORMATICS
3.6 MOLECULAR BIOINFORMATICS

4. TECHNOLOGY ROADMAP FOR BIOINFORMATICS
4.1 THE TECHNOLOGY ROADMAP
4.2 IMPLEMENTATION SCHEDULE
4.3 BUDGET SUMMARY

5. RECOMMENDATIONS

6. REFERENCES

7. ACKNOWLEDGEMENTS

8. LIST OF CONTRIBUTORS

APPENDIX A - Motivation for Roadmapping

APPENDIX B - Priority Areas for Domain #1: Governance

APPENDIX C - Priority Areas for Domain #2: Common Enabling Tools

APPENDIX D - Priority Areas for Domain #3: Bioinformatics Applications

APPENDIX E - Priority Areas for Domain #4: Systems Biology

APPENDIX F - Priority Areas for Domain #5: Structural Bioinformatics

APPENDIX G - Priority Areas for Domain #6: Molecular Bioinformatics

1. INTRODUCTION

1.1. Biotechnology

Biotechnology involves the use of biological systems, living organisms, or
derivatives thereof, to make or modify products or processes for specific use
[Source: UN Convention on Biological Diversity]. It is recognized as one of
the fastest growing sectors in the world and is now seen as a major area of
investment and a target for support by governments around the world. Early
applications of biotechnology involved the cultivation of plants to produce
food suitable for humans.

In recent times, biotechnology has been responsible for hundreds of medical
diagnostic tests that keep the blood supply safe from the AIDS virus and
detect other conditions early enough to be successfully treated. Home
pregnancy tests are also biotechnology diagnostic products. There are more
than 300 biotech drug products and vaccines currently in clinical trials
targeting more than 200 diseases, including various cancers, Alzheimer’s
disease, heart disease, diabetes, multiple sclerosis, AIDS and arthritis
[Source: Biotechnology Industry Organisation,
http://www.bio.org/speeches/pubs/er/statistics.asp, accessed in May 2006].

Table 1 shows some of the major indicators for the biotechnology industry in
several countries. All numbers reported are from the 2003 fiscal year.

                               USA        Europe     Canada     Australia
Sales/Revenue (USD)            47.4 B     7.5 B      1.7 B      1.0 B
Annual R&D Investments (USD)   14.3 B     4.2 B      0.6 B      0.1 B
# Companies                    1,473      1,878      470        226
# Employees                    146,100    32,470     7,440      6,393
# Public Companies             318        96         81         58
Market Capitalization (USD)    344.4 B    25.6 B     13.8 B     5.0 B
Source: Burrill & Company, Ernst & Young

Table 1: Biotechnology indicators for the US, Europe, Canada and Australia

Consumers have also been enjoying biotechnology foods such as papaya,
soybeans and corn. In addition, biopesticides and other agricultural products
are being used to improve the food supply and to reduce the dependence on
conventional chemical pesticides.

However, the process of developing an enhanced plant, organism or new drug is
challenging. In drug design and development, for example, the process is
complex, expensive and time-consuming, with very high risks involved. It can
take anywhere from 10 to 25 years before a new drug can be launched. The
success rate for each stage of the drug design life cycle, taken from a study
by William Bains (Source: “Failure rates and times for drug discovery &
development”, William Bains, 2004), is illustrated in table 2. It shows a
significantly high probability of failure at the early stages.

                               Discovery  Preclinical  Clinical I  Clinical II  Clinical III
Time per stage (years)         3.50       0.46         1.40        2.30         5.10
Chance of failure (%)          40.0       35.0         22.0        30.0         10.0
# projects needed to achieve   30         13           6.2         3.6          1.7
one successful launch

Table 2: Failure rates and times for drug discovery and development

Hence, it is very important that we are able to select the promising candidates
early and accurately, as this will have a major impact on the success rate.
This is graphically represented in figure 1.

Figure 1: Success rate of selecting the correct projects/candidates
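
As a rough illustration of how the per-stage figures in table 2 compound, the
following Python sketch sums the stage durations and multiplies the per-stage
success probabilities. It assumes the stages are strictly sequential and
statistically independent, which is a simplification introduced here and not
a claim taken from the Bains study. It is not expected to reproduce the
"# projects needed" row of the table, which presumably rests on factors not
shown there. All function and variable names are hypothetical.

```python
# Per-stage figures copied from table 2: (stage, years, chance of failure in %).
STAGES = [
    ("Discovery", 3.50, 40.0),
    ("Preclinical", 0.46, 35.0),
    ("Clinical I", 1.40, 22.0),
    ("Clinical II", 2.30, 30.0),
    ("Clinical III", 5.10, 10.0),
]


def total_years(stages):
    """Sum of stage durations, assuming the stages run strictly in sequence."""
    return sum(years for _, years, _ in stages)


def overall_success(stages):
    """Probability of passing every stage, assuming independent stages."""
    p = 1.0
    for _, _, fail_pct in stages:
        p *= 1.0 - fail_pct / 100.0
    return p


def with_failure_rate(stages, name, new_fail_pct):
    """Return a copy of stages with one stage's failure rate replaced."""
    return [(s, y, new_fail_pct if s == name else f) for s, y, f in stages]


if __name__ == "__main__":
    print(f"Total time: {total_years(STAGES):.2f} years")
    print(f"Overall success (naive): {overall_success(STAGES):.3f}")
    # Hypothetical scenario: better early selection halves the Discovery
    # failure rate from 40% to 20%.
    better = with_failure_rate(STAGES, "Discovery", 20.0)
    print(f"Overall success, improved selection: {overall_success(better):.3f}")
```

Under these assumptions the five stages add up to just under 13 years, and
lowering the failure rate of the earliest stage raises the overall success
probability of every project that enters the pipeline, which is the point the
paragraph above makes about early candidate selection.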

1.2. Bioinformatics

Bioinformatics is the application of computer technology to the management
and analysis of biological data. It is an interdisciplinary research area that
lies at the interface of biological and computational sciences. Many also see
this as a convergence of biotechnology with information (and communication)
technology, as shown in figure 2. The key driver of growth in this domain has
been the efficiency gained by reducing the time taken to select promising
candidates, and the resulting improvement in success rates, as shown
previously in figure 1.

Figure 2: Convergence of biotechnology, nanotechnology and information
technology.
(Diagram labels: Nanotech, Biotech, Infotech, Bioinformatics)

Due to its multidisciplinary nature, bioinformatics is supported by several
key disciplines or areas. This is illustrated in figure 3 below.

Figure 3: Supporting areas for bioinformatics
(Supporting areas shown around Bioinformatics: Molecular Biology, Physics,
Mathematics, Immunology, Statistics, Structural Biology, Genomics, Chemistry,
Information Technology and Computer Science)
(Adapted from the presentation by Assoc. Prof. Dr. Amir Merican during the
bioinformatics technology roadmap workshop on 27 September 2005)

Following the completion of the Human Genome Project, there has been a
massive increase in the amount of experimental data available. The central
dogma of molecular biology is that DNA is transcribed into mRNA, and mRNA is
then translated into a protein. DNA sequence governs protein sequence; protein
sequence governs protein structure, which in turn governs protein function. It
is the malfunction or deficient function of a protein that leads to the
occurrence of an anomaly. It is therefore believed that the study of protein
structure, and in turn its function, would help us understand the molecular
basis of anomalies.
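
The flow from DNA to mRNA to protein described above can be sketched in a few
lines of Python. This is a minimal illustration, not a production tool: it
assumes the input is the coding strand of the DNA (so transcription amounts to
replacing T with U), uses the standard genetic code, starts at the first AUG
and ignores splicing and other biological detail. The function names are
hypothetical.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G base order.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """Return the mRNA for a coding-strand DNA sequence (T replaced by U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translate mRNA from the first AUG until a stop codon or the end.

    Returns an empty string if no start codon is present. Raises KeyError
    if the sequence contains characters other than A, C, G and U.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    dna = "ATGGCCTTTTAA"  # illustrative sequence, not taken from the report
    mrna = transcribe(dna)
    print(mrna, "->", translate(mrna))
```

A change in the DNA string changes the resulting amino-acid string, which is
the sense in which DNA sequence governs protein sequence.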

High-throughput proteomics and genomics have made a huge amount of biological
data available. It is increasingly important to develop efficient computing
tools for concise storage, easy retrieval, accurate analysis, convenient
cross-linkage with other related resources and valid interpretation of all the
available data, as illustrated in figure 4. This is one of the reasons why
established IT companies such as Microsoft, IBM, Fujitsu, Sun Microsystems and
others have ventured into the bioinformatics field.

ICT in Bioinformatics:

• Computation: Used for tasks ranging from nucleotide sequences and
visualizing protein folding patterns to simulating complex 3D protein-protein
interactions. Huge computing power required.

• Control: Embedded controllers in sequencing machines, fermentation tanks,
and bioreactors direct intelligent robotic systems to improve efficiency and
safety.

• Collection: Data needs to reside in a persistent, non-volatile form that
allows operations to be repeated and compared with other operations. The
databases also need to be accessible by others that have access rights.

• Communication: Allows more efficient sharing of research findings and (new)
data, which promotes collaboration.

(The figure also carries the label "Knowledge Management".)

Figure 4: The four key roles of ICT in Bioinformatics

On the demand side, the global bioinformatics market is centred on gene-based
informatics, genomic data analysis, chemi-informatics, etc., whereas the
supply side is concerned with systems and software for gene sequencing. Their
relationships are shown in figure 5.

Demand-side Market Segmentation
  • Gene-based Informatics
      - Genomic Data Analysis
      - Genomic Data Mining
  • Chemi-informatics
      - Characterization of Combinatorial Libraries
      - Optimization of Combinatorial Libraries

Supply-side Market Segmentation: Systems and Software
  • Gene-sequencing Data Generation Software
  • Object-oriented Framework for Gene-sequence Data Management
  • Gene-sequence Analysis Software
  • Gene-sequence Data-dissemination Technology

Figure 5: Global industry structure for bioinformatics

Tools in Bioinformatics can be largely grouped under the following:

1) Various databases- Hardware
2) Servers
3) Middleware
4) Data-mining tools and browsers

Other tools and techniques include nucleotide sequence tools, protein sequence
tools, structure-based tools, genome analysis tools and gene expression tools.

Applications of Bioinformatics:

• To search databases of protein structures for structure-based drug design
methods (rational drug design).

• To model the docking of compounds and their target proteins (protein
docking and thermodynamics).

• To determine the 3-D structure of a protein based on sequence similarity
and structural similarity (in-silico modeling, molecular modeling).

• To manage databases of small molecules that can act as potential lead
compounds (combinatorial chemistry aspect).

Current Market Trends

• Development of tools enabling conversion of raw data to a standard data
format

• Effective data management

• Increasing trend toward dry lab techniques

• Increasing use of high-throughput systems

• Techniques like RNA interference

• Elimination of false leads at the early stage

• Systems biology increasing the demand for large databases, higher
computing power and data integration tools

Key Issues to be addressed in Informatics

• Integrate biological, genomic, computational, clinical, administrative and
financial data into a comprehensive enterprise-scale information management
solution

• Integrate all data developed during the R&D, approval and marketing into
a comprehensive enterprise-level “data warehouse”

• Facilitate sharing of key data by all researchers & research entities, not
only the primary researcher and prime research organization

• Enable pervasive access to the R&D data warehouse

• Store all key documents involved in the research and development process

• Apply GLP/GMP standards of audit, control and verification to all versions
of all data so that a comprehensive history of the research process emerges
from the raw data

Current Trends in Applications

• Improving data capture, data reduction, data management and data display

• Increased transition from wet lab to dry lab, so as to reduce the costs of
experimentation

• The emergence of new techniques such as micro arrays and computer-aided
drug design has enabled high-throughput screening (HTS) and cut down on the
time required

• Increased discovery of new chemical entities (NCEs)

• Reduction in the drug discovery pipeline and development of effective
biomarkers (stratification and effective biomarkers)

• Epidermal Growth Factor inhibitor used against non-small cell lung
carcinoma, effective in Japanese patients but not in American patients

• Personalized medicine, following a better understanding of gene expression,
avoiding trial and error as in the case of Warfarin

• Understanding complex biological networks (Systems Biology)

• Epigenomics: detecting patterns of methylation variable positions (MVPs)
using HTS to diagnose susceptibility to diseases like Fragile X syndrome and
cancer. Breast cancer studies for recurrence detection following treatment
with Tamoxifen (pharmacogenomic application)

• Reverse genomics to detect druggable proteins using the yeast hybrid system

• Genome sequencing: used to develop a high antibiotic-producing strain of
S. erythraea

• Protein biochips, used to design highly specific biomarkers. These have
helped to develop newer vaccines like the HCV and HI vaccines, currently
under trial

Current Trends in Technology

• Novel techniques that facilitate the high-throughput drug discovery process
include:
  - Gene sequencer: helped to identify new pharmaceutical targets.
  - Robotic pipettes: helped to provide many new drug-like compounds.
  - Plate readers: enabled high-throughput screening.
  - Automation of routine and repetitive lab procedures.
  - Development of cell-based assays.
  - Development of kits and reagents for functional studies.
  - Micro arrays: miniaturization, automation, parallelization.
  - DNA shotgun micro arrays: saves on time.
  - Nucleic acid detection: identification of number of targets.
  - Label-free detection: reducing the insensitivity.
  - Benchtop arrayer FeBit: automation and convenience.
  - CMOS biochip: an e-biochip, portable medical diagnostic tool.
  - Microfluidic MEMS for DNA amplification & detection on the same chip.

• Miniaturized Liquid Array Bioassays
  - Luminex Corporation’s xMAP technology platform integrates fluidics,
    optics and digital signal processing to simultaneously perform up to
    100 bioassays on a single drop of fluid, by reading biological tests
    taking place on the surface of microspheres.
  - Luminex xMAP can perform enzymatic, genetic and immunological tests on
    the same instrumentation platform.
  - The Luminex xMAP technology combines miniaturized liquid array bioassay
    capability adapted from several existing biological testing techniques
    with small lasers, digital signal processors and proprietary software.

• Lab-on-a-chip 2-D gel electrophoresis, using magnetic particles for
separation of biomolecules based on their natural charges

• Tandem mass spectrometry made simple: AutoFlex2 uses a time-of-flight
method for protein characterization and identification. Highly automated
MS/MS, accurate and fast

• Automated protein crystallization, combining MWG’s robotic lab automation
system with non-contact dispensing from Innovadyne’s Nanodrop system. Both
Theonyx and Innovadyne can address diverse viscosities encountered during
crystallization.

• Nanosphere’s "Verigene" incorporating "ClearRead" technology, which
eliminates the need for traditional target amplification and permits analysis
of DNA, RNA, or protein targets via a simple color change, on a single
platform using gold nanoparticle technology.

• Cell-free protein expression

• Rapid translation system RTS 500 produces proteins in large quantities by
cell-free protein expression, for characterization, functional assays and
structural studies. Being cell-free, it avoids the problems associated with
in-vivo techniques.

• Protein-protein interactions

• Genetix has launched a highly automated Y2H screening option for QBOT,
which is a highly flexible automated tool. The Y2H option automates the
labour-intensive steps involved in Y2H experiments. The product offers
researchers a high-throughput method to examine protein-protein interactions.
It can screen up to 100,000 Y2H interactions per day.

Some major players

1. Affymetrix Inc (Santa Clara, California, USA)
Affymetrix's GeneChip® technology was invented in the late 1980s by a team of
scientists led by Stephen P.A. Fodor, Ph.D. The theory behind their work was
revolutionary: a notion that semiconductor manufacturing techniques could be
united with advances in combinatorial chemistry to build vast amounts of
biological data on a small glass chip. This technology became the basis of a
new company, Affymetrix, formed as a division of Affymax, N.V. in 1991.
Affymetrix began operating independently in 1992.

2. Accelrys Inc. (San Diego, California, USA)
Accelrys was founded in 2001, bringing together five specialist companies:
Molecular Simulations Inc. (MSI), Synopsys Scientific Systems, Oxford
Molecular, the Genetics Computer Group (GCG®), and Synomics Ltd. In 2004,
Accelrys acquired SciTegic Inc., which continues to operate as a wholly-owned
subsidiary.
Accelrys serves two major markets. Its life science solutions support
pharmaceutical and biotechnology companies, while its materials science
solutions support companies that develop and optimize materials and
chemicals, such as those serving the chemicals, petrochemicals, food,
personal care products, electronics, and nanotechnology industries.

3. Celera Genomics (Rockville, Maryland and Alameda, California, USA)
Celera was founded in 1998 with the mission to sequence the human genome and
provide clients with early access to the resulting data. Using
state-of-the-art sequencing technology supplied by Applied Biosystems and
sophisticated internally-developed informatics, Celera pioneered the
application of “shotgun” sequencing. While this “shotgun” approach was widely
criticized at the time, it has subsequently become a standard method for
sequencing complex organisms that is now broadly accepted and routinely used
by many of the same scientists who originally scorned the approach. Scores of
organisms have now been sequenced using the Celera “shotgun” method.

4. Applied Biosystems (Foster City, California, USA)
Applied Biosystems Group serves the life science industry and research
community by developing and marketing instrument-based systems, consumables,
software, and services. Customers use these tools to analyze nucleic acids
(DNA and RNA), small molecules, and proteins to make scientific discoveries
and develop new pharmaceuticals. Applied Biosystems’ products also serve the
needs of some markets outside of life science research, which it refers to as
“applied markets,” such as the fields of: human identity testing (forensic
and paternity testing); biosecurity, which refers to products needed in
response to the threat of biological terrorism and other malicious,
accidental, and natural biological dangers; and quality and safety testing,
for example in food and the environment.

5. INCYTE (Wilmington, Delaware, USA)
Incyte is a drug discovery and development company with a growing pipeline of
novel small molecule drugs.

6. LION Bioscience AG (Postfach 103780, 69027 Heidelberg, Germany)
Lion Bioscience aims to help life science companies increase their
productivity in the R&D process. Their software solutions take on the great
challenges of making the huge amounts of available data easier to use, thus
allowing their customers to make better and faster decisions about their
research projects.
Its core product is SRS, a software package for accessing and analyzing data
during the biological as well as the chemical and clinical phase of the drug
discovery process.

7. Elsevier MDL (San Ramon, California, USA)
MDL was founded in January 1978 as Molecular Design Limited. The three
founders, Stuart Marson and Stephen Peacock (both UC Berkeley postdoctoral
fellows) and W. Todd Wipke (a professor at UC Santa Cruz), started the company
as a computer-aided drug design consultancy, but soon realized that there was
more customer interest in the tools they had created for manipulating
chemical structures in computers than in their consultancy efforts, and they
switched to making products. The first MDL product, MACCS (Molecular ACCess
System), was shipped in 1979, and the first customers were Chevron's Ortho
Division (Richmond, CA), Shell (Modesto, CA), and FMC Corporation (Princeton,
NJ).
Today, Elsevier MDL is recognized as a pioneering leader in discovery
informatics. Over 1,000 life science companies supercharge their discovery
engines with MDL software solutions to generate fresh ideas and make
breakthrough discoveries. By synchronizing and streamlining the sharing and
management of vital information and knowledge, MDL enables scientists to work
more efficiently.

8. Synamatix Sdn. Bhd. (Kuala Lumpur, Malaysia)
Synamatix was created to solve one of the largest problems facing life
sciences today, namely that the growth in genomics data continues to outpace
advancements made in IT, specifically memory and processing power. The need
for more efficient means of storing and analysing genetic data is enormous.

Future Trends
With the discovery of new genes it is becoming increasingly important to
improve our understanding of the complex biological network in the body.
Systems biology studies are undertaken based on systematic analysis and
computational modeling of biological phenomena.

The use of currently available bioinformatics tools is thought to be
restricted to a few trained and specialized bioinformatics users. The major
reasons are:
1) Tools that are not user-friendly
2) Simulation of actual cell processes is very difficult to achieve
3) Lack of common technology standards in bioinformatics applications
4) Lack of communicability
5) Integration of data from various sources

1.3. Overview of the Global Bioinformatics Industry


This section is structured to provide an overview of the current status of the
global bioinformatics industry.
Bioinformatics as a domain can be divided into the following segments:

• Structural Genomics
• Functional Genomics Proteomics
• Computational Biology
• Computational Chemistry

Some of the major challenges faced today by global bioinformatics vendors are
outlined below. The primary limitation is that the use of currently available
bioinformatics tools is thought to be restricted to a few trained and
specialized bioinformatics users. A key reason is the disconnect between the
biological and mathematical sciences that make up this domain. These
limitations extend to:

1. A broad lack of understanding of how algorithms are implemented and run
on computers.

2. Precise and predictive models to simulate how a living cell responds to
external stimuli are still lacking. In essence, labs have had limited
success in simulating the living organism.

3. Many tools on the market are designed on different technological
platforms. The biggest challenge right now is to devise the most
effective approach for organizing, analyzing, storing, manipulating and
applying results from various high-throughput systems.

4. Standardization is the key to future success in genomics, proteomics,
gene expression studies, drug development and computational biology.
Better and more integrated systems, algorithms, annotation protocols and
user interfaces must be developed if the deluge of information is to be
interpreted in any useful way.

5. In addition, for a truly high-throughput operation, screening and other
such automated synthesis systems must be installed and integrated with
the lab's workflow. This will require simpler and more accessible
user-interface software.

6. Because applications cannot communicate with each other, scientists
spend more time configuring and integrating their various applications.
This discourages scientists from using new bioinformatics tools, and the
challenge is likely to have a high impact.

7. The data available to bioinformaticians today usually comes from
multiple and varied sources, and the data points are often corrupt. The
need for standard data formats and interfaces is a major force in
bioinformatics. Scientists have developed their own individual annotation
systems as they developed their massive databases, and there has been
very little standardization or coordination.

Some of the proposed solutions that are being considered today are:

1) Development of tools that facilitate conversion of raw data in various
formats to a standard data format.
2) Effective data management, since the large amount of data being generated
is increasing the demand for data management tools.
3) Increased interest in reduction in drug pipelines and the subsequent
pressures to increase drug research productivity.
4) There is also a move to migrate towards dry lab techniques. Wet lab
techniques usually need multiple validations, thus increasing the cost and
time.
   a. In-silico options exist on two fronts, one being validation through
      the simulation and modeling of predicted behavior, the other being
      that of supporting validation through wet lab techniques.
   b. Support comes in the form of information management, data mining and
      data visualization.
   c. Typically, costs for wet lab techniques increase on a linear scale,
      while the in-silico approach reduces cost on a linear scale.
5) Technologies such as homology mapping, DNA micro arrays, computer-aided
molecular design, high-throughput screening and others have allowed for a
dramatic increase in throughput from target identification to target
validation, dramatically increasing the number of compounds that can be
pushed into the market for potential uses.
6) Systems biology is a shift that is slowly occurring in biological sciences.
It is an academic field that seeks to integrate biological data in an attempt
to understand how biological systems function. By studying relationships and
interactions between various parts of a biological system, it seeks to
develop an understandable model of the whole system.
7) Data management for clinical trials is increasing the demand for regulatory
compliance tools.

The goals of genomics software tools could be summarized as the creation of
novel, indicative knowledge relating genomic data to large-scale models,
providing insight into the molecular basis of life. Included in this is the
need to consolidate, distribute and organize both the raw data and the results
of these studies. These goals emphasize the intertwined nature of data
management and data analysis, for that which organizes data effectively must
to some extent understand the data it organizes, even if this understanding is
limited simply to relational data storage algorithms.

The genomics segment represents the front-end of the bioinformatics market.
This market, as the name implies, is concerned with the analysis of the
gigantic amount of data being generated. The analyzed data is primarily
utilized in drug-discovery efforts, but also acts as a repository that can
reveal information about ancestral family trees and a genotypic, rather than a
phenotypic, grouping of species. Because of its applicability to drug
discovery, this market segment traditionally contains companies from the
pharmaceutical and biotechnology industries and, recently, from the computer
industry as well. Databases generated to incorporate various data types, as
discussed in the 'Bioinformatics Information Manipulation Systems' section
research service, act as suppliers to this market segment. Pharmaceutical and
biotechnology companies that primarily use these tools for data analysis
generate even more data, which further adds to the database storage
requirements.

This additional data along with the previous data is further analyzed, thus
forming a circular flow of information within the bioinformatics market.
Products in this market segment essentially cover the core of the
bioinformatics market.

The most commonly used tools included in the segment are:

• Data capture tools from micro array-based experiments

• Comprehensive genomic sequence analysis packages/suites

• Genetic mapping tools/utilities linkage analysis tools

• Primer selection tools

• Sequence similarity, database search tools

• Sequence database access tools

• Sequence analysis tools
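
To give a flavour of what the sequence similarity and database search tools
listed above do, the sketch below scores a query against a tiny in-memory
"database" by percent identity. It is a toy under stated assumptions: the
sequences are taken to be already aligned and of equal length, whereas real
search tools such as BLAST compute alignments and use heuristics to scale to
large databases. The function names and the example sequences are
hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def best_match(query, database):
    """Return (name, identity) for the database entry most similar to query."""
    scored = (
        (name, percent_identity(query, seq)) for name, seq in database.items()
    )
    return max(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Illustrative sequences only; they do not come from any real database.
    database = {
        "seq1": "ACGTACGT",
        "seq2": "ACGTTCGT",
        "seq3": "TTTTTTTT",
    }
    name, identity = best_match("ACGTTCGA", database)
    print(f"Best match: {name} ({identity:.1f}% identity)")
```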

There is an increased focus amongst emerging technology companies and
developing economies on establishing a foothold in the bioinformatics domain.
To develop a leadership position, some of the key points for Malaysia to
consider when building a roadmap for the bioinformatics industry are:

• Leverage the advantage of a lower cost structure

• An established intellectual capital in advanced chemistry, increasingly
good manufacturing practices (GMP) and process engineering skills

• Strong intellectual resources and English-speaking workers

• Proximity to a large market in Asia Pacific

• Efficient site operations which are comparable to similar
operations/laboratories in the other emerging markets in Asia and Eastern
Europe

• Large investments in infrastructure building

• The potential for the private sector to expand healthcare access is vast

• Transparent legal and financial systems


These considerations are indicative in nature and will need to be vetted in
detail prior to the development of any focused roadmap. It is also important
to understand that some of the models being considered by global players today
are:

• Collaborative approach for research / manufacturing partnerships

• Service provider oriented models that meet the needs of pharma majors

• Scale of the market is likely to support a mixed model approach that
allows for a collaborative as well as a competitive strategy.

2. KEY PROCESSES IN DEVELOPING THE ROADMAP

MIMOS, in association with the identified workshop facilitators, conducted a
series of national-level workshops. The primary objectives of these workshops
were to bring together the key stakeholders in the Malaysian bioinformatics
domain to collectively formulate the appropriate strategies and develop a
national-level technology roadmap for bioinformatics. The key participants
(shown in Section 8) were primarily from academic and research institutions,
with a small group representing commercial entities.

The workshop was structured in modules that involved sharing with the
audience insights into the global industry and some key trends. Emphasis was,
however, laid on group discussions and simulated group activities. The
activities were aimed at drawing out key areas of focus and relevant concerns.

The feedback from these was captured and analyzed against the existing
opportunities in the area of bioinformatics. The existing infrastructure and
gaps between state of readiness and areas of opportunities were then
examined.

To prepare for these meetings, several key experts were invited to help the
consultant identify, clarify and expand the key technology areas in this
bioinformatics roadmap. This was carried out over two sessions, on
Wednesday 17 August and Friday 2 September, 2005. The invited experts were
(arranged in no particular order):
• Assoc. Prof. Dr. Amir Merican, University of Malaya
• Prof. Dr. Zaharin Yusof, University Science Malaysia
• Assoc. Prof. Dr. Naomie Salim, University Technology Malaysia (Skudai)
• Dr. Habibah Wahab, University Science Malaysia
• Assoc. Prof. Dr. Suhaimi Napis, University Putra Malaysia

To set the terms of reference for the workshop, Assoc. Prof. Dr. Merican
presented several definitions of the key areas to help the participants
better understand and appreciate them.

Several key milestones were achieved in the roadmapping exercise. These are
shown in figure 6.

• Pre-workshop Meeting 1 (Planning): To strategize the formation of a
research cluster for Bioinformatics involving all relevant players, 22 April
2005 (69 participants)

• Workshop 1: To produce the first version of the national roadmap for
Bioinformatics, 17-18 May 2005 (90 participants)

• Pre-workshop Meeting 2: Preparation meeting for the 2nd Bioinformatics
Workshop, 17 August 2005 (7 expert members)

• Pre-workshop Meeting 3: To determine and define the domain areas for the
Bioinformatics roadmap, 17 August 2005 (7 expert members)

• Workshop 2: To prioritize the key areas in Bioinformatics, to form working
groups to further identify the sub-key areas, and determine the work plan for
each niche area, 27-28 September 2005

• Group Facilitators Meeting: To discuss and expand on the key (research)
areas in the proposed national Bioinformatics roadmap, the key deliverables
and their milestones/deadlines and agree on the format and key details
required, 14 October 2005

• Programme Prioritization Meeting: To review and prioritize all the
programme/project proposals for the roadmap, 7 December 2005

• Final Submission Meeting: To review the revised proposals and budgets for
final submission, 15 December 2005

Figure 6: The technology roadmapping process

2.1. Prioritisation Process

At the end of the two workshops, a technical committee involving the workshop
participants was set up to assist MIMOS in reviewing and prioritising the
proposals received. This committee consists of 7 domain facilitators as shown
below (arranged in no particular order):

1.) Dr. Arif Anwar
Synamatix
*(Molecular Bioinformatics)

2.) Associate Professor Dr. Mandava Rajeswari
Universiti Sains Malaysia
*(Common Enabling Tools)

3.) Associate Professor Dr. Naomie Salim
Universiti Teknologi Malaysia
*(Structural Bioinformatics)

4.) Dr. Mustaffar Kamar Hamzah
Universiti Teknologi MARA
*(Bioinformatics Applications)

5.) Professor Dr. Faridah Habib Shah
Melaka Institute of Biotechnology
*(Bioinformatics Applications)

6.) Professor Dr. Abu Bakar Abdul Majeed
Universiti Teknologi MARA
*(Systems Biology)

7.) Mejar Zailani Safari (Dr. Lai Weng Kin)
MIGHT (MIMOS Berhad)
*(Governance)

* Please refer to section 3 for detailed descriptions of the 6 domains

Subsequently, the first technical committee meeting was convened on 14
October 2005 to further discuss the procedures for the RFP (Request for
Proposal) and the proposal template to be used. On 7 December 2005, a
programme/project review and prioritization meeting was held. Besides the 7
domain facilitators, 6 additional technical experts were invited to help evaluate
and prioritize the key technical areas. The 6 technical experts are (arranged in no
particular order):

1.) Associate Professor Dr. Rofina Yasmin Othman
Universiti Malaya

2.) Dr. Zeti Azura Mohamed Husin
Universiti Kebangsaan Malaysia

3.) Dr. Habibah Wahab
Universiti Sains Malaysia

4.) Associate Professor Dr. Rosni Abdullah
Universiti Sains Malaysia

5.) Associate Professor Dr. M. Taj Abdullah
Universiti Malaysia Sarawak

6.) Dr. Vijay Kumar
Universiti Malaysia Sabah

The key features that were used to prioritise the proposals received revolve
around the following themes:

Key Areas for Roadmap Assessment


• Government Policy & Direction
• Infrastructure
• Funding
• Research & Development Potential
• Human Resources

3. BIOINFORMATICS FRAMEWORK

Overall, the 6 key areas in Bioinformatics which were developed from the
workshops, and agreed upon by the Malaysian bioinformatics researchers,
scientists and practitioners, are shown in Figure 7 below.

BIOINFORMATICS

Domain #6 : Molecular Bioinformatics
Domain #5 : Structural Bioinformatics
Domain #4 : Systems Biology
Domain #3 : Bioinformatics Applications
Domain #2 : Common Enabling Tools
Domain #1 : Governance

(Adapted from "Figure 3. The Key Areas in Bioinformatics and their
interdependencies" in the presentation by Assoc. Prof. Dr. Amir Merican during the
bioinformatics technology roadmap workshop on 27 September 2005)

Figure 7: Key Domains in the Technology Roadmap for Bioinformatics

A brief description of the definitions used for each of the six domains is given in
the following sections.

3.1.1 Domain #1 : Governance


Clearly, there is a need for the right policies to be put in place to oversee
the efficient management of the resources and data that will be
developed. The major issues and challenges that will be addressed within
this domain would include,

• participants’/users’ access rights,
• intellectual rights management policy,
• revenue generation and sharing,
• accounting policy,
• resources monitoring,
• policy to manage the resources,
• appointment of steering committee,
• development of best practices,
• education and training,
• etc.

Please refer to Appendix B for details of the priority areas.

3.1.2 Domain #2 : Common Enabling Tools


The common enabling tools in bioinformatics, as the name suggests,
would cover the tools which are needed to support bioinformatics work in
molecular bioinformatics, structural bioinformatics, systems biology as well
as many of the bioinformatics applications. As per the agreed definition,
examples of such tools would be,

• Data search & retrieval,
• Data visualization,
• Data mining & knowledge discovery,
• Data filtering/pre-processing,
• Data management,
• Grid technologies,
• Data warehousing,
• Image processing,
• Data communication (information sharing),
• Data ontologies & semantics,
• Signal processing,
• etc.

On the other hand, the proposals received gave an interesting
indication of the local needs, as shown below:

i.) Microarray Data Processing, Analysis and Visualization


Microarrays are currently the most significant high-throughput
experimentation tool in the life sciences and have been used for
various applications such as drug discovery, disease diagnosis
and related studies. Microarray analysis has been widely adopted
as the tool for generating gene expression data on a genomic
scale. The technology is still developing rapidly, and as such
there are no established standards for microarray experiments or
for how the data is to be processed and presented. Although
microarray analysis has been largely successful, one of its
limitations is the lack of generic tools for processing such data.
This is a serious concern that needs to be addressed, as there
are divergent approaches to processing and presenting
microarray data: each vendor typically adopts its own proprietary
formats for data capture, storage and exchange. Most laboratories
conducting microarray experiments have a variety of equipment,
and the lack of standards or generality hampers efforts to store
these important data in the most efficient and widely accessible
manner. Projects in this group aim at developing a generic
platform and a collection of tools using open standards for
microarray data processing, collection and visualization.

ii.) Database Integration, Search Engines


Malaysia is one of the twelve mega-diversity countries of the
world, which together hold about 60% of the known species.
Information about Malaysia's rich resources of biodiversity,
natural products and traditional knowledge needs to be captured
and integrated so that it is made available to researchers in
the life sciences and the pharmaceutical industry. The projects
proposed aim at developing databases to capture the diverse
knowledge available. The main focus of the projects in this
group is to develop platforms and search engines to integrate
diverse databases.

iii.) Data Mining


Bioinformatics databases are growing at an explosive rate. This
necessitates effective data mining and analysis tools. Projects in
this category are aimed at developing generic data mining tools
for specific areas such as chemical pathology, genome and
gene expression, as well as a generic visual data mining tool.

iv.) Bioinformatics Literature


Research and development in bioinformatics has spurred
growth in bioinformatics literature at an equally significant pace.
Projects are proposed to organize and search this literature
using domain specific semantic knowledge.

v.) Bioinformatics Ontologies


The amount of knowledge and information to be dealt with in
bioinformatics can be overwhelming, so some means to
systematically organize and manage it is greatly needed.
One method is to use ontology technologies. An ontology allows
for the description of domain knowledge in a generic way and
provides an agreed understanding of a domain. Development of
a domain ontology can help biologists and bioinformatics
scientists in knowledge sharing and knowledge reuse. It helps in
the development of bioinformatics domain-specific knowledge
acquisition tools such as tools for searching, categorisation and
filtering. In the proposed project the ontology is used to
formalize the representation of the information and knowledge
about proteins and amino acids. The taxonomy of the protein
ontology enables better understanding of the functions and
characteristics of different proteins and their constituent amino
acids.
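
As a minimal sketch of how an ontology's taxonomy can support
categorisation and filtering, the following Python fragment stores a small
"is-a" hierarchy and queries it. The terms, the IS_A table and the function
names are illustrative assumptions only; they are not taken from the
proposed protein ontology.

```python
# Hypothetical "is-a" taxonomy: each term maps to its broader parent term.
# The terms are illustrative and are not drawn from the proposed ontology.
IS_A = {
    "amino acid": None,
    "polar amino acid": "amino acid",
    "nonpolar amino acid": "amino acid",
    "serine": "polar amino acid",
    "threonine": "polar amino acid",
    "leucine": "nonpolar amino acid",
    "valine": "nonpolar amino acid",
}


def ancestors(term):
    """Return the chain of broader terms above `term`, nearest first."""
    chain = []
    parent = IS_A[term]
    while parent is not None:
        chain.append(parent)
        parent = IS_A[parent]
    return chain


def members(category):
    """Return every term classified under `category`, directly or indirectly."""
    return sorted(t for t in IS_A if category in ancestors(t))


def filter_by_category(residues, category):
    """Keep only the residues that fall under `category` (a simple filter tool)."""
    return [r for r in residues if category in ancestors(r)]


print(ancestors("serine"))
print(members("nonpolar amino acid"))
print(filter_by_category(["serine", "leucine", "valine"], "nonpolar amino acid"))
```

Because every tool consults the same shared hierarchy, searching,
categorisation and filtering stay consistent with one agreed description of
the domain.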

vi.) Cell biology image analysis & Visualization


Seeing a cell is an essential aspect of cell biology.
An understanding of the biomechanical behaviour of cellular
structures may be gained through image analysis and
visualization. Protein localization, detection of multiple tagged
proteins, visualization of the highly dynamic and complex
structures in large volumes of high-throughput data and
quantitative analysis of the volume of cellular compartments are
some of the challenges in this domain. Today's imaging
technologies in cell biology generate a wealth of complex data,
typically consisting of thousands of individual image slices.
Qualitative and quantitative analysis and multidimensional
visualization of this data is a challenging task. The project
proposed here addresses these issues.

vii.) Protein Gel Image Processing and Analysis


Two dimensional (2D) electrophoresis gel images can be used
for identifying and characterising many forms of a particular
protein encoded by a single gene. In order to carry out gel
image analysis, one first needs to accurately detect and
measure the protein spots in a gel image. Existing software
attempts to automate all steps as much as possible, but errors
in the detection and matching stages are common. This means
that gel analysis requires a significant level of operator
interaction, which is very time-consuming. Semi-automated and
fully automated computer vision techniques
can be applied to digitized images of protein spots to speed up
this analysis and to enhance its accuracy.
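
To illustrate the "detect and measure" step described above, here is a
minimal Python sketch that thresholds a tiny grey-level image and groups
neighbouring pixels into spots using 4-connected component labelling. It is
an assumption-laden toy, not the method of any existing gel analysis
package: the function name detect_spots, the fixed threshold and the sample
image are hypothetical, and spots are assumed to be brighter than the
background.

```python
from collections import deque


def detect_spots(image, threshold):
    """Find 4-connected regions of pixels with intensity >= threshold.

    Returns one dict per spot with its pixel area, integrated intensity
    ("volume") and centroid as (row, column). Illustrative sketch only.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    spots = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] < threshold:
                continue
            # Breadth-first flood fill collects all pixels of one spot.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and image[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            area = len(pixels)
            spots.append({
                "area": area,
                "volume": sum(image[y][x] for y, x in pixels),
                "centroid": (sum(y for y, _ in pixels) / area,
                             sum(x for _, x in pixels) / area),
            })
    return spots


# Hypothetical 5 x 6 image containing two spots.
gel = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 8, 0, 0, 0],
    [0, 7, 9, 0, 0, 0],
    [0, 0, 0, 0, 6, 0],
    [0, 0, 0, 0, 7, 0],
]
for spot in detect_spots(gel, threshold=5):
    print(spot)
```

Real gel images need background correction and the separation of
overlapping spots, which is where the detection errors mentioned above
typically arise.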

Please refer to Appendix C for details of the priority areas.

3.1.3 Domain #3 : Bioinformatics Applications


These deal with the applications where bioinformatics can play a very
crucial role, e.g.

• Ecoinformatics
• Biodiversity informatics
• Agriculture informatics (e.g. cattle informatics, etc.)
• Biomedical informatics
• Neuroinformatics
• Cancer bioinformatics
• Microarray bioinformatics
• Personalised medicine

Research work proposed in the technology roadmap for the Bioinformatics
Applications domain involves the following five key areas:

1.) Flora And Fauna Biodiversity Databases


Malaysia is one of the twelve mega-diversity countries of the world,
which together hold about 60% of the known species. There is a need for
research that captures, stores and organizes this huge amount of
biodiversity resources. There will be an urgent need to develop
software tools for data mining, analysis and modelling, and
downstream processing. Development of this database alone will
contribute to enriching future research endeavours globally. The three
major sectors identified as the thrust areas for creating Malaysian
Biodiversity Databases are as follows:

o Malaysian Health Biodiversity Database for Common Diseases Application

The diseases that have been identified and prioritised include cancer,
dengue, tuberculosis and Alzheimer's disease. In this work, the Malaysian
Health Biodiversity Database for Common Diseases Application will be
developed, complete with the hardware and software engineering
investigations. Nonlinear modelling for bioinformatics data mining
will also be developed for defining the DNA sequences of a selection
of those common diseases. New measurement techniques based on
microwave technology will also be developed for DNA pattern
detection in living cells, with a computer-based electronic system for
automatic detection. This project hopes to achieve a large database that
has the following datasets:
• 200,000 samples of DNA/Gene profile database
• 2,000 data samples using Microarray technology

o Malaysian Biodiversity Database for Marine Application

This project is concerned with the development of the Malaysian
Biodiversity Database for Marine Application, complete with the
hardware and software engineering requirements. This project
hopes to achieve a large database that has up to 15,000 samples
of fish species. Other techniques that may be developed
include nonlinear modeling for data mining, automated DNA
detection and classification, and a DNA dot blot kit for marine
characterization and identification.

o Microbial Biodiversity Database

In this sector, databases relating to bioresources (e.g. microbes,
plants and animals) will be developed for future research and
development. The projects in this area would carry out mass
screening of organisms of interest. In line with the availability of those
datasets, there is a need to build a search engine for the bacterial
bioinformatics database. Whole prokaryotic genomes are now
being sequenced in large volumes, and this has led to a dramatic
increase in the number of fully sequenced genomes. However, a
large number of these sequenced prokaryotic genomes have yet to
be characterized, particularly in terms of their coding regions, which
may contain biologically active sites that may serve as early targets
for potential drug development. This project’s novelty lies in the
whole-genome prediction method, where the aim is to evolve artificial
neural networks (ANNs) able to predict coding regions from the
prokaryote’s entire genomic sequence rather than from windowed
subsequences of fixed length.

2.) Agricultural Research Program


o Palm Oil Research
o Rice Research Program
3.) Health-Care Research Program
o Cancer Research Projects
o Tuberculosis Research Projects
o Other Healthcare Projects
4.) Industrial Bioinformatics
o Modeling and Scale Up of Cr(VI) Reduction by Bacteria using Airlift
Bioreactor
5.) Other Future Bioinformatics Applications
o DNA Computing Program
Quantum computers and DNA-based (bio) computers are expected to
emerge as completely new computing technologies as silicon computers
approach their limits. DNA computing is still in its infancy; however,
many scientists believe that it will transform the future of computing,
specifically in pharmaceutical and biomedical applications. This is a
fundamental research area which could be a leveraging technology of
the future. Existing proposals are seen as capacity-building investments
for the future. The initial phase of this research project will involve
developing the knowledge and skills of the teams in DNA-based
information storage and processing, and developing software for
certain applications in bioinformatics to enhance the knowledge and
capabilities of the research group. The second stage will involve
gaining proper training in developing biological logic gates so that
they can be developed further in the future.

Please refer to Appendix D for details of the priority areas.

3.1.4 Domain #4 : Systems Biology


Systems Biology refers to the study of the mechanisms underlying
complex biological processes as integrated systems of many, diverse,
interacting components.

The task that Systems Biology attempts to undertake is the actual
integration of genomics, proteomics and indeed all the emerging “omic”
disciplines, with the ultimate aim of designing biological systems.
It is expected to become a dominant paradigm in biology, and many medical
applications as well as scientific discoveries are anticipated. Furthermore,
it involves:
(1) Collection of large sets of experimental data (by high-throughput
technologies and/or by mining the literature of reductionist molecular
biology and biochemistry),
(2) Proposal of mathematical models that might account for at least
some significant aspects of this data set,
(3) Accurate computer solution (simulation) of the mathematical
equations to obtain numerical predictions, and
(4) Assessment of the quality of the model by comparing numerical
simulations with the experimental data.
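
The four steps above can be sketched in a few lines of Python. This is a
toy under stated assumptions, not a method taken from the proposals: the
measurements are invented for illustration, the model is a simple
first-order decay dx/dt = -k x, the equation is solved with the classical
fourth-order Runge-Kutta method, and model quality is scored by root mean
square error (RMSE). All names (data, decay, simulate, rmse) are
hypothetical.

```python
import math

# (1) Hypothetical measurements as (time, concentration) pairs.
data = [(0.0, 10.0), (1.0, 6.1), (2.0, 3.7), (3.0, 2.2), (4.0, 1.4)]


# (2) Candidate mathematical model: first-order decay, dx/dt = -k * x.
def decay(x, k):
    return -k * x


# (3) Numerical solution using the classical fourth-order Runge-Kutta method.
def simulate(x0, k, times, dt=0.01):
    """Return the simulated value of x at each requested time (ascending)."""
    x, t, out = x0, 0.0, []
    for target in times:
        while t < target - 1e-12:
            h = min(dt, target - t)
            k1 = decay(x, k)
            k2 = decay(x + 0.5 * h * k1, k)
            k3 = decay(x + 0.5 * h * k2, k)
            k4 = decay(x + h * k3, k)
            x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
            t += h
        out.append(x)
    return out


# (4) Model quality: root mean square error between simulation and data.
def rmse(predicted, observed):
    return math.sqrt(
        sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)
    )


times = [t for t, _ in data]
observed = [y for _, y in data]
for k in (0.3, 0.5, 0.7):
    error = rmse(simulate(observed[0], k, times), observed)
    print(f"k = {k}: RMSE = {error:.3f}")
```

Comparing the error across candidate values of k is the simplest form of
step (4); in practice the parameters would be fitted and the model revised
when the mismatch remains large.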

Similarly, the proposals submitted by the local community clearly show that
systems biology is the study of life as a holistic system of
genetic, genomic, protein, metabolite, cellular, and pathway events that
are in motion and interdependent. It is as much about linking the system
components within a single level of biological organization as it is about
studying the links between different levels of biological organization. The
amount and variety of biological data now available, together with
techniques developed so far have enabled research in Bioinformatics to
move beyond the study of individual biological components (genes,
proteins etc), albeit in a genome-wide context, to attempt to study how
individual parts cooperate in their operation. Bioinformatics has now
moved closer to the area of Systems Biology which seeks to integrate
biological data as an attempt to understand how biological systems
function. By studying the relationships and interactions between various
parts of a biological system it is hoped that an understandable model of
the whole system can be developed. Computational and mathematical
approaches to modeling biological systems have also re-emerged as an
essential component of modern life sciences, mostly due to the urgent
need to process, analyze, and interpret increasing amounts of data we are
able to extract from different organizational levels of the biological system.
It is clear that systems biology empowers us both to understand and to
create. With regard to the latter, the approach can, for example, be used to
design better interventions to prevent or cure diseases, or to engineer
whole organisms or parts of them to modulate the activity of specific biological
processes and so affect their interaction with the rest of the system
(with applicability in metabolic engineering, organ transplants, and
biotechnology in general). While the field of systems biology is in its
infancy, the applicability of the approach has already been demonstrated
in the domains of metabolic engineering, biomarker and drug discovery, and
disease characterization. The systems biology domain has earmarked 7
main programmes of research to spearhead the country’s move towards
bio-informatising biology, beginning with e-cell modeling.

Please refer to Appendix E for details of the priority areas.

3.1.5 Domain #5 : Structural Bioinformatics


Structural bioinformatics is a subset of bioinformatics that is concerned
with the use of biological structures – nucleic acids, proteins, lipids,
carbohydrates, ligands etc. and complexes thereof to further our
understanding of biological systems.

The Protein Data Bank reported that about 30,000 protein structures have been
deposited in the database. However, knowledge derived from these structures,
particularly of protein interactions with ligands or other
macromolecules, remains elusive but is greatly needed in order to understand
metabolic pathways and diseases. Hence there is a need to study the
fundamental binding mechanisms using various modeling and simulation
strategies.

There is no doubt that the advent of the Human Genome Project has
contributed to a large and continuously growing amount of data,
which has given rise to the scientific area called bioinformatics. From
the genome sequence data, scientists are now working to derive
protein structures. Knowledge of protein structures is the key to
understanding metabolic pathways and diseases. It is believed that an
understanding of proteins, i.e. their structures, functions and interactions
with other molecules, will lead to new therapies that will revolutionize the
way disease is diagnosed and treated. The branch of bioinformatics
dealing with knowledge of proteins is called structural bioinformatics.

Genome sequencing projects generate tremendous amounts of protein
sequence data, and analysing them requires intensive computational
resources as well as a deep understanding of the subject. Parallel
algorithms are used to cope with these computational requirements while
offering faster processing. The structure of a protein gives rise to its
function. To understand that function, we must be able to deduce or
predict the 3-dimensional structure from the amino acid sequence in a
fast and accurate way. This is of utmost importance to the rapid
discovery of new drugs.

Research work proposed in the technology roadmap on Structural
Bioinformatics involves the following three key areas:
a) Structural Databases Tools
b) Protein Structural Prediction Tools
c) Simulation of Molecular Docking

Please refer to Appendix F for details of the priority areas.

3.1.6 Domain #6 : Molecular Bioinformatics


Molecular bioinformatics comprises the development and application of
informatics for the purpose of acquisition, storage, organization, archive,
analysis, visualisation or interpretation of molecular data and design of
experiments in molecular biology.

In addition, there are 3 areas of interest, as indicated by the proposals
received from the local community:

i.) Sequence Analysis Tools and Algorithms


As more and more genome data is made available, it is
becoming increasingly important to convert this data into
knowledge. Current tools and applications often have
shortcomings or are inappropriate for many recent large scale
projects. Hence, by leveraging upon national Genome
sequencing efforts, it is proposed that a variety of tools will be
developed, modified or adapted to enable sequence searching,
analysis and alignment.
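
As one concrete example of the kind of alignment tool referred to above,
the sketch below scores a global alignment of two sequences with the
standard Needleman-Wunsch dynamic programming recurrence. It is offered as
an illustration of the technique only, not as a tool from the proposals;
the function name and the scoring values (match +1, mismatch -1, gap -1)
are assumptions chosen for simplicity.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal Needleman-Wunsch global alignment score of a and b.

    Only two rows of the dynamic programming table are kept in memory.
    The scoring values are illustrative defaults.
    """
    # Row 0: aligning an empty prefix of `a` against prefixes of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


print(global_alignment_score("GATTACA", "GATTACA"))
print(global_alignment_score("ACGT", "ACT"))
```

The running time grows with the product of the two sequence lengths, which
is one reason why tools for large-scale genome projects rely on indexing
and heuristics rather than exhaustive alignment.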

ii.) Sequencing and management of Genome data


Malaysia has enormous natural biodiversity resources in terms
of indigenous, potentially unique, plant and animal species. In
order to leverage potential industrial and pharmaceutical
applications of this resource, projects based upon large-scale
sequencing of key genomes, as well as a wider screening-based
sequencing approach, are proposed. A parallel project to
establish the database infrastructure to store, manage,
share, assemble, analyse and annotate this new genome data
is proposed to support the genome sequencing and screening
efforts.

iii.) Design, Testing and Development of Diagnostic Microarray Technology

Microarray-based technologies for gene expression profiling
have become critical tools for deciphering disease pathways.
Recent developments have also highlighted their applicability as
rapid, low-cost, highly sensitive diagnostic kits for a variety of
pathogenic and genetic diseases. It is proposed that, as an
example, a microarray will be designed and tested for diagnosis
of all 4 serotypes of the Dengue virus. This will be extended to
other diseases, as well as to applications in biosecurity and food
testing, both "halal" and vegetarian.

Please refer to Appendix G for details of the priority areas.

3.2 Prioritised Areas


In view of the wide coverage of each domain in bioinformatics, the 6 project
facilitators identified and prioritised the key areas (as shown in Table 3 below)
during the meetings involving the domain facilitators on 14 October 2005 and
5 September 2007.

Domain #1 : Governance
i) Establishment of the Steering Committee
ii) National Bioinformatics Resources Management
iii) Capacity Development
iv) Education & Training
v) R&D Team Management
vi) Marketing & Business Development

Domain #2 : Common Enabling Tools
i) Cell modeling & Simulation
ii) Modeling & Simulation
iii) National LIMS database
iv) Knowledge warehouse for Malaysia
v) Bioinformatics data processing
vi) Bioinformatics imaging and analysis tools
vii) Visualization
viii) Pattern Recognition for Bioinformatics
ix) DNA Computing

Domain #3 : Bioinformatics Applications
i) Common Malaysian Cancers
ii) Emerging Malaysian Diseases
iii) Flora & Fauna Biodiversity
iv) Pharmacogenomics/Personalised Medicine
v) Natural products database
vi) Metabolomics
vii) Human Biodiversity
viii) Malaysian Health Biodiversity Database

Domain #4 : Systems Biology
i) Cell modeling & Simulation
ii) Prokaryotic/Eukaryotic Systems Biology
iii) Cancer pathway modeling
iv) Epidemiology modeling
v) Gene networking modeling
vi) Modeling for plant pathogen interaction
vii) Small molecules database tools

Domain #5 : Structural Bioinformatics
i) Target Identification & Modeling
ii) Molecular Interaction
iii) Small molecules database tools

Domain #6 : Molecular Bioinformatics
i) Tools for computing genomics
ii) Genome assembly tools
iii) Tools for personalised medicine & analysis
iv) Microarray chips design and development
v) Diagnostic and biosecurity microarray chips
vi) Clinical trials software
vii) Phylogenetics tools
viii) Metagenomics

Table 3: Key areas of the 6 domains in bioinformatics roadmap

4 TECHNOLOGY ROADMAP FOR BIOINFORMATICS

4.1 The Technology Roadmap

Listed below are the key technologies identified based on the proposed priority
areas (please refer to Appendices B to G for more details) submitted by the
Malaysian Bioinformatics research community. These technologies have a
significant impact in fulfilling the urgent needs of the local research community in
terms of direct access to distributed databases, visualization for exploring
multivariate data and efficient algorithms that permit the analysis, manipulation
and searching of exponentially growing data.

Domain #1: Governance


The Policy & Governance domain involves some major processes which would
help to streamline and coordinate the sharing of resources as well as research
directions.
1. Setting up the National Bioinformatics Coordinating Center, Steering
Committee and International Advisory Panel.
2. Drafting of Steering Committee Roles & Responsibilities, Access Rights
and IP Policies.
3. Drafting of National Bioinformatics Resource Usage
4. Revised Roadmap
5. Decision of Next Phase (if any).

Domain #2 : Common Enabling Tools
1. Data Mining & Knowledge Discovery
2. Bioinformatics Imaging & Analysis
3. Database, Search & Information Retrieval

Domain #3 : Bioinformatics Applications


1. Biomedical
2. Biodiversity Informatics
3. Agriculture Informatics
4. Cancer Bioinformatics

Domain #4 : Systems Biology


1. Process Modeling
2. Metabolic Pathways/Networks

Domain #5 : Structural Bioinformatics


1. Drugs Design/Chemoinformatics
2. Protein & RNA Structure Prediction

Domain #6 : Molecular Bioinformatics


1. Database Development
2. Sequence/Sequence Retrieval
3. Sequence Alignment


* May be subjected to further prioritisation

4.2 Implementation Schedule

4.3 Budget Summary

           YEAR #1      YEAR #2      YEAR #3      YEAR #4      YEAR #5        TOTAL
1       14,758,206   12,467,207   17,320,208    8,106,009    5,702,010   58,353,640
2        3,862,092    3,529,737    2,466,604    1,977,009    1,442,010   13,277,452
3        4,690,000    7,360,000   15,114,000   11,714,000    8,664,000   47,542,000
4       33,086,300   34,175,260   28,674,760   24,382,200   18,514,000  138,832,520
5        9,597,560    6,227,560    5,583,000      438,000      438,000   22,284,120
6        2,240,000    2,640,000    2,330,000    2,330,000    2,930,000   12,470,000
Total   68,234,158   66,399,764   71,488,572   48,947,218   37,690,020  292,759,732
%            23.31        22.68        24.42        16.72        12.87       100.00
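
The row totals, yearly totals and percentage shares in the budget summary
can be cross-checked with a short calculation. The Python sketch below
simply re-adds the figures from the table; the variable names are
illustrative, and the row keys 1 to 6 are copied from the table as given
(they presumably correspond to the six domains, but the table itself does
not say so).

```python
# Requested budget per row and year, copied from the budget summary table.
budget = {
    1: [14_758_206, 12_467_207, 17_320_208, 8_106_009, 5_702_010],
    2: [3_862_092, 3_529_737, 2_466_604, 1_977_009, 1_442_010],
    3: [4_690_000, 7_360_000, 15_114_000, 11_714_000, 8_664_000],
    4: [33_086_300, 34_175_260, 28_674_760, 24_382_200, 18_514_000],
    5: [9_597_560, 6_227_560, 5_583_000, 438_000, 438_000],
    6: [2_240_000, 2_640_000, 2_330_000, 2_330_000, 2_930_000],
}

# Sum across the five years for each row.
row_totals = {row: sum(values) for row, values in budget.items()}

# Sum down each year column, then over all years.
year_totals = [sum(column) for column in zip(*budget.values())]
grand_total = sum(year_totals)

# Share of the grand total requested in each year, in percent.
shares = [round(100 * total / grand_total, 2) for total in year_totals]

for row, total in row_totals.items():
    print(f"Row {row}: {total:,}")
print("Yearly totals:", [f"{t:,}" for t in year_totals])
print("Grand total:", f"{grand_total:,}")
print("Yearly share (%):", shares)
```

Recomputing the figures in this way reproduces the totals and percentages
stated in the table.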

[Bar chart of the total annual budget requested for 2006 to 2010; vertical
axis scaled from 10,000,000 to 80,000,000]

Figure 7: Total annual budget requested

5 RECOMMENDATIONS

The key issues and challenges of the global bioinformatics industry
outlined in section 1 give rise to several distinct recommendations for
the local development of the bioinformatics field. These
recommendations are intended to highlight the needs of the field from three
standpoints, i.e. technology, competency and funding, in order to support the
realisation of the national vision of bioinformatics as described throughout this
report. They build on the recommendations provided by the local
bioinformatics communities, which provide a complementary view of the
important challenges facing the field.

Technology
- High infrastructure costs
- Increasing need for computational resources

Competency Building
- Lack of trained local human capital
- Uncompetitive packages for attracting and retaining foreign experts

Funding
- Slow funding mechanisms

Furthermore, it is also important that the roadmap take cognizance of the
following:
• Clear Government policies that establish core thrust areas.
• Formation of a steering committee with representation from research,
academia and industry.
• Facilitating the creation of a common shared infrastructure.
• Developing strong guidelines that adhere to the highest levels of security
of intellectual property.
• Fostering a research cluster that facilitates the creation of unique and
marketable intellectual property with global markets in mind.

6. REFERENCES

“The Malaysian Bioinformatics Roadmap”, Assoc. Prof. Dr. Amir Merican,
presented during the second national technology roadmap workshop for
bioinformatics, on 27 September 2005, Kuala Lumpur.

7. ACKNOWLEDGEMENTS

This report could not have been completed without the participation and
commitment of the people who contributed their knowledge and time in the
workshops that were organised. MIMOS Berhad gratefully acknowledges
their contributions.

8. LIST OF CONTRIBUTORS

The contributors who volunteered their time and expertise are
listed below (arranged in no particular order). As is apparent, there is a
judicious mix of academia, research institutions and private organizations
that have a potential role to play in the Malaysian bioinformatics industry.

1.1.1 INSTITUTIONS OF HIGHER LEARNING


Name Affiliation
1. Assoc.Prof Dr Rosni Abdullah USM
2. Nuraini Abdul Rashid USM
3. Dr. Habibah A. Wahab USM
4. Dr. Nornisah Mohamed USM
5. Prof. Dr. Zaharin Yusoff MIMOS
6. Dr. Chan Huah Yong USM
7. Wahidah Hussain USM
8. Zurinahni Zainol USM
9. Omar Shawkataly USM
10. Mohd. Razip Samian USM
11. Assoc. Prof. Dr. Mandava Rajeswari USM
12. Rashidah USM
13. Dr. Tan Do Yew Unisel
14. Prof. Dr. Lokman Shamsudin Unisel
15. Hasdianty Abdullah Unisel
16. Assoc. Prof.Dr. Suhaimi Napis UPM
17. Prof. Datin Dr Khadijah Mohd Yusoff UPM
18. Assoc. Prof Raja Noor Zaliha Raja Abdul Rahman UPM
19. Dr. Michael Wong UMS
20. Prof. Datin Dr. Ann Anton UMS
21. Dr. Jason Teo UMS
22. Dr. Patricia Anthony UMS
23. Dr. Vijay Kumar UMS
24. Dr. Ibrahim Ali Noorbatcha IIUM

25. Prof. Dr. Jubair Jwamear A. S. Al-Jaafer IIUM
26. Dr. Ahmad Aman IIUM

27. Dr. Kamarul Ariffin B. Khalid IIUM, Kuantan
28. Dr. Goh Yong Kheng UTAR
29. Assoc. Prof. Dr. Tham Choy Yoong UTAR
30. Dr. Loke Kar Seng Monash University
31. Dr. Chua Tock Hing Monash University
32. Assoc. Prof. Mohd. Tajuddin Abdullah UNIMAS
33. Dr. Hairul Azman Roslan UNIMAS
34. Dr. Edmund Sin UNIMAS
35. Dr. Awang Ahmad Salehin UNIMAS
36. Emran B. Mohd. Tamil UM
37. Assoc. Prof. Khairuddin Hj. Itam UM
38. Assoc. Prof. Dr. Amir Feisal Merican UM
39. Prof. Rauzah Hashim UM
40. Prof Dr Norsaadah Abdul Rahman UM
41. NoorZaily bin Mohamad Nor UM
42. Assoc. Prof. Dr. Rofina Yasmin Othman UM
43. Mohd. Hasbullah Omar UUM
44. Assoc. Prof. Azizi Zakaria UUM
45. Assoc. Prof. Dr. Muhammad Suzuri Hitam KUSTEM
46. Prof. Dr. Md. Yazid Mohd Saman KUSTEM
47. Dr. A Aziz Ahmad KUSTEM
48. Assoc. Prof. Dr. Jennifer Harikrishna MUST
49. Dr. Ng Kim Yong MUST
50. Prof. Dr. Safaai Deris UTM
51. Assoc. Prof. Dr. Naomie Salim UTM
52. Dr. Mohd Shahir Shamsir UTM
53. Prof. Dr. Rahmah Mohamed UKM
54. Dr. Zeti Hussein UKM

55. Dr. Mustaffar Kamal Hamzah UiTM
56. Mohd Zulkifli UiTM
57. Mohd. Ali Isa UiTM
58. Prof. Dr. Abu Bakar Abd. Majeed UiTM
59. Dr. Chin Chiew Foan MMU

1.2 Research Institutes


60. Dr Marzalina Mansor FRIM
61. Dr Norwati Mohamad FRIM
62. Faizatul Shima Mohd Yunus FRIM
63. Dr. Saeid Reza Doust Jalali FRIM
64. Datin Eileen Yen Ee Lee Sarawak Biodiversity Centre
65. Mr. Gilbert Lau Sei Kung Sarawak Biodiversity Centre
66. Prof. Dr Faridah Habib Shah Melaka Institute of Biotechnology
67. Dr. Bhore S. J. Melaka Institute of Biotechnology
68. Siti Hasrah Mohamed Kassim Melaka Institute of Biotechnology
69. Nor Sharmila Sha'aban Melaka Institute of Biotechnology
70. Mohd. Yusof Radzuan Saad BIOTEK, MOSTI
71. Leslie Low Eng Ti MPOB
72. Dr. Umi Kalsum Abu Bakar MARDI
73. Maheswary Vellupillai MARDI
74. Prof Che Husna BJK, KPT
75. Vikneswaran Gopal BJK, KPT
76. Prof. Dato Ir. Dr. Mashkuri Yaacob MIMOS
77. Dr. Lai Weng Kin MIMOS
78. Luke Jing Yuan MIMOS
79. Noor Aida Abdullah MIMOS

80. Irdawati Ab. Rahman MIMOS
81. Melissa Seah MIMOS
82. Loy Chen Change MIMOS
83. Sulaiman MIMOS
84. David Lo MIMOS
85. Dr Wan Latifer MIMOS
86. Mejar (R) Zalaini Safari MIGHT

Private organizations
87. Dr. Hirzun Mohd. Yusof Sime Darby
88. Nivashini Sime Darby
89. Nadirah Bt. Md Nasaruddin Sime Darby
90. Dr. Chan Yoke Fun Sime Darby
91. Abd Karim Hercus Synamatix
92. Dr Arif Anwar Synamatix
93. Nicholas K. C. Low OSS
94. Farul Azim Mohd. Ghazali OSS
95. Siah Eng Thian OSS
96. Dr. D. T. Singh Infovalley Life Sciences
97. Soo Shang Yew Infovalley Life Sciences

APPENDIX A

Motivations for Roadmapping

1. Roadmapping is just good planning, for all the areas that contribute to a
successful product line. The roadmapping process leads a cross-functional
planning team to fully examine potential competitive strategies and ways
to implement those strategies. Technology decisions are made as an
integral part of the plan, not just an afterthought.
2. Roadmaps incorporate an explicit element of time. Roadmapping helps the
team make sure that they will have the technologies and capabilities at the
time they will be needed to carry out their strategy.
3. Roadmaps link business strategy and market data with product and
technology decisions. Roadmapping prompts a team to be specific with
respect to planned features or performance in terms of value for
customers.
4. Roadmaps reveal gaps in product and technology plans. Areas where
plans are needed to achieve objectives become immediately apparent,
and can be filled before they become problems.
5. Roadmaps prioritize investments based on drivers. At every stage of the
roadmapping process, the focus is on the few most important things:
customer needs, product drivers or technology investments. The team is
prompted to identify, implement, develop, or acquire the most important
things first, spending time and resources in the best way. Also, with a set
of roadmaps in a common format, portfolio decision makers are better
equipped to make the tradeoffs and choices that meet the corporation’s
objectives.
6. Roadmapping helps set more competitive and realistic targets. Product
performance targets are set in terms of the industry competitive
landscape. For example, experience curves are an especially useful tool
for establishing industry based targets. Recognizing that a winning product
strategy usually cannot be all things to all people, the team sets objectives
to lead, maintain parity, or lag competitors in specific areas.
7. Roadmaps provide a guide to the team, allowing the team to recognize
and act on events that require a change in direction. Part of the process of
developing a roadmap is to create a risk roadmap, identifying those events
or changes in conditions that signal a need to reevaluate and revisit the
plan during the development journey.
8. Sharing roadmaps allows strategic use of technology across product lines.
Cross-roadmap reviews look across the plans of several product lines to
find common needs, capabilities that can be leveraged, or development
costs that can be shared. Roadmaps can also support a common
corporate database of available or needed technologies.
9. Roadmapping communicates business, technology and product plans to
team members, management, customers, and suppliers. With a roadmap,
a team can clearly explain to customers and suppliers where they are
going. A roadmap gives customers information they can use in their own
planning, and can be used to solicit their reaction and guidance. With
suppliers, a roadmap is a framework for partnership and direction setting.
The roadmap also tells the larger development team, corporate
management, and other development teams where the product line is
headed.
10. Finally, roadmapping builds the development team. The roadmapping
process builds a common understanding and shared ownership of the
plan, incorporating ideas and insights from team members representing
the many functions involved in a successful development process.

Source : http://albrightstrategy.com/ (accessed on 10 October 2005)

APPENDIX B

TECHNOLOGY PRIORITY AREA: GOVERNANCE

Research Programmes Outputs Outcomes Impact


1 Biotechnology Research and Commercialization Database.
  - To provide a comprehensive digitized information system that can allow companies and investigators to identify local R&D projects for investment and potential collaboration.
  Outputs: Research and commercialization database.
  Outcomes: Improved commercialization rate of R&D outputs.
  Impact: Promotes biotech investment in Malaysia. Higher commercialization rate of R&D output. Research projects with higher commercialization and IP value.

2 Access rights for participants of the National Bioinformatics Resource Centre (NBRC).
  - To define the participation and access rights of various grades of membership.
  Outputs: Policy for national bioinformatics resource centre.
  Outcomes: Clear terms of membership.
  Impact: Promotes sharing of national resources.

3 Establishment of specialized biological grids and the Malaysian Integrated Bioinformation System (MyBIS)
  - To build an organized and structured biological information system for the country.
  Outputs: A national biological information system.
  Outcomes: Promotes and facilitates real time sharing of specialized resources e.g. data, information, compute power, etc.
  Impact: Higher efficiency and effectiveness of services, yield or quality of products or practices in target areas e.g. agriculture, ICT, biomedical, etc.
4 Intellectual property policies for the National Bioinformatics Resource Centre (NBRC).
  - Defines the set of IP policies to ensure correct usage of resources from the NBRC.
  Outputs: A set of IP policies.
  Outcomes: Promotes efficient sharing of resources.
  Impact: Ensure national resources are adequately protected.

5 Steering committee for the National Bioinformatics Resource Centre.
  - A steering committee who will have the authority to make appropriate decisions related to the NBRC.
  Outputs: Steering committee for the NBRC.
  Outcomes: Facilitates the smooth operation of the NBRC.
  Impact: Promotes efficient and effective sharing of national bioinformatics resources.

APPENDIX C

TECHNOLOGY PRIORITY AREA: COMMON ENABLING TOOLS

Research Programmes Outputs Outcomes Impact


1 Development of Toolkits for Vendor Modality-agnostic Microarray Data Processing
  − To develop portable, high-performance toolchest for the processing of microarray images
  Outputs:
  • A high-performance, open-standards based software toolchest for processing of microarray image data, that is modality-agnostic and vendor-agnostic
  Outcomes:
  • Software-based tools to assist the large number of scientists that use microarray data from collaborating laboratories
  Impact:
  • Increase productivity of scientists working in initiatives in genome-research, systems biology
2 Research and Development of Open Source Toolbox for Intelligent Microarray Data Analysis based on Machine Intelligence Algorithms
  − To develop image processing, gene selection and pattern recognition techniques for DNA
  Outputs:
  • A toolbox of microarray analysis tools and database
  Outcomes:
  • Microarray analysis tools which have clearly shown their effectiveness on real data of major diseases or fauna and flora
  Impact:
  • Enhance growth of Malaysian biotechnology industry
  • Enhance the efficiency and throughput of the research community in detecting, identifying and curing debilitating diseases in the country and globally
3 Wavelet based Microarray Image Denoising and Compression
  − To develop a set of applications that perform image denoising and compression specifically using appropriate wavelet techniques
  Outputs:
  • Fast and efficient image denoising of microarray image data using wavelets
  • Microarray image compression using wavelets
  Outcomes:
  • Applying and extending image processing and wavelet techniques in microarray research
  • Bridging the gap between computer scientists and biology experts
  Impact:
  • Bring significant value to the research community
4 BioD NatPro Database System
  − To develop a traditional knowledge database
  Outputs:
  • Traditional knowledge database
  • Natural product database
  • Improved BioD NatPro Database System for managing biodiversity, traditional knowledge and bioprospecting data
  • Database framework for traditional knowledge documentation
  Outcomes:
  • Enhanced natural product and novel compound discovery
  • Documentation of traditional knowledge of indigenous communities
  • Inventory of Sarawak's useful biological resources
  Impact:
  • Coordination of key biodiversity and research information which will enhance natural product discovery effort
5 Integrated Fungus Databases for Malaysian Plant Protection with Text and Image Query System
  − To develop an integrated and comprehensive database with image processing tools for plant protection purposes
  Outputs:
  • A centralized local data “warehouse” of all relevant fungus databases which are imported from multiple public fungus databases
  • A visual interface web-based image browser and a user-friendly search engine for the user to perform image and text queries
  • The image processing and analysis tools to identify fungus types from the local database
  Outcomes:
  • Provide basic and applied research support on the fungal pathogens of agricultural field
  Impact:
  • Better understanding on protecting, controlling and recovering of fungus infections for plant protection.
  • Increase the production of agriculture goods
6 Sharing of Biodiversity Databases Using Content-Based Image Retrieval (CBIR) and Intelligent Agents (IA)
  − To develop technology for sharing biodiversity databases using intelligent agents for textual data and Content Based Image Retrieval (CBIR) for images
  Outputs:
  • Technology for biodiversity database sharing
  • Text and image retrieval techniques
  • Networking of Malaysian Biodiversity Repositories and Warehouses
  Outcomes:
  • Improved technology for Bioinformatic Data sharing and collaboration based on Malaysian Biodiversity
  Impact:
  • Enhance growth of Bioinformatic research within the Malaysian research community
  • Strengthen Malaysia as a regional hub for bioinformatic research and development
7 Integrated Search Engine and Information Retrieval Tools for Bacterial Bioinformatics Database
  − To build a search engine for the bacterial bioinformatics database
  Outputs:
  • Software to be used as search tool and information retrieval tool
  Outcomes:
  • Reduce the time spent on searching and getting data from public sources
  Impact:
  • Increased productivity; researchers can concentrate on real bioinformatics work
8 Generic Data Mining Tool for Malaysian Clinical Pathological Data
  − To develop a generic data mining tool to address the need for dynamic and customisable data mining
  Outputs:
  • Bioinformatics service multiplexor,
  • Dynamic data mining engine,
  • Output abstraction engine, and
  • Novel data mining pipeline with components to be used in the bioinformatics service multiplexor
  Outcomes:
  • Makes data mining easier, reduces data mining time and saves cost
  Impact:
  • Advancement of bioinformatics and biotechnology research in Malaysia.
  • Lead to new discoveries that could benefit the bioinformatics community and the general public
9 Integrated Knowledge Discovery Data Mining (tool) Framework for Genome and Gene Expression Data Sources
  − To develop a suitable algorithm to analyze the gene characteristics
  Outputs:
  • Web-service portal for integrating genome databases
  • Enhanced algorithm for analyzing gene characteristics
  • New patentable data mining system
  Outcomes:
  • Algorithm to analyze the gene data
  • Web service based integrated knowledge discovery data mining framework for genome
  Impact:
  • Suitable tools and techniques for the Malaysian researchers in the life science field
  • Strengthen Malaysia as a regional hub for biotechnology research and development especially in the genome domain
10 Integrating Data Mining and Visualization for medical data
  − To define the medical domain and collect the respective data
  Outputs:
  • A visual data preprocessing algorithm for data preparation of data mining.
  • A visual data mining tool for medical data
  Outcomes:
  • Improved efficiency of research in multi-domain fields such as health informatics, biomedical and bioinformatics
  Impact:
  • Strengthen Malaysia's position in various aspects of medical research.

11 Knowledge Warehouse for Malaysia Bioinformatics Literature
  − To develop digital library to contain this information in structured manner and develop data mining/search engine application to make digital library efficient for storing bioinformatics content
  Outputs:
  • Digital library containing bioinformatics data.
  • Enhanced search engine for bioinformatics data.
  • Enhanced algorithm for search engine tailored to Bioinformatics data
  Outcomes:
  • Improved efficiency in search engine design
  Impact:
  • Enhance growth of Malaysian biotechnology industry
  • Become central repository for Malaysian Bioinformatics literature
12 Semantic Search Engine for Biomedical Literature.
  − To develop a semantic search engine retrieving biomedical literature data from PubMed database
  Outputs:
  • A semantic search engine for biomedical literature
  • Novel tools/software developed for a medical semantic search engine.
  Outcomes:
  • Enhanced search and retrieval capabilities (precision and recall) of the existing PubMed.
  • Reduced time spent combing through the list of results returned to find related/relevant information.
  Impact:
  • The repositories of biomedical literature along with their semantic metadata can value-add the current PubMed to provide more complete knowledge sharing services available publicly.
13 An Intelligent Bio-Informatics Analysis Tool based on Parsing Techniques and Ontology Framework
  − To develop a tool for identifying the types of proteins that are present in an organism by analysing its DNA strand
  Outputs:
  • A domain ontology about proteins and amino acids
  • Knowledge acquisition tools tuned for the bioinformatics domain
  • A parser that recognises protein types from DNA sequences
  • Regular expression specifications describing amino acid sequences
  • Context free grammar specifications describing the proteins and their constituent amino acids
  Outcomes:
  • Aiding bioinformatics researchers in analyzing protein contents of samples
  Impact:
  • Enhance growth of Malaysian biotechnology industry and strengthen Malaysia as a regional hub for biotechnology research and development
14 Cell Biology - Image Processing, Analysis and Visualization for microscope cell images
  − To research and develop systems and tools for high throughput image analysis of 2D cell images and automated segmentation and registration of 2D image slices for 3D visualization
  Outputs:
  • Research and development of a high-performance, open-standards based software toolchest for processing, analysis and visualization of 2D and 3D cell biology images
  Outcomes:
  • Effective, usable, accessible software-based tools to assist scientists across disciplines and skill-sets to participate in systems biology projects.
  • Applying and extending the expertise of the research team in image processing and visualization techniques to cell biology.
  • Bridging the gap between computer scientists and biology experts.
  Impact:
  • Direct impact on productivity of scientists working in initiatives in proteome-research, systems biology
  • The system and tools developed will be of significant value to the research community.

15 Enhancement and Analysis of 2-Dimensional Electrophoresis Gel Images using Computer Vision
  − To research and develop systems and tools for enhancement and analysis of 2-dimensional electrophoresis gel images using computer vision
  Outputs:
  • A set of computer vision applications that perform denoising, enhancement, artifact correction and segmentation of protein spots in 2D gel images.
  • Algorithms for registration (or matching) of 2D gel images and statistical analysis for interpretation by biologists/pharmacists.
  Outcomes:
  • Effective, usable and efficient computer software that could assist in proteomic research and drug discovery process.
  Impact:
  • New advances in 2D gel image analysis algorithms.
  • Speed up related processes such as drug discovery, diagnosis and identification of proteins involved in various diseases.

APPENDIX D

TECHNOLOGY PRIORITY AREA: BIOINFORMATICS APPLICATIONS

Research Programmes Outputs Outcomes Impact


1 A Bioinformatics Approach to the Management of Breast Cancer
  − To develop an integrated bioinformatics system that can be used for clinical application in managing breast cancer patients.
  Outputs:
  • Portable data-acquisition system for hospital
  • Database system on breast cancer
  • A cancer tissue bank
  Outcomes:
  • Increased awareness on breast cancer
  • Help to create paperless hospital
  Impact:
  • Recognition of the centre of excellence
  • Contribution to health tourism

2 Biomarkers Discovery and Proteomic Profiling Technology for Cancer
  − To develop new diagnostic test based on blood serum for early cancer detection
  Outputs:
  • Proteomic data sets for cancer research
  • New software tools for cancer identification
  Outcomes:
  • Improved diagnostic strategy for early detection of cancer
  Impact:
  • Improved national health status by increasing cancer survival rate
  • Potential commercializable screening test

3 Breast Cancer Diagnosis and Analysis System using Distributed Neural Network
  − To develop an intelligent distributed neural network application that can be used as a diagnostic tool/classifier for breast cancer.
  Outputs:
  • An intelligent cost-effective breast cancer diagnosis/classification Internet application.
  • A shared statistical database related to breast cancer
  Outcomes:
  • An intelligent cost-effective breast cancer diagnosis/classification Internet application.
  Impact:
  • A universal cost-effective tool that is referenced by all centers to diagnose/classify breast cancer
4 E-Cervical Cancer Research In A Cluster Grid Environment: Automated Screening, Herbal Drug Discovery, and Schema Induction
  − To create an automated image screening system in grid environment for detection of pre-cancerous lesions and cancer of the cervix on Pap smears.
  Outputs:
  • To create an automated image screening system for detection of pre-cancerous lesions and cancer of the cervix on Pap smears
  • To identify potential drug target/receptors implicated in cervical cancer and perform in-silico screening on potential chemical compounds for the treatment of cervical cancer from herbal sources.
  • To ascertain whether the induced schema on cervical cancer enhances the subjects’ knowledge and awareness on cervical cancer.
  Outcomes:
  • Help speed up detection of pre-cancerous lesions and cancer of cervix on Pap smears.
  • To improve understanding of cervical cancer.
  • To improve herbal drug discovery.
  Impact:
  • Improved healthcare through early detection of cervical cancer.
  • Potential commercializable new herbal remedies for cervical cancer.
  • Enhance growth of Malaysian biotechnology industry.

5 Flavonoids Inhibition of Breast Cancer Resistance Protein
  − To develop an effective model for virtual screening of anti-breast cancer resistance protein (BCRP) activity of flavonoids
  Outputs:
  • A virtual screening model for breast cancer resistance protein (BCRP) inhibition activity in cancer treatment
  • Findings of flavonoid analogues with potent BCRP inhibition activities
  Outcomes:
  • Result is pertinent for herbal drug development
  Impact:
  • Enhance growth of Malaysian biotechnology industry
  • Strengthen Malaysia as a regional hub for biotechnology research and development
6 Malaysian Health Biodiversity Database for Common Diseases Application
  − To develop Malaysian Health Biodiversity Database for Common Diseases Application complete with the hardware and software engineering requirements
  Outputs:
  • Large-scale Biodiversity Bioinformatic database
  • System for auto detection and classification of DNAs
  Outcomes:
  • Improved datasets based on Malaysian Marine Biodiversity database
  Impact:
  • Enhance growth of Bioinformatic research within the Malaysian research community
  • Strengthen Malaysia as a regional hub for bioinformatic research and development
7 Systematics and Applications of Secondary Metabolites as Potential Drug Candidates
  − To develop an e-resource that will provide information on taxonomy, ethnomedical usage, bio-activity assay results and metabolic profiles
  Outputs:
  • New database and e-resource
  Outcomes:
  • Identification of novel bioactive compounds as potential drugs
  • Improvement of yield of metabolites
  • Development of a model that optimizes the in vitro (tissue culture) production of metabolites
  Impact:
  • Development of potential drugs from secondary metabolites derived from natural resources
  • Minimize cost for treatments using drugs from secondary metabolites
8 Oil Palm genomics: Massively Parallel Sequencing and a Customized Bioinformatics Architecture for Sequence Assembly and Annotation
  − To develop a real-time, group-wide data management tool for monitoring of palm oil quality and to enhance its traceability
  Outputs:
  • Annotated oil palm genome sequence with several newly identified structural and regulatory genes critical for taking the oil palm productivity to the next level
  • Scalable and dedicated oil palm bioinformatics architecture integrating all the relevant plant MPS databases
  Outcomes:
  • A national facility for oil palm genomics utilizing the state of the art MPS technology
  • A national Oil Palm Genomics Grid (OPGG) with state of the art bioinformatics facilities
  Impact:
  • To take oil palm productivity to the next level to maintain and enhance our global competitiveness
9 Real-time, Group-wide Data Management Tool for Palm Oil Quality and Traceability
  − To develop a real-time, group-wide data management tool for monitoring of palm oil quality and to enhance its traceability
  Outputs:
  • Group-wide oil palm management system
  Outcomes:
  • Better management of plantation.
  • Higher oil quality
  • Oil traceability
  Impact:
  • Enhanced growth of Malaysia oil palm industry
  • Better-managed overseas plantation
10 Customized Bioinformatics Architecture for the Development of "Herbal Rice"
  − To identify, isolate and characterize the sesquiterpenoid cyclase gene from Polygonum hydropiper
  Outputs / Outcomes / Impact:
  • Rice grains with enhanced antioxidant properties
  • Improved national capabilities and expertise in rice metabolic pathway engineering research
  • A state of the art transgenic rice facility
  • A state of the art metaboloinformatics facility for pathway simulation studies in the area of natural product research
  • A national level demonstration for the efficient utilization of natural product research and bioinformatics for the development of a commercial product with a ready market at global level
11 Development of Database and Expert System for Weed Management in Malaysian Direct Seeded Rice Ecosystem
  − To develop a computerized decision support system (DSS) to efficiently and effectively integrate biological, agricultural and technical information
  Outputs:
  • Database for crop management
  • Expert system technology
  • Effective management protocols on weedy rices
  Outcomes:
  • Development of effective extension services for rice farmers in Malaysia
  Impact:
  • Enhance growth of Malaysian Agricultural Bioinformatic Research
  • Strengthen and modernise Malaysian information system on crop management
12 Borneo Biodiversity Databases
  − To develop a biodiversity portal and also the enabling applications to use the database
  Outputs / Outcomes / Impact:
  Bioinformatics and biodiversity portal and also the enabling applications
13 Malaysian Biodiversity Database for Marine Application
  − To develop Malaysian Health Biodiversity Database for Marine Application complete with the hardware and software engineering requirements
  Outputs:
  • Large-scale Marine Biodiversity Bioinformatic database
  • System for auto detection and classification of DNAs
  Outcomes:
  • Improved datasets based on Malaysian Marine Biodiversity database
  Impact:
  • Enhance growth of Bioinformatic research within the Malaysian research community
  • Strengthen Malaysia as a regional hub for bioinformatic research and development
14 Setting Up of A Bioinformatics Database For Bacterial And Living Organisms In Sabah
  − To carry out a systematic isolation, characterization and setting up of a bioinformatics database on the bioresources found in Sabah and Malaysia
  Outputs:
  • Bioinformatics database for bacteria and other bioresources in the state of Sabah
  Outcomes:
  • Improved efficiency in usage of bio resources
  Impact:
  • Enhance growth of Malaysian biotechnology industry using local bioresources
  • Strengthen Malaysia as a regional hub for biotechnology research and development
  • More attractive for biotechnology investors and drug manufacturers
15 Modeling Aedes aegypti Distribution and Dengue Eradication
  − To develop advanced scientific knowledge of A. aegypti population dynamics from modeling simulations / analysis
  Outputs:
  • Mathematical simulation models/graphics and scientific knowledge for A. aegypti population dispersal dynamics
  Outcomes:
  • Control of dengue epidemics
  • Mathematical and computational sciences in epidemiology and bioinformatics
  Impact:
  • Reduction in frequency and cases of dengue infections
16 Prediction of Prokaryote Coding Regions Using Differential Evolution and Neural Networks
  − To design, develop and implement gene prediction system using ANNs
  Outputs:
  • Evolutionary ANN system for prokaryotic gene prediction based on whole genomes
  • New patentable evolutionary neural model for gene prediction
  Outcomes:
  • Identification of coding regions in sequenced prokaryotic genomes
  • Improved prediction rates in coding region characterization
  Impact:
  • Fast characterization of bacterial genomes
  • Early detection of potential drug candidates from bacterial genomes
  • Increased revenue and long-term, sustainable growth for Malaysian biotechnology sector
17 Virtual High Throughput Screening, Design Development of Anti-tuberculosis Agent(s)
  − To develop a new high throughput virtual screening tool in drug discovery using TB as an example
  Outputs:
  • New technology platform for drug discovery.
  • Molecular docking strategies
  • De novo design tools
  Outcomes:
  • Cost saving using cheaper new drugs
  • Recognition as a centre of excellence
  Impact:
  • Enhance growth of Malaysian bioinformatics/pharmaceutical industry
  • Strengthen Malaysia as a regional hub for pharmaceutical/biotechnology research and development
  • More attractive for biotechnology investors and drug manufacturers
18 Modeling and Scale Up of Cr(VI) Reduction by Bacteria using Airlift Bioreactor
  − To develop a mathematical model based on a tanks-in-series model with backflow to simulate the Cr(VI) bioreduction in airlift bioreactor as well as to estimate its performance in laboratory and pilot-plant scale bioreactors
  Outputs / Outcomes / Impact:
  • This model may be used as a scale-up and optimization tool to evaluate the influence of various design variables and kinetic parameters on the expected performance of airlift bioreactor
  • Allow more rapid and inexpensive studies of a wide range of design and operating conditions of airlift bioreactors as compared with laboratory or pilot-plant scale experiments with living cultures
19 Studies of Pathogen-induced Proteins and Genes in Important Crops of Malaysia
  − To compare the production of proteins and genes expressed in diseased crops
  Outputs:
  • A database collection of the series of protein-genes expressed in response to disease in major crops in Malaysia
  • A more efficient computing of protein, gene (DNA) analysis
  Outcomes:
  • Pathogen-induced cDNA and protein library for selected crop
  • Novel protein/gene candidate as a marker for pathogen-related infection
  Impact:
  • Strategy to produce diagnostic kit or resistant plant
20 Viral Epidemic Response System / Bioinformatic Base Analysis for Viral Epidemic Response System
  − To provide comprehensive resources to the scientific community to access, analyze, and study molecular data for those infectious diseases as interoperable components to support biological synthesis
  Outputs:
  • Genomic and proteomic analysis software and other bioinformatic tools
  Outcomes:
  • Integrated epidemic response system
  • Provide fundamental knowledge or a better understanding of how infectious viruses evolve, spread and cause disease
  Impact:
  • An adequate national surveillance and epidemiology response capacity
21 Developing Malaysian DNA-Based Bio Computer
  − To develop DNA-based (bio) computers, a technology in computing as opposed to the limit-reaching silicon computers
  Outputs:
  • New breed of software and new DNA-based technology
  Outcomes:
  • Put Malaysian scientists on the same level as industrial countries
  Impact:
  • Considerable gain in knowledge and economy

APPENDIX E

TECHNOLOGY PRIORITY AREA: SYSTEMS BIOLOGY

Research Programmes Outputs Outcomes Impact

1 Holistic epidemiology diagnostic and therapeutics for genetic disorders common in Malaysia
  − To provide a tool for drug discoveries
  Outputs:
  • Software and drug discoveries with possibility for commercialization.
  • Tools for drug discoveries.
  Outcomes:
  • Provide local information on the current state of NDDs.
  • Epidemiological studies to determine future patterns of NDDs.
  • Tool in drug discoveries.
  Impact:
  • Quality of life enhanced.
  • Commerce and development of drugs in Malaysia boosted.
  • Global attraction for international scientists and pharmaceutical industries to our country.
2 In Silico Systems for Biopathways
  − R & D in biopathways required
  Outputs:
  • Intelligent and high-performance toolbox for biopathways modeling, simulation and analysis.
  • Standard data exchange format for sharing of pathway information.
  • New patentable and enhanced algorithm for modeling, simulating, predicting and aligning the biopathways including its publications.
  Outcomes:
  • Functional understanding and interpretation of gene function improved.
  • Development and use of technologies, standards and resources for biopathways management supported and coordinated.
  Impact:
  • Catalyze the emergence and development of computational pathways biology.
  • The setting-up of reference center and increase local expertise.
  • Creates international partnerships.
  • Creates potential products for commercialization.
3 Modelling of Various Living Cells Using E-Cell Simulation
  − To develop various cell modeling related to various categories of cell, including diseases, bacteria and virus
  − To provide convenient research environment to understand the behaviour of various cells as an implemented computer model
  Outputs:
  • Cell modeling for various cell types.
  • Enhanced cell modeling for future implementation of complete biological systems modeling.
  • New techniques for wet-laboratory measurements and signal processing techniques for identification purposes.
  Outcomes:
  • Improved understanding on various cells related to health applications.
  • Improved understanding on various measurement techniques on the various living organism behaviours.
  Impact:
  • Enhance growth of cell research within the Malaysian Bioinformatic Research community.
  • Strengthen Malaysia as a regional hub for bioinformatic research and development.
4 Systems Biology of Chlamydomonas reinhardtii in Response to Environmental Stresses
  − To formulate suitable graphical and mathematical models to explain and predict the impacts of abiotic stresses on photosynthetic living organisms using Chlamydomonas reinhardtii as a model
  Outputs:
  • Graphical and mathematical models for C. reinhardtii which can be extended to more complicated eukaryotic and photosynthetic plants of economic importance.
  • Biological data sets at transcriptomic, proteomic and metabolic levels.
  • Novel tools/software developed for systems biology.
  Outcomes:
  • Improved understanding of the biological system of a eukaryotic and photosynthetic model, which can be extended to more complicated eukaryotic and photosynthetic plants of economic importance.
  • Enhanced understanding of the applicability of mathematical models in explaining and predicting biological systems.
  Impact:
  • A system biology platform involving photosynthetic organisms/plants.
  • Training of Malaysian researchers on systems biology; it will be a gain of experience for biologists and computer scientists to collaborate.
  • Establishment of international collaborations with C. reinhardtii research communities and other bioinformatics groups.
  • Established models providing useful information to enhance the usage of Chlamydomonas sp. or other microalgae in commercial biocell factories for pharma- and nutra-ceuticals industries.
5 A Systems Biology Approach to Bioprocess Improvement
  − To use data from an E. coli model to develop and validate a suite of novel software tools for systems biology
  Outputs:
  • Novel software with possibility for commercialization
  • Models for improved heterologous protein production in E. coli
  • Improved production strains of E. coli (for monoclonal antibody expression)
  Outcomes:
  • Bioprocess improvement which can contribute to industrial, food and pharma sectors
  • Further insights into gene expression and regulatory systems of E. coli
  • Improved understanding of biological applicability of network and system models
  Impact:
  • Enhanced growth of industry involving bioprocesses
  • Raise profile of Malaysia for Systems Biology expertise in training and R&D
  • Facilitated international collaboration via the data protected interface
  • Creation of opportunities for investment in systems biology in Malaysia

APPENDIX F

TECHNOLOGY PRIORITY AREA: STRUCTURAL BIOINFORMATICS

Research Programmes Outputs Outcomes Impact


1. Fundamental Theory of Protein Folding

- Develop a novel protein folding model based on dissipative particle dynamics.
- Verification of the model by AMBER or CHARMM
- Optimize model for rapid protein structure prediction
- Establish possible strategies to prevent diseases related to incorrect folding by studying the mechanism of misfolding
- Build a flexible and convenient integrated platform for protein modelling work

Outputs:
• A novel protein folding model based on dissipative particle dynamics
• An integrated and extensible protein simulation and modelling platform

Outcomes:
• Improve understanding of protein folding theory
• Suggestions and strategies for treating diseases related to incorrectly folded proteins

Impact:
• Reduce operation time and cost for protein structure prediction, and speed up new drug discovery
• Contribute to treatments for diseases related to incorrectly folded proteins

2. Parallel Conformational Search Algorithm for Ab Initio Protein Tertiary Structure Prediction Using Honey Bee Colony

- Using the honey bee colony to develop a parallel protein conformational search algorithm to efficiently find the global minimum protein conformation.
- Determine the structure of an input protein sequence.

Outputs:
• Parallel biologically inspired conformational search algorithm for protein tertiary structure prediction.
• Protein tertiary structure prediction tool.

Outcomes:
• Improve the overall performance of the protein structure prediction process.
• Bridge the gap between known protein sequence and structure.

Impact:
• Enhance the industries which are related to protein tertiary structure.
• Strengthen the position of Malaysia among the countries which are trying to solve the protein structure prediction problem.

3. Parallel Protein Secondary and Tertiary Structure Recognition

- Development of protein secondary and tertiary structure recognition process.
- Apply parallel processing to make the process compute faster
- Provide prerequisite knowledge to understand protein secondary and tertiary structure.
- Achieve the mantra of today's computing, which is to have high performance computing at low hardware cost.

Outputs:
• Statistical method software for protein secondary structure recognition.
• Bipartite graph matching software
• Parallel version of statistical method for protein secondary structure recognition.
• Parallel version of bipartite graph matching algorithm.

Outcomes:
• Improve understanding of protein folding structures.
• Suggestions and strategies for treating diseases related to incorrectly folded proteins.

Impact:
• Enhance growth of Malaysian biotechnology industry.
• Fast recognition tools for protein secondary and tertiary structure.
• Valuable informational task for rational drug design.
• Strengthen Malaysia as a regional hub for biotechnology research and development.
• More attractive for biotechnology investors and drug manufacturers.

4. Radial Basis Function Networks (RBFNs) for Protein Structure Prediction

- Integrate processes of functional approximation, pattern classification and recognition
- Provide a more accurate protein structure prediction tool to determine protein function
- Provide faster, effective and efficient neural network training algorithms for prediction purposes.

Outputs:
• Enhanced functional approximation and pattern recognition neural network model for protein structure prediction
• A knowledge processing tool.
• New patentable algorithm for training Radial Basis Function Networks

Outcomes:
• Improved accuracy in protein structure prediction.

Impact:
• Enhance growth of Malaysian biotechnology industry
• Strengthen Malaysia as a regional hub for biotechnology research and development
• More attractive for biotechnology investors and drug manufacturers.

5. Creation of a Malaysian Natural Product Database and Information Retrieval System

- To provide a centralized database for Malaysian Natural Product Plants and Research for Malaysian researchers to initiate sharing each other's information for further analysis and advancement.
- To add value to existing natural products data and avoid duplications

Outputs:
• Natural Product Data Information System
• Structure Elucidation Tool
• Commercialised structure elucidation application software
• Intellectual Property Right for users and developers
• Development of 3D chemical structures of compounds derived from natural products database.
• Trained research staff in Natural Products.

Outcomes:
• Generate revenues
• Possibility to commercialise the database through registration
• Contribution to the universities, non-profit organizations and industries through bio-active compounds commercialisation
• New and advanced algorithm for structure elucidation
• Better management of natural products data
• Aiding and providing a technology platform for natural products drug discovery.

Impact:
• To raise Malaysia's profile in Natural Product research
• Exploitation of the medicinal values of the local plants
• New opportunities in bringing in pharmacy research to Malaysia
• Initiation and promotion of collaboration amongst and between local and international scientists
• Linkage with herbal industry and other local academic and research institutions.
• Human resource development and upgrading of skills.

6. Creation of a Sequence Database that will Allow Similarity Search Within the Context of Protein Secondary Structure

- Create a query software to perform the type of search required.
- Create a database to store sequence-secondary structure relationship for rapid retrieval

Outputs:
• Creation of software
• Creation of database, in which the structures are error-checked to remove typographical errors, nomenclature problems, and other inconsistencies present in the PDB files.

Outcomes:
• A unique sequence database which will help promote drug discovery and design research.

Impact:
• Enhance growth of Malaysian biotechnology industry
• Strengthen Malaysia as a regional hub for biotechnology research and development.

7. Molecular Database Search and Retrieval Tools

- Development of 1D, 2D and 3D molecular representations
- Development of structural search and retrieval algorithms
- Development of sub-structural search and retrieval algorithms
- Development of similarity search and retrieval algorithms
- Test of developed algorithms on Malaysian natural product database

Outputs:
• Structural search tools and algorithms
• Sub-structural tools and algorithms

Outcomes:
• Software for better management of molecular database
• Faster and cheaper discovery of new drugs
• Software can be used for registration of new compounds
• Generation of revenues through possible commercialization of molecular database management system
• Promotion of interdisciplinary research team.

Impact:
• Raise Malaysia's profile in Malaysian cheminformatics research
• Discovery of new medicinal values in our local flora and fauna.
• Initiation and promotion of technology transfer from international scientists
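
The similarity search and retrieval objective of programme 7 can be illustrated with a minimal sketch. The roadmap does not name a similarity measure or a molecular representation; the Tanimoto coefficient over binary fingerprints used here is an assumption chosen because it is widely used in cheminformatics, and the compound names and feature ids are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of 'on' feature ids."""
    if not a and not b:
        return 1.0  # convention assumed here: two empty fingerprints count as identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_search(query, database, top_k=3):
    """Return the top_k (name, score) pairs, most similar to the query first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    # Sort by descending score, then by name so that ties are deterministic.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


# Hypothetical fingerprints: each integer stands for one structural feature.
database = {
    "compound_A": {1, 2, 3, 4},
    "compound_B": {1, 2, 5},
    "compound_C": {7, 8, 9},
}
query = {1, 2, 3}
hits = similarity_search(query, database, top_k=2)
print(hits)
```

A real system would derive the fingerprints from 1D, 2D or 3D molecular representations rather than hand-written sets; that step is outside the scope of this sketch.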

8. Diversity analysis and library comparison tools

- Development of cluster-based compound selection algorithms
- Development of dissimilarity-based compound selection algorithms
- Development of optimization-based compound selection algorithms
- Test of developed algorithms on Malaysian natural product database

Outputs:
• Cluster-based compound selection tools and algorithms
• Dissimilarity-based compound selection tools and algorithms
• Optimization-based compound selection tools and algorithms

Outcomes:
• Software for better management of molecular database
• Possibility of faster and cheaper discovery of new drugs
• Generation of revenues through possible commercialization of molecular database management system
• Promotion of interdisciplinary research team.

Impact:
• Raise Malaysia's profile in Malaysian cheminformatics research
• Discovery of new medicinal values in our local flora and fauna
• Initiation and promotion of technology transfer from international scientists
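
Dissimilarity-based compound selection, one of the objectives of programme 8, can be sketched with a greedy MaxMin procedure. The roadmap does not specify which algorithm is intended; MaxMin and the Tanimoto distance are assumptions used for illustration, and the library contents are hypothetical.

```python
def tanimoto_distance(a, b):
    """1 - Tanimoto similarity for fingerprints given as sets of feature ids."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union


def maxmin_select(fingerprints, n_select, seed_name):
    """Greedy MaxMin: repeatedly add the compound whose nearest
    already-selected neighbour is farthest away."""
    selected = [seed_name]
    remaining = sorted(set(fingerprints) - {seed_name})  # sorted for deterministic ties
    while remaining and len(selected) < n_select:
        best = max(
            remaining,
            key=lambda name: min(
                tanimoto_distance(fingerprints[name], fingerprints[s])
                for s in selected
            ),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical compound library.
library = {
    "c1": {1, 2, 3},
    "c2": {1, 2, 4},
    "c3": {7, 8, 9},
    "c4": {1, 2, 3, 4},
}
picked = maxmin_select(library, n_select=3, seed_name="c1")
print(picked)
```

Starting from `c1`, the procedure first picks `c3` (no shared features), then `c2`, leaving the near-duplicate `c4` unselected. Cluster-based and optimization-based selection would need different machinery and are not shown.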

9. An Integrated Platform for Accessing Disparate Protein Structure Databases: An Integration to Relational DBMS.

- To develop a single shared resource where many types of experimental data from various databases are integrated and organized using a data warehouse approach
- To develop a data cleaning tool that focuses on duplicate records in the integrated protein structural database using parallel association rules

Outputs:
• A search engine with associated GUI that will allow researchers to query protein structural database.
• A data cleaning tool to detect and eliminate duplicate records in a more effective and faster way.
• A comprehensive, integrated resource of protein structural databases for Malaysian life science scientists and research community.

Outcomes:
• An integrated platform for heterogeneous Protein Structural Database in Malaysia.

Impact:
• Simplify and enhance the query/mining processes on protein structure information from a single uniform database management system.
• Web Based Bioinformatics application platforms.
• Speed up the discovery of new medication and introduction of drugs to market, with potentially wide-ranging benefit to Malaysian community.

10. Design, Development of Drug Receptor Database

- To develop an integrated database, namely Drug Receptor Database (DRD), which is a one-stop database (web based) server and aims at providing rapid data acquisition for drug discovery scientists.

Outputs:
• A database on selective information sieving and selecting from multiple databases.

Outcomes:
• Serving the needs of drug discovery and design research.

Impact:
• Enhance growth of the database development for drug discovery research.
• Beneficial for the scientist.
• Reduce the timeline and cost of discovering new drugs as most of the information that they need will be stored, managed and authorized in one main database server.

11. Development of methodology, algorithm and simulation strategies for efficient and accurate representation of Molecular Interactions.

- To study the interactions between proteins as well as between proteins and ligands so as to contribute to the fundamental understanding of binding mechanisms using various modelling and simulation strategies.
- To develop efficient and fast docking algorithm as an alternative to the laborious and slow modelling strategies in investigating these interactions.
- To design molecules which are important in various industries such as agrochemical and pharmaceutical using de novo molecular design approach.

Outputs:
• New technology platform for drug discovery.
• Protein-ligand interaction models.
• Protein-protein interaction models.
• Molecular docking algorithms.
• De novo design tools.

Outcomes:
• Development/establishment of new drug discovery technologies
• Cost saving using cheaper new drugs.
• Promotion of interdisciplinary research team.
• New software, programs for docking and drug discovery

Impact:
• Enhance growth of Malaysian bioinformatics/pharmaceutical industry
• Strengthen Malaysia as a regional hub for pharmaceutical/biotechnology research and development
• More attractive for biotechnology investors and drug manufacturers.

APPENDIX G

TECHNOLOGY PRIORITY AREA: Molecular Bioinformatics

Research Programmes Outputs Outcomes Impact

1 A suite of parallel Sequence Alignment and Clustering Algorithms for Sequence Analysis of Fish Species.

- To develop a suite of high speed sequence alignment tools based on clustering and indexing methods with parallel methodology.

Outputs:
• Parallel sequence alignment (PSA) algorithms and parallel clustering algorithms.
• Index of protein sequence data.
• Workbench with interface to access PSA suite.

Outcomes:
• Improve efficiency for protein sequence analysis.

Impact:
• Helps develop Malaysian fish industry.
• Promotes Malaysia as a regional hub for fishery R&D
• Attract investors and manufacturers to fishery industry.
• Encourages scientific collaborations.

2 Customized Bioinformatics architecture for SNP pharmacogenomics.

- To help design customized medicine for the development of diagnostic and therapeutic purposes for diseases like Breast Cancer, Cardiovascular disease, and diabetes.

Outputs:
• Customized Bioinformatics architecture
• Databases and applications for SNP Pharmacogenomics

Outcomes:
• State-of-the-art facilities for diagnostic testing for SNPs for MIB
• National data centre
• XML based databases
• Array of software modules
• New algorithms.

Impact:
• Contributes towards a National data centre.
• Development of e-health, telemedicine and personalised medicine.
• Stimulates research.

3 Customized Bioinformatics architecture for the development of recombinant viral diagnostic chips.

- To research on the use of recombinant viral microarray chip for the development of diagnostic kits and screening of natural products for antiviral activity.

Outputs:
• Recombinant viral diagnostic chip for each of the target groups
• A world class knowledge centre dedicated to viral genomics
• Infectious clone facility for any new and emerging viral strains

Outcomes:
• Global level state-of-the-art facility to cope with new and emerging lethal viral strains
• Diagnostic tool custom designed for local strains based on global knowledge management

Impact:
• To make Malaysia a global player in the multi-billion dollar antiviral drug discovery industry.
• Strengthens Malaysia's fledgling biotechnology and bioinformatics industries.

4 Design, testing and implementation of a Microarray based Dengue virus diagnosis technology

- Enables researchers and health care professionals to easily access and assess various forms and stages of Dengue viral infections and outbreaks.

Outputs:
• Microarray based diagnostics kit

Outcomes:
• Effective patient diagnostics and monitoring

Impact:
• Early diagnosis of dengue infections.
• Fatality reduction.

5 Development of New Algorithms for Large Scale DNA Sequence Similarity Search

- To develop an efficient algorithm for sequence comparison and similarity query of large DNA sequence databases.

Outputs:
• A new algorithm to produce a sequence similarity query process in large scale DNA databases.
• Parallel implementation of the new algorithm.

Outcomes:
• Fast and space efficient algorithm for sequence similarity searching on DNA data.

Impact:
• Strengthens Malaysia's capability in bioinformatics R&D.
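
The roadmap does not describe the algorithm that programme 5 is expected to produce. As a hedged illustration of why indexing helps similarity queries on large DNA collections, the sketch below uses a k-mer index, a common seed-and-filter technique assumed here for illustration only; the sequence ids and sequences are made up.

```python
from collections import defaultdict


def build_kmer_index(sequences, k):
    """Map every k-mer to the set of sequence ids that contain it."""
    index = defaultdict(set)
    for seq_id, seq in sequences.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(seq_id)
    return index


def query_index(index, query, k):
    """Count distinct query k-mers shared with each indexed sequence.

    Returns (seq_id, shared_kmer_count) pairs, best candidates first.
    """
    counts = defaultdict(int)
    query_kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    for kmer in query_kmers:
        for seq_id in index.get(kmer, ()):
            counts[seq_id] += 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy database.
sequences = {
    "seq1": "ACGTACGTGA",
    "seq2": "TTTTACGTTT",
    "seq3": "GGGGCCCCGG",
}
index = build_kmer_index(sequences, k=4)
candidates = query_index(index, "ACGTAC", k=4)
print(candidates)
```

Only sequences sharing at least one k-mer with the query are touched, so a full alignment would need to run on `seq1` and `seq2` but never on `seq3`. The index could also be partitioned across workers, which is one plausible route to the parallel implementation listed under Outputs.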

6 Development of Oil Palm Genome and Proteome Database

- To develop a shared data infrastructure that enables the integration of these oil palm genome and proteome data.

Outputs:
• Oil palm genome and proteome database
• Intellectual property

Outcomes:
• Discovery of novel oil palm genes and proteins
• Discovery of oil palm genes and proteins important for oil palm agronomic traits

Impact:
• Improve the oil palm production
• High return on investment for oil palm industry.

7 Establishment of Malaysian Animal Virus Database and the Development of Computational Tools for Studying Viruses

- To develop a national repository for animal viruses or related biological resources that local researchers can submit and share with others.

Outputs:
• Novel computational tools
• MAVD as a national repository.
• Availability of complete genomic sequence data.

Outcomes:
• Knowledge growth
• National repository
• Niche research area

Impact:
• Strengthens Malaysia's bioinformatics industry.
• Attract biotech and pharmaceutical companies.
• Control and prevention of viral diseases.
• Stimulate research.

8 Establishment of Microarray Database of Local Natural Products Interactions Against Cancer Cell Lines.

- To develop a DNA microarray database of local natural product interactions with cancer cell lines and software tools to help bioinformatics researchers analyze the results.

Outputs:
• DNA Microarray database domain
• Software utility that recognises specific pattern of DNA microarray chip that represents genes that are involved in cancer which can aid in data interpretation.

Outcomes:
• To aid researchers in analysing local natural product interactions at the genomic level for determining their potential therapeutic use in cancer treatment.

Impact:
• Propel Malaysian biotechnology research industry
• Place Malaysia as a regional hub for natural products research and development
• Novel drug discovery

9 Large-scale Accelerated Sequencing of Genomes of Malaysian Importance and Associated Assembly, Analysis, and Annotation Tools

- To address many fundamental and pressing biological questions, which are critical for the sustainable management of our rainforest.

Outputs:
• Whole genomic sequence data of a model forestry species
• Technology/protocols in acquiring the data
• Genomic database for tropical forestry species
• New algorithm for gene finding.

Outcomes:
• Better management and monitoring of Malaysia's biodiversity resources, the rainforest

Impact:
• Conservation of our rainforest
• More efficient improvement of forest plantation species
• Better understanding of our biodiversity resources

10 Molecular markers and phylogenetics of insect pests

- DNA sequencing and molecular markers will enable us to find new ways to complement the traditional methods of managing the pest problems.

Outputs:
• Useful molecular markers identified
• Molecular phylogeny of species within a taxonomic group worked out.

Outcomes:
• Better pest management strategies
• Better understanding of phylogeny of pest species within a taxonomic group

Impact:
• Reduction in pest problems
• Reduction in use of chemical pesticides
• Reduction in environmental hazards from pesticides

11 Screening of Novel Genetic Sequences from Microbial Metagenomic Databases

Outputs:
• Database management system for metagenomic studies
• Microbial metagenomic database (DNA sequences and rRNA sequence)
• Potential novel sequences for discovery of unique proteins, antibiotics, enzymes, etc.
• Enhanced method for virtual screening of metagenomic databases

Outcomes:
• Robust and extensive microbial metagenomic database
• Valuable and accessible resources for research
• Improved efficiency in discovery of novel gene sequences encoding for novel compounds

Impact:
• Increased novel compound discovery from local biological resources

12 Parallel Exact and Approximate Pattern Matching Algorithm for Alignment of Biological Data.

- Develop improved biological data alignment tools to increase efficiency of biotechnological research.

Outputs:
• Suites of parallel exact and approximate pattern matching algorithms.
• Improved parallel sequence alignment tools.

Outcomes:
• Improved research efficiency.

Impact:
• Strengthens Malaysia's position in various aspects of biotechnology research.
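
To make the distinction between exact and approximate pattern matching in programme 12 concrete, the following minimal sketch reports every window of a text that matches a pattern with at most a given number of substitutions (Hamming distance). The roadmap does not specify the matching model; allowing substitutions only, the function name and the example sequences are assumptions for illustration, and the sketch is sequential.

```python
def find_approximate(text, pattern, max_mismatches=0):
    """Return (start, mismatches) for every window of text that matches
    pattern with at most max_mismatches substitutions.

    max_mismatches=0 gives exact matching.
    """
    if not pattern:
        return []
    hits = []
    m = len(pattern)
    for start in range(len(text) - m + 1):
        mismatches = 0
        for a, b in zip(text[start:start + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # this window already exceeds the budget
        else:
            # Loop finished without break: window is within the budget.
            hits.append((start, mismatches))
    return hits


text = "ACGTTCGAACGA"
exact = find_approximate(text, "ACGA")
approx = find_approximate(text, "ACGA", max_mismatches=1)
print(exact)
print(approx)
```

Each window is checked independently of the others, so the range of start positions could be split across processors; that is one straightforward way a parallel version might be organised, though the source does not say which approach is intended.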
