
Intelligent Medical Technologies and Biomedical Engineering:

Tools and Applications


Anupam Shukla ABV Indian Institute of Information Technology and Management, India Ritu Tiwari ABV Indian Institute of Information Technology and Management, India

Medical Information Science Reference


Hershey New York

Director of Editorial Content: Director of Book Publications: Acquisitions Editor: Development Editor: Publishing Assistant: Typesetter: Production Editor: Cover Design: Printed at:

Kristin Klinger Julia Mosemann Lindsay Johnston David DeRicco Tom Foley and Jamie Snavely Deanna Jo Zombro Jamie Snavely Lisa Tosheff Yurchak Printing Inc.

Published in the United States of America by
Medical Information Science Reference (an imprint of IGI Global)
701 E. Chocolate Avenue
Hershey PA 17033
Tel: 717-533-8845
Fax: 717-533-8661
E-mail: cust@igi-global.com
Web site: http://www.igi-global.com

Copyright 2010 by IGI Global. All rights reserved. No part of this publication may be reproduced, stored or distributed in any form or by any means, electronic or mechanical, including photocopying, without written permission from the publisher. Product or company names used in this set are for identification purposes only. Inclusion of the names of the products or companies does not indicate a claim of ownership by IGI Global of the trademark or registered trademark.

Library of Congress Cataloging-in-Publication Data

Intelligent medical technologies and biomedical engineering : tools and applications / Anupam Shukla and Ritu Tiwari, editors. p. ; cm. Includes bibliographical references and index. Summary: "This book takes an innovative look at technology and engineering as they pertain to medicine (medical engineering), teaming them to facilitate new systems that have the ability to change the lifestyles and quality of life of people"--Provided by publisher. ISBN 978-1-61520-977-4 (hardcover) 1. Medical informatics. 2. Intelligent control systems. 3. Biomedical engineering. I. Shukla, Anupam, 1965- II. Tiwari, Ritu, 1977- [DNLM: 1. Biomedical Technology. 2. Artificial Intelligence. 3. Biomedical Engineering--methods. 4. Biotechnology--methods. W 82 I61 2010] R858.I557 2010 610.28--dc22 2009054319

British Cataloguing in Publication Data
A Cataloguing in Publication record for this book is available from the British Library.

All work contributed to this book is new, previously-unpublished material. The views expressed in this book are those of the authors, but not necessarily of the publisher.


Hunting Drugs for Potent Antigens in the Silicon Valley


Ashish Runthala Birla Institute of Technology & Science, India

Chapter 10

ABSTRACT
Diseases are continually diversifying and evolving, and thus require rapid vaccine development methodologies. Traditional methods are complex and demand painstaking effort over long time spans. In silico methods are now used to screen potent antigens, an approach that promises to reduce the number of wet-lab experiments. Its importance is underscored by the rapid spread of diseases and the evolution of their variants. This chapter explains the immunological aspects and the algorithms applied for epitope selection, along with their practical problems. Current techniques for modeling the native-state conformation of epitopes are then described, together with their fundamental problems, followed by the algorithms used to validate such models. Techniques for screening potent drugs against the target epitopes are then considered. Lastly, the scope for further research in developing better methodologies is highlighted.

INTRODUCTION
DOI: 10.4018/978-1-61520-977-4.ch010

Significant progress in genomics research and systems biology has produced an overflow of novel potential targets for drug discovery (Loging & Harland, 2007). Technological advances have considerably quickened the drug discovery process, and high-throughput gene sequencing has reformed the way novel drug targets are identified. To date, a large number of novel gene sequences and their specific signature patterns have been studied, but only a limited number have been successfully employed for targeting diseases. The growing number of prospective targets, together with the lack of information on their molecular mechanisms, has long been a bottleneck in the target validation process. Ofran, Punta, Schneider & Rost (2005) and Zheng, Lianyi & Chen (2006) used multiple methods, including the integrated and systems-based approaches studied by (Lindsay, 2005; Sams-Dodd, 2005) and (Hardy & Peet, 2004). Current computational algorithms are mainly based on the detection of conserved functional domains that are similar to known targets, as studied initially by (Hopkins & Groom, 2002) and later by two groups (Wang, Sim, Kim, & Chang, 2004; Kramer & Cohen, 2004), and on the conformational analysis of statistical and energetic features of predicted models (Hajduk, Huth, & Fesik, 2005; Hajduk, Huth, & Tse, 2005). Such algorithms have been ineffective in finding the best targets when those targets exhibit negligible sequence similarity to known templates with an available three-dimensional (3D) protein structure in the Protein Data Bank (PDB). Heterologous and structurally uncharacterized proteins constitute a substantial percentage, from about 20% up to 100%, of the Open Reading Frames (ORFs) in many of the completed genomes, and they are therefore still an untapped source of novel drug targets (Han, Cai, Cao, & Chen, 2004). Hence, algorithms that do not depend on sequence similarity to known structural protein templates are highly desirable. Even so, the drug discovery process cannot yet meet the requirements, because the evolution of variants broadens the complexity of diseases and thereby makes the available drugs ineffective against them. Conceivably, the greatest source of inefficiency in the traditional drug discovery process is the high percentage of evaluated drugs that have a low propensity of being successful. To optimize early preclinical trials, dedicated compound libraries have been assembled in virtual pools (Good, Krystek, & Mason, 2000; Schnecke & Boström, 2006). The selected compounds are synthesized and experimentally screened for clinical trials. Each such compound is chosen for conformational and geometric properties that raise its chance of stably binding the specific targets in the course of preclinical development.
This chapter reviews current algorithms for tracing potent antigenic epitopes in a given sequence and for predicting, assessing, and validating their structures. Lastly, it focuses on docking methodology and simulation studies used to screen the best drug against a target protein, with in-vivo and in-vitro experimental trials as the final step.

BACKGROUND
Quantitative Structure Activity Relationship (QSAR) models and docking simulations are often used to screen such dedicated compound libraries (Anderson & Wright, 2005). De novo drug design employs computational algorithms to match the selected drug molecule to the target protein's binding site, both structurally and energetically (Mauser & Guba, 2008). Successful de novo design aims at drug structures that have high binding affinity for the docking sites of their target proteins. The approach is most successful when genetic and experimental data on the drugs and target proteins are available. Two major approaches underlie the development of such tools:

a. Molecular Fragment Approaches (Jhoti, 2007): This approach docks molecular fragments of the drug to find energetically favorable loci on the active site before the fragments are linked. The method first searches for key locations in the binding pocket of the target protein structure and then assembles the drug fragments. Once the functional groups are placed, the next step is to connect them with scaffolds. Earlier methodologies required optimization steps in which bonds could be broken and re-formed. Monte Carlo (MC) simulation and evolutionary algorithms have also been applied to this optimization, e.g., Pro-Drug (Westhead, Clark, Frenkel, Murray, Robson, & Waszkowycz, 1995).

b. Sequential Growth Approaches (Honma, 2003): Here, molecules are grown inside an active site, starting from a seed moiety already bound to it. The drug grows atom by atom to complement the active site geometrically, and entropy is lowered in the process, stabilizing the hydrophobic and non-bonded interactions (Nishibata & Itai, 1991; Caflisch, Miranker, & Karplus, 1993; Bohacek & McMartin, 1994). Locating the seed uses the same trick as the first step of the Molecular Fragment Approaches. Once the seed is located, the drug is grown by sequential addition of atoms or fragments. The earliest examples of this approach include Legend (Westhead, Clark, Frenkel, Murray, Robson, & Waszkowycz, 1995) and Genstar (Rotstein & Murcko, 1993), which grow drugs atom by atom, SmoG (DeWitt & Shakhnovich, 1996), and SPROUT, which also uses stretches of atoms to grow molecules. After every addition, a selection step based on minimizing the free energy of interaction between the drug and protein molecules precedes further growth. Scoring functions decide whether each simulation step is accepted, repeated, or rejected during growth.

Currently, such methodologies have some critical fundamental problems. The most important is that they generate a large number of structures that are impractical to synthesize. Another drawback arises from the general discrepancy between measured and estimated binding affinities: scoring functions cannot precisely predict experimentally determined affinities and their functional differences (Kitchen, Decornez, Furr, & Bajorath, 2004; Leach, Shoichet, & Peishoff, 2006). A further major problem is the practically unbounded conformational space that must be searched to find the most energetically favorable conformation (Rotstein & Murcko, 1993). Therefore, most pharmaceutical companies are now interested in improving docking programs to screen small molecules from commercially available libraries (Loging & Harland, 2007; Ofran, Punta, Schneider, & Rost, 2005; Zheng, Lianyi, & Chen, 2006).

De novo design has the advantage that it can efficiently explore a large structural space to predict the best possible docking site for a drug molecule (Bohacek & McMartin, 1994). Smaller fragments generally have lower binding affinities (i.e., higher dissociation constants) for the target protein than larger compounds, but they often have better physicochemical properties. The approach exploits the exponential increase in binding strength obtained when low-affinity fragments are linked together through scaffolds. Although synthetic accessibility continues to be the chief concern, combinatorial docking and de novo design algorithms have also been tried to improve it. Efforts are under way to optimize the positions and conformations of residues during docking.

To understand the basis of this process, let us first briefly recapitulate the underlying concepts of immunology, which form the chassis of the drug discovery algorithms. We will then study epitope prediction algorithms and the modeling of potent receptors to understand their native-state conformation, so that they can be docked against a library of drugs to select the best possible drug(s).
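The fragment-linking argument made in this section can be checked with a short calculation. The sketch below assumes the standard relation between binding free energy and dissociation constant, dG = RT ln Kd, and the idealized case in which the free energies of two linked fragments simply add; real linkers lose part of this gain to strain and entropy. The function names, the temperature, and the 1 mM example affinities are illustrative assumptions, not values taken from the cited studies.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature=298.15):
    """Binding free energy in kcal/mol from a dissociation constant (dG = RT ln Kd)."""
    return R_KCAL * temperature * math.log(kd_molar)


def linked_kd(kd_a, kd_b, temperature=298.15):
    """Idealized Kd of two linked fragments, assuming their free energies add."""
    dg_total = (binding_free_energy(kd_a, temperature)
                + binding_free_energy(kd_b, temperature))
    return math.exp(dg_total / (R_KCAL * temperature))


if __name__ == "__main__":
    weak = 1e-3  # hypothetical 1 mM fragment
    print("dG per fragment (kcal/mol):", round(binding_free_energy(weak), 2))
    print("idealized Kd after linking (M):", linked_kd(weak, weak))
```

Under these assumptions, adding free energies multiplies dissociation constants, which is why two millimolar fragments can in principle yield a micromolar ligand.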

MAIN FOCUS OF THE CHAPTER


Before turning to the methodologies behind drug design protocols, let us briefly review the basic concepts of immunology. The immune system is localized in several parts of the body. Immune cells develop in the primary organs, such as the bone marrow and thymus, while immune responses occur in the secondary organs, such as the spleen and lymph nodes. The immune system is divided into the following two branches based on their response to pathogens; in lay terms, the two classes are distinguished by how the immune system acts on any non-self conformation trespassing into the body.

1. Innate Immune Response

The innate response is non-specific, i.e., a similar immune response operates against many pathogens. It is common to all infections, no matter how they are induced, and is considered the First Line of Defense. A soluble factor, the complement system, takes part in this response and works through opsonization. Many cells, such as macrophages, natural killer cells, and mast cells, are also involved. Among these, phagocytes such as macrophages and neutrophils are termed professional phagocytes, or eating cells: they engulf foreign cells and, after ingestion, these white blood cells (WBCs) must kill the engulfed pathogens by some means, such as the oxidative burst.

2. Adaptive Immunity

The adaptive system mounts a highly specific response against each pathogen. It has the unique property of memory of earlier infections and is thus responsible for acquired immunity, i.e., the system gains or strengthens its immunity after every infection. This is because the system recognizes an antigen during the first infection and, thanks to this memory, combats it efficiently in successive infections. It involves Antigen-Presenting Cells (APCs) and two types of lymphocytes (T cells and B cells). The system is inducible, being stimulated every time a non-self antigen is encountered. Hence, it confers lifelong immunity in most cases.

2.1. T Cell Training

T cell precursors enter the thymus from the bone marrow to learn to distinguish self from non-self antigen conformations. There, they express particular T cell receptors and meet cells that express MHC (Major Histocompatibility Complex) proteins on their surface, which are markers for the body's own cells. The T cells are trained in two steps. First, T cells must recognize self-MHC, or they are destroyed. Second, T cells that bind too tightly to self-MHC are also destroyed. The remaining T cells later migrate to the spleen and lymph nodes and wait for foreign antigens. After recognizing such antigens, some cells from the induced pool of T cells go into battle to eliminate the foreign antigen, and others become memory cells. This is also known as the Cellular Immune response.

2.2. B Cell Selection for an Antigen

Different B cell types carry different receptor molecules. When a foreign particle docks onto a receptor, that B cell type is selected; this is termed Clonal Selection. The selected B cell then multiplies rapidly to produce a pool of identical copies, which is termed Clonal Expansion. These copies make large amounts of antibody against the pathogen, a phenomenon known as the humoral response. Let us now briefly study the antibody structure.

2.2.1. Antibody Structure

Antibody molecules are the molecules that trap foreign particles in a system. Each antibody (Ab) is made up of two identical heavy chains and two identical light chains, held together by disulfide bonds. An Ab has two important parts. One is the variable portion, which forms the two arms of the Y-shaped Ab structure and has differing amino acid sequences that confer selective, specific binding to antigen. It is termed the Fab, or Fragment of Antigen Binding. The other is the constant portion, the Fc fragment, which is involved in effector mechanisms: after antigen binding, many effector cells, such as Natural Killer cells, bind through their Fc receptors to the Fc portions of antibodies already bound to the MHC-antigen complex on a T cell surface (refer to Figure 1). Each Ab molecule is highly specific in its binding. The variable regions are so diverse that they can recognize almost every non-self antigen. Note also that the single-chain variable sequence (scFv region) is studied to screen the binding affinity for an antigen.

Figure 1. Antibody structure showing variable, heavy chains and constant region

2.2.2. Antigens

An antigen is a molecule that is bound by an antibody (Ab). It is generally a foreign or non-self particle. Each antigen has multiple loci for binding antibodies, so one antigen can be bound by several different antibodies, as in allergy, where pollen, cat dander, or chemicals in soap can act as antigens in a system. Antigens must be processed to be recognised by T cells, and they must be presented under specific constraints and in a specific conformation to generate a successful immune response (refer to Figure 2).

2.2.3. Antigen Processing: Exogenous Pathway

Professional APCs ingest microbes and free foreign particles, degrade them in lysosomes, and then present the fragments to CD4+ T cells in the context of MHC-II. MHC-II is used for foreign particles, whereas MHC-I, together with CD8+ T cells, is used when a self cell needs to be destroyed after being infected by a foreign particle.

Figure 2. Possible antigen conformations and the respective immune responses generated. It implies that the peptide antigen must be presented in the context of MHC


Antigen presentation does not require metabolically active cells. Antigen processing involves the lysosomal system. The microbe or foreign particle is first ingested into an endosome. A lysosome carrying the enzyme lysozyme fuses with this endosome to form an endolysosome, which then combines with a T cell with MHC-II protein on the surface. Thereafter, the antigenic peptide binds MHC-II and is packed in a vesicle, after their assembly in the Endoplasmic Reticulum (ER). Surprisingly, of the 20% of foreign proteins that are processed, 0.5% bind MHC, and 50% of those lead to a CTL response, i.e., 1 in 2000 peptides is immunogenic.
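The 1-in-2000 figure follows from multiplying the three fractions quoted above. A minimal check, in which the variable names are illustrative only:

```python
from fractions import Fraction

processed = Fraction(20, 100)     # share of foreign proteins that are processed
binds_mhc = Fraction(5, 1000)     # 0.5% of those bind MHC
ctl_response = Fraction(50, 100)  # 50% of MHC binders lead to a CTL response

immunogenic = processed * binds_mhc * ctl_response
print(immunogenic)  # 1/2000
```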

2.3. Evading Such a Trap


The immune system is effective against many pathogens, but pathogens may sometimes evade even such a specialized system. They may do so by blocking access to their antigens, thereby escaping the effector mechanisms that would wipe them out of the system. Sometimes the antigens of a pathogen undergo only limited proteolysis, and the pathogen thus persists in the body to cause infection. Occasionally, MHC molecules fail to load foreign antigen peptides and so cannot display them to T cells, leading to a complete failure of the immune response. Sometimes the peptide-MHC complex, even after transfer to and expression on the APC surface, fails to be recognized by T cells.

3. Activation of Immune System

The immune system is an extremely wide array of cells. To stimulate all the relevant cells against an infection, certain cells called T Helper (Th) cells are required, which play an important role in establishing and maximizing the capabilities of the immune system. These cells have no cytotoxic activity, i.e., they cannot kill infected host cells or pathogens, and so they require other immune cells to induce a successful response against an infection. Th cells activate and direct other immune cells, and are therefore highly effective in the effector mechanisms of the immune system. They are vital for B cell antibody class switching, which is responsible for the extreme diversity in light chains of antibodies to recognize almost every non-self antigen. They also drive the activation and growth of cytotoxic T cells and maximize the bactericidal activity of macrophages. It is this range of functions, and this role in influencing other cells, that gives T helper cells their name. Mature Th cells also express the surface protein CD4. These CD4+ T cells generally have a pre-defined role as helper T cells, although there are a few exceptions. For example, there are sub-groups of regulatory T cells, natural killer T cells, and cytotoxic T cells that express CD4. Although cytotoxic CD4+ T cells are seen in some specific disease states, they are typically non-existent. None of these latter CD4+ T cell groups is considered a T helper cell, and they are beyond the scope of this chapter.

4. Immune Related Threats

Pathogens such as viruses and bacteria form a large pool of more than four hundred microbial agents associated with human diseases. Of the fifty-seven million deaths worldwide, eleven million (19%) were caused by infectious or parasitic diseases (WHO, 2004). Vaccines have been developed for thirty-four such pathogens. Thus, one goal of immunity-related bioinformatics, or Immunoinformatics, is to trace immunogenic regions in pathogens that can be used to target the associated disease or infection successfully. Such bio-defense targets are classified into three categories:

Category A: They spread easily with high mortality rates and thus have the potential for massive social and economic disruption;
Category B: They spread relatively easily and have moderate mortality rates; and
Category C: They spread easily but have low mortality rates. Note, however, that they have the potential to be genetically engineered into more lethal strains, which can be really dangerous.

Autoimmunity is another immunological threat. It is the failure of an organism to recognize its own antigens as self, which stimulates an immune response against its own cells and tissues. Any disease resulting from this aberrant immune response is called an Autoimmune Disease. Major examples include systemic lupus erythematosus (SLE), diabetes mellitus type I, Churg-Strauss syndrome, Sjögren's syndrome, and Hashimoto's thyroiditis. Organ transplantation is also a very complicated immunological problem, because Human Leukocyte Antigens (HLA) are located on the cell surface. HLA, sometimes also termed MHC, plays a significant role in distinguishing self from non-self antigens accurately. Many factors increase a person's susceptibility to infection. Thus, vaccination, the administration of a substance to a patient, becomes necessary to prevent disease. Such vaccines are usually made of attenuated microorganisms to induce an effective immune response. This enables the development of memory against such microbes for subsequent infections.

5. Vaccination

Vaccination is very important in combating a disease. The first encounter with an antigen induces the primary response: the antibody whose Fab region binds the antigen (say, antigen A) best appears. Its concentration rises to a plateau, where effector mechanisms eliminate the antigen, and then declines. When the same antigen A and a new antigen, B, are encountered later, a primary response is generated to B and an intense secondary response is generated to A because of immunologic memory. This is the major reason for giving booster doses after an initial vaccination. There are four categories of possible vaccines:

Live vaccines: They are capable of replicating in the host but are attenuated, i.e., they cannot cause disease. They are advantageous because they induce an extensive immune response, and low doses are usually sufficient for long-lasting protection. However, they may cause adverse reactions and can sometimes pose problems.

Subunit vaccines: They are small conserved parts of an organism and are therefore simple to produce. They stimulate very little cellular response. They are typically produced by inactivating a whole virus or bacterium by heat or chemicals. The vaccine is then purified further by selecting one or a few protection-conferring proteins. Such vaccines have already been developed for Bordetella pertussis.

Genetic vaccines: They are foreign naked nucleic acids that are injected into the system. The nucleic acids are coated on gold nanoparticles before injection. This method is effective against alphaviruses and many bacteria, such as Mycobacterium tuberculosis. Its significant advantages are that it is relatively easy to produce and induces specific cellular responses. Its critical demerit is that it generates a very low immune response in the primary vaccination step.

Epitope vaccines: These are the specific domains of known pathogens that are responsible for a particular disease. Such vaccines are highly capable: they can target multiple conserved epitopes in rapidly evolving pathogens like HIV. The methodology can also target subdominant epitopes and can break tolerance; e.g., such vaccines are effective against tumor antigens, where even the dominant epitopes are tolerated by the system. Such vaccines abide by the norms and have been observed to confer protection in animal models.

6. Bioinformatics on a Role

To design drugs against different diseases, the geometry of epitopes must be understood. This study is aided by bioinformatics protocols based on physicochemical and biological concepts.

6.1. Types of Epitopes

Epitopes are structurally more flexible than the rest of the protein, which gives them wide conformational degrees of freedom. They are prevalent on the cell surface, exposed to the solvent. The amino acids in an epitope are usually charged and hydrophilic, which aids stable interaction with the solvent, with the hydrophobic residues forming the protein core. Peptide antigens can be defined as epitopes in terms of their distinct amino acid composition, location, length (five to fifteen amino acids), and immunodominance. In a normal in-vivo immune response, epitopes are bound by a comparatively larger proportion of antibodies than other peptide sequences; hence, they are also termed Major Antigenic Sites. Epitopes can be divided into two classes:

Discontinuous epitopes: They are formed by constitutively expressed residues that are not contiguous in the antigen's primary amino acid sequence but that, in the overall conformation, come together to form an antigenic structure that is non-self to the immune system. Such functional epitopes are highly dependent on their conformation and account for almost 90% of epitopes on a given antigenic (globular) protein.

Continuous (linear) epitopes: They are also constitutive in nature but are sequential in the antigen's primary sequence, and thus impose fewer conformational constraints when recognized by an antibody. They often contain residues that are not involved in direct interaction with the antibody, as defined through ELISA-based epitope mapping techniques (PEPSCAN).

In short, if we can predict the conformational details of epitopes, we can better understand the dynamics and mechanism of antigen binding to an antibody.

6.1.1. Epitope Prediction Approaches


Epitope structures have long been predicted from their physical or chemical properties. Let us study these methods first and then move on to current structure prediction methodologies. The older prediction approaches are based on the physical properties of epitopes, as described below. Current methods compute the structure of target epitope sequences through physics-based or knowledge-based predictions, which we will study later under Protein Structure Prediction Methodologies. For now, let us focus on the methods that consider the physico-chemical properties of known epitopes.

6.1.2. Property Based Methods


Traditionally, epitopes have been predicted by algorithms that depend solely on the physical and chemical properties epitopes possess. These characteristics are generally derived from the primary sequence, such as hydrophilicity or another physical property; such properties are readily available but not very accurate. Hence, these methods are better employed for verification purposes. The methods, in the order in which they evolved, are explained below.

6.1.2.1. Exposed Surface and Polarity Scales (1978)

Properties such as exposed surface area and polarity are evaluated in this method. A window of seven residues is used to scan the epitope region, and the mean of the corresponding computed values is assigned to the fourth residue in the segment, which is taken as the central locus of the analysis. The analysis thus predicts the probability that a particular stretch of seven residues is an epitope, based on the exposed surface area and polarity parameters, with reference to the central residue.

6.1.2.2. Hopp and Woods (1981)

This was truly revolutionary work. The methodology rests on the need for hydrophilic amino acids in any amino acid sequence that is to be an antigenic determinant, which follows from the fact that the exposed surface of a protein faces the solvent in a cell. Hence, local hydrophilicity values are assigned to each amino acid by repetitive averaging over a window of six amino acids. However, this method has not been validated as an accurate way of deciding whether a sequence is an epitope.

6.1.2.3. Welling et al (1985)

This method compares the proportion of each amino acid in known epitopes with the percentage of the respective amino acid in the protein sequence under consideration. The algorithm allocates an antigenicity value to each amino acid from the comparative occurrence of that amino acid in already known antigenic determinant sites. The repetitive averaging window is extended to eleven to thirteen amino acids, depending on the antigenicity values of neighboring residues, thereby increasing the similarity percentage and the accuracy of prediction.
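The window-averaging step shared by these property-based methods can be sketched as follows. The values in `TOY_SCALE` are made-up placeholders, not a published hydrophilicity or polarity scale, and the function name and signature are assumptions for illustration; a real analysis would substitute the published per-residue scale of the chosen method.

```python
def window_profile(sequence, scale, window=7):
    """Average per-residue scale values over a sliding window.

    Each window mean is assigned to the reference residue at offset
    window // 2 (the fourth residue for a window of seven).
    Returns a list of (position, residue, mean) with 1-based positions.
    """
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    half = window // 2
    profile = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        mean = sum(scale[res] for res in segment) / window
        center = start + half  # 0-based index of the reference residue
        profile.append((center + 1, sequence[center], mean))
    return profile


# Placeholder values only: NOT a published scale.
TOY_SCALE = {"A": -0.5, "D": 3.0, "E": 3.0, "G": 0.0,
             "K": 3.0, "L": -1.8, "S": 0.3, "V": -1.5}

if __name__ == "__main__":
    for pos, res, mean in window_profile("ALVKDEGSKDEAL", TOY_SCALE):
        print(pos, res, round(mean, 2))
```

Stretches whose mean stands out from the rest of the profile would be the candidate regions. Methods that anchor the window on a different residue, or that vary the window length, would need a different offset or an adaptive window.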

6.1.2.4. Karplus Method (1985) This method is based on the Flexibility scale analysis. Flexibility scale considers mobility rates of proteins based on the known temperature B factor, which represents mean-square dislocation of an atom from its locus in the model. This scale was initially constructed based on the analysis of alpha carbons of thirty one known epitope structures. This computation is similar to the classical algorithm, except that the central locus for reference is the first amino acid of six amino acids window length, and three such flexibility scales are considered here. 6.1.2.5. Emini Method (1985) This prediction is based on the surface accessibility scale computed by multiplication of individual probabilities instead of addition within the considered window of six residues. The accessibility outline is determined using the formulae Sl= (fs+4+i) (0.37)-6, where Sl is the surface likelihood probability, fs is the propensity cost of fractional surface, and the value of i varies from one to six. When Sl= 1 and fs costs are more than one, then the considered six residue stretch indicates very high estimate of being found on surface. 6.1.2.6. Parker Method (1986) This method is based on the hydrophilic scale, constructed from the peptide retention times computed on HPLC and on a reversed-phase column for amino acids. A window of seven residues is employed for analyzing epitope region and corresponding scale values are assigned to all these residues. The algebraic mean of such observed values is then assigned to the fourth residue in the segment as the reference locus. 6.1.2.7. Kolaskar and Tongaonkar Method (1990) Here, One hundred and fifty six antigenic determinants of length shorter than that of twenty amino acids are considered to study thirty different proteins for predicting the antigenic propensity


Hunting Drugs for Potent Antigens in the Silicon Valley

of residues. This antigenic scale is optimized and used to predict the epitopes in a sequence.

6.1.2.8. Pellequer Method (1993) This algorithm depends on the frequency of β-turns present in the considered sequence. Three scales are considered for the calculation, and a window of seven residues is again used to analyze the epitope region. The scale values are applied in the same way to all seven residues, and the arithmetic mean is then assigned to the fourth residue in the segment. The only difference is that this method uses three scales in the calculation rather than a single one, which makes it more efficient. A Gaussian smoothing curve is then used to assign weights to all of these residues.
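The multiplicative score of the Emini method (6.1.2.5) can be sketched as follows. This is a minimal illustration that assumes the formula is read as the product of six fractional surface values scaled by 0.37 raised to the power of -6; the function name and the example values are hypothetical and are not taken from the published accessibility scale.

```python
import math
from typing import Sequence


def surface_likelihood(fractional_surface: Sequence[float]) -> float:
    """Multiply six fractional surface values and scale by 0.37 ** -6."""
    if len(fractional_surface) != 6:
        raise ValueError("the window must contain exactly six residues")
    return math.prod(fractional_surface) * 0.37 ** -6


# Hypothetical six-residue windows (placeholder values).
average_window = [0.37] * 6
exposed_window = [0.80, 0.75, 0.90, 0.85, 0.70, 0.95]

if __name__ == "__main__":
    print(round(surface_likelihood(average_window), 6))
    print(surface_likelihood(exposed_window) > 1.0)
```

A window in which every residue has the value 0.37 scores 1, so the scaling factor normalizes the product against that baseline, and windows whose residues are more exposed than the baseline score above it.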

Figure 3. Various server tools tabulated along with their web links, used for analyzing peptide-MHC binding, MHC binding sites, and other important immunological properties of epitopes, necessary for a successful drug designing protocol.


6.2. Epitope Database and Servers


Manual computation of epitopes is now a thing of the past. Many servers are available that predict epitopes in a given sequence. IEDB (Immune Epitope Database) is one such database, where epitopes arising during the course of an infection, or as an outcome of vaccination, can be predicted. All these servers focus on developing bioinformatics tools to analyze, model, and predict immune responses. A large number of online prediction methods are available, as listed in Figure 3. Even so, many questions need to be answered before an effective drug can be designed against the target epitope sequence. These approaches produce an array of results that must be integrated to find the best drug molecule. However, if we can determine or predict the structural conformation of a protein sequence, we can estimate more clearly the functional dynamics of a drug molecule interacting with a specific binding pocket of the target epitope sequence.

6.3. Methodology of Drug Designing

Traditionally, epitopes are predicted by modeling the conformational details of the protein structure. Modeling means determining or predicting the structure of the target sequence. For many decades, protein structures have been determined through X-ray crystallography and Nuclear Magnetic Resonance (NMR) techniques. Although X-ray analysis gives a very high resolution of about 1 Å, it has critical technical and resource limitations: it requires a crystal of an extremely pure protein sample, and many proteins are not amenable to crystallization at all. NMR, on the other hand, is limited to small, soluble proteins and offers a slightly lower resolution of about 2.5 Å. Moreover, the determined results require additional structural refinement through costly and time-consuming experimental steps.

Because of these problems, research on computational methods of protein structure prediction has gained considerable pace. This attention has led to global blind tests of protein structure prediction, namely the Critical Assessment of Structure Prediction (CASP), which takes place every two years to judge progress in the field. CASP classifies all structure prediction methods into three categories, described later in the text. Let us first study these algorithms and then see how to validate the resulting models for drug designing purposes.

6.3.1. Protein Structure Prediction Algorithms

We have now covered the traditional methods employed by the various online server tools for predicting epitopes in the target sequence. As noted, all these algorithms fail to find the correct drug molecule for a target epitope sequence when no known structure is available for the epitope. This has long been their biggest drawback, but since the advent of CASP the problem has been resolved to some extent. Once a protein model is predicted by any of the three approaches (ab-initio, threading, or comparative modeling), it is assessed and validated by evaluation tools. Only when a structure satisfies all the validation constraints and structural parameters is it considered for further analysis. As per CASP, there are three types of structure prediction methodologies, namely ab-initio, threading, and comparative modeling. These three ways of predicting protein 3D structures are described below:

6.3.1.1. Ab-Initio Method This method is based on core principles of energy, geometry, and kinematics (Bonneau & Baker,


2001). It is now also known as Free Modeling (FM) under CASP8. These methods build three-dimensional models based solely on potentials that depend on physicochemical interactions (Hardin, Pogorelov, & Luthey-Schulten, 2002), such as those employed in CHARMM and AMBER. The approach starts from the assumption that the native state of a protein is the global free energy minimum, and therefore carries out a large-scale conformational search for the lowest-energy native state, based only on the sequence. Reduced representations of the protein, at least in the initial stages, have also been tried in some of these algorithms. To predict the energy of a structure, interactions are classically assigned between sites located at the Cα atom, the Cβ atom, and the peptide bond, or at the center of mass of the side chains. For each representation, a corresponding array of interaction potentials is designed to guide the sampling of conformation space. The method is conceptually correct, but many fundamental problems remain. For example, a protein with n amino acids has roughly 3^n possible conformations, and this method searches all possible structural conformations to attain the final native state. In addition to this tedious search, the method uses inaccurate empirical force fields and scoring functions (folding potentials) to find the best structure. Moreover, the calculation requires a long time and massive computational effort. The solvent interaction potential is also not modeled correctly by this method (Lu & Skolnick, 2003). Hence, it still needs further improvement for practical use.

6.3.1.2. Protein Threading Because a protein's fold is evolutionarily more conserved than its primary sequence, a protein sequence can be modeled with reasonable accuracy even on a distantly related template, provided that the target-template relationship can be discerned through sequence alignment (Cymerman, Feder, Pawłowski, Kurowski, & Bujnicki, 2004). This method scans the primary sequence of an unknown structure against the database of solved structures, using a scoring function to assess sequence-structure compatibility. The method, also known as Fold Recognition (FR), is based on compatibility analysis between solved structures and their primary protein sequences. Methods performing an inverse folding search have also evolved from it; they evaluate the compatibility of a known structure with a sequence database, and thus predict which sequences have the best potential to attain the given fold. This variant, known as fold assignment or 3D template matching, is useful when sequence profiles cannot be constructed because few known sequences are closely related to the potential templates. The FR method is also automated as the GeneSilico structure prediction meta-server, to which target sequences are submitted to obtain the desired structure as the consensus result of different FR methods (Bujnicki & Fischer, 2004). For each fold, target-template alignments are used as the starting point for model refinement, and the resulting model is then superimposed on the previously used template structure to generate the corresponding target-template alignment. Refined models are also constructed by iterating this modeling procedure with different templates: evaluating the sequence-structure fit, merging the fragments with the best scores, and locally realigning poorly scored regions (Chakravarty, Wang, & Sanchez, 2004). This method is fully automatic and scalable. However, it seeks thermodynamic minima, so protein folds determined by kinetic control may not be modeled correctly. Although it models domains and subdomains correctly, their mutual orientation is not always predicted correctly. Moreover, a single template may be insufficient to predict the best conformation, so the method requires several related structures, which are selected by their alignment scores (depending on the composition bias of the target sequence residues and the length bias of the template). Some methods are also based on a score called the Z-score (Read & Chavali, 2007), which performs a statistical test,


to cancel out the biased nature of template and composition. However, this approach is time-consuming: for the best scores, sequences must be shuffled and threaded many times, almost a hundred times. Threading so many times becomes another significant computational problem when the target sequence contains a large number of amino acids. Hence, these approaches also need further improvement.

6.3.1.3. Comparative Modeling This method, also known as Homology Modeling or Template Based Modeling (TBM), uses previously solved structures as starting points, or templates (Rost, Liu, Nair, Wrzeszczynski, & Ofran, 2003). It succeeds because a limited set of nearly 2000 distinct folds builds up most proteins. The method relies on sequence similarity between the input protein sequence and a homologous protein structure used as the structural template. For a protein with known templates of more than 50% sequence identity, this approach can reach within 5 Å main-chain Root Mean Square Deviation (RMSD) of the observed structure, which is comparable to the accuracy of a medium-resolution NMR structure or a low-resolution X-ray structure. However, such high sequence homology is rarely found (Tramontano & Morea, 2003). It has been suggested that the primary bottlenecks in comparative modeling arise from difficulties in alignment rather than from structure prediction errors, given a known-good alignment (Moult, 2005; Zhang & Skolnick, 2005; Tress, Ezkurdia, Grana, Lopez, & Valencia, 2005). Some structures are also promiscuous, giving false-positive matches to many sequences, which is a serious problem, especially when hunting for a novel structure with conserved sequences (Dunbrack, 2006; Jauch, Yeo, Kolatkar, & Clarke, 2007). Recently, in CASP8, a multi-template combination algorithm was introduced in this category. The algorithm first selects the best templates through PSI-BLAST (Position Specific Iterative Basic Local Alignment Search Tool). It then combines only those templates whose alignments with the target sequence have E-values close to that of the top template-target alignment, and finally combines fragments of other templates for the unaligned amino acids. All of this was done through Modeller (Cheng, 2008). This approach has some problems, however. It is effective only when the target and template proteins have a strong homologous relationship with accurate alignments, and it fails to produce good results in fold recognition cases. Moreover, it is very time-consuming, as it considers all possible template-target alignments to find the best results. Consensus results from various fold recognition methods or multiple templates were similarly tried in CASP7 (Ginalski, 2006). Alternative alignment variants have also been assessed at the tertiary structure level using quality assessment methods and visual inspection. It was concluded that sequence-structure alignment quality still requires improvement to avoid errors in cases of low sequence similarity. These predictions are observed to be closer to the closest template structure than to the experimental structure of the target. Thus, this method is challenged by evolutionarily related proteins that have globally distinct structures (Cozzetto, Giorgetti, Raimondo, & Tramontano, 2008).
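Because the expected accuracy of a comparative model is tied to target-template sequence identity, a small sketch of that calculation may help. It assumes identity is counted over aligned, non-gap columns of a pairwise alignment, which is only one of several conventions in use; the function names, the example alignment, and the use of 50% as a simple cutoff are illustrative assumptions.

```python
def percent_identity(aligned_target: str, aligned_template: str,
                     gap: str = "-") -> float:
    """Percentage of identical residues over aligned (non-gap) columns."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for target_res, template_res in zip(aligned_target, aligned_template):
        if target_res == gap or template_res == gap:
            continue  # skip columns where either sequence has a gap
        aligned += 1
        if target_res == template_res:
            identical += 1
    if aligned == 0:
        return 0.0
    return 100.0 * identical / aligned


def is_close_homologue(aligned_target: str, aligned_template: str,
                       threshold: float = 50.0) -> bool:
    """True when identity exceeds the threshold (50% assumed here)."""
    return percent_identity(aligned_target, aligned_template) > threshold


if __name__ == "__main__":
    # Hypothetical alignment of a target and a template.
    print(round(percent_identity("MKT-AYIAKQR", "MKTWAYLAK-R"), 1))
```

In practice the alignment would come from a tool such as PSI-BLAST, and the choice of denominator (aligned columns, shorter sequence, or full alignment length) should be stated whenever an identity figure is reported.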

6.3.2. Model Validation


The absence of X-ray protein structures similar to the target sequences makes direct evaluation of the predicted models very difficult. We therefore need to keep the following general facts in mind when validating a model.

1. A model is considered wrong if at least some of its conformational details are missing with reference to the rest of the model. Such errors slip into a model very easily when incorrect sequence alignments are used during the model building procedure. The resulting models show highly improper stereochemical geometry, even if one is very careful in model building.

2. In absolute terms, a model must be considered inaccurate if its atomic coordinates are not within the lowest possible RMSD value of the experimentally determined template structure used for the alignment to the target sequence. This criterion, however, can only be assessed once the structure of the target sequence has been determined, and is therefore not usable.

3. In relative terms, however, a model can be considered accurate if it is the best model obtained from the available template structure after all the considered optimization steps. In other words, a model is accepted once the highest achievable accuracy has been reached, when its RMSD is within the deviation zone observed for experimental structures displaying a similar level of sequence identity in their alignment with the target sequence. Deviations from ideal stereochemical bond lengths and bond angles must also be considered here.

Such accuracies can be studied easily with the WHATCHECK program developed by G. Vriend at the EMBL (European Molecular Biology Laboratory). It is important to realize that, although proper stereochemistry can be evaluated through WHATCHECK, it is not the only criterion for accepting or rejecting a model: a predicted model may satisfy all the constraints of the evaluation programs and still be biologically insignificant. Predicted models must therefore be verified through the consensus results of several model verification tools or online servers, and through visual assessment.
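The RMSD used in these criteria can be computed as in the sketch below. It assumes the two coordinate sets list the same atoms in the same order and have already been superimposed; no optimal superposition (for example, a Kabsch fit) is performed here, and the coordinates shown are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(model: Sequence[Point], reference: Sequence[Point]) -> float:
    """RMSD between two equally long, already superimposed coordinate sets."""
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate sets must be non-empty and equally long")
    squared_sum = sum(
        (a - b) ** 2
        for model_atom, reference_atom in zip(model, reference)
        for a, b in zip(model_atom, reference_atom)
    )
    return math.sqrt(squared_sum / len(model))


if __name__ == "__main__":
    # Hypothetical C-alpha positions, in angstroms.
    predicted = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    observed = [(0.0, 0.5, 0.0), (3.8, 0.5, 0.0), (7.6, 0.5, 0.0)]
    print(rmsd(predicted, observed))
```

Restricting the input to main-chain or C-alpha atoms gives the main-chain RMSD quoted for comparative models; including all atoms gives a stricter measure.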

6.3.3. Docking
Docking is the technique used to find the possible orientation of a drug molecule with reference to the best possible target protein conformation. For this reason, we also study the active sites of the protein involved in binding drug molecules. The goal is to predict the orientation and position of the drug and the protein during the interaction. Pharmaceutical research employs this approach for virtual screening of large databases to find the best possible drug candidates against a specific modeled epitope structure. Many tools, such as AutoDock, can be used for this purpose, and servers such as Molecular Docking Server are also available. Docking of each screened compound (often represented by at least five different anchor conformations) is repeated at least ten times. The docking parameters are optimized for each target independently, by optimizing the docking of a small available set of known binders against the large library of drug-like compounds. Care should be taken that the drug-protein complex is stable, with the lowest possible docking energy. Docking is therefore an important methodology for predicting the strength and type of signal produced by the target protein in response to a particular drug molecule. The two approaches most often used for docking are described below. They simulate the drug-protein interaction either by energy minimization of pair-wise interaction energies or by stabilizing the molecular complementarity between the drug and the most feasible active site available in the protein structure.

6.3.3.1. Conformational Complementarity This approach considers the geometric features of the interaction between the drug and protein molecules that make them dockable. These features include the complementary surface area available for the drug to fit in the target protein


in the best way. This depends on the solvent-accessible surface area of the target protein and the molecular surface area of the drug. The feasibility of interaction between these two surfaces is assessed to capture the snapshot in which the drug is best docked in the active pocket of the protein. These approaches are typically fast and robust, but they usually cannot model movements or dynamic changes in the drug or protein conformations exactly, although recent developments allow flexibility during docking for either or both of these molecules.

6.3.3.2. Simulation The docking process is very difficult to simulate, because modeling the conformational changes of the drug and the protein at very small time steps (picoseconds) is extremely computationally demanding. Here, the protein and the drug are first separated by some physical distance, and the drug is then allowed to find the position in which it docks best into the protein's active site. The moves of the drug include rigid-body transformations such as translations and rotations, together with internal changes to the drug molecule such as torsion angle rotations. Each move has an energetic cost, so the total energy of the system is calculated after each step. This method is attractive because it allows drug flexibility during modeling. Another advantage is that the best feasible docked distance between the drug and target protein molecules can be calculated, which allows us to estimate the affinity of the drug molecule for the target protein. The only problem is that the approach is computationally very demanding and may take a long time to simulate the interaction properly at such small time steps. In the recent past, grid-based techniques and optimization methods have considerably alleviated these problems.

6.3.4. Scoring and Selection

To choose the best drug molecule, several scoring tools are employed until around a hundred virtual hits are reached. Initially, an automated binding-mode analysis (BMA) is performed on all docked conformations to ensure proper docking. As the next step, ten additional structural scores are computed for the top 10% of the drug-protein docking scores. A cutoff value for the scores is then generally used to narrow down the number of virtual hits. The selected hits are further screened through a structure-based principal component analysis (PCA) procedure, which is based on structural properties of the docked compounds. The coordinates of the diagonalized covariance matrix include the ten highest docking scores, the docked compound conformations, and a few two-dimensional structural descriptors. This approach is applied alike to all binders and non-binders in the generated set. Compounds falling in the same region of the PCA map are then grouped together, and the best-scoring representative compound of each group is selected. The selected compounds are finally verified by in-vitro and in-vivo assays, as described below.

6.3.4.1. in-vitro Binding Assay More than one hundred thousand compounds are typically screened in silico for each target, leading to a selection of about a hundred virtual hit compounds to be tested in the lab. Compounds selected from these virtual hits are then subjected to the appropriate experimental binding assays (Radio Drug Displacement) against the target protein. They are initially tested at a very low concentration, in duplicate, for accurate results. Compounds showing at least 50% efficiency in these trials are then validated by a full-concentration dose-response curve. The compounds that pass this step, with experimentally validated binding affinities, are defined as the actual hits. Full dose-response curves with Ki values below 5 µM for these selected hits confirm a very high hit rate. In most cases, the best confirmed hit proves to be a novel compound (New Chemical Entity) in the 1 to 100 nM range, with promising pharmacological


properties, as measured by a variety of in-vitro and in-vivo assays. These assays validate the quality of the hits as lead compounds for further consideration before they are launched in the market.

6.3.4.2. Other in-vitro and in-vivo Assays The best compounds chosen from the initial in-vitro binding assays are then evaluated in additional in-vitro and in-vivo assays, including cell-based functional assays for agonist or antagonist activity. A complete selectivity profile (in-vitro binding assays for up to sixty targets, including receptors, ion channels, transporters, and enzymes) and a human liver microsome stability assay are then studied to verify the drug further and to screen for possible side effects. In some cases, pharmacokinetic properties of the selected compound are measured in-vivo in rats, to study oral bioavailability (%F) and serum half-life.
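The two-stage triage described for the in-vitro binding assay (6.3.4.1) can be sketched as a simple filter. This is only an illustration under stated assumptions: the 50% single-point threshold comes from the text, the Ki cutoff is read as 5 µM, and the data class, function names, compound identifiers, and values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class AssayResult:
    compound_id: str
    # Percent efficiency in the initial low-concentration test.
    single_point_efficiency: float
    # Ki from the full dose-response curve, in micromolar, if measured.
    ki_micromolar: Optional[float] = None


def passes_single_point(result: AssayResult,
                        min_efficiency: float = 50.0) -> bool:
    """True when the compound qualifies for a full dose-response curve."""
    return result.single_point_efficiency >= min_efficiency


def select_actual_hits(results: Sequence[AssayResult],
                       min_efficiency: float = 50.0,
                       max_ki_micromolar: float = 5.0) -> List[str]:
    """Keep compounds that pass both the single-point and the Ki criteria."""
    hits = []
    for result in results:
        if not passes_single_point(result, min_efficiency):
            continue
        if (result.ki_micromolar is not None
                and result.ki_micromolar < max_ki_micromolar):
            hits.append(result.compound_id)
    return hits


# Hypothetical assay outcomes for four virtual hits.
EXAMPLE_RESULTS = [
    AssayResult("cmpd-001", 72.0, 0.08),
    AssayResult("cmpd-002", 35.0),
    AssayResult("cmpd-003", 55.0, 12.0),
    AssayResult("cmpd-004", 90.0, 4.2),
]

if __name__ == "__main__":
    print(select_actual_hits(EXAMPLE_RESULTS))
```

Keeping the thresholds as parameters makes it easy to apply a different cutoff per target, in line with the per-target optimization described for docking.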

SOLUTIONS AND RECOMMENDATIONS

Currently, the major problems in the algorithms discussed above arise mainly from incorrectly predicted target protein conformations, which show high RMSD values against known templates. If the model is incorrect, the entire process followed earlier must be repeated, which increases the complexity. However, if we can refine the conformation of the predicted model toward its native state, such models can be employed easily and with fewer complications. All the prediction algorithms are thus still working their way toward experimental accuracy. This can be pursued through biosimulation studies, that is, simulations of biological systems. The development of effective all-atom structural refinement procedures can also be considered to tackle these problems. These refinement steps generate high-resolution models that reproduce all key functional features, and they can be implemented as optimization steps applied to modeled protein conformations. This optimization algorithm is generally used as part of the following four-step modeling process:

(i) Coarse Modeling: The entire possible conformational space of a protein is explored uniformly to identify the regions of stability, whose conformation should be fixed to depict the best possible global conformation of the protein. This is done with comparative modeling or threading methodologies.

(ii) Fine Modeling: A modeling approach similar to that of the first step is used again to retune the overall conformation, polishing the structure into a more stable conformation. This step is iterated several times to predict the most stable conformational decoy.

(iii) Molecular Dynamics (MD) Refinement: The resulting all-atom model is refined through an all-atom MD simulation of up to 300 ps with the CHARMM22 force field. In this refinement, the motion of all atoms in the protein is integrated over time to obtain a trajectory that simulates the entire process of the drug intercalating with the target protein. A set of constraints is applied during these simulations to ensure that the model does not diverge too much from the initial predicted model, verified through the RMSD of the alpha carbons of the peptide backbone. These refinement simulations bring in helical kinks and relax the side-chain conformations to stabilize the structure.

(iv) Protein-Drug Docked Complex: Finally, after docking, the complex of the docked drug in the active site of the target protein is modeled carefully through MD simulations again. This mimics the experimental co-crystallization method, which locks the target protein structure in a drug-bound conformation.


FUTURE RESEARCH DIRECTIONS

The use of molecular modeling techniques and computational tools may seem like a radical advancement, but some expertise in the immuno-informatics arena is still needed, alongside technological advancement, to study drugs and their side effects thoroughly before they are used in humans. Thus, to study the epitopes responsible for a disease and design drugs against them, further advancement is still required in epitope prediction and in modeling protein structure at the highest possible accuracy, with very small divergence from the native state. The described techniques have achieved little advancement in this area over the last twenty-five years. It is therefore clear that the assumptions on which these models are built and analyzed are still flawed, and that a revolutionary approach is required. The entire protein modeling approach can be considered as a multi-objective optimization problem, whose solution at every step should ultimately answer all such problems.

CONCLUSION

The principles of protein folding are still not known. The protein structure prediction problem is therefore described in terms of the degree of remoteness between the solved structures and the target sequences. Full-length structural models are constructed by copying the atomic framework from the best available templates. Thus, if we can structurally complete the universe of solved protein structures with all the known folds, we can address the structure prediction problem completely. Even when we refine the predicted models toward the native state, we are mostly caught in local minima during energy minimization. This is never the case in the cell, where a protein sequence folds directly to its global-minimum native state with the lowest possible energy. Refinement approaches mostly fail because of inaccurate energy functions and inefficient sampling of conformational space, which prevent them from reaching the global minimum without being trapped at local minima or saddle points. Drug designing algorithms can be really successful if we can solve all these issues by developing an algorithm that computationally predicts drug molecules in a very short time. It is therefore recommended to use different algorithms together at every step of this process to obtain the best possible consensus results. In this way, we can harness the best possible output from each step to finally predict the best possible drug molecule for a target protein structure. More efficient algorithms are thus the need of the hour.

REFERENCES

Anderson, A. C., & Wright, D. L. (2005). The design and docking of virtual compound libraries to structures of drug targets. Current Computer-Aided Drug Design, 1, 103–127. doi:10.2174/1573409052952279

Bajorath, J. (2002)... Nature Reviews. Drug Discovery, 1, 882–894. doi:10.1038/nrd941

Bohacek, R. S., & McMartin, C. (1994). Multiple highly diverse structures complementary to enzyme binding sites: Results of extensive application of de novo design method incorporating combinatorial growth. Journal of the American Chemical Society, 116, 5560–5571. doi:10.1021/ja00092a006

Bonneau, R., & Baker, D. (2001). Ab-Initio Protein Structure Prediction: Progress and Prospects. Annual Review of Biophysics and Biomolecular Structure, 30, 173–189. doi:10.1146/annurev.biophys.30.1.173


Bujnicki, J. M., & Fischer, D. (2004). Meta-Approaches to Protein Structure Prediction. Nucleic Acids and Molecular Biology, 15, 23–34. doi:10.1007/978-3-540-74268-5_2

Caflisch, A., Miranker, A., & Karplus, M. (1993). Multiple copy simultaneous search and construction of drugs in binding sites: application to inhibitors of HIV-1 aspartic proteinase. Journal of Medicinal Chemistry, 36, 2142–2167. doi:10.1021/jm00067a013

Chakravarty, S., Wang, L., & Sanchez, R. (2004). Accuracy of structure-derived properties in simple comparative models of protein structures. Nucleic Acids Research, 33(1), 244–259. doi:10.1093/nar/gki162

Cheng, J. (2008). A multi-template combination algorithm for protein comparative modeling. BMC Structural Biology, 8(18), 1–13.

Cozzetto, D., Giorgetti, A., Raimondo, D., & Tramontano, A. (2008). The Evaluation of Protein Structure Prediction Results. Molecular Biotechnology, 39, 1–8. doi:10.1007/s12033-007-9023-6

Cymerman, I., Feder, M., Pawłowski, M., Kurowski, M. A., & Bujnicki, J. M. (2004). Computational Methods for Protein Structure Prediction and Fold Recognition. Nucleic Acids and Molecular Biology, 15, 1–34. doi:10.1007/978-3-540-74268-5_1

DeWitt, R., & Shakhnovich, E. (1996). SmoG: de novo design method based on simple, fast, and accurate free energy estimates. 1. Methodology and supporting evidence. Journal of the American Chemical Society, 118, 11733–11744. doi:10.1021/ja960751u

DeWitt, R., & Shakhnovich, E. (1996). SmoG: de novo design method based on simple, fast, and accurate free energy estimates. 2. Case studies on molecular design. Journal of the American Chemical Society, 119, 4608–4617. doi:10.1021/ja963689+

Dunbrack, R. L. Jr. (2006). Sequence comparison and protein structure prediction. Current Opinion in Structural Biology, 16, 374–384. doi:10.1016/j.sbi.2006.05.006

Ginalski, K. (2006). Comparative modeling for protein structure prediction. Current Opinion in Structural Biology, 16, 172–177. doi:10.1016/j.sbi.2006.02.003

Good, A. C., Krystek, S. R., & Mason, J. S. (2000). High-throughput and virtual screening: core lead discovery technologies move towards integration. Drug Discovery Today, 5, S61–S69. doi:10.1016/S1359-6446(00)80056-2

Hajduk, P. J., Huth, J. R., & Fesik, S. W. (2005). Druggability indices for protein targets derived from NMR based screening data. Journal of Medicinal Chemistry, 48, 2518–2525. doi:10.1021/jm049131r

Hajduk, P. J., Huth, J. R., & Tse, C. (2005). Predicting protein druggability. Drug Discovery Today, 10, 1675–1682. doi:10.1016/S1359-6446(05)03624-X

Han, L. Y., Cai, C. Z., Ji, Z. L., Cao, Z. W., & Chen, Y. Z. (2004). Predicting functional family of novel enzymes irrespective of sequence similarity: a statistical learning approach. Nucleic Acids Research, 32, 6437–6444. doi:10.1093/nar/gkh984

Hardin, C., Pogorelov, T. V., & Luthey-Schulten, Z. (2002). Ab-initio protein structure prediction. Current Opinion in Structural Biology, 12, 176–181. doi:10.1016/S0959-440X(02)00306-8

Hardy, L. W., & Peet, N. P. (2004). The multiple orthogonal tools approach to define molecular causation in the validation of druggable targets. Drug Discovery Today, 9, 117–126. doi:10.1016/S1359-6446(03)02969-6


Honma, T. (2003). Recent advances in de novo design strategy for practical lead identification. Medicinal Research Reviews, 23, 606–632. doi:10.1002/med.10046

Hopkins, A. L., & Groom, C. R. (2002). The druggable genome. Nature Reviews. Drug Discovery, 1, 727–730. doi:10.1038/nrd892

Horn, F., Weare, J., Beukers, M. W., Horsch, S., Bairoch, A., & Chen, W. (1998)... Nucleic Acids Research, 26, 275–279. doi:10.1093/nar/26.1.275

Jauch, R., Yeo, H. C., Kolatkar, P. R., & Clarke, N. D. (2007). Assessment of CASP7 structure predictions for template free targets. Proteins: Structure, Function, and Bioinformatics, 69(8), 57–67. doi:10.1002/prot.21771

Jenkins, J. L., Kao, R. Y., & Shapiro, R. (2003). Proteins, 50, 81–93. doi:10.1002/prot.10270

Jhoti, H. (2007). Fragment-based drug discovery using rational design. Ernst Schering Found Symp. Proceedings, 3, 169–185. doi:10.1007/2789_2007_064

Kitchen, D. B., Decornez, H., Furr, J. R., & Bajorath, J. (2004). Docking and scoring in virtual screening for drug discovery: methods and applications. Nature Reviews. Drug Discovery, 3, 935–949. doi:10.1038/nrd1549

Kramer, R., & Cohen, D. (2004). Functional genomics to new drug targets. Nature Reviews. Drug Discovery, 3, 965–972. doi:10.1038/nrd1552

Leach, A. R., Shoichet, B. K., & Peishoff, C. E. (2006). Prediction of protein drug interactions. Docking and scoring: successes and gaps. Journal of Medicinal Chemistry, 49, 5851–5855. doi:10.1021/jm060999m

Lindsay, M. A. (2005). Finding new drug targets in the 21st century. Drug Discovery Today, 10, 1683–1687. doi:10.1016/S1359-6446(05)03670-6

Loging, W., Harland, L., & Williams-Jones, B. (2007). High-throughput electronic biology: mining information for drug discovery. Nature Reviews. Drug Discovery, 6, 220–230. doi:10.1038/nrd2265

Lu, H., & Skolnick, J. (2003). Application of statistical potentials to protein structure refinement from low resolution ab-initio models. Biopolymers, 70(4), 575–584. doi:10.1002/bip.10537

Mauser, H., & Guba, W. (2008). Recent developments in de novo design and scaffold hopping. Current Opinions in Drug Discovery, 11, 365–374.

Moult, J. (2005). A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction. Current Opinion in Structural Biology, 15, 285–289. doi:10.1016/j.sbi.2005.05.011

Nishibata, Y., & Itai, A. (1991). Automatic creation of drug candidate structures based on receptor structure. Starting point for artificial lead generation. Tetrahedron, 47, 8985–8990. doi:10.1016/S0040-4020(01)86503-0

Ofran, Y., Punta, M., Schneider, R., & Rost, B. (2005). Beyond annotation transfer by homology: novel protein-function prediction methods to assist drug discovery. Drug Discovery Today, 21, 1475–1482. doi:10.1016/S1359-6446(05)03621-4

Rattner, A., Sun, H., & Nathans, J. (1999). Annual Review of Genetics, 33, 89–131. doi:10.1146/annurev.genet.33.1.89

Read, R. J., & Chavali, G. (2007). Accuracy of CASP7 Predictions in the high accuracy template based Modeling. Proteins: Structure, Function, and Bioinformatics, 69(8), 27–37. doi:10.1002/prot.21662

Rost, B., Liu, J., Nair, R., Wrzeszczynski, K. O., & Ofran, Y. (2003). Automatic prediction of protein function. Cellular and Molecular Life Sciences, 60, 2637–2650. doi:10.1007/s00018-003-3114-8

221

Hunting Drugs for Potent Antigens in the Silicon Valley

Rotstein, S.,H., & Murcko, M.,A. (1993). Groupbuild-a fragment-based method for de novo drug design. Journal of Medicinal Chemistry, 36, 17001710. doi:10.1021/jm00064a003 Rotstein, S. H., & Murcko, M. A. (1993). Genstara method for de novo drug design. Journal of Computer-Aided Molecular Design, 7, 2343. doi:10.1007/BF00141573 Sams-Dodd, F. (2005). Target-based drug discovery: is something wrong? Drug Discovery Today, 10, 139147. doi:10.1016/S1359-6446(04)033161 Sautel, M., & Milligan, G. (2000)... Current Medicinal Chemistry, 7, 889896. Schnecke, V., & Bostro, M. (2006). Computational chemistry-driven decision making in lead generation. Drug Discovery Today, 11, 4350. doi:10.1016/S1359-6446(05)03703-7 Schoneberg, T., Schulz, A., & Gudermann, T. (2002)... Reviews of Physiology, Biochemistry and Pharmacology, 144, 143227. Tramontano, A., & Morea, V. (2003). Assessment of homology-based predictions in CASP5. Proteins, 53, 352368. doi:10.1002/prot.10543 Tress, M., Ezkurdia, I., Grana, O., & Lopez, G., & Valencia. (2005). An Assessment of predictions submitted for the CASP6 comparative modeling category. Proteins: Structure. Function and Bioinformatics, 61(7), 2745. doi:10.1002/prot.20720 Van Dongen, M. J., Uppenberg, J., Svensson, S., & Lundback, T., A, Kerud, T., Wikstrom, M., & Schultz, J. (2002). Journal of the American Chemical Society,124,1187411880. doi:10.1021/ ja017830c Wang, S., Sim, T. B., Kim, Y. S., & Chang, Y. T. (2004). Tools for target identification and validation. Current Opinion in Chemical Biology, 8, 371377. doi:10.1016/j.cbpa.2004.06.001

Westhead, D. R., Clark, D. E., Frenkel, D., Li, J., Murray, C. W., Robson, B., & Waszkowycz, B. (1995). PRO_DRUG: An approach to de novo molecular design. 3. A genetic algorithm for structure refinement. Journal of Computer-Aided Molecular Design, 9, 139148. doi:10.1007/BF00124404 Wilson, S., & Bergsma, D. (2000)... Drug Design and Discovery, 17, 105114. Zhang, Y., & Skolnick, J. (2005). The protein structure prediction problem could be solved using the current PDB library. Proceedings of the National Academy of Sciences of the United States of America, 102(4), 10291034. doi:10.1073/ pnas.0407152101 Zheng, C., Lianyi, H., & Chun, W., Yap, Bin, X., & Yuzong, C. (2006). Progress and problems in the exploration of therapeutic targets. Drug Discovery Today, 11, 412420. doi:10.1016/j. drudis.2006.03.012

ADDITIONAL READING
Aloy, P., Stark, A., Hadley, C., & Russell, R. B. (2003). Predictions without templates: new folds, secondary structure, and contacts in CASP5. Proteins, 53(6), 436-456. doi:10.1002/prot.10546
Chu, W., Ghahramani, Z., & Wild, D. L. (2006). Bayesian Segmental Models with Multiple Sequence Alignment Profiles for Protein Secondary Structure and Contact Map Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 3(2), 98-113. doi:10.1109/TCBB.2006.17
Chung, J. L., Wang, W., & Bourne, P. E. (2005). Exploiting Sequence and Structure Homology to Find Protein-Protein Binding Sites. Proteins: Structure, Function, and Bioinformatics, 62(3), 630-640. doi:10.1002/prot.20741


Friedberg, I. (2006). Automated Function Prediction: the Genomic Challenge. Briefings in Bioinformatics, 7(3), 225-242. doi:10.1093/bib/bbl004
Friedberg, I., & Godzik, A. (2005). Connecting the Protein Structure Universe using Recurring Fragments. Structure (London, England), 13, 1213-1224. doi:10.1016/j.str.2005.05.009
Friedberg, I., & Godzik, A. (2005). Fragnostic: walking through protein structure space. Nucleic Acids Research, 33, W249-W251. doi:10.1093/nar/gki363
Friedberg, I., Harder, T., & Godzik, A. (2006). JAFA: a protein function annotation meta-server. Nucleic Acids Research, 34, W379-W381. doi:10.1093/nar/gkl045
Friedberg, I., Jambon, M., & Godzik, A. (2006). New Avenues in Protein Function Prediction. Protein Science, 15(6), 1527-1529. doi:10.1110/ps.062158406
Friedberg, I., Jaroszewski, L., Ye, Y., & Godzik, A. (2004). The interplay of fold recognition and experimental structure determination in structural genomics. Current Opinion in Structural Biology, 14, 307-312. doi:10.1016/j.sbi.2004.04.005
Grana, O., Baker, D., MacCallum, R. M., Meiler, J., Punta, M., & Rost, B. (2005). CASP6 assessment of contact prediction. Proteins, 61(7), 214-224. doi:10.1002/prot.20739
Jia, Y., & Dewey, T. G. (2005). A Random Polymer Model of the Statistical Significance of Structure Alignment. Journal of Computational Biology, 12, 298-313. doi:10.1089/cmb.2005.12.298
Jia, Y., Dewey, T. G., Shindyalov, I., & Bourne, P. (2004). A New Scoring Function and Associated Statistical Significance for Structure Alignment by CE. Journal of Computational Biology, 11, 787-799. doi:10.1089/cmb.2004.11.787

Jones, D. T. (1997). Successful ab initio prediction of the tertiary structure of NK-lysin using multiple sequences and recognized supersecondary structural motifs. Proteins, 1, 185-191. doi:10.1002/(SICI)1097-0134(1997)1+<185::AID-PROT24>3.0.CO;2-J
Kouranov, L., Xie, J., Cruz, D. L., Chen, L., Westbrook, J., Bourne, P. E., & Berman, H. M. (2006). The RCSB PDB Information Portal for Structural Genomics. Nucleic Acids Research, 34, D302-D305. doi:10.1093/nar/gkj120
Lesk, A. M., Lo Conte, L., & Hubbard, T. J. (2001). Assessment of novel fold targets in CASP4: predictions of three-dimensional structures, secondary structures, and interresidue contacts. Proteins, 45(5), 98-118. doi:10.1002/prot.10056
Li, Z., Ye, Y., & Godzik, A. (2006). Flexible Structural Neighborhood - A database of protein structural similarities and alignments. Nucleic Acids Research, 34, D277-D280. doi:10.1093/nar/gkj124
Podtelezhnikov, A., Ghahramani, Z., & Wild, D. L. (2007). Learning about Protein Hydrogen Bonding by Minimizing Contrastive Divergence. Proteins: Structure, Function, and Bioinformatics, 66, 588-599. doi:10.1002/prot.21247
Podtelezhnikov, A., & Wild, D. L. (2005). Exhaustive Metropolis Monte Carlo sampling and analysis of polyalanine conformations adopted under the influence of hydrogen bonds. Proteins: Structure, Function, and Bioinformatics, 61, 94-104. doi:10.1002/prot.20513
Saqi, M. A. S., & Wild, D. L. (2005). Expectations from structural genomics revisited: an analysis of structural genomics targets. American Journal of Pharmacogenomics, 5(5), 339-342. doi:10.2165/00129785-200505050-00006


Veretnik, S., Bourne, P. E., Alexandrov, N. N., & Shindyalov, I. N. (2004). Towards consistent assignment of structural domains in proteins. Journal of Molecular Biology, 339(3), 647-678. doi:10.1016/j.jmb.2004.03.053
Vincent, J. J., Tai, C. H., Sathyanarayana, B. K., & Lee, B. (2005). Assessment of CASP6 predictions for new and nearly new fold targets. Proteins, 61(7), 67-83. doi:10.1002/prot.20722
Wild, D. L., & Saqi, M. A. S. (2004). Structural Proteomics: Inferring function from protein structure. Current Proteomics, 1(1), 59-65. doi:10.2174/1570164043488234
Yang, S., Doolittle, R. F., & Bourne, P. E. (2005). Phylogeny Determined through Protein Domain Content. Proceedings of the National Academy of Sciences of the United States of America, 102(2), 373-378. doi:10.1073/pnas.0408810102
Ye, Y., & Godzik, A. (2004). Database Searching by Flexible Structure Alignment. Protein Science, 13(7), 1841-1850. doi:10.1110/ps.03602304
Ye, Y., & Godzik, A. (2004). FATCAT: a web server for flexible structure comparison and structure analog searching. Nucleic Acids Research, 32, W582-W585. doi:10.1093/nar/gkh430
Ye, Y., & Godzik, A. (2005). Multiple flexible structure alignment using partial order graphs. Bioinformatics (Oxford, England), 21, 2362-2369. doi:10.1093/bioinformatics/bti353
Zemla, A. (2003). LGA: a method for finding 3D similarities in protein structures. Nucleic Acids Research, 31, 3370-3374. doi:10.1093/nar/gkg571

KEY TERMS AND DEFINITIONS


QSAR: Quantitative Structure-Activity Relationships is an approach for studying the relationship between the structural properties of compounds and their activities. These properties include parameters describing hydrophobicity, topology, electronic determinants, and steric effects, which are determined empirically or computationally. QSAR steps include chemical measurements and biological assays. It is now applied in many fields related to drug design.

Docking: A technique that estimates the most favorable conformation when one molecule interacts with another to form a stable active complex. It considers the complementarity of the two molecules in terms of charge, entropy, conformation, and electronic and steric parameters in order to predict the strength of binding between them.

Drug: A signaling molecule that binds to a target protein at specialized sites through interactive forces such as ionic bonds, hydrogen bonds, and non-bonded interactions; the strength of this binding is known as the binding affinity. Such interactions are usually reversible, as drugs dissociate after triggering the target protein. Drugs generally include activators, inhibitors, substrates, etc.

Simulation: The imitation of a real process based on physical potentials and force-field equations. It shows the motion of each interacting moiety at every time step of the process and generally represents the key characteristics of the system under study. Simulation is worthwhile when we wish to study in depth the mechanism behind the action or interaction of a molecule in the selected system.

APC: Antigen-presenting cells are specialized white blood cells (WBC) that help the immune system fight foreign particles. These cells trigger specialized T cells when a non-self antigen is encountered in the body. Each of these T cells is equipped to combat specific pathogens, which may be bacteria, viruses, or toxins. APCs also include non-professional cells such as fibroblasts, thymic endothelial cells, glial cells, and many others.
All these cells play an important role in stimulating a significant immune response against an encountered antigen, facilitating its quick elimination from the system.

MHC: Major Histocompatibility Complex makes up a large stretch of the mammalian genome. Its proteins are expressed on the cell surface in all jawed vertebrates and display both self and non-self antigens to T cells, which play a significant role in either coordinating the killing of, or directly killing, pathogens and infected or malfunctioning cells. MHC-I is present on all nucleated cells, and MHC-II is present on most immune cells, specifically on antigen-presenting cells.

CD: Cluster of Differentiation defines cell-surface molecules that are recognized by a given set of monoclonal antibodies. The clusters are numbered in the order of their discovery. Each CD is associated with one or more functions, which were revealed by the effects that its defining antibodies have on cell or tissue function.

HLA: Human Leukocyte Antigen is a major gene product of the MHC complex and has a strong influence on human allotransplantation and on transfusions in refractory patients. The HLA complex is located on chromosome six and contains four significant antigen loci, known as HLA-A, HLA-B, HLA-C, and HLA-D. Each of these loci has several genetically determined alleles, some of which are associated with specific diseases. This system is employed for assessing tissue compatibility. Perfect compatibility exists only between identical twins.

Epitope: The localized region on an antigen's surface that is capable of eliciting an immune response when encountered by the immune system. It does so through its specific interaction with the antibody that mounts the response. It is also known as an antigenic determinant, and the part of the antibody that recognizes an epitope is known as the paratope.

RMSD: Root Mean Square Deviation is a statistical measure used to compare two molecular conformations defined over the same set of atoms. It quantifies the structural divergence between the two conformations. A structure is treated as a set of $N$ vectors, each defined by three coordinates, and RMSD compares two such sets of vectors, $a_i$ and $b_i$. The divergence is the sum of squared distances, $D = \sum_{i=1}^{N} |a_i - b_i|^2$, and the RMSD is $\sqrt{D/N}$. RMSD is generally calculated over the alpha-carbon (Cα) atoms of a protein, using one Cα atom to represent each residue, since these atoms can be readily matched between two protein structures. An all-atom RMSD can also be calculated, but it is usually larger than the Cα RMSD because the main chain is more restrained than the side chains.
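As an illustration of the RMSD definition, the following is a minimal Python sketch that computes the square root of the mean squared distance between matched atoms, where the mean is taken over the number of atoms N. It assumes the two conformations are already superimposed and list their atoms in the same order; it performs no structural alignment. The function name `rmsd` and the sample coordinates are hypothetical and chosen only for this example.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) atom coordinates.

    Assumes the structures are already superimposed and that atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("both conformations must contain the same number of atoms")
    if not coords_a:
        raise ValueError("at least one atom is required")
    # D: sum of squared distances between matched atoms a_i and b_i
    d = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    # RMSD = sqrt(D / N), with N the number of atoms
    return math.sqrt(d / len(coords_a))


# Hypothetical C-alpha coordinates (angstroms) for a three-residue fragment;
# the second conformation is the first one shifted by 1 angstrom along z.
conf_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
conf_b = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
print(rmsd(conf_a, conf_b))  # prints 1.0
```

Because every atom in the example is displaced by exactly 1 angstrom, D equals 3 and the RMSD equals 1.0. In practice, an optimal superposition step would normally precede this calculation; that step is outside the scope of this sketch.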
