Michael Odorico and Jean-Luc Pellequer*
CEA Valrho - Centre de Marcoule, DSV/DIEP/SBTN, BP17171, 30207 Bagnols sur Cèze, France

To meet the growing need for highly specific monoclonal antibodies against novel proteins, we developed new functions, implemented in the program BEPITOPE, to predict continuous protein epitopes. The program can compute, combine, display and print prediction profiles, and it also provides a list of suggested linear peptides to be synthesized. Novel facilities incorporated in BEPITOPE include the treatment of a whole genome, the search for a user-defined pattern, and the combination of prediction profiles with pattern profiles. This latter approach is useful for removing unwanted predictions such as those that include glycosylation sites. Copyright © 2003 John Wiley & Sons, Ltd.

Keywords: antigenic site; propensity scale; epitope prediction; hybridoma; monoclonal antibodies

INTRODUCTION

Monoclonal antibodies are exquisite tools in molecular biology and are routinely used to characterize, localize and purify target proteins (Goldman, 2000). Synthetic peptides are often used to obtain monoclonal antibodies (Van Regenmortel et al., 1988) using the hybridoma technology. However, the hybridoma technology is expensive and time-consuming. Therefore, identifying suitable synthetic peptides is critical for obtaining high-quality cross-reacting monoclonal antibodies. The physico-chemical and structural properties of known continuous protein epitopes, the regions recognized by antibodies, have been dissected (Van Regenmortel, 1999). Previously identified properties include hydrophilicity (Hopp and Woods, 1981), flexibility/mobility (Tainer et al., 1984; Westhof et al., 1984), accessibility (Barlow et al., 1986) and turns (Rose et al., 1985). These properties are quantified by assigning a value to each of the 20 natural amino acids, a so-called propensity scale.
A recent method combines several of these propensity scales (Alix, 1999), creating a consensus prediction using a so-called antigenic index (Jameson and Wolf, 1988). Our previous program, PREDITOP, was aimed at objectively analyzing the prediction success of various methods as well as that of various propensity scales (Pellequer et al., 1991; Pellequer and Westhof, 1993). In this paper, we present a new version of the PREDITOP program, BEPITOPE, aimed at predicting continuous protein epitopes and searching for patterns in either a single protein or a complete translated genome. BEPITOPE uses standard methods as well as more complex predictive strategies, such as a method based on turn prediction, which has been shown to be successful in predicting continuous epitopes (Pellequer et al., 1993).

PROGRAM DESCRIPTION

Standard methods

Continuous epitopes in proteins share common properties. They are exposed on the surface and are often localized in loop regions that are usually flexible and composed of polar residues. Therefore, an ideal epitope prediction method must reflect these properties. A protocol of the BEPITOPE program selects putative epitopes by employing the five methods described below: one standard method and four complex ones. The consensus prediction is made by summing, for each amino acid, the frequency with which it is predicted to be located in an epitope. A user can include either one or all of these five methods in the final consensus prediction. Standard methods aim at predicting a given property encoded in a propensity scale. The computation algorithms were described elsewhere (Pellequer et al., 1991). The BEPITOPE program adapts more than 30 propensity scales so that, for example, positive values in a hydrophobicity scale correspond to hydrophilic regions and positive values in a flexibility scale correspond to flexible regions. Consequently, putative epitopes are indicated by peaks in the prediction profile generated by the BEPITOPE program.
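As an illustration of how a propensity scale can be turned into a prediction profile, the following Python sketch averages scale values over a sliding window and reports local maxima above a threshold. It is not BEPITOPE's algorithm (the published computation is described in Pellequer et al., 1991): the toy scale values, the window length, the handling of sequence ends and the function names profile and peaks are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

# Toy scale for illustration only; these are NOT published propensity values.
# Positive values mark residues treated here as hydrophilic.
TOY_SCALE: Dict[str, float] = {}
TOY_SCALE.update({aa: 1.0 for aa in "DEKR"})        # charged
TOY_SCALE.update({aa: 0.5 for aa in "STNQHGP"})     # polar or small
TOY_SCALE.update({aa: -1.0 for aa in "ACFILMVWY"})  # hydrophobic


def profile(sequence: str, scale: Dict[str, float], window: int = 7) -> List[float]:
    """Mean scale value over a window centered on each residue.

    Near the ends of the sequence the window is truncated (an assumption).
    Residues missing from the scale count as 0.0.
    """
    half = window // 2
    values = [scale.get(res, 0.0) for res in sequence.upper()]
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def peaks(prof: List[float], threshold: float) -> List[Tuple[int, float]]:
    """Return (index, value) of local maxima strictly above the threshold.

    On a plateau, only the last position of the plateau is reported.
    """
    found = []
    for i, value in enumerate(prof):
        left = prof[i - 1] if i > 0 else float("-inf")
        right = prof[i + 1] if i + 1 < len(prof) else float("-inf")
        if value > threshold and value >= left and value > right:
            found.append((i, value))
    return found


if __name__ == "__main__":
    prof = profile("LLLLLKDEKLLLLL", TOY_SCALE, window=3)
    for index, value in peaks(prof, threshold=0.5):
        print(index, round(value, 2))
```

With this toy scale, the charged stretch KDEK embedded in a leucine-rich sequence produces the only peak above the threshold. A real analysis would substitute one of the published scales and the smoothing settings chosen in the program.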
Each putative epitope is a sequence of 15 residues centered on a peak of the profile that lies above a user-defined threshold. Three of the complex epitope prediction methods in BEPITOPE are based on protein flexibility (Karplus and Schulz, 1985), protein accessibility (Emini et al., 1985), and turns in proteins (Pellequer et al., 1993). We modified the last method to rank predicted epitopes according to their hydrophilicity. This method is now completely automated (Plate 1). The fourth complex epitope prediction method aims to detect amphipathic helices by calculating the hydrophobic moment.

JOURNAL OF MOLECULAR RECOGNITION
J. Mol. Recognit. 2003; 16: 20-22
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/jmr.602
*Correspondence to: J.-L. Pellequer, CEA Valrho - Centre de Marcoule, DSV/DIEP/SBTN, BP17171, 30207 Bagnols sur Cèze, France. E-mail: jlpellequer@cea.fr

Pattern search

Proteins in eukaryotic cells often carry post-translational modifications. These modifications often occur in loops that have properties similar to those of epitopes. Thus, to strengthen the efficacy of epitope prediction, BEPITOPE builds profiles for user-defined patterns. Once the profile of a pattern is established, it can be combined with any of the epitope prediction methods mentioned above to remove unwanted predictions. For instance, a user may want to delete all predicted glycosylation sites from an epitope prediction (Plate 2). BEPITOPE allows one to add, subtract, multiply and divide prediction profiles. In addition, a profile can be accentuated by a user-defined weight. Patterns are produced by an expression generator, which creates the appropriate number of profiles depending on the number of unspecified residues. An expression contains elements enclosed in parentheses.
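A minimal sketch of how a range element inside such a parenthesized expression could be expanded into one fixed-length pattern per allowed length. The syntax handled here (X followed by a count or a min-max range, with every other element copied unchanged) and the function name expand are assumptions made for illustration; BEPITOPE's actual expression grammar is not specified beyond the description in this section.

```python
import re
from itertools import product
from typing import List

# One parenthesized element, e.g. "(LIV)" or "(X2-3)".
ELEMENT = re.compile(r"\(([^()]*)\)")
# Assumed range syntax: X followed by a count, or by min-max.
RANGE = re.compile(r"^X(\d+)(?:-(\d+))?$")


def expand(expression: str) -> List[str]:
    """Expand every range element into explicit runs of X.

    Returns one pattern per combination of allowed lengths, in
    increasing order of length for each range element.
    """
    choices = []
    for body in ELEMENT.findall(expression):
        match = RANGE.match(body)
        if match:
            low = int(match.group(1))
            high = int(match.group(2)) if match.group(2) else low
            # One alternative per allowed number of unspecified residues.
            choices.append(["(" + "X" * n + ")" for n in range(low, high + 1)])
        else:
            # Amino acids, groups or property codes are kept as written.
            choices.append(["(" + body + ")"])
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    for pattern in expand("(A)(X1-2)(ST)"):
        print(pattern)
```

For instance, under these assumptions expand("(A)(X1-2)(ST)") returns the two patterns (A)(X)(ST) and (A)(XX)(ST); an expression with several range elements yields one pattern for each combination of lengths.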
An element may contain an amino acid, a group of amino acids, a code representing a physico-chemical property, or a range of residues. For example, the expression (LIV)(X2-3)(LIV) generates two profiles, (LIV)(XX)(LIV) and (LIV)(XXX)(LIV). BEPITOPE generates a profile window of size n × 20, where n is the number of residues in the pattern. The profile expression can be fine-tuned by editing the weight of each residue.

Genome strategy

The purpose of predicting continuous epitopes is to synthesize peptides that will be used to raise monoclonal antibodies. To reduce cross-reactivity with unrelated proteins, the selected peptides must be specific. A monoclonal antibody may cross-react with other proteins present in the organism that contains the protein of interest. The antibody may also cross-react with proteins of the host that produces the monoclonal antibodies. To alleviate this problem, BEPITOPE searches an entire genome for sequences similar to each predicted epitope, using a substitution matrix (BLOSUM62, PAM250, or any other matrix present in the matrix directory). As a result, one can choose the putative epitope that has the least chance of cross-reacting with unwanted proteins. By default, the cut-off for identifying similar peptides is set to an e-value of 10^-7.

Getting help and availability

BEPITOPE contains an on-line help menu. Each graphics icon in a window is linked to a hypertext help page displayed in a standard web browser. A Frequently Asked Questions (FAQ) menu is also implemented for further support. The FAQ also contains tips on how to use the program and how to interpret its error messages. It is written in HTML and can therefore be easily upgraded. The program is available free of charge to academic users on request from the authors; a license agreement is, however, required. The program is implemented in Delphi 5 under the Windows operating system.

Acknowledgements

We thank Olivier Pible for valuable comments during this work.

Plate 1.
Graphics interface of a complex turn prediction method. Top: a user can enter a sequence accession number corresponding to the database selected on the right-hand side. The program will download the sequence from a remote database through the Internet. Below: a sequence name can be typed or browsed from the local disk. Information about the sequence read is shown below, including the sequence length, first residue number and last residue number. A propensity scale can be selected using the pull-down menu. Here, the scale 4turn33 is a combination of four classical turn33 propensity scales as developed in Pellequer et al. (1993). The five graphics icons on the right correspond to the help menu, the reset button, the paste-from-clipboard button, the clear button, and the set-up button, respectively. The selected protein sequence is displayed below in the white window. Part of the sequence can be selected using the left mouse button. Below the sequence window, computation parameters can be set. From left to right, a user can choose a smoothing procedure (Gaussian by default), the width of the Gaussian distribution, the length of the window used to compute scores, the central position where the score will be assigned, and the type of arithmetical operation used to combine the four predicted curves (addition or multiplication). On the right, four graphics icons are used, respectively, to start the computation, to zoom in/out in the result window, to save the prediction profile on disk, and to print the prediction profile. On the extreme right, a check box is used to obtain a secondary structure prediction through the NPS@ remote server (Combet et al., 2000). Results are as follows: a red square corresponds to helix; a blue square to strand; an olive square to turn or coil; and a gray square to an undetermined state. The red curve corresponds to a turn prediction using the multiplication approach whereas the blue curve corresponds to the addition approach.
Predicted epitopes are shown in the left window with their associated computed values, and on the plot as red triangles.

Plate 2. Removing putative glycosylation sites from an epitope prediction plot. A standard hydrophilicity curve of the renin protein is shown in blue and the O-glycosylation motif detection curve in red. Subtraction of these two curves produces the green curve. It shows that two hydrophilic peaks coincide with two putative glycosylation sites. Other permitted mathematical operators are shown on the top right. Digits are used for increasing the weight of a particular plot. Sliding bars (in green and yellow) allow a user to align prediction profiles by shifting a smaller sequence along a larger one. A label message appears for each graphical icon when the mouse points at it.

REFERENCES

Alix AJ. 1999. Predictive estimation of protein linear epitopes by using the program PEOPLE. Vaccine 18: 311-314.
Barlow DJ, Edwards MS, Thornton JM. 1986. Continuous and discontinuous protein antigenic determinants. Nature 322: 747-748.
Combet C, Blanchet C, Geourjon C, Deléage G. 2000. NPS@: network protein sequence analysis. Trends Biochem. Sci. 25: 147-150.
Emini EA, Hughes JV, Perlow DS, Boger J. 1985. Induction of hepatitis A virus-neutralizing antibody by a virus-specific synthetic peptide. J. Virol. 55: 836-839.
Goldman RD. 2000. Antibodies: indispensable tools for biomedical research. Trends Biochem. Sci. 25: 593-595.
Hopp TP, Woods KR. 1981. Prediction of protein antigenic determinants from amino acid sequences. Proc. Natl Acad. Sci. USA 78: 3824-3828.
Jameson BA, Wolf H. 1988. Predicting antigenicity from protein primary structure. Comput. Applic. Biosci. 4: 181-186.
Karplus PA, Schulz GE. 1985. Prediction of chain flexibility in proteins. A tool for the selection of peptide antigens. Naturwissenschaften 72: S.212.
Pellequer JL, Westhof E.
1993. PREDITOP: a program for antigenicity predictions. J. Mol. Graph. 11: 204-210.
Pellequer JL, Westhof E, Van Regenmortel MHV. 1991. Predicting the location of continuous epitopes in proteins from their primary structures. Meth. Enzymol. 203: 176-201.
Pellequer JL, Westhof E, Van Regenmortel MHV. 1993. Correlation between the location of antigenic sites and the prediction of turns in proteins. Immunol. Lett. 36: 83-100.
Rose GD, Gierasch LM, Smith JA. 1985. Turns in peptides and proteins. Adv. Prot. Chem. 37: 1-109.
Tainer JA, Getzoff ED, Alexander H, Houghten RA, Olson AJ, Lerner RA, Hendrickson WA. 1984. The reactivity of anti-peptide antibodies is a function of the atomic mobility of sites in a protein. Nature 312: 127-134.
Van Regenmortel MHV. 1999. Molecular dissection of protein antigens and the prediction of epitopes. In: Synthetic Peptides as Antigens, Van Regenmortel MHV, Muller S (eds). Elsevier: Amsterdam, 1-78.
Van Regenmortel MHV, Briand JP, Muller S, Plaue S. 1988. Synthetic Polypeptides as Antigens. Elsevier: Amsterdam.
Westhof E, Altschuh D, Moras D, Bloomer AC, Mondragon A, Klug A, Van Regenmortel MHV. 1984. Correlation between segmental mobility and the location of antigenic determinants in proteins. Nature 311: 123-126.