Академический Документы
Профессиональный Документы
Культура Документы
net/publication/265176588
CITATIONS READS
23 133
2 authors:
Some of the authors of this publication are also working on these related projects:
All content following this page was uploaded by Michael Gromiha on 15 November 2015.
Structural bioinformatics
© The Author (2014). Published by Oxford University Press. All rights reserved. For Permissions, please email: 1
journals.permissions@oup.com
complexes (Kastritis and Bonvin, 2013). Tian et al. (2012) devel- subgroups (percentage of aromatic and positively charged residues, hydro-
oped a method using 378 features, which caused the possibility of phobic and hydrophilic residues). The inspection of results obtained for all
over-fitting. Hence, it is necessary to develop accurate and reliable the classifications by comparing the maximum correlation obtained with
methods to address the task of predicting protein-protein binding single feature and combination of two features showed that the classifica-
tion based on function is the best one for further analysis (see results sec-
affinity. Recently, we have devised a model based on machine
tion).
learning approaches for distinguishing low and high affinity pro- The function-based classification has been made with seven groups. (i)
tein-protein complexes using sequence based parameters Antigen-antibody complexes, (ii) Enzyme-inhibitor complexes, (iii) Other
(Yugandhar and Gromiha, 2014). enzymes (complexes of enzymes with proteins other than inhibitors), (iv)
In this work, we have developed a novel approach for predicting G-protein containing complexes, (v) Receptor containing complexes, (vi)
the binding affinity of a given protein-protein complex from its Non-cognate complexes, and (vii) Miscellaneous complexes. Further, the
amino acid sequence. We have classified the protein-protein com- miscellaneous complexes have been subdivided into three different classes
plexes based on their function, molecular weight and percentage of based on percentage of predicted binding site residues in the complex (see
binding site residues and analyzed the relationship between se- section 3.3.7) viz. miscellaneous I, II and III (≤ 4%, 4 to 8% and >8% of
quence and structural properties of interacting free proteins and predicted binding site residues respectively). These nine classes cover the
binding affinity. This imparts biological relevance to the prediction entire dataset of 135 protein-protein complexes.
method as the function of the complex is also taken into considera-
tion. We have classified the complexes into nine groups based on 2.3 Regression models and validation
2
ous” has the correlation of less than 0.5 for any of the considered Further, the property based on percentage of predicted charged
features and we grouped into three categories based on the per- residues at the interface has been identified as one of the important
centage of binding site residues at the interface. factors. This is supported by the previous studies that aromatic and
charged residues at the protein-protein interface play key role in
complex formation (Monaco-Malbet et al., 2000; Pruett and Air,
3.2 Prediction of binding affinity
1998). Moreover, the mutation of charged residues at the binding
The single property correlation for the protein-protein complexes sites showed a large reduction in the affinity and are important for
belonging to nine subclasses are presented in Table 2. Inspection the recognition process (Pruett and Air, 1998). Interestingly, our
of the results presented in this table revealed that the absolute cor- method could accurately predict the binding affinity of 94% of the
relation is more than 0.6 in most of the subclasses and the correla- complexes with in a deviation of 1 kcal mol-1 using jack-knife test.
tion is more than 0.8 for three of them. Further, we have performed
a step-wise least square fit using multiple regression technique for 3.3.2 Enzyme inhibitor complexes (EI)
identifying the combinations of features to predict the binding The interacting partners in this class of protein-protein complexes
affinity at high accuracy. For avoiding over-fitting we limited the are enzymes and inhibitors. The binding free energy of the 31
combinations up to five from the selected list of 113 properties. complexes in this class has the variation of 10 kcal mol-1. CDK2-
The number of properties used in each subclass and the correlation CKSHS1protein complex has the lowest free energy of 9.7 kcal
coefficient with binding affinity is shown in Table 2. We noticed mol-1 and the complex between Colicin E9 nuclease and Im9 im-
that the correlation lies in the range of 0.836 to 0.998 in the sub munity protein has the highest free energy of 18.6 kcal mol-1.
3
Table 2. Correlation between amino acid properties and binding affinity
5
key role especially by forming hydrogen bonds at protein-protein Further, we have compared the performance of our method (PPA-
interfaces are represented using different colors. Pred) with other structure based methods viz. DFIRE (Liu et al.,
2004), PMF (Su et al., 2009) and consensus approach (Moal et al.,
3.6 Comparison of multiple regression and SVM based 2011) using a dataset of 137 complexes along with its subsets with
flexible and rigid complexes (Complexes with interface RMSD
regression methods
(cα atom) < 1Å are considered as rigid and the remaining as flexi-
We have compared the results obtained in the present study using ble). Our method showed better performance than all of them and
multiple regression technique with support vector regression and it could predict efficiently for both rigid as well as flexible com-
the details are given in Supplementary Table S5. We observed that plexes (Figure 3). The improvement of correlation using our meth-
the performance of the present method is better than or similar to od is 0.25, 0.44 and 0.32 in rigid, flexible and all complexes re-
that obtained with support vector regression in most of the classes. spectively.
Further, the average correlation is 0.91 and 0.88 respectively, for
multiple regression and support vector regression. The average 3.8 Prediction on the web
MAE for these models is 0.52 and 0.60. This shows the better per- We have developed a web server named "Protein-Protein Affinity
formance of the present method than support vector regression. Predictor (PPA-Pred)" for predicting the binding affinity of pro-
tein-protein complexes from amino acid sequence. It takes the
functional information and amino acid sequence in FASTA format
3.7 Comparison with previous methods as input. The output includes the predicted value of binding affini-
REFERENCES
Albeck,S. and Schreiber,G. (1999) Biophysical characterization of the interaction of
the beta-lactamase TEM-1 with its protein inhibitor BLIP. Biochemistry. 38, 11-21.
Andersen,P.S. et al.(1999). Role of the T cell receptor α chain in stabilizing TCR-
Fig. 3. Comparison of the performance of our method (PPA-Pred) with superantigen-MHC class II complexes. Immunity, 10(4), 473-483.
previous methods.
6
Arkin,M.R. and Wells,J.A. (2004) Small-molecule inhibitors of protein–protein inter- Liu,S. et al. (2004) A physical reference state unifies the structure‐derived potential of
actions: progressing towards the dream. Nature Reviews Drug Discovery, 3(4), mean force for protein folding and binding. Proteins 56(1), 93-101.
301-317. Lu,W. et al. (1999) Probing intermolecular backbone H-bonding in serine proteinase-
Audie,J. and Scarlata,S. (2007) A novel empirical free energy function that explains protein inhibitor complexes. Chem. Biol., 6(7), 419-427.
and predicts protein-protein binding affinities. Biophys. chem., 129, 198-211. Luo, J. et al. (2014) A functional feature analysis on diverse protein–protein interac-
Bahadur,R.P. et al. (2003) Dissecting subunit interfaces in homodimeric proteins. tions: application for the prediction of binding affinity. J. Comput. Aided Mol.
Proteins, 53, 708-719. Des., 28(6), 619-629.
Bonsor,D.A. et al. (2007). Molecular mimicry enables competitive recruitment by a Ma,X.H. et al. (2002) A fast empirical approach to binding free energy calculations
natively disordered protein. J. Am. Chem. Soc., 129(15), 4800-4807. based on protein interface information. Protein Eng., 15, 677-681.
Campbell,N.A. and Reece,J.B. (2002) Biology. San Francisco: Benjamin Cummings., Maenaka,K. et al. (2001). The human low affinity Fcγ receptors IIa, IIb, and III bind
6th ed. ISBN 0-8053-6624-5. IgG with fast kinetics and distinct thermodynamic properties. J. Biol. Chem.,
Chakrabarti,P. and Janin,J. (2002). Dissecting protein–protein recognition sites. Pro- 276(48), 44898-44904.
teins: Structure, Function, and Bioinformatics, 47(3), 334-343. McDonald,I.K. and Thornton,J.M. (1994) Satisfying hydrogen-bonding potential in
Charton,M. and Charton,B.I. (1982). The structural dependence of amino acid proteins. J. Mol. Biol., 238, 777–793.
hydrophobicity parameters. J. Theor. Biol., 99(4), 629-644. Moal,I.H. et al. (2011) Protein-protein binding affinity prediction on a diverse set of
Charton,M. and Charton,B.I. (1983) The dependence of the Chou-Fasman parameters structures. Bioinformatics, 27, 3002-3009.
on amino acid side chain structure. J. Theor. Biol., 102(1),121-134. Monaco-Malbet,S. et al. (2000) Mutual conformational adaptations in antigen and
DeLano, W. L. (2002) The PyMol Molecular Graphics System. DeLano Scientific, antibody upon complex formation between an Fab and HIV-1 capsid protein
San Carlos, CA, USA. p24. Structure, 8(10), 1069-1077.
Eathiraj,S. et al. (2005). Structural basis of family-wide Rab GTPase recognition by Ofran,Y. and Rost,B. (2007) Interaction sites identified from sequence. Bioinformat-