
Prediction of Catalytic Residues through ANN
K.P. Mishra (Director)
Brijesh Singh Yadav (Senior Research Associate)
Sweta Gupta (Research Associate)

United Research Center, UIT Campus, Allahabad.
e-mail: brijeshbioinfo@gmail.com
AIM

Develop a new method that helps identify the surface chemistry of active-site residues in a protein.
This method helps in ligand design, molecular docking, de novo drug design, and the structural identification and comparison of functional sites.
Introduction

Computer-aided drug design is of two types:
Ligand design
Active-site drug design
About Neural Network

A neural network is a set of connected input/output units in which each connection has an associated weight.
Applications of neural networks include:
speech recognition

Neural network methods can be used for:
classification,
clustering,
modelling and
prediction of biological data.

• Neural networks can learn by two methods:
• Supervised learning
• Unsupervised learning
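WORKING OF NEURAL NETWORK
[Figure: a single neuron j. Inputs x0, x1, …, xn are multiplied by their weights w0j, w1j, …, wnj, combined in a weighted sum with the bias θj, and passed through an activation function f to give the output: Output_j = f(Σ_i w_ij x_i + θ_j).]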
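The computation in the figure can be sketched in a few lines of Python. This is a minimal illustration, assuming a logistic sigmoid as the activation function f (the slide does not name one) and adding the bias θj to the weighted sum; the function name `neuron_output` is hypothetical.

```python
import math


def neuron_output(inputs, weights, bias):
    """Output of one unit j: f(sum_i w_ij * x_i + theta_j).

    The activation f is assumed here to be the logistic sigmoid.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Weighted sum of the inputs plus the bias term theta_j
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Activation function f (assumed sigmoid), squashing net into (0, 1)
    return 1.0 / (1.0 + math.exp(-net))


# Three binary inputs with arbitrary illustrative weights and bias
print(neuron_output([1, 0, 1], [0.5, -0.3, 0.2], bias=-0.7))
```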
How Does the Neural Network Learn?
1. The network receives a training example and, using the existing weights in the network, calculates the output.
2. Backpropagation then calculates the error by taking the difference between the calculated result and the expected (actual) result.
3. The error is fed back through the network and the weights are adjusted to minimize the error.
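The three learning steps can be illustrated with a small NumPy sketch of batch backpropagation for a network with one hidden layer. It assumes sigmoid units, a squared-error loss and plain gradient descent. The original MATLAB program is not shown, so the function names (`train_backprop`, `predict`) are hypothetical; the learning rate of 0.05 and 100 epochs follow the settings reported in the results, while 5 hidden units is an assumption.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_backprop(X, T, n_hidden=5, learning_rate=0.05, epochs=100, seed=0):
    """Train a one-hidden-layer sigmoid network by batch backpropagation.

    X: (n_samples, n_inputs) inputs; T: (n_samples, n_outputs) targets in [0, 1].
    Returns the parameters and the mean squared error recorded at each epoch.
    """
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], T.shape[1]
    W1 = rng.normal(scale=0.5, size=(n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, n_out))
    b2 = np.zeros(n_out)
    errors = []
    for _ in range(epochs):
        # Step 1: forward pass with the current weights
        H = sigmoid(X @ W1 + b1)
        Y = sigmoid(H @ W2 + b2)
        # Step 2: error = expected (target) output minus calculated output
        E = T - Y
        errors.append(float(np.mean(E ** 2)))
        # Step 3: propagate the error backwards and adjust the weights
        delta_out = E * Y * (1 - Y)                   # output-layer local gradient
        delta_hid = (delta_out @ W2.T) * H * (1 - H)  # hidden-layer local gradient
        W2 += learning_rate * H.T @ delta_out
        b2 += learning_rate * delta_out.sum(axis=0)
        W1 += learning_rate * X.T @ delta_hid
        b1 += learning_rate * delta_hid.sum(axis=0)
    return (W1, b1, W2, b2), errors


def predict(params, X):
    """Forward pass only: returns the network outputs for each row of X."""
    W1, b1, W2, b2 = params
    return sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
```

With encoded residues as rows of `X` and two-element targets as rows of `T`, a call such as `train_backprop(X, T)` returns the trained weights and an error curve that can be inspected to decide when to stop.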
About Active Site of Proteins
Proteins are polymers of amino acids linked by
peptide bonds.
The region of a protein that interacts with a ligand is
generally referred to as the “active site.”
Ligands can be proteins, DNA or smaller molecules,
such as pharmaceutical compounds.
The active site generally lies on the surface of the
protein. In some cases, the active site is buried within
the protein.
Residues with reactive groups (Asp, Glu, Ser, Cys, His, Lys, Arg) tend to be abundant in protein active sites. The Ser-His-Asp (sometimes Ser-His-Glu) “catalytic triad” is a motif commonly found in enzyme active sites.
Levels of Protein Property
Residue properties responsible for their function and structure:
• Polar or non-polar
• Aromatic or aliphatic
• Acidic or basic
• Charged (either positive or negative) or uncharged
• Sulfur-containing
• Hydrogen-bond forming
• Essential or non-essential
• Cyclic
Why predict protein function and structure?
A protein's structural properties help to identify various anomalies and diseases and to rectify them at the genetic level.
Identifying the surface chemistry of ligand-binding-site residues in a protein.
Helps in ligand design, molecular docking and de novo drug design.
Early methods

Statistical methods
Homology modelling methods
Physicochemical methods
Evolutionary conservation
Sequence pattern methods
Approach to structure prediction
The input is encoded in binary form using 15 different properties for each residue of a catalytic or non-catalytic triad (3 × 15 = 45 bits per example; 122 examples), of which 70 are active-site and 52 are non-active-site examples.
A network training program was created using the backpropagation method, with the input, output, hidden layer, learning rate and number of epochs fixed within the code.
Network Design
Methodology used
1. Collect protein structures containing active-site residues from the PDB.
2. Search for active-site residues using Ligplot.
3. Search for non-active-site residues using Surface Racer.
4. Map the protein residues to binary digits using their 15 properties.
5. Create a neural network (a computer program).
6. “Train” it using proteins with known active-site and non-active-site residue properties.
7. Test the network with unknown protein residues.
Methodology used
Collect protein-ligand complexes from the PDB database
→ Search active-site residues around hetero atoms (ligands) with Ligplot
→ Search non-active-site residues using Surface Racer
→ Distinguish amino acids by 15 different properties
→ Map all 20 amino acids in the form of binary digits
→ Create and train the neural network on the above data set using MATLAB
→ Test the neural network on unknown protein residues
→ Check and verify the results using MATLAB
PDB data selected
We selected about 100 proteins from the PDB. Some examples are shown below.

Protein-ligand complexes:
1a4k.pdb
1a4q.pdb
1a5g.pdb
1a42.pdb
1a46.pdb
1a50.pdb
Protein-ligand interaction shown by Ligplot
Some proteins with their active-site residues
Protein   Active-site residues
1a4k      Glu81A, Asn35B
1a5g      Asp102, His57, Ser195, Gly216
1a42      His199, Gln137, Thr199, Glu205
1a46      Gly216, Lys375
1a50      His86, Ser227, Asn236, Gly230
1a94      Ala6c, Asp29A
Amino Acid Encoding Scheme
Active-site residues (output 1 0)
1dih.pdb  Arg, Gly, Thr  000001010010000, 000010001010001, 100000100101000
1ecv.pdb  Arg, Gln, His  000001010010000, 100010100010000, 000001010010000
1fkb.pdb  Asp, Glu, Ile  000010001010000, 000010001010000, 011000000100000

Non-active-site residues (output 0 1)
9rub.pdb  Arg, Glu, Phe  000001010010000, 000010001010000, 010100000100000
8cpa.pdb  Arg, Asn, Lys  000001010010000, 100000100010000, 000001010100000
8atc.pdb  Ala, Asn, Glu  011000000010000, 100000100010000, 000010001010000
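As an illustration of this scheme, the sketch below turns a residue triplet into the 45-bit input vector (3 × 15) and pairs it with its two-element target. The 15-bit codes are copied from the example table; the dictionary and function names are hypothetical, only the residues that appear in the table are covered, and the meaning of individual bit positions is not stated in the slides.

```python
# 15-bit property codes transcribed from the example table.
# Only residues appearing in those examples are included; note that the
# table gives Arg/His and Asp/Glu identical codes.
RESIDUE_CODES = {
    "Ala": "011000000010000",
    "Arg": "000001010010000",
    "Asn": "100000100010000",
    "Asp": "000010001010000",
    "Gln": "100010100010000",
    "Glu": "000010001010000",
    "Gly": "000010001010001",
    "His": "000001010010000",
    "Ile": "011000000100000",
    "Lys": "000001010100000",
    "Phe": "010100000100000",
    "Thr": "100000100101000",
}

ACTIVE_TARGET = (1, 0)      # "output 1 0": active-site residues
NON_ACTIVE_TARGET = (0, 1)  # "output 0 1": non-active-site residues


def encode_triplet(residues):
    """Concatenate the 15-bit codes of three residues into a 45-element 0/1 list."""
    if len(residues) != 3:
        raise ValueError("expected exactly three residues")
    bits = []
    for name in residues:
        code = RESIDUE_CODES[name.capitalize()]  # accepts 'ARG', 'Arg', 'arg'
        bits.extend(int(b) for b in code)
    return bits


x = encode_triplet(["ARG", "GLU", "PHE"])  # the 9rub.pdb example
print(len(x), NON_ACTIVE_TARGET)  # prints: 45 (0, 1)
```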
Network Training and Testing Data
A program was written as a MATLAB function to train the neural network.
The program implements the backpropagation method and uses variables such as the training data, training output, all nodes, epochs, learning rate and error.
A typical architecture is a fully connected network with 45 input nodes (3 residues × 15 properties), a hidden layer of 5 nodes and 2 output nodes, trained on 122 input patterns.
We trained the network with different values of the learning rate and hidden-layer size, and stopped training when the minimum error was obtained.
For testing the results, we also generated …
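The trial-and-error tuning of the learning rate and hidden-layer size can be organized as a simple grid search. In this sketch, `train_and_score` is a hypothetical callback (for example, a wrapper around a backpropagation trainer) that trains one network and returns its error; the candidate values are illustrative, not the ones the authors tried.

```python
from itertools import product


def select_hyperparameters(train_and_score, learning_rates, hidden_sizes):
    """Try every (learning rate, hidden size) pair and keep the lowest-error one.

    train_and_score(learning_rate, n_hidden) must train a network and
    return its error. Returns (best_error, learning_rate, n_hidden).
    """
    best = None
    for lr, n_hidden in product(learning_rates, hidden_sizes):
        error = train_and_score(lr, n_hidden)
        if best is None or error < best[0]:
            best = (error, lr, n_hidden)
    return best


# Toy scorer standing in for real training, minimized at lr=0.05, 5 hidden nodes
toy = lambda lr, h: (lr - 0.05) ** 2 + (h - 5) ** 2
print(select_hyperparameters(toy, [0.01, 0.05, 0.1], [3, 5, 7]))
```

Scoring each configuration on held-out residues rather than on the training error makes the chosen setting less likely to overfit the 122 training patterns.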
Results and Data
Performance on the training set:
(116/122) × 100 = 95.0820% correct predictions
Performance on the testing set:
(38/40) × 100 = 95.00% correct predictions
Total number of epochs: 100
Learning rate: 0.05
False positives: 2 out of 122
False negatives: 3 out of 122
Performance Measurement
p = Number of correctly classified
catalytic residues.
n = Number of correctly classified non-
catalytic residues.
o = Number of non-catalytic residues
incorrectly predicted to be catalytic
(over-predictions).
u = Number of catalytic residues
incorrectly predicted to be non-
catalytic (under-predictions).
t = Total residues (p + n + o + u).
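The slide defines the counts but not the summary measures derived from them. The sketch below assumes the standard definitions of accuracy, sensitivity, specificity, precision and the Matthews correlation coefficient; these formulas are not stated in the slides, and the function name is hypothetical.

```python
import math


def catalytic_metrics(p, n, o, u):
    """Summary statistics from the p, n, o, u counts.

    p: correctly classified catalytic residues (true positives)
    n: correctly classified non-catalytic residues (true negatives)
    o: over-predictions (false positives)
    u: under-predictions (false negatives)
    """
    t = p + n + o + u
    nan = float("nan")
    accuracy = (p + n) / t
    sensitivity = p / (p + u) if p + u else nan  # catalytic residues recovered
    specificity = n / (n + o) if n + o else nan  # non-catalytic residues rejected
    precision = p / (p + o) if p + o else nan    # predictions that are correct
    denom = math.sqrt((p + o) * (p + u) * (n + o) * (n + u))
    mcc = (p * n - o * u) / denom if denom else nan  # Matthews correlation coefficient
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "mcc": mcc,
    }
```

For instance, 116 correctly classified residues out of 122 corresponds to an accuracy of 116/122 ≈ 0.951, the 95.08% reported for the training set.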
Discussion and Conclusion
The neural network architecture developed here predicts protein active-site residues with a performance of about 95%, which is well above the values reported so far.
The analysis of the optimal subset selected from the initial 15 residue properties indicates that the algorithm learns to distinguish catalytic from non-catalytic residues based on the structural and functional properties of protein residues.
This method helps in ligand design.
Acknowledgement
I would like to express my sincere thanks to:
Dr. (Smt.) Navita Shrivastava, Head, Dept. of Computer Science, A.P.S. University, Rewa (MP)
Mr. Pritish Kumar Varadwaj, Lecturer, Indian Institute of Information Technology, Allahabad (UP)
Mr. Rajeev Prithyani, Lecturer, Dept. of Computer Science, A.P.S. University, Rewa (MP)
Mr. Sandeep Kushwaha, Lecturer, Dept. of Computer Science, A.P.S. University, Rewa (MP)
for their kind supervision and keen interest during the preparation of this project.
