Вы находитесь на странице: 1из 46

AUTODOCK

An Automated Docking Software for


Predicting Optimal Protein-Ligand
Interaction

By
Susan McClatchy, Milind Misra,
Chandreyee Mukherjee, Indu Shrivastava

Introduction
Chandreyee Mukherjee

Automated Docking:
Importance

Interaction between biomolecules lie at the core of all


metabolic processes and life activities
The number of solved protein structures available in the
databases is expanding exponentially
To understand their functions it is essential to elucidate the
interaction mechanisms between the different molecules
Primary importance lies in rational drug design
Depending upon the success of the docked molecules the
docking ligand may be redesigned or its structure further
refined.
Also important in the area of immunology to study antigenantibody interaction.

Inhibitor bound to active site of HIVPR

Surface structure of HIVPR with bound


inhibitor

What is docking?
Prediction of the optimal physical configuration and
energy between two molecules
The docking problem optimizes:

Binding between two molecules such that their


orientation maximizes the interaction

Evaluates the total energy of interaction such


that for the best binding configuration the
binding energy is the minimum

The resultant structural changes brought about


by the interaction

Categories of docking
1. Protein-Protein Docking:
Both molecules are rigid
Interaction produces no change in
conformation
Similar to lock-and key model
2. Protein-Ligand Docking:
Ligand is flexible but the receptor protein is
rigid
Interaction produces conformational
changes in ligand

1. Protein-Protein Docking

2. Protein-Ligand Docking

optimized

Docking uses a search


and score method
It involves:

Finding useful ways of representing the molecules and

molecular properties.
Exploration of the configuration spaces available for
interaction between ligand and receptor.
Evaluate and rank configurations using a scoring system, in
this case the binding energy

However, since it is difficult to evaluate the binding energy


because the binding sites may not be easily accessible, the
binding energy is modeled as follows:
G

bind

= Gvdw + Ghbond + Gelect + G

conform

+ G

tor

+ G

sol

The AutoDock Software


Developed by AJ Olsons group in 1990.
AutoDock uses free energy of the docking
molecules using 3D potential-grids

Uses heuristic search to minimize the energy.


Search Algorithms used:
Simulated Annealing
Genetic Algorithm
Lamarckian GA (GA+LS hybrid)

Algorithms Overview
Simulated Annealing

Based on temperature effects


Start with high temperature and global search
Lower temperature local search

Genetic Algorithm
Charles Darwins Theory of Evolution

Genotype
Phenotype
Lamarckian Algorithm ( Jean Baptiste de
Lamarck)
Phenotype
Genotype

Project Goal
Study algorithms used to perform the
searches and to calculate minimum
energy

Discuss why GA+LS hybrid better than


SA

Look at an example, i.e., dock a ligand


to a protein molecule using latest
AutoDock version

The Algorithms
Sue McClatchy

Simulated Annealing

Algorithm modeled after the cooling of a solution to


form glass, though its better explained by crystal
formation
Given a long enough cooling time, molecules will relax
into their lowest energy state to form the largest
crystals
Quick cooling - highly disordered system
Slow cooling - highly ordered crystal, with each
molecule in its lowest energy state
Algorithm simulates either linear or proportional slow
cooling

The SA Algorithm

Uses neighborhood operator N(s) to generate a set of solutions


according to a fixed distribution
New solution compared to preceding solution, and is accepted if
its energy is lower than that of previous solution
If new solution has higher energy, it is accepted probabilistically
according to Boltzmann distribution (see figure above)
At high temperatures, many higher energy solutions will be
accepted; at low temps., majority of probabilistic moves rejected
Boltzmann probability distribution = e exp(delta E/T) where
delta E = energy difference between two solutions,
T = temperature
Boltzmann finds p(of finding a system with energy E at temp T)

Pseudocode for SA
Compute a random initial state s
n=0, x*n = s
// initialize best solution to s and first state to 0
Repeat i = 1, 2,
// specify number of temperatures to try
Repeat j = 1, 2, , mi // no. of steps to perform for each temp. Ti
Compute a neighbor s = N(s) // s = new solution from N(s)
if (f(s) <= f(s)) then
// if energy of s <= energy of s
s = s
// accept new solution s
if (f(s) < f(x*n)) then
// if energy of new solution <
x*n = s
// energy of best solution of
n=n+1
// state n, replace best with new
endif
else
// otherwise replace s with s using
s = s with probability e (f(s) - f(s))/Ti
// Boltzmann dist.
endif
EndRepeat
EndRepeat

How Genetic Algorithms


Work - A Simple Example

Initial population of
binary creatures
having 6 genes
Each gene has two
different alleles,
either a 0 or a 1
Three operators:
crossover, mutation
and selection

Selection based on a
fitness function f(x)
This operator chooses
those individuals with
the lowest values
Those with higher
values chosen with a
very low probability

Sco
re

Selection

20

13

48

52

Crossover
1 1 1

1 0 0

0 1 0 0

0 0 0

0 0 1

Mutation
0

0 1 0 0

1 1 0 0

0 1

1 1

Lower scoring individuals


create more offspring, higher
scoring ones create fewer or
none at all
Offspring replace parental
generation
Elitism function allows best
individual from parent
generation to persist, if it is a
better solution than new
individuals created
Cycle of selection, mutation,
crossover and replacement
repeated

Sco
re
#o
fs
p

Replacement
0

15

22

Pseudocode for GA
Select an initial population set xi0 = {x10 , x20,, xM0}
Determine fitness values f(xi0) for each individual
Repeat for g = 1, 2, # of generations
Perform selection
Perform crossover with probability
Perform mutation with probability
Determine fitness f(xig) for new individuals
xg* = argmini=1,M f(xig) and yg* = f(xg*)
Perform replacement
Until stopping criterion (# of generations) is reached

How GA works in
AutoDock

Ligands genes are its


x, y and z coordinates
These form a unit vector,
which is given a random
rotation angle between
0o and 360o to form a
quaternion
Additional genes may
represent torsion angles
between bonds of the
ligand

Mapping

In standard GA, the


genotype (x,y,z coordinates
plus rotation and any
torsion angles) are mapped
to the fitness function f(x)
The fitness function value
corresponds to each
individuals phenotype
According to the right hand
side of the figure,
genotypes of parents with
high f(x) values are
mutated to form genotypes
of children with lower f(x)
values

Selection, Crossover &


Mutation

Selection chooses ligands


with the lowest fitness
(energy) values
Crossover exchanges x, y,
z coordinates, or rotations
or torsions between these
ligands
Example: Two ligands
with xyz coordinates Abc
and aBc Crossover results
in new individuals with
coordinates abc and ABc

Mutation operator
mutates coordinate or
other angle values by
adding a random real
number according to a
Cauchy distribution,
which is similar to a
Gaussian but has thicker
tails

Replacement

Individuals with betterthan-average fitness


receive proportionally
more offspring

no= (fw fi)/(fw - <f>),


fw != <f>
where
no= number of offspring
fi = fitness of individual
(energy of ligand)
fw = fitness of worst
individual in last g
generations (typically 10)
<f> = mean fitness of
population

Lamarckian Genetic
Algorithm

According to left hand side


of figure, LGA finds lowest
fitness function (energy)
values first, then maps
these values to their
respective genotypes
Genetic algorithm plus Solis
and Wets local search
Better performance than
either simulated annealing
or genetic algorithm alone

The Application
Milind Misra

HIV-1 Protease and


AHA006

HIV-1 Protease in complex with the


cyclic sulfamide inhibitor, AHA006
Source: Protein Data Bank
Authors: K. Backbro, T. Unge
Exp. Method: X-ray Diffraction (2 res.)
Primary Citation: Backbro et al, J Med
Chem 40 pp. 898 (1997)
Polymer Chains: A, B; Residues: 198;
Atoms: 1632

Protein (HIV-1 Protease)

Ligand
(AHA006)

(Source: PDB)

HIV-1 Protease dimer

(Rasmol)

Initial X-Ray
crystallographic
positions of protein
and ligand

(SYBYL)

Docking Preparation
Ligand
Assign

charges
Define rotatable bonds
Rename aromatic carbons
Merge non-polar hydrogens
Write .pdbq ligand file

Docking Preparation
Protein
Add

essential hydrogens
Load charges
Merge lone-pairs
Add solvation parameters
Write .pdbqs protein file

Docking Preparation Grid


AutoDock

uses
grid-based
docking
Ligand-protein
interaction
energies are precalculated and
then used as a
look-up table

Grid maps are


constructed based
on atoms of
interest in ligand
(here CANOSH)

(AutoDockTools)

Docking Simulated
Annealing

Runs = 100
Cycles = 50
Initial Temp (RT) = 1,000
Temp reduction factor = .95
Linear temperature reduction
Translation reduction factor = 1
Quaternion reduction factor = 1
Torsional reduction factor = 1
# rotatable bonds = 12
Initial coordinates = Random
Initial quaternion = Random
Initial dihedrals = Random
Translation step = 2.0
Quaternion step = 50 deg
Torsion step = 50 deg

Results:
100 different clusters
Energy range: -0.63 to
+64,000
Conformation #81: -0.63
Conformation #67: +20.02
Conformation #68: +10.74

Lowest energy conf not close


to position but similar to
original
Conf #67 closest to position
and conformation of original
ligand; higher energy
Conf #68 close to position but
not conformation of original
ligand; not as high energy

Original ligand conf


SA conformation #67

(SYBYL)

Original ligand conf


SA conformation #67

Close-up of previous

(SYBYL)

Original ligand conf


SA conformation #67

(SYBYL)

100 Clustered SA
Conformations

(gOpenMol)

Docking Genetic
Algorithm

Runs = 50
# Evaluations = 250,000
Population size = 50
Elitism count = 1
Mutation rate = 0.02
Crossover rate = 0.8
Window size = 10
Cauchy alpha = 0
Cauchy beta = 1
# rotatable bonds = 12
Initial coordinates = Random
Initial quaternion = Random
Initial dihedrals = Random
Translation step = 2.0
Quaternion step = 50 deg
Torsion step = 50 deg

Results:
50 different clusters
Energy range: -18.66 to
+86.28
Conformation #39: -18.66
Conformation #9: -10.60

Lowest energy conformation


overall closest to original
ligand conformation
If only 10 runs had been used
instead of 50, then conf #9
would have been the lowest
energy conformation.

Docking Local Search

Runs = 50
Solis-Wets iterations = 300
Consecutive successes = 4
Consecutive failures = 4
Rho = 1
Lower bound on rho = 0.01
LS frequency = 0.06
# rotatable bonds = 12
Initial coordinates = Random
Initial quaternion = Random
Initial dihedrals = Random
Translation step = 2.0
Quaternion step = 50 deg
Torsion step = 50 deg

Results:
18 different clusters
Energy range: +35.92 to
+215,200
Confs #20, 21, 22, 23: +35.92

Lowest energy conformation


was most dissimilar to original
ligand conformation
Better results could have been
obtained by reducing the step
sizes

Docking Lamarckian GA

Runs = 10
Max # Evaluations = 250,000
Max # Generations = 27,000
Population size = 50
Elitism count = 1
Mutation rate = 0.02
Crossover rate = 0.8
Window size = 10
Cauchy alpha = 0
Cauchy beta = 1
Solis-Wets iterations = 300
Consecutive successes = 4
Consecutive failures = 4
Rho = 1
Lower bound on rho = 0.01
LS frequency = 0.06
* Gray options *

Results:
10 different clusters
Energy range: -18.10 to 8.38
Conformation #7: -18.10

Lowest energy conformation


fairly similar to original ligand
conformation
If the number of runs was
restricted to 10 for both GA
and LGA, LGA would have
generated the best structure

Original ligand conf


Best GA conf
Best LGA conf
Best SA conf
Best LS conf

(SYBYL)

Original ligand conf


Best GA conf
Best LGA conf
Best SA conf

(SYBYL)

References
http://cmgm.stanford.edu/biochem218/Projects%201998/Apaydin.pdf
http://www.biz.uiowa.edu/class/6K299_menczer/PPT/Hart/sld018.html
http://cs.felk.cvut.cz/~xobitko/ga/
http://www.bch.msu.edu/labs/kuhn/web/projects/screening/solvation.html
http://wwwcmc.pharm.uu.nl/gillies/thesis/
http://www.chem.uidaho.edu/~honors/boltz.html
S.Kumar et.al. Protein Flexibility and Electrostatic Interactions. IBM Journal of
Research and Development Vol45. No 2001.
G. Morris et.al. Automated Docking Using a Lamarckian Genetic Algorithm and an
Empirical Binding Free Energy Function. Journal of Computational Chemistry, Vol.
19, No. 14, 1639-1662 (1998)
C. Rosin et.al. A Comparison of Global and Local Search Methods in Drug Docking.
UCSD CSE Technical Report #CS97-522 (1997)
C. A. Sotriffer et.al. Automated Docking of Ligands to Antibodies: Methods and
Applications. Methods 20, 280-291 (2000)
M. Vieth et.al. Assessing Search Strategies for Flexible Docking.
Practical Handbook of Genetic Algorithms. Edited by Lance Chambers
An Introduction to Genetic Algorithms. Melanie Mitchell.
Goodsell and Olson Prot. Struct. Func. Genet, 8, 195(1990).
Principals of Biochemistry: Lehninger
R. Durbin, S Eddy, A. Krogh, G. Mitchison Biological sequence analysis
Wm. E. Hart. A Theoretical Comparison of Genetic Algorithms and Simulated
Annealing Sandia National Laboratories, www.cs.sandia.gov/~wehart.

Вам также может понравиться