Homology Modeling
Lausanne, February 22, 2007
Torsten Schwede, Biozentrum - Universität Basel, Swiss Institute of Bioinformatics, Klingelbergstr. 50-70, CH-4056 Basel, Switzerland. Tel: +41-61 267 15 81
http://www.wwpdb.org/
[Figure: PDB holdings, total and yearly. Source: PDB, http://www.pdb.org]
[Figure: number of entries in TrEMBL, SwissProt and PDB, 1986 to 2006, log scale from 100 to 1,000,000. No experimental structure for most protein sequences.]
In the near future, no experimental structure will be available for most of the known protein sequences.
- Many proteins fold spontaneously to their native structure
- Protein folding is relatively fast (nsec to sec)
- Chaperones speed up folding, but do not alter the structure
The protein sequence contains all the information needed to create a correctly folded protein. Can we predict the folding process of a protein from its sequence (ab initio)?
Molecular Dynamics
$$V = \sum_{\text{bonds}} \frac{k_i}{2}\,(l_i - l_{i,0})^2 + \sum_{\text{angles}} \frac{k_i}{2}\,(\theta_i - \theta_{i,0})^2 + \dots$$
Physical time for simulation: 10^-4 seconds
Typical time-step size: 10^-15 seconds
Number of MD time steps: 10^11
Atoms in a typical protein and water simulation: 32,000
Approximate number of interactions in force calculation: 10^9
Machine instructions per force calculation: 1000
Total number of machine instructions: 10^23
Petaflop capacity computer (floating point operations per second): 1 petaflop (10^15)
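The figures above can be multiplied through to see what they imply for run time. The following is a minimal sketch, assuming that the 1000 machine instructions apply per interaction (so that the product matches the stated total) and that a petaflop machine executes 10^15 such instructions per second; both readings are assumptions made only for this arithmetic.

```python
# Back-of-the-envelope cost of brute-force folding by MD, from the figures above.
physical_time_s = 1e-4              # physical time to simulate (seconds)
time_step_s = 1e-15                 # typical time step (seconds)
interactions_per_step = 1e9         # interactions in one force calculation
instructions_per_interaction = 1e3  # assumption: the 1000 instructions are per interaction
machine_rate = 1e15                 # assumption: 1 petaflop = 1e15 instructions per second

n_steps = physical_time_s / time_step_s
total_instructions = n_steps * interactions_per_step * instructions_per_interaction
runtime_s = total_instructions / machine_rate
runtime_years = runtime_s / (365.25 * 24 * 3600)

print(f"MD time steps:      {n_steps:.1e}")
print(f"Total instructions: {total_instructions:.1e}")
print(f"Runtime:            {runtime_s:.1e} s (about {runtime_years:.1f} years)")
```

Under these assumptions a single folding trajectory would occupy a petaflop machine for roughly three years, which is why ab initio folding by plain MD is not a practical route to structure prediction.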
[ PDB: http://www.pdb.org ]
[ http://www.biochem.ucl.ac.uk/bsm/cath_new/ ]
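[Figure: percentage sequence identity/similarity (0 to 100%) plotted against a horizontal axis running from 50 to 250, with one curve for identity and one for similarity. Above the curves, sequence identity implies structural similarity; below them lies the "don't know" region.]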
The number of protein folds that occur in nature is limited. Fold recognition can be used to:
- Identify templates for comparative modeling
- Assign protein function
Further reading: Adam Godzik, "Fold Recognition Methods", in: "Structural Bioinformatics", Bourne & Weissig, Eds.
PDB: http://www.pdb.org
EBI-MSD: http://www.ebi.ac.uk/msd/
SCOP: http://scop.mrc-lmb.cam.ac.uk/scop/
CATH: http://www.biochem.ucl.ac.uk/bsm/cath_new/
3DPSSM / Phyre: http://www.sbg.bio.ic.ac.uk/servers/3dpssm/ and http://www.sbg.bio.ic.ac.uk/~phyre/
GenTHREADER: http://bioinf.cs.ucl.ac.uk/psipred/
FUGUE2: http://www-cryst.bioc.cam.ac.uk/~fugue/prfsearch.html
SAM: http://www.cse.ucsc.edu/research/compbio/HMM-apps/T99query.html
FOLD: http://fold.doe-mbi.ucla.edu/
FFAS/PDBBLAST: http://bioinformatics.burnham-inst.org/
Common core = all residues that can be superposed in 3D. For proteins with more than 60% identical residues, the core contains more than 90% of all residues, deviating by less than 1.0 Å.
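To make the notion of a common core concrete, here is a minimal sketch that counts how many equivalent residues of two already superposed structures lie within a distance cutoff. The function name, the per-residue cutoff criterion and the coordinates are assumptions for illustration; the superposition itself is not shown.

```python
import math

def common_core_fraction(coords_a, coords_b, cutoff=1.0):
    """Fraction of equivalent residue pairs closer than `cutoff` (in Angstrom).

    coords_a, coords_b: equal-length lists of (x, y, z) C-alpha positions
    from two structures that have already been superposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    within = sum(1 for a, b in zip(coords_a, coords_b) if math.dist(a, b) < cutoff)
    return within / len(coords_a)

# Hypothetical C-alpha coordinates, not taken from any real structure.
structure_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
structure_b = [(0.2, 0.0, 0.0), (3.8, 0.5, 0.0), (7.6, 0.0, 0.9), (11.4, 2.0, 0.0)]

core = common_core_fraction(structure_a, structure_b)
print(f"Common core: {core:.0%} of residues within 1.0 Angstrom")
```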
Similar Sequence
Similar Structure
Homology modeling
= Comparative protein modeling
= Knowledge-based modeling
Idea: use experimental 3D structures of related family members (templates) to calculate a model for a new sequence (target).
Comparative Modeling
Known Structures (Templates)
Target Sequence
Template Selection
Structure modeling
Homology Model(s)
Known structures (templates):
- Separate into single chains
- Remove bad structures (models)
- Create a BLASTable database or fold library (profiles, HMMs)
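The first preparation step, separating entries into single chains, can be sketched as follows. This relies only on the fixed-column PDB format, in which the chain identifier sits in column 22 of ATOM and HETATM records; the function names and the truncated example records are hypothetical, and quality filtering and database building are not shown.

```python
from collections import defaultdict

def split_chains(pdb_text):
    """Group ATOM/HETATM records of a PDB-format string by chain identifier."""
    chains = defaultdict(list)
    for line in pdb_text.splitlines():
        # Column 22 (index 21) of a coordinate record holds the chain identifier.
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            chains[line[21]].append(line)
    return dict(chains)

def make_ca_record(serial, res_name, chain_id, res_seq):
    """Build a truncated, hypothetical C-alpha ATOM record (coordinates omitted)."""
    return f"ATOM  {serial:>5}  CA  {res_name} {chain_id}{res_seq:>4}"

example = "\n".join([
    make_ca_record(1, "MET", "A", 1),
    make_ca_record(2, "ALA", "A", 2),
    make_ca_record(3, "MET", "B", 1),
])

chains = split_chains(example)
for chain_id, records in sorted(chains.items()):
    print(chain_id, len(records))
```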
Template selection:
1. Sequence similarity / fold recognition
2. Structure quality (resolution, experimental method)
3. Experimental conditions (ligands and cofactors)
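One way to apply the three criteria is to sort candidate templates by them in order. This is a minimal sketch under that assumption; the strict ordering (identity first, then resolution, then ligand state), the `Template` fields and the candidate values are all hypothetical, and in practice the criteria are weighed against each other rather than applied lexicographically.

```python
from dataclasses import dataclass

@dataclass
class Template:
    name: str
    identity: float      # percent sequence identity to the target
    resolution: float    # in Angstrom, lower is better
    has_ligand: bool     # solved with the ligand or cofactor of interest

def rank_templates(templates):
    """Sort by higher identity, then better resolution, then ligand-bound first."""
    return sorted(templates, key=lambda t: (-t.identity, t.resolution, not t.has_ligand))

# Hypothetical candidates, not real PDB entries.
candidates = [
    Template("tmpl_a", 42.0, 2.8, False),
    Template("tmpl_b", 42.0, 1.9, True),
    Template("tmpl_c", 65.0, 3.1, False),
]

ranked = rank_templates(candidates)
for t in ranked:
    print(t.name, t.identity, t.resolution, t.has_ligand)
```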
Alignment:
- Multiple sequence alignment for pairs > 40% identity, or
- Use structural alignment of templates to guide sequence alignment of target, or
- Use separate profiles for template and targets
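Since the choice of alignment strategy depends on the identity between target and template, it helps to be explicit about how that number is computed. The sketch below assumes identity is measured over aligned, non-gap columns, which is one of several conventions in use; the second sequence is a made-up variant for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)

target = "MATEAFQSG"
template = "MATDAF-SG"   # hypothetical template row of the alignment

identity = percent_identity(target, template)
print(f"{identity:.1f}% identity over aligned positions")
```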
Errors in template selection or alignment result in bad models. Use iterative cycles of alignment, modeling and evaluation: build many models, choose the best.
Structure modeling:
I. Manual modeling
II. Template-based fragment assembly: Composer (Sybyl, Tripos), SWISS-MODEL
III. Satisfaction of spatial restraints: Modeller (Insight II, MSI), CPH-Models
I. Manual Modeling
[ http://www.expasy.org/spdbv/ ]
Energy minimization
- Modeling methods will produce unfavorable contacts and bonds.
- Energy minimization is used to regularize local bond and angle geometry and to relax close contacts and geometric strain.
- Extensive energy minimization will move coordinates away from the real structure, so keep it to a minimum.
- SWISS-MODEL uses the GROMOS 96 force field for a steepest descent minimization.
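The idea of a short steepest descent run can be illustrated on harmonic bond terms of the form k/2 (l - l0)^2. This is a toy sketch, not the GROMOS 96 force field or the SWISS-MODEL procedure; the force constant, step size, ideal lengths and starting values are hypothetical.

```python
def harmonic_energy(values, ideals, k):
    """Sum of k/2 * (l - l0)^2 over all terms."""
    return sum(0.5 * k * (v - v0) ** 2 for v, v0 in zip(values, ideals))

def harmonic_gradient(values, ideals, k):
    """Derivative of the harmonic energy with respect to each value."""
    return [k * (v - v0) for v, v0 in zip(values, ideals)]

def steepest_descent(values, ideals, k, step=0.001, max_steps=20):
    """Move each value a fixed fraction down its gradient for a few steps."""
    values = list(values)
    for _ in range(max_steps):
        grad = harmonic_gradient(values, ideals, k)
        values = [v - step * g for v, g in zip(values, grad)]
    return values

# Hypothetical distorted bond lengths (Angstrom) and their ideal values.
start = [1.60, 1.45]
ideal = [1.53, 1.53]
k = 300.0

relaxed = steepest_descent(start, ideal, k)
print("energy before:", harmonic_energy(start, ideal, k))
print("energy after: ", harmonic_energy(relaxed, ideal, k))
```

The small `max_steps` reflects the advice above: a few steps are enough to regularize local geometry, while a long minimization would drift away from the template-derived coordinates.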
Homology Modeling
III. Satisfaction of Spatial restraints
[Figure: example sequence fragments MATEAF and QSG, shown twice]
$$p(x_1 \le x < x_2) = \int_{x_1}^{x_2} p(x)\,dx \quad \text{with} \quad \int p(x)\,dx = 1, \quad p(x) > 0$$
a) Chi-1 angles of 11 Cys residues
b) Smoothed distribution from a)
c) 297 Cys Chi-1 angles as control
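The step from a handful of observed angles to a smoothed probability density can be sketched as a histogram with circular smoothing, normalized so that it integrates to 1. The binning, the moving-average smoothing and the eleven angle values are assumptions for illustration; they are not the data or the smoothing method behind the figure.

```python
def smoothed_density(angles_deg, bin_width=10, window=3):
    """Histogram on [-180, 180), circularly smoothed, normalized to integrate to 1."""
    n_bins = 360 // bin_width
    counts = [0] * n_bins
    for a in angles_deg:
        counts[int(((a + 180.0) % 360.0) // bin_width)] += 1
    half = window // 2
    # Circular moving average: angles wrap around at +/-180 degrees.
    smoothed = [
        sum(counts[(i + d) % n_bins] for d in range(-half, half + 1)) / window
        for i in range(n_bins)
    ]
    total = sum(smoothed) * bin_width
    return [s / total for s in smoothed]

def probability(density, x1, x2, bin_width=10):
    """Approximate p(x1 <= x < x2) by summing whole bins; x1, x2 should be bin-aligned."""
    first = int((x1 + 180) // bin_width)
    last = int((x2 + 180) // bin_width)
    return sum(density[first:last]) * bin_width

# Eleven hypothetical chi-1 angles (degrees), clustered near the usual rotamers.
chi1 = [-68, -64, -61, -59, -55, -172, 175, 179, 62, 58, 66]

density = smoothed_density(chi1)
p_gauche = probability(density, -90, -30)
print(f"p(-90 <= chi1 < -30) = {p_gauche:.2f}")
```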
Structural rearrangements cause problems for template selection and automated evaluation.
Which types of errors in a protein structure can you identify with an empirical force field? Which types of errors are not recognized?
Statistical Methods
Ramachandran plot of backbone angles (φ, ψ)
- favored regions
- generously allowed regions
- disallowed regions
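The two backbone angles are ordinary dihedral angles over four consecutive backbone atoms: φ over C(i-1), N(i), CA(i), C(i) and ψ over N(i), CA(i), C(i), N(i+1). A minimal sketch of the dihedral calculation follows; the function names and the test points are illustrative, and assigning a (φ, ψ) pair to a region of the plot is not shown because the region boundaries are not given here.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points, in the range (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

# Simple geometric check points, not protein coordinates.
cis = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
trans = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
print(f"cis: {cis:.1f}, trans: {abs(trans):.1f}")
```

For a residue i, `dihedral(C_prev, N, CA, C)` gives φ and `dihedral(N, CA, C, N_next)` gives ψ.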
1D - 3D Checks
Probability for a feature to occur in a given environment, e.g.
- Solvent exposed / buried
- Hydrophobic / polar environment
- Electrostatic interactions
- Secondary structure
See: R. Luthy (1992) Assessment of protein models with three-dimensional profiles, Nature, 356(6364):83-5
[Figure: structure with labeled residues Met80, Ile86, Val13, Ala182 and Phe134]
N_expected is the expected number of atomic pairs (i,j) in the same distance shell if there were no interactions between atoms (reference state).
[Figure: mean force potential (MFP, kcal/mol) versus distance for methyl-methyl pairs]
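A mean force potential of this kind is commonly obtained from observed and expected pair counts through the inverse Boltzmann relation, E = -kT ln(N_observed / N_expected). The formula itself is not preserved in the text above, so the sketch below assumes that standard form; the temperature and the pair counts are hypothetical.

```python
import math

K_B_KCAL = 0.0019872041  # Boltzmann (gas) constant in kcal/(mol K)

def mean_force_potential(n_observed, n_expected, temperature=300.0):
    """Inverse Boltzmann energy (kcal/mol) for one atom-pair type and distance shell.

    Assumes E = -kT * ln(N_observed / N_expected); both counts must be positive.
    """
    if n_observed <= 0 or n_expected <= 0:
        raise ValueError("pair counts must be positive")
    return -K_B_KCAL * temperature * math.log(n_observed / n_expected)

# Hypothetical counts for one distance shell.
favorable = mean_force_potential(n_observed=200, n_expected=100)
unfavorable = mean_force_potential(n_observed=50, n_expected=100)
print(f"over-represented shell:  {favorable:.2f} kcal/mol")
print(f"under-represented shell: {unfavorable:.2f} kcal/mol")
```

Pairs seen more often than the reference state predicts get a negative (favorable) energy, and pairs seen less often get a positive one.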
ANOLEA
Correct Structure: PDB: 1GES
PROCHECK
Checks the stereochemical quality of a protein structure, producing a number of plots analyzing its overall and residue-by-residue geometry.
- Covalent geometry
- Planarity
- Dihedral angles
- Chirality
- Non-bonded interactions
- Main-chain hydrogen bonds
- Disulphide bonds
- Stereochemical parameters
- Residue-by-residue analysis
Laskowski R A, MacArthur M W, Moss D S & Thornton J M (1993). PROCHECK: a program to check the stereochemical quality of protein structures. J. Appl. Cryst., 26, 283-291.
Morris A L, MacArthur M W, Hutchinson E G & Thornton J M (1992). Stereochemical quality of protein structure coordinates. Proteins, 12, 345-364.
WhatCheck / WhatIf
WHAT IF I check my structure?
Imagine ...
An everyday situation in a biocomputing lab: "Should they use the structure?"
An everyday situation in a crystallography lab: "Should they deposit the structure already?"
In a WHAT_CHECK report, each reported fact has an assigned severity:
- error: severe errors encountered during the analyses. Items marked as errors are considered severe problems requiring immediate attention.
- warning: either less severe problems or uncommon structural features. These still need special attention.
- note: statistical values, plots, or other verbose results of tests and analyses that have been performed.
WHAT IF: A molecular modeling and drug design program. G. Vriend, J. Mol. Graph. (1990) 8, 52-56.
Errors in protein structures. R.W.W. Hooft, G. Vriend, C. Sander, E.E. Abola, Nature (1996) 381, 272-272.
RMS Z-scores, should be close to 1.0 (whatcheck.txt):
Bond lengths: 0.905
Bond angles: 1.476
Omega angle restraints: 0.921
Side chain planarity: 2.681 (loose)
Improper dihedral distribution: 1.771 (loose)
Inside/Outside distribution: 1.333 (unusual)
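An RMS Z-score summarizes how widely a set of measurements scatters relative to the spread expected from reference data, which is why values near 1.0 are the target. The following is a minimal sketch of that calculation; the ideal bond length, its standard deviation and the observed values are hypothetical and are not WHAT_CHECK's reference parameters.

```python
import math

def rms_z_score(observed, ideal_mean, ideal_sd):
    """Root-mean-square of Z = (x - mean) / sd over all observed values."""
    if not observed:
        raise ValueError("need at least one observed value")
    z_scores = [(x - ideal_mean) / ideal_sd for x in observed]
    return math.sqrt(sum(z * z for z in z_scores) / len(z_scores))

# Hypothetical bond lengths (Angstrom) against a hypothetical ideal of 1.53 +/- 0.02.
typical = rms_z_score([1.51, 1.55], ideal_mean=1.53, ideal_sd=0.02)
too_tight = rms_z_score([1.529, 1.531], ideal_mean=1.53, ideal_sd=0.02)
print(f"typical spread:   {typical:.2f}")
print(f"over-restrained:  {too_tight:.2f}")
```

A value well above 1.0 indicates more scatter than expected, while a value well below 1.0 suggests geometry that has been restrained too tightly.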
All checking tools are happy, so can I believe it now?
- Models are not experimental facts!
- Models can be partially inaccurate or sometimes completely wrong!
- A model is a tool that helps to interpret biochemical data.
ANOLEA: http://protein.bio.puc.cl/cardex/servers/anolea/ and http://swissmodel.expasy.org/anolea/
ProCheck: http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html
WhatCheck: http://www.cmbi.kun.nl/gv/whatcheck/
Verify3D: http://www.doe-mbi.ucla.edu/Services/Verify_3D/
Biotech Validation Suite for Protein Structures: http://biotech.ebi.ac.uk:8400/
A Model must be wrong, in some respects, else it would be the thing itself. The trick is to see where it is right.
(Henry A. Bent)
Safe Zone
- Docking of small molecules
- Drug development; comparable to medium-resolution NMR or low-resolution X-ray structures
Knowledge of the 3-dimensional structures of target proteins makes it possible to understand the interactions of inhibitors and drugs with their target proteins.
Reference: Discovery of a potent and selective protein kinase CK2 inhibitor by high-throughput docking. Vangrevelinghe E, Zimmermann K, Schoepfer J, Portmann R, Fabbro D, Furet P. Oncology Research, Novartis Pharma, Basle, J Med Chem. 2003 Jun 19;46(13):2656-62.
[Figure: numbered sites on a structure, with an electrostatic potential color scale from -8 to +8 kT/e]
Structural Genomics
Large-scale experimental structure solution projects.
Goal: most of the sequences in a genome database should match at least one structure with sufficient sequence identity to allow reliable modeling.
http://www.cbs.dtu.dk/services/CPHmodels/
http://cl.sdsc.edu/hm.html