
“SLIDE”: AN INTERACTIVE THREADING REFINEMENT TOOL FOR HOMOLOGY MODELING

ANDREI HANGANU1, MARIUS A. MICLUŢA2, BOGDAN-ALEXANDRU POPA1, LAURENŢIU N. SPIRIDON2, ROBI TĂCUTU3,*

1 University “Politehnica” of Bucharest, Splaiul Independenţei 313, Bucharest, Romania
2 Institute of Biochemistry, Splaiul Independenţei 296, Bucharest, Romania
3 The Shraga Segal Department of Microbiology and Immunology, Ben-Gurion University of the Negev,
POB 653, Beer-Sheva, 84105, Israel
(Received September 14, 2009)

Target-to-template alignment is the accuracy-limiting step in homology modeling, as
misalignments by only one residue may result in model errors as large as 4 Å. This step becomes even
more critical when sequence identity is lower than 50% and multiple templates are not at hand for
multiple alignment analysis. In these cases, alignments may be improved by including structural
information from the template, for example by avoiding gaps in secondary structure stretches, in
buried regions, or between two residues that are far apart in space. Some newer methods take such
criteria into account to a moderate extent.
However, it is always important to edit the alignment by visually inspecting the superposition of
the target sequence onto the template structure, especially if the target-template sequence identity
is low. To this end we have developed SLIDE, a simple tool based on the BioJava and Jmol libraries
that allows the interactive threading of the target onto the template. It is useful for changing the
gap locations and the placement of the target sequence over the template structure, and thus for
structurally improving the alignment by visual inspection.
Key words: target to template sequence alignment, interactive threading, homology modeling.

INTRODUCTION

Modeling the 3D structure of proteins is at the core of biochemistry and
molecular biology due to its importance in understanding function. Accurate
deterministic protein 3D models result from X-ray crystallography or NMR
spectroscopy (1–3). However, these experiments are tedious due to the large
number of complex steps, and often fail due either to crystallization problems or
to size limitations in NMR (4, 5).
* Corresponding author (E-mail: tacutu@bgu.ac.il)

ROM. J. BIOCHEM., 46, 2, 123–127 (2009)



A complementary approach is to use accumulated knowledge and theoretical
tools to predict 3D structure from sequence. This approach results in probabilistic
rather than deterministic models. Even so, recent validation tests have shown
that for sequence identities over 30% the accuracy of probabilistic models is
within 2.0 Å rmsd, similar to that of good deterministic models. Even when
going down to remote homology (>17% identity), close to the noise level, one can
generate probabilistic models within ~4 Å rmsd, accurate enough to assess the
spatial location of functionally important residues (6–8).
The main remaining problem is that, as the homology decreases, the chance
of improper local alignments increases. On the one hand, alignments are very
sensitive to the mutation data matrices and the gap opening penalties used to build
the scoring function, and even very closely related blocks substitution matrices
(BLOSUM), usually used for structural alignments (9), can lead to significantly
different results. On the other hand, as the homology decreases, whatever matrix
and gap penalty are used, the alignment score function becomes more and more
complex, with multiple shallow local minima, and automatic assignment may lead to
catastrophic faults in the model.
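
The sensitivity to gap penalties can be illustrated with a minimal Python sketch. It is not part of SLIDE: the sequences, the toy substitution scores (+2 for identity, -1 otherwise) and the penalty values are all assumptions chosen for illustration, and the function name `score_alignment` is hypothetical. The sketch scores two candidate alignments of the same pair of sequences and shows that the preferred one changes with the gap opening penalty.

```python
GAP = "-"


def toy_substitution(a, b):
    """Toy scores (an assumption, not BLOSUM): +2 for identity, -1 otherwise."""
    return 2 if a == b else -1


def score_alignment(row_a, row_b, gap_open, gap_extend, substitution=toy_substitution):
    """Score two aligned rows with affine gap penalties.

    Convention assumed here: the first position of a gap costs gap_open,
    every further position of the same gap costs gap_extend.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    score = 0
    gap_row = None  # row holding the currently open gap ("a", "b" or None)
    for a, b in zip(row_a, row_b):
        if a == GAP and b == GAP:
            raise ValueError("a column cannot contain two gaps")
        if a == GAP or b == GAP:
            current = "a" if a == GAP else "b"
            score -= gap_extend if gap_row == current else gap_open
            gap_row = current
        else:
            score += substitution(a, b)
            gap_row = None
    return score


# Two candidate alignments of the same (hypothetical) sequence pair.
ungapped = ("ACDEFGH", "ACEFGHK")
gapped = ("ACDEFGH-", "AC-EFGHK")

for gap_open in (2, 8):
    s_ungapped = score_alignment(*ungapped, gap_open=gap_open, gap_extend=1)
    s_gapped = score_alignment(*gapped, gap_open=gap_open, gap_extend=1)
    best = "gapped" if s_gapped > s_ungapped else "ungapped"
    print(f"gap_open={gap_open}: ungapped={s_ungapped}, gapped={s_gapped}, best={best}")
```

With the cheap gap opening the gapped alignment scores higher, while with the expensive one the ungapped alignment wins, although neither the sequences nor the substitution scores have changed.
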
Human decision becomes in this case a critical factor in assessing the overall
structural consequences of a given local alignment. An in-depth analysis has to
be carried out by incorporating complementary information such as secondary
structure prediction and as many physico-chemical and bioinformatics propensity
profiles of the target sequence as possible (hydrophobicity, accessibility, charge
distribution, etc.), or any other suitable criteria. For such an analysis, 3D
inspection with real-time interactive editing of the various alignments proposed
by advanced alignment programs is very useful in deciding on a shortlist of
alignments to retain as starting points for the modeling procedures proper.
To cover this important prerequisite step in homology and
remote homology modeling we have developed SLIDE, a program that allows the
interactive threading of a target sequence onto a given template.

METHODS

The SLIDE software was written in Java and runs on Java virtual machines.
SLIDE is based mainly on the BioJava (10) and JmolViewer (11) libraries.
BioJava is an open-source project founded by Thomas Down and Matthew Pocock to
provide an API for use in bioinformatics. The project is developed by its members,
coordinated by the Open Bioinformatics Foundation (OBF), and consists of a collection
of libraries covering amino acid and nucleotide alphabets, a BLAST interface, I/O
operations, modules for dynamic programming, genetic algorithms, statistical analysis,
graphical interfaces, raw data transformation for database import, etc.

JmolViewer, on the other hand, is a collection of libraries for visualizing
biological macromolecules that is easily integrated into Java applications.
JmolViewer allows atom subset selection, various protein visualization modes
(CPK, trace, wireframe, etc.), various color schemes, a batch-mode instruction
console, etc. In SLIDE, BioJava libraries were used to process sequence and protein
structure information, and Jmol was mainly used to build the protein structure
viewer module.

RESULTS AND DISCUSSION

SLIDE allows the interactive refinement of an alignment and real-time
threading because it combines alignment functions such as opening,
extending and moving gaps with the real-time transfer of coordinates into the 3D
structure, so that every change in the sequence threading is displayed immediately
in three-dimensional space.
The program incorporates the following modules in the main window:
a) the alignment panel;
b) the protein structure viewer, based on Jmol libraries;
c) the control panel.
In the alignment panel, the amino acids of the two sequence strings are coded
by the standard one-letter code, with dashes for gaps. In addition, amino acid letters
are colored using the following code: yellow – bulky hydrophobic, white – small
hydrophobic, cyan – polar, red – acidic, blue – basic, light magenta – cysteine, dark
magenta – proline. The panel is designed to allow substring sliding in both rows and
provides the following functions: on-click substring selection, on-click gap
opening and closing, on-click gap extension and reduction, and on-click gap sliding.
The alignment operations are synchronized with the visual instructions in the
protein structure viewer. For example, the amino acid color code of the target
sequence is projected onto the 3D structure of the template in the structure viewer,
and any sliding of substrings in the alignment window triggers the sliding of
that substring along the 3D structure in the structure viewer.
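
The link between an edit in the alignment panel and the threading shown in the viewer can be sketched as follows. This Python sketch is illustrative only and is not SLIDE's Java implementation: the function names `slide_gap` and `thread`, the 0-based template residue indices and the example rows are all assumptions. A gap is slid by one position in the target row, and the mapping from template residues to target residues is then recomputed, which is the information a viewer would need to re-color or re-label the template structure.

```python
GAP = "-"


def slide_gap(row, start, length, step):
    """Move the gap run row[start:start+length] by `step` columns.

    Residue order is preserved; only the gap position changes.
    """
    chars = list(row)
    if length < 1 or any(c != GAP for c in chars[start:start + length]):
        raise ValueError("selected columns are not a gap run")
    gap = chars[start:start + length]
    del chars[start:start + length]
    new_start = start + step
    if not 0 <= new_start <= len(chars):
        raise ValueError("gap cannot slide past the end of the row")
    chars[new_start:new_start] = gap
    return "".join(chars)


def thread(template_row, target_row):
    """Map each template residue index to the target residue placed on it.

    A value of None marks a template residue facing a gap (a deletion in
    the target). Target residues facing a template gap have no template
    coordinates and are skipped.
    """
    if len(template_row) != len(target_row):
        raise ValueError("aligned rows must have equal length")
    mapping = {}
    template_index = 0
    for tmpl, targ in zip(template_row, target_row):
        if tmpl == GAP:
            continue
        mapping[template_index] = None if targ == GAP else targ
        template_index += 1
    return mapping


# Hypothetical aligned rows, not taken from the Gr-EXPB1 example.
template_row = "ACDEFGH-"
target_row = "AC-EFGHK"

before = thread(template_row, target_row)
slid_row = slide_gap(target_row, start=2, length=1, step=1)
after = thread(template_row, slid_row)

print(slid_row)
print(before)
print(after)
```

In this toy case, sliding the gap one column to the right moves residue E from template position 3 to template position 2, while every other assignment stays the same.
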
The control panel groups the various menus, options and settings, used for
example for file opening, structure viewing options, re-coloring, etc.
SLIDE proves very effective in structurally assessing the alignments obtained
in moderate and remote homology situations. We present here its application in the
identification of a disulfide bond left undetected by all the advanced alignment
procedures we used in modeling the D2 domain of the expansin B1 protein from
Globodera rostochiensis, a plant-parasitic nematode, which is similar to plant expansins (12).
For example, the best alignment was obtained with ClustalX, using BLOSUM62. However,
due to the very high value of a Cys-Cys match in this substitution matrix, the program
forces such a match in the case of Cysteine 72 (C72) of the sequence, and the local best
score is obtained at the cost of extending by one position both the deletion opened
further upstream and the insertion opened further downstream in the sequence (Fig. 1).
When visually inspecting the 3D mapping of this alignment with SLIDE, however, one
easily sees that the side chain of C67, which is in principle close in space, becomes
too distant from C72 to form a disulfide bridge (Fig. 2A). By sliding the target
subsequence such that both the upstream deletion and the downstream insertion are reduced
by one position, at the cost of destroying the sequence Cys-Cys match (replaced by a
C-T alignment), one obtains a spatial match between C67 and C72 in the 3D structure
(Fig. 2B), which is clearly the correct structural solution.

Fig. 1. – Alignment between the D2 domain of Gr-EXPB1 and the P. pratense expansin (1n10),
automatically generated by the MULTALIN program with the BLOSUM62 matrix. The two marked
cysteines should form a disulfide bond. Identities are highlighted in green and similarities in
yellow; the deletions and insertions in the vicinity of the Cys-Cys bond are shown in magenta.

Fig. 2. – The SLIDE interface loaded with the target (D2 domain of expansin B1) and template (1n10):
A) Amino acid positions according to the alignment automatically generated by MULTALIN. The
disulfide bridge is marked by the white dashed line in the structure viewer. The distance between
the two cysteines and their orientations do not allow a disulfide bridge to form. B) The alignment
refined with SLIDE. The two gaps in the alignment window are shortened by one position in order to
allow the disulfide bridge to form.
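
The geometric reasoning behind the C67–C72 example can be expressed as a simple distance test on the template positions onto which the two cysteines are threaded. The Python sketch below is an illustration under stated assumptions, not a feature of SLIDE: the coordinates are invented, the function name `could_form_disulfide` is hypothetical, and the 7 Å Cα–Cα cutoff is an assumed, rough screening value rather than a figure from this work. A real check would also consider side-chain orientation, as noted in the caption of Fig. 2.

```python
import math

# Assumed screening cutoff in Å for the Cα–Cα distance of a disulfide pair.
# This value is an illustrative assumption, not taken from the paper.
CA_CA_MAX = 7.0


def could_form_disulfide(ca_1, ca_2, cutoff=CA_CA_MAX):
    """Return True if two Cα positions are close enough for a disulfide bridge."""
    return math.dist(ca_1, ca_2) <= cutoff


# Hypothetical Cα coordinates (Å) of the template residues carrying the two
# cysteines before and after the alignment is refined by hand.
before = ((0.0, 0.0, 0.0), (9.0, 0.0, 0.0))
after = ((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))

print(math.dist(*before), could_form_disulfide(*before))
print(math.dist(*after), could_form_disulfide(*after))
```

A distance filter of this kind can only flag candidate alignments for closer inspection; the final judgment in the approach described here remains visual.
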
The program will be improved by adding real-time free energy estimation
to complement visual inspection. Together, these two features will provide a new way
to approach alignment refinement for homology modeling.

Acknowledgements: Marius A. Micluţa and Laurenţiu N. Spiridon acknowledge CNCSIS grant
PN2-ID-249 168/2007 for this work.

REFERENCES

1. Blundell T.L., Johnson L.N., Protein Crystallography, Academic Press, New York, 1976.
2. Blow D., Outline of Crystallography for Biologists, Oxford University Press, 2002, pp. 83–102.
3. Wuthrich K., Connick R.E., Wutrick K., NMR In Structural Biology, in: World Scientific Series
In 20th Century Chemistry, 5, Wuthrich K., ed, World Scientific, Singapore, 1995, pp. 3–36.
4. Acharya K.R., Lloyd M.D., The advantages and limitations of protein crystal structures, Trends
Pharmacol. Sci., 26, 10–14 (2005).
5. Marco E., Gago F., Overcoming the inadequacies or limitations of experimental structures as
drug targets by using computational modeling tools and molecular dynamics simulations, Chem.
Med. Chem., 2, 1388–401 (2007).
6. Milac A.L., Petrescu A.-J., Protein Structure Prediction (III) Ab initio Methods, Rom. J. Biochem.,
40, 109–123 (2003).
7. Milac A.L., Petrescu A.-J., Protein Structure Prediction (II) Remote Homology Modeling, Rom.
J. Biochem., 39, 101–116 (2002).
8. Milac A.L., Petrescu A.-J., Protein Structure Prediction (I) Homology Modeling, Rom. J. Biochem.,
38, 249–275 (2001).
9. Henikoff S., Henikoff J.G., Amino acid substitution matrices from protein blocks, Proc. Natl.
Acad. Sci. USA, 89, 10915–10919 (1992).
10. Holland R.C.G., Down T., Pocock M., Prlic A., Huen D., James K., Foisy S., Drager A., Yates
A., Heuer M., Schreiber M.J., BioJava: an Open-Source Framework for Bioinformatics,
Bioinformatics, 24, 2096–2097 (2008).
11. Jmol: an open-source Java viewer for chemical structures in 3D, http://www.jmol.org
12. Kudla U., Qin L., Milac A., Kielak A., Maissen C., Overmars H., Popeijus H., Roze E., Petrescu A.,
Smant G., Bakker J., Helder J., Origin, distribution and 3D-modeling of Gr-EXPB1, an expansin from
the potato cyst nematode Globodera rostochiensis, FEBS Letters, 579, 2451–2457 (2005).