Igor N. Berezovsky,
Computational Biology Unit, Bergen Center for Computational Science, University of Bergen, Bergen, Norway
doi: 10.1002/9780470048672.wecb638
Advanced Article
Article Contents
Basics of Protein and DNA Thermostability
Physics and Evolution of Thermophilic Adaptation
Genomics/Proteomics of Thermophilic Adaptation
Minimalist Physical Model of Protein Thermostability
Conclusions
Adaptation to different environmental temperatures imposes specific requirements on the stability of DNA and protein macromolecules. Organismal strategies of thermophilic adaptation, structure- and sequence-based, and their physical origins provide a consistent picture of the evolution of protein thermostability. A strong correlation between the optimal growth temperature (OGT) and the frequency of ApG dinucleotides in both sense and antisense strands of genomic DNA, along with the absence of any thermophilic bias in the nucleotide composition, highlights a key role of base stacking in the thermostabilization of the DNA double helix. The codon bias provides an excess of ApG pairs, which ensures the thermophilic adaptation of genomic DNA. The concentration of seven amino acids, Ile, Val, Tyr, Trp, Arg, Glu, and Leu (IVYWREL), serves as a universal proteomic predictor of the OGT of prokaryotes. The IVYWREL combination manifests a generic thermophilic trend in amino acid composition: the increase of hydrophobic and charged residues at the expense of polar ones. This so-called "from both ends of the hydrophobicity scale" trend is a result of the positive (stabilizing the native state) and the negative (destabilizing misfolded conformations) components of protein design. The pressure to preserve the energies of important native and non-native contacts results in a correlation in mutations of the amino acid residues involved in these contacts. A comparison of energy (Miyazawa-Jernigan potential) and substitution (BLOSUM62) matrices reveals a high rate of substitutions between amino acids that strongly attract each other (native contacts) and between residues that strongly repel each other (non-native contacts).
What has made thermophilic adaptation so attractive to researchers since the very beginning of protein and DNA molecular studies (1, 2)? Although life exists in different extreme conditions, such as temperature, pressure, salinity, and radioactivity (3), the adaptation to extreme temperatures is an outstanding phenomenon. Indeed, organisms belonging to one level of organization, prokaryotes, thrive under environmental temperatures that cover the entire range from -10 to +110 °C, one third of the absolute temperature interval. A significant difference in the optimal growth temperature of prokaryotes results in a distinct stability of their proteins and DNA, which makes them a central subject in the studies of mechanisms of molecular adaptation.
The thermostabilization of biomolecules is a result of the mutual contribution from fundamental interactions [e.g., hydrophobic forces (4, 5) or ionic interactions (3, 6, 7)] that stabilize individual molecules and prevent their aggregation (6), structure modifications [such as DNA superhelicity (8, 9) and posttranslational modification of proteins], interactions with an environment (10), intermolecular interactions (11), and oligomerization (12). The possible dependence of fundamental interactions, for example, hydrophobic forces, on temperature may also affect stability. However, it remains a subject of controversy how and to what extent the dependence of the interaction strength on temperature should be taken into account (13-16). This article reviews the very basic level of protein and DNA thermostability,
WILEY ENCYCLOPEDIA OF CHEMICAL BIOLOGY 2008, John Wiley & Sons, Inc.
design concept is based on using previously known stabilizing factors. The limited predictive power of the rational design concept prompts one to test all potentially thermostabilizing mutations by using site-directed mutagenesis. The second approach is directed evolution. Selective pressure or screening for a desired trait applied after random mutagenesis and/or DNA shuffling provides another possibility for engineering protein stability. The limited sequence space amenable to testing in directed evolution makes it necessary to eliminate the neutral and deleterious mutations more effectively, to increase the number of recombination events, and to improve the selection tools. Third, the consensus concept is based on the assumption that the consensus amino acid contributes more to the stability of the protein chain than a nonconsensus amino acid at a given position in the alignment of the amino acid sequences.
Table 1 Important features of structure-based and sequence-based strategies of thermophilic adaptation

Strategy of adaptation: Structure-based
  Important features: Enhanced packing of the structure; compactness
  Advantage(s): No or minimal demands on sequence specificity; robust under a wide range of conditions
  Disadvantage(s): Less adaptable to changes of environment

Strategy of adaptation: Sequence-based
  Important features: Small number of strong strategically placed interactions; bulk of the structure is not changed
  Advantage(s): Provides fast adaptation
interactions that do not significantly alter the protein structure. Therefore, just a few strategic substitutions in the sequence can lead to a significant stabilization of the existing structure through the formation of several strong interactions specific to certain demands of the environment. These "staples" work locally, which leaves the bulk of the structure and its compactness unchanged. This mechanism, however, also has a possible disadvantage. Sequence-based stabilization may not be robust because it is typically tailored to a specific and narrow range of environmental conditions.
additionally by the high-throughput analysis of major folds (43). The van der Waals contact density in the hyperthermophilic archaea Pyrococcus is higher than in hyperthermophilic (T. maritima and A. aeolicus) or mesophilic (E. coli) bacteria. This indicates that, on the organismal level, archaea used a structure-based mechanism and developed a respective strategy of thermophilic adaptation. What evolutionary scenario can one imagine for the emergence of another, sequence-based, strategy of adaptation? When mesophilic organisms recolonized a hot environment, it was necessary to find a fast and effective way of tuning protein stability. The stability of a protein can be increased without a redesign of the whole structure by making sequence substitutions that introduce "staples," a restricted set of strong specific interactions (e.g., ion pairs). Hyperthermophilic bacteria (T. maritima and A. aeolicus), which recolonized hot conditions, exemplify a sequence-based strategy. A high-throughput comparison of T. maritima and A. aeolicus proteomes with those of hyperthermophilic archaea shows the crucial role of the sequence-based strategy in achieving the thermostability of proteins in hyperthermophilic bacteria (43). An analysis of the phylogenetic relationships between hyperthermophilic archaea and bacteria provides additional evidence for different organismal strategies of adaptation. Of the genes of T. maritima and A. aeolicus, 24% and 16%, respectively, were transferred to bacteria via lateral gene transfer (LGT) from archaea (49, 50), and the corresponding bacterial proteins are the most similar to those of archaea. The importance of LGT in specific biochemical and environmental adaptations was demonstrated by the comparison of complete genomes, codon analysis within genomes, and phylogenetic trees based on single gene families (see Reference 51 and references therein).
It may, however, be problematic to assess the relative contributions of LGT and vertical inheritance. For example, T. maritima and A. aeolicus belong to two lineages (Thermotogales and Aquificales) believed to have diverged earliest from the rest of the bacteria. Therefore, it is possible that T. maritima and A. aeolicus retained ancestral genes and share some primitive features with archaea, whereas these genes were lost in the rest of the bacterial species. However, regardless of the
scenario working in Thermotoga and Aquifex (genes are received via LGT or, alternatively, are descendants of retained ancestral ones), the so-called archaeal parts of their genomes are reflective of the hyperthermophilic lifestyle and the distant evolutionary past (51). In particular, the archaeal parts of the above bacterial proteomes (extracted according to the listing in the taxonomic distributions of the homolog TaxMap, available at http://www.ncbi.nlm.nih.gov ) exhibit compositional features typical of the structure-based strategy, whereas the bacterial parts follow a sequence-based strategy of thermophilic adaptation (43). Later events in protein evolution affected the structures and sequences of both archaeal and bacterial species, which combine strategies of adaptation (52) or use complementary mechanisms of stability (53).
charged residues is almost entirely because of lysines (34) (see Table 2). If only the total content of arginine (Arg) plus lysine (Lys) residues mattered in determining the stability of hyperthermophilic proteins, then no preference for Lys over Arg should exist. Arg and Lys are similar residues in their physical and chemical features; both residues are charged and have the same maximal number (81) of possible rotamers. An examination of the substitutions of types Arg/Lys versus Lys/Arg in the alignments of mesophilic sequences versus hyperthermophilic ones (Fig. 1) sheds light on the relationship between Arg and Lys content. The number before the slash (Table 3) is the percentage of amino acid residues in the mesophilic sequence, for example, Arg, that were replaced by the other amino acid in the hyperthermophilic sequence, for example, Lys. The number after the slash reflects the same data for the opposite replacement, for example, Lys in the mesophilic sequence replaced by Arg in the hyperthermophilic sequence. Numbers in parentheses show the ratio of forward to backward substitutions. The control groups are the pairs Leu/Ile and Ser/Thr. Residues in each pair possess similar physical and chemical features (Leu/Ile are hydrophobic; Ser/Thr are polar), and both have the same maximal number of possible rotamers (9 and 3, respectively). In nine hyperthermophilic organisms, the pairs RK/KR demonstrate a remarkable bias toward the replacement of arginine in the mesophilic sequence with lysine in the hyperthermophilic sequence (up to almost four times in N. equitans). In all alignments of E. coli sequences against those from a hyperthermophilic genome, the numbers of residue substitutions in the control pairs Leu/Ile and Ser/Thr are equal or very similar. The exceptions are the pairs LI/IL and RK/KR in A. pernix and M. kandleri, which show a bias in the opposite (hyperthermo-to-meso) direction, perhaps as a consequence of high GC content (53).
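The forward/backward percentages described above can be illustrated with a minimal Python sketch. It assumes one reading of the definition in the text: the forward value is the share of Arg positions in the aligned mesophilic sequence that carry Lys in the hyperthermophilic sequence, and the backward value is the reverse. The function name `substitution_bias` and the gap symbol `-` are illustrative assumptions, not part of the original analysis.

```python
def substitution_bias(meso: str, hyper: str, a: str = "R", b: str = "K"):
    """Return (forward %, backward %, forward/backward ratio) for one alignment.

    forward  = % of residues `a` in the mesophilic row replaced by `b`
    backward = % of residues `b` in the mesophilic row replaced by `a`
    Columns containing a gap ('-') in either row are ignored (an assumption).
    """
    if len(meso) != len(hyper):
        raise ValueError("aligned sequences must have equal length")
    columns = [(m, h) for m, h in zip(meso, hyper) if m != "-" and h != "-"]
    n_a = sum(1 for m, _ in columns if m == a)
    n_b = sum(1 for m, _ in columns if m == b)
    a_to_b = sum(1 for m, h in columns if m == a and h == b)
    b_to_a = sum(1 for m, h in columns if m == b and h == a)
    forward = 100.0 * a_to_b / n_a if n_a else 0.0
    backward = 100.0 * b_to_a / n_b if n_b else 0.0
    ratio = forward / backward if backward else float("inf")
    return forward, backward, ratio


# Toy alignment (not real data): two of four Arg become Lys, one of four Lys becomes Arg.
print(substitution_bias("RRRRKKKK", "KKRRRKKK"))
```

The same function applies to the control pairs by passing, for example, `a="L", b="I"`.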
The above observation challenges the idea that arginine and lysine play the same role in thermostability (34) and hints at the specific role of lysine in protein stabilization. The complementary all-atom unfolding simulations show that lysines have a much greater number of accessible rotamers than arginines of a similar degree of burial in the folded states of proteins (53). The significant residual dynamics of lysine in the folded states of proteins make the entropic cost of folding lower for lysine-rich proteins than for arginine-rich ones. The arginine-to-lysine replacement stabilizes the folded state, preserving, however, the charged nature of the substitution
Table 2 Percentage of charged amino acids and (G + C) content of 10 hyperthermophilic archaea (A), 2 hyperthermophilic bacteria (B), and the mesophilic bacterium E. coli. A strong prevalence of lysine over arginine in the proteomes of hyperthermophiles is obtained for nine organisms. An asterisk marks the exceptions from the general trend

Organism  Lys (K)  Arg (R)  Glu (E)  Asp (D)  G + C content  Life kingdom
EC          4.4      5.5      5.8      5.1       50.8           B
AA          9.4      4.9      9.6      4.3       43.5           B
AF          6.9      5.8      8.9      5.8       48.6           A
MJ         10.3      3.9      8.6      5.5       31.4           A
NE         10.8      3.9      7.9      5.0       31.6           A
PA          7.8      5.7      8.9      4.6       44.7           A
PH          8.0      5.6      8.7      4.4       41.9           A
PF          8.1      5.3      8.9      4.4       40.8           A
ST          8.0      4.2      7.0      4.6       32.8           A
TM          7.6      5.5      8.9      5.0       46.2           B
AP*         4.0      7.8      7.3      4.2       56.3           A
MK*         4.0      8.3     10.0      5.8       62.1           A
A, archaea; AA, A. aeolicus; AF, A. fulgidus; AP, A. pernix; B, bacteria; EC, E. coli; MJ, M. jannaschii; MK, M. kandleri; NE, N. equitans; PA, P. abyssi; PH, P. horikoshii; PF, P. furiosus; ST, S. tokodaii; TM, T. maritima.
Meso-         XRXXXXXXXXXKXXXXXXXXRXXXXKRXRXXX
Hyperthermo-  XKXXXXXXXXXRXXXXXXXXKXXXXRKXKXXX

Figure 1 Scheme of the pairwise alignment of mesophilic versus hyperthermophilic coding sequences. Only extended segments of alignments were considered (length of 45 residues or more), with gaps shorter than 3 residues and high sequence similarity (e = 0.05).
position. Positively charged residues, therefore, are the choice of nature for the evolutionary optimization of hyperthermostable proteins via entropic mechanism.
the values of F(j) and the optimal growth temperature (OGT) of the organism allows us to determine the best predictor of OGT. For 86 complete proteomes of prokaryotes thriving under temperatures from -10 to +110 °C, the combination of Ile, Val, Tyr, Trp, Arg, Glu, and Leu (IVYWREL) gives the highest correlation coefficient between the fraction (F_IVYWREL) in the proteome and the OGT of the organism. The correlation coefficient R is 0.930, and the quantitative relationship between the OGT (in degrees Celsius) and the fraction F of IVYWREL amino acids reads T_opt = 937F - 335. The accuracy of the T_opt prediction (root-mean-square deviation) is 8.9 °C. Additional analysis of thermostability predictors of major protein folds shows that they are very similar to the universal combination IVYWREL. The correlation coefficient between the IVYWREL content in sequences of the most abundant protein folds and OGT is very high, for example, α/β barrel (R = 0.87), β barrel (0.87), Rossmann fold (0.86), and bundle (0.82). However, if complementary mechanisms of stabilization are invoked, such as heme and metal binding (globin, cytochrome C, ferredoxin) or S-S bridges (lysozyme), then the correlation coefficient is significantly lower: 0.53, 0.44, 0.45, and 0.5, respectively. Thermostability predictors for two major types of membrane proteins, α-helical bundle and β-barrel, reveal the low slope of the correlation of F_IVYWREL with OGT in the former and the CVYP predictor in the latter. This result suggests that thermal adaptation in membrane proteins is governed by different rules than in globular ones; in particular, the stability and folding of membrane proteins are affected by the interactions with the lipid bilayer (55). Various control tests show the statistical significance and robustness of the IVYWREL predictor (32). The IVYWREL fraction is a better predictor of thermostability than the fractions of charged Asp, Glu, Lys, and Arg (DEKR) or hydrophobic Ile, Val, Trp, and Leu (IVWL) amino acids, which predict OGT with an accuracy of 21 and 16.8 °C, respectively. Thus, both hydrophobic and charged residues are important for achieving thermostabilization, contrary to earlier beliefs that either hydrophobic or charged residues alone are the major determinants of thermostability. In addition to the IVYWREL amino acids, several residues exist that are favorable for thermostabilization.
For example, the addition of Met (M) or a combination of Phe and Pro (F,P) results in correlation coefficients of 0.921 and 0.917 for the predictor, and the replacement of Trp (W) with His (H) or of the pair Trp, Arg (W,R) with Gly, Pro (G,P) gives correlation coefficients of 0.914 and 0.902, respectively. Importantly, Ala (A) and Gln (Q) are extremely disadvantageous for thermostabilization. If Ala (A) or the combination Ala, Gln (A,Q) is added to IVYWREL, the predictor is practically destroyed (R = 0.47 and 0.24). The same situation is observed
Table 3 Percentage of the forward/backward replacements in alignments of hyperthermophilic genomes against mesophilic E. coli. An asterisk marks the two hyperthermophilic organisms without the general trend of arginine-to-lysine replacement

Hyperthermophilic genome   RK/KR              LI/IL              TS/ST
A. aeolicus                20.0/8.1 (2.47)    14.2/19.3 (0.74)   7.5/6.8 (1.10)
A. fulgidus                14.5/10.6 (1.37)   14.4/17.5 (0.83)   8.4/6.6 (1.27)
M. jannaschii              22.4/6.0 (3.73)    20/16.7 (1.2)      7.0/6.5 (1.08)
N. equitans                23.7/6 (3.95)      19.5/19 (1.03)     6.8/6.8 (1.00)
P. abyssi                  16.3/10.0 (1.63)   16.1/18.3 (0.88)   7.7/7.0 (1.10)
P. horikoshii              16.7/9.6 (1.74)    16.7/18.3 (0.92)   8.1/7.2 (1.13)
P. furiosus                16.5/9.9 (1.67)    16.3/18.2 (0.90)   7.8/7.5 (1.04)
S. tokodaii                18.2/7.4 (2.46)    18.5/17.8 (1.04)   9.8/7.3 (1.34)
T. maritima                16.3/9.5 (1.72)    14.2/18.3 (0.78)   8.6/7.7 (1.17)
A. pernix*                 8.1/15.6 (0.52)    10.7/20.9 (0.52)   9.8/7.2 (1.36)
M. kandleri*               8.1/15.8 (0.51)    9.8/19.1 (0.51)    6.6/7.0 (0.95)
A, archaea; AA, A. aeolicus; AF, A. fulgidus; AP, A. pernix; B, bacteria; EC, E. coli; MJ, M. jannaschii; MK, M. kandleri; NE, N. equitans; PA, P. abyssi; PH, P. horikoshii; PF, P. furiosus; ST, S. tokodaii; TM, T. maritima.
when Ala (A) or a combination Ala, Gln (A,Q) replaces Glu (E) or a combination Val, Glu (V,E): The correlation coefficient R is 0.18 and 0.23, respectively. Finally, the fundamental question in thermophilic adaptation is the relationship between the amino acid composition of proteins and the nucleotide composition of coding DNA sequences (39-41). The availability of complete prokaryotic genomes, which consist mostly of coding DNA (on average 85% of the total genome size), clarifies the relationship between the thermophilic adaptation of protein and DNA. If the IVYWREL predictor depended on nucleotide composition only, it would remain the same after the reshuffling of coding sequences at a given nucleotide composition. However, the reshuffling of nucleotide sequences results in a non-IVYWREL thermostability predictor (32). Therefore, the amino acid composition and thermal adaptation of proteins are not affected by the nucleotide composition of DNA sequences. The amino acid composition of the proteome, on the contrary, introduces a bias in the purine loading (A + G content) of nucleotide sequences. Indeed, the purine loading of coding sequences reverse-translated from protein sequences without codon bias, for example, by using synonymous codons with equal probabilities, is very close to that of the natural nucleotide sequences. The correlation coefficient between the purine loading and OGT is 0.48 and 0.6 in sequences without codon bias and natural ones, respectively (32).
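As a worked illustration of the IVYWREL predictor, the following Python sketch computes the fraction F of IVYWREL residues in a set of protein sequences and applies the linear relationship T_opt = 937F - 335 quoted in the text. The function names are illustrative assumptions; the fit refers to whole-proteome fractions, so applying it to a single protein or a small sample is only a rough demonstration.

```python
IVYWREL = frozenset("IVYWREL")


def ivywrel_fraction(sequences):
    """Fraction of Ile, Val, Tyr, Trp, Arg, Glu, Leu among all residues.

    `sequences` is an iterable of one-letter protein strings; non-letter
    characters (gaps, stop symbols) are skipped.
    """
    total = 0
    hits = 0
    for seq in sequences:
        for residue in seq.upper():
            if residue.isalpha():
                total += 1
                hits += residue in IVYWREL
    if total == 0:
        raise ValueError("no residues found")
    return hits / total


def predict_ogt(fraction):
    """Predicted OGT in degrees Celsius from the fit T_opt = 937 F - 335."""
    return 937.0 * fraction - 335.0


# Toy input (not a real proteome): half of the residues belong to IVYWREL.
f = ivywrel_fraction(["IVYWREL", "AAAAAAA"])
print(f, predict_ogt(f))
```

With the reported root-mean-square deviation of 8.9 °C, predictions from this relationship should be read as estimates rather than exact temperatures.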
correlation in the nucleotide sequences does not depend on sequence correlations in amino acid sequences because removing the effect of the codon interface does not destroy the correlation between c_AG/c_CT and OGT, R = 0.736/0.574 (Fig. 2, middle row). Second, the correlation in DNA sequences stems from the neighboring nucleotides within a codon. Indeed, removing the natural codon bias eliminates the correlation between c_AG/c_CT and OGT, R = 0.177/0.216 (Fig. 2, bottom row). Thus, the codon bias establishes an excessive use of codons that contain successive ApG and CpT pairs, which is manifested in the correlation of c_AG/c_CT with the optimal growth temperature. The above sequence correlations in the coding strand of DNA sequences point to base stacking as a major factor of DNA thermostabilization. ApG dinucleotides have a low energy characteristic of purine-purine stacking. The c_CT correlation also shows, although indirectly, the role of stacking in thermostabilization. Indeed, the abundance of CpT pairs in the sense strand points to the equal enrichment of the antisense strand with ApG pairs because of the opposite directionality of the sense and antisense strands of DNA. Therefore, the thermostabilization of double-stranded DNA is based on the stacking interactions provided by ApG pairs that are spread in different locations of both sense and antisense strands. This picture also holds for the whole DNA of prokaryotes, including its coding and noncoding parts. Therefore, in the scenario of thermophilic adaptation of double-stranded DNA, the stacking interactions play a major role. The codon bias provides an increase in the number of ApG dinucleotides with OGT in both sense and antisense strands of the DNA double helix. The necessity for ApG pairs can be explained by their low free energies of stacking obtained both theoretically (56) and experimentally (57).
First, the study of the free energy contribution to nucleic base stacking in aqueous solution shows that the free energy of stacking follows, in order of decreasing stability, purine-purine >> purine-pyrimidine > pyrimidine-purine > pyrimidine-pyrimidine in general, and the free energy of ApG stacking is one of the lowest in particular (56). Second, the experimental study of the coaxial stacking contribution to the stabilization of gel-immobilized duplexes reveals that adenine stacking with other bases is significantly stronger than the stacking of other bases (57). The reasons for the discrepancy between the latter and the parameters of duplex stability obtained in the nearest-neighbor approximation are yet to be explored (see Reference 27 and References 16-19 therein). Recent experimental efforts also corroborate a major role of base stacking (31) in DNA thermostability and the independence of the latter from GC base pairing (1). In particular, DNA stacking parameters are determined directly (31) for temperatures from below room temperature to close to the melting temperature and for salt concentrations from 15 to 100 mM Na+. It seems that base stacking is the main stabilizing factor in double-stranded DNA that determines the temperature and the salt dependence of DNA stability parameters. The AT pairing is always destabilizing, and GC pairing contributes almost no stabilization (31). It is important to note that base stacking interactions are always stabilizing for both AT- and GC-containing contacts in double-stranded DNA. Bioinformatics studies display the importance of stacking
Upper row: real amino acid sequence, real codon bias
  Lys.Tyr.Pro.Val.Leu.Val.Arg.Phe.Leu
  5'-AAG.TAT.CCT.GTT.TTA.GTA.AGA.TTC.CTC-3'
  3'-TTC.ATA.GGA.CAA.AAT.CAT.TCT.AAG.GAG-5'
  Correlation coefficient: ApG 0.680, CpT 0.601
Middle row: randomized amino acid sequence, real codon bias
  Val.Lys.Pro.Tyr.Val.Phe.Leu.Arg.Leu
  5'-GTA.AAG.CCT.TAT.GTT.TTC.CTC.AGA.TTA-3'
  3'-CAT.TTC.GGA.ATA.CAA.AAG.GAG.TCT.AAT-5'
  Correlation coefficient: ApG 0.736, CpT 0.574
Bottom row: real amino acid sequence, no codon bias
  Correlation coefficient: ApG 0.177, CpT 0.216

Figure 2 Base stacking provided by the correlations in nucleotide sequences is the major mechanism of DNA thermostability. Upper row. Real amino acid sequence and original codon bias. Middle row. The effect of the codon interface is removed through the reshuffling of protein sequences while retaining the actual codons used for each amino acid. Bottom row. Codon bias in natural protein sequences is removed by using synonymous codons with equal probabilities. ApG and CpT pairs in the sense strand and ApG pairs in the antisense strand of DNA are underlined if they are located inside one codon. For example (upper row), the first ApG pair in the sense strand is in the Lys codon, whereas the second ApG pair is on the border between the codons of Leu and Val.
by showing the independence of DNA thermostability from (G + C) content (32, 39-41) and by illuminating a specific role of ApG stacking in the thermostability of the DNA double helix via a consideration of pairwise nearest-neighbor correlations (32) or a regression analysis of the dinucleotide composition of genomic DNA (41).
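The strand argument and the codon-bias control discussed above can be sketched in Python. The sketch counts ApG and CpT dinucleotides in a sense strand, counts ApG in the antisense strand through the reverse complement, and reverse-translates a protein with synonymous codons drawn with equal probabilities. It uses raw counts rather than the correlation measures c_AG and c_CT of the original study, and all function names are illustrative assumptions. The example sequence is the upper-row sense strand of Figure 2.

```python
import random
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Synonymous codons grouped by amino acid ('*' denotes stop).
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Antisense strand written 5'->3'."""
    return dna.translate(COMPLEMENT)[::-1]


def count_dinucleotide(dna, pair):
    """Number of (overlapping) occurrences of a dinucleotide read 5'->3'."""
    return sum(1 for i in range(len(dna) - 1) if dna[i:i + 2] == pair)


def translate(dna):
    """Translate complete codons of a coding sequence."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def back_translate_uniform(protein, rng):
    """Reverse-translate with equal probabilities for synonymous codons."""
    return "".join(rng.choice(SYNONYMS[aa]) for aa in protein)


sense = "AAGTATCCTGTTTTAGTAAGATTCCTC"  # Figure 2, upper row
print(translate(sense))
print(count_dinucleotide(sense, "AG"), count_dinucleotide(sense, "CT"))
print(count_dinucleotide(reverse_complement(sense), "AG"))
unbiased = back_translate_uniform(translate(sense), random.Random(0))
print(unbiased, translate(unbiased) == translate(sense))
```

Because the two strands are antiparallel, every CpT in the sense strand corresponds to an ApG in the antisense strand, so the last two counts printed for the sense strand and its reverse complement agree.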
[Figure 3 schematic. Axis labels: Temperature, Energy. Annotation: Negative design.]
Figure 3 The widening of the energy gap between the native state and the misfolded structures during an increase in the environmental temperature. Positive design provides the lowering of the native state energy, whereas negative design contributes to the increase of the energy of misfolded structures.
so-called P-design (60, 61)] exists that maximizes the Boltzmann probability $P_{nat}$ of being in the lowest energy (native state) conformation,

$$P_{nat}(T_{env}) = \frac{e^{-E_0/T_{env}}}{\sum_{i=0}^{103345} e^{-E_i/T_{env}}},$$

where $E_0$ is the lowest energy among all conformations and $T_{env}$ is the environmental temperature. It takes the environmental temperature $T_{env}$ as an input physical parameter, introduces mutations in the amino acid sequence, and accepts or rejects them according to the Metropolis criterion. As a result, this procedure designs proteins stable at the given $T_{env}$. The stability of designed proteins is characterized by their melting temperature $T_{melt}$, which can be found numerically from the condition $P_{nat}(T_{melt}) = 0.5$.
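A minimal Python sketch of the two quantities used here: the Boltzmann probability of the native state, P_nat(T) = exp(-E_0/T) / sum_i exp(-E_i/T), and the melting temperature defined by P_nat(T_melt) = 0.5. Temperature is expressed in energy units (k_B = 1), as in the formula. The bisection search, its default bounds, and the function names are assumptions made for illustration; the source states only that T_melt is found numerically. The toy spectrum is not lattice-model data.

```python
import math


def p_nat(energies, t_env):
    """Boltzmann probability of the lowest-energy conformation at t_env."""
    e0 = min(energies)
    # Energies are shifted by e0 so that the exponentials cannot overflow.
    partition = sum(math.exp(-(e - e0) / t_env) for e in energies)
    return 1.0 / partition


def melting_temperature(energies, t_lo=1e-3, t_hi=100.0, tol=1e-9):
    """Solve p_nat(T) = 0.5 by bisection on [t_lo, t_hi].

    p_nat decreases monotonically with temperature; a degenerate ground
    state or an interval that does not bracket 0.5 raises ValueError.
    """
    if not (p_nat(energies, t_lo) > 0.5 > p_nat(energies, t_hi)):
        raise ValueError("P_nat = 0.5 is not bracketed by [t_lo, t_hi]")
    while t_hi - t_lo > tol:
        mid = 0.5 * (t_lo + t_hi)
        if p_nat(energies, mid) > 0.5:
            t_lo = mid  # still mostly native: melting point lies higher
        else:
            t_hi = mid
    return 0.5 * (t_lo + t_hi)


# Toy spectrum: one native state and three misfolded states one unit above it.
spectrum = [0.0, 1.0, 1.0, 1.0]
print(p_nat(spectrum, 1.0), melting_temperature(spectrum))
```

For this toy spectrum the condition reduces to 3 exp(-1/T) = 1, so the search should converge to T = 1/ln 3. Raising the energies of the misfolded states or lowering the native energy widens the gap and shifts the melting temperature upward, which is the effect illustrated in Figure 3.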
amino acids forming such contacts should mutate in a correlated way. For example, correlated mutations may occur as swaps to keep specific attractive native and repulsive non-native interactions (see Fig. 8 in Reference 62). This scenario invokes a peculiar dependence between the amino acid substitution rates [e.g., BLOSUM matrices (63)] and the interaction energy between the corresponding amino acid residues [e.g., the Miyazawa-Jernigan quasi-chemical potential (64)]. Frequent substitutions are expected between amino acids that strongly attract each other (to preserve specific stabilizing native contacts) and between amino acids that strongly repel each other (to preserve specific non-native repulsive contacts). The dependence of the elements of the substitution matrix BLOSUM (62, 63) for 190 pairs of amino acids (synonymous substitutions are excluded) on their interaction energy, as approximated by the knowledge-based Miyazawa-Jernigan potential (64), has a nonmonotonic nature (Fig. 4, top chart; the dependence is highlighted by the parabolic fit). The most frequent substitutions are observed between the most attractive and the most repulsive amino acids. A blow-up of the top right part of Fig. 4 (bottom chart) shows that, along with conserved substitutions that reflect a positive design (arginine to lysine and glutamic acid to aspartic acid substitutions), frequent substitutions exist between mutually repulsive amino acids with vastly different physical-chemical properties and encoded by very dissimilar codons, such as glutamine to arginine, serine to asparagine, and so forth. The high frequencies of substitutions between residues that strongly repel each other explain the correlated mutations observed between residues that are distant in the native structure (62). These residues may form important repulsive contacts, which increase the energies of the misfolded conformations (see Fig. 10 in Reference 62).
Whereas a positive design is used widely in experiments (65), the big challenge in using the negative design originates from the difficulties in the modeling of relevant misfolded conformations (66). Nevertheless, charged residues were used effectively in negative design (65, 66). Site-directed mutagenesis provides other, although indirect, evidence of the contribution of charged residues to negative design: The mutation of polar groups to charged ones on the protein surface leads to structure stabilization even in the absence of the salt-bridge partners of the mutated group. It also has been shown (67-70) that surface electrostatic interactions provide a marginal contribution to the stability of the native structure; hence, the possible importance of charged amino acids is in making unfavorable high-energy contacts in misfolded structures. In the case of thermophilic adaptation, positive and negative components of design work concomitantly and provide stabilization of the structure via an opening of the energy gap from both sides: a decrease in the energy of the native state and, at the same time, an increase in the energy of misfolded conformations.
Positive and Negative Design in Evolution and Thermal Adaptation of Natural Proteins
The requirement to preserve the energy of key contacts in multiple sequences that fold into the same structure implies that
Conclusions
A deep understanding of the physical mechanisms and the evolution of thermophilic adaptation is crucial for the engineering and design of biologic catalysts with desired stability (20). This
[Figure 4 plot. Axis label: Element of MJ matrix. Annotations: Positive design, Negative design. Labeled amino acid pairs: RK, DE, NH, AS, QR, EK, SN, QH, TN, GS, NQ, SD, SQ, SE, NE, QD, NK, SK.]
Figure 4 The dependence of the elements of the BLOSUM62 substitution matrix on the interaction energies between amino acid residues (approximated by the Miyazawa-Jernigan parameter). Top chart. Only nonsynonymous substitutions are presented. The curved line represents the parabolic fit to highlight the nonmonotonic nature of the plot. Bottom chart. Blow-up of the right upper corner of the top chart. Amino acid pairs are labeled, and pairs of amino acids that can contribute to positive and negative components of design are shown.
knowledge also is important for establishing a trade-off between the stability and flexibility in a directed evolution of protein function (66, 71). Current predictors of the stability effects of protein mutations are based on empirical potentials that are calibrated to fit experimentally observed ΔG values (20, 72, 73). Although predictions of ΔG changes on mutation in the native state are in good agreement with experimentally observed ones, they lack the effect of mutations on misfolded conformations, the structure dependence of mutation effects (37, 38), and the dependence of mutations on the evolutionary strategy of thermophilic adaptation (43). Recent computational studies of thermophilic adaptation described in this article make use of genomic/proteomic data (32, 43, 53, 62), simulations of model lattice proteins (62), and off-lattice all-atom simulations of natural proteins (43, 53). High-throughput analysis reveals signals of novel mechanisms of protein [entropic mechanism (53)] and DNA [purine-purine base stacking (32)] thermostability and urges us to consider
what evolutionary strategy was followed in the process of thermal adaptation (43). Proteomic analysis and simulations of thermophilic adaptation also demonstrate that negative design necessarily should be taken into account to properly predict the effect of protein mutations (62).
References
1. Marmur J, Doty P. Determination of the base composition of deoxyribonucleic acid from its thermal denaturation temperature. J. Mol. Biol. 1962;5:109118. Perutz MF, Raidt H. Stereochemical basis of heat stability in bacterial ferredoxins and in haemoglobin A2. Nature 1975;255:256 259. Jaenicke R, Bohm G. The stability of proteins in extreme environments. Curr. Opin. Struct. Biol. 1998;8:738748. Gromiha MM, Oobatake M, Sarai A. Important amino acid properties for enhanced thermostability from mesophilic to thermophilic proteins. Biophys. Chem. 1999;82:5167.
2.
3. 4.
WILEY ENCYCLOPEDIA OF CHEMICAL BIOLOGY 2008, John Wiley & Sons, Inc.
5. Szilagyi A, Zavodszky P. Structural differences between mesophilic, moderately thermophilic and extremely thermophilic protein subunits: results of a comprehensive survey. Structure 2000;8:493-504.
6. Greaves RB, Warwicker J. Mechanisms for stabilisation and the maintenance of solubility in proteins from thermophiles. BMC Struct. Biol. 2007;7:18.
7. Xiao L, Honig B. Electrostatic contributions to the stability of hyperthermophilic proteins. J. Mol. Biol. 1999;289:1435-1444.
8. Forterre P, Elie C. The Biochemistry of Archaea (Archaebacteria). Kates M, Kushner D, Matheson A, eds. 1993. pp. 325-361.
9. Kikuchi A. Cozarelli NR, Wang JC, editors; 1990. pp. 285-298.
10. Marguet E, Forterre P. DNA stability at temperatures typical for hyperthermophiles. Nucleic Acids Res. 1994;22:1681-1686.
11. Kirino H, Aoki M, Aoshima M, Hayashi Y, Ohba M, Yamagishi A, Wakagi T, Oshima T. Hydrophobic interaction at the subunit interface contributes to the thermostability of 3-isopropylmalate dehydrogenase from an extreme thermophile, Thermus thermophilus. Eur. J. Biochem. 1994;220:275-281.
12. Tanaka Y, Tsumoto K, Yasutake Y, Umetsu M, Yao M, Fukada H, Tanaka I, Kumagai I. How oligomerization contributes to the thermostability of an archaeon protein. Protein L-isoaspartyl-O-methyltransferase from Sulfolobus tokodaii. J. Biol. Chem. 2004;279:32957-32967.
13. Makhatadze GI, Privalov PL. Heat capacity of proteins. I. Partial molar heat capacity of individual amino acid residues in aqueous solution: hydration effect. J. Mol. Biol. 1990;213:375-384.
14. Makhatadze GI, Privalov PL. Contribution of hydration to protein folding thermodynamics. I. The enthalpy of hydration. J. Mol. Biol. 1993;232:639-659.
15. Prabhu NV, Sharp KA. Heat capacity in proteins. Annu. Rev. Phys. Chem. 2005;56:521-548.
16. Privalov PL, Makhatadze GI. Contribution of hydration to protein folding thermodynamics. II. The entropy and Gibbs energy of hydration. J. Mol. Biol. 1993;232:660-679.
17. Berezovsky IN, Tumanyan VG, Esipova NG. Representation of amino acid sequences in terms of interaction energy in protein globules. FEBS Lett. 1997;418:43-46.
18. Chen J, Stites WE. Replacement of staphylococcal nuclease hydrophobic core residues with those from thermophilic homologues indicates packing is improved in some thermostable proteins. J. Mol. Biol. 2004;344:271-280.
19. Holder JB, Bennett AF, Chen J, Spencer DS, Byrne MP, Stites WE. Energetics of side chain packing in staphylococcal nuclease assessed by exchange of valines, isoleucines, and leucines. Biochemistry 2001;40:13998-14003.
20. Korkegian A, Black ME, Baker D, Stoddard BL. Computational thermostabilization of an enzyme. Science 2005;308:857-860.
21. Jaenicke R. Stability and folding of domain proteins. Prog. Biophys. Mol. Biol. 1999;71:155-241.
22. Querol E, Perez-Pons JA, Mozo-Villarias A. Analysis of protein conformational characteristics related to thermostability. Protein Eng. 1996;9:265-271.
23. Vetriani C, Maeder DL, Tolliday N, Yip KS, Stillman TJ, Britton KL, Rice DW, Klump HH, Robb FT. Protein thermostability above 100 degrees C: a key role for ionic interactions. Proc. Natl. Acad. Sci. U.S.A. 1998;95:12300-12305.
24. Hurley JH, Baase WA, Matthews BW. Design and structural analysis of alternative hydrophobic core packing arrangements in bacteriophage T4 lysozyme. J. Mol. Biol. 1992;224:1143-1159.
25. Thompson MJ, Eisenberg D. Transproteomic evidence of a loop-deletion mechanism for enhancing protein thermostability. J. Mol. Biol. 1999;290:595-604.
26. Pe'er I, Felder CE, Man O, Silman I, Sussman JL, Beckmann JS. Proteomic signatures: amino acid and oligopeptide compositions differentiate among phyla. Proteins 2004;54:20-40.
27. Protozanova E, Yakovchuk P, Frank-Kamenetskii MD. Stacked-unstacked equilibrium at the nick site of DNA. J. Mol. Biol. 2004;342:775-785.
28. Bouthier de la Tour C, Portemer C, Huber R, Forterre P, Duguet M. Reverse gyrase in thermophilic eubacteria. J. Bacteriol. 1991;173:3921-3923.
29. Sandman K, Krzycki JA, Dobrinski B, Lurz R, Reeve JN. HMf, a DNA-binding protein isolated from the hyperthermophilic archaeon Methanothermus fervidus, is most closely related to histones. Proc. Natl. Acad. Sci. U.S.A. 1990;87:5788-5791.
30. Stein DB, Searcy DG. Physiologically important stabilization of DNA by a prokaryotic histone-like protein. Science 1978;202:219-221.
31. Yakovchuk P, Protozanova E, Frank-Kamenetskii MD. Base-stacking and base-pairing contributions into thermal stability of the DNA double helix. Nucleic Acids Res. 2006;34:564-574.
32. Zeldovich KB, Berezovsky IN, Shakhnovich EI. Protein and DNA sequence determinants of thermophilic adaptation. PLoS Comput. Biol. 2007;3:e5.
33. Loladze VV, Ibarra-Molero B, Sanchez-Ruiz JM, Makhatadze GI. Engineering a thermostable protein via optimization of charge-charge interactions on the protein surface. Biochemistry 1999;38:16419-16423.
34. Cambillau C, Claverie JM. Structural and genomic correlates of hyperthermostability. J. Biol. Chem. 2000;275:32383-32386.
35. Chakravarty S, Varadarajan R. Elucidation of factors responsible for enhanced thermal stability of proteins: a structural genomics based study. Biochemistry 2002;41:8152-8161.
36. Pack SP, Yoo YJ. Protein thermostability: structure-based difference of amino acid between thermophilic and mesophilic proteins. J. Biotechnol. 2004;11:269-277.
37. Saelensminde G, Halskau O Jr, Helland R, Willassen NP, Jonassen I. Structure-dependent relationships between growth temperature of prokaryotes and the amino acid frequency in their proteins. Extremophiles 2007;11:585-596.
38. Glyakina AV, Garbuzynskiy SO, Lobanov MY, Galzitskaya OV. Different packing of external residues can explain differences in the thermostability of proteins from thermophilic and mesophilic organisms. Bioinformatics 2007;23:2231-2238.
39. Galtier N, Lobry JR. Relationships between genomic G + C content, RNA secondary structures, and optimal growth temperature in prokaryotes. J. Mol. Evol. 1997;44:632-636.
40. Hurst LD, Merchant AR. High guanine-cytosine content is not an adaptation to high temperature: a comparative analysis amongst prokaryotes. Proc. Biol. Sci. 2001;268:493-497.
41. Nakashima H, Fukuchi S, Nishikawa K. Compositional changes in RNA, DNA and proteins for bacterial adaptation to higher and lower temperatures. J. Biochem. 2003;133:507-513.
42. Lehmann M, Wyss M. Engineering proteins for thermostability: the use of sequence alignments versus rational design and directed evolution. Curr. Opin. Biotechnol. 2001;12:371-375.
43. Berezovsky IN, Shakhnovich EI. Physics and evolution of thermophilic adaptation. Proc. Natl. Acad. Sci. U.S.A. 2005;102:12742-12747.
44. England JL, Shakhnovich BE, Shakhnovich EI. Natural selection of more designable folds: a mechanism for thermophilic adaptation. Proc. Natl. Acad. Sci. U.S.A. 2003;100:8727-8731.
45. Ogata Y, Imai E, Honda H, Hatori K, Matsuno K. Hydrothermal circulation of seawater through hot vents and contribution of interface chemistry to prebiotic synthesis. Orig. Life Evol. Biosph. 2000;30:527-537.
46. Li H, Helling R, Tang C, Wingreen N. Emergence of preferred structures in a simple model of protein folding. Science 1996;273:666-669.
47. Zeldovich KB, Berezovsky IN, Shakhnovich EI. Physical origins of protein superfamilies. J. Mol. Biol. 2006;357:1335-1343.
48. Shakhnovich BE, Deeds E, Delisi C, Shakhnovich E. Protein structure and evolutionary history determine sequence space topology. Genome Res. 2005;15:385-392.
49. Deckert G, Warren PV, Gaasterland T, Young WG, Lenox AL, Graham DE, Overbeek R, Snead MA, Keller M, Aujay M, Huber R, Feldman RA, Short JM, Olsen GJ, Swanson RV. The complete genome of the hyperthermophilic bacterium Aquifex aeolicus. Nature 1998;392:353-358.
50. Nelson KE, Clayton RA, Gill SR, Gwinn ML, Dodson RJ, Haft DH, Hickey EK, Peterson JD, Nelson WC, Ketchum KA, McDonald L, Utterback TR, Malek JA, Linher KD, Garrett MM, Stewart AM, Cotton MD, Pratt MS, Phillips CA, Richardson D, Heidelberg J, Sutton GG, Fleischmann RD, Eisen JA, Fraser CM, et al. Evidence for lateral gene transfer between Archaea and bacteria from genome sequence of Thermotoga maritima. Nature 1999;399:323-329.
51. Nesbo CL, L'Haridon S, Stetter KO, Doolittle WF. Phylogenetic analyses of two archaeal genes in Thermotoga maritima reveal multiple transfers between archaea and bacteria. Mol. Biol. Evol. 2001;18:362-375.
52. Ausili A, Cobucci-Ponzano B, Di Lauro B, D'Avino R, Perugino G, Bertoli E, Scire A, Rossi M, Tanfani F, Moracci M. A comparative infrared spectroscopic study of glycoside hydrolases from extremophilic archaea revealed different molecular mechanisms of adaptation to high temperatures. Proteins 2007;67:991-1001.
53. Berezovsky IN, Chen WW, Choi PJ, Shakhnovich EI. Entropic stabilization of proteins and its proteomic consequences. PLoS Comput. Biol. 2005;1:e47.
54. Ponnuswamy P, Muthusamy R, Manavalan P. Amino acid composition and thermal stability of proteins. Internat. J. Biol. Macromol. 1982;4:186-190.
55. Bowie JU. Solving the membrane protein folding problem. Nature 2005;438:581-589.
56. Friedman RA, Honig B. A free energy analysis of nucleic acid base stacking in aqueous solution. Biophys. J. 1995;69:1528-1535.
57. Vasiliskov VA, Prokopenko DV, Mirzabekov AD. Parallel multiplex thermodynamic analysis of coaxial base stacking in DNA duplexes by oligodeoxyribonucleotide microchips. Nucleic Acids Res. 2001;29:2303-2313.
58. Goldstein RA, Luthey-Schulten ZA, Wolynes PG. Optimal protein-folding codes from spin-glass theory. Proc. Natl. Acad. Sci. U.S.A. 1992;89:4918-4922.
59. Shakhnovich EI. Protein folding thermodynamics and dynamics: where physics, chemistry, and biology meet. Chem. Rev. 2006;106:1559-1588.
60. Morrissey MP, Shakhnovich EI. Design of proteins with selected thermal properties. Fold. Des. 1996;1:391-405.
61. Seno F, Vendruscolo M, Maritan A, Banavar JR. Optimal protein design procedure. Phys. Rev. Lett. 1996;77:1901-1904.
62. Berezovsky IN, Zeldovich KB, Shakhnovich EI. Positive and negative design in stability and thermal adaptation of natural proteins. PLoS Comput. Biol. 2007;3:e52.
63. Henikoff S, Henikoff JG. Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. U.S.A. 1992;89:10915-10919.
64. Miyazawa S, Jernigan RL. Residue-residue potentials with a favorable contact pair term and an unfavorable high packing density term, for simulation and threading. J. Mol. Biol. 1996;256:623-644.
65. Butterfoss GL, Kuhlman B. Computer-based design of novel protein structures. Annu. Rev. Biophys. Biomol. Struct. 2006;35:49-65.
66. Bolon DN, Grant RA, Baker TA, Sauer RT. Specificity versus stability in computational protein design. Proc. Natl. Acad. Sci. U.S.A. 2005;102:12724-12729.
67. Perez-Jimenez R, Godoy-Ruiz R, Ibarra-Molero B, Sanchez-Ruiz JM. The effect of charge-introduction mutations on E. coli thioredoxin stability. Biophys. Chem. 2005;115:105-107.
68. Pjura P, Matsumura M, Baase WA, Matthews BW. Development of an in vivo method to identify mutants of phage T4 lysozyme of enhanced thermostability. Protein Sci. 1993;2:2217-2225.
69. Sali D, Bycroft M, Fersht AR. Surface electrostatic interactions contribute little to stability of barnase. J. Mol. Biol. 1991;220:779-788.
70. Zhang XJ, Baase WA, Shoichet BK, Wilson KP, Matthews BW. Enhancement of protein stability by the combination of point mutations in T4 lysozyme is additive. Protein Eng. 1995;8:1017-1022.
71. Bloom JD, Meyer MM, Meinhold P, Otey CR, MacMillan D, Arnold FH. Evolving strategies for enzyme engineering. Curr. Opin. Struct. Biol. 2005;15:447-452.
72. Gilis D, Rooman M. PoPMuSiC, an algorithm for predicting protein mutant stability changes: application to prion proteins. Protein Eng. 2000;13:849-856.
73. Schymkowitz J, Borg J, Stricher F, Nys R, Rousseau F, Serrano L. The FoldX web server: an online force field. Nucleic Acids Res. 2005;33:W382-388.
Further Reading
Branden C, Tooze J. Introduction to Protein Structure. 1999. Garland Publishing Inc.
Cantor CR, Schimmel PR. Biophysical Chemistry. Part I: The conformation of biological macromolecules. Part III: The behavior of biological macromolecules. 1980. W.H. Freeman.
Hochachka P, Somero G. Biochemical Adaptation. Mechanism and Process in Physiological Evolution. 2002. Oxford University Press, Oxford.
Saenger W. Principles of Nucleic Acid Structure. 1984. Springer-Verlag, New York.
See Also
Amino Acids, Chemical Properties of
Physico-chemical Properties of Nucleic Acids: Character and Recognition of Watson-Crick Base Pairs
Protein Folding, Computation and Modeling of
Protein Folding, Energetics of
Proteins: Computational Analysis of Structure, Function and Stability
Thermophiles, Topics in Chemical Biology