PROTEINS: Structure, Function, and Bioinformatics

New classification of supersecondary structures of sandwich-like proteins uncovers strict patterns of strand assemblage
Yih-Shien Chiang,1 Tatiana I. Gelfand,2 Alexander E. Kister,1* and Israel M. Gelfand2
1 Department of Health Informatics, SHRP, University of Medicine and Dentistry of New Jersey, Newark, New Jersey 07107
2 Department of Mathematics, Rutgers University, Piscataway, New Jersey 08855

ABSTRACT
To describe the supersecondary structure (SSS) of β-sandwich-like proteins (SPs), we introduce a structural unit called the strandon. A strandon is defined as a set of sequentially consecutive strands connected by hydrogen bonds in 3D structures. Representing β-proteins as assemblies of strandons exposes the underlying similarities in their SSS and enables us to construct a novel classification scheme of SPs. The classification of all known SPs is based on shared supersecondary structural features and is presented in the SSS database (http://binfs.umdnj.edu/sssdb/). Analysis of the SSS reveals two common specific patterns. The first pattern defines the arrangement of strandons and was found in 95% of all examined SPs. The second pattern establishes the ordering of strands in the protein domain and was observed in 82% of the analyzed SPs. Knowledge of these two patterns, which uncover the spatial arrangement of strands, will likely prove useful in protein structure prediction.
© 2007 Wiley-Liss, Inc.

INTRODUCTION
Crucial insight into the sequence-structure relationship of proteins came from Anfinsen, who showed that all the information about the native structure of a protein is encoded in its amino acid sequence.1 However, how to read this information off the sequence so as to reconstruct the tertiary structure remains unclear. One approach is to proceed in a stepwise fashion: first, learn the rules by which the primary amino acid sequence folds into secondary structure; then, how the secondary structure elements (β-strands and α-helices) come together into supersecondary structure (SSS); and, finally, how SSS determines the 3D shape of the protein molecule. There has been considerable progress on the first step of this research program: modern algorithms can outline about 80% of the secondary structure based on the primary sequence.2-6 However, there has been less success in the subsequent stages. This work is designed to fill the void in our understanding of the rules that determine the formation of SSS from the secondary structure elements.
Pioneering research on deciphering the supersecondary structures of β-proteins was carried out by Richardson, who discovered a specific topology made of 4 strands in β-sheets, which she called the Greek Key.7,26 A number of further rules pertaining to the organization of secondary structure units have since been worked out. It was shown, for example, that loops do not cross and that knots are never found in protein chains.8-13 Nonetheless, we are still far from being able to outline the SSS based on secondary structure. The present paper builds on our previous discovery of the strand interlock, an invariant SSS in β-sandwich-like proteins (SPs).14,15 The interlock describes the arrangement of 4 strands that form the core of SP structures. Here, we generalize our initial results and reveal two rules that apply to the arrangement of all strands in the main sheets of SPs.
To uncover the common SSS features, we introduced a representation of SPs in terms of a supersecondary structural unit termed the strandon.15 A strandon is defined as a set of sequentially consecutive strands that are connected by hydrogen bonds in a β-sheet. The representation of a protein structure as an assemblage of strandons will be referred to as a supermotif.
Abbreviations: SP, sandwich-like protein; SSS, supersecondary structure.
The Supplementary Material referred to in this article can be found at http://www.interscience.wiley.com/jpages/08873585/suppmat/
Grant sponsor: Robert Wood Johnson Foundation.
*Correspondence to: Alexander E. Kister, Department of Health Informatics, SHRP, University of Medicine and Dentistry of New Jersey, Newark, NJ 07107. E-mail: kisterae@umdnj.edu
Received 9 September 2006; Revised 22 January 2007; Accepted 27 January 2007
Published online 7 June 2007 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/prot.21473

Proteins 2007; 68:915-921.

Key words: β-sandwich proteins; supersecondary structure; protein folding; protein structure classification; protein structure prediction.


We propose here a new method of classifying SPs based on their supermotif, that is, the arrangement of strandons in the two main β-sheets. All SPs are sorted in such a way that proteins with the same strandon arrangement are grouped under one supermotif. Within the same supermotif, the SPs are grouped according to their motifs, that is, the arrangement of their strands in the two main sheets. This classification is set forth in the new SSS database (http://binfs.umdnj.edu/sssdb/).
The SSS database affords several important insights. First, we were able to demonstrate that in 95% of the known SPs, strandon arrangements satisfy the rule of supermotifs, as will be explained later. The rule of supermotifs dramatically restricts the number of permissible arrangements of strandons. It turns out that the great diversity of sandwich proteins is described by a very small number of supermotifs. We also delineate a rule of motifs that governs the arrangement of strands within a strandon. This rule is valid for 82% of all SPs. Knowledge of the two rules that define the spatial arrangement of strands in SPs should prove useful for predicting protein structure from amino acid sequence.

MATERIALS AND METHODS
Study material
A representative set of structures was chosen from the SCOP database (1.65 release).16 From the all-β proteins class, the structures of 38 protein folds described as sandwich-like were selected. We analyzed the structures of 81 SCOP superfamilies consisting of 177 families and 417 protein domains. The domains are further subdivided into clusters, which generally encompass proteins belonging to different species. Thus, the lowest classification unit in SCOP may be called a cluster/species. Since proteins from the same SCOP cluster/species have the same number and arrangement of strands in a protein domain, that is, identical SSS motifs, we restricted our analysis to one randomly selected representative structure from each cluster/species. Altogether, the SCOP classification encompasses 749 species of SPs. We were not able to analyze 46 of these structures for one of the following reasons: only the coordinates of Cα atoms were available; the structure consisted of a discontinuous chain or of multiple chains; or the structure was otherwise too complicated for us to identify all of the strands that make up the two main sheets. These 46 structures are grouped in the 0 folder in the SSS database. We therefore analyzed the structures of the remaining 703 SPs, whose atomic coordinates were taken from the PDB (32 structures were solved by NMR).

Methodology
The steps in the analysis of a protein's SSS are as follows:
1. calculation of hydrogen bond contacts between main chain atoms of residues in each domain;
2. identification of strands;
3. construction of motifs: determination of the arrangement of strands;
4. identification of strandons, which are the units of SSS;
5. construction of supermotifs: determination of the arrangement of strandons in the two main sheets.
This five-step procedure was applied to each of the 703 SPs. We then grouped together all SPs with an identical strandon arrangement, that is, the same supermotif. Finally, we divided the SPs with the same supermotif into subgroups that share a common motif.

Calculation of H-bond contacts
To calculate hydrogen bonds, we used the HBplus program.24 HBplus reads the coordinates of atoms from a PDB file and uses geometric criteria to output a list of potential donor/acceptor pairs of atoms. From the list of all hydrogen-bonded atoms, we extracted those H-bonds that involve only the main chain atoms of residues. Generally, the number of such hydrogen bond contacts is about 15-25 per β-domain. Examination of the H-bonds allows us to identify the strands and to determine their arrangement in the β-sheets.

Identification of strands
Our way of identifying and numbering the strands of the protein chain largely coincides with the secondary structure definition in the PDBsum database.17 However, there are some notable differences:
(a) We consider two consecutive PDBsum-defined strands to be a single strand if they meet all of the following criteria: (1) the 2 strands are parallel to each other, (2) the number of residues interposed between them is no more than five, and (3) both strands are hydrogen bonded to the same strand in the β-sheet. For example, in protein structure 1edq [Fig. 1(a,b)], we consider two PDBsum-defined strands, 10 and 11, to be a single strand since they are parallel to each other, have three residues between them, and are both hydrogen bonded to Strand 9. Such strand mergers were carried out in 258 structures.
(b) A single PDBsum strand is broken up into 2 strands if it has hydrogen bonds with strands in both main sheets. This modification was made in 10 structures, which are listed in the SSS database (see Fig. 1S in the Supplementary Material).
(c) In 79 structures, we disregard a short, 2-3-residue-long strand at the edge of a sheet if it has only one interstrand hydrogen bond. In many cases, such an edge strand has no homologues in most other structures of the same protein family (see the comments in the SSS database and Fig. 2S in the Supplementary Material).
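The three merge criteria in modification (a) can be written down as a small predicate. The following is a minimal sketch rather than the authors' implementation: the `Strand` record, its `direction` field (a hypothetical +1/-1 encoding used here to test whether two strands are parallel), and the residue numbers in the example are all assumptions made for illustration. Only the criteria themselves and the 1edq example (Strands 10 and 11, three residues apart, both bonded to Strand 9) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strand:
    number: int          # sequential strand number (PDBsum-style)
    first_res: int       # first residue of the strand
    last_res: int        # last residue of the strand
    direction: int       # hypothetical encoding: +1 or -1 chain direction in the sheet
    partners: frozenset  # numbers of the strands this strand is H-bonded to


def should_merge(a: Strand, b: Strand, max_gap: int = 5) -> bool:
    """Return True if consecutive strands a and b meet all three merge criteria."""
    consecutive = b.number == a.number + 1
    parallel = a.direction == b.direction            # criterion (1)
    gap = b.first_res - a.last_res - 1               # residues interposed between them
    close = 0 <= gap <= max_gap                      # criterion (2)
    shared_partner = bool(a.partners & b.partners)   # criterion (3)
    return consecutive and parallel and close and shared_partner


def merge(a: Strand, b: Strand) -> Strand:
    """Combine two strands into one strand that keeps the lower strand number."""
    partners = (a.partners | b.partners) - {a.number, b.number}
    return Strand(a.number, a.first_res, b.last_res, a.direction, frozenset(partners))


# Strands 10 and 11 of 1edq: parallel, three residues apart, both bonded to Strand 9.
# The residue numbers are invented for illustration only.
s10 = Strand(10, 100, 104, +1, frozenset({9}))
s11 = Strand(11, 108, 112, +1, frozenset({9}))
merged = merge(s10, s11) if should_merge(s10, s11) else None
```

After a merge, the strands that follow would be renumbered, which is why the strand numbers in Figure 1(c) differ from those in Figure 1(b).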

Figure 1
Schematic representation of the arrangement of strands in the protein structure Chitinase A, N-terminal domain N (1edq, chain A: 24-132). (a) The strands are represented with large, dark arrows. The thin lines are the loops between strands. Strands are numbered sequentially from the N-terminus. Hydrogen bonds are indicated with solid, straight lines between strands. The sheets are labeled A, B, and C. (b) The numbers represent the strands that make up Sheets A, B, and C. Strands follow PDBsum's secondary structure definition. (c) The modified and renumbered representation of the strands in the two main sheets, A and B. Strands 10 and 11 in Sheet B from Figure 1(b) are both hydrogen bonded to Strand 9 and are here considered as one strand (Strand 8). Strands 4 and 5, which made up Sheet C in Figure 1(b), are not considered. This is the variant of the motif used in this research. (d) The supermotif: Strands 1, 2, 3, and 6 form Strandons I, II, III, and V, respectively. The consecutive pairs of Strands 4 and 5, and 7 and 8, form Strandons IV and VI, respectively.

SSS Motif: arrangement of strands within a protein domain
The identification of strands and the knowledge of the interstrand hydrogen bonds allow us to determine the arrangement of strands within a domain, its SSS motif, or motif for short. In the SSS database, we present two variants of the motif for each structure. The first variant is the arrangement of strands as they are defined in the PDBsum database. The second variant of the motif uses our strand definitions with the above modifications and also disregards the strands in the so-called auxiliary sheets.
Although SPs are generally defined as having two main β-sheets, about 22% of the structures grouped as SPs in the SCOP and CATH databases have auxiliary sheets in addition to their two main sandwich sheets.16,19 Our analysis revealed 161 structures with more than 2 sheets (109 structures with 3 β-sheets, 36 structures with 4 sheets, 5 structures with 5 sheets, 4 structures with 6 sheets, 3 structures with 7 sheets, 1 structure with 9 sheets, and 3 structures with 10 sheets). By definition, an auxiliary sheet is taken to be the sheet with the smallest number of strands. If the number of strands is approximately the same in all sheets, then the auxiliary sheet is the one with the shortest strands (on average, 2-3 residues in length). The auxiliary sheets were determined by inspection. Because we focus our investigation on the part common to all SPs, we chose to examine the arrangement of strands exclusively in the two main β-sheets and to disregard the strands in the auxiliary sheets. After omitting the auxiliary sheets, the strands in the protein structure were numbered sequentially, starting at the N-terminus. For example, Strands 3 and 4 in structure 1edq are three residues long, are hydrogen bonded to each other, and form auxiliary Sheet C [Fig. 1(a,b)]. The constituent strands of the auxiliary sheet are omitted in Figure 1(c), and the strands in the two main sheets are renumbered accordingly.
A few structures have unusual arrangements of three β-sheets. For example, in structure 1h6e, one large sheet with 9 strands simultaneously forms a sandwich-like architecture with two smaller sheets (see Fig. 3S in the Supplementary Material). It was found that there are no hydrogen bond contacts between Strands 7 and 15 (the closest distance between these strands is about 41 Å), and, consequently, we consider Strands 10, 13, 8, 7 to form one β-sheet and Strands 4, 5, 15 to form the other. The lengths of Strands 4, 5, and 15 are 3, 2, and 5 residues, respectively, whereas the lengths of Strands 10, 13, 8, 7 are 5, 9, 9, and 5 residues, respectively. In accordance with our conventions, the sheet with the shorter strands (4, 5, and 15) is considered the auxiliary sheet. In the same way, by manual inspection, the auxiliary sheets of other structures were selected and excluded from our analysis (see the comments in the SSS database).

Figure 2
The rule of supermotifs: the pattern of strandon arrangement in the 2 β-sheets. Each strandon is depicted by a box. The top line shows the arrangement of strandons in Sheet A, and the bottom line shows the arrangement of strandons in Sheet B. Dashes symbolize hydrogen bonds between strands in neighboring strandons.

Table I
All Theoretically Possible Supermotifs with 4, 6, and 8 Strandons that Follow the Rule of Supermotifs

Supermotif (Sheet A / Sheet B)       Number of SPs   Percentage of all SPs   Number of motifs
4 Strandons
  I III / II IV                      64              9.1                     19
  IV II / I III                      191             27.2                    38
6 Strandons
  I III V / II VI IV                 224             31.9                    34
  VI II IV / I V III                 37              4.8                     13
  V I III / VI IV II                 15              2.1                     5
8 Strandons
  I III VII V / II VIII IV VI        22              3.1                     9
  VIII II VI IV / I VII III V        7               1.0                     3
  VII I V III* / VIII VI II IV       -               -                       -
  VI VIII IV II / VII V I III        40              5.7                     8

For each supermotif, the number of SPs and the percentage of all SPs, as well as the number of actual motifs that the given supermotif describes, are presented. The supermotif labeled * is not observed in any currently known SPs.

Identification of strandons
A strandon is defined as a sequence of the maximum number of consecutive strands that are hydrogen bonded sequentially in the 3D structure. For example, the two consecutive Strands 4 and 5 [Fig. 1(c)] form a strandon because they are H-bonded to each other. Strand 3, which is next to Strand 4, is not a part of this strandon because it has no H-bonds with Strand 4. Analogously, Strand 6 is not hydrogen bonded to Strand 5, so Strand 6 is not included in the strandon. A similar analysis turned up another strandon made up of Strands 7 and 8 [Fig. 1(c)]. By definition, a single strand that is not hydrogen bonded to either of its sequential neighbors constitutes a strandon by itself. This is the case with Strands 1, 2, 3, and 6 in Figure 1(c). The strand numbering is cyclical, so the first and last strands in a structure are considered to be consecutive and will belong to one strandon if there are hydrogen bonds between them. Strandons are denoted by Roman numerals. The strandon that includes Strand 1 is considered to be Strandon I, and so on [Fig. 1(d)]. Because we use the revised strand numbering, it is possible that two strands that are not consecutive according to PDBsum (for example, when the strands of an auxiliary sheet lie between them) will still belong to the same strandon in our analysis.
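The strandon definition amounts to splitting the cyclically numbered strands into maximal runs in which each strand is hydrogen bonded to its successor. The sketch below illustrates this under stated assumptions: the function name `find_strandons` and the input format (a strand count plus a list of H-bonded strand pairs) are hypothetical, and only the two consecutive H-bonded pairs named in the text for 1edq, Strands 4-5 and 7-8 of Figure 1(c), are supplied as input.

```python
def find_strandons(n_strands, hbonded_pairs):
    """Split strands 1..n_strands into strandons.

    A strandon is a maximal run of sequentially consecutive strands in which
    each strand is H-bonded to the next one. Numbering is cyclic, so the last
    strand and Strand 1 count as consecutive.
    """
    bonded = {frozenset(pair) for pair in hbonded_pairs}

    def nxt(i):
        return i % n_strands + 1        # successor, wrapping n -> 1

    def prev(i):
        return (i - 2) % n_strands + 1  # predecessor, wrapping 1 -> n

    def linked(i):
        # True if strand i is H-bonded to its sequential successor
        return n_strands > 1 and frozenset((i, nxt(i))) in bonded

    if n_strands > 1 and all(linked(i) for i in range(1, n_strands + 1)):
        return [list(range(1, n_strands + 1))]  # every strand linked: one closed run

    # Walk backwards from Strand 1 so that Strandon I is the run containing it.
    start = 1
    while linked(prev(start)):
        start = prev(start)

    strandons, current, i = [], [start], start
    for _ in range(n_strands - 1):
        if linked(i):
            current.append(nxt(i))
        else:
            strandons.append(current)
            current = [nxt(i)]
        i = nxt(i)
    strandons.append(current)
    return strandons


# 1edq after renumbering [Fig. 1(c)]: 8 strands, consecutive H-bonded pairs 4-5 and 7-8.
strandons_1edq = find_strandons(8, [(4, 5), (7, 8)])
print(strandons_1edq)
```

For this input the function returns six runs, matching the six strandons of Figure 1(d). H-bonds between strands that are not sequential neighbors can be passed in as well; they do not affect the grouping.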
Strandon Arrangements: The Supermotifs
The supersecondary motif that describes the arrangement of strands can be rewritten in terms of strandons. Such an arrangement of strandons is termed a supermotif. Thus, the motif of Chitinase A [Fig. 1(c)] is composed of 8 strands, while its supermotif has 6 strandons [Fig. 1(d)].

RESULTS
SSS database
We developed a novel hierarchical classification of sandwich-like protein structures based on the arrangement of their strands and strandons. Protein structures with the same motif, that is, an identical number and arrangement of strands in the two main β-sheets, fall into one group. Motifs with an identical number and arrangement of strandons, that is, the same supermotif, form one set. This hierarchical classification scheme (supermotif → motif → protein structure) is the basis for the SSS database of SPs. The SSS database contains 38 different supermotifs and 185 different motifs, which describe all of the analyzed structures. Although the number of supermotifs suggests a fairly large diversity, it must be noted that 90% of all analyzed SPs are described by just 10 different supermotifs. Moreover, 14 of the most common motifs describe about half of the analyzed SPs (see Table 1S in the Supplementary Material).
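The supermotif → motif → protein structure hierarchy maps naturally onto a two-level grouping. The following sketch shows one way to build it and is not a description of how the SSS database is actually stored. The function name `classify`, the record layout, and the motif labels ("motif a1" and so on) are placeholders. The four structure identifiers and their assignment to two 4-strandon supermotifs, each with two motifs, follow the caption of Figure 4.

```python
from collections import defaultdict


def classify(records):
    """Group structures first by supermotif, then by motif.

    records: iterable of (structure_id, supermotif, motif) tuples, where the
    supermotif and motif can be any hashable description of the arrangement.
    """
    tree = defaultdict(lambda: defaultdict(list))
    for structure_id, supermotif, motif in records:
        tree[supermotif][motif].append(structure_id)
    return tree


# Motif labels are placeholders; the groupings follow the Figure 4 caption.
records = [
    ("1akj_D", "4 strandons, K = I", "motif a1"),
    ("1who", "4 strandons, K = I", "motif a2"),
    ("1dqi_A", "4 strandons, K = II", "motif b1"),
    ("1i8a_A", "4 strandons, K = II", "motif b2"),
]

tree = classify(records)
for supermotif, motifs in tree.items():
    n_structures = sum(len(ids) for ids in motifs.values())
    print(f"{supermotif}: {len(motifs)} motifs, {n_structures} structures")
```

Because sequence similarity plays no part in the keys, structures from different SCOP folds can end up in the same group, as the Discussion notes.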

The rule of supermotifs
Analysis of the supermotifs in the SSS database led to the discovery of a pattern in the ordering of the strandons in the two main sheets. This pattern is called the rule of supermotifs and is represented in Figure 2. Strandons are numbered cyclically; thus, the first and last strandons are considered to be consecutive. For example, in a supermotif with 6 strandons, if Strandon K is Strandon VI, then Strandon K + 1 is Strandon I, K + 2 is Strandon II, and so on (see the supermotifs with 6 strandons in Table I). Strandon K at the edge of Sheet A can be any strandon. It follows from the rule that if the number of strandons in the structure, N, is even, then there are N/2 strandons in each sheet. If N is odd, there are (N + 1)/2 strandons in one sheet and [(N + 1)/2] - 1 strandons in the other. When checking whether an observed supermotif follows the rule, Strandon K can be selected on either side for a supermotif with an even number of strandons, and only on one side for a supermotif with an odd number of strandons. For the supermotifs with an odd number of strandons, there are two neighboring consecutive strandons in one sheet, and one of these is an edge strandon. Strandon K is selected on the side of the structure that does not have this edge strandon. The arrangement of strandons in 95% of all structures in the SSS database obeys the rule of supermotifs.
The rule of supermotifs is a generalization of the strandon interlock described in our previous paper.15 The strandon interlock rule predicts that if Strandons J and I are neighbors in one main sheet, then Strandons J + 1 and I + 1 are neighbors in the other sheet, and it specifies the arrangement of these 4 strandons. By contrast, the rule of supermotifs formulated in the present paper describes the arrangement of all strandons in the 2 main sheets of SPs.
Without this constraint, the number of possible supermotifs for a given number of strandons grows rapidly as the number of strandons increases: 4 strandons give 18 possible supermotifs (if we consider that a sheet can have 1, 2, or 3 strandons), while 6 strandons can be combined to form 900 possible supermotifs (see Fig. 4S in the Supplementary Material). Enforcing the rule of supermotifs drastically reduces the number of allowed supermotifs in SPs. Our analysis shows that if N, the total number of strandons in a structure, is odd, then the number of possible arrangements that follow the rule of supermotifs is 2N. If N is even, the pattern of supermotifs has an inherent symmetry, and there are N/2 possible arrangements that conform to the rule of supermotifs. In fact, about 85% of the examined SPs have 4, 6, or 8 strandons (Table I).

Figure 3
All possible supermotifs with 4 strandons that conform to the rule of supermotifs. The strandons in the boxes are shown with Roman numerals. Strandon K is considered here as the left edge strandon in Sheet A. (a) Strandon arrangement when K = I. (b) Strandon arrangement when K = II. (c) Strandon arrangement when K = III. (d) Strandon arrangement when K = IV.

Figure 4
Two supermotifs with 4 strandons. The left edge strandon in Sheet A is Strandon K. We accept cyclic ordering for both strands and strandons. Therefore, the first and last strands in a structure are consecutive and will belong to the same strandon if they share hydrogen bonds. The arrows (→ or ←) are directed towards the highest-numbered strand in each strandon. (a) A supermotif where K = I, with two possible motifs for the given supermotif. From left to right, examples of SPs described by these motifs are 1akj_D and 1who. (b) A supermotif where K = II, with two possible motifs for the given supermotif. From left to right, examples of SPs described by these motifs are 1dqi_A and 1i8a_A.

Let us explore the symmetry for structures with an even number of strandons. For a structure with 4 strandons to follow the rule, a pair of consecutive strandons, K and K + 1, needs to be located at the left or the right edge of the sheets. For K = I and K = II, the resulting arrangements of strandons are shown in Figure 3(a,b). Because of the inherent symmetry in the pattern of supermotifs, which holds for all supermotifs with an even number of strandons, the arrangement of strandons when K = III [Fig. 3(c)] is the same as when K = I [Fig. 3(a)]. Likewise, the strandon arrangements for K = IV and K = II are identical [Fig. 3(d,b), respectively]. Thus, there are only 2 supermotifs with 4 strandons that follow the rule of supermotifs. Only these 2 supermotifs are observed in SPs with 4 strandons (Table I). In a similar way, it is clear that there are 3 supermotifs with 6 strandons that follow the rule (K = I, II, and III). Only these supermotifs are observed in SPs with 6 strandons (Table I). With 8 strandons, there are four possible arrangements that satisfy the rule (K = I, II, III, and IV). Three of these supermotifs are actually observed in SPs with 8 strandons (Table I).
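The counts quoted in this section can be checked by brute force. The sketch below rests on two assumptions that are not stated explicitly in the text. First, two arrangements are treated as the same supermotif if they differ only by a left-right mirror of both sheets or by swapping the two sheets. This convention reproduces the figures of 18 and 900. Second, `rule_supermotif` uses an alternating pattern read off the entries of Table I (Sheet A: K, K + 2, K - 2, K + 4, ...; Sheet B: K + 1, K - 1, K + 3, K - 3, ...). It is offered as a reading of the table, not as the authors' formal statement of the rule in Figure 2.

```python
from itertools import permutations


def canonical(sheet_a, sheet_b):
    """Pick one representative among mirror images and sheet swaps."""
    a, b = tuple(sheet_a), tuple(sheet_b)
    return min([(a, b), (b, a), (a[::-1], b[::-1]), (b[::-1], a[::-1])])


def count_unconstrained(n):
    """Count ways to lay n strandons out in two non-empty ordered sheets."""
    seen = set()
    for order in permutations(range(1, n + 1)):
        for cut in range(1, n):  # each sheet holds at least one strandon
            seen.add(canonical(order[:cut], order[cut:]))
    return len(seen)


def rule_supermotif(n, k):
    """Arrangement for edge strandon k under the pattern assumed from Table I."""
    def wrap(x):
        return (x - 1) % n + 1  # cyclic strandon numbering 1..n

    a_offsets = [0] + [s * sign for s in range(2, n + 1, 2) for sign in (1, -1)]
    b_offsets = [s * sign for s in range(1, n + 1, 2) for sign in (1, -1)]
    sheet_a = tuple(wrap(k + d) for d in a_offsets[:(n + 1) // 2])
    sheet_b = tuple(wrap(k + d) for d in b_offsets[:n // 2])
    return sheet_a, sheet_b


def count_rule_supermotifs(n):
    """Distinct rule-conforming supermotifs over all choices of k (even n)."""
    return len({canonical(*rule_supermotif(n, k)) for k in range(1, n + 1)})


print(count_unconstrained(4), count_unconstrained(6))
print([count_rule_supermotifs(n) for n in (4, 6, 8)])
```

Under these assumptions, the unconstrained counts come out as 18 and 900, and the rule leaves N/2 supermotifs for N = 4, 6, and 8, in agreement with the text and with the number of rows per group in Table I.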
The rule of motifs
The rule of supermotifs explains the arrangement of strandons in the 2 β-sheets. However, it does not say anything about the ordering of strands within the strandons. In other words, it does not tell us whether, in any given strandon, the strand numbers increase to the right or to the left (see the direction of the arrows in Figure 4). The analysis of the arrangement of strands within the supermotifs and strandons reveals a strong correlation between the location of a strandon in the supermotif and the strand ordering within the strandon. This observation leads us to formulate the rule of motifs. The rule of motifs expands on the pattern of supermotifs shown above and states that: For Strandon K, the strand with the highest number is located at the edge of the sheet. For the two Strandons K and K + 1, or any two neighboring strandons in the same sheet, the strand numbers in these two strandons increase in opposite directions. This means that the order of strands in a given strandon is the same for all proteins described by the same supermotif.
For example, let us consider two different structures whose supermotifs and motifs are shown in Figure 4. In the two motifs in Figure 4(a), the order of strands in Strandon I is the same (indicated by the direction of the arrows). Likewise, the order of strands in Strandon II, III, or IV is the same in the two motifs. Figure 4 also demonstrates that the ordering of strands within a strandon depends on the strandon's place in the supermotif. In Figure 4(a), the arrows indicating the direction in which the strand numbers increase in Strandons I and III are directed away from each other, while in Figure 4(b), which represents a different supermotif, the arrows for Strandons I and III are directed towards each other. In both cases, the arrows in neighboring strandons point in opposite directions, in agreement with the rule of motifs.
In 82% of the SPs analyzed for the SSS database, the arrangement of strands in all strandons satisfies the rule of motifs. In 12% of the analyzed structures, an incorrect ordering of strands was found in only one strandon, and in 6% of the structures, an exception was found in two or more strandons.

DISCUSSION
Uncovering structural similarities in proteins with dissimilar sequences is essential for understanding the relationship between sequence and structure. The main idea of the many investigations in this field is to break down proteins into large structural units and consider the emerging patterns of arrangement of these units. For over 30 years, researchers have looked at how secondary structural elements assemble into SSS.20-23 Inspection of protein structures revealed preferences and regularities in the geometry of protein chains. It was shown that secondary structure elements that are adjacent in the sequence are often in contact in 3D structures and that the connections between these elements neither cross each other nor make knots in a chain.25 In this work, as in our previous research,15 we have extended these rules by considering the organization of SSS units (strandons) in the known SPs. The advantage of looking at protein structure as an arrangement of strandons (a supermotif) is that it enables us to find specific patterns of strand packing in the sandwich-like protein structures: the rule of supermotifs and the rule of motifs. These rules sharply delimit the number of possible structural permutations of strands in space and lead to the conclusion that very few possible arrangements of strands are compatible with the sandwich architecture. These restrictive rules help us to understand why only a limited number of different protein folds is observed.18
The two patterns that we uncovered support and predict a finding in our previous research,14 the strand interlock. The strand interlock, which is the arrangement of two pairs of consecutive strands, i, i + 1 and k, k + 1, can be seen as a corollary of the more general rule of motifs. If an SP's SSS conforms to both the rule of supermotifs and the rule of motifs, then it will have strand interlock(s). Consider the example of a supermotif with 4 strandons that follows the rule of supermotifs [Fig. 4(a)]. In the first motif (structure 1akj, chain D), two strands, 2 and 7, from Strandons I and III, respectively, share hydrogen bond contacts and can be considered to be Strands i and k in a strand interlock. The two other strands, i + 1 and k + 1 (Strands 3 and 8), are located in Strandons II and IV, respectively. The ordering of strands in all of the strandons conforms to the rule of motifs; therefore, Strands 3 and 8, from Strandons II and IV, should be neighbors and consequently form hydrogen bonds. Thus, Strands 2, 3 and Strands 7, 8 are the two pairs of strands that form the interlock.
In this research, we introduce a novel approach to classifying SPs. This classification is based solely on regularities in SSS and is used to create the SSS database of all known SPs. It is important to note that, because we did not take sequence similarity into account for the classification, proteins grouped together can have very different amino acid sequences. In the SSS database, proteins grouped together often belong to different folds and different superfamilies in the SCOP database.
The proposed structural classification is critically dependent on the definition of secondary structural elements and of the contacts between strands in the main β-sheets. To minimize arbitrariness, we use a single objective and structurally reasonable criterion for both definitions: the hydrogen bonds between main chain atoms. The advantage of this approach is that secondary structure and SSS are deduced from objective data: a list of hydrogen bonds, which can readily be verified by an independent test. We started our analysis with the widely used secondary structures presented in the PDBsum database. However, after analyzing the hydrogen bonds of the structures, we found it necessary to revise some of these data according to the rules described in our methods. These revisions were developed based on the analysis of all structures, and we consider them to be valid for most β-sandwich structures. However, it must be acknowledged that hydrogen bond analysis is not always sufficient for an adequate description of specific 3D features. This leaves questions about the 3D structure of certain sandwich proteins whose answers would affect how the proteins are classified. Some of these questions, which require further analysis, are as follows: What is the shortest segment of amino acid sequence, hydrogen bonded to an edge strand in a sheet, that can be defined as a strand? Is it necessary to divide a twisted or bent sheet into 2 sheets? What angle of bend would be the threshold for this division? Should a large, twisted sheet be divided into two subdomains? What cases merit this division? In a structure with more than 2 sheets, are there other ways, in addition to the one used here, to define the two main β-sheets? These questions reveal some ambiguity in classifying SPs solely on structural information obtained from their hydrogen bonds. Nonetheless, any potential errors would not significantly affect our overall classification and statistical results.
The patterns of spatial arrangement of secondary structure and SSS elements in SPs described here will help pave the way towards the goal of predicting protein structure from amino acid sequence.

ACKNOWLEDGMENTS
We thank Drs. M. Shibata and C. Chothia for helpful discussions and comments, and Mrs. M. Goldman for continuous encouragement. We would also like to thank our anonymous reviewers for their very useful and valuable critical comments.

REFERENCES
1. Anfinsen CB. Principles that govern the folding of protein chains. Science 1973;181:223-230.
2. He J, Hu HJ, Harrison R, Tai PC, Pan Y. Rule generation for protein secondary structure prediction with support vector machines and decision tree. IEEE Trans Nanobiosci 2006;5:46-53.
3. Aydin Z, Altunbasak Y, Borodovsky M. Protein secondary structure prediction for a single-sequence using hidden semi-Markov models. BMC Bioinformatics 2006;7:178.
4. Lin K, Simossis VA, Taylor WR, Heringa J. A simple and fast secondary structure prediction method using hidden neural networks. Bioinformatics 2005;21:152-159.
5. Crooks GE, Brenner SE. Protein secondary structure: entropy, correlations and prediction. Bioinformatics 2004;20:1603-1611.
6. Pollastri G, McLysaght A. Porter: a new, accurate server for protein secondary structure prediction. Bioinformatics 2005;21:1719-1720.
7. Richardson JS. Handedness of crossover connections in β-sheets. Proc Natl Acad Sci USA 1976;73:2619-2623.
8. Cohen FE, Sternberg MJE, Taylor WR. Analysis and prediction of the packing of α-helices against a β-sheet in the tertiary structure of globular proteins. J Mol Biol 1982;156:821-862.
9. Taylor WR, Green NM. The predicted secondary structures of the nucleotide-binding sites of six cation-transporting ATPases lead to a probable tertiary fold. Eur J Biochem 1989;179:241-248.
10. Clark DA, Shirazi J, Rawlings CJ. Protein topology prediction through constraint-based search and the evaluation of topological folding rules. Protein Eng 1991;4:751-760.
11. Woolfson DN, Evans PA, Hutchinson EG, Thornton JM. Topological and stereochemical restrictions in β-sandwich protein structures. Protein Eng 1993;6:461-470.
12. Ruczinski I, Kooperberg C, Bonneau R, Baker D. Distributions of β sheets in proteins with application to structure prediction. Proteins 2002;48:85-97.
13. Eidhammer I, Jonassen I, Taylor WR. Structure comparison and structure patterns. J Comp Biol 2000;7:685-716.
14. Kister A, Finkelstein A, Gelfand I. Common sequence and structural features in sandwich-like proteins. Proc Natl Acad Sci USA 2002;99:14137-14141.
15. Kister AE, Fokas AS, Papatheodorou TS, Gelfand IM. Strict rules determine arrangements of strands in sandwich proteins. Proc Natl Acad Sci USA 2006;103:4107-4110.
16. Andreeva A, Howorth D, Brenner SE, Hubbard TJP, Chothia C, Murzin AG. SCOP database in 2004: refinements integrate structure and sequence family data. Nucleic Acids Res 2004;32:D226-D229.
17. Laskowski RA, Chistyakov VV, Thornton JM. PDBsum more: new summaries and analyses of the known 3D structures of proteins and nucleic acids. Nucleic Acids Res 2005;33:D266-D268.
18. Chothia C. One thousand families for the molecular biologist. Nature 1992;357:543-544.
19. Pearl F, Todd A, Sillitoe I, Dibley M, Redfern O, Lewis T, Bennett C, Marsden R, Grant A, Lee D, Akpor A, Maibaum M, Harrison A, Dallman T, Reeves G, Diboun I, Addou S, Lise S, Johnston C, Sillero A, Thornton J, Orengo C. The CATH domain structure database and related resources Gene3D and DHS provide comprehensive domain family information for genome analysis. Nucleic Acids Res 2005;33:D247-D251.
20. Levitt M, Chothia C. Structural patterns in globular proteins. Nature 1976;261:552-558.
21. Sternberg MJE, Thornton JM. On the conformation of proteins: the handedness of the β-strand-α-helix-β-strand unit. J Mol Biol 1976;105:367-382.
22. Cohen FE, Sternberg MJE, Taylor WR. Analysis of the tertiary structure of protein β-sheet sandwiches. J Mol Biol 1981;148:253-272.
23. Michalopoulos I, Torrance GM, Gilbert DR, Westhead DR. TOPS: an enhanced database of protein structural topology. Nucleic Acids Res 2004;32:D251-D254.
24. McDonald IK, Thornton JM. Satisfying hydrogen bonding potential in proteins. J Mol Biol 1994;238:777-793.
25. Chothia C, Finkelstein AV. The classification and origins of protein folding patterns. Annu Rev Biochem 1990;59:1007-1039.
26. Richardson JS. β-Sheet topology and the relatedness of proteins. Nature 1977;268:495-500.