
Matrix Biology Vol. 15/1997, pp. 545-554, © 1997 by Gustav Fischer Verlag

The Collagen Triple-Helix Structure


BARBARA BRODSKY* and JOHN A. M. RAMSHAW†
*Department of Biochemistry, UMDNJ-Robert Wood Johnson Medical School, Piscataway, New Jersey, USA
†CSIRO, Division of Biomolecular Engineering, Parkville, Australia

Abstract
Recent advances, principally through the study of peptide models, have led to an enhanced understanding of the structure and function of the collagen triple helix. In particular, the first crystal structure has clearly shown the highly ordered hydration network critical for stabilizing both the molecular conformation and the interactions between triple helices. The sequence-dependent nature of the conformational features is also under active investigation by NMR and other techniques. The triple-helix motif has now been identified in proteins other than collagens, and it has been established as important in many specific biological interactions as well as being a structural element. The nature of recognition and the degree of specificity for interactions involving triple helices may differ from those of globular proteins. Triple-helix binding domains consist of linear sequences along the helix, making them amenable to characterization by simple model peptides. The application of structural techniques to such model peptides can clarify the interactions involved in triple-helix recognition and binding, and can help explain the varying impact of the different structural alterations found in mutant collagens in disease states.

Keywords: collagen, triple-helix, hydration, x-ray crystallography, NMR

Introduction
The collagen family represents a group of diverse extracellular matrix molecules linked by the presence of the collagen triple-helix structure as a common structural element (Brodsky and Shah, 1995; Bateman et al., 1996). In higher animals, at least 19 distinct collagen types are known to share this common motif, but they vary in their higher order structures and functions. The triple helix constitutes a rod-like structure important for fibril formation and structural integrity, but it is now clear that the triple helix also interacts with a wide range of molecules important in extracellular matrix organization and function. In addition, the collagen triple helix has been found as a domain in a variety of other proteins, including those involved in early host-defense functions (Table I) (Hoppe and Reid, 1994).

The conformation of collagen was recognized early to be distinct from that of other proteins studied. Its amino acid sequence shows two unique features: (1) glycine is every third residue, generating a repeating (Gly-X-Y)n pattern; and (2) a high proportion of residues (about 20%) are the imino acids proline and hydroxyproline. Around 1960, the molecular structure was determined to be three supercoiled polyproline II-like chains (Rich and Crick, 1961; Ramachandran, 1967). Only recently has a crystal structure of a triple-helical molecule become available, allowing a detailed consideration of its features and the definition of its extraordinary hydration network (Bella et al., 1994; 1995). Peptides have played a major role in the recent advances in knowledge about the triple-helical structure, serving as models of collagen sequences accessible to x-ray crystallography and NMR spectroscopy. In this review, we focus on the recent advances in defining collagen triple-helix structure as a basis for understanding its molecular properties, self-association and specific binding of other molecules.

Abbreviations used: MSR, macrophage scavenger receptor; Hyp, hydroxyproline.

Table I. Examples of occurrences of triple-helix domains, forms of self-association and binding of ligands (Kielty et al., 1993; Hoppe and Reid, 1994; Brodsky and Shah, 1995).

Protein | Self-association | Ligand-binding activities
Fibrillar collagens
Type I collagen | D-periodic fibrils | collagenase, decorin, integrins, phospholipids
Non-fibrillar collagens
Basement membrane collagen (type IV) | irregular network | integrins, heparin, nidogen
Non-collagenous proteins
C1q | hexamer of trimers | C1r, C1s after activation; C1q receptor
MSR | membrane component | OxLDL, tetraplex nucleic acids

Structure of the Triple-Helix

Fiber diffraction studies on collagen

The triple-helix model for collagen was based on fiber diffraction studies. The excellent orientation of collagen molecules in stretched tail tendon gives highly oriented fiber x-ray diffraction patterns. The indexing of layer lines from this pattern, together with collagen's distinctive amino acid features, led Ramachandran and Kartha (1954) to propose a three-stranded model for collagen, which was then modified to give the generally accepted supercoiled triple-helical structure (Rich and Crick, 1961; Ramachandran, 1967; see Fraser and MacRae, 1973, for a review of early studies). This model was further refined by Fraser et al. (1979) using linked-atom least-squares refinement against improved x-ray data. The fiber diffraction pattern represents an average over the whole collagen molecule, and the model obtained provides coordinates for an average Gly-Pro-Hyp tripeptide unit. These coordinates served as a basis for energy minimization studies to refine the structure further (Némethy, 1988). Fiber diffraction cannot reveal whether triple-helical parameters vary with different Gly-X-Y triplets; that information requires a molecular structure determined from single crystals. A peptide model of collagen, (Pro-Pro-Gly)10, was crystallized, but the data indicated that the collagen-like helices were aligned end to end to form a columnar, polymeric type of association rather than precise crystalline packing (Okuyama et al., 1981). An average model obtained through linked-atom least-squares refinement showed a backbone conformation very similar to that of dry, stretched collagen, but with a symmetry (7/2) different from that seen for the tail tendon (10/3) (Fraser et al., 1979). Three water molecules per tripeptide were also identified from the data.

Crystal structure determination of the triple-helical conformation

A prerequisite for determining the high-resolution structure of the collagen triple helix is obtaining true single crystals suitable for x-ray crystallography, and only recently did Bella et al. (1994) solve the first crystal structure of a triple-helical molecule. The structure was solved for a 30-mer peptide, denoted the Gly→Ala peptide, which consists of ten repeating Pro-Hyp-Gly units with a single Gly replaced by Ala in the middle. The Gly→Ala peptide diffracted to high resolution (1.9 Å) and showed the unique molecular packing expected for a single crystal. It is likely that the single Ala residue in each peptide chain provided a recognizable feature that locked in the axial arrangement, preventing the fiber-like associations found for the repeating (Pro-Pro-Gly)10 peptide (Okuyama et al., 1981). The Gly→Ala peptide was found to be a rod-like triple-helical molecule 87 Å long and 10 Å in diameter (Bella et al., 1994).

The structure of the Pro-Hyp-Gly regions of the Gly→Ala peptide confirmed the general parameters of the fiber diffraction model (Fraser et al., 1979), but now variations in individual tripeptide units along the chain could be determined. Adjacent triplets in an individual chain have a twist close to 60° and a height near 8.4 Å, with rather small variations among the different Gly-Pro-Hyp triplets, except in the terminal regions and near the Ala substitution. A Gly-Pro-Hyp triplet in one chain is related to the Gly-Pro-Hyp triplet in an adjacent chain of the molecule by a screw symmetry, with a twist close to 100° and a unit height of 2.86 Å. These superhelical parameters vary somewhat more along the helix.

The φ, ψ angles are close to those obtained for models of (Pro-Pro-Gly)10 and collagen. The crystal structure allowed the first direct visualization of the expected NH···O=C hydrogen bonds involving the peptide backbone, showing the NH of glycine hydrogen bonded to the C=O of proline, the residue in the X position. This pattern is interrupted in the region of the Gly→Ala substitution. There is no information on possible hydrogen bonding of an amide in the X position, since this position is always occupied by Pro in the Gly→Ala peptide.

[Figure 1 panels: (a) direct H-bonding, (Gly)NH···OC(Pro); (b) water-mediated H-bonding linking carbonyl groups, CO(Hyp)···W···CO(Gly); (c) water-mediated H-bonding linking Hyp-OH groups and carbonyl groups, OH(Hyp)···W···CO(Gly) and OH(Hyp)···W···CO(Hyp).]

Fig. 1. A schematic drawing illustrating the types of hydrogen bonding patterns found in the triple helix: (a) direct peptide group hydrogen bonding; (b) water-mediated hydrogen bonding linking carbonyl groups; and (c) water-mediated hydrogen bonding linking hydroxyproline OH groups and carbonyl groups. Variations in the number of water molecules linking the groups shown in (b) and (c) are seen in the crystal structure, and the water-mediated hydrogen bonds shown are not fully occupied (Bella et al., 1995).

The helical symmetry seen in the Gly→Ala triple helix is 7/2, giving 3.5 residues per turn, the same symmetry found for the (Pro-Pro-Gly)10 peptide (see Fraser and MacRae, 1973, for an explanation of the notation). Cohen and Bear (1953) had suggested a 7/2 symmetry for collagen from the fiber diffraction pattern, but the Ramachandran (1967) and Rich and Crick (1961) models were based on 10/3 symmetry (3.33 residues per turn). Careful examination of the fiber diffraction pattern of stretched tail tendon indicated that it is very close to 10/3 symmetry (Fraser et al., 1979). There may be local variations along the collagen helix, and stretching the tendon may reduce these variations, favoring the 10/3 helix. Bella et al. (1994) hypothesized that the high imino acid content of the Gly→Ala peptide and of (Pro-Pro-Gly)10 may result in the 7/2 symmetry; symmetry may then vary along the collagen chain, with 7/2 symmetry in imino acid-rich regions and 10/3 symmetry in other regions. The two symmetries differ only slightly in the local appearance of the triple helix and the spatial relationships between nearby atoms. However, they are distinct from a crystallographic point of view and would result in different appearances of the triple helix over long distances.
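A little arithmetic unpacks the n/t notation used above. In an n/t helix, n units complete t full turns. The helix therefore has n/t units per turn and a twist of 360°·t/n per unit, and it repeats exactly after n units. The short Python sketch below computes these quantities for the two symmetries discussed. For illustration only, it assumes that both helices share the 2.86 Å unit height reported for the Gly→Ala crystal. `helix_parameters` and `RISE_A` are hypothetical names, not part of any published analysis.

```python
def helix_parameters(units: int, turns: int, rise_per_unit: float) -> dict:
    """Derive simple quantities from an n/t helical symmetry.

    units (n): units in one exact repeat of the helix
    turns (t): helical turns completed over those n units
    rise_per_unit: axial translation per unit, in angstroms
    """
    return {
        "units_per_turn": units / turns,          # e.g. 7/2 -> 3.5
        "twist_deg": 360.0 * turns / units,       # rotation between successive units
        "axial_repeat_A": units * rise_per_unit,  # distance after which the helix repeats exactly
    }


# Assumption: both symmetries use the 2.86 Å unit height reported for the Gly→Ala crystal.
RISE_A = 2.86

for label, n, t in [("7/2", 7, 2), ("10/3", 10, 3)]:
    p = helix_parameters(n, t, RISE_A)
    print(f"{label}: {p['units_per_turn']:.2f} units/turn, "
          f"twist {p['twist_deg']:.1f} deg/unit, repeat {p['axial_repeat_A']:.1f} Å")
```

Under that assumption, the 7/2 helix (about 102.9° per unit) repeats after about 20 Å, and the 10/3 helix (108° per unit) repeats after about 28.6 Å. This shows how two symmetries that look almost identical locally give different long-range repeats.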

Triple-helix hydration network


The analysis of Privalov (1982) emphasized the dominance of enthalpy and hydrogen bonding in the stabilization of the triple helix. The recent crystal structure of the triple helix has confirmed and added detail to this analysis, showing the structure to be stabilized by sets of regular hydrogen bonding patterns. A major stabilizing feature of α-helical and β-sheet protein structures is the participation of every backbone carbonyl and amide group in NH···O=C hydrogen bonds. The triple helix is seriously deficient in this regard. In each Gly-X-Y tripeptide unit, the amide group of Gly forms a hydrogen bond with the carbonyl group of the residue in the X position of an adjacent chain of the triple helix, (Gly)NH···O=C(X) (Fig. 1a). This leaves the carbonyl groups of the Gly residue and of the residue in the Y position with no amide hydrogen bonding partners. If the X or Y positions are occupied by amino acids rather than the imino acids Pro or Hyp, their amide groups are also available for hydrogen bonding, but no carbonyl in the triple helix is within bonding distance. In addition, the hydroxyl group of hydroxyproline points out from the triple helix and cannot directly hydrogen bond to any other group within the molecule. The Gly→Ala peptide structure shows how these groups are satisfied: an extensive and ordered water network forms hydrogen bonds with all available carbonyl and Hyp hydroxyl groups (Fig. 1). The structure also shows Cα-H···O=C bonds, which constitute a network of weak but systematic hydrogen bonds (Bella and Berman, 1997).

A detailed analysis was carried out on the ordered waters in the Gly→Ala triple-helix structure (Bella et al., 1995). All available groups of the peptide backbone and of Hyp are seen to bind water molecules through a variety of arrangements. On average, the C=O groups of Gly residues are bound to one water, while the C=O groups of Hyp show two sites for hydrogen bonding to water. The OH of Hyp can bind two water molecules at two distinct sites, but not all positions are fully occupied. Waters bridge two categories of groups in this peptide: water molecules may link two carbonyl groups, or they may link one carbonyl group with a hydroxyl group of Hyp. The number of waters bridging two groups appears to vary along the molecule, such that two, three, four, or even five water molecules may form a chain linking the two groups. The water molecules may link two groups on the same chain, e.g., (Gly)C=O···W···O=C(Hyp) (Fig. 1b) or (Gly)C=O···W···HO(Hyp) (Fig. 1c). Linkages are also seen between groups on two different chains within a molecule, e.g., chain A Hyp(OH)···W···O=C(Hyp) chain B (Fig. 1c). In the crystal structure, water bridges are also the critical element connecting adjacent triple-helical molecules and maintaining the intermolecular spacing (Bella et al., 1994, 1995). The distance between adjacent molecules in the crystal is 14 Å, a value very similar to that seen for the lateral packing of collagen molecules in tendon, skin and all other tissues examined by fiber diffraction. Surprisingly, there is little or no direct contact between neighboring molecules (Fig. 2). Rather, the distance between molecules is maintained by the water molecules that connect adjacent helices. This suggests that the lateral packing of collagen molecules may be determined largely by their hydration shell, which is linked to backbone carbonyls and Hyp.

Fig. 2. Illustration of the packing of adjacent triple helices in the crystal structure of the Gly→Ala peptide. The water molecules between the chains are indicated by dots, and the lack of direct contact between neighboring molecules can be seen (Bella, Brodsky and Berman, 1994).

The role of hydroxyproline in the triple helix

Collagen is unique among animal proteins in its high content of hydroxyproline, which is formed by post-translational modification of prolines incorporated in the Y position of Gly-X-Y triplets. Both proline and hydroxyproline stabilize the polyproline II extended conformation of the individual chains in the triple helix because of the stereochemical restrictions imposed by the imino acid rings. However, hydroxyproline was shown to confer greater stability than proline in the Y position, and this stabilization was shown to be stereospecific and dependent on being in the Y position (Berg and Prockop, 1973; Burjanadze, 1982). The realization that the hydroxyl group of Hyp could not directly hydrogen bond to the carbonyl groups within the same molecule led to the proposal that its effect was mediated through bridging water molecules (Fraser et al., 1979; Privalov, 1982). The crystal structure has now confirmed this hypothesis and offers details about the special role of this unusual residue. The hydroxyl group can act as both a hydrogen bond acceptor and a donor, and water molecules can bind at two different sites (Bella et al., 1995). It plays a pivotal role in creating the ordered water shell around the triple helix. As pointed out by Bella et al. (1995), Hyp hydroxyl groups linked by water molecules to a Gly C=O within the same chain, and to the Hyp C=O of the adjacent chain, are sufficient to satisfy all hydrogen bonding potential in the chains (Fig. 1c). A high content of Hyp in the Y position of certain collagen types (type III, type IV) may thus enhance stability via this extensive water network.
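The hydrogen-bond bookkeeping described in the last two subsections can be made explicit for a single Gly-X-Y triplet. The Python sketch below is a simplified illustration under stated assumptions. It counts only backbone C=O and NH groups plus the Hyp hydroxyl. It treats the interchain (Gly)NH···O=C(X) bond as the only direct backbone hydrogen bond, and it ignores all other side chains and the actual water positions. The function name `unpaired_groups` is hypothetical.

```python
IMINO = {"Pro", "Hyp"}  # imino acids carry no backbone NH


def unpaired_groups(x: str, y: str) -> list[str]:
    """List polar groups of one Gly-X-Y triplet that have no partner inside the
    triple helix, following the simplified counting rules described in the text."""
    groups = []
    # The X-position C=O accepts the interchain (Gly)NH···O=C(X) bond, and the
    # Gly NH donates it, so neither is listed. The Gly and Y carbonyls remain.
    groups.append("Gly C=O")
    groups.append(f"{y}(Y) C=O")
    # Amino acids (not imino acids) at X or Y bring an NH with no carbonyl in range.
    for pos, res in (("X", x), ("Y", y)):
        if res not in IMINO:
            groups.append(f"{res}({pos}) NH")
    # The Hyp hydroxyl points away from the helix and cannot bond internally.
    for pos, res in (("X", x), ("Y", y)):
        if res == "Hyp":
            groups.append(f"{res}({pos}) OH")
    return groups


for x, y in [("Pro", "Hyp"), ("Pro", "Ala"), ("Ala", "Ala")]:
    print(f"Gly-{x}-{y}:", ", ".join(unpaired_groups(x, y)))
```

For Gly-Pro-Hyp, the list reduces to the Gly and Hyp carbonyls plus the Hyp hydroxyl. These are the groups that the water network described above must satisfy. Triplets with amino acids at X or Y add uncompensated NH groups.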

Amino Acid Sequence Dependence of the Triple-Helix Structure


The fiber diffraction-based structures (Fraser et al., 1979) and the Gly→Ala peptide structure (Bella et al., 1994) focus on Gly-Pro-Hyp tripeptide sequences in collagen. In the (Gly-X-Y)338 sequence of type I collagen, however, only about 10% of the triplets are Gly-Pro-Hyp. The remaining triplets consist of about 20 frequent triplets, each making up more than 1% of the sequence (Dolz and Heidemann, 1986), and approximately 70 other triplets found one to three times in the sequence. It seems likely that the common Gly-Pro-Hyp, Gly-Pro-Ala and Gly-Ala-Hyp triplets are required for stability of the triple helix, while triplets containing charged and hydrophobic residues function as recognition sites involved in biological specificity. Clarifying the molecular basis of recognition and binding of a collagen sequence to another molecule requires an understanding of how variable (Gly-X-Y)n amino acid sequences affect basic properties of the triple helix, such as the helical parameters, hydration, dynamics and potential interactions.
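Triplet statistics of the kind cited above are straightforward to compute from a sequence. Examples include the fraction of a given triplet, the number of distinct triplets, and the imino acid content. The Python sketch below assumes a one-letter sequence that starts at the Gly of the first triplet. It writes Hyp as "O", a convention common in the collagen literature that is an assumption here. The function names are hypothetical, and the input fragment is made up, not a real collagen sequence.

```python
from collections import Counter

IMINO_CODES = {"P", "O"}  # Pro, and Hyp written as "O" (assumed convention)


def split_triplets(seq: str) -> list[str]:
    """Split a one-letter sequence into Gly-X-Y triplets, checking the Gly repeat."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    triplets = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    for k, t in enumerate(triplets):
        if t[0] != "G":
            raise ValueError(f"triplet {k + 1} ({t}) does not start with Gly")
    return triplets


def composition(seq: str) -> tuple[Counter, float]:
    """Return triplet counts and the fraction of residues that are imino acids."""
    triplets = split_triplets(seq)
    imino_fraction = sum(ch in IMINO_CODES for ch in seq) / len(seq)
    return Counter(triplets), imino_fraction


# Made-up fragment for illustration only.
counts, imino = composition("GPOGPAGAOGPOGLO")
print(counts.most_common(3))
print(f"imino acid fraction: {imino:.2f}")
```

Applied to a full α1(I) helical sequence, the same counts would give the triplet frequencies discussed above. A missing Gly raises an error instead of being silently miscounted.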

Table II. The thermal stability of host-guest peptides with non-polar Gly-X-Y guest triplets, together with the number of occurrences of each triplet in the α1(I) chain (based on Shah et al., 1996).

Guest triplet | Tm of host-guest peptide | Occurrences in α1(I) (%)
Gly-Pro-Hyp | 44 °C | 42 (12%)
Gly-Ala-Hyp | 40 °C | 20 (6%)
Gly-Pro-Ala | 38 °C | 31 (9%)
Gly-Leu-Hyp | 39 °C | 11 (3%)
Gly-Pro-Leu | 33 °C | 0 (0%)
Gly-Phe-Hyp | 34 °C | 7 (2%)
Gly-Pro-Phe | 28 °C | 0 (0%)

Stability scale

A scale of triple-helix propensity for different Gly-X-Y triplets would be a first step toward understanding variations in triple-helix features along the length of the triple helix. Bachinger and Davis (1991) proposed a stability scale for different triplets, based on the cluster analysis of Heidemann and a variety of model polypeptide data. This scheme was used to predict regions of relatively greater and lower stability in collagen, and to explain the relative severity of different cases of osteogenesis imperfecta (Bachinger et al., 1993). This classification scheme illustrates the utility of such a scale for understanding the effects of mutations, but it lacks a firm experimental basis. A "host-guest" peptide set for examining the stability of the triple helix offers an opportunity to establish an experimentally based scale of triple-helix propensity and to clarify the interactions stabilizing the triple helix. Following the concepts used to determine the stabilizing influence of individual amino acids on the α-helix (O'Neil and DeGrado, 1990) and the β-sheet (Smith et al., 1994), a host-guest peptide design has recently been applied to the triple helix, using Gly-X-Y triplets as the basic unit (Shah et al., 1996). The design used for a host-guest set of triple-helical peptides was acetyl-(Gly-Pro-Hyp)3-Gly-X-Y-(Gly-Pro-Hyp)4-Gly-Gly-NH2, where Gly-X-Y is the guest triplet substituted into a stabilizing, constant (Gly-Pro-Hyp)8 environment with both ends blocked. Initial host-guest studies (Shah et al., 1996) focused on the most frequent non-polar residues found in collagens, placing Pro, Ala, Leu and Phe in the X position and Hyp, Ala, Leu and Phe in the Y position. Although Leu and Phe are found almost exclusively in the X position in collagens, peptides were also made with them in the Y position to see the consequences. The 12 peptides studied cover 35% of the sequence of the α1(I) triple helix.

All peptides formed stable triple-helical structures and showed a wide range of thermal stabilities (Tm from 21 °C to 44 °C), depending on the identity of the guest triplet (Table II). The results confirmed the stabilizing influence of imino acids, the greater stabilizing effect of Hyp compared with Pro, and the lack of hydrophobic stabilization of the triple helix. These studies also demonstrated that triplets with Leu or Phe in the Y position can form stable triple helices, even though they are less stable than when these residues are in the X position (Table II). The results suggest that the predominance of Leu and Phe in the X position of collagens is likely to be a consequence of their potential for intermolecular interactions in that position, rather than a result of molecular stability factors (Bansal and Ramachandran, 1978). The study of individual triplets is a first step toward understanding the effect of amino acid sequence, but interactions between adjacent triplets are likely to occur for some sequences. Host-guest peptide sets incorporating two adjacent guest triplets may therefore be required before an accurate stability scale for collagen amino acid sequences can be established. The large number of possible Gly-X-Y triplets presents an obstacle to establishing a comprehensive scale of all triplets. However, the relatively small number of triplets found with high frequency, and the simple additivity relationships seen for some peptides, make it possible to address many aspects of triple-helix stability using a limited number of triplets.
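The Tm values in Table II can be turned into a simple relative scale by expressing each guest triplet as a shift from the Gly-Pro-Hyp host value. The Python sketch below does only that, plus a count of how much of α1(I) the tabulated triplets cover. It ranks the seven tabulated triplets. It is neither the Bachinger and Davis scale nor the full 12-peptide data set of Shah et al. (1996). Names such as `TABLE_II` and `delta_tm` are hypothetical.

```python
# Values transcribed from Table II: guest triplet -> (Tm in °C, occurrences in alpha1(I)).
TABLE_II = {
    "Gly-Pro-Hyp": (44, 42),
    "Gly-Ala-Hyp": (40, 20),
    "Gly-Pro-Ala": (38, 31),
    "Gly-Leu-Hyp": (39, 11),
    "Gly-Pro-Leu": (33, 0),
    "Gly-Phe-Hyp": (34, 7),
    "Gly-Pro-Phe": (28, 0),
}
HOST = "Gly-Pro-Hyp"       # guest identical to the host, i.e. the (Gly-Pro-Hyp)8 reference
ALPHA1_I_TRIPLETS = 338    # (Gly-X-Y)338 triple-helical sequence of type I collagen


def delta_tm(table: dict) -> dict:
    """Tm shift of each guest triplet relative to the Gly-Pro-Hyp host peptide."""
    ref_tm = table[HOST][0]
    return {triplet: tm - ref_tm for triplet, (tm, _) in table.items()}


def ranked(table: dict) -> list:
    """Guest triplets ordered from most to least stabilizing (highest Tm first)."""
    return sorted(table, key=lambda triplet: table[triplet][0], reverse=True)


def coverage(table: dict, total: int = ALPHA1_I_TRIPLETS) -> float:
    """Fraction of alpha1(I) triplets represented by the tabulated guests."""
    return sum(count for _, count in table.values()) / total


shifts = delta_tm(TABLE_II)
for triplet in ranked(TABLE_II):
    print(f"{triplet}: {shifts[triplet]:+d} °C")
print(f"coverage of alpha1(I): {coverage(TABLE_II):.1%}")
```

The seven tabulated triplets account for 111 of the 338 α1(I) triplets (about 33%), slightly below the 35% quoted for all 12 peptides. Both Leu and Phe lose 6 °C when moved from the X to the Y position. That comparison also swaps the partner residue (Hyp versus Pro), so it is not a purely positional effect.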

Experimental studies of different tripeptide sequences: NMR spectroscopy


The effect of different tripeptide sequences on the triple helix is now accessible to direct experimental approaches through the use of synthetic peptides of varying sequence "capped" by stabilizing Gly-Pro-Hyp units (Li et al., 1993; Fields, 1995). These peptides are amenable to NMR and circular dichroism spectroscopy and are good candidates for crystallography. Such studies provide experimental data that may be correlated with recent computer calculations on varied tripeptide sequences (Vitagliano et al., 1993; Paterlini et al., 1995). Here, recent NMR approaches to sequence-dependent features are reviewed. NMR can probe properties of the triple helix in solution, allowing investigation of molecular dynamics, and can focus on the properties of individual labelled residues. NMR studies in the 1970s and 1980s by Torchia (1982) challenged the concept that collagen fibrils consist of rigid rod-like molecules packed into rigid fibrils, by showing rotational angular mobility of the polypeptide backbone in collagen fibrils, together with considerable side chain motion.

More recently, multi-dimensional NMR spectroscopy has been applied to collagen-like triple-helical peptides (Fan et al., 1993; Li et al., 1993; Melacini et al., 1996). In theory, distance information obtained from NOE assignments could be used to determine the triple-helical conformation in solution directly. In practice, this is hindered by peak broadening from the rod-like triple helix and by overlapping resonances from the highly repetitive Gly-X-Y sequences. Although a complete structural solution has not been possible, peptides may be synthesized with selectively 15N-enriched amino acids, allowing heteronuclear NMR studies on the structure and dynamics of specifically labelled residues. Resonance systems could be assigned for the repeating Pro-Hyp-Gly tripeptide unit of (Pro-Hyp-Gly)10, and connectivity was shown along the tripeptide unit (Li et al., 1993). The small number of NOE peaks observed between atoms is consistent with those expected for the triple-helical model and indicates that the three chains are closely packed in solution as well as in the solid state.

NMR studies were also carried out on the peptide (Pro-Hyp-Gly)3-Ile-Thr-Gly-Ala-Arg-[15N]Gly-[15N]Leu-[15N]Ala-Gly-(Pro-Hyp-Gly)4, designated T3-785. This peptide contains an imino acid-poor nine-residue sequence from type III collagen, C-terminal to the collagenase cleavage site. 15N-labelled residues were incorporated at the Gly, Leu and Ala sites indicated, so that these residues could be specifically investigated using heteronuclear NMR techniques. The Gly, Leu and Ala residues could be assigned for each of the three chains in the triple helix, and the NOE peaks between these residues confirmed the one-residue stagger and close packing of the three chains. Hydrogen exchange studies indicated that the NH groups of Gly in the central region of T3-785 exchange faster than the NH of Gly in (Pro-Hyp-Gly)10 (Table III) (Fan et al., 1993). This shows the effect of neighboring sequences on the mobility of the central Gly residue of the helix, and indicates that the Gly-Pro-Hyp environment creates a very rigid helix, as expected. NMR approaches hold great promise for clarifying the effect of surrounding amino acid sequences on structure and dynamics. Heteronuclear NMR has also been used to monitor the folding of specific 15N-labelled residues at different positions in a peptide, differentiating residues involved in nucleation from those involved in propagation (Liu et al., 1996).

Information about the structural propensities and dynamics of single-chain molecules containing collagen Gly-X-Y sequences has also been obtained from NMR studies (Mayo et al., 1991). The results show their potential to adopt β-bend structures, which may be important in biological situations such as hydroxylation of the unfolded chains or recognition. The conformational propensities of collagen telopeptides have also been investigated by NMR, indicating the presence of extended structures and β-turns (Liu et al., 1993).
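The protection factors in Table III follow the usual hydrogen-exchange definition P = k_int/k_obs, the ratio of the intrinsic (unprotected) exchange rate to the observed one. The Python sketch below backs out the intrinsic rate implied by the T3-785 row. It then assumes that the labelled Gly in (Pro-Hyp-Gly)10 has the same intrinsic rate, which turns its upper-bound exchange rate into a lower-bound protection factor. The table does not give rate units, so they are carried through unchanged. The function and constant names are hypothetical.

```python
def protection_factor(k_intrinsic: float, k_observed: float) -> float:
    """Hydrogen-exchange protection factor P = k_int / k_obs."""
    return k_intrinsic / k_observed


# Table III values for the 15N-Gly amide (rates in the table's own, unstated units).
K_OBS_T3_785 = 0.5         # central Gly of T3-785
P_T3_785 = 1077
K_OBS_POG10_MAX = 0.04     # Gly of (Pro-Hyp-Gly)10, reported as "< 0.04"

# Intrinsic (unprotected) exchange rate implied by the T3-785 row.
k_int = P_T3_785 * K_OBS_T3_785

# Assumption: the (Pro-Hyp-Gly)10 Gly amide has the same intrinsic rate, so an
# upper bound on its observed rate gives a lower bound on its protection factor.
p_pog10_min = protection_factor(k_int, K_OBS_POG10_MAX)

print(f"implied k_int: {k_int:.1f}")
print(f"protection factor for the (Pro-Hyp-Gly)10 Gly: > {p_pog10_min:.3g}")
```

The implied intrinsic rate (about 538 in the table's units) gives a lower bound of roughly 1.3 × 10⁴ for (Pro-Hyp-Gly)10. This is consistent with the > 10⁴ entry in Table III.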

Self-Association of Triple-Helices into Fibrils


Collagen molecules self-associate to form their final functional state in tissues. The best characterized and most common form is the D-periodic fibril, observed as the major structural component in tendon, skin and most other connective tissues. However, non-fibrillar collagens form a wide array of other supramolecular structures, including networks, antiparallel arrays and supercoiled structures (Kielty et al., 1993; Brodsky and Shah, 1995). The recent crystal structure of a triple-helical peptide suggests that lateral association between molecules may be mediated in large part by water, rather than by direct van der Waals contacts between adjacent triple helices (Fig. 2). Previous models, in which Pro and Hyp residues of one molecule interact directly with imino acids of the adjacent molecule, must be replaced with a picture of neighboring molecules connected through water bridges, with direct interactions limited to the longest side chains. It appears that the hydration shell forms around triple helices before their self-association and plays a critical role in this interaction. In contrast to the lateral packing, the axial stagger of adjacent molecules is likely to be determined by interactions involving long side chains that can bridge adjacent molecules. The asymmetric distribution of Leu, Phe and Glu residues (X position preference) and Arg residues (Y position preference) may position them to interact directly with adjacent triple helices. The importance of water-mediated interactions between triple helices has also been documented by the elegant studies of Leikin et al. (1994, 1995). By measuring the effect of osmotic stress on collagen intermolecular distances, this group has reported short-range exponential repulsions together with a longer-range, temperature-dependent attraction. These results are interpreted in terms of a necessary rearrangement of the hydrogen bonding network of water as collagen molecules pack together, indicating a critical role for water structure in protein folding and assembly.

Table III. The hydrogen exchange rates, protection factors and thermal stability for 15N-Gly in two different environments in triple-helical peptides (based on Fan et al., 1993).

Peptide sequence | H-D exchange rate (15N-Gly) | Protection factor (15N-Gly) | Thermal stability of peptide
(Pro-Hyp-Gly)3-Ile-Thr-Gly-Ala-Arg-[15N]Gly-Leu-Ala-Gly-(Pro-Hyp-Gly)4 | 0.5 | 1077 | 25 °C
(Pro-Hyp-Gly)4-Pro-Hyp-[15N]Gly-(Pro-Hyp-Gly)5 | < 0.04 | > 10⁴ | 60 °C

Triple-Helical Binding Domains


Binding domains and interactions
The collagen triple helix has been considered a prototype of a rod-like protein whose role is self-association to form fibrils, but there is an increasing appreciation of its role in the specific binding of other molecules (Table I). Collagen interacts with many different kinds of molecules (Kadler, 1994), and distinct protein binding domains have been localized along the collagen triple helix. These include the unique binding and cleavage site for collagenase, cell binding sites for integrins, and an antibody binding site (Fields, 1995; Hori et al., 1992). The binding sites are all linear sequences along the chain. Some interactions have been shown to depend on the native triple-helical structure of collagen, while others appear to be independent of conformation (Fields, 1995). Examination of the basic features of the triple-helix structure shows that it is well suited for interacting with other molecules as well as for self-association. All residues in the X and Y positions of the (Gly-X-Y)n sequence are highly exposed to solvent and available for interactions (Jones and Miller, 1991). As a result, linear sequences of X and Y residues define binding sites. Residues in the X position have a somewhat greater exposure than those in the Y position (Jones and Miller, 1991) and appear to be more favorably placed for interactions in some cases. The nature of recognition between a triple helix and its ligand is not known. Hydrogen bonding, either direct or water-mediated, is the dominant interaction stabilizing the triple-helical conformation of collagen. As discussed above, as a collagen triple helix approaches another molecule, its hydration structure must undergo some reorientation. All hydrophobic and charged residues are available on the surface for interactions with other molecules, and charged residues in particular occur with high frequency in identified binding sites (Gullberg et al., 1992; Fields, 1995).
Three examples of triple-helix interactions with other molecules are given below to illustrate the current state of understanding.

Collagen-collagenase interactions

Collagen is an unusually stable protein, and its controlled degradation is essential in normal development and other physiological processes. The triple helix of fibril-forming collagens is resistant to most proteases, and its degradation is initiated by cleavage of all three chains at the site between residues 775 and 776 by collagenase (matrix metalloproteinases MMP-1, -8 and -13; Li et al., 1995). The conformation of the triple helix at and near the cleavage site has distinctive features that may be important for recognition and binding (Fields, 1991). For example, in type I collagen, the 12 residues on the C-terminal side of the cleavage site contain no imino acids, which could result in a loosened or structurally altered conformation. The recent determination of the three-dimensional structure of collagenase by crystallography has raised the question of how the enzyme recognizes the triple helix at the molecular level. Collagenases consist of three domains: an N-terminal catalytic domain, a C-terminal hemopexin-like domain, and a proline-rich linker connecting these two domains (Li et al., 1995). The catalytic domain alone cannot bind to or cleave native collagen, and experiments point to a role for the hemopexin domain, and perhaps the linker region, in such binding. The structure of full-length collagenase formed the basis for attempts to model the binding of a triple helix between the active site and the hemopexin domain, but no obvious favorable binding site was seen (Li et al., 1995). One theory postulates that the proline-rich linker region adopts a polyproline II-like single-stranded conformation that can interact with the collagen triple helix, separating the three strands and allowing cleavage (De Souza et al., 1996). Notably, even though the structures of both collagenase and the triple helix are now known at molecular resolution, the mode of interaction of the enzyme with the triple helix that determines cleavage specificity has yet to be elucidated.

Collagen cell-binding sites

Cell binding to collagen plays a role in numerous physiological processes. Cell adhesion to triple-helical collagen is mediated in many cases by binding to α1β1, α2β1 and α3β1 integrins, and there is also integrin-independent binding (Gullberg et al., 1992; Fields, 1995). Various cellular recognition sites have been identified within the triple-helical regions of type I, II, III and IV collagens. Studies on peptides incorporating these sequences showed that the triple-helical conformation greatly enhanced cell adhesion activity in some cases, while the activity was conformation-independent for other sites (Fields, 1995). The mechanism of platelet aggregation by collagens has been characterized in detail through studies on triple-helical peptides (Morton et al., 1995). It has been shown to involve both α2β1 integrin recognition of specific type III Gly-X-Y sequences and an integrin-independent process that responds to (Gly-Pro-Hyp)n repeats in triple-helical form. Thus, cell binding to the triple helix can require sequence-specific interactions, as in integrin-mediated binding, or conformation-specific, sequence-independent interactions, such as those characterized for platelet aggregation.


Collagen Diseases: Structural Mutations in the Triple Helix

Some hereditary connective tissue diseases are caused by mutations in fibril-forming collagens, most commonly a substitution for a single glycine residue, which breaks the repeating (Gly-X-Y)n pattern. The best-studied disease is osteogenesis imperfecta, where over 150 cases of Gly substitutions have been reported, resulting in brittle bone disease ranging from mild to lethal (Prockop and Kivirikko, 1995). It has been suggested that the severity of the clinical osteogenesis imperfecta phenotype may relate to the distance of the Gly substitution from the C-terminal nucleation site of the helix (gradient model) or to the local environment of the Gly substitution site (regional model). It has also been hypothesized that a mutation in an imino acid-rich environment has more serious consequences than one in an imino acid-poor, more flexible region (Bächinger et al., 1993). It has been suggested that the substitution of a single Gly residue may lead to a loop or kink, a small decrease in the thermal stability of the collagen, and a decreased folding rate (Prockop and Kivirikko, 1995). Studies on a simple (Pro-Hyp-Gly)10 peptide with a single Gly→Ala substitution showed three effects: a large drop in thermal stability; a loss of direct interchain N-H···O=C hydrogen bonds, which are replaced by water-mediated hydrogen bonding; and a local untwisting at the Ala site (Bella et al., 1994; Long et al., 1993). Peptides with more realistic collagen sequences are needed to investigate the effect of environment on such a substitution. It is worth noting that some cases of familial early-onset osteoarthritis and spondyloepiphyseal dysplasia have been found to be associated with changes in the Y position (Arg→Cys) of Gly-X-Y repeats in type II collagen (Prockop and Kivirikko, 1995). It is intriguing to consider what effects such a substitution might have on the properties of the triple helix.

Macrophage scavenger receptor: triple-helix ligand binding

Non-collagenous proteins with a triple-helix domain include C1q, lung surfactant apoproteins A and D, mannose binding protein and macrophage scavenger receptor (MSR) (Hoppe and Reid, 1994; Brodsky and Shah, 1995). In these proteins, the triple-helix domain may have both a structural role in defining a rod-like domain and a role in ligand binding (Table I). The specificity and nature of ligand binding by the triple-helix domain of the MSR are under active investigation. MSRs are integral membrane glycoproteins that mediate the uptake of oxidized LDL, a process implicated in the etiology of atherosclerosis (Krieger and Herz, 1994). MSRs bind other physiologically important ligands, including Gram-positive bacteria through lipoteichoic acid (Dunne et al., 1994) and β-amyloid peptide (El Khoury et al., 1996). They are also capable of binding a wide variety of polyanionic ligands with considerable discrimination. For instance, MSRs bind to chemically modified proteins (oxLDL, AcLDL, maleylated bovine serum albumin) but not to the unmodified proteins. Similarly, they bind to tetraplex nucleic acids (poly(I), poly(G)) but not to double- or single-stranded nucleic acids (e.g., poly(A) and poly(C)) (Krieger and Herz, 1994). MSR contains discrete domains, including a membrane-spanning region, a three-stranded α-helical coiled-coil region, and a collagen-like triple-helical region. Deletion and substitution studies showed that ligand binding and specificity are determined by the collagen-like triple-helical domain (Krieger and Herz, 1994). The unusually broad ligand specificity and the discrimination between different polyanions are therefore a result of interactions involving the triple helix. The triple-helix domain, (Gly-X-Y)23, has a highly basic character, and several specific lysine residues near the C-terminus of the domain were implicated in binding.
A triple-helical peptide containing the Gly-Pro-Lys-Gly-Gln-Lys-Gly-Glu-Lys sequence from this ligand binding region was found to form stable triple helices at neutral, but not acidic, pH values (Anachi et al., 1995). This model MSR peptide was found to bind to tetraplex poly(I), but not to double-stranded or single-stranded forms of poly(I), mimicking the discrimination shown by MSR (Mielewczyk et al., 1996). In addition, a related cross-linked triple-helical peptide was found to bind acetylated LDL (Tanaka et al., 1996). These studies illustrate that a short linear (Gly-X-Y)n sequence can model the binding and discrimination of the native macrophage scavenger receptor.

Future Directions
The recent crystallography and NMR results make us confident that the basic model of the triple helix is correct for Gly-Pro-Hyp regions, both in the solid state and in solution. They also emphasize the importance of the water network as an integral part of its structure. Understanding the biological properties of collagen, in particular its specific interactions in self-association and in binding other molecules, requires a next step: learning how variations in amino acid sequence affect triple-helix conformation, dynamics and folding. The successful use of synthetic collagen-like peptides for crystallography and multidimensional NMR will serve as a foothold for studying triple helices with varying sequences, including functionally important binding regions.

Acknowledgements
We thank Dr. R.D.B. Fraser for helpful comments on x-ray diffraction and Dr. Jordi Bella for valuable discussions clarifying the hydration network. We are grateful to Dr. Jordi Bella and Dr. Helen Berman for the diagram on which Figure 2 is based. This work was supported by NIH grant AR19626 (B.B.), an NSF U.S.-Australia Cooperative Research grant and the Australia/U.S. Bilateral Science and Technology Collaboration Program.


References

Anachi, R.B., Siegal, D.L., Baum, J. and Brodsky, B.: Acid destabilization of a triple-helical peptide model of the macrophage scavenger receptor. FEBS Lett. 368: 551-555, 1995.
Bächinger, H.P. and Davis, J.M.: Sequence specific thermal stability of the collagen triple helix. Int. J. Biol. Macromol. 13: 152-156, 1991.
Bächinger, H.P., Morris, N.P. and Davis, J.M.: Thermal stability and folding of the collagen triple helix and the effects of mutations in osteogenesis imperfecta on the triple helix of type I collagen. Am. J. Med. Genet. 45: 152-162, 1993.
Bansal, M. and Ramachandran, G.N.: A theoretical study of the structure of (Gly-Pro-Leu)n and (Gly-Leu-Pro)n. Int. J. Pept. Protein Res. 11: 73-81, 1977.
Bateman, J.F., Lamande, S. and Ramshaw, J.A.M.: Collagen superfamily. In: Extracellular Matrix, vol. 2, Molecular Components and Interactions, ed. by Comper, W.D., Harwood Academic Publishers, Amsterdam, 1996, pp. 22-67.
Bella, J., Eaton, M., Brodsky, B. and Berman, H.M.: Crystal and molecular structure of a collagen-like peptide at 1.9 Å resolution. Science 266: 75-81, 1994.
Bella, J., Brodsky, B. and Berman, H.M.: Hydration structure of a collagen peptide. Structure 3: 893-906, 1995.
Bella, J. and Berman, H.M.: Crystallographic evidence for Cα-H···O=C hydrogen bonds in a collagen triple-helix. J. Mol. Biol. (1997, in press).
Berg, R.A. and Prockop, D.J.: The thermal transition of a non-hydroxylated form of collagen. Evidence for a role for hydroxyproline in stabilizing the triple-helix of collagen. Biochem. Biophys. Res. Comm. 52: 115-120, 1973.
Brodsky, B. and Shah, N.K.: The triple-helix motif in proteins. FASEB J. 9: 1537-1546, 1995.
Burjanadze, T.V.: Evidence for the role of 4-hydroxyproline localized in the third position of the triplet (Gly-X-Y) in adaptational changes of thermostability of a collagen molecule and collagen fibrils. Biopolymers 21: 1489-1501, 1982.
Cohen, C. and Bear, R.S.: Helical polypeptide chain configuration in collagen. J. Am. Chem. Soc. 75: 2783-2784, 1953.
De Souza, S.J., Pereira, H.M., Jacchieri, S. and Brentani, R.R.: Collagen/collagenase interaction: Does the enzyme mimic the conformation of its own substrate? FASEB J. 10: 927-930, 1996.
Dölz, R. and Heidemann, E.: Influence of different tripeptides on the stability of the collagen triple-helix I. Analysis of the collagen sequence and identification of typical tripeptides. Biopolymers 25: 1069-1080, 1986.
Dunne, D.W., Resnick, D., Greenberg, J., Krieger, M. and Joiner, K.A.: The type I macrophage scavenger receptor binds to Gram-positive bacteria and recognizes lipoteichoic acid. Proc. Natl. Acad. Sci. USA 91: 1863-1867, 1994.
El Khoury, J., Hickman, S.E., Thomas, C.A., Cao, L., Silverstein, S.C. and Loike, J.D.: Scavenger receptor-mediated adhesion of microglia to β-amyloid fibrils. Nature 382: 716-719, 1996.
Fan, P., Li, M.-H., Brodsky, B. and Baum, J.: Backbone dynamics of (Pro-Hyp-Gly)10 and a designed collagen-like triple-helical peptide. Biochemistry 32: 7377-7387, 1993.
Fields, G.B.: A model for interstitial collagen catabolism by mammalian collagenases. J. Theoret. Biol. 153: 585-602, 1991.
Fields, G.B.: The collagen triple-helix: correlation of conformation with biological activities. Connect. Tissue Res. 31: 235-243, 1995.
Fraser, R.D.B. and MacRae, T.P.: Conformation in Fibrous Proteins, Academic Press, New York, 1973.
Fraser, R.D.B., MacRae, T.P. and Suzuki, E.: Chain conformation in the collagen molecule. J. Mol. Biol. 129: 463-481, 1979.
Gullberg, D., Gehlsen, K.R., Turner, D.C., Ahlen, K., Zijenah, L.S., Barnes, M.J. and Rubin, K.: Analysis of α1β1, α2β1, and α3β1 integrins in cell-collagen interactions: identification of conformation dependent α1β1 binding sites in collagen type I. EMBO J. 11: 3865-3873, 1992.
Hoppe, H.-J. and Reid, K.B.M.: Collectins - soluble proteins containing collagenous regions and lectin domains - and their roles in innate immunity. Protein Sci. 3: 1143-1158, 1994.
Hori, H., Keene, D.R., Sakai, L.Y., Wirtz, M.K., Bächinger, H.P., Godfrey, M. and Hollister, D.W.: Repeating helical epitopes of defined amino acid sequence in human type III collagen identified by monoclonal antibodies. Mol. Immunol. 29: 759-770, 1992.
Jones, E.Y. and Miller, A.: Analysis of structural design features in collagen. J. Mol. Biol. 218: 209-219, 1991.
Kadler, K.: Extracellular matrix 1: Fibril forming collagens. Protein Profile 5: 519-638, 1994.
Kielty, C.M., Hopkinson, I. and Grant, M.E.: The collagen family: structure, assembly, and organization in the extracellular matrix. In: Connective Tissue and Its Heritable Disorders, Molecular, Genetic and Medical Aspects, ed. by Royce, P.M. and Steinmann, B., Wiley-Liss, New York, 1993, pp. 103-148.
Krieger, M. and Herz, J.: Structures and functions of multiligand lipoprotein receptors: Macrophage scavenger receptors and LDL receptor-related protein (LRP). Ann. Rev. Biochem. 63: 601-637, 1994.
Leikin, S., Rau, D.C. and Parsegian, V.A.: Direct measurement of forces between self-assembled proteins. Temperature-dependent exponential forces between collagen triple helices. Proc. Natl. Acad. Sci. USA 91: 276-280, 1994.
Leikin, S., Rau, D.C. and Parsegian, V.A.: Temperature-favoured assembly of collagen is driven by hydrophilic not hydrophobic interactions. Nature Struct. Biol. 2: 205-210, 1995.
Li, J., Brick, P., O'Hare, M.C., Skarzynski, T., Lloyd, L.F., Curry, V.A., Clark, I.M., Bigg, H.F., Hazleman, B.L., Cawston, T.E. and Blow, D.M.: Structure of full-length porcine synovial collagenase reveals a C-terminal domain containing a calcium-linked four bladed β-propeller. Structure 3: 541-549, 1995.
Li, M.H., Fan, P., Brodsky, B. and Baum, J.: Two-dimensional NMR assignments and conformation of (Pro-Hyp-Gly)10 and a designed collagen triple-helical peptide. Biochemistry 32: 7377-7387, 1993.
Liu, X., Otter, A., Scott, P.G., Cann, J.R. and Kotovych, G.: Conformational analysis of the type II and type III collagen α1 chain C-telopeptide by 1H NMR and circular dichroism spectroscopy. J. Biomol. Struct. Dyn. 11: 541-555, 1993.
Liu, X., Siegal, D.L., Fan, P., Brodsky, B. and Baum, J.: Direct NMR measurement of the folding kinetics of a trimeric peptide. Biochemistry 35: 4306-4313, 1996.
Long, C.G., Braswell, E., Zhu, D., Apigo, J., Baum, J. and Brodsky, B.: Characterization of collagen-like peptides containing interruptions in the repeating Gly-X-Y sequence. Biochemistry 32: 11688-11695, 1993.
Mayo, K.H., Parra-Diaz, D., McCarthy, J.B. and Chelberg, M.: Cell adhesion promoting peptide GVKGDKGNPGWPGAP from the collagen type IV triple-helix: Cis/trans proline-induced multiple 1H NMR conformations and evidence for a KG/PG multiple turn repeat motif in the all-trans proline state. Biochemistry 30: 8251-8267, 1991.
Melacini, G., Feng, Y. and Goodman, M.: Acetyl-terminated and template-assembled collagen-based polypeptides composed of Gly-Pro-Hyp sequences. 3. Conformational analysis by 1H NMR and molecular modeling. J. Am. Chem. Soc. 118: 10359-10364, 1996.
Mielewczyk, S.S., Breslauer, K.J., Anachi, R.B. and Brodsky, B.: Binding studies of a triple-helical peptide model of macrophage scavenger receptor to tetraplex nucleic acids. Biochemistry 35: 11396-11402, 1996.
Morton, L.F., Hargreaves, P.G., Farndale, R.W., Young, R.D. and Barnes, M.J.: Integrin α2β1-independent activation of platelets by simple collagen-like peptides: collagen tertiary (triple-helical) and quaternary (polymeric) structures are sufficient alone for α2β1-independent platelet reactivity. Biochem. J. 306: 337-344, 1995.
Némethy, G.: Energetics and thermodynamics of collagen self-assembly. In: Collagen, vol. I, ed. by Nimni, M.E., CRC Press, Boca Raton, 1988, pp. 79-94.
Okuyama, K., Okuyama, K., Arnott, S., Takayanagi, M. and Kakudo, M.: Crystal and molecular structure of a collagen-like polypeptide (Pro-Pro-Gly)10. J. Mol. Biol. 152: 427-443, 1981.
O'Neil, K.T. and DeGrado, W.F.: A thermodynamic scale for the helix-forming tendencies of the commonly occurring amino acids. Science 250: 646-651, 1990.
Paterlini, M.G., Némethy, G. and Scheraga, H.A.: The energy of formation of internal loops in triple-helical collagen polypeptides. Biopolymers 35: 607-619, 1995.
Privalov, P.L.: Stability of proteins. Adv. Protein Chem. 35: 1-104, 1982.
Prockop, D.J. and Kivirikko, K.I.: Collagens: molecular biology, diseases, and potentials for therapy. Ann. Rev. Biochem. 64: 403-434, 1995.
Ramachandran, G.N.: Structure of collagen at the molecular level. In: Treatise on Collagen, vol. 1, Chemistry of Collagen, ed. by Ramachandran, G.N., Academic Press, London, 1967, pp. 103-183.
Ramachandran, G.N. and Kartha, G.: Structure of collagen. Nature 174: 269-270, 1954.
Rich, A. and Crick, F.H.C.: The molecular structure of collagen. J. Mol. Biol. 3: 483-506, 1961.
Shah, N.K., Ramshaw, J.A.M., Kirkpatrick, A., Shah, C. and Brodsky, B.: A host-guest set of triple helical peptides: Stability of Gly-X-Y triplets containing common non-polar residues. Biochemistry 35: 10262-10268, 1996.
Smith, C.K., Withka, J.M. and Regan, L.: A thermodynamic scale for the β-sheet forming tendencies of the amino acids. Biochemistry 33: 5510-5517, 1994.
Tanaka, T., Nishikawa, A., Tanaka, Y., Kodama, T., Imanishi, T. and Doi, T.: Synthetic collagen-like domain derived from the macrophage scavenger receptor binds acetylated low-density lipoprotein in vitro. Protein Eng. 9: 307-313, 1996.
Torchia, D.A.: Solid state NMR studies of molecular motion in collagen fibrils. Meth. Enzymol. 82: 174-186, 1982.
Vitagliano, L., Némethy, G., Zagari, A. and Scheraga, H.A.: Stabilization of the triple-helical structure of natural collagen by side-chain interactions. Biochemistry 32: 7354-7359, 1993.

Dr. Barbara Brodsky, Department of Biochemistry, UMDNJ-Robert Wood Johnson Medical School, Piscataway, NJ 08854
Received November 26, 1996