LS1a Fall 2014 Lab 2: Computer Modeling of Proteins with PyMOL
Goals: The objectives of this lab are to provide you with a better understanding of:
1. how to use PyMOL at a basic level.
2. how to translate a two-dimensional chemical structure into a three-dimensional chemical model.
3. how to relate the different three-dimensional representations used by scientists to depict biological polymers.
4. the two types of protein secondary structure: alpha (α)-helices and beta (β)-sheets.
5. how the exceptional polarity of peptide bonds contributes to the stability of secondary structure.
6. how the properties of individual amino acids influence the structure of proteins.
Required Safety Regulations and Lab Etiquette
Wear a lab coat and safety glasses at all times.
Wear appropriate clothes that protect the leg, foot, and ankle (e.g., jeans, socks, and sneakers).
No eating or drinking in lab.
[Points can be deducted from your post-lab assignment for not wearing the appropriate clothes, not wearing the required personal protective equipment, or leaving a mess.]
This laboratory is a computer-based exercise that will help you become more comfortable with viewing two- and three-dimensional images of amino acids and proteins using a software program called PyMOL (it's also the software used to make many of the images we look at throughout the course).
You can complete this lab by reading the instructions as you progress through the different exercises. Once you become comfortable using PyMOL, answer the different questions on the answer sheets provided in lab as you move along.
Your answer sheet will be due a week from today's lab, and the PyMOL files you will be using for today's lab will be placed online Friday after 4pm if you would like to finish your assignment after section is over.
Introduction
As you have learned in lecture, some of the most important molecules associated with life are polymers. Polymers are chains composed of many repeating units called monomers that are covalently linked together. You can think of monomers as chemical building blocks. Different types of polymers are made of different types of monomers. For example, nucleotides are the monomeric building blocks of the polymer DNA, just as amino acids are the monomeric building blocks of polypeptides. (The distinction between a protein and a polypeptide is largely semantic, but occasionally a protein will consist of multiple polypeptides, as we will see in the case of hemoglobin.)
Figure 1: A polymer is a string of monomers covalently attached together.
Amino acids are the monomer building blocks of proteins; conversely, proteins are strings of amino acids covalently attached to each other. Different proteins have different amino acid sequences, and the shape and function of a protein depend on the order and arrangement of its amino acid building blocks.
In order to understand how proteins work, we must first understand what they look like. Using methods such as X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy (both of which are beyond the scope of this course), scientists are able to determine the distinct three-dimensional arrangement of atoms within macromolecules, which are otherwise far too small to be seen with even the most powerful microscopes. Scientists can then use this structural information about a protein to make educated guesses about how the protein works, based on the relative locations of certain molecular components. This information is vital to improving our understanding of healthy and diseased cells at a molecular level, and for identifying potential drug targets, as we will see later in the course using the examples of HIV and chronic myeloid leukemia (CML).
III. Procedure
This lab is composed of four sections, each of which deals with a different aspect of amino acid and protein chemistry and structure. The lab questions starting in Section 2 are highlighted with a symbol in the left margin of the page and should be answered as you work your way through the different exercises. Answer your questions on the question sheet that is due in section a week from today.
Section 1.1: Introduction to PyMOL
A) Basic Manipulations
To begin using PyMOL, pair up with a buddy at a computer and do the following:
Log in to the Students account using the password g3n0m3. Shh, it's a secret ;)
Open the Life Sciences Labs folder on the bottom.
Click on the file that says LS1a_PyMOL_Lab2.zip. This will automatically decompress the file and create an LS1a_PyMOL_Lab2 folder, which has all the files we need.
Open the LS1a_PyMOL_Lab2 folder.
Click on the 1_Amino_Acid_Basics folder and then open the Amino_Acids.pml file.
After you click on Amino_Acids.pml, PyMOL will automatically open in two windows: an external interface on top, and a larger viewer window below it. We'll be focusing mostly on the lower viewer window, which you can maximize by clicking the green + button at the top left corner of the window.
The atomic structure of the amino acid (L)-valine should appear in the viewing window and look something like this:
Change the orientation and zoom of the molecule by clicking on it with your mouse (playing around with the program is the best way to learn it and is highly encouraged). The basic mouse controls are as follows:
Rotate (= left mouse button): To rotate the molecule, click on the left mouse button and move the mouse left, right, up or down.
Move (= middle button): To move an object on the screen, you must click on the object using the wheel as a button. Make sure to hold the button down as you move the mouse and try not to turn the wheel. Turning the wheel simply controls the Z-axis clipping plane for the viewer. If you rotate the wheel by accident and clip the image, turn it back in the other direction until the entire molecule reappears.
Zoom (= right mouse button): Click on the right mouse button and move the mouse up or down in order to zoom in or out on the molecule. Go easy at first! It doesn't take much to lose the molecule if you go too far.
Type reset in one of the text entry panels if anything ever gets too crazy, or just close and open the .pml (or .pse) file again.
It takes a little practice to feel comfortable manipulating molecules, but it's pretty easy once you remember what each button does. Practice using all three functions; once you have a feel for how to manipulate the image, move on to the next section. If you ever lose the molecule, just close the program and click on the Amino_Acids.pml file again.
B) Turning the Objects On/Off
On the right side of the viewer window, you will notice a series of rectangular buttons with names in them, as well as a series of smaller square buttons ([A], [S], [H], [L], and [C]) next to each rectangular button:
Each of the rectangular buttons controls an object in the viewer. By pressing on the button, you can make the object appear or disappear. For example, clicking on L_Valine will make the amino acid disappear completely; clicking on it again will cause the amino acid to reappear.
You will not be using the small square buttons ([A], [S], [H], [L] or [C]) for today's lab. If you'd like to learn more about what they do, you can visit the company's website at PyMOL.org or a helpful PyMOL wiki page. (Briefly: A stands for action, S stands for show, H stands for hide, L stands for label, and C stands for color. Clicking on any of them produces a drop-down menu.)
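The object buttons also have typed equivalents: PyMOL's enable and disable commands show or hide a named object. As a rough sketch (the helper name toggle_commands is hypothetical, and the object names are copied from this handout and may not match your file exactly), the following Python snippet only builds the command lines that you could then type into one of the text entry panels:

```python
def toggle_commands(turn_off, turn_on):
    """Build PyMOL command lines that hide and then show named objects.

    `disable` and `enable` are PyMOL commands; this helper only builds
    the text and does not communicate with PyMOL itself.
    """
    lines = [f"disable {name}" for name in turn_off]
    lines += [f"enable {name}" for name in turn_on]
    return lines


# Object names as they appear in this handout (assumed to match the file).
commands = toggle_commands(turn_off=["Valine"], turn_on=["Ca_Carbon"])
print("\n".join(commands))
```

Clicking the buttons does the same job, so this is optional; it mainly helps if you want to repeat a sequence of steps quickly.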
Section 1.2: Amino Acid Basics
Let's begin with the basic building block of proteins: the amino acid. There are 20 standard naturally occurring amino acids that our cells use as the monomers to construct proteins. The one we will begin with is valine.
First, make sure that the object Valine is turned on and all other rectangular buttons are turned off. The particular representation currently being used to show (L)-valine is called sticks. Carbon atoms are shown in green, oxygen atoms in red, and nitrogen atoms in blue.
The alpha (α)-carbon
These 20 molecules are all called amino acids because all 20 of them contain an amino group and a carboxylic acid group (nifty). A third group, called the side chain (denoted below by R), is different for each of the 20 amino acids (in fact, it is the only difference between the 20 natural amino acids). In amino acids, the carboxylic acid group, the amino group and the side chain group are all connected to the Cα carbon.
Can you identify the Cα carbon in the valine structure shown on the computer? Once you think you have identified it, do the following: Turn off Valine and turn on Ca_Carbon. This will highlight the Cα carbon in yellow.
Hydrogens
Notice that PyMOL does not explicitly show any hydrogen atoms as part of the valine molecule. The hydrogens are typically implied in all of the 3-D structures you will be viewing (just as they are implied by many of the 2-D standard line drawings we look at), so it is worthwhile getting used to looking at molecules this way. (Hydrogens are missing because most of our molecular structures have been determined using X-ray crystallography, a technique that readily detects carbon, nitrogen, oxygen, phosphorus, and sulfur but usually not hydrogen, which has only one electron and scatters X-rays very weakly.)
Turn off Ca_Carbon and turn on Hydrogens
This gives you a better sense of where the hydrogen atoms are located in the structure. Notice that three hydrogen atoms are bound to the nitrogen, while no hydrogen atoms are bound to either of the two carboxyl-group oxygens. This is the way valine would exist in solution at cellular pH (pH approximately 7.4):
Generic amino acid at physiological pH
Notice that because PyMOL does not typically include hydrogens, it also cannot tell you anything about whether atoms are formally charged.
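To see why the amino group keeps its extra hydrogen and the carboxyl group loses its hydrogen at cellular pH, the Henderson-Hasselbalch relationship can be evaluated directly. The sketch below is only illustrative: the two pKa values are assumed, typical textbook numbers for the α-amino and α-carboxyl groups of a free amino acid and are not taken from this handout.

```python
def fraction_protonated(pka, ph):
    """Henderson-Hasselbalch: fraction of a group still holding its acidic proton."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


PH = 7.4             # cellular pH quoted in the text
PKA_AMINO = 9.6      # assumed typical value for an alpha-amino group
PKA_CARBOXYL = 2.3   # assumed typical value for an alpha-carboxyl group

amino = fraction_protonated(PKA_AMINO, PH)        # fraction present as -NH3+
carboxyl = fraction_protonated(PKA_CARBOXYL, PH)  # fraction present as -COOH

print(f"amino group protonated:    {amino:.3f}")
print(f"carboxyl group protonated: {carboxyl:.6f}")
```

With these assumed values, nearly all amino groups are protonated (positively charged) and nearly all carboxyl groups are deprotonated (negatively charged), which matches the hydrogens shown in the Hydrogens object.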
[Figure labels: amine, amino acid, carboxylic acid, α-carbon (Cα)]
Notice that if you rotate the molecule such that the nitrogen is on the left, the carboxyl group is on the right, and the side chain is pointing towards the top, then we can observe the amino acid's stereochemistry: in the L-configuration, the side chain points out towards you as the hydrogen points away from you:
Double Bonds
You might also have noticed that double bonds are not distinguished from single bonds in the PyMOL molecule you are viewing. Again, this will be a common theme for the structures that you will be viewing in this lab. In certain cases it may indeed be useful to see the double bonds, and PyMOL is capable of showing them.
Turn off Hydrogens. Turn on Double_Bond.
Here you can see how PyMOL distinguishes single bonds from double bonds using the sticks representation. In this case, a double bond can be seen between the carbon atom and one of the oxygen atoms of the carboxylic acid group.
Using different representations to show structures
Although the sticks representation is useful for viewing the atomic connectivity and the geometry of a molecule, it is less useful at showing the space occupied by the electrons that surround the nuclei of different atoms within the molecule. A different type of representation (Mesh) can be used to show the volume that the molecule occupies.
Turn off Double_Bond. Turn on Hydrogens and Mesh.
This representation shows a mesh that depicts the approximate boundary of electron density around the different atoms in the molecule. Notice how the space occupied by electrons is not at all apparent from a simple sticks model.
Turn on Space_Filling
Figure 2: Stereochemistry of L-valine. Notice how the side chain points out of the page towards you when the molecule is rotated such that the hydrogen points into the page away from you. On the left is the two-dimensional line drawing we usually use to represent L-valine; on the right is a PyMOL image of the same molecule in the same orientation.
Here we see another, more realistic view of the relative space occupied by each atom in the molecule. However, notice that now it becomes much more difficult to see how atoms are connected to each other and how bonds are oriented relative to adjacent bonds.
Turn off Mesh
Scientists use these different representations (sticks, space filling, mesh, etc.) depending on their particular needs when studying a molecule. Note that each representation has its own advantages and disadvantages. For example, while the sticks representation provides a rather detailed look at a macromolecule's structure at the atomic level, it makes the general macromolecular shape difficult to discern (the structure has too much information in it, so to speak). This is particularly true when viewing structures that are much larger than just a single amino acid.
Section 2: Protein Basics
In this section, we will be examining the basics of protein structure. The specific protein we will be looking at is hemoglobin. You may already be familiar with hemoglobin; it is a protein found prominently in red blood cells and is responsible for carrying and distributing oxygen throughout your body.
Close your current PyMOL viewer window to exit the program. Return to the LS1a_PyMOL_Lab2 folder and open the 2_Protein_Basics folder. Click on the Protein_Basics.pml file. This will automatically open up the PyMOL program and load the coordinates for hemoglobin.
Hemoglobin consists of four polypeptide chains. We will begin looking at a single amino acid of one of the polypeptide chains and build our way up until we visualize the entire protein. In that vein, this file begins where you left off with the amino acid valine in the now familiar sticks representation. In this case, valine is simply referred to as Amino_Acid in the list of objects (rectangular buttons) on the right.
Turn off Amino_Acid and turn on Tripeptide. You may need to zoom out a bit and move the structure back to the center of the screen using the appropriate mouse buttons.
Here we can see how amino acids link together. Note how the carboxylic acid group from one amino acid is used to link to the amine group of another to form what is called an amide (or a peptide) bond. In this tripeptide, a lysine is bound to the N-terminal end (the side that ends with the amine) of the valine and an alanine is bound to the C-terminal end (the side that ends with the carboxylic acid) of the valine. Typically, we write peptide sequences from the N-terminal end to the C-terminal end, so this example tripeptide would be written as Lys-Val-Ala.
Question 1
Turn off Tripeptide and turn on Primary. Feel free (as always) to play around with the molecule. You may need to readjust the frame to see the entire structure.
We are now seeing even more of one of hemoglobin's polypeptide chains. The sequence of amino acids in a polypeptide (or a protein) is known as either the primary sequence or the primary structure. The primary sequence for this particular segment is:
Nt-Ser-Ala-Gln-Val-Lys-Gly-His-Gly-Lys-Lys-Val-Ala-Asp-Ala-Leu-Thr-Asn-Ala-Val-Ala-His-Val-Ct, or using the one-letter code: Nt-SAQVKGHGKKVADALTNAVAHV-Ct. The terms Nt and Ct stand for amino-terminal end and carboxy-terminal end, respectively.
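The conversion between the three-letter and one-letter codes is a simple table lookup. As a minimal sketch (the function name to_one_letter is just an illustrative choice), the following Python snippet converts the segment above using the standard one-letter symbols for the 20 amino acids:

```python
# Standard three-letter to one-letter symbols for the 20 amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter_sequence):
    """Convert a dash-separated three-letter sequence (N- to C-terminus)."""
    residues = three_letter_sequence.split("-")
    return "".join(THREE_TO_ONE[residue] for residue in residues)


segment = ("Ser-Ala-Gln-Val-Lys-Gly-His-Gly-Lys-Lys-Val-"
           "Ala-Asp-Ala-Leu-Thr-Asn-Ala-Val-Ala-His-Val")
one_letter = to_one_letter(segment)
print(one_letter)       # the one-letter sequence
print(len(one_letter))  # number of residues in the segment
```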
[A quick note about terminology: the difference between the terms polypeptide and protein is one of scale. Proteins are often much larger than polypeptides, and the term polypeptide often refers to a small portion of a protein. Both peptides and proteins are polymers of amino acids connected by peptide bonds. The terms polypeptide and protein are sometimes used synonymously. Some key differences are: 1) proteins are polypeptides that perform specific functions inside cells; and 2) some proteins, such as hemoglobin, consist of several polypeptide chains.
In addition, amino acids present within a protein are commonly referred to as residues. Thus, the phrase N-terminal residue refers to the amino acid that is present at the N-terminus of the protein (or peptide). The name comes from the chemistry: the synthesis of a peptide bond between two amino acids is a condensation reaction that releases a water molecule, so each linked amino acid is the residue left after the loss of water.]
We can simplify this structure by removing the amino acids' side chains but keeping the backbone atoms:
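The bookkeeping behind the condensation reaction can be sketched in a few lines: a chain of n amino acids contains n - 1 peptide bonds, and one water molecule is released for each bond. The snippet below applies this to the Lys-Val-Ala tripeptide seen earlier; the molar masses are approximate values supplied here for illustration and are not part of the lab files.

```python
WATER_MASS = 18.015  # g/mol, approximate

# Approximate average molar masses of the free amino acids (g/mol).
FREE_AMINO_ACID_MASS = {"Lys": 146.19, "Val": 117.15, "Ala": 89.09}


def peptide_bond_count(n_residues):
    """A linear chain of n residues has n - 1 peptide bonds."""
    return max(n_residues - 1, 0)


def peptide_mass(residues):
    """Sum the free amino acid masses, then subtract one water per peptide bond."""
    total = sum(FREE_AMINO_ACID_MASS[name] for name in residues)
    return total - peptide_bond_count(len(residues)) * WATER_MASS


tripeptide = ["Lys", "Val", "Ala"]
bonds = peptide_bond_count(len(tripeptide))
mass = peptide_mass(tripeptide)
print(f"{bonds} peptide bonds, so {bonds} water molecules released")
print(f"approximate mass of Lys-Val-Ala: {mass:.2f} g/mol")
```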
Turn off Primary and turn on Backbone
You may notice that the peptide backbone is starting to take on a helical shape. The term peptide backbone typically applies to the Cα carbon atoms and the atoms involved in the peptide bonds, but excludes the atoms in the side chains. The backbone view shown here excludes the side chains (the R groups) that are normally attached to the Cα carbon atoms. The cartoon representation further helps to simplify and clearly indicate the helical structure of the backbone.
Turn on Secondary
This cartoon represents an alpha (α)-helix, an example of secondary structure. Notice how the helical cartoon lacks any detailed depictions of the atoms that form the structure. Such simplified diagrams are often used to show the overall shapes of proteins to avoid introducing too much detail into the picture. We will discuss specific types of secondary structures in the next segment.
To see how the amino acid side chains are oriented along the helical backbone, do the following:
Turn off Backbone and turn on Helix_Sidechains
Helix_Sidechains shows the orientation of the side chains, which point outwards from the helix rather than inwards.
Turn off Secondary and Helix_Sidechains, then turn on Tertiary. Zoom out to see the whole polypeptide chain.
If we combine all of the secondary structural elements included within a polypeptide, we get a large, compact structure called the tertiary structure of the protein. The shape of each polypeptide is unique and depends upon the polypeptide's unique amino acid sequence. The protein's shape allows it to carry out its particular function with great efficiency; the shape of the protein is optimized for the protein's required function within (or outside of) the cell.
Turn on Full_Backbone (and keep Tertiary on)
Notice how well the backbone aligns with the cartoon representation of the tertiary structure. Again, the cartoon is a convenient simplification of the overall structure of the protein. Let's see what the entire protein looks like without simplification:
Turn off Full_Backbone and Tertiary, turn on Sticks_Rep
While showing the entire polypeptide as sticks describes the location of each atom in the polypeptide, it's quite a lot of information and it is not as obvious what we're looking at. Let's examine another type of representation.
Turn on Surface_Rep
This is called a surface representation. This representation depicts the actual volume and shape the protein occupies using the van der Waals radii of each atom. It provides a great sense of what the actual surface of the protein looks like.
Question 2
Now let's see what happens when multiple polypeptide chains combine to form a single functional protein:
Turn off Surface_Rep and Sticks_Rep. Turn on Quaternary.
Each of these differently-colored polypeptide chains is virtually identical in structure and sequence; sometimes, a protein consists of multiple polypeptide chains that come together to form one functional unit.
Many proteins consist of just a single polypeptide chain. However, for proteins that consist of several different polypeptide chains, their quaternary structure is the level of structure at which multiple polypeptide chains, each with its own primary, secondary, and tertiary structure, fold together into one functional unit.
Question 3
As we mentioned earlier, hemoglobin acts as an oxygen carrier. But where does it carry the oxygen? Some proteins need additional structural components to perform their duties. In many cases, proteins utilize a small molecule, called a cofactor, to assist.
Turn on Cofactor
In this particular case, the cofactor is a porphyrin ring, a large ring with a metal ion in the center (here, iron is used as the central ion). Notice that each chain has its own cofactor.
Now let's see where the oxygen gets bound to the ring.
Turn on Oxygen and Surface_Rep
Question 4
[Note: The oxygen molecules (O2) are shown using a surface representation for clarity, which makes them appear much larger than the surrounding atoms in the porphyrin ring (which are shown with a stick representation).]
At this point, it is worth stressing that the ability of hemoglobin to bind oxygen and its cofactors is dependent on the structure of the protein itself. As you've seen, this structure is determined by the sequence of amino acids that combine in a distinct order to form the protein. Therefore, the primary sequence of a protein determines its structure, and its structure determines the protein's function.
Section 3: Protein Secondary Structure
Close your current viewer, open up the folder called 3_Secondary_Structure, and click on the file called Secondary_Structure.pse. This will automatically open up the PyMOL program, and load up the coordinates for a different protein called Top7.
In this brief section, we will be examining protein secondary structure, and how cartoons are used to simplify visualizing protein structure. Since the stick representation is a bit noisy, let's simplify the protein to its backbone as we've done previously.
Turn off Protein_Sticks and turn on Backbone
α-helices
Eliminating the side chains alone helps simplify the structure and you can begin to see some of the basic elements of secondary structure. Let's examine each of these elements in greater detail.
Turn off Backbone and turn on Helix_Sticks
Now, these particular amino acids form helical secondary structures, exactly like those we saw in hemoglobin. We can therefore simplify them in cartoon mode as helices. (By playing around with these structures in stick representation, can you see how they are helical before turning on the cartoon representation?) These types of structure are called α-helices, and are quite common in proteins. Here, we've colored them magenta for clarity.
Turn off Helix_Sticks and turn on Helix_Cartoon
The cartoon representation is one of the most common ways that scientists portray α-helical regions contained within proteins. Notice that in the cartoon representation, none of the atoms that comprise the helix are shown. Instead, the helix cartoon outlines the shape of the peptide backbone and makes it easier to view complex structures.
So what causes these regions to take on a helical shape? Part of the answer is that the amino acids are able to form a series of hydrogen bonds that help to stabilize the helix. Certain amino acids favor this secondary structure more than others.
To take a closer look at the hydrogen bonds that are important in stabilizing helices, we will focus on one of the two helices shown in this protein.
Turn off Helix_Cartoon. Turn on Alpha_Helix.
It may help to re-center the image on your screen so that you can rotate it more easily. To do so, right-click on the mouse and choose the center (vis) command.
In this image, the hydrogen atoms that are attached to the backbone nitrogen atoms are depicted in white. Also, the image specifically shows the double bonds that form between the carbon and oxygen atoms in the backbone.
Question 5
Now, let's take a look at some of the hydrogen bonds that are important in stabilizing the helix. For simplicity, only hydrogen bonds in a short stretch of the helix are shown (the helix contains more than just the six H-bonds that are depicted here). To see the bonds:
Successively turn on H_bond1 through H_bond6
Question 6
It is important to recognize that the network of H-bonds that stabilizes the helix stems from the interactions between backbone amine (-NH) and backbone carbonyl (-C=O) groups. The side chains of the amino acids are not involved in the H-bond network shown here. To help illustrate this more clearly:
Turn on Helix_Ribbon and Sidechains
Here you can see that the side chains are not forming any of the hydrogen bonds that are important in maintaining the shape of the helix.
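The backbone-only pattern can be written down compactly. In an idealized α-helix (textbook geometry, not something read out of the lab file), the backbone C=O of residue i accepts a hydrogen bond from the backbone N-H of residue i + 4. The sketch below lists those pairs for a hypothetical stretch of ten residues numbered from 1; the function name and the numbering are illustrative only and do not correspond to specific residues in the protein on your screen.

```python
def helix_hbond_pairs(n_residues, offset=4):
    """Pair the C=O of residue i with the N-H of residue i + offset (1-based)."""
    return [(i, i + offset) for i in range(1, n_residues - offset + 1)]


# A hypothetical ten-residue stretch of helix.
pairs = helix_hbond_pairs(10)
for carbonyl_residue, amide_residue in pairs:
    print(f"C=O of residue {carbonyl_residue} ... H-N of residue {amide_residue}")
```

Note that no side chain appears anywhere in the list: every donor and acceptor is a backbone group.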
Question 7
In order to get a better sense of how the side chains are distributed along a helix, it is useful to view the helix from a different perspective. Scientists sometimes use a depiction that shows the helix as viewed from above (rather than from the side), which can be done by rotating the helix such that one is looking straight down its center.
Turn off Helix_Ribbon, Sidechains, and Alpha_Helix.
Turn off all H_bonds.
Then, turn on Top_Down_Helix and Top_Down_Cartoon (make sure to re-center and re-zoom).
Now, rotate the Top_Down helix such that you are viewing it from above looking straight down its center. Make sure that the N-terminal end of the helix is facing you. Remember, the amine (-NH) groups point towards the N-terminal end, while the carbonyl groups (-C=O) point towards the C-terminal end.
Your screen should look something like this:
Once you have the N-terminal end facing you:
Turn off Top_Down_Helix (but keep the Top_Down_Cartoon visible), and Turn on With_Sidechains
Your helix should now look something like the one shown below. Carbon atoms in the side chain are shown in blue, oxygen in red, and sulfur in yellow.
Scientists sometimes use a helical wheel diagram to represent a helix viewed from above. Such a diagram is shown below on the right, where the side chains that are projecting from the helix are labeled as R1, R2, etc. starting from the N-terminus.
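A helical wheel can be approximated numerically. An idealized α-helix has about 3.6 residues per turn, so each successive residue sits roughly 100 degrees further around the helix axis. The sketch below uses that idealized value (an assumption; real helices deviate slightly) to place side chains R1, R2, and so on around the wheel, with R1 arbitrarily set at 0 degrees.

```python
DEGREES_PER_RESIDUE = 100.0  # 360 degrees / 3.6 residues per turn (idealized)


def wheel_angles(n_residues):
    """Angular position of each side chain on the wheel, with R1 at 0 degrees."""
    return {
        f"R{i + 1}": (i * DEGREES_PER_RESIDUE) % 360.0
        for i in range(n_residues)
    }


angles = wheel_angles(8)
for label, angle in angles.items():
    print(f"{label}: {angle:5.1f} degrees")
```

Side chains whose angles fall close together (for example R1 and R5 here) end up on the same face of the helix, even though they are several residues apart in the primary sequence.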
Question 8
The ways in which side chains are positioned along the different faces of an alpha helix can significantly influence the manner in which a helix is positioned relative to other proteins, or other parts of the same protein.
Question 9
Beta Sheets
To look at another type of secondary structure, first do the following:
Turn off Top_Down_Cartoon and With_Sidechains.
Then turn on Sheet_Sticks (zoom out & re-center if necessary).
These strands take on a different shape than those seen in helices. Rather than curling, these strands of peptides form long, straight sheets when aligned with each other. Thus, we simplify them in cartoon mode as beta-strands, which combine to form beta-sheets. We've illustrated the sheets (and the strands) here in blue.
Turn on B_Sheet
Notice how strands are represented in cartoon mode using long, flattened arrows; the arrows provide directionality for the strands. Since amino acid sequences, by convention, are written from the N-terminal end to the C-terminal end, the arrows point from the N-terminal to the C-terminal direction of the peptide strands.
As with α-helices, the driving force for the formation of beta-sheets is the formation of stabilizing hydrogen bonds. In this case, H-bonds form between the backbone NH and C=O groups of neighboring strands (see picture below). Again, certain amino acid sequences favor β-sheet formation over helices or loops.
In order for a beta-sheet to form, it needs several flat peptide strands (beta strands) to align together to form a flat sheet. Since aligned strands can run in the same direction or in the opposite direction, we describe beta sheets as being either parallel or antiparallel. The image above is an example of an antiparallel sheet.
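One way to make the parallel/antiparallel distinction concrete is to treat each strand's arrow as a direction vector running from its N-terminal end to its C-terminal end and compare neighboring strands. The sketch below does this with a dot product; the two vectors are made-up example values, not coordinates from the lab file.

```python
def strand_orientation(direction_a, direction_b):
    """Compare two N-to-C direction vectors using the sign of their dot product."""
    dot = sum(a * b for a, b in zip(direction_a, direction_b))
    if dot > 0:
        return "parallel"
    if dot < 0:
        return "antiparallel"
    return "perpendicular (neither)"


# Hypothetical direction vectors for two neighboring strands.
strand_1 = (1.0, 0.0, 0.0)
strand_2 = (-0.9, 0.1, 0.0)
orientation = strand_orientation(strand_1, strand_2)
print(orientation)
```

In the cartoon view, the same comparison is done by eye: neighboring arrows pointing the same way are parallel, and arrows pointing opposite ways are antiparallel.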
Question 10
Question 11: Follow the instructions on your question sheet to take two images of the beta sheet: one of the hydrophobic face and one of the hydrophilic face.
Now let's take a look at the protein segments that connect these secondary structure elements.
Turn off Sheet_Sticks and then turn on Loop_Sticks
These segments do not exhibit a uniform structure, as seen in the previous two examples. Rather, they serve simply to connect the adjacent strands or helices together.
Turn on Loops and then turn off Loops_Sticks
Thus, we simplify these as loops, which are depicted as simple ribbons (in this case, yellow).
Turn on C_Term_Sticks and N_Term_Sticks
Note: You will need to scroll down the menu to see the box N_Term_Sticks.
In addition to loops, the N-terminal and C-terminal ends of the peptide strand are often lacking in definite structure as well. These termini are generally depicted in a similar manner as loops, though they don't actually connect anything. Rather, they signify the beginning and end of the protein.
Turn off C_Term_Sticks and N_Term_Sticks. Turn on C_Term and N_Term. Turn on Helix_Cartoon and Sheet_Sticks.
Question 12
And here, we now have our entire protein represented in secondary structure elements, via a cartoon representation.
To see the protein as a single cartoon, turn off all the other elements and turn on Protein as follows:
Turn off all elements by hitting the (all) button at the top; turn on Protein at the bottom. Notice how the secondary motifs combine to generate a unique tertiary structure for the protein.
Figure 3 below is a schematic in which the secondary structural motifs are mapped out along the linear, primary amino acid sequence. The protein begins at the amino-terminus (N-terminus) and concludes at the carboxy-terminus (C-terminus). There are 97 amino acids total in the protein, and every 10 amino acids is ticked off along the length of the primary sequence.
Figure 3: A schematic listing the linear primary amino-acid sequence of the protein annotated with the secondary structure of each region. The numbers on top indicate the amino acid position (there are 97 total amino acids). The green arrows indicate β-strands and the red portions represent α-helices. The black lines between each secondary structural element are loops.
Section 4: Bulk Water
As a final exercise, you'll have a chance to get a better sense of what scientists believe water may look like as a solvent.
Close your current viewer window and open the folder titled 4_Bulk_Water and then open the file Water.pml
This particular set of coordinates is not from a solved atomic structure, but rather from a simulation of how scientists think bulk water is organized in liquid form at any given instant.
Here we see water represented in sticks form with the hydrogens included (without the hydrogens, the sticks representation looks rather silly as just a bunch of dots where the oxygens go).
Let's view how these molecules primarily interact with each other:
Turn on H_Bonds
As you may recall, the ability of water to hydrogen-bond to itself is one of the reasons it is such an important molecule to sustain life. Hydrogen bonds are similar in nature to normal dipole-dipole interactions, but are considerably stronger.
Atoms are understood to be nuclei surrounded by electron clouds. Thus, it is sometimes helpful to view molecules in a spheres representation, to get a better view of how the molecules fit together.
Turn off Sticks and H_Bonds, turn on Spheres
Notice how closely packed together the molecules are, from this perspective.
Turn Spheres off and turn Sticks and H_Bonds back on
Keep in mind that in liquid form, the molecules are in constant motion, rotating freely around each other. This free motion is associated with the water's entropy.
Question 13