Bioinformatics
Homework - 4
Using the Web Applications
ProtParam, PeptideCutter,
Compute pI/Mw, TMHMM
and SignalP
GEBZE TECHNICAL UNIVERSITY
EBRU AKHARMAN
142204026
22.12.2017
First, we determine which organism the given protein sequence belongs to and what the protein does. To do this,
the sequence is identified with BLAST, using the blastp algorithm. The following steps are followed.
STEP 1:
The blastp algorithm is selected on the BLAST page and the protein sequence is pasted into the text box. Then the
"BLAST" button is clicked.
STEP 3:
At the end of the BLAST search, the hit with the lowest E-value and the highest identity ("Ident") score is selected.
"Putative serine protease with signal anchor [Ixodes scapularis]" is the best-matching protein.
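The selection rule in Step 3 can be sketched in Python. This is a minimal illustration, not part of the BLAST web interface: the `Hit` record, its field names and the three example hits are hypothetical, and breaking E-value ties in favour of the higher identity is our assumption about how the two criteria are combined.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One row of a BLAST hit table (hypothetical field names)."""
    description: str
    evalue: float
    identity: float  # percent identity, 0-100


def best_hit(hits: list[Hit]) -> Hit:
    """Return the hit with the lowest E-value; ties go to the higher identity."""
    if not hits:
        raise ValueError("no hits to choose from")
    # Smaller E-value first, then larger identity (hence the minus sign).
    return min(hits, key=lambda h: (h.evalue, -h.identity))


# Invented example hits, not real BLAST output.
hits = [
    Hit("hit A", 1e-50, 60.0),
    Hit("hit B", 1e-80, 45.0),
    Hit("hit C", 1e-80, 52.0),
]
print(best_hit(hits).description)  # hit C
```

Ordering by E-value first reflects that the E-value measures how likely a match is to occur by chance, while identity only breaks ties here.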
What does this protein sequence do?
Trypsin is a serine protease from the PA clan superfamily, found in the digestive system of many vertebrates,
where it hydrolyses proteins. Trypsin is formed in the small intestine when its proenzyme form,
the trypsinogen produced by the pancreas, is activated. Trypsin cleaves peptide chains mainly at
the carboxyl side of the amino acids lysine or arginine, except when either is followed by proline. It is used for
numerous biotechnological processes. The process is commonly referred to as
trypsin proteolysis or trypsinisation, and proteins that have been digested/treated with trypsin are said to
have been trypsinized.
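The cleavage rule described above (after lysine or arginine, but not before proline) can be written as a short Python sketch. It is a simplified rule for illustration only: real trypsin cleavage is not absolute, and mass spectrometry data suggest cleavage before proline also occurs. The function names `trypsin_sites` and `digest` and the example peptide are our own.

```python
def trypsin_sites(seq: str) -> list[int]:
    """1-based positions of residues after which trypsin cuts (simplified rule)."""
    seq = seq.upper()
    sites = []
    for i in range(len(seq) - 1):  # the last residue has no bond after it
        if seq[i] in "KR" and seq[i + 1] != "P":
            sites.append(i + 1)  # 1-based position of the K or R
    return sites


def digest(seq: str, sites: list[int]) -> list[str]:
    """Split seq after each 1-based site position."""
    peptides, start = [], 0
    for pos in sites:
        peptides.append(seq[start:pos])
        start = pos
    peptides.append(seq[start:])
    return peptides


example = "MKWVRPLKAG"  # invented peptide
sites = trypsin_sites(example)
print(sites)  # [2, 8]
print(digest(example, sites))  # ['MK', 'WVRPLK', 'AG']
```

The arginine at position 5 is skipped because it is followed by proline.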
In the duodenum, trypsin catalyzes the hydrolysis of peptide bonds, breaking down proteins into smaller
peptides. The peptide products are then further hydrolyzed into amino acids via other proteases, rendering
them available for absorption into the bloodstream. Tryptic digestion is a necessary step in protein absorption
as proteins are generally too large to be absorbed through the lining of the small intestine.
Trypsin is produced as the inactive zymogen trypsinogen in the pancreas. When the pancreas is stimulated
by cholecystokinin, it is then secreted into the first part of the small intestine (the duodenum) via
the pancreatic duct. Once in the small intestine, the enzyme enteropeptidase activates trypsinogen into trypsin
by proteolytic cleavage. Autocatalysis does not happen with trypsin, as trypsinogen is a poor substrate;
therefore, enzymatic damage to the pancreas is avoided.
The enzymatic mechanism is similar to that of other serine proteases. These enzymes contain a catalytic
triad consisting of histidine-57, aspartate-102, and serine-195. These three residues form a charge relay that
increases nucleophilicity of the active site serine. This is achieved by modifying the electrostatic environment
of the serine. The enzymatic reaction that trypsin catalyzes is thermodynamically favorable but requires
significant activation energy (it is "kinetically unfavorable"). In addition, trypsin contains an "oxyanion hole"
formed by the backbone amide hydrogen atoms of Gly-193 and Ser-195, which serves to stabilize the
developing negative charge on the carbonyl oxygen atom of the cleaved amides.
The aspartate residue (Asp 189) located in the catalytic pocket (S1) of trypsin is responsible for attracting and
stabilizing positively charged lysine and/or arginine, and is, thus, responsible for the specificity of the enzyme.
This means that trypsin predominantly cleaves proteins at the carboxyl side (or "C-terminal side") of the amino
acids lysine and arginine, except when either is followed by proline, although large-scale mass
spectrometry data suggest cleavage occurs even with proline. Trypsin is considered an endopeptidase, i.e., the
cleavage occurs within the polypeptide chain rather than at the terminal amino acids located at the ends
of polypeptides.
Examining this protein with different web applications
PeptideCutter:
The necessary settings are made before cleaving the protein: trypsin and chymotrypsin are selected, the minimum
cleavage probability is set to 80%, and then the "Perform" button is clicked. The results shown first are for
chymotrypsin. PeptideCutter reports the length of the protein sequence and displays the sequence in blocks of
10 amino acids.
The result area lists the name of each enzyme that cuts the protein, together with the number and positions of
its cleavage sites. The probability of each cleavage site is also given.
Image 8 displays all cleavage sites in the sequence, in order from the N- to the C-terminus, together with the
respective cleaving enzymes. In addition, a list of the peptides resulting from the digestion is provided, with
their lengths in amino acids and their molecular weights in daltons. The cleaved protein is shown as an amino
acid sequence, with the position of each cleavage site marked.
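A peptide list like the one PeptideCutter produces can be approximated as below. This is a sketch, not PeptideCutter's algorithm: it uses a simplified chymotrypsin rule (cut after F, Y or W, but not before P) that ignores cleavage probabilities and the tool's additional exceptions. `cleave` and `CHYMOTRYPSIN_HIGH` are hypothetical names, and the input peptide is invented.

```python
CHYMOTRYPSIN_HIGH = "FYW"  # simplified rule: C-term to F, Y or W, not before P


def cleave(seq: str, cut_after: str, not_before: str = "P") -> list[tuple[int, int, str]]:
    """Split seq after residues in cut_after unless the next residue is in not_before.

    Returns (start, end, peptide) tuples with 1-based inclusive positions.
    """
    seq = seq.upper()
    fragments = []
    start = 0
    for i in range(len(seq) - 1):
        if seq[i] in cut_after and seq[i + 1] not in not_before:
            fragments.append((start + 1, i + 1, seq[start:i + 1]))
            start = i + 1
    fragments.append((start + 1, len(seq), seq[start:]))  # C-terminal fragment
    return fragments


for start, end, pep in cleave("AFGWPKYLL", CHYMOTRYPSIN_HIGH):
    print(f"{start}-{end}\t{pep}\t{len(pep)} aa")
```

The tryptophan at position 4 is not cut because it is followed by proline, so three fragments result.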
One symbol separates the sequence into blocks of 10 amino acids, which makes it easier to follow the position
number of each amino acid. Another symbol marks every cleavage site of the protein, and the percentage
probability and cleavage rate are given for each site. A marked amino acid means that there is a cleavage site
between it and the neighbouring amino acid in the C-terminal direction.
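The block-of-10 layout described above can be reproduced with a few lines of Python. The 10-residue blocks with 60 residues per line are our assumed layout and `format_blocks` is a hypothetical helper; the web pages may differ in detail.

```python
def format_blocks(seq: str, block: int = 10, per_line: int = 60) -> str:
    """Print seq in blocks of `block` residues, prefixed by the 1-based start position."""
    lines = []
    for line_start in range(0, len(seq), per_line):
        chunk = seq[line_start:line_start + per_line]
        blocks = [chunk[i:i + block] for i in range(0, len(chunk), block)]
        lines.append(f"{line_start + 1:>6} " + " ".join(blocks))
    return "\n".join(lines)


print(format_blocks("ACDEFGHIKLMNPQRSTVWY" * 4))
```

The number at the start of each line is the position of its first residue, so any residue's position can be read off by counting blocks.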
Compute pI / Mw:
The protein sequence is pasted into the text box. Then the "Click here to Compute pI / Mw" button is
clicked without changing any parameters.
Protein pI is calculated using the pK values of amino acids described in Bjellqvist et al., which were defined by
examining polypeptide migration between pH 4.5 and 7.3 in an immobilised pH gradient gel environment with
9.2 M and 9.8 M urea at 15 °C or 25 °C. Prediction of pI for highly basic proteins has not yet been studied, and
current Compute pI/Mw predictions may not be adequate for this purpose. The buffer capacity of a protein
affects the accuracy of its predicted pI, with poor buffer capacity leading to greater error in prediction. Because
of this, pI predictions for small proteins can be problematic. Protein Mw is calculated by adding the average
isotopic masses of the amino acids in the protein and the average isotopic mass of one water molecule.
Molecular weight values are given in daltons (Da). This program does not account for the effects of
post-translational modifications, so modified proteins on a 2-D gel may migrate to a position quite different
from that predicted. Glycosylation in particular can affect protein migration in both the pI and Mw dimensions.
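The Mw rule described above (sum of average residue masses plus one water molecule) can be sketched in Python. The masses below are the commonly tabulated average residue masses in daltons; we list them as an assumption, so check them against the ExPASy documentation before reuse. As in the tool, post-translational modifications are ignored, and `molecular_weight` is a hypothetical name.

```python
# Average residue masses in Da (residue = amino acid minus one water), assumed values.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # average mass of one water molecule, Da


def molecular_weight(seq: str) -> float:
    """Sum of average residue masses plus one water (no modifications)."""
    seq = seq.upper()
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER


print(round(molecular_weight("G"), 2))  # 75.07, the mass of free glycine
```

The single water term accounts for the free N-terminal H and C-terminal OH that the residue masses leave out.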
On the result page, the isoelectric point and the molecular weight of the protein sequence are given. In
addition, the protein sequence is displayed in blocks of 10 amino acids, which makes its length easier to follow.
According to the results, the protein is 392 amino acids long, its molecular weight is 43307.49 Da and its
isoelectric point is 4.50.
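The pI is the pH at which the net charge of the protein is zero, so it can be found by bisection on the net charge. The sketch below assumes simple Henderson-Hasselbalch charges and uses illustrative pK values chosen by us; they are not the Bjellqvist et al. values, so its results will not reproduce the 4.50 reported by Compute pI/Mw. `net_charge` and `isoelectric_point` are hypothetical names.

```python
# Illustrative pK values (assumed for this sketch; NOT the Bjellqvist set).
PK_POSITIVE = {"Nterm": 7.5, "K": 10.0, "R": 12.0, "H": 6.0}
PK_NEGATIVE = {"Cterm": 3.5, "D": 4.0, "E": 4.4, "C": 8.5, "Y": 10.0}


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of a peptide at the given pH."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1  # one free amino end and one carboxyl end
    positive = sum(counts[g] / (1 + 10 ** (ph - pk)) for g, pk in PK_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pk - ph)) for g, pk in PK_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisection on pH: net charge falls as pH rises, so pI is where it crosses 0."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid  # still positive, so the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


print(round(isoelectric_point("DDEEK"), 2))
```

Bisection is safe here because the net charge decreases monotonically with pH, so there is exactly one zero crossing.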
ProtParam:
The protein sequence is pasted into the text box. Then the "Compute Parameters" button is clicked
without changing any parameters.
The protein sequence is displayed in blocks of 10 amino acids, which makes its length easier to follow.
The ProtParam result gives the total number of amino acids, the molecular weight of the protein, the theoretical
isoelectric point and the amino acid composition (the percentage of each amino acid in the protein). The number
of amino acids is 392, the molecular weight is 43307.49 Da and the theoretical isoelectric point is 4.50. The result
also gives the number of negatively and positively charged residues: the total number of negatively charged
residues (Asp + Glu) is 60 and the total number of positively charged residues (Arg + Lys) is 29.
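ProtParam counts Asp and Glu as negatively charged residues and Arg and Lys as positively charged residues. A minimal Python sketch of these counts and of the amino acid composition (percentage of each residue) follows; `charged_residues` and `composition` are hypothetical names.

```python
def charged_residues(seq: str) -> tuple[int, int]:
    """(negative, positive) residue counts in the ProtParam sense."""
    seq = seq.upper()
    negative = seq.count("D") + seq.count("E")  # Asp + Glu
    positive = seq.count("R") + seq.count("K")  # Arg + Lys (His is not counted)
    return negative, positive


def composition(seq: str) -> dict[str, float]:
    """Percentage of each amino acid present in seq."""
    seq = seq.upper()
    return {aa: 100 * seq.count(aa) / len(seq) for aa in sorted(set(seq))}


print(charged_residues("MDEKRHAE"))  # (3, 2)
print(composition("AAGT"))  # {'A': 50.0, 'G': 25.0, 'T': 25.0}
```

An excess of negative over positive residues, as in this protein (60 versus 29), is consistent with the acidic pI of 4.50.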
This ProtParam result also gives the atomic composition, the chemical formula of the protein and its total
number of atoms. The extinction coefficient indicates how much light a protein absorbs at a certain wavelength
(280 nm in ProtParam). From the absorbance measured at this wavelength, the protein concentration can be
calculated, which is important for biochemical experiments. The half-life is a prediction of the time it takes
for half of the amount of a protein in a cell to disappear after its synthesis. The same protein can have
different half-lives in different organisms. The estimated half-life is 30 hours (mammalian reticulocytes, in
vitro), 20 hours (yeast, in vivo) and 10 hours (Escherichia coli, in vivo). The instability index provides an
estimate of the stability of the protein in a test tube. A protein whose instability index is smaller than 40 is
predicted to be stable; a value above 40 predicts that the protein may be unstable. Our protein has an
instability index of 45.44, so it is predicted to be unstable. The aliphatic index of a protein is defined as the
relative volume occupied by aliphatic side chains (alanine, valine, isoleucine and leucine). It may be regarded
as a positive factor for the increased thermostability of globular proteins. Here an aliphatic index of 80 or
above is taken to indicate high thermostability. Our protein has an aliphatic index of 71.71, so its
thermostability is considered low.
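Two of the parameters above follow simple formulas, sketched here in Python. The sketch assumes the ProtParam-style extinction coefficient at 280 nm in water (5500 per Trp, 1490 per Tyr and 125 per cystine, in M^-1 cm^-1), the Beer-Lambert law for concentration, and the aliphatic index formula X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)) with X in mole percent. The instability index and half-life need lookup tables and are not shown; all function names are hypothetical.

```python
EXT_TRP, EXT_TYR, EXT_CYSTINE = 5500, 1490, 125  # M^-1 cm^-1 at 280 nm, in water


def extinction_coefficient(seq: str, all_cys_form_cystines: bool = True) -> int:
    """ProtParam-style molar extinction coefficient at 280 nm."""
    seq = seq.upper()
    eps = EXT_TRP * seq.count("W") + EXT_TYR * seq.count("Y")
    if all_cys_form_cystines:
        eps += EXT_CYSTINE * (seq.count("C") // 2)  # two Cys make one cystine
    return eps


def molar_concentration(a280: float, eps: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert law: A = eps * c * l, so c = A / (eps * l)."""
    return a280 / (eps * path_cm)


def aliphatic_index(seq: str) -> float:
    """X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)), with X in mole percent."""
    seq = seq.upper()
    x = {aa: 100 * seq.count(aa) / len(seq) for aa in "AVIL"}
    return x["A"] + 2.9 * x["V"] + 3.9 * (x["I"] + x["L"])


print(extinction_coefficient("WYCC"))         # 7115
print(extinction_coefficient("WYCC", False))  # 6990
```

The two extinction values correspond to the two cases ProtParam reports: all cysteines paired as cystines, or all reduced.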
TMHMM:
Each of the sequences (eukaryotic protein sequences 1 and 2 and prokaryotic protein sequences 1 and 2) is
pasted into the text box in turn, and the "Submit" button is clicked for each one.
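The length of the protein sequence is 375 amino acids. The expected number of amino acids in transmembrane
helices (Exp number of AAs in TMHs) is 151.67665; since this is larger than 18, the protein is very likely to be a
transmembrane protein. The expected number of amino acids in transmembrane helices within the first 60 amino
acids (Exp number, first 60 AAs) is 1.90502. TMHMM warns that an N-terminal helix could be a signal peptide only
when this number is more than a few, so that warning does not apply here. In the plot, red marks transmembrane
helices, blue marks inside regions and pink marks outside regions. This protein is predicted to have 7
transmembrane helices.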
The length of the protein sequence is 726 amino acids. The Exp number of AAs in TMHs is 44.60233; since this is
larger than 18, the protein is very likely to be a transmembrane protein. The Exp number, first 60 AAs is
21.58614; because this number is well above a few, the predicted transmembrane helix in the N-terminus could be
a signal peptide. In the plot, red marks transmembrane helices, blue marks inside regions and pink marks outside
regions. This protein is predicted to have 2 transmembrane helices.
The length of the protein sequence is 388 amino acids. The Exp number of AAs in TMHs is 207.19207; since this is
larger than 18, the protein is very likely to be a transmembrane protein. The Exp number, first 60 AAs is
28.93546; because this number is well above a few, the predicted transmembrane helix in the N-terminus could be
a signal peptide. In the plot, red marks transmembrane helices, blue marks inside regions and pink marks outside
regions. This protein is predicted to have 9 transmembrane helices.
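The two TMHMM rules of thumb used in this section can be applied to the three results with a small Python sketch. The 18-residue threshold comes from the TMHMM output description; TMHMM only says the N-terminal warning applies when the first-60 value is "more than a few", so the cut-off of 10 used here is our assumption. `interpret_tmhmm` is a hypothetical helper, not part of TMHMM, and the sequences are labelled by length.

```python
def interpret_tmhmm(exp_aa: float, first60: float, few: float = 10.0) -> list[str]:
    """Apply TMHMM rules of thumb; `few` is an assumed cut-off for 'more than a few'."""
    notes = []
    if exp_aa > 18:
        notes.append("likely transmembrane protein (or has a signal peptide)")
    if first60 > few:
        notes.append("N-terminal helix could be a signal peptide")
    return notes


# (Exp number of AAs in TMHs, Exp number in first 60 AAs, predicted helices), from this report.
results = {
    "375 aa": (151.67665, 1.90502, 7),
    "726 aa": (44.60233, 21.58614, 2),
    "388 aa": (207.19207, 28.93546, 9),
}
for name, (exp_aa, first60, helices) in results.items():
    print(name, helices, "helices:", "; ".join(interpret_tmhmm(exp_aa, first60)))
```

All three sequences pass the 18-residue test; only the 726 aa and 388 aa sequences trigger the signal peptide note.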
SignalP:
The eukaryotic and prokaryotic protein sequences are pasted into the text box one at a time, and the "Submit"
button is clicked for each. If the sequence belongs to a eukaryotic organism, the eukaryotic option is selected; if
it belongs to a prokaryotic organism, the Gram-negative or Gram-positive option is selected accordingly. The
sequence analysed here belongs to a Gram-negative prokaryotic organism.
In the SignalP graph, the horizontal axis shows the sequence positions, labelled in steps of 10 amino acids, and
the vertical axis shows the scores. Three different scores are plotted: the C-score, the S-score and the Y-score.
C-score (raw cleavage site score): The output from the CS networks, which are trained to distinguish signal
peptide cleavage sites from everything else. The C-score is trained to be high at the position immediately after
the cleavage site (the first residue in the mature protein).
S-score (signal peptide score): The output from the SP networks, which are trained to distinguish positions
within signal peptides from positions in the mature part of the proteins and from proteins without signal
peptides.
Y-score (combined cleavage site score): A combination (geometric average) of the C-score and the slope of the
S-score, resulting in a better cleavage site prediction than the raw C-score alone. This is due to the fact that
multiple high-peaking C-scores can be found in one sequence, where only one is the true cleavage site. The
Y-score distinguishes between C-score peaks by choosing the one where the slope of the S-score is steep. If the
C-score rises where the S-score falls in the graph, a signal peptide is predicted in that region.
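The Y-score described above can be sketched as follows. This is an illustration under assumptions, not SignalP's implementation: the slope of the S-score is approximated as the mean S-score over `d` positions before a residue minus the mean over `d` positions starting at it (with `d = 2` chosen arbitrarily), negative slopes are set to zero, and the C- and S-score lists are invented. `y_scores` and `predicted_cleavage_site` are hypothetical names.

```python
import math


def y_scores(c: list[float], s: list[float], d: int = 2) -> list[float]:
    """Y_i = sqrt(C_i * dS_i), with dS_i a simple S-score drop (assumed form)."""
    if len(c) != len(s):
        raise ValueError("C and S score lists must have equal length")
    n = len(c)
    y = [0.0] * n
    for i in range(d, n - d + 1):  # need d positions on each side
        before = sum(s[i - d:i]) / d  # mean S over the d residues before i
        after = sum(s[i:i + d]) / d   # mean S over residue i and the next d-1
        drop = max(before - after, 0.0)  # only a falling S-score counts
        y[i] = math.sqrt(c[i] * drop)
    return y


def predicted_cleavage_site(c: list[float], s: list[float], d: int = 2) -> tuple[int, float]:
    """1-based position of the max Y-score (first residue of the mature protein)."""
    y = y_scores(c, s, d)
    best = max(range(len(y)), key=y.__getitem__)
    return best + 1, y[best]


# Invented scores: S drops after residue 5; C peaks at residues 3 and 6.
s = [0.9] * 5 + [0.1] * 5
c = [0.1, 0.1, 0.8, 0.1, 0.1, 0.8, 0.1, 0.1, 0.1, 0.1]
site, score = predicted_cleavage_site(c, s)
print(site)  # 6
```

The C-score peak at residue 3 is rejected because the S-score is flat there, which is exactly the ambiguity the Y-score is meant to resolve.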
The same procedure is repeated for the eukaryotic protein sequence. This time the eukaryotic option is selected,
because the sequence belongs to a eukaryotic organism.
The graph shows the same three scores (C-score, S-score and Y-score) as above. In this case the C-score does not
rise where the S-score falls, and the C, S and Y scores all stay close to flat lines. Therefore, no signal peptide is
predicted for this sequence.
https://link.springer.com/referenceworkentry/10.1007%2F978-3-642-11274-4_819
http://www.cbs.dtu.dk/services/SignalP/
https://web.expasy.org/protparam/
http://web.expasy.org/peptide_cutter/
https://web.expasy.org/compute_pi/
http://www.cbs.dtu.dk/services/TMHMM/