
2017

Bioinformatics
Homework - 4
Using Web Application Of
ProtParam, PeptideCutter,
Compute pI / Mw, TMHMM,
SignalP
GEBZE TECHNICAL UNIVERSITY

EBRU AKHARMAN
142204026
22.12.2017

Using Web Application Of ProtParam, PeptideCutter, Compute pI / Mw, TMHMM, SignalP
HW4
Ebru AKHARMAN - 142204026, Gebze Technical University, Turkey
AIM:
Worldwide, plant protein contributes substantially as a food resource because it contains essential amino
acids for meeting human physiological requirements. In addition, many versatile plant proteins are produced
with the molecular tools of biotechnology and used as medicinal agents. Proteins can be obtained from plant,
animal and microorganism cells; abundant, economical proteins can be obtained from plant seeds. The following
characteristics are unique to each protein: amino acid composition, sequence, subunit structures, size, shape,
net charge, isoelectric point, solubility, heat stability and hydrophobicity. Knowing the physical and chemical
properties of a protein contributes greatly to any work done with it. The purpose of this assignment is to
determine the properties of the protein or nucleotide sequence with web applications such as ProtParam,
PeptideCutter, Compute pI / Mw, TMHMM and SignalP.
INTRODUCTION:

Compute pI / Mw:

Compute pI / Mw is a tool that calculates the isoelectric point and molecular weight of an input sequence.
The sequence can be input in the FASTA format; the output is the pI and molecular weight for the entire
length of the sequence.

The isoelectric point is the pH at which a molecule or surface carries no net electrical charge. Many
molecules are zwitterions, containing both positive and negative charges. The net charge on the molecule is
governed by the pH of the surrounding medium. The pI is the pH value at which the molecule carries no net
electrical charge and thus the negative and positive charges are equal.

The molecular weight indicates the mass of the protein or nucleotide sequence in Daltons.

ProtParam:

ProtParam computes various physico-chemical properties that can be deduced from a protein sequence. No
additional information is required about the protein under consideration. The protein can either be specified
as a Swiss-Prot/TrEMBL accession number or ID, or in the form of a raw sequence. White space and numbers are
ignored. If you provide the accession number of a Swiss-Prot/TrEMBL entry, you will be prompted with an
intermediary page that allows you to select the portion of the sequence on which you would like to perform the
analysis. The choice includes a selection of mature chains or peptides and domains from the Swiss-Prot feature
table (which can be chosen by clicking on the positions), as well as the possibility to enter start and end
positions in two boxes. By default (i.e. if you leave the two boxes empty) the complete sequence will be
analyzed.

The parameters computed by ProtParam include the molecular weight, theoretical pI, amino acid composition,
atomic composition, extinction coefficient, estimated half-life, instability index, aliphatic index and grand
average of hydropathicity (GRAVY).

The extinction coefficient indicates how much light a protein absorbs at a certain wavelength. Expressed as a
molar extinction coefficient (ε), it allows for estimation of the molar concentration of a solution from its
measured absorbance (A / ε = molar concentration). Extinction coefficients for proteins are generally reported
with respect to an absorbance measured at or near a wavelength of 280 nm. Although the absorption maxima for
certain proteins may be at other wavelengths, 280 nm is favored because proteins absorb strongly there while
other substances commonly found in protein solutions do not.

The half-life is a prediction of the time it takes for half of the amount of protein in a cell to disappear
after its synthesis in the cell. ProtParam relies on the "N-end rule", which relates the half-life of a protein
to the identity of its N-terminal residue; the prediction is given for 3 model organisms (human, yeast and
E. coli). The N-end rule originated from the observation that the identity of the N-terminal residue of a
protein plays an important role in determining its stability in vivo. The rule was established from experiments
that explored the metabolic fate of artificial beta-galactosidase proteins with different N-terminal amino
acids engineered by site-directed mutagenesis. The beta-gal proteins thus designed have strikingly different
half-lives in vivo, from more than 100 hours to less than 2 minutes, depending on the nature of the amino acid
at the amino terminus and on the experimental model (yeast in vivo; mammalian reticulocytes in vitro;
Escherichia coli in vivo). In addition, it has been shown that in eukaryotes, the association of a
destabilizing N-terminal residue and of an internal lysine targets the protein to ubiquitin-mediated
proteolytic degradation. Note that the program gives an estimation of the protein half-life and is not
applicable to N-terminally modified proteins.

The instability index provides an estimate of the stability of your protein in a test tube. Statistical
analysis of 12 unstable and 32 stable proteins has revealed that there are certain dipeptides, the occurrence of
which is significantly different in the unstable proteins compared with those in the stable ones. The authors
of this method have assigned a weight value of instability to each of the 400 different dipeptides (DIWV). A
protein whose instability index is smaller than 40 is predicted as stable; a value above 40 predicts that the
protein may be unstable.

The aliphatic index of a protein is defined as the relative volume occupied by aliphatic side chains (alanine,
valine, isoleucine, and leucine). It may be regarded as a positive factor for the increase of thermostability
of globular proteins.

The GRAVY value for a peptide or protein is calculated as the sum of hydropathy values of all the amino acids,
divided by the number of residues in the sequence.

PeptideCutter:

PeptideCutter searches a protein sequence from the SWISS-PROT and/or TrEMBL databases or a user-entered protein
sequence for protease cleavage sites. Single proteases and chemicals, a selection or the whole list of
proteases and chemicals can be used. Different forms of output of the results are available: tables of cleavage
sites either grouped alphabetically according to enzyme names or sequentially according to the amino acid
number. A third option for output is a map of cleavage sites. The sequence and the cleavage sites mapped onto
it are grouped in blocks, the size of which can be chosen by the user to provide a convenient form of
print-out.

All or a group of favourite enzymes and chemicals can be selected. Most of the cleavage rules for individual
enzymes were deduced from specificity data summed up by Keil (1992). Only in a few cases did specificity data
allow the establishment of more sophisticated models, e.g. for trypsin or chymotrypsin. In these cases, the
cleavage probability of the individual sites is added to the results.

TMHMM:

This server is for prediction of transmembrane helices in proteins. The program takes proteins in FASTA
format. It recognizes the 20 amino acids. A transmembrane protein (TP) is a type of integral membrane protein
that spans the entirety of the biological membrane to which it is permanently attached. Many transmembrane
proteins function as gateways to permit the transport of specific substances across the biological membrane.
They frequently undergo significant conformational changes to move a substance through the membrane.
Transmembrane proteins are polytopic proteins that aggregate and precipitate in water. They require detergents
or nonpolar solvents for extraction, although some of them (beta-barrels) can also be extracted using
denaturing agents.

SignalP:

The SignalP server predicts the presence and location of signal peptide cleavage sites in amino acid sequences
from different organisms: Gram-positive prokaryotes, Gram-negative prokaryotes, and eukaryotes. The method
incorporates a prediction of cleavage sites and a signal peptide/non-signal peptide prediction based on a
combination of several artificial neural networks.


Image 1: Protein Sequence

First, the organism that the given protein sequence belongs to and the function of the protein are determined.
To do this, the protein sequence is identified with BLAST, using the blastp algorithm. The following steps are
followed.

STEP 1:

Image 2: Blast Page

The blastp algorithm is selected on the BLAST page and the protein sequence is pasted into the text box. Then
the "BLAST" button is clicked.

STEP 2:

Image 3: Blast Results

At the end of the BLAST search, the hit with the lowest "E Value" and the highest "Ident" score is selected.
The most suitable hit is "Putative serine protease with signal anchor [Ixodes scapularis]".

What does this protein sequence do?
Trypsin is a serine protease from the PA clan superfamily, found in the digestive system of many vertebrates,
where it hydrolyses proteins. Trypsin is formed in the small intestine when its proenzyme form,
the trypsinogen produced by the pancreas, is activated. Trypsin cleaves peptide chains mainly at
the carboxyl side of the amino acids lysine or arginine, except when either is followed by proline. It is used for
numerous biotechnological processes. The process is commonly referred to as
trypsin proteolysis or trypsinisation, and proteins that have been digested/treated with trypsin are said to
have been trypsinized.
In the duodenum, trypsin catalyzes the hydrolysis of peptide bonds, breaking down proteins into smaller
peptides. The peptide products are then further hydrolyzed into amino acids via other proteases, rendering
them available for absorption into the blood stream. Tryptic digestion is a necessary step in protein absorption
as proteins are generally too large to be absorbed through the lining of the small intestine.
Trypsin is produced as the inactive zymogen trypsinogen in the pancreas. When the pancreas is stimulated
by cholecystokinin, it is then secreted into the first part of the small intestine (the duodenum) via
the pancreatic duct. Once in the small intestine, the enzyme enteropeptidase activates trypsinogen into trypsin
by proteolytic cleavage. Auto catalysis does not happen with trypsin, as trypsinogen is a poor substrate,
therefore enzymatic damage to the pancreas is avoided.
The enzymatic mechanism is similar to that of other serine proteases. These enzymes contain a catalytic
triad consisting of histidine-57, aspartate-102, and serine-195. These three residues form a charge relay that
increases nucleophilicity of the active site serine. This is achieved by modifying the electrostatic environment
of the serine. The enzymatic reaction that trypsin catalyzes is thermodynamically favorable but requires
significant activation energy (it is "kinetically unfavorable"). In addition, trypsin contains an "oxyanion hole"
formed by the backbone amide hydrogen atoms of Gly-193 and Ser-195, which serves to stabilize the
developing negative charge on the carbonyl oxygen atom of the cleaved amides.
The aspartate residue (Asp 189) located in the catalytic pocket (S1) of trypsin is responsible for attracting and
stabilizing positively charged lysine and/or arginine, and is, thus, responsible for the specificity of the enzyme.
This means that trypsin predominantly cleaves proteins at the carboxyl side (or "C-terminal side") of the amino
acids lysine and arginine except when either is bound to a C-terminal proline, although large-scale mass
spectrometry data suggest cleavage occurs even with proline. Trypsin is considered an endopeptidase, i.e., the
cleavage occurs within the polypeptide chain rather than at the terminal amino acids located at the ends
of polypeptides.
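
The cleavage rule described above (cut on the carboxyl side of lysine or arginine, except when the next residue
is proline) can be sketched in a few lines of Python. This is only a minimal illustration of that simple rule;
it is not the probability model that PeptideCutter applies, and the function name `trypsin_digest` and the toy
peptide are hypothetical.

```python
def trypsin_digest(sequence):
    """Split a sequence after K or R unless the following residue is P."""
    sequence = sequence.upper()
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        # Cleave on the C-terminal side of K/R, but not before proline.
        if residue in "KR" and not is_last and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    # Whatever remains after the last cleavage site is the C-terminal peptide.
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


# Hypothetical toy peptide, not the protein analysed in this report.
fragments = trypsin_digest("AKRPGKLL")
print(fragments)
```

The R at position 3 of the toy peptide is followed by proline, so under this rule it is not treated as a
cleavage site.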
Examining this protein with different web applications:

PeptideCutter:

Image 4: Peptide Cutter Web Application

The protein sequence is pasted into the text box.



Image 5: Required Arrangements

The settings required to cut the protein are made. Trypsin and chymotrypsin are selected, and the minimum
cleavage probability is set to 80 %. Then the "Perform" button is clicked. The results below are for chymotrypsin.

Image 6: Length of Protein Sequence

The PeptideCutter web application reports the length of the protein sequence and displays the sequence in
blocks of 10 amino acids.


Image 7: The Cleavage Sites For Chymotrypsin

This table lists the name of the enzyme that cuts the protein, the number of cleavage sites and their
positions. In addition, the cleavage probabilities of the sites are given.

Image 8: Other Results

Image 8 lists sequentially all cleavage sites in the sequence and the respective cleaving enzymes from the N-
to the C-terminus. In addition, a list of the peptides resulting from the digestion is provided, with their
lengths in amino acids and their molecular weights in Daltons. The amino acid sequence of each resulting
peptide is given, together with the position of its cleavage site.

Image 9: Peptide Maps of the Protein with Chymotrypsin

Image 10: Peptide Maps of the Protein with Chymotrypsin

Image 11: Peptide Maps of the Protein with Chymotrypsin

“ ” This sign marks every 10th amino acid of the protein, which makes it easier to follow the position numbers
of the amino acids. “ ” This sign marks each cleavage site of the protein: there is a cleavage site between the
marked amino acid and the neighbouring amino acid in the C-terminal direction. The cleavage probability in
percent is given at each cleavage site.


Image 12: The Cleavage Sites For Trypsin

This table lists the name of the enzyme that cuts the protein, the number of cleavage sites and their
positions. In addition, the cleavage probabilities of the sites are given.

Image 13: Other Results For Trypsin

Image 13 lists sequentially all cleavage sites in the sequence and the respective cleaving enzymes from the N-
to the C-terminus. In addition, a list of the peptides resulting from the digestion is provided, with their
lengths in amino acids and their molecular weights in Daltons. The amino acid sequence of each resulting
peptide is given, together with the position of its cleavage site.


Image 14: Peptide Maps of the Protein with Trypsin

Image 15: Peptide Maps of the Protein with Trypsin


Image 16: Peptide Maps of the Protein with Trypsin

“ ” This sign marks every 10th amino acid of the protein, which makes it easier to follow the position numbers
of the amino acids. “ ” This sign marks each cleavage site of the protein: there is a cleavage site between the
marked amino acid and the neighbouring amino acid in the C-terminal direction. The cleavage probability in
percent is given at each cleavage site.

Compute pI / Mw:

Image 17: Compute pI / Mw Application

The protein sequence is pasted into the text box. Then the "Click here to Compute pI / Mw" button is clicked
without changing any parameters.


Image 18: The Result of Compute pI / Mw

Protein pI is calculated using pK values of amino acids described in Bjellqvist et al., which were defined by
examining polypeptide migration between pH 4.5 to 7.3 in an immobilised pH gradient gel environment with
9.2M and 9.8M urea at 15°C or 25°C. Prediction of protein pI for highly basic proteins is yet to be studied and it
is possible that current Compute pI/Mw predictions may not be adequate for this purpose. The buffer capacity
of a protein will affect the accuracy of its predicted pI, with poor buffer capacity leading to greater error in
prediction. Because of this, pI predictions for small proteins can be problematic. Protein Mw is calculated by the
addition of average isotopic masses of amino acids in the protein and the average isotopic mass of one water
molecule. Molecular weight values are given in Dalton (Da). This program does not account for the effects of
post-translational modifications, thus modified proteins on a 2-D gel may migrate to a position quite different
to that predicted. Protein glycosylation in particular can affect protein migration in both pI and Mw dimensions.
The result page reports the isoelectric point and the molecular weight of the protein sequence. In addition,
the protein sequence is shown in blocks of 10 amino acids, which makes it easier to follow the sequence
length. According to the results, the length of the protein sequence is 392 amino acids, the molecular weight
of the protein is 43307.49 Da and the isoelectric point of the protein is 4.50.
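
The molecular weight rule described above (the sum of the average masses of the amino acid residues plus one
water molecule) can be sketched as follows. The table holds commonly tabulated average residue masses in
Daltons; the function name `molecular_weight` and the example inputs are hypothetical, and small rounding
differences from the Compute pI / Mw output are possible.

```python
# Average masses (Da) of amino acid residues (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01524  # average mass of one water molecule (Da)


def molecular_weight(sequence):
    """Return the average molecular weight (Da) of an unmodified protein."""
    # Drop white space, as the web tools do; unknown letters raise KeyError.
    seq = "".join(sequence.split()).upper()
    if not seq:
        return 0.0
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


# Hypothetical toy input: the dipeptide glycylglycine.
print(round(molecular_weight("GG"), 2))
```

As noted above, such a calculation ignores post-translational modifications.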

ProtParam:

Image 19: ProtParam Web Application

The protein sequence is pasted into the text box. Then the "Compute Parameters" button is clicked without
changing any parameters.

Image 20: The Result 1 of ProtParam

The protein sequence is divided into 10 amino acid groups. This makes it easier to follow the sequence length.

Image 21: The Result 2 of ProtParam

This ProtParam result contains the total number of amino acids, the molecular weight of the protein, the
theoretical isoelectric point and the amino acid composition of the protein. The number of amino acids is 392,
the molecular weight is 43307.49 Da and the theoretical isoelectric point is 4.50. The result also indicates
the number of negatively and positively charged residues: the total number of negatively charged residues is
60 and the total number of positively charged residues is 29.


Image 23: The Result 3 of ProtParam

This ProtParam result also gives the atomic composition, the chemical formula and the total number of atoms of
the protein. The extinction coefficient indicates how much light a protein absorbs at a certain wavelength.
From the measured absorbance and the extinction coefficient, the protein concentration can be calculated,
which is important for biochemical experiments. The half-life is a prediction of the time it takes for half of
the amount of protein in a cell to disappear after its synthesis in the cell. The results show that the
half-life of the same protein differs between organisms: the estimated half-life is 30 hours (mammalian
reticulocytes, in vitro), 20 hours (yeast, in vivo) and 10 hours (Escherichia coli, in vivo). The instability
index provides an estimate of the stability of the protein in a test tube. A protein whose instability index
is smaller than 40 is predicted as stable; a value above 40 predicts that the protein may be unstable. Our
protein has an instability index of 45.44, so it is predicted to be unstable. The aliphatic index of a protein
is defined as the relative volume occupied by aliphatic side chains (alanine, valine, isoleucine, and leucine).
It may be regarded as a positive factor for the increase of thermostability of globular proteins. An aliphatic
index of 80 or above is taken here to indicate high thermostability. Our protein has an aliphatic index of
71.71, so its thermostability is considered comparatively low.
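
Two of the indices discussed above can be reproduced with short calculations. The sketch below assumes the
Kyte-Doolittle hydropathy scale for GRAVY and the commonly cited aliphatic index formula X(Ala) + 2.9 · X(Val) +
3.9 · (X(Ile) + X(Leu)), where X is the mole percent of each residue. The function names and the toy sequence
are hypothetical; the classification helper simply applies the threshold of 40 quoted above.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Sum of hydropathy values divided by the number of residues."""
    seq = sequence.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(sequence):
    """Relative volume occupied by Ala, Val, Ile and Leu side chains."""
    seq = sequence.upper()

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def stability_class(instability_index):
    """Below 40 is predicted stable; otherwise possibly unstable."""
    return "stable" if instability_index < 40 else "unstable"


# Hypothetical toy sequence; 45.44 is the instability index reported above.
print(gravy("AVIL"), aliphatic_index("AVIL"), stability_class(45.44))
```

How a value of exactly 40 should be classified is not stated in the text; the helper treats it as unstable,
which is an assumption.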

TMHMM:

Image 24: Eukaryotic Protein Sequence 1

Image 25: TMHMM Web Application

Eukaryotic protein sequences 1 and 2 and prokaryotic protein sequences 1 and 2 are each pasted into the text
box, and the "Submit" button is clicked. This process is repeated for every sequence.

This field shows the length of the "TMhelix", the length of the "inside" and the length of the "outside".

Image 26: The Result of Eukaryotic Protein Sequence 1

The first few lines give some statistics:

• Length: The length of the protein sequence.
• Number of predicted TMHs: The number of predicted transmembrane helices.
• Exp number of AAs in TMHs: The expected number of amino acids in transmembrane helices.
• Exp number, first 60 AAs: The expected number of amino acids in transmembrane helices in the first 60
  amino acids of the protein.
• Total prob of N-in: The total probability that the N-term is on the cytoplasmic side of the membrane.
• POSSIBLE N-term signal sequence: A warning that is produced when "Exp number, first 60 AAs" is larger
  than 10.

The length of the protein sequence is 770. Exp number of AAs in TMHs is 22.72525; because this value is larger
than 18, the protein is very likely to be a transmembrane protein. Exp number, first 60 AAs is 0.0027; because
this value is smaller than 1, no transmembrane helix is expected in the N-terminal region that could be a
signal peptide. In the plot, the red line indicates transmembrane regions, the blue line inside and the pink
line outside. This protein has one transmembrane helix.
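
The two rules of thumb used in this interpretation can be written down as a small helper. This is a sketch
under the thresholds quoted in this section (18 for "Exp number of AAs in TMHs" and 10 for the N-terminal
warning); the function name and the dictionary keys are hypothetical and are not part of the TMHMM output.

```python
def interpret_tmhmm_statistics(exp_aas_in_tmhs, exp_first_60_aas,
                               tm_threshold=18.0, signal_threshold=10.0):
    """Apply the two rules of thumb quoted above to TMHMM summary statistics."""
    return {
        # More than 18 expected TM residues: very likely a transmembrane protein.
        "likely_transmembrane": exp_aas_in_tmhs > tm_threshold,
        # More than 10 expected TM residues in the first 60 positions:
        # the N-terminal helix may actually be a signal sequence.
        "possible_n_term_signal": exp_first_60_aas > signal_threshold,
    }


# Values reported above for eukaryotic protein sequence 1.
result = interpret_tmhmm_statistics(22.72525, 0.0027)
print(result)
```

Both thresholds are keyword arguments so that a stricter cut-off for the N-terminal check can be tried without
changing the function.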

Image 27: Eukaryotic Protein Sequence 2

This field shows the length of the "TMhelix", the length of the "inside" and the length of the "outside".

Image 28: The Result of Eukaryotic Protein Sequence 2

The length of the protein sequence is 375. Exp number of AAs in TMHs is 151.67665; because this value is larger
than 18, the protein is very likely to be a transmembrane protein. Exp number, first 60 AAs is 1.90502; because
this value is larger than 1, a predicted transmembrane helix in the N-terminal region could be a signal
peptide, although the value is below the 10 at which TMHMM prints its "POSSIBLE N-term signal sequence"
warning. In the plot, the red line indicates transmembrane regions, the blue line inside and the pink line
outside. This protein has 7 transmembrane helices.


Image 29: Prokaryotic Protein Sequence 1

This field shows the length of the "TMhelix", the length of the "inside" and the length of the "outside".

Image 30: The Result of Prokaryotic Protein Sequence 1

The length of the protein sequence is 726. Exp number of AAs in TMHs is 44.60233; because this value is larger
than 18, the protein is very likely to be a transmembrane protein. Exp number, first 60 AAs is 21.58614; because
this value is well above 1 (and above the warning threshold of 10), a predicted transmembrane helix in the
N-terminal region could be a signal peptide. In the plot, the red line indicates transmembrane regions, the
blue line inside and the pink line outside. This protein has 2 transmembrane helices.


Image 31: Prokaryotic Protein Sequence 2

This field shows the length of the "TMhelix", the length of the "inside" and the length of the "outside".

Image 32: The Result of Prokaryotic Protein Sequence 2

The length of the protein sequence is 388. Exp number of AAs in TMHs is 207.19207; because this value is larger
than 18, the protein is very likely to be a transmembrane protein. Exp number, first 60 AAs is 28.93546; because
this value is well above 1 (and above the warning threshold of 10), a predicted transmembrane helix in the
N-terminal region could be a signal peptide. In the plot, the red line indicates transmembrane regions, the
blue line inside and the pink line outside. This protein has 9 transmembrane helices.

SignalP:

Image 33: Prokaryotic Protein Sequence for SignalP Web Application

Image 34: SignalP Page

The protein sequence is pasted into the text box and the "Submit" button is clicked; this process is repeated
for each sequence. If the sequence belongs to a eukaryotic organism, the eukaryotic option is selected. If it
belongs to a prokaryotic organism, the Gram-negative or Gram-positive option is selected accordingly. The
protein sequence selected here belongs to a Gram-negative prokaryotic organism.


Image 35: Signal Graph for Prokaryotic Protein Sequence

The horizontal axis of the SignalP graph shows the amino acid positions, labelled every 10 residues, and the
vertical axis shows the scores. Three different scores are plotted: the C-score, the Y-score and the S-score.

C-score (raw cleavage site score): The output from the CS networks, which are trained to distinguish signal
peptide cleavage sites from everything else. The C-score is trained to be high at the position immediately after
the cleavage site (the first residue in the mature protein).

S-score (signal peptide score): The output from the SP networks, which are trained to distinguish positions
within signal peptides from positions in the mature part of the proteins and from proteins without signal
peptides.

Y-score (combined cleavage site score): A combination (geometric average) of the C-score and the slope of the
S-score, resulting in a better cleavage site prediction than the raw C-score alone. This is due to the fact that
multiple high-peaking C-scores can be found in one sequence, where only one is the true cleavage site. The
Y-score distinguishes between C-score peaks by choosing the one where the slope of the S-score is steep. If the
C-score rises where the S-score falls in the graph, a signal peptide is predicted in that region.
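
The idea behind the Y-score can be illustrated with a small sketch. It assumes, as a simplification, that the
slope of the S-score at a position is the mean S-score over a window of `d` residues before the position minus
the mean over `d` residues starting at the position, and that Y is the geometric average of C and this slope.
The window size, the function names and the toy score arrays are hypothetical; they are not SignalP output and
may differ from the exact SignalP implementation.

```python
import math


def y_scores(c, s, d=2):
    """Geometric average of the C-score and the downward slope of the S-score."""
    ys = []
    for i in range(len(c)):
        before = s[max(0, i - d):i]   # up to d positions before i
        after = s[i:i + d]            # up to d positions from i onward
        if before and after:
            slope = sum(before) / len(before) - sum(after) / len(after)
        else:
            slope = 0.0
        # Only a falling S-score (positive slope value) supports a cleavage site.
        ys.append(math.sqrt(c[i] * max(slope, 0.0)))
    return ys


def predicted_cleavage_position(c, s, d=2):
    """Index of the highest Y-score, i.e. the first residue of the mature protein."""
    ys = y_scores(c, s, d)
    return max(range(len(ys)), key=ys.__getitem__)


# Toy scores with two C-score peaks; only the second coincides with an S-score drop.
c_toy = [0.0, 0.5, 0.0, 0.8, 0.0, 0.0]
s_toy = [0.9, 0.9, 0.9, 0.1, 0.1, 0.1]
print(predicted_cleavage_position(c_toy, s_toy))
```

In the toy data the first C-score peak is ignored because the S-score is flat around it, which is the behaviour
the Y-score is meant to provide.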

Image 35: The Results Of This Graph

There is a cleavage site between positions 23 and 24, and there is 1 signal peptide.


Image 36: Eukaryotic Protein Sequence for SignalP Web Application

Image 37: SignalP Page

The protein sequence is pasted into the text box and the "Submit" button is clicked; this process is repeated
for each sequence. If the sequence belongs to a eukaryotic organism, the eukaryotic option is selected. If it
belongs to a prokaryotic organism, the Gram-negative or Gram-positive option is selected accordingly. The
protein sequence selected here belongs to a eukaryotic organism.


Image 38: Signal Graph for Eukaryotic Protein Sequence

The axes and the three scores (C-score, S-score and Y-score) are the same as described for the prokaryotic
protein sequence. In this graph the C-score does not rise and the S-score does not decrease; the C-, S- and
Y-scores all remain flat, so no signal peptide is predicted for this sequence.

Image 39: The Results Of This Graph

There is no cleavage site and there is no signal peptide.

RESOURCES:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3841988/

https://link.springer.com/referenceworkentry/10.1007%2F978-3-642-11274-4_819

http://www.cbs.dtu.dk/services/SignalP/

https://web.expasy.org/protparam/

http://web.expasy.org/peptide_cutter/

https://web.expasy.org/compute_pi/

http://www.cbs.dtu.dk/services/TMHMM/
