
Scoring Matrices

How did they get the values for the PAM matrix?

Look at 71 groups of protein sequences
where the proteins in each group are at
least 85% similar (Why these groups?)
Compute the relative mutability of each amino
acid (its probability of change)
From the relative mutability, compute the
mutability probability for each amino acid
pair X, Y: the probability that X will change to Y
over a certain evolutionary time
Normalize the mutability probability for
each pair to a value between 0 and 1

Computing Relative Mutability: A Measure of the Likelihood that an Amino Acid Will Mutate
For each amino acid
Changes (p) = number of times the amino acid
changed into something else
exposure to mutation =
(percentage occurrence of the amino acid in the
group of sequences being analyzed) * (frequency of
amino acid changes in the group based on the
phylogenetic tree)
relative mutability =
(changes/exposure to mutation) / 100
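The exposure term needs the fraction of residues in the group that are each amino acid. Here is a minimal Python sketch, assuming the group's aligned sequences are available as plain strings of one-letter codes. The sequences and change count below are made up for illustration, and `exposure_to_mutation` is a hypothetical name, not something from the original method.

```python
from collections import Counter


def exposure_to_mutation(sequences: list[str], total_changes: int) -> dict[str, float]:
    """Exposure for every amino acid in the group:
    (fraction of residues that are this amino acid) * (changes in the group's tree)."""
    counts = Counter("".join(sequences))  # residue counts over the whole group
    n_residues = sum(counts.values())
    return {aa: (n / n_residues) * total_changes for aa, n in counts.items()}


# Made-up group: two sequences differing at one position (A vs S),
# so 1 change, counted backwards and forwards = 1 * 2.
exposure = exposure_to_mutation(["ACGA", "ACGS"], total_changes=1 * 2)
print(exposure["A"])  # 0.75
```

Alanine makes up 3 of the 8 residues here, so its exposure is 3/8 * 2 = 0.75.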

Mutability Probability Between Amino Acid Pairs

For each pair of amino acids X and Y:

r = relative mutability of X
c = num times X becomes Y or vice versa
p = num changes involving X
mutability probability of X to Y =
(r * c) / p
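Counting c and p by hand gets error-prone for larger trees. A sketch, assuming the substitutions inferred along the tree's branches are available as (from, to) pairs; the list below is made up and is not the Krane and Raymer data. The pair count c ignores direction, matching "X becomes Y or vice versa", and `count_changes` is a hypothetical helper name.

```python
from collections import Counter


def count_changes(substitutions: list[tuple[str, str]]) -> tuple[Counter, Counter]:
    """Return (c, p) where
    c[frozenset({X, Y})] = times X became Y or Y became X, and
    p[X] = number of changes involving X."""
    c: Counter = Counter()
    p: Counter = Counter()
    for x, y in substitutions:
        c[frozenset((x, y))] += 1  # unordered pair: direction does not matter
        p[x] += 1                  # the change involves both amino acids
        p[y] += 1
    return c, p


# Made-up substitutions inferred along the branches of a small tree.
c, p = count_changes([("A", "G"), ("G", "A"), ("A", "G"), ("A", "S")])
print(c[frozenset(("A", "G"))], p["A"])  # 3 4
```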

Computing Relative Mutability of A:

changes = # times A changes into something else = 4
% occurrence of A in group = 10 / 63 = 0.159
frequency of all amino acid changes in group = 6 * 2 = 12
(Note: Count changes backwards and forwards.)
exposure to mutation = (% occurrence of A in group)
* (frequency of all amino acid changes in group)
= 12 * 0.159
relative mutability = (changes / exposure to mutation) / 100
= (4 / (12 * 0.159)) / 100 = 2.09 / 100 = 0.0209

Dividing by 100 gives us PAM 1, where we're modeling
1 substitution per 100 residues.
Example from Fundamental Concepts of Bioinformatics by Krane
and Raymer.
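To check the arithmetic, here is a small Python sketch of the relative mutability formula applied to alanine's counts from the example. The function name `relative_mutability` and its parameter names are our own, not from Krane and Raymer. With the unrounded occurrence 10/63, the result is exactly 0.0210; the slide's 0.0209 comes from rounding the occurrence to 0.159 and then truncating 2.096 to 2.09.

```python
def relative_mutability(changes: int, occurrence: float, total_changes: int) -> float:
    """Relative mutability of one amino acid, following the slide's formula.

    changes:       times the amino acid changed into something else
    occurrence:    fraction of the group's residues that are this amino acid
    total_changes: all amino acid changes in the group, counted both directions
    """
    exposure = occurrence * total_changes
    # The final division by 100 scales to PAM 1 (1 substitution per 100 residues).
    return (changes / exposure) / 100


# Alanine (A) from the worked example: 4 changes, 10 of 63 residues, 6 * 2 changes.
r_a = relative_mutability(changes=4, occurrence=10 / 63, total_changes=6 * 2)
print(f"{r_a:.4f}")  # 0.0210
```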

Computing Mutability Probability that A will change to G:
r = relative mutability of A = 0.0209
c = num times A becomes G or vice versa = 3
p = num changes involving A = 4
mutability probability of A to G =
(r * c) / p = (0.0209 * 3) / 4 = 0.0156
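The same kind of check for the pair formula (r * c) / p, using the A to G values above. `mutability_probability` is a hypothetical helper name. The exact product is 0.015675, so 0.0156 on the slide is a truncation (rounding would give 0.0157).

```python
def mutability_probability(r: float, c: int, p: int) -> float:
    """Probability that X changes to Y, per the slide: (r * c) / p.

    r: relative mutability of X
    c: times X became Y or Y became X
    p: number of changes involving X
    """
    return (r * c) / p


# A -> G with the slide's values.
m_ag = mutability_probability(r=0.0209, c=3, p=4)
print(f"{m_ag:.6f}")  # 0.015675
```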

Mutability Probability, X to Y
For each amino acid Y other than X,
compute the mutability probability of X to Y
as described above
Sum these 19 probabilities.
Divide them by a normalizing factor such
that the probability that X will NOT
change is 99% and the probabilities that
it will change to any other amino acid
sum to 1%
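A minimal sketch of this normalization for one amino acid X, following the per-amino-acid description on this slide and assuming the raw change probabilities are stored in a dict keyed by one-letter codes. The function `normalize_row` and the input values are hypothetical, chosen only to make the arithmetic easy to follow; they are not PAM data.

```python
def normalize_row(raw: dict[str, float], x: str, change_total: float = 0.01) -> dict[str, float]:
    """Rescale X's change probabilities so they sum to change_total (1% by default);
    the remaining probability (99%) goes to X staying unchanged."""
    # Only changes count: drop any X -> X entry before summing.
    changes = {y: v for y, v in raw.items() if y != x}
    factor = change_total / sum(changes.values())  # the normalizing factor
    row = {y: v * factor for y, v in changes.items()}
    row[x] = 1.0 - change_total
    return row


# Hypothetical raw mutability probabilities for X = A (illustration only).
row_a = normalize_row({"G": 0.03, "S": 0.01}, x="A")
print(row_a)
```

Here the raw values sum to 0.04, so the scaling is 0.01 / 0.04 = 0.25: G gets about 0.0075, S about 0.0025, and A keeps 0.99, for a row total of 1.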

Mutability Probabilities to
Log Odds Score for X to Y

Compute the relative frequency of change for X to Y as follows:
Get the X to Y mutability probability
Divide by the frequency of X in the group of sequences
Take the log base 10 and multiply by 10

In our example, we get log10(0.0156 / 0.1587) = log10(0.098)

To compute log10(0.098), solve for x:
10^x = 0.098
x = -1.01

Multiplying by 10 gives a log odds score of about -10.

Compute the log odds score for Y to X the same way
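The three steps above fit in one line of Python. This sketch (hypothetical function name `log_odds_score`) reproduces the A to G example using the slide's value 0.1587 for the frequency of A; it gives about -10.07, so the matrix entry would round to -10.

```python
import math


def log_odds_score(mutability_prob: float, freq_x: float) -> float:
    """Slide's recipe: divide by the frequency of X, take log10, multiply by 10."""
    return 10 * math.log10(mutability_prob / freq_x)


score_ag = log_odds_score(0.0156, 0.1587)  # A -> G from the worked example
print(round(score_ag, 2))
```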

Usefulness of Log Odds Scores

A score of 0 indicates that the change
from one amino acid to another is what is
expected by chance
A negative score means that the change
occurs less often than expected by chance
A positive score means that the change is
more than expected by chance
Because the scores are in log form, they
can be added where the probabilities would be
multiplied (e.g., the chance that X will
change to Y and then Y to Z)
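A quick numeric check of the additivity claim. The helper `score` mirrors the 10 * log10 scaling from the previous slide, and the two odds ratios are hypothetical: an odds ratio of 2 (change seen twice as often as chance) gives a positive score, 0.5 gives a negative one, and their product, 1, scores 0.

```python
import math


def score(odds_ratio: float) -> float:
    """Scaled log odds score, 10 * log10(odds ratio)."""
    return 10 * math.log10(odds_ratio)


# Hypothetical odds ratios for two successive changes, X -> Y and Y -> Z.
odds_xy = 2.0  # X -> Y seen twice as often as chance
odds_yz = 0.5  # Y -> Z seen half as often as chance

added = score(odds_xy) + score(odds_yz)  # add the two scores ...
multiplied = score(odds_xy * odds_yz)    # ... or score the product of the odds

print(math.isclose(added, multiplied, abs_tol=1e-9))  # True
```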

Disadvantages of PAM
Circularity in the analysis
The original PAM-1
matrix was based on a limited number of
proteins, which may not be representative
of all protein families
The Markov model does not take into
account that multi-step mutations
should be treated differently from
single-step ones