Parameter Estimation
Peter N Robinson

Outline:
Estimating Parameters from Data
Maximum Likelihood (ML) Estimation
ML vs. MAP
Beta distribution
Maximum a posteriori (MAP) Estimation
MAQ
In many situations in bioinformatics, we want to estimate optimal parameters from data. In the examples we have seen in the lectures on variant calling, these parameters might be the error rate for reads, the proportion of a certain genotype, the proportion of non-reference bases, etc. However, the "hello world" example for this sort of thing is the coin toss, so we will start with that.
Coin toss
Let's say we have two coins that are each tossed 10 times:
Coin 1: H,T,T,H,H,H,T,H,T,T
Coin 2: T,T,T,H,T,T,T,H,T,T
Intuitively, we might guess that coin 1 is a fair coin, i.e., P(X = H) = 0.5, and that coin 2 is biased, i.e., P(X = H) ≠ 0.5.
Likelihood

We encode heads as x = 1 and tails as x = 0, and model a toss as a Bernoulli random variable with parameter θ:

p(X = 1; θ) = θ    (1)
p(X = 0; θ) = 1 − θ    (2)

Assuming the tosses are independent, the likelihood of the data X = {x_1, …, x_n} is

p(X; θ) = ∏_{i=1}^{n} p(x_i; θ)    (3)

For coin 2, which shows 2 heads and 8 tails, this is

p(X; θ) = θ^2 (1 − θ)^8    (4)
The Bernoulli likelihood can be written compactly as

∏_{i=1}^{n} p(x_i; θ) = ∏_{i=1}^{n} θ^{x_i} (1 − θ)^{1 − x_i}    (5)
Log likelihood
Instead of maximizing this directly, we maximize the log likelihood:

log p(X; θ) = log ∏_{i=1}^{n} θ^{x_i} (1 − θ)^{1 − x_i}    (6)
            = Σ_{i=1}^{n} log θ^{x_i} (1 − θ)^{1 − x_i}
            = Σ_{i=1}^{n} [x_i log θ + (1 − x_i) log(1 − θ)]
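As a quick numeric illustration (my own sketch in R, the language used later in these slides, not code from the lecture): the log likelihood of coin 2's data is higher at θ = 0.2 than at the fair-coin value θ = 0.5.

# Coin 2 encoded as 1 = heads, 0 = tails
x <- c(0, 0, 0, 1, 0, 0, 0, 1, 0, 0)
loglik <- function(theta, x) sum(x * log(theta) + (1 - x) * log(1 - theta))
loglik(0.2, x)  # -5.004
loglik(0.5, x)  # -6.931, i.e. 10 * log(0.5)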
Log likelihood
Note that one often denotes the log likelihood function with the symbol L = log p(X; θ).
A function f defined on a subset of the real numbers with real values is called monotonic (also monotonically increasing, increasing, or non-decreasing) if for all x and y such that x ≤ y one has f(x) ≤ f(y).
Thus, the monotonicity of the log function guarantees that

argmax_θ p(X; θ) = argmax_θ log p(X; θ)    (7)
ML estimate
The ML estimate is the value of θ that maximizes the log likelihood:

θ_ML = argmax_θ Σ_{i=1}^{n} [x_i log θ + (1 − x_i) log(1 − θ)]    (8)
ML estimate
Thus

log p(X; θ) = Σ_{i=1}^{n} [x_i log θ + (1 − x_i) log(1 − θ)]
            = (Σ_{i=1}^{n} x_i) log θ + (Σ_{i=1}^{n} (1 − x_i)) log(1 − θ)

and the derivative with respect to θ is

∂/∂θ log p(X; θ) = (1/θ) Σ_{i=1}^{n} x_i − (1/(1 − θ)) Σ_{i=1}^{n} (1 − x_i)
ML estimate
We set the derivative equal to zero, ∂/∂θ log p(X; θ) = 0:

0 = (1/θ) Σ_{i=1}^{n} x_i − (1/(1 − θ)) Σ_{i=1}^{n} (1 − x_i)
(1/(1 − θ)) Σ_{i=1}^{n} (1 − x_i) = (1/θ) Σ_{i=1}^{n} x_i
(1 − θ)/θ = Σ_{i=1}^{n} (1 − x_i) / Σ_{i=1}^{n} x_i
1/θ = (Σ_{i=1}^{n} (1 − x_i) + Σ_{i=1}^{n} x_i) / Σ_{i=1}^{n} x_i = n / Σ_{i=1}^{n} x_i

and therefore

θ_ML = (1/n) Σ_{i=1}^{n} x_i

i.e., the ML estimate is simply the observed proportion of heads.
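In R, the ML estimate is therefore just the sample mean of the 0/1-encoded tosses. A minimal sketch for the two coins from the beginning (variable names are mine):

coin1 <- c(1, 0, 0, 1, 1, 1, 0, 1, 0, 0)  # H = 1, T = 0
coin2 <- c(0, 0, 0, 1, 0, 0, 0, 1, 0, 0)
mean(coin1)  # 0.5, consistent with a fair coin
mean(coin2)  # 0.2, suggesting a biased coin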
Maximum a posteriori (MAP) Estimation

Bayes' law relates the posterior probability of the parameter θ to the likelihood and the prior:
p(θ|X) = p(X|θ) p(θ) / p(X)    (9)

Thus, Bayes' law converts our prior belief about the parameter θ (before seeing data) into a posterior probability, p(θ|X), by using the likelihood function p(X|θ). The maximum a posteriori (MAP) estimate is defined as

θ_MAP = argmax_θ p(θ|X)    (10)
θ_MAP = argmax_θ p(θ|X)
      = argmax_θ p(X|θ) p(θ) / p(X)
      = argmax_θ p(X|θ) p(θ)

because the evidence p(X) does not depend on θ.
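To make this concrete, here is a sketch that evaluates the unnormalized posterior p(X|θ)p(θ) on a grid for coin 2; the Beta(2, 2) prior is my choice for illustration, not from the slides.

theta <- seq(0.001, 0.999, by = 0.001)          # grid over the parameter
x     <- c(0, 0, 0, 1, 0, 0, 0, 1, 0, 0)        # coin 2
lik   <- theta^sum(x) * (1 - theta)^sum(1 - x)  # Bernoulli likelihood
prior <- dbeta(theta, 2, 2)                     # assumed Beta(2, 2) prior
post  <- lik * prior                            # unnormalized posterior
theta[which.max(post)]                          # 0.25, the MAP estimate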
To take a simple example of a situation in which MAP estimation might produce better results than ML estimation, let us consider a statistician who wants to predict the outcome of the next election in the USA.
The statistician is able to gather data on party preferences by asking people he meets at the Wall Street Golf Club which party they plan on voting for in the next election. Say he polls 100 club members, of whom only seven say they will vote democratic. The sample is heavily biased towards republican voters, and a prior belief that θ should be near 0.5 can counteract this bias.
Beta distribution

The Beta distribution is a distribution over θ ∈ [0, 1] with parameters α and β:

p(θ) = (1 / B(α, β)) θ^{α−1} (1 − θ)^{β−1}

where the normalizing constant is

B(α, β) = Γ(α) Γ(β) / Γ(α + β)

[Figure: densities of the Beta distribution for α = β = 0.5; α = 5, β = 1; α = 1, β = 3; α = β = 2; α = 2, β = 5]
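The density is available in R as dbeta; the following sketch (my own, not from the slides) redraws the curves from the figure legend above.

theta <- seq(0, 1, length.out = 201)
pars  <- list(c(0.5, 0.5), c(5, 1), c(1, 3), c(2, 2), c(2, 5))  # (alpha, beta) pairs
plot(NULL, xlim = c(0, 1), ylim = c(0, 2.5),
     xlab = expression(theta), ylab = "density")
for (i in seq_along(pars))
  lines(theta, dbeta(theta, pars[[i]][1], pars[[i]][2]), col = i)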
Random variables are either discrete (i.e., they can assume one of a list of values, like the Bernoulli with heads/tails) or continuous (i.e., they can take on any numerical value in a certain interval, like the Beta distribution with θ ∈ [0, 1]).
A probability density function (pdf) of a continuous random variable is a function that describes the relative likelihood of this random variable taking on a given value, i.e., p(θ): R → R_{≥0} such that

Pr(θ ∈ (a, b)) = ∫_a^b p(θ) dθ    (11)

The probability that the value of θ lies between a and b is given by integrating the pdf over this region.
The mean and the mode of the Beta distribution are

E[θ] = α / (α + β)    (12)
mode[θ] = (α − 1) / (α + β − 2)    (13)

[Figure: density of the Beta distribution with α = 7, β = 3]

For instance, for α = 7 and β = 3 the mode is

(7 − 1) / (7 + 3 − 2) = 0.75
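A one-line numerical check of the mode formula (a sketch): the density of Beta(7, 3) should peak at 0.75.

optimize(function(t) dbeta(t, 7, 3), interval = c(0, 1), maximum = TRUE)$maximum
# ~0.75, matching (7 - 1) / (7 + 3 - 2)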
For MAP estimation of the coin parameter, we again use Bayes' law:

Pr(θ|X) = Pr(X|θ) Pr(θ) / Pr(X)    (14)
We then have

θ_MAP = argmax_θ Pr(θ|X)
      = argmax_θ Pr(X|θ) Pr(θ) / Pr(X)
      = argmax_θ Pr(X|θ) Pr(θ)
      = argmax_θ ∏_{x_i ∈ X} Pr(x_i|θ) Pr(θ)
      = argmax_θ log (∏_{x_i ∈ X} Pr(x_i|θ) Pr(θ))
      = argmax_θ (Σ_{x_i ∈ X} log Pr(x_i|θ) + log Pr(θ))
We model the likelihood with a Bernoulli distribution and the prior with a Beta distribution, so the posterior is proportional to likelihood (Bernoulli) × prior (Beta). Thus

Pr(θ|X) ∝ Pr(X|θ) Pr(θ)

is equivalent to

Pr(θ|X) ∝ {∏_i Bernoulli(x_i|θ)} Beta(θ|α, β)    (15)
Furthermore,

L = log Pr(θ|X) = log {∏_i Bernoulli(x_i|θ) Beta(θ|α, β)}

(up to an additive constant that does not depend on θ). We solve for θ_MAP = argmax_θ L as follows.
L = Σ_{i=1}^{n} log Bernoulli(x_i|θ) + log Beta(θ|α, β)

For the first term, we already know the derivative (see slide 11):

∂/∂θ Σ_{i=1}^{n} log Bernoulli(x_i|θ) = (1/θ) Σ_{i=1}^{n} x_i − (1/(1 − θ)) Σ_{i=1}^{n} (1 − x_i)

For the second term:
∂/∂θ log Beta(θ|α, β)
= ∂/∂θ log [Γ(α + β) / (Γ(α) Γ(β)) θ^{α−1} (1 − θ)^{β−1}]
= ∂/∂θ [log (Γ(α + β) / (Γ(α) Γ(β))) + log θ^{α−1} (1 − θ)^{β−1}]
= ∂/∂θ log θ^{α−1} (1 − θ)^{β−1}
= 0 + ∂/∂θ [(α − 1) log θ + (β − 1) log(1 − θ)]
= (α − 1)/θ − (β − 1)/(1 − θ)
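This derivative can be sanity-checked against a finite-difference approximation of log dbeta (my own sketch; the evaluation point and step size h are arbitrary choices):

a <- 7; b <- 3; theta <- 0.6; h <- 1e-6
analytic  <- (a - 1) / theta - (b - 1) / (1 - theta)
numerical <- (log(dbeta(theta + h, a, b)) - log(dbeta(theta - h, a, b))) / (2 * h)
c(analytic, numerical)  # both approximately 5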
To find θ_MAP, we now set ∂L/∂θ = 0:

0 = (1/θ) Σ_{i=1}^{n} x_i − (1/(1 − θ)) Σ_{i=1}^{n} (1 − x_i) + (α − 1)/θ − (β − 1)/(1 − θ)    (16)
and thus

(1 − θ) [Σ_{i=1}^{n} x_i + α − 1] = θ [Σ_{i=1}^{n} (1 − x_i) + β − 1]
Σ_{i=1}^{n} x_i + α − 1 = θ [Σ_{i=1}^{n} (1 − x_i) + β − 1 + Σ_{i=1}^{n} x_i + α − 1]
                        = θ [n + α + β − 2]

To simplify the notation, let

Σ_i x_i = n_R    (17)
Then,

θ_MAP [n + α + β − 2] = n_R + α − 1

and finally

θ_MAP = (n_R + α − 1) / (n + α + β − 2)    (18)
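Equation (18) is a one-liner in R (a sketch; map_est is my name for the helper, not from the slides):

map_est <- function(nR, n, alpha, beta) (nR + alpha - 1) / (n + alpha + beta - 2)
map_est(2, 10, 2, 2)  # coin 2 with a Beta(2, 2) prior: 0.25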
Recall that in our poll of 100 members of the Wall Street Golf Club, only seven said they would vote democratic. Thus, taking x_i = 1 to denote a republican vote:
n = 100
n_R = 93

ML and MAP estimates for increasingly strong symmetric priors Beta(α, β) with α = β:

  n   n_R      α      β   θ_ML   θ_MAP
 100   93      1      1   0.93   0.93
 100   93      5      5   0.93   0.90
 100   93    100    100   0.93   0.64
 100   93   1000   1000   0.93   0.52
 100   93  10000  10000   0.93   0.502
Thus, MAP pulls the estimate towards the prior to an extent that depends on the strength of the prior.
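The table values can be reproduced with the map_est helper sketched above:

for (a in c(1, 5, 100, 1000, 10000))
  cat(sprintf("alpha = beta = %5d: ML = %.2f, MAP = %.3f\n",
              a, 93 / 100, map_est(93, 100, a, a)))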
MAQ

MAQ calls the genotype g of an individual from the read data D by MAP estimation:

ĝ = argmax_{g ∈ {⟨a,a⟩, ⟨a,b⟩, ⟨b,b⟩}} p(g|D)

For the heterozygous genotype ⟨a,b⟩, each read shows the non-reference base with probability 1/2, so the probability of observing k non-reference bases among n bases is (n choose k) / 2^n.
In R, the (unnormalized) posterior probability of the heterozygous genotype can therefore be computed as

> dbinom(k, n, 0.5) * r

where k is the number of non-reference bases, n is the total number of bases, 0.5 corresponds to the probability of seeing a non-reference base given that the true genotype is heterozygous, and r is the prior probability of a heterozygous genotype.
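Extending this one-liner into a toy genotype caller (a hedged sketch: the binomial error model with error rate eps and the even prior split between the homozygous genotypes are simplifying assumptions of mine, not MAQ's exact model):

# Toy MAP genotype call from k non-reference bases out of n bases
call_genotype <- function(k, n, r, eps = 0.01) {  # eps: assumed per-base error rate
  prior <- c(refhom = (1 - r) / 2, het = r, althom = (1 - r) / 2)  # assumed prior
  lik <- c(refhom = dbinom(k, n, eps),      # <a,a>: non-ref bases are errors
           het    = dbinom(k, n, 0.5),      # <a,b>: half the bases non-ref
           althom = dbinom(k, n, 1 - eps))  # <b,b>: nearly all bases non-ref
  names(which.max(lik * prior))             # genotype with maximal posterior
}
call_genotype(k = 9, n = 20, r = 0.2)    # "het"
call_genotype(k = 9, n = 20, r = 0.001)  # still "het": the data dominate the prior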
[Figure: the unnormalized posterior dbinom(k, n, 0.5) * r as a function of k, for prior values r = 0.2 and r = 0.001]