Estimating a Gamma distribution
Thomas P. Minka
2002
Abstract
This note derives a fast algorithm for maximum-likelihood estimation of both parameters of a Gamma
distribution or negative-binomial distribution.
Introduction
We have observed n independent data points X = [x_1 ... x_n] from the same density p(x). We restrict p to the class of Gamma densities, i.e. p = Ga(a, b):

p(x \mid a, b) = \mathrm{Ga}(x; a, b) = \frac{x^{a-1} \exp(-x/b)}{\Gamma(a)\, b^{a}}
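For concreteness, this density can be evaluated in Python using only the standard library. The helper name `gamma_pdf` and the parameter values in the check below are illustrative, not part of the note:

```python
import math

def gamma_pdf(x, a, b):
    # Ga(x; a, b) with shape a and scale b, evaluated in log space for stability.
    return math.exp((a - 1) * math.log(x) - x / b - math.lgamma(a) - a * math.log(b))
```

A quick numerical integration confirms the density normalizes to one and has mean ab.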
[Figure 1: a Gamma density, plotted as p(x) versus x.]
Maximum likelihood
The log-likelihood is

\log p(D \mid a, b) = (a-1)\sum_i \log x_i - n \log\Gamma(a) - n a \log b - \frac{1}{b}\sum_i x_i    (1)

where the sufficient statistics are

\bar{x} = \frac{1}{n}\sum_i x_i \qquad \overline{\log x} = \frac{1}{n}\sum_i \log x_i    (2)

For fixed a, the maximum over b is attained at

\hat{b} = \bar{x}/a    (3)
Figure 2: The log-likelihood (4) versus the Gamma-type approximation (9) and the bound (6) at convergence. The approximation is nearly identical to the true likelihood. The dataset was 100 points sampled
from Ga(7.3, 4.5).
Substituting this into (1) gives the profile likelihood in a alone:

\log p(D \mid a, \hat{b}) = n(a-1)\,\overline{\log x} - n\log\Gamma(a) - na\log\bar{x} + na\log a - na    (4)

Since a \log a is convex, it is lower-bounded by its tangent at any a_0 > 0:

a \log a \;\ge\; a(1 + \log a_0) - a_0    (5)

which yields a lower bound on (4):

\log p(D \mid a, \hat{b}) \;\ge\; n(a-1)\,\overline{\log x} - n\log\Gamma(a) - na\log\bar{x} + na(1 + \log a_0) - na_0 - na    (6)
The maximum is at

0 = n\,\overline{\log x} - n\Psi(a) - n\log\bar{x} + n(1 + \log a_0) - n    (7)

\Psi(a) = \overline{\log x} - \log\bar{x} + \log a_0    (8)
where \Psi is the digamma function. The iteration proceeds by setting a_0 to the current a, then inverting the \Psi function to get a new a. Because the log-likelihood is concave, this iteration must converge to the (unique) global maximum. Unfortunately, it can be quite slow, requiring around 250 iterations if a = 10, fewer for smaller a, and more for larger a.
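A minimal Python sketch of this fixed-point scheme follows. The standard library has no digamma, so \Psi and \Psi' are approximated here by finite differences of `math.lgamma` (an assumption of the sketch, not the note's implementation), and each inversion of \Psi uses an inner Newton solve:

```python
import math

def digamma(a, h=1e-5):
    # Central difference of log-Gamma approximates the digamma function Psi(a).
    return (math.lgamma(a + h) - math.lgamma(a - h)) / (2 * h)

def trigamma(a, h=1e-4):
    # Second difference of log-Gamma approximates Psi'(a).
    return (math.lgamma(a + h) - 2 * math.lgamma(a) + math.lgamma(a - h)) / h**2

def fit_gamma_fixed_point(xs, iters=500):
    """ML estimate of (a, b) via the fixed point (8):
    Psi(a_new) = mean(log x) - log(mean x) + log(a_old)."""
    n = len(xs)
    xbar = sum(xs) / n
    meanlog = sum(math.log(x) for x in xs) / n
    a = 1.0
    for _ in range(iters):
        target = meanlog - math.log(xbar) + math.log(a)
        # Invert Psi with Newton's method: solve Psi(anew) = target.
        anew = a
        for _ in range(30):
            anew -= (digamma(anew) - target) / trigamma(anew)
        a = anew
    return a, xbar / a  # b-hat = xbar / a, eq. (3)
```

Consistent with the note's timing remark, hundreds of outer iterations are budgeted here.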
The second algorithm is much faster, and is obtained via generalized Newton [1]. Using an approximation of
the form,
\log p(D \mid a, \hat{b}) \approx c_0 + c_1 a + c_2 \log a    (9)

the update is

\frac{1}{a^{\text{new}}} = \frac{1}{a} + \frac{\overline{\log x} - \log\bar{x} + \log a - \Psi(a)}{a^2 \left(1/a - \Psi'(a)\right)}    (10)
This converges in about four iterations. Figure 2 shows that this approximation is very close to the true log-likelihood, which explains the good performance.
A good starting point for the iteration is obtained via the Stirling approximation:

\log\Gamma(a) \approx a \log a - a - \tfrac{1}{2}\log a + \text{const}    (11)

\Psi(a) \approx \log a - \frac{1}{2a}    (12)

so that the stationarity condition \Psi(a) - \log a = \overline{\log x} - \log\bar{x} gives

\hat{a} \approx \frac{0.5}{\log\bar{x} - \overline{\log x}}    (13)
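Combining the Stirling start (13) with the update (10) gives the fast algorithm. The sketch below again approximates \Psi and \Psi' by finite differences of `math.lgamma` (an assumption, since the standard library lacks digamma); the function name `fit_gamma` is illustrative:

```python
import math

def digamma(a, h=1e-5):
    # Central difference of log-Gamma approximates Psi(a).
    return (math.lgamma(a + h) - math.lgamma(a - h)) / (2 * h)

def trigamma(a, h=1e-4):
    # Second difference of log-Gamma approximates Psi'(a).
    return (math.lgamma(a + h) - 2 * math.lgamma(a) + math.lgamma(a - h)) / h**2

def fit_gamma(xs, iters=20):
    """ML estimate of (a, b): Stirling start (13), then generalized-Newton update (10)."""
    n = len(xs)
    xbar = sum(xs) / n
    meanlog = sum(math.log(x) for x in xs) / n
    s = math.log(xbar) - meanlog          # >= 0 by Jensen's inequality
    a = 0.5 / s                           # eq. (13)
    for _ in range(iters):
        num = meanlog - math.log(xbar) + math.log(a) - digamma(a)
        den = a * a * (1.0 / a - trigamma(a))
        a = 1.0 / (1.0 / a + num / den)   # eq. (10)
    return a, xbar / a                    # eq. (3): b-hat = xbar / a
```

In practice the loop reaches numerical convergence in roughly four to six iterations; twenty are budgeted only for safety.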
Negative binomial
The maximum-likelihood problem for the negative binomial distribution is quite similar to that for the Gamma. This is because the negative binomial is a mixture of Poissons, with a Gamma mixing distribution:

p(x \mid a, b) = \int_0^\infty \mathrm{Po}(x; \lambda)\,\mathrm{Ga}(\lambda; a, b)\,d\lambda = \int_0^\infty \frac{\lambda^{x} e^{-\lambda}}{x!} \cdot \frac{\lambda^{a-1} e^{-\lambda/b}}{\Gamma(a)\, b^{a}}\,d\lambda    (14)

= \binom{a+x-1}{x} \left(\frac{b}{b+1}\right)^{x} \left(\frac{1}{b+1}\right)^{a}    (15)
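Equation (15) can be checked numerically: the binomial coefficient with real-valued a is \Gamma(a+x)/(\Gamma(a)\,x!), so everything can be computed with `math.lgamma`. The parameter values in the check below are illustrative:

```python
import math

def nbinom_pmf(x, a, b):
    # eq. (15): C(a+x-1, x) * (b/(b+1))^x * (1/(b+1))^a, computed in log space.
    log_coef = math.lgamma(a + x) - math.lgamma(a) - math.lgamma(x + 1)
    return math.exp(log_coef + x * math.log(b / (b + 1)) - a * math.log(b + 1))
```

Summed over x = 0, 1, 2, ... the probabilities total one, and the mean is ab, as expected for a Gamma mixture of Poissons.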
Let's consider a slightly generalized negative binomial, where the waiting time for x is given by t:

p(x \mid t, a, b) = \int_0^\infty \mathrm{Po}(x; \lambda t)\,\mathrm{Ga}(\lambda; a, b)\,d\lambda = \int_0^\infty \frac{(\lambda t)^{x} e^{-\lambda t}}{x!} \cdot \frac{\lambda^{a-1} e^{-\lambda/b}}{\Gamma(a)\, b^{a}}\,d\lambda    (16)

= \binom{a+x-1}{x} \left(\frac{bt}{bt+1}\right)^{x} \left(\frac{1}{bt+1}\right)^{a}    (17)
Given a data set D = {(x_i, t_i)}, we want to estimate (a, b). One approach is to use EM, where the E-step infers the hidden variable \lambda_i:

E[\lambda_i] = (x_i + a)\,\frac{b}{b t_i + 1}    (18)

E[\log\lambda_i] = \Psi(x_i + a) + \log\frac{b}{b t_i + 1}    (19)

The M-step then maximizes the expected complete-data log-likelihood

(a-1)\sum_i E[\log\lambda_i] - n\log\Gamma(a) - na\log b - \frac{1}{b}\sum_i E[\lambda_i]    (20)

which has the same form as (1), with \log x_i replaced by E[\log\lambda_i] and x_i by E[\lambda_i], so the Gamma algorithms above apply directly.
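This EM scheme can be sketched in Python as follows. The M-step reuses the generalized-Newton update (10) on the expected sufficient statistics; as in the earlier sketches, \Psi and \Psi' are finite-difference approximations of the digamma function, and the iteration counts are illustrative tuning choices:

```python
import math

def digamma(a, h=1e-5):
    # Central difference of log-Gamma approximates Psi(a).
    return (math.lgamma(a + h) - math.lgamma(a - h)) / (2 * h)

def trigamma(a, h=1e-4):
    # Second difference of log-Gamma approximates Psi'(a).
    return (math.lgamma(a + h) - 2 * math.lgamma(a) + math.lgamma(a - h)) / h**2

def fit_nbinom_em(data, iters=300):
    """EM estimate of (a, b) from data = [(x_i, t_i), ...] under model (17)."""
    n = len(data)
    a, b = 1.0, 1.0
    for _ in range(iters):
        # E-step, eqs. (18)-(19)
        e_lam = [(x + a) * b / (b * t + 1) for x, t in data]
        e_loglam = [digamma(x + a) + math.log(b / (b * t + 1)) for x, t in data]
        # M-step: Gamma ML with sufficient statistics replaced by expectations (20)
        lbar = sum(e_lam) / n
        meanlog = sum(e_loglam) / n
        for _ in range(10):                    # generalized-Newton update (10)
            num = meanlog - math.log(lbar) + math.log(a) - digamma(a)
            den = a * a * (1.0 / a - trigamma(a))
            a = 1.0 / (1.0 / a + num / den)
        b = lbar / a
    return a, b
```

Like the bound-based Gamma iteration, this EM scheme tends to converge slowly in a, which is why many outer iterations are budgeted.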
References
[1] Thomas P. Minka. Beyond Newton's method. research.microsoft.com/~minka/papers/newton.html, 2000.