f(y|θ) = θ (1 − θ)^y,   y = 0, 1, 2, . . .
Let us consider a RV Y. We denote a RS of size n from Y as
y1, . . . , yn, i.e. independent realisations. We write

y := (y1, y2, . . . , yn).
1. Solve ∂ log(Ln(θ|y))/∂θ = 0 ⇒ θ̂ = . . .
2. Check ∂² log(Ln(θ|y))/∂θ² |θ=θ̂ < 0.
Tn(Y): Maximum likelihood estimator.

For example, θ̂ = (1/n) ∑_{i=1}^n yi ⇒ Tn(Y) = (1/n) ∑_{i=1}^n Yi.
L3(θ|y1, y2, y3) = ∏_{i=1}^3 f(yi|θ).

L3(θ|y1, y2, y3) = (1/θ⁶) ∏_{i=1}^3 yi e^{−yi/θ},

log(L3(θ|y1, y2, y3)) = −6 log θ + ∑_{i=1}^3 (log yi − yi/θ).

Differentiating and setting the result to zero gives the score equation
−6/θ + ∑_{i=1}^3 yi/θ² = 0.

Solving this equation, we get θ̂ = ∑_{i=1}^3 yi / 6 and θ̂ = 63.
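The closed-form estimate can be checked against a direct numerical maximisation of log L3. A minimal Python sketch; the sample below is hypothetical (not given on the slides), chosen only so that ∑ yi = 378 and hence θ̂ = 378/6 = 63 as in the example:

```python
import numpy as np

# Hypothetical sample (not from the slides), chosen so that sum(y) = 378
# and hence the closed-form MLE is theta_hat = 378/6 = 63.
y = np.array([100.0, 150.0, 128.0])

def log_lik(theta):
    # log L3(theta|y) = -6 log(theta) + sum(log y_i) - sum(y_i)/theta
    return -6.0 * np.log(theta) + np.sum(np.log(y)) - np.sum(y) / theta

theta_hat = y.sum() / 6.0                     # closed-form MLE from the slide
grid = np.linspace(1.0, 200.0, 199001)        # grid with step 0.001
theta_num = grid[np.argmax(log_lik(grid))]    # numerical maximiser
print(theta_hat, theta_num)
```

The grid maximiser agrees with the closed form to the grid resolution.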
Ln(µ|y) = (2π)^{−n/2} ∏_{i=1}^n exp(−(yi − µ)²/2)
        = (2π)^{−n/2} exp(−(1/2) ∑_{i=1}^n (yi − µ)²).

log Ln(µ|y) = −(n/2) log(2π) − (1/2) ∑_{i=1}^n (yi − µ)².

∂ log Ln(µ|y)/∂µ = ∑_{i=1}^n (yi − µ) = 0 ⇒ µ̂ = (1/n) ∑_{i=1}^n yi.
(a) Check for maximum: ∂² log Ln(µ|y)/∂µ² = −n < 0, hence maximum.

(b) Tn(Y) = (1/n) ∑_{i=1}^n Yi.
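The closed-form result µ̂ = ȳ can be verified numerically; a minimal sketch with an illustrative data vector (not from the slides):

```python
import numpy as np

# Illustrative data (not from the slides); its mean is 1.0 by construction.
y = np.array([2.1, -0.3, 1.4, 0.8, 1.0])
n = len(y)

def log_lik(mu):
    # log Ln(mu|y) = -(n/2) log(2 pi) - (1/2) sum_i (y_i - mu)^2, per mu value
    mu = np.atleast_1d(mu)
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * np.sum((y - mu[:, None]) ** 2, axis=1)

grid = np.linspace(-5.0, 5.0, 100001)
mu_num = grid[np.argmax(log_lik(grid))]   # numerical maximiser
mu_hat = y.mean()                         # closed-form MLE
print(mu_hat, mu_num)
```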
The Score Statistic D1
Proof.

E(D1) = E[∂ log(Ln(θ|Y))/∂θ]
      = ∫_Y [∂ log f(y|θ)/∂θ] f(y|θ) dy
      = ∫_Y [(1/f(y|θ)) ∂f(y|θ)/∂θ] f(y|θ) dy
      = ∫_Y ∂f(y|θ)/∂θ dy
      = (∂/∂θ) ∫_Y f(y|θ) dy = (∂/∂θ) 1 = 0.
E(−D2) = E[−∂² log(Ln(θ|Y))/∂θ²]
       = −E[(∂²/∂θ²) log f(Y|θ)] = −E[(∂/∂θ)((∂/∂θ) log f(Y|θ))]
       = −E[(∂/∂θ)((1/f(Y|θ)) ∂f(Y|θ)/∂θ)]
       = −∫_Y [(1/f(y|θ)) ∂²f(y|θ)/∂θ² − ((1/f(y|θ)) ∂f(y|θ)/∂θ)²] f(y|θ) dy
       = ∫_Y ((1/f(y|θ)) ∂f(y|θ)/∂θ)² f(y|θ) dy

(the first term vanishes since ∫_Y ∂²f(y|θ)/∂θ² dy = (∂²/∂θ²) ∫_Y f(y|θ) dy = 0)

       = ∫_Y (∂ log f(y|θ)/∂θ)² f(y|θ) dy = E[(∂ log(Ln(θ|Y))/∂θ)²]
       = E(D1²).

Unit 2. Likelihood Theory, ACTL30004
Theorem 1
Suppose that Tn*(Y) is an unbiased estimator of a scalar parameter
θ associated with a random variable Y. The minimum possible value
for Var(Tn*(Y)) is given by

Var(Tn*(Y)) ≥ 1/I(θ) = 1 / E[−∂² log(Ln(θ|Y))/∂θ²].
Now the right hand side of the above expression can be simplified
using

E(Tn*(Y) D1) = ∫_Y Tn*(y) [(1/f(y|θ)) ∂f(y|θ)/∂θ] f(y|θ) dy
             = ∫_Y Tn*(y) ∂f(y|θ)/∂θ dy
             = (∂/∂θ) ∫_Y Tn*(y) f(y|θ) dy
             = (∂/∂θ) θ = 1.

Var(Tn*(Y)) ≥ 1/E(D1²).
When the bound is attained,

Var(Tn*(Y)) = 1 / E[−∂² log(Ln(θ|Y))/∂θ²].

We therefore have

D1 = ∂ log(Ln(θ|Y))/∂θ = a(θ) (Tn*(Y) − θ).

D1 = ∂ log(Ln(θ|Y))/∂θ = [1/Var(Tn*(Y))] (Tn*(Y) − θ).

From this final line, the only solution to D1 = 0 is θ̂ = Tn*(Y) and
hence Tn*(Y) is the maximum likelihood estimator of θ.
f(y|θ) = θ^y (1 − θ)^{1−y}.

Ln(θ|y) = θ^{∑_{i=1}^n yi} (1 − θ)^{n − ∑_{i=1}^n yi}.

Setting the score to zero gives

n ȳ / θ̂ = n(1 − ȳ) / (1 − θ̂)

⇒ θ̂ = ȳ.

Checking that we have found a maximum, we find

∂² log Ln(θ)/∂θ² = −(1/θ²) ∑_{i=1}^n yi − (n − ∑_{i=1}^n yi)/(1 − θ)²,

E[−d² log Ln(θ|Y)/dθ²] = (1/θ²) ∑_{i=1}^n E(Yi) + (n − nθ)/(1 − θ)²
                       = (1/θ²) nθ + (1/(1 − θ)²)(n − nθ)
                       = n/θ + n/(1 − θ).

E[−∂² log(Ln(θ|Y))/∂θ²] = [n(1 − θ) + nθ]/[θ(1 − θ)] = n/(θ(1 − θ)).

⇒ The CRLB for the variance of an unbiased estimator of θ is θ(1 − θ)/n,
the same as the answer from (b).
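A quick Monte Carlo check that Var(θ̂) for the Bernoulli MLE θ̂ = Ȳ attains the bound θ(1 − θ)/n; the values of θ, n and the replication count below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 0.3, 20, 100_000   # illustrative values, not from the slides

# reps independent Bernoulli(theta) samples of size n; theta_hat = ybar each time
theta_hats = rng.binomial(1, theta, size=(reps, n)).mean(axis=1)

empirical_var = theta_hats.var()
crlb = theta * (1 - theta) / n      # CRLB derived above
print(empirical_var, crlb)
```

The empirical variance matches the CRLB to within Monte Carlo error, as expected since ȳ attains the bound here.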
Theorem 3
For large n,

Tn(Y) ≈ N( θ0 , 1 / E[(∂ log(Ln(θ|Y))/∂θ |θ0)²] ).

Proof.
See the Appendix.
log(Ln(µ|y)) = ∑_{i=1}^n (−µ + yi log µ − log yi!)
             = −nµ + log µ ∑_{i=1}^n yi − ∑_{i=1}^n log yi!.

n Tn(Y) = n (1/n) ∑_{i=1}^n Yi = ∑_{i=1}^n Yi.
[Figure: the log-likelihood log Ln(θ|y) plotted against θ for 2 ≤ θ ≤ 10.]
Observed information ≡ −∂² log Ln(θ|y)/∂θ².
(1/n)(i(θ0))^{−1} is the covariance matrix, where i(θ0) has (j, k)th
element given by

E[−(∂²/∂θj ∂θk) log f(Y1|θ)],   with j, k = 1, . . . , p.
log(Ln(µ, σ|y)) = −n log(σ √(2π)) − (1/2) ∑_{i=1}^n ((yi − µ)/σ)².

∂ log(Ln(µ, σ))/∂σ = −n/σ + ∑_{i=1}^n (yi − µ)²/σ³,

which gives

σ̂² = (1/n) ∑_{i=1}^n (yi − ȳ)².
We now provide a test procedure for choosing one and only one of
the models, based on the likelihood ratio test statistic.

H0: θ ∈ Θ0   vs.   H1: θ ∈ Θ1 = Θ − Θ0.

log(Ln(θ0|Y)) ≈ log(Ln(θ̂n|Y)) + (θ0 − θ̂n)⊤ a(θ̂n)
              + (1/2)(θ0 − θ̂n)⊤ b(θ̂n)(θ0 − θ̂n),

2 log Rn ≈ χ²_p.
Note that in the last step we have used that if Z ∼ Nr(0, Σ), then
Z⊤ Σ^{−1} Z ∼ χ²_r.
Use the likelihood ratio test to test the null hypothesis that an
Exponential distribution with rate parameter λ is appropriate against
the alternative hypothesis that a Gamma distribution is appropriate.
Perform your statistical test at the 5% significance level.
f(y|α, β) = [β^α/Γ(α)] y^{α−1} e^{−βy},   with β, α > 0 and y > 0,

and where

Γ(t) = ∫_0^∞ x^{t−1} e^{−x} dx.
C(x, k) = x(x − 1) · · · (x − k + 1)/k! = Γ(x + 1)/[Γ(k + 1) Γ(x − k + 1)].

1/(1 + β) = ω/(θ + ω)   and   r = ω.
We get

f(y|θ, ω) = C(y + ω − 1, y) (ω/(θ + ω))^ω (θ/(θ + ω))^y,   y = 0, 1, . . . ;   ω, θ > 0.

As β → 0, ω → ∞ and so we write

lim_{ω→∞} f(y|θ, ω) = (θ^y/y!) lim_{ω→∞} (y + ω − 1)! ω^ω / [(ω − 1)! (θ + ω)^{ω+y}]
                    = (θ^y/y!) lim_{ω→∞} [(y + ω − 1)!/((ω − 1)!(θ + ω)^y)] (1 + θ/ω)^{−ω}
                    = e^{−θ} θ^y / y!.
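The convergence of the negative binomial pmf to the Poisson pmf can be checked numerically; a sketch in Python, with illustrative values of θ and of the ω grid:

```python
import math

# Check numerically that the NB pmf with p = omega/(theta+omega), r = omega
# approaches the Poisson(theta) pmf as omega grows.
def nb_pmf(y, theta, omega):
    # log pmf via lgamma to avoid overflow in the binomial coefficient
    log_p = (math.lgamma(y + omega) - math.lgamma(omega) - math.lgamma(y + 1)
             + omega * math.log(omega / (theta + omega))
             + y * math.log(theta / (theta + omega)))
    return math.exp(log_p)

def pois_pmf(y, theta):
    return math.exp(-theta) * theta ** y / math.factorial(y)

theta = 2.5   # illustrative value
diffs = [max(abs(nb_pmf(y, theta, omega) - pois_pmf(y, theta)) for y in range(20))
         for omega in (10, 100, 10_000)]
print(diffs)
```

The maximum pointwise difference shrinks roughly like 1/ω, consistent with the limit above.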
(a) Fit the Poisson and negative binomial models to the above
data using the method of maximum likelihood.
(b) Compare the fit of the two models using the likelihood ratio
test. Which model is preferable?
Solution to Exercise 2.7 (a)
Poisson model:
θ̂ = 0.144220.
The LRT for comparing the Poisson and NB models gives a test
statistic of
The system of score equations

∂ log Ln(θ|y)/∂θ1 = 0
        ⋮
∂ log Ln(θ|y)/∂θp = 0

is difficult to solve.
g(x) ≈ g(x0) + (x − x0) g′(x0).

Setting this to 0, replacing x with x1 and solving for x1, we have

0 ≈ g(x0) + (x1 − x0) g′(x0)
⇒ x1 = x0 − g(x0)/g′(x0).

x1 = x0 − g(x0)/g′(x0) = 1.5 − (1.5² − 2)/(2 × 1.5) = 1.41667.

Hence after four iterations, our estimate of the solution to the
equation x² = 2 is x = 1.41421.
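The iteration is easy to code; a minimal Python sketch of Newton-Raphson for g(x) = x² − 2 = 0, reproducing the iterates above:

```python
# Newton-Raphson for g(x) = x^2 - 2 = 0, starting from x0 = 1.5 as on the slide.
def newton(g, g_prime, x0, n_iter):
    x = x0
    for _ in range(n_iter):
        x -= g(x) / g_prime(x)   # x_{k+1} = x_k - g(x_k)/g'(x_k)
    return x

g = lambda x: x * x - 2.0
g_prime = lambda x: 2.0 * x

x1 = newton(g, g_prime, 1.5, 1)   # first iterate: 1.41667 to 5 d.p.
x4 = newton(g, g_prime, 1.5, 4)   # fourth iterate: 1.41421 to 5 d.p.
print(round(x1, 5), round(x4, 5))
```

Convergence is quadratic near the root, which is why four iterations already agree with √2 to five decimal places.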
gi: Rⁿ → R, i = 1, 2, . . . , n.
gi(x) = 0, i = 1, 2, . . . , n,
where x = (x1, x2, . . . , xn).
f(x) = f(x0) + (∂f/∂x1, ∂f/∂x2, . . . , ∂f/∂xn)⊤ (x − x0)

       + (1/2)(x − x0)⊤ [ ∂²f/∂x1²     ∂²f/∂x1∂x2   . . .  ∂²f/∂x1∂xn ]
                        [ ∂²f/∂x2∂x1   ∂²f/∂x2²     . . .  ∂²f/∂x2∂xn ] (x − x0) + · · ·
                        [     ⋮             ⋮        ⋱          ⋮     ]
                        [ ∂²f/∂xn∂x1   ∂²f/∂xn∂x2   . . .  ∂²f/∂xn²   ]
Jf = [ 2x1   3x2² ]
     [ 1/x1  0    ].
gi: Rⁿ → R, i = 1, 2, . . . , n.
gi(x) = 0, i = 1, 2, . . . , n,
where x = (x1, x2, . . . , xn).

We will write down n separate first order Taylor series approximations
to the equations, i.e. one for each

gi(x) = 0 for i = 1, 2, . . . , n.

We will expand the Taylor series about a vector of initial estimates
x0 = (x01, . . . , x0n).
gi(x) ≈ gi(x0) + (∂gi/∂x1, ∂gi/∂x2, . . . , ∂gi/∂xn)⊤ (x − x0).

g1(x) ≈ g1(x0) + (∂g1/∂x1, ∂g1/∂x2, . . . , ∂g1/∂xn)⊤ (x − x0)
g2(x) ≈ g2(x0) + (∂g2/∂x1, ∂g2/∂x2, . . . , ∂g2/∂xn)⊤ (x − x0)
     ⋮
gn(x) ≈ gn(x0) + (∂gn/∂x1, ∂gn/∂x2, . . . , ∂gn/∂xn)⊤ (x − x0)

Finally, setting the left hand side equal to 0 and replacing x with x1,
we have

x1 = x0 − (Jg(x0))^{−1} g(x0),

where Jg is the Jacobian matrix of g = (g1, . . . , gn)⊤.
and then

Jg(x0) = [ 10  −1 ]
         [  2  10 ].

On substitution, we have

x1 = [5] − [ 10  −1 ]^{−1} [4]
     [1]   [  2  10 ]      [2]

   = [5] − (1/102) [ 10   1 ] [4]
     [1]           [ −2  10 ] [2]

   = [4.588]
     [0.882]

(to 3 decimal places).
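The same step can be reproduced with a linear solve, which is numerically preferable to forming the inverse explicitly:

```python
import numpy as np

x0 = np.array([5.0, 1.0])
g_x0 = np.array([4.0, 2.0])             # g evaluated at x0 (from the slide)
J_x0 = np.array([[10.0, -1.0],
                 [2.0, 10.0]])          # Jacobian of g at x0 (from the slide)

# One Newton step: x1 = x0 - Jg(x0)^{-1} g(x0), computed via a linear solve
x1 = x0 - np.linalg.solve(J_x0, g_x0)
print(np.round(x1, 3))                  # [4.588 0.882]
```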
θ1 = θ0 − ( ∂² log L(θ|y)/∂θ∂θ⊤ |θ0 )^{−1} ( ∂ log L(θ|y)/∂θ |θ0 ),

where −∂² log L(θ|y)/∂θ∂θ⊤ |θ0 is the observed information matrix at θ0
and ∂ log L(θ|y)/∂θ |θ0 is the score vector at θ0.
Fisher–Scoring (Modification)
In practice the observed information is replaced by the expected (Fisher)
information:

θ1 = θ0 − ( E[∂² log L(θ|y)/∂θ∂θ⊤ |θ0] )^{−1} ( ∂ log L(θ|y)/∂θ |θ0 ),   or

θ1 = θ0 + ( E[−∂² log L(θ|y)/∂θ∂θ⊤ |θ0] )^{−1} ( ∂ log L(θ|y)/∂θ |θ0 ).

This algorithm for finding the solution of the system of score equations
set equal to 0 is called Fisher–Scoring.
Exercise 10
Consider the following Poisson model:

Yi ∼ Poisson(λi = α + β zi).

  y | 29 37 32 33 26 30 24 28 20 21
  z |  1  2  3  4  5  6  7  8  9 10

∂ log(L(α, β|y))/∂α = ∑_{i=1}^{10} (−1 + yi/(α + β zi))   and

∂ log(L(α, β|y))/∂β = ∑_{i=1}^{10} (−zi + yi zi/(α + β zi)).

(α1, β1)⊤ = (α0, β0)⊤
          + (I(α0, β0))^{−1} ( ∂ log(Ln(α, β|y))/∂α , ∂ log(Ln(α, β|y))/∂β )⊤ |(α0, β0).
y = c(29, 37, 32, 33, 26, 30, 24, 28, 20, 21)
z = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
initial = lm(y ~ z)   # least-squares fit used to obtain starting values
initial
Call:
lm(formula = y ~ z)
Coefficients:
(Intercept) z
35.800 -1.418
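The full iteration can then be run from these starting values. A Python sketch of the Newton-Raphson update for this model, using the observed information; the converged estimates are not quoted on the slides, so the output below is illustrative rather than an answer key:

```python
import numpy as np

y = np.array([29, 37, 32, 33, 26, 30, 24, 28, 20, 21], dtype=float)
z = np.arange(1.0, 11.0)

theta = np.array([35.800, -1.418])   # (alpha0, beta0) from lm(y ~ z) above
for _ in range(20):
    lam = theta[0] + theta[1] * z                     # lambda_i = alpha + beta z_i
    score = np.array([np.sum(-1.0 + y / lam),
                      np.sum(-z + y * z / lam)])
    # observed information: minus the Hessian of log L at the current theta
    info = np.array([[np.sum(y / lam**2),     np.sum(y * z / lam**2)],
                     [np.sum(y * z / lam**2), np.sum(y * z**2 / lam**2)]])
    theta = theta + np.linalg.solve(info, score)      # Newton-Raphson update

print(theta)
```

Because the least-squares coefficients are already close to the MLE, the score is driven to zero within a few iterations.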