
ACTL30004

Unit 2. Likelihood Theory

1 / 97 Unit 2. Likelihood Theory ACTL30004


Objectives
I Advantages of using maximum likelihood estimation (MLE).
I Estimate parameters of statistical models using MLE.
I Understand the difference between an estimate and an
estimator.
I Understand the concept of minimum variance for an unbiased
estimator of a parameter (CRLB).
I Analyze when an unbiased estimator coincides with the ML
estimator.
I Be able to use Theorems 2.1, 2.2 and 2.3.
I Provide the asymptotic distribution of the ML estimator.
I Interpret the graphical meaning of the second derivative of the
log-likelihood.
I Approximate the expected Fisher’s information.



Objectives

I Provide the asymptotic distribution of the ML estimator for the
multivariate case.
I Apply the likelihood ratio test for nested models.
I Perform MLE with Excel.
I Understand how the Fisher Scoring algorithm works.
I Apply the Newton–Raphson approximation for single-variable
problems.
I Apply the Newton–Raphson approximation for multivariable
problems.
I Modify the Newton–Raphson algorithm to obtain the Fisher
Scoring algorithm.



Introduction

I Likelihood theory is mostly used as a method for estimating
unknown parameters associated with a random variable.

I Review of materials learnt in MAST20005.

I New Material: Fisher Scoring algorithm.

I Set the foundations for the analysis of GLM’s.



Maximum Likelihood Estimation

I Consider y as a value taken on by a RV Y.

I Use y in drawing conclusions on the unknown CDF F(·|θ ) of Y
that depends on the unknown quantity θ ∈ Θ ⊆ R.

I Quantity θ is called the parameter and the set Θ is the
parameter space.

I Class F(·|θ ) is called the parametric statistical model.



Estimates and Estimators
I An estimate of θ, i.e. θ̂, is a value close to θ.

I It is the choice which best explains why the phenomenon under
study has produced the observed data y.

I Usually "estimate" and "point estimate" are used interchangeably.

I A general criterion for constructing estimates associates an
element of Θ with every realisation of the random variable y ∈ Y.

I It defines a function Tn (Y) which is called an estimator of θ.

I Tn (Y) is a RV (or a transformation of a set of RV's).



Estimates and Estimators

I The pdf (or pmf) associated with a continuous (or discrete) RV Y
will be denoted as f (y|θ ).

I For example, if Y ∼ Geo(θ ), its pmf is given by

f (y|θ ) = θ (1 − θ )y , y = 0, 1, 2 . . .

I Let us consider a RV Y. We denote a RS of size n from Y as
y1 , . . . , yn , i.e. independent realisations.

I Capital letters: random variables.

I Lower case letters: sample values.



Maximum Likelihood Estimates
I Likelihood function is defined in terms of a RS from a RV Y
with unknown parameter θ.

I Values in the RS are assumed to be drawn from the same RV,


say Y, and they are assumed to be independent of each other.

I Equivalent to say that we have a RS of size n obtained from


RV’s Y1 , . . . , Yn where the variables are independent and iden-
tically distributed (iid).

I The likelihood function is the product of the pdfs (or pmfs) and
is given by

Ln (θ |y1 , . . . , yn ) = ∏_{i=1}^n f (yi |θ ),

which depends on n and θ.



Maximum Likelihood Estimates
I Since Ln (θ |y1 , . . . , yn ) is non–negative ∀ θ ∈ Θ, in likelihood
theory we use the log–likelihood function

ℓ(θ ) = log(Ln (θ |y1 , . . . , yn )) = log(∏_{i=1}^n f (yi |θ )) = ∑_{i=1}^n log(f (yi |θ )).

Definition: Given a RS of size n from a RV Y with a single unknown
parameter θ ∈ Θ, a maximum likelihood estimate (MLE) of θ is an
element θ̂ which attains the maximum value of Ln (θ |y1 , . . . , yn ) in
Θ, i.e. such that

Ln (θ̂ |y1 , . . . , yn ) = max_{θ∈Θ} Ln (θ |y1 , . . . , yn ).

As the log–likelihood function is a monotonic transformation of the
likelihood function, this definition is equivalent to

ℓ(θ̂ ) = max_{θ∈Θ} ℓ(θ ).



Maximum Likelihood Estimates
I ML estimation is only one method of estimating the unknown
parameter θ.
I ML estimation chooses the value θ̂ such that
Ln (θ̂ |y1 , . . . , yn ) ≥ Ln (θ |y1 , . . . , yn ) ∀ θ ∈ Θ.
I An MLE may not exist.
I An MLE may not be unique.
I The likelihood function has to be maximized in the space Θ
specified by the statistical model, not over the set of all
mathematically admissible values of θ.
I In general, the ML estimator has no closed–form expression.
I If so, the MLE has to be obtained numerically for the observed
sample. In real applications this aspect is very relevant, and it
gives rise to numerical methods. In this Unit we will consider
the Fisher–Scoring algorithm.



Maximum Likelihood Estimation – Notation

y := (y1 , y2 , . . . , yn )

1. Solve ∂ log(Ln (θ |y))/∂θ = 0 ⇒ θ̂ = . . .

2. Check ∂² log(Ln (θ |y))/∂θ² |θ=θ̂ < 0.

Tn (Y): maximum likelihood estimator.

Replace y1 , . . . , yn in the MLE θ̂ by RV's Y1 , . . . , Yn .

For example, θ̂ = ∑_{i=1}^n yi / n ⇒ Tn (Y) = ∑_{i=1}^n Yi / n.



Exercise 1
A certain type of electronic component has a lifetime Y (in hours)
with pdf given by
f (y|θ ) = (1/θ²) y e^{−y/θ} if y > 0, and 0 otherwise.

As defined above, we use θ̂ to denote MLE of θ.

Suppose that three such components, tested independently, had


lifetimes of 120, 130 and 128 hours. Based on this sample, find θ̂.



Solution to Exercise 1
The likelihood function is Ln (θ |y1 , . . . , yn ) = ∏_{i=1}^n f (yi |θ ). For our
sample of size three we can write this as:

L3 (θ |y1 , y2 , y3 ) = ∏_{i=1}^3 f (yi |θ ) = (1/θ⁶) ∏_{i=1}^3 yi e^{−yi /θ} ,

where y1 = 120, y2 = 130 and y3 = 128.


To find the MLE, θ̂, we maximise this likelihood function by varying θ.
Take the log-likelihood function,

log(L3 (θ |y1 , y2 , y3 )).



Solution to Exercise 1

log(L3 (θ |y1 , y2 , y3 )) = −6 log θ + ∑_{i=1}^3 (log yi − yi /θ ).

Maximising this expression by setting the first derivative equal to
zero, and replacing θ by θ̂, we have

−6/θ̂ + (1/θ̂²) ∑_{i=1}^3 yi = 0.

Solving this equation, we get θ̂ = ∑_{i=1}^3 yi / 6 = 378/6 = 63.



Solution to Exercise 1

Check that θ̂ corresponds to a maximum by observing the sign of
the second derivative of the log-likelihood function at θ = θ̂:

∂² log(L3 (θ |y1 , y2 , y3 ))/∂θ² |θ=θ̂ = 6/θ̂² − 2 · 378/θ̂³ = −0.001511716,

which is clearly negative at θ̂ = 63, so θ̂ is indeed a maximum.
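As a quick cross-check (not part of the original exercise), the maximisation can be reproduced numerically. The sketch below grid-searches the log-likelihood derived above; the grid bounds and step are arbitrary choices.

```python
import numpy as np

# Grid-search the Exercise 1 log-likelihood
# l(theta) = -6 log(theta) + sum(log(y_i) - y_i/theta)
# and compare the maximiser with the closed-form MLE theta_hat = 378/6 = 63.
y = np.array([120.0, 130.0, 128.0])

def log_lik(theta):
    return -6 * np.log(theta) + np.sum(np.log(y) - y / theta)

grid = np.linspace(30.0, 120.0, 9001)          # step 0.01, chosen arbitrarily
theta_hat = grid[int(np.argmax([log_lik(t) for t in grid]))]

# Second-derivative check at the maximum: 6/theta^2 - 2*378/theta^3 < 0.
d2 = 6 / theta_hat**2 - 2 * 378 / theta_hat**3
```

The grid maximiser agrees with θ̂ = 63 up to the grid spacing, and d2 reproduces the value −0.001511716 quoted on the slide.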



Exercise 2
Suppose that we have a random sample of size n from
Y ∼ N (µ, 1). We write this sample as y1 , . . . , yn .
(a) Find the MLE of µ.
(b) Write down an expression for Tn (Y), the ML estimator
of µ based on a sample of size n.



Solution to Exercise 2 (a)
 
f (y|µ) = (1/√(2π)) exp(−(y − µ)²/2).

Ln (µ|y) = (2π)^{−n/2} ∏_{i=1}^n exp(−(yi − µ)²/2)
         = (2π)^{−n/2} exp(−(1/2) ∑_{i=1}^n (yi − µ)²).

log Ln (µ|y) = −(n/2) log(2π) − (1/2) ∑_{i=1}^n (yi − µ)².

∂ log Ln (µ|y)/∂µ = ∑_{i=1}^n (yi − µ) = 0 ⇒ µ̂ = (1/n) ∑_{i=1}^n yi .



Solution to Exercise 2 (a) and (b)

(a) Check for maximum:

∂² log(Ln (µ|y))/∂µ² = −n < 0,

hence a maximum.

(b) Tn (Y) = (1/n) ∑_{i=1}^n Yi .



Notation

We adopt the following notation:

d1 = ∂ log(Ln (θ |y))/∂θ and d2 = ∂² log(Ln (θ |y))/∂θ².

Since d1 and d2 are functions of θ, it will be useful to have RV's that
relate to d1 and d2 .
These RV's are obtained by replacing the sample values in d1 and
d2 by RV's. The resulting expressions are denoted as D1 and D2 .

D1 : score statistic.

E[−D2 ]: expected Fisher's information (information function).



Score Statistic

D1 is a function of θ and it is denoted as U (θ ). From the definition
of the score statistic we have

U (θ ) = ∂ log(Ln (θ |Y))/∂θ,

where the sample values y1 , . . . , yn have been replaced by the RV's
Y1 , . . . , Yn . It will be useful to define

Uj (θ ) = ∂ log(L1 (θ |Yj ))/∂θ = ∂ log(f (Yj |θ ))/∂θ.

Clearly, U (θ ) = ∑_{j=1}^n Uj (θ ).
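A small simulation can illustrate the key property of the score, E[U(θ)] = 0 at the true θ (proved in Lemma 1 later in this unit). The Poisson choice below is an assumption of this sketch, not from the slides.

```python
import numpy as np

# Assume Y ~ Poisson(mu). For one observation, log f(y|mu) = -mu + y log(mu)
# - log(y!), so the per-observation score is U_j(mu) = Y_j/mu - 1.
# Its sample average over many draws should be close to 0 at the true mu.
rng = np.random.default_rng(1)
mu = 2.5
y = rng.poisson(mu, size=200_000)
score_mean = np.mean(y / mu - 1.0)   # should be close to 0
```

At a wrong parameter value the average score is not zero, which is exactly what drives the numerical methods later in the unit.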



Expected Fisher’s Information

The expected Fisher's information or information function is denoted
as I (θ ). We have

I (θ ) = E[−∂² log(Ln (θ |Y))/∂θ²],

where, before the expectation is taken, the sample values y1 , . . . , yn
have been replaced by Y1 , . . . , Yn .



Notation

I Tn (Y) is defined as the ML estimator for θ.

I Tn∗ (Y) denotes a general estimator of θ (e.g. derived using the
method of moments or percentiles or even MLE).

I We say that Tn∗ (Y) is an unbiased estimator of a parameter θ
if E[Tn∗ (Y)] = θ.



Lemma 1
From the definitions given above we have

D1 = ∂ log(Ln (θ |Y))/∂θ = (∂/∂θ ) log(∏_{i=1}^n f (Yi |θ )) = (∂/∂θ ) log(f (Y|θ )),

where Y denotes n iid random variables. Then E(D1 ) = 0, that is,
the expected value of the score statistic is 0.



Proof of Lemma 2.1

Proof.

E(D1 ) = E[∂ log(Ln (θ |Y))/∂θ]
       = ∫_Y (∂/∂θ ) log(f (y|θ )) f (y|θ ) dy
       = ∫_Y (1/f (y|θ )) (∂ f (y|θ )/∂θ ) f (y|θ ) dy
       = ∫_Y (∂/∂θ ) f (y|θ ) dy
       = (∂/∂θ ) ∫_Y f (y|θ ) dy
       = (∂/∂θ ) 1 = 0.



Lemma 2
The expected Fisher's information can be written as

I (θ ) = E[−∂² log(Ln (θ |Y))/∂θ²] = E[(∂ log(Ln (θ |Y))/∂θ )²].



Proof of Lemma 2.2
Proof.

E(−D2 ) = E[−∂² log(Ln (θ |Y))/∂θ²]
        = −E[(∂²/∂θ²) log f (Y|θ )] = −E[(∂/∂θ )((∂/∂θ ) log f (Y|θ ))]
        = −E[(∂/∂θ )((1/f (Y|θ )) ∂ f (Y|θ )/∂θ )]
        = −∫_Y [(1/f (y|θ )) ∂² f (y|θ )/∂θ² − ((1/f (y|θ )) ∂ f (y|θ )/∂θ )²] f (y|θ ) dy
        = ∫_Y ((1/f (y|θ )) ∂ f (y|θ )/∂θ )² f (y|θ ) dy
          (since ∫_Y ∂² f (y|θ )/∂θ² dy = (∂²/∂θ²) ∫_Y f (y|θ ) dy = 0)
        = ∫_Y (∂ log(f (y|θ ))/∂θ )² f (y|θ ) dy = E[(∂ log(Ln (θ |Y))/∂θ )²]
        = E(D1²).
Theorem 1
Suppose that Tn∗ (Y) is an unbiased estimator of a scalar parameter
θ associated with a random variable Y. Then

Var(Tn∗ (Y)) ≥ 1/I (θ ) = 1 / E[−∂² log(Ln (θ |Y))/∂θ²].

This minimum variance for an unbiased estimator of θ is called the


Cramer-Rao Lower Bound for variance (CRLB).



Proof of Theorem 2.1
Let us consider Cov(Tn∗ (Y), D1 ). Given that E(D1 ) = 0, we have

Cov(Tn∗ (Y), D1 ) = E(Tn∗ (Y) D1 ).

Now the right hand side of the above expression can be simplified
using

E(Tn∗ (Y) D1 ) = ∫_Y Tn∗ (y) (1/f (y|θ )) (∂ f (y|θ )/∂θ ) f (y|θ ) dy
             = ∫_Y Tn∗ (y) (∂/∂θ ) f (y|θ ) dy
             = (∂/∂θ ) ∫_Y Tn∗ (y) f (y|θ ) dy
             = (∂/∂θ ) E(Tn∗ (Y)) = (∂/∂θ ) θ = 1.



Proof of Theorem 2.1 (continued)

Now by using the Cauchy–Schwarz inequality, it follows that

(Cov(Tn∗ (Y), D1 ))² ≤ Var(Tn∗ (Y)) Var(D1 ),

and hence, since Cov(Tn∗ (Y), D1 ) = 1 and Var(D1 ) = E(D1²) by
Lemmas 1 and 2, we have that

Var(Tn∗ (Y)) ≥ 1/E(D1²).



Theorem 2
If there exists an unbiased estimator, Tn∗ (Y), of θ with

Var(Tn∗ (Y)) = 1 / E[−∂² log(Ln (θ |Y))/∂θ²],

then Tn∗ (Y) = Tn (Y).



Proof of Theorem 2.2
From Theorem 1 we have that

(Cov(Tn∗ (Y), D1 ))² ≤ Var(Tn∗ (Y)) Var(D1 ),

which can be written as Var(Tn∗ (Y)) ≥ 1/E(D1²).

We note that equality holds here if and only if Tn∗ (Y) and D1 are
perfectly correlated, i.e. if Tn∗ (Y) can be written as a linear function
of D1 . Equality means that we have an estimator whose variance
equals the Cramer–Rao lower bound. That is, we will have

1 = Var(Tn∗ (Y)) Var(D1 )

if Tn∗ (Y) is a linear function of D1 . We will write this general linear
function as

D1 = a(θ ) (Tn∗ (Y) − θ ) + b(θ ).

Now, taking expectations and using that E(D1 ) = 0, we have that
Proof of Theorem 2.2 (Continued)

0 = a(θ ) (E(Tn∗ (Y)) − θ ) + b(θ ),

which is only true if b(θ ) = 0, since Tn∗ (Y) is an unbiased estimator
for θ.

We therefore have

D1 = ∂ log(Ln (θ |Y))/∂θ = a(θ ) (Tn∗ (Y) − θ ).

Hence E(D1²) = a(θ )² Var(Tn∗ (Y)).

Now if Var(Tn∗ (Y)) attains the Cramer–Rao lower bound, we have
that

Var(Tn∗ (Y)) = a(θ )^{−1} .



Proof of Theorem 2.2 (Continued)

We can therefore write

D1 = ∂ log(Ln (θ |Y))/∂θ = (1/Var(Tn∗ (Y))) (Tn∗ (Y) − θ ).

From this final line, the only solution to D1 = 0 is θ̂ = Tn∗ (Y), and
hence Tn∗ (Y) is the maximum likelihood estimator of θ.



Exercise 3
Consider a random sample of size n, y1 , y2 , . . . , yn , from the
Y ∼ Bin(1, θ ).
(a) Find the maximum likelihood estimate of θ.
(b) Find expressions for E(Tn (Y)) and Var(Tn (Y)).
(c) Show that the variance of the ML estimator found in
(b) is equal to the Cramer–Rao lower bound (CRLB)
for the variance of an unbiased estimator of θ.



Solution to Exercise 2.3 (a)
The pmf and likelihood function are

f (y|θ ) = θ^y (1 − θ )^{1−y} , y = 0, 1,

Ln (θ |y) = θ^{∑_{i=1}^n yi} (1 − θ )^{n−∑_{i=1}^n yi} .

The log–likelihood function is

log Ln (θ |y) = (∑_{i=1}^n yi ) log θ + (n − ∑_{i=1}^n yi ) log(1 − θ ).

The derivative with respect to θ is

∂ log Ln (θ |y)/∂θ = ∑_{i=1}^n yi / θ − (n − ∑_{i=1}^n yi )/(1 − θ ).



Solution to Exercise 2.3 (a)
Setting this to zero and solving, we have

n ȳ/θ̂ = n (1 − ȳ)/(1 − θ̂ ) ⇒ θ̂ = ȳ.

Checking that we have found a maximum, we find

∂² log Ln (θ |y)/∂θ² = −(1/θ²) ∑_{i=1}^n yi − (n − ∑_{i=1}^n yi )/(1 − θ )²,

which, when θ is replaced with ȳ, can be written as

−n (1/ȳ + 1/(1 − ȳ)),

which is clearly negative (assuming that not all of the sample values
are 0 or 1).
Solution to Exercise 2.3 (b)

E(Tn (Y)) = E(Ȳ) = θ.

Var(Tn (Y)) = Var(Ȳ) = θ (1 − θ )/n.



Solution to Exercise 2.3 (c)

The CRLB is (E[−∂² log(Ln (θ |Y))/∂θ²])^{−1} .

E[−∂² log(Ln (θ |Y))/∂θ²] = E[(1/θ²) ∑_{i=1}^n Yi + (n − ∑_{i=1}^n Yi )/(1 − θ )²]
                          = (1/θ²) n θ + (1/(1 − θ )²)(n − n θ )
                          = n/θ + n/(1 − θ )
                          = (n (1 − θ ) + n θ )/(θ (1 − θ ))
                          = n/(θ (1 − θ )).

⇒ The CRLB for the variance of an unbiased estimator of θ is
θ (1 − θ )/n, the same as the answer from (b).
Theorem 3
For large n,

Tn (Y) ≈ N( θ0 , 1 / E[(∂ log(Ln (θ |Y))/∂θ |θ0 )²] ).

Proof.
See the Appendix.

Tn (Y) is distributed approximately normally with mean θ0 (assumed


true underlying value of θ) and variance equal to the reciprocal of
the information evaluated at θ = θ0 .
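Theorem 3 can be illustrated by simulating the sampling distribution of an ML estimator. The exponential model below is an assumed example of this sketch (the MLE is λ̂ = 1/Ȳ, with asymptotic variance 1/I(λ0) = λ0²/n).

```python
import numpy as np

# Simulate many samples of size n from Exp(rate lambda0), compute the MLE
# lambda_hat = 1/Y_bar for each, and compare the empirical mean/variance of
# the MLEs with the Theorem 3 approximation N(lambda0, lambda0^2/n).
rng = np.random.default_rng(42)
lambda0, n, reps = 2.0, 400, 10_000
samples = rng.exponential(scale=1.0 / lambda0, size=(reps, n))
mle = 1.0 / samples.mean(axis=1)

emp_mean = mle.mean()   # close to lambda0 = 2
emp_var = mle.var()     # close to lambda0**2 / n = 0.01
```

A histogram of `mle` would look approximately normal, centred near λ0, with spread matching the reciprocal of the information.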



Exercise 4
Suppose that Y1 , . . . , Yn are iid Poisson random variables with
parameter µ.
(a) Find an expression for the ML estimator of µ based on
a sample of size n.
(b) Find the mean and variance of the estimator from (a).
(c) By finding E[−∂² log L(µ|Y)/∂µ²], write down the asymptotic
distribution of the ML estimator from (a).
(d) What is the exact distribution of n Tn (Y) where Tn (Y)
is the ML estimator from (a)?



Solution to Exercise 2.4 (a)
The likelihood function is

Ln (µ|y) = ∏_{i=1}^n e^{−µ} µ^{yi} / yi ! .

The log–likelihood function is

log(Ln (µ|y)) = ∑_{i=1}^n (−µ + yi log µ − log yi !)
             = −n µ + log µ ∑_{i=1}^n yi − ∑_{i=1}^n log yi !.

Differentiating this with respect to µ and setting the result equal to
0, we obtain the ML estimator of µ, Tn (Y) = Ȳ.

Check for maximum:

∂² log(Ln (µ|y))/∂µ² = −∑_{i=1}^n yi / µ² < 0.
Solution to Exercise 2.4 (b)

E(Tn (Y)) = E(Ȳ) = (1/n) ∑_{i=1}^n E(Yi ) = µ.

Var(Tn (Y)) = Var(Ȳ) = (1/n²) ∑_{i=1}^n Var(Yi ) = µ/n.



Solution to Exercise 2.4 (c)

Using the second derivative from (a), we have

E[−∂² log(Ln (µ|Y))/∂µ²] = E[∑_{i=1}^n Yi / µ²]
                         = ∑_{i=1}^n E(Yi ) / µ² = n µ/µ² = n/µ.

Hence the asymptotic distribution of the ML estimator of µ is
N(µ, µ/n).



Solution to Exercise 2.4 (d)

n Tn (Y) = n (∑_{i=1}^n Yi / n) = ∑_{i=1}^n Yi .

This has a Poisson distribution with parameter n µ.



Remark 1 – Intuition behind (∂² log(Ln (θ |Y))/∂θ²)^{−1}

[Figure: plot of log L against θ; the log-likelihood is sharply peaked
around θ̂.]

I The maximum of ℓ is clearly defined around θ̂, so ℓ changes
gradient very sharply around θ̂.
I A greater confidence will be put on θ̂.
I log Ln (θ |y) < log Ln (θ̂ |y).
I A much larger negative second derivative value at the MLE ⇒ a
smaller variance (greater confidence) of θ̂.
Remark 1 – Intuition behind (∂² log(Ln (θ |Y))/∂θ²)^{−1} (continued)

[Figure: plot of log L against θ; the log-likelihood is flat around θ̂.]

I The maximum of ℓ is not clearly defined around θ̂, so ℓ changes
gradient slowly around θ̂.
I A lower confidence will be put on θ̂.
I log Ln (θ |y) is only slightly lower than log Ln (θ̂ |y).
I A second derivative value at the MLE that is only slightly
negative ⇒ a greater variance (lower confidence) of θ̂.
Remark 2–Observe Information instead of Expected
Information

I Sometimes it is difficult to find E[−∂² log L(θ |Y)/∂θ²].
I In these cases, we use the observed information, which can
always be calculated:

Observed information ≡ −∂² log Ln (θ |y)/∂θ² .



Remark 3–Extension to Multi–Parameter Case
I These results can be extended to the case of an unknown vector
of parameters θ (dimension p).
I In this case we have

Tn (Y) ≈ Nmulti ( θ0 , (1/n)(i(θ0 ))^{−1} ),

where Nmulti (·, ·) represents the multivariate normal distribution,
θ0 is the mean vector, and (1/n)(i(θ0 ))^{−1} is the covariance matrix,
where i(θ0 ) has (j, k)th element given by

E[−(∂²/∂θj ∂θk ) log f (Y1 |θ )], with j, k = 1, . . . , p.



Exercise 5
Based on a sample y1 , y2 , . . . , yn of size n from the normal distribu-
tion with unknown parameters µ and σ, find the maximum likelihood
estimates of µ and σ.



Solution to Exercise 2.5

We maximise the likelihood

Ln (µ, σ|y) = ∏_{i=1}^n (1/(σ √(2π))) exp(−(1/2)((yi − µ)/σ )²)
            = (1/(σ √(2π)))^n exp(−(1/2) ∑_{i=1}^n ((yi − µ)/σ )²).

Then, we find the log–likelihood function,

log(Ln (µ, σ|y)) = −n log(σ √(2π)) − (1/2) ∑_{i=1}^n ((yi − µ)/σ )².



Solution to Exercise 2.5

We then take the partial derivative with respect to µ:

∂ log(Ln (µ, σ|y))/∂µ = (1/σ²) ∑_{i=1}^n (yi − µ).

Setting this to 0 and solving for µ, we have µ̂ = ȳ.

We next take the partial derivative with respect to σ:

∂ log(Ln (µ, σ|y))/∂σ = −n/σ + ∑_{i=1}^n (yi − µ)²/σ³ .

Setting this to 0, replacing µ with µ̂ = ȳ and replacing σ with σ̂,
we have:



Solution to Exercise 2.5

∑_{i=1}^n (yi − ȳ)²/σ̂³ = n/σ̂,

which gives

σ̂² = (1/n) ∑_{i=1}^n (yi − ȳ)².

Note: for a function of two variables, we check for a maximum by
observing that the matrix of second order partial derivatives has a
positive determinant and that ∂² log Ln (µ, σ|y)/∂µ² < 0.
This will NOT be required of you in this course.
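The closed forms µ̂ = ȳ and σ̂² = (1/n)∑(yi − ȳ)² can be sanity-checked in code; note the n (not n − 1) denominator. The simulated data below are an illustrative assumption.

```python
import numpy as np

# Check that the closed-form normal MLEs maximise the log-likelihood:
# any nearby (mu, sigma) pair should give a lower log-likelihood value.
rng = np.random.default_rng(7)
y = rng.normal(loc=5.0, scale=2.0, size=10_000)

def log_lik(mu, sigma):
    n = len(y)
    return (-n * np.log(sigma * np.sqrt(2 * np.pi))
            - 0.5 * np.sum(((y - mu) / sigma) ** 2))

mu_hat = y.mean()
sigma_hat = np.sqrt(np.mean((y - mu_hat) ** 2))  # np.std(y) with ddof=0
best = log_lik(mu_hat, sigma_hat)
```

Perturbing either parameter away from its MLE strictly decreases `log_lik`, which is exactly the maximisation property the derivation establishes.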



Likelihood Ratio Test

This statistical test is used to compare the goodness–of–fit of two


models, one of which (the reduced model) is a special case of the
other (the larger model). Let us consider that the larger model is
specified in terms of a random variable Y with probability density
function f (y|θ ) where θ ∈ Θ ⊆ Rk (k ≥ 1).

If the parameter space Θ of this model is partitioned into two subsets


Θ0 and Θ1 such that Θ = Θ0 ∪ Θ1 and Θ0 ∩ Θ1 = ∅, we can
assume a reduced model where Y follows a density function f (y|θ )
with θ ∈ Θ0 .

We now provide a test procedure for choosing one and only one of
the models based on the likelihood ratio test statistic.



Theorem 4 (Likelihood Ratio Test (for nested models))
Consider a random variable Y with probability density function f (y|θ ) where
θ ∈ Θ ⊆ Rk (k ≥ 1). Our aim will be to test

H0 : θ ∈ Θ0 Vs. H1 : θ ∈ Θ1 = Θ − Θ0 .

If we define the ratio of likelihoods as

Rn = max_{θ∈Θ} Ln (θ |Y) / max_{θ∈Θ0} Ln (θ |Y),

we will reject H0 in favour of H1 if and only if

2 log Rn = 2 [log(max_{θ∈Θ} Ln (θ |Y)) − log(max_{θ∈Θ0} Ln (θ |Y))]
         = 2 [max_{θ∈Θ} log(Ln (θ |Y)) − max_{θ∈Θ0} log(Ln (θ |Y))]

is too large. Besides, for large n, if the true value of θ is in Θ0 :

2 log Rn ≈ χ²_p , where p = dim Θ − dim Θ0 .


Proof of Theorem 2.4
We assume Θ0 = {θ0 } and that Θ has dimension p.

Assume H0 : θ = θ0 is true and expand the log–likelihood function
at θ0 as a Taylor series about θ = θ̂n , where θ̂n is the MLE of θ
under the larger model:

log(Ln (θ0 |Y)) ≈ log(Ln (θ̂n |Y)) + (θ0 − θ̂n )> a(θ̂n )
               + (1/2) (θ0 − θ̂n )> b(θ̂n ) (θ0 − θ̂n ),

where a(θ̂n ) is the vector of first order partial derivatives of
log(Ln (θ |Y)) at θ̂n and b(θ̂n ) is the matrix of second order partial
derivatives of log(Ln (θ |Y)) at θ̂n .

As θ̂n is the MLE of θ ⇒ a(θ̂n ) = 0 (i.e. zero vector).



Likelihood Ratio Test – Proof
Now, by using Remark 2, we have that

−b(θ̂n ) ≈ E[−∂² log(Ln (θ |Y))/∂θi ∂θj ] |θ0 = n i(θ0 ).

Using these results we have


2 log Rn = 2[log(Ln (θ̂n |Y)) − log(Ln (θ0 |Y))]
≈ (θ0 − θ̂n )> n i(θ0 ) (θ0 − θ̂n ).
If the true underlying value of θ is θ0 from Remark 3, we have for
large n,
(θ̂n − θ0 ) ' N (0, (ni(θ0 ))−1 )
and hence

2 log Rn ≈ χ2p .
Note that in the last step we have used that if Z ∼ Nr (0, Σ), then

Z> Σ−1 Z ∼ χ2r .


Exercise 6
Losses due to vandal damage to cars (in $) over a period of six
months in a certain community are displayed in the following table.

38 56 77 110 112 138 152 168 188 210


228 241 252 273 283 288 291 299 305 317
321 356 374 422 485 527 529 559 567 656

Fit by using maximum likelihood estimation an Exponential distri-


bution with rate parameter λ and a Gamma distribution with shape
parameter α and rate parameter β to this set of data.

Use the likelihood ratio test to test the null hypothesis that Ex-
ponential distribution with rate parameter λ is appropriate against
the alternative hypothesis that Gamma distribution is appropriate.
Perform your statistical test at the 5% significance level.



Solution to Exercise 2.6
The Exponential distribution has pdf

f (y|λ) = λ e−λ y , with λ > 0 and y > 0.

The Gamma distribution has pdf

f (y|α, β) = (β^α /Γ(α)) y^{α−1} e^{−β y} , with β, α > 0 and y > 0,

and where

Γ(t) = ∫_0^∞ x^{t−1} e^{−x} dx.

When λ = β and α = 1 the Exponential distribution is nested in


the Gamma distribution.



Solution to Exercise 2.6
library(maxLik)
damage=read.csv("damage.csv",header=FALSE)
attach(damage)
x_data=damage[,1];

logLikFun <- function(param) {


lambda<- param[1]
sum(dexp(x_data, rate = lambda, log = TRUE))
}

mle <- maxLik(logLik = logLikFun, start = c(lambda = 0.003215025))


summary(mle)

LogLikFun2 <- function(param) {


alpha<- param[1]
beta<-param[2]
sum(dgamma(x_data,shape=alpha,rate = beta, log = TRUE))
}

mle2 <- maxLik(logLik = LogLikFun2, start = c(alpha=3.2388,beta =0.0110))


summary(mle2)



Solution to Exercise 2.6
--------------------------------------------
Maximum Likelihood estimation
Newton-Raphson maximisation, 11 iterations
Return code 1: gradient close to zero
Log-Likelihood: -200.5142
1 free parameters
Estimates:
Estimate Std. error t value Pr(> t)
lambda 0.003401 0.000621 5.476 4.36e-08 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
--------------------------------------------
--------------------------------------------
Maximum Likelihood estimation
Newton-Raphson maximisation, 7 iterations
Return code 2: successive function values within tolerance limit
Log-Likelihood: -193.8312
2 free parameters
Estimates:
Estimate Std. error t value Pr(> t)
alpha 2.775605 0.677443 4.097 4.18e-05 ***
beta 0.009439 0.002526 3.737 0.000186 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
--------------------------------------------
Solution to Exercise 2.6

The LRT for comparing the Exponential and Gamma distribution


gives a test statistic of

2 [−193.8312 + 200.5142] = 13.366.

The rejection region at the 5% significance level is values of the test


statistic greater than or equal to 3.84.

We therefore reject the smaller model (the Exponential model) in


favour of the model based on the Gamma distribution.
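The same test can be reproduced in Python without maxLik (an illustrative sketch): the Exponential MLE is closed form, and the Gamma fit uses the profile likelihood β = α/ȳ over an α grid. The grid bounds and spacing are choices of this sketch.

```python
import numpy as np
from math import lgamma

# Exercise 6 data (losses due to vandal damage).
y = np.array([38, 56, 77, 110, 112, 138, 152, 168, 188, 210,
              228, 241, 252, 273, 283, 288, 291, 299, 305, 317,
              321, 356, 374, 422, 485, 527, 529, 559, 567, 656], dtype=float)
n, ybar = len(y), y.mean()

# Exponential model: lambda_hat = 1/ybar, log-likelihood in closed form.
loglik_exp = n * np.log(1.0 / ybar) - (1.0 / ybar) * y.sum()

# Gamma model: for fixed alpha the MLE of beta is alpha/ybar (profile MLE).
def gamma_loglik(alpha):
    beta = alpha / ybar
    return (n * alpha * np.log(beta) - n * lgamma(alpha)
            + (alpha - 1) * np.log(y).sum() - beta * y.sum())

loglik_gam = max(gamma_loglik(a) for a in np.linspace(0.5, 8.0, 7501))

lrt = 2 * (loglik_gam - loglik_exp)   # compare with the chi2(1) 5% value 3.84
```

The two maximised log-likelihoods match the maxLik output above (about −200.51 and −193.83), and the statistic is about 13.37, so the Gamma model is again preferred.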



Result 1
The Poisson distribution has pmf f (y|θ ) = e^{−θ} θ^y /y! , y = 0, 1, 2, . . . .

The Negative Binomial distribution has pmf:

f (y|r, β) = (y+r−1 choose y) (1/(1 + β))^r (β/(1 + β))^y , y = 0, 1, . . . ; r, β > 0.

The Poisson distribution is a limiting case of the negative binomial
distribution.

That is, P (θ ) is the limiting case of the N B(r, β) distribution when
β → 0 with r β = θ held fixed (so that r → ∞).



Proof of Result 1
First, the calculation of the combinatorial term for non–integer r is
performed using

(x choose k) = x(x − 1) . . . (x − k + 1)/k! = Γ(x + 1)/(Γ(k + 1) Γ(x − k + 1)).

By re–parameterizing the negative binomial distribution with

1/(1 + β) = ω/(θ + ω ) and r = ω,

we get

f (y|θ, ω ) = (y+ω−1 choose y) (ω/(θ + ω ))^ω (θ/(θ + ω ))^y , y = 0, 1, . . . ; ω, θ > 0.

As β → 0, ω → ∞, and so we write



Proof of Result 1

lim_{ω→∞} f (y|θ, ω ) = (θ^y /y!) lim_{ω→∞} [(y + ω − 1)! ω^ω / ((ω − 1)! (θ + ω )^{ω+y} )]
                     = (θ^y /y!) lim_{ω→∞} [(y + ω − 1)! / ((ω − 1)! (θ + ω )^y )] (1 + θ/ω )^{−ω}
                     = e^{−θ} θ^y / y!

This shows that the Poisson distribution is a limiting case of the


negative binomial distribution.
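The limit can also be seen numerically. The sketch below evaluates the reparameterised NB pmf from this proof for a large ω and compares it with the Poisson pmf; the values of θ and ω are arbitrary choices.

```python
from math import lgamma, log, exp, factorial

# Reparameterised NB pmf: f(y|theta, omega) uses the combinatorial term
# Gamma(y + omega) / (Gamma(y + 1) Gamma(omega)), computed on the log scale.
def nb_pmf(y, theta, omega):
    log_p = (lgamma(y + omega) - lgamma(omega) - lgamma(y + 1)
             + omega * log(omega / (theta + omega))
             + y * log(theta / (theta + omega)))
    return exp(log_p)

def poisson_pmf(y, theta):
    return exp(-theta) * theta**y / factorial(y)

theta = 2.0
max_diff = max(abs(nb_pmf(y, theta, 1e6) - poisson_pmf(y, theta))
               for y in range(15))
```

For ω = 10⁶ the two pmfs agree to several decimal places at every support point checked, illustrating Result 1.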



Exercise 7
Consider the following data on motor vehicle claims frequency. For
simplicity, in fitting our models we will assume that the 7+ category
relates exactly to 7 claims. The error introduced by this assumption will
be negligible.

# of claims (yi ) # of drivers (ni )


0 20592
1 2651
2 297
3 41
4 7
5 0
6 1
7+ 0

(a) Fit the Poisson and negative binomial models to the above
data using the method of the maximum likelihood.
(b) Compare the fit of the two models using the likelihood ratio
test. Which model is preferable?
Solution to Exercise 2.7 (a)

Excel and R are used to estimate parameters for both models.

Poisson model:

θ̂ = 0.144220.

Negative Binomial model:

r̂ = 1.11790 and β̂ = 0.129010.



Solution to Exercise 2.7 (a)
> obs=c(0,1,2,3,4,5,6,7);
> freq=c(20592,2651,297,41,7,0,1,0)
>
> logLikFunPoi <- function(param) {
+ theta<- param[1]
+ sum(freq*dpois(obs, theta, log = TRUE))
+ }
>
> mle3 <- maxLik(logLik = logLikFunPoi, start = c(theta = 0.156))
> summary(mle3)
--------------------------------------------
Maximum Likelihood estimation
Newton-Raphson maximisation, 3 iterations
Return code 2: successive function values within tolerance limit
Log-Likelihood: -10297.84
1 free parameters
Estimates:
Estimate Std. error t value Pr(> t)
theta 0.144220 0.002473 58.33 <2e-16 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
--------------------------------------------



Solution to Exercise 2.7 (b)
Define n = ∑_{i=0}^{7+} ni = 23,589.

Under the Poisson model the maximum of the log–likelihood is

−n θ̂ + ln θ̂ ∑_{i=0}^{7+} yi ni − ∑_{i=0}^{7+} ni ln yi ! = −10,297.84.

Under the negative binomial model, we calculate the log–likelihood
using

∑_{i=0}^{7+} ni [ln (r+yi−1 choose yi) − r ln(1 + β) + yi ln(β/(1 + β))] = −10,223.42.

The GAMMALN function in Excel is useful in calculating this sum.



Solution to Exercise 2.7 (b)

The LRT for comparing the Poisson and NB models gives a test
statistic of

2 [−10, 223.42 + 10, 297.84] = 148.84.

The rejection region at the 5% significance level is values of the test


statistic greater than or equal to 3.84.

We therefore reject the smaller model (the Poisson model) in favour


of the model based on the negative binomial distribution.



Fisher–Scoring

Fisher–Scoring is a numerical method used to calculate the ML
estimates when the system

∂ log Ln (θ |y)/∂θ1 = 0
...
∂ log Ln (θ |y)/∂θp = 0

is difficult to solve.

The method of Fisher–Scoring is an extension of the Newton–Raphson
method for approximating the roots of non–linear equations.



Fisher–Scoring

The following steps are required:

1. Revise Newton–Raphson method for approximating the solution


to a single equation with a single unknown variable.
2. Extend Newton–Raphson method to approximate the solution
to a system of n equations in n unknowns, and
3. Introduce the Fisher–Scoring algorithm in the context of ML
estimation and look at some applications of the method to real
actuarial data.
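As a preview of step 3, the scoring update is θ_{m+1} = θ_m + U(θ_m)/I(θ_m). The sketch below applies it to the Geometric pmf f(y|θ) = θ(1 − θ)^y seen earlier in this unit; the simulated data and starting value are assumptions of this sketch.

```python
import numpy as np

# Fisher Scoring for Geometric data: score U(theta) = n/theta - sum(y)/(1-theta),
# expected information I(theta) = n / (theta^2 (1 - theta)).
rng = np.random.default_rng(3)
y = rng.geometric(0.3, size=1000) - 1   # numpy counts trials; subtract 1 for failures
n, s = len(y), y.sum()

theta = 0.5                              # crude starting value
for _ in range(25):
    score = n / theta - s / (1 - theta)
    info = n / (theta**2 * (1 - theta))
    theta = theta + score / info         # scoring update

closed_form = n / (n + s)                # here the MLE is known: 1/(1 + y_bar)
```

The iterates converge to the closed-form MLE 1/(1 + ȳ), which makes this model a convenient test case before tackling problems with no closed form.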



Newton–Raphson Single equation and single unknown

Consider the equation g(x) = 0, with g : R → R.


To approximate the solution to this equation:

1. Choose an initial estimate x = x0 , and get g(x0 ).


2. To update this value to find a better estimate of the solution
⇒ write a linear approximation of g about x = x0 (Taylor series
with two terms).
3. Set the equation equal to 0 and solve for x.
4. Updated estimate is x = x1 .



Newton–Raphson Single equation and single unknown

Our Taylor series expansion is

g(x) ≈ g(x0 ) + (x − x0 ) g′(x0 ).

Setting this to 0, replacing x with x1 and solving for x1 , we have

0 ≈ g(x0 ) + (x1 − x0 ) g′(x0 )
⇒ x1 = x0 − g(x0 )/g′(x0 ).

We can iterate this process many times to get improved estimates
of the solution to the equation g(x) = 0.



Remark 4

The Newton–Raphson procedure does not work if the initial estimate


is at a stationary point. The process also does not generally converge
if the initial estimate is on the other side of a stationary point or a
discontinuity in the function g to the location of the solution of the
equation.



Exercise 8
Use the Newton–Raphson method with four iterations to approxi-
mate the solution to the equation x2 = 2. Use an initial estimate of
x = 1.5.



Solution to Exercise 2.8

The equation to be solved is

g(x) = 0 where g(x) = x² − 2 and g′(x) = 2x.

To update our initial estimate, we use

x1 = x0 − g(x0 )/g′(x0 ) = 1.5 − (1.5² − 2)/(2 × 1.5) = 1.41667.

Iterating this process, we get the following table:



Solution to Exercise 2.8

x value Updated x value


x0 = 1.50000 1.41667
x1 = 1.41667 1.41422
x2 = 1.41422 1.41421
x3 = 1.41421 1.41421

Hence after four iterations, our estimate of the solution to the equa-
tion x2 = 2 is x = 1.41421.
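The iteration in the table can be wrapped in a few lines of Python (an illustrative sketch; the slides carry out the iterations by hand).

```python
# Newton-Raphson for g(x) = x^2 - 2 with starting value x0 = 1.5,
# reproducing the table above.
def newton_raphson(g, g_prime, x0, iterations):
    x = x0
    for _ in range(iterations):
        x = x - g(x) / g_prime(x)   # x_{k+1} = x_k - g(x_k)/g'(x_k)
    return x

root = newton_raphson(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.5, 4)
```

After four iterations `root` agrees with √2 to far more than the five decimal places shown in the table, reflecting the quadratic convergence of the method.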



Newton–Raphson: n equations and n unknown variables

Suppose we have a set of n functions

gi : Rn → R, i = 1, 2, . . . , n.

Goal: To find an approximate solution to the system of equations

gi (x) = 0, i = 1, 2, . . . , n

where x = (x1 , x2 , . . . , xn ).

Again, we need to write a linear approximation of gi about x = x0
(a Taylor series with two terms), where x0 = (x01 , x02 , . . . , x0n ).



Newton–Raphson: n equations and n unknown variables
Let us consider f : Rn → R, with

x = (x1 , x2 , . . . , xn ).

The Taylor Series expansion about x0 = (x01 , x02 , . . . , x0n ) is

f(x) = f(x0) + (∂f/∂x1, ∂f/∂x2, . . . , ∂f/∂xn)ᵀ (x − x0)
       + (1/2) (x − x0)ᵀ H(x0) (x − x0) + . . . ,

where H(x0) is the Hessian matrix of second-order partial derivatives,
with (i, j) entry ∂²f/∂xi∂xj, evaluated at x0.



Newton–Raphson: n equations and n unknown variables

Consider the function f (x) where f : Rn → Rn ,

with f (x) = (f1 (x), f2 (x), . . . , fn (x)).

The Jacobian matrix is the matrix of first-order partial derivatives,

Jf = [ ∂fi/∂xj ],   i, j = 1, 2, . . . , n,

i.e. the n × n matrix whose i-th row contains the partial derivatives
of fi with respect to x1, x2, . . . , xn.



Example 1
Consider f : R² → R² given by f(x1, x2) = (x1² + x2³, ln x1), where

f(x1, x2) = (f1(x1, x2), f2(x1, x2)) with

f1(x1, x2) = x1² + x2³ and f2(x1, x2) = ln x1.

The Jacobian matrix is

     ( 2x1    3x2² )
Jf = (             ).
     ( 1/x1    0   )
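As an illustrative sanity check (not part of the slides), the analytic Jacobian above can be compared against a central finite-difference approximation at an arbitrary point, here (x1, x2) = (2.0, 3.0):

```python
import math

def f(x1, x2):
    # The function from Example 1: f(x1, x2) = (x1^2 + x2^3, ln x1).
    return [x1 ** 2 + x2 ** 3, math.log(x1)]

def jacobian_fd(x1, x2, h=1e-6):
    # Central finite-difference approximation to the 2x2 Jacobian.
    rows = []
    for i in range(2):
        d1 = (f(x1 + h, x2)[i] - f(x1 - h, x2)[i]) / (2 * h)
        d2 = (f(x1, x2 + h)[i] - f(x1, x2 - h)[i]) / (2 * h)
        rows.append([d1, d2])
    return rows

x1, x2 = 2.0, 3.0
analytic = [[2 * x1, 3 * x2 ** 2], [1 / x1, 0.0]]
numeric = jacobian_fd(x1, x2)
```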



Newton–Raphson: n equations and n unknown variables
Suppose we have a set of n functions

gi : Rn → R, i = 1, 2, . . . , n.

Goal: To find an approximate solution to the system of equations

gi (x) = 0, i = 1, 2, . . . , n

where x = (x1 , x2 , . . . , xn ).
We will write down n separate first order Taylor series approximations
to the equations, i.e. one for each

gi (x) = 0 for i = 1, 2, . . . , n.
We will expand the Taylor series about a vector of initial estimates

x0 = (x01 , . . . , x0n ).



Newton–Raphson: n equations and n unknown variables

Let us consider firstly a single equation gi (x) = 0.

Writing a Taylor series for the left-hand side,

gi(x) ≈ gi(x0) + (∂gi/∂x1, ∂gi/∂x2, . . . , ∂gi/∂xn)ᵀ (x − x0).



Newton–Raphson: n equations and n unknown variables
Writing similar Taylor series expansions for all n functions

g1 (x), g2 (x), . . . , gn (x).

g1(x) ≈ g1(x0) + (∂g1/∂x1, ∂g1/∂x2, . . . , ∂g1/∂xn)ᵀ (x − x0)
g2(x) ≈ g2(x0) + (∂g2/∂x1, ∂g2/∂x2, . . . , ∂g2/∂xn)ᵀ (x − x0)
  ..       ..
gn(x) ≈ gn(x0) + (∂gn/∂x1, ∂gn/∂x2, . . . , ∂gn/∂xn)ᵀ (x − x0)



Newton–Raphson: n equations and n unknown variables

Then collecting the n functions into a single function g : Rn → Rn


where

g(x) = (g1 (x), g2 (x), . . . , gn (x)).


Next, writing the Taylor series approximation in matrix form, we have

g(x) ≈ g(x0 ) + [Jg (x0 )] (x − x0 ),

where Jg (x0 ) denotes the Jacobian matrix of g at x = x0 .

Finally, setting the left-hand side equal to 0 and replacing x with x1,
we have

x1 = x0 − [Jg(x0)]⁻¹ g(x0).
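A minimal sketch of this multivariate update for a two-equation system (illustrative only; the example system, starting point and tolerance are not from the slides). To keep the sketch dependency-free, the 2×2 linear solve is done with Cramer's rule instead of forming the inverse explicitly.

```python
def newton_raphson_2d(g, jac, x, n_iter=25, tol=1e-12):
    """Iterate x <- x - [Jg(x)]^{-1} g(x) for a system of 2 equations."""
    for _ in range(n_iter):
        g1, g2 = g(x)
        (a, b), (c, d) = jac(x)
        det = a * d - b * c
        # Cramer's rule for the step s solving Jg(x) s = g(x).
        s1 = (d * g1 - b * g2) / det
        s2 = (a * g2 - c * g1) / det
        x = (x[0] - s1, x[1] - s2)
        if abs(s1) < tol and abs(s2) < tol:
            break
    return x

# Illustrative system: x1^2 + x2^2 - 4 = 0 and x1 - x2 = 0,
# whose solution with positive coordinates is x1 = x2 = sqrt(2).
g = lambda x: (x[0] ** 2 + x[1] ** 2 - 4, x[0] - x[1])
jac = lambda x: ((2 * x[0], 2 * x[1]), (1.0, -1.0))
root = newton_raphson_2d(g, jac, (1.0, 2.0))
```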



Remark 5

The derivative in the denominator of the single-equation algorithm
is replaced, in the multiple-equation algorithm, by the inverse of the
Jacobian matrix.



Exercise 9
Use the Newton–Raphson method to approximate the solution to
the system of equations:
x1² − x2 − 20 = 0
x1 x2² + x1 − 8 = 0
Use one iteration only. Use starting values of x0 = (5, 1).



Solution to Exercise 2.9

Using the multivariate Newton–Raphson algorithm, we have

x1 = x0 − [Jg(x0)]⁻¹ g(x0),

where x0 = (5, 1) and g(x0 ) = (4, 2).

The Jacobian matrix is given by


 
     ( 2x1        −1     )
Jg = (                   )
     ( x2² + 1    2x1 x2 )

and then

         ( 10   −1 )
Jg(x0) = (         ).
         (  2   10 )



Solution to Exercise 2.9

On substitution, we have

x1 = (5, 1)ᵀ − [Jg(x0)]⁻¹ (4, 2)ᵀ,

where

                     (  10    1 )
[Jg(x0)]⁻¹ = (1/102) (          ),
                     ( −2    10 )

so that

x1 = (5, 1)ᵀ − (1/102) (42, 12)ᵀ = (4.588, 0.882)ᵀ

(to 3 decimal places).
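The single step above can be verified numerically (an illustrative check, not part of the slides; the 2×2 solve again uses Cramer's rule):

```python
x1_0, x2_0 = 5.0, 1.0
g1 = x1_0 ** 2 - x2_0 - 20          # = 4
g2 = x1_0 * x2_0 ** 2 + x1_0 - 8    # = 2
a, b = 2 * x1_0, -1.0               # Jacobian entries at (5, 1)
c, d = x2_0 ** 2 + 1, 2 * x1_0 * x2_0
det = a * d - b * c                 # = 102
# Components of [Jg(x0)]^{-1} g(x0) by Cramer's rule.
s1 = (d * g1 - b * g2) / det
s2 = (a * g2 - c * g1) / det
x1_new = (x1_0 - s1, x2_0 - s2)
print(round(x1_new[0], 3), round(x1_new[1], 3))  # 4.588 0.882
```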



Fisher–Scoring Algorithm
The multivariate version of the Newton–Raphson algorithm can be
applied to solve

∂ log Ln(θ|y)/∂θ1 = 0
        ..
        .                  (a system of p equations)
∂ log Ln(θ|y)/∂θp = 0

Note that θ = (θ1 , θ2 , . . . , θp ). Application of the Newton–Raphson


method gives:

θ1 = θ0 − ( ∂² log L(θ|y)/∂θ ∂θᵀ |θ0 )⁻¹ ( ∂ log L(θ|y)/∂θ |θ0 ),

where −∂² log L(θ|y)/∂θ ∂θᵀ |θ0 is the observed information matrix at θ0
and ∂ log L(θ|y)/∂θ |θ0 is the score vector at θ0.
Fisher–Scoring (Modification)
In practice the observed information is replaced by expected Fisher’s
information

θ1 = θ0 − ( E[ ∂² log L(θ|y)/∂θ ∂θᵀ ] |θ0 )⁻¹ ( ∂ log L(θ|y)/∂θ |θ0 ),   or

θ1 = θ0 + ( E[ −∂² log L(θ|y)/∂θ ∂θᵀ ] |θ0 )⁻¹ ( ∂ log L(θ|y)/∂θ |θ0 ).

where the matrix that is inverted is Fisher's information matrix
evaluated at θ0.

This algorithm for finding the solution of the system of normal
equations (the score equations set equal to 0) is called Fisher–Scoring.
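As a toy illustration of the update (an illustrative sketch with made-up data, not from the slides): for an i.i.d. Poisson(λ) sample, the score is ∑ yi/λ − n and the expected information is n/λ, so a single Fisher–Scoring step from any starting value λ0 > 0 lands exactly on the MLE, the sample mean.

```python
y = [3, 5, 2, 4, 6]       # illustrative data
n, total = len(y), sum(y)

lam = 1.0                 # arbitrary positive starting value
score = total / lam - n   # derivative of the log-likelihood at lam
info = n / lam            # expected Fisher information at lam
lam = lam + score / info  # Fisher-scoring update
print(lam)                # 4.0, the sample mean
```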
Exercise 10
Consider the following Poisson model:

Yi ∼ Poisson(λi = α + β zi ).

You should think of Y as response variables and z as predictors of


Y. The two unknown parameters are α and β.
Data on the response variable Y and the predictor z are given below:

y 29 37 32 33 26 30 24 28 20 21
z 1 2 3 4 5 6 7 8 9 10

Note that this data can be thought of as pairs (yi , zi ) for i =


1, 2, . . . , 10.



Exercise 2.10

(a) Verify that the expected information matrix for a data set with
    10 pairs of data as given above is

              ( ∑ 1/(α + β zi)      ∑ zi/(α + β zi)  )
    I(α, β) = (                                      )
              ( ∑ zi/(α + β zi)     ∑ zi²/(α + β zi) )

    where each sum runs over i = 1, 2, . . . , 10.

(b) Derive the Fisher scoring algorithm for this data.


(c) Using initial values for α and β based on fitting a normal
    linear model, implement your Fisher scoring algorithm from (b)
    using EXCEL.



Solution to Exercise 2.10 (a)
The likelihood function is
L(α, β|y) = ∏ e^(−(α + β zi)) (α + β zi)^(yi) / yi!,

where the product (and each sum below) runs over i = 1, 2, . . . , 10.

The log–likelihood function is

log(L(α, β|y)) = ∑ [ −(α + β zi) + yi log(α + β zi) − log(yi!) ].

The first order partial derivatives of the log–likelihood function are:

∂ log(L(α, β|y))/∂α = ∑ [ −1 + yi/(α + β zi) ]       and

∂ log(L(α, β|y))/∂β = ∑ [ −zi + yi zi/(α + β zi) ].



Solution to Exercise 2.10 (a)
The Hessian matrix of second derivatives of the log–likelihood
function is

∂² log(L(α, β|y)) / ∂(α, β) ∂(α, β)ᵀ =

    ( ∑ −yi/(α + β zi)²       ∑ −yi zi/(α + β zi)²  )
    (                                               )
    ( ∑ −yi zi/(α + β zi)²    ∑ −yi zi²/(α + β zi)² )

Replacing the yi values with the random variables Yi, negating each
entry and taking expectations, using the fact that E[Yi] = α + β zi,
we get

          ( ∑ 1/(α + β zi)      ∑ zi/(α + β zi)  )
I(α, β) = (                                      )
          ( ∑ zi/(α + β zi)     ∑ zi²/(α + β zi) )

(with sums over i = 1, 2, . . . , 10), as required.
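The derivatives in part (a) can be checked numerically (an illustrative check, not part of the slides): the analytic score should match central finite differences of the log-likelihood at any valid point, e.g. the arbitrary choice (α, β) = (30, −1).

```python
import math

y = [29, 37, 32, 33, 26, 30, 24, 28, 20, 21]
z = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

def loglik(a, b):
    # log(yi!) is constant in (a, b), so it is dropped here.
    return sum(-(a + b * zi) + yi * math.log(a + b * zi)
               for yi, zi in zip(y, z))

def score(a, b):
    da = sum(-1 + yi / (a + b * zi) for yi, zi in zip(y, z))
    db = sum(-zi + yi * zi / (a + b * zi) for yi, zi in zip(y, z))
    return da, db

a, b, h = 30.0, -1.0, 1e-6
ana_da, ana_db = score(a, b)
num_da = (loglik(a + h, b) - loglik(a - h, b)) / (2 * h)
num_db = (loglik(a, b + h) - loglik(a, b - h)) / (2 * h)
```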
Solution to Exercise 2.10 (b)

The Fisher–Scoring algorithm is

(α1, β1)ᵀ = (α0, β0)ᵀ + [I(α0, β0)]⁻¹ ( ∂ log(Ln(α, β|y))/∂α ,
             ∂ log(Ln(α, β|y))/∂β )ᵀ, evaluated at (α0, β0).



Solution to Exercise 2.10 (c)

See EXCEL spreadsheet Lecture Exercise 2.10 Calcs.xls.

y=c(29,37,32,33,26,30,24,28,20,21)
z=c(1,2,3,4,5,6,7,8,9,10)

initial=lm(y~z)
initial

Call:
lm(formula = y ~ z)

Coefficients:
(Intercept) z
35.800 -1.418
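The slides carry out the Fisher–Scoring iterations in EXCEL; the following is an illustrative pure-Python equivalent (not part of the course materials), starting from the lm() fit (α0, β0) = (35.800, −1.418) above and using Cramer's rule for the 2×2 solve.

```python
y = [29, 37, 32, 33, 26, 30, 24, 28, 20, 21]
z = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

a, b = 35.800, -1.418  # starting values from the normal linear model
for _ in range(25):
    lam = [a + b * zi for zi in z]
    # Score vector (first derivatives of the log-likelihood).
    u1 = sum(yi / li - 1 for yi, li in zip(y, lam))
    u2 = sum(zi * (yi / li - 1) for yi, zi, li in zip(y, z, lam))
    # Expected information matrix entries from part (a).
    i11 = sum(1 / li for li in lam)
    i12 = sum(zi / li for zi, li in zip(z, lam))
    i22 = sum(zi ** 2 / li for zi, li in zip(z, lam))
    det = i11 * i22 - i12 ** 2
    # Fisher-scoring update: (a, b) <- (a, b) + I^{-1} * score.
    a += (i22 * u1 - i12 * u2) / det
    b += (i11 * u2 - i12 * u1) / det
```

At convergence the score vector is numerically zero, which is exactly the condition the spreadsheet implementation targets.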

