
MTMS.01.099 Mathematical Statistics

Lecture 7
Liivika Tee
Fall 2015

The Method of Maximum Likelihood

L(θ) = f(x_1; θ) f(x_2; θ) ... f(x_n; θ)   (continuous case)
L(θ) = p(x_1; θ) p(x_2; θ) ... p(x_n; θ)   (discrete case)

Definition
The value θ̂ from parameter space A for which L(θ) assumes its
largest value within A is called the ML estimate of θ:
L(θ̂) = max_{θ∈A} L(θ).

Likelihood function and Maximum Likelihood estimate
The likelihood function L(θ) = L(θ | x_1, x_2, ..., x_n) gives the
likelihood of θ, given the data. A maximum likelihood estimate θ̂ is
a value of θ that maximizes the likelihood.
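As an illustration (our sketch, not from the slides), the maximization can be carried out numerically. The example below assumes an exponential sample with density f(x; λ) = λ e^{−λx}, whose log-likelihood l(λ) = n ln λ − λ Σ x_i has the closed-form maximizer λ̂ = n / Σ x_i; a crude grid search over a truncated parameter space recovers essentially the same value.

```python
import math

# Sketch: numerical ML estimation for an exponential sample (assumed data).
x = [0.8, 1.2, 0.5, 2.0, 1.5]

def log_likelihood(lam):
    # l(lam) = n*ln(lam) - lam*sum(x_i)
    return len(x) * math.log(lam) - lam * sum(x)

# Crude grid search over A = (0, 5], step 0.001.
grid = [k / 1000 for k in range(1, 5001)]
lam_grid = max(grid, key=log_likelihood)

lam_closed = len(x) / sum(x)   # closed-form MLE, here 5/6
print(lam_grid, lam_closed)
```

For smooth one-parameter likelihoods the grid search is only a sanity check; in practice one differentiates the log-likelihood, as the following slides do for the normal distribution.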

The Method of Least Squares

Let x_1, x_2, ..., x_n be a random sample from a distribution with
mean E(X) = μ(θ), where μ(θ) is a known function and θ an
unknown parameter with parameter space A. Let

Q(θ) = Σ_{i=1}^{n} [x_i − μ(θ)]²

be the sum of the squares of the deviations of the observations
from μ(θ).

Definition
The value θ̂ for which Q(θ) assumes its least possible value
within A is called the LS estimate of θ:
Q(θ̂) = min_{θ∈A} Q(θ).
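A minimal sketch (our example, not the slides'): for the simplest model μ(θ) = θ, the sum of squares Q(θ) = Σ (x_i − θ)² is minimized at θ̂ = x̄, which a grid search over A confirms.

```python
# Sketch: least squares for mu(theta) = theta on assumed data.
x = [2.1, 1.9, 2.4, 2.0, 1.6]

def Q(theta):
    # Sum of squared deviations of the observations from mu(theta) = theta.
    return sum((xi - theta) ** 2 for xi in x)

# Minimize Q over a grid of candidate theta values in A = [1.00, 3.00].
grid = [k / 100 for k in range(100, 301)]
theta_ls = min(grid, key=Q)

x_bar = sum(x) / len(x)   # closed-form minimizer: the sample mean
print(theta_ls, x_bar)    # both equal 2.0 here
```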

Method of Moments

Let X_1, X_2, ..., X_n denote a random sample from a distribution
with r unknown parameters θ = (θ_1, θ_2, ..., θ_r). Let f(x; θ_1, ..., θ_r)
denote the probability density function of this distribution.
The method of moments procedure equates each of the first r
theoretical moments with the corresponding sample moment to
obtain a system of r equations in r unknowns:

μ_k(θ) = m_k,   k = 1, ..., r,   (*)

where m_k = (1/n) Σ_{i=1}^{n} x_i^k is the k-th sample moment and
μ_k(θ) = E X^k is the k-th theoretical moment.

Definition
The solution (θ̂_1, ..., θ̂_r) of the system of equations (*) gives
the method of moments estimates of θ_1, ..., θ_r.
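A short sketch (our example, not the slides'): for N(μ, σ²) there are r = 2 unknowns, and equating E X = m_1 and E X² = m_2 gives μ̂ = m_1 and σ̂² = m_2 − m_1², since Var X = E X² − (E X)².

```python
# Sketch: method of moments for N(mu, sigma^2) on assumed data.
x = [4.0, 5.0, 6.0, 5.0]
n = len(x)

m1 = sum(xi for xi in x) / n        # first sample moment
m2 = sum(xi ** 2 for xi in x) / n   # second sample moment

mu_hat = m1                          # from mu_1(theta) = E X = mu
sigma2_hat = m2 - m1 ** 2            # from mu_2(theta) = E X^2 = sigma^2 + mu^2
print(mu_hat, sigma2_hat)            # → 5.0 0.5
```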

Application to the Normal Distribution

1. A Single Sample
Let x_1, ..., x_n be a random sample from N(μ, σ²), where both
parameters are unknown. Let's find the ML estimates for these
parameters.

Density function:
f(x; μ, σ²) = (1/√(2πσ²)) e^{−(x−μ)²/(2σ²)}

Likelihood function:
L(μ, σ²) = Π_{i=1}^{n} f(x_i; μ, σ²)
         = Π_{i=1}^{n} (1/√(2πσ²)) e^{−(x_i−μ)²/(2σ²)}
         = (2πσ²)^{−n/2} e^{−(1/(2σ²)) Σ_{i=1}^{n} (x_i−μ)²}

Logarithmic likelihood function:
l(μ, σ²) = −(n/2) ln(2πσ²) − (1/(2σ²)) Σ_{i=1}^{n} (x_i−μ)²

Application to the Normal Distribution (2)

Logarithmic likelihood function:
l(μ, σ²) = −(n/2) ln(2πσ²) − (1/(2σ²)) Σ_{i=1}^{n} (x_i−μ)²

Differentiate:

∂l/∂μ = (1/σ²) Σ_{i=1}^{n} (x_i − μ)

∂l/∂σ² = −n/(2σ²) + (1/(2σ⁴)) Σ_{i=1}^{n} (x_i − μ)²

Equate to zero, and it follows that

μ̂ = (1/n) Σ_{i=1}^{n} x_i = x̄   (unbiased),

σ̂² = (1/n) Σ_{i=1}^{n} (x_i − x̄)²   (biased).
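The two ML estimates can be computed directly; the sketch below (assumed data) also shows the familiar unbiased variant with divisor n − 1 for comparison.

```python
# Sketch: ML estimates for N(mu, sigma^2) on assumed data.
x = [1.0, 2.0, 3.0, 4.0]
n = len(x)

mu_hat = sum(x) / n                                      # x_bar = 2.5
sigma2_hat = sum((xi - mu_hat) ** 2 for xi in x) / n     # ML estimate, biased: 1.25
s2 = sum((xi - mu_hat) ** 2 for xi in x) / (n - 1)       # unbiased version, divisor n - 1
print(mu_hat, sigma2_hat, s2)
```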

Application to the Normal Distribution (2)

Remark
If μ is known, we obtain from the equation ∂l/∂σ² = 0 the
following ML estimate for σ²:

σ̂² = (1/n) Σ_{i=1}^{n} (x_i − μ)².

This estimate is unbiased.

Application to the Normal Distribution (3)

2. Two Samples
Let x_1, ..., x_{n_1} and y_1, ..., y_{n_2} be two independent random
samples from N(μ_1, σ_1²) and N(μ_2, σ_2²), respectively. If all four
parameters are unknown, each sample can be treated
according to the method in the Single Sample case.

Application to the Normal Distribution (4)

If it is known that σ_1² = σ_2² = σ², then we can estimate σ² using
both samples.

Application to the Normal Distribution (4)

Let x_1, ..., x_{n_1} and y_1, ..., y_{n_2} be two independent random
samples from N(μ_1, σ²) and N(μ_2, σ²), respectively. The joint
likelihood function for the two samples is

L(μ_1, μ_2, σ²) = L(μ_1, σ²) · L(μ_2, σ²),

where L(μ_1, σ²) = (2πσ²)^{−n_1/2} e^{−(1/(2σ²)) Σ_{i=1}^{n_1} (x_i−μ_1)²}
(analogously for L(μ_2, σ²)), so that

L(μ_1, μ_2, σ²) = (2πσ²)^{−(n_1+n_2)/2} e^{−Q(μ_1, μ_2)/(2σ²)},

where Q(μ_1, μ_2) = Σ_{i=1}^{n_1} (x_i − μ_1)² + Σ_{i=1}^{n_2} (y_i − μ_2)².

Application to the Normal Distribution (5)

Logarithmic likelihood function:

l(μ_1, μ_2, σ²) = −((n_1 + n_2)/2) ln(2πσ²) − (1/(2σ²)) Q(μ_1, μ_2).

Differentiating and equating to zero, we get the following
equation system:

∂l/∂μ_1 :  −(1/(2σ²)) Σ_{i=1}^{n_1} 2(x_i − μ_1)(−1) = 0,

∂l/∂μ_2 :  −(1/(2σ²)) Σ_{i=1}^{n_2} 2(y_i − μ_2)(−1) = 0,

∂l/∂σ² :  −(n_1 + n_2)/(2σ²) + (1/(2σ⁴)) Q(μ_1, μ_2) = 0.

Application to the Normal Distribution (6)

The solutions of the equation system are the ML estimates of the
parameters μ_1, μ_2, σ²:

μ̂_1 = x̄,   μ̂_2 = ȳ,

σ̂² = Q(μ̂_1, μ̂_2) / (n_1 + n_2)
    = (1/(n_1 + n_2)) [ Σ_{i=1}^{n_1} (x_i − μ̂_1)² + Σ_{i=1}^{n_2} (y_i − μ̂_2)² ]
    = [ (n_1 − 1) s_1² + (n_2 − 1) s_2² ] / (n_1 + n_2).

The first two estimates are unbiased, but the third one has to
be adjusted. The adjusted ML estimate is found to be

s² = [ (n_1 − 1) s_1² + (n_2 − 1) s_2² ] / (n_1 + n_2 − 2).
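The adjusted (pooled) estimate is easy to compute; the sketch below uses two small assumed samples.

```python
# Sketch: pooled variance estimate s^2 on assumed data.
x = [3.0, 5.0, 7.0]   # n1 = 3
y = [2.0, 4.0]        # n2 = 2

def mean(v):
    return sum(v) / len(v)

def sample_var(v):
    # Unbiased sample variance with divisor n - 1.
    m = mean(v)
    return sum((vi - m) ** 2 for vi in v) / (len(v) - 1)

n1, n2 = len(x), len(y)
s1sq, s2sq = sample_var(x), sample_var(y)

# Adjusted ML estimate of the common sigma^2, divisor n1 + n2 - 2:
pooled = ((n1 - 1) * s1sq + (n2 - 1) * s2sq) / (n1 + n2 - 2)
print(mean(x), mean(y), pooled)
```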

Standard Error of an Estimate

We have an unknown parameter θ. From the sample we get a point
estimate θ̂, which is a random variable. The variance Var θ̂ is a
constant, but its estimate is again a random variable:

Var θ̂ → V̂ar θ̂.

Definition
An estimate of σ = √(Var θ̂) is called the standard error of θ̂:

se(θ̂) = √(V̂ar θ̂).


Standard Error

Example
A researcher wants to investigate the proportion of households
who follow the BBC News Channel. It is required that the
standard error of the estimate not exceed 0.03. How many
households need to be in the sample?

The number of BBC followers X has a binomial distribution:
X ~ Bin(n, p).

se(p̂) = √( p̂(1 − p̂)/n ) ≤ 0.03

Solving this inequality with respect to n gives the required
sample size.
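One way to complete the example (our sketch): since p is unknown before sampling, a common conservative choice is the worst case p = 0.5, where p(1 − p) attains its maximum 0.25. This assumption is ours, not the slide's.

```python
import math

# Sketch: solve sqrt(p*(1-p)/n) <= 0.03 for n under the worst case p = 0.5.
se_max = 0.03
p = 0.5   # conservative assumption: maximizes p*(1-p)

n = math.ceil(p * (1 - p) / se_max ** 2)
print(n)   # → 278
```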

Estimation of Probability Function, Density Function
and Distribution Function

Have you thought about how (and based on what) SAS, R,
SPSS, etc. draw the graphs (histograms) of empirical
distributions?

Estimating the Probability Function

Let p(k) be the unknown (discrete) probability function, which
can theoretically be given by a table of pairs {k, p(k)}.

Estimating the Probability Function (2)

Let us have a sample x_1, x_2, ..., x_n from this distribution. Fix
one concrete value k and find its frequency f_k. Then

f_k ~ B(n, p(k)),   p̂(k) = f_k / n.

Sample (n = 20)
22, 28, 26, 25, 26, 25, 24, 26, 24, 23, 27, 25, 27, 25, 27, 28, 28, 29, 23, 26

Fix one k (for example, k = 26) and find the corresponding f_k:
f_k = 4. The estimate for the probability p(k) is

p̂(26) = 4/20 = 0.2.

This is how we can estimate the probabilities of all values.
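The estimate p̂(k) = f_k / n can be computed for every observed value at once; the sketch below reuses the sample from this slide.

```python
from collections import Counter

# Sketch: estimate p(k) = f_k / n for each value in the discrete sample.
sample = [22, 28, 26, 25, 26, 25, 24, 26, 24, 23,
          27, 25, 27, 25, 27, 28, 28, 29, 23, 26]
n = len(sample)

freq = Counter(sample)                         # frequency f_k of each observed k
p_hat = {k: f / n for k, f in sorted(freq.items())}

print(p_hat[26])   # → 0.2, matching the slide's example k = 26
```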

Estimating the Probability Function (3)


Estimating the Density Function

Let f(x) be an unknown density function and consider a sample
x_1, x_2, ..., x_n from this distribution.

Sample (n = 20)
2.2, 5.1, 2.5, 1.9, 0.7, 1.3, 3.0, 2.8, 1.4, 0.9, 2.1, 5.0, 2.3, 1.7,
0.5, 1.4, 3.3, 2.7, 1.8, 0.9

Estimating the Density Function (2)

Group the observations into r classes. Let p_1, ..., p_r be the
unknown areas under the density function f(x) over these classes.
We need to estimate the probabilities p_1, ..., p_r.

Estimating the Density Function (3)

If f_i denotes the number of observations in the i-th class, then

f_i ~ B(n, p_i),   p̂_i = f_i / n.

Sample (n = 20)
2.2, 5.1, 2.5, 1.9, 0.7, 1.3, 3.0, 2.8, 1.4, 0.9, 2.1, 5.0, 2.3, 1.7,
0.5, 1.4, 3.3, 2.7, 1.8, 0.9

Estimating the Density Function (4)

The conclusions are similar, but note that we do not obtain an


estimate of the density function f (x) itself, only estimates of
certain areas under this function. By choosing r large (which
requires a sufficiently large sample), we nevertheless obtain a
good picture of the form of the density function.
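The grouping procedure can be sketched as follows on the sample above, using r = 6 classes of equal width; the class boundaries here are our choice, not the slide's.

```python
# Sketch: group the continuous sample into r classes and estimate p_i = f_i / n.
sample = [2.2, 5.1, 2.5, 1.9, 0.7, 1.3, 3.0, 2.8, 1.4, 0.9,
          2.1, 5.0, 2.3, 1.7, 0.5, 1.4, 3.3, 2.7, 1.8, 0.9]
n = len(sample)

r = 6
lo, hi = 0.0, 6.0
width = (hi - lo) / r                      # class width = 1.0

counts = [0] * r                           # f_i for each class
for x in sample:
    i = min(int((x - lo) / width), r - 1)  # class index; clamp the right endpoint
    counts[i] += 1

p_hat = [f / n for f in counts]            # estimated class probabilities
print(counts)
```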

Estimating the Distribution Function

Assume that we want to estimate F(x) for a given value x:
F(x) = P(X ≤ x).

Estimating the Distribution Function (2)

Assume that we want to estimate F(x) for a given value x:
F(x) = P(X ≤ x).
Let us have a sample of size n from this distribution. Denote

g_x = #(x_i ≤ x),

i.e. the number of values in the sample that are ≤ x.
Since g_x denotes the number of successes in a sequence of n trials,

g_x ~ B(n, F(x)),

where F(x) is the probability that a value in the considered
sequence is ≤ x. It follows that

F̂(x) = #(x_i ≤ x) / n.

Estimating the Distribution Function (3)

If the sample is small, or only moderately large, it is sometimes
useful to perform the above calculation for all n sample
points x_1, ..., x_n. This means that, proceeding from small to
large x, we estimate F(x) for x = x_(1), x = x_(2), and so on,
where

x_(1) ≤ x_(2) ≤ ... ≤ x_(n)

is the ordered sample. From F̂(x) = #(x_i ≤ x)/n we get that

F̂(x_(i)) = i/n   (i = 1, 2, ..., n).

In this way we obtain an estimate of the whole function F(x),
called the empirical distribution function.
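The empirical distribution function can be sketched directly on the earlier continuous sample: sort the data and attach the value i/n to each order statistic x_(i).

```python
# Sketch: empirical distribution function F_hat(x_(i)) = i/n on the sample above.
sample = [2.2, 5.1, 2.5, 1.9, 0.7, 1.3, 3.0, 2.8, 1.4, 0.9,
          2.1, 5.0, 2.3, 1.7, 0.5, 1.4, 3.3, 2.7, 1.8, 0.9]
n = len(sample)

ordered = sorted(sample)                        # x_(1) <= x_(2) <= ... <= x_(n)
ecdf = [(x, (i + 1) / n) for i, x in enumerate(ordered)]

def F_hat(x):
    """F_hat(x) = #(x_i <= x) / n for an arbitrary point x."""
    return sum(1 for xi in sample if xi <= x) / n

print(ecdf[0], F_hat(2.0))
```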

Estimating the Distribution Function (4)


Estimating the Distribution Function (5)

