

2.3 Methods of Estimation


2.3.1 Method of Moments
The Method of Moments is a simple technique based on the idea that the sample
moments are natural estimators of population moments.
The k-th population moment of a random variable Y is
$$\mu'_k = E(Y^k), \qquad k = 1, 2, \ldots$$

and the k-th sample moment of a sample $Y_1, \ldots, Y_n$ is
$$m'_k = \frac{1}{n}\sum_{i=1}^{n} Y_i^k, \qquad k = 1, 2, \ldots$$

If $Y_1, \ldots, Y_n$ are assumed to be independent and identically distributed, then the Method of Moments estimators of the distribution parameters $\theta_1, \ldots, \theta_p$ are obtained by solving the set of p equations
$$\mu'_k = m'_k, \qquad k = 1, 2, \ldots, p.$$

Under fairly general conditions, Method of Moments estimators are asymptotically normal and asymptotically unbiased. However, they are not, in general,
efficient.
Example 2.17. Let $Y_i \overset{\text{iid}}{\sim} N(\mu, \sigma^2)$. We will find the Method of Moments estimators of $\mu$ and $\sigma^2$.

We have $\mu'_1 = E(Y) = \mu$, $\mu'_2 = E(Y^2) = \mu^2 + \sigma^2$, $m'_1 = \bar{Y}$ and $m'_2 = \sum_{i=1}^{n} Y_i^2 / n$. So, the Method of Moments estimators of $\mu$ and $\sigma^2$ satisfy the equations
$$\hat{\mu} = \bar{Y}, \qquad \hat{\mu}^2 + \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} Y_i^2.$$

Thus, we obtain
$$\hat{\mu} = \bar{Y}, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} Y_i^2 - \bar{Y}^2 = \frac{1}{n}\sum_{i=1}^{n} (Y_i - \bar{Y})^2.$$



Estimators obtained by the Method of Moments are not always unique.


Example 2.18. Let $Y_i \overset{\text{iid}}{\sim} \text{Poisson}(\lambda)$. We will find the Method of Moments estimator of $\lambda$. We know that for this distribution $E(Y_i) = \text{var}(Y_i) = \lambda$, so both the first moment and the variance equal the parameter.

By comparing the first and second population and sample moments we get two different estimators of the same parameter,
$$\hat{\lambda}_1 = \bar{Y}, \qquad \hat{\lambda}_2 = \frac{1}{n}\sum_{i=1}^{n} Y_i^2 - \bar{Y}^2.$$
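The two estimators generally take different values on the same data; a small sketch (hypothetical simulated sample, NumPy assumed) makes this concrete.

import numpy as np

rng = np.random.default_rng(1)
y = rng.poisson(lam=3.0, size=500)                  # hypothetical Poisson(3) sample

lambda_hat_1 = y.mean()                             # from the first moment
lambda_hat_2 = np.mean(y ** 2) - y.mean() ** 2      # from the second moment

print(lambda_hat_1, lambda_hat_2)                   # close, but not identical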

Exercise 2.11. Let $Y = (Y_1, \ldots, Y_n)^T$ be a random sample from the distribution with the pdf given by
$$f(y; \theta) = \begin{cases} \dfrac{2}{\theta^2}(\theta - y), & y \in [0, \theta], \\ 0, & \text{elsewhere.} \end{cases}$$
Find an estimator of $\theta$ using the Method of Moments.

2.3.2 Method of Maximum Likelihood


This method was introduced by R. A. Fisher and is the most common method of constructing estimators. We will illustrate it with the following simple example.
Example 2.19. Assume that $Y_i \overset{\text{iid}}{\sim} \text{Bernoulli}(p)$, $i = 1, 2, 3, 4$, with probability of success equal to p, where $p \in \Theta = \{\frac{1}{4}, \frac{2}{4}, \frac{3}{4}\}$, i.e., p belongs to a parameter space of only three elements. We want to estimate the parameter p based on observations of the random sample $Y = (Y_1, Y_2, Y_3, Y_4)^T$.

The joint pmf is
$$P(Y = y; p) = \prod_{i=1}^{4} P(Y_i = y_i; p) = p^{\sum_{i=1}^{4} y_i} (1 - p)^{4 - \sum_{i=1}^{4} y_i}.$$

The different values of the joint pmf $P(Y = y; p)$ for each $p \in \Theta$ are given in the table below.

    $\sum_{i=1}^{4} y_i$ | p = 1/4 | p = 2/4 | p = 3/4
    ---------------------+---------+---------+---------
             0           |  81/256 |  16/256 |   1/256
             1           |  27/256 |  16/256 |   3/256
             2           |   9/256 |  16/256 |   9/256
             3           |   3/256 |  16/256 |  27/256
             4           |   1/256 |  16/256 |  81/256

We see that $P(\sum_{i=1}^{4} Y_i = 0)$ is largest when $p = \frac{1}{4}$. This can be interpreted as follows: when the observed value of the random sample is $(0, 0, 0, 0)^T$, the most likely value of the parameter p is $\hat{p} = \frac{1}{4}$. This value can then be considered as an estimate of p. Similarly, we can conclude that when the observed value of the random sample is, for example, $(0, 1, 1, 0)^T$, then the most likely value of the parameter is $\hat{p} = \frac{1}{2}$.
Altogether, we have

$\hat{p} = \tfrac{1}{4}$  if we observe all failures or just one success;
$\hat{p} = \tfrac{1}{2}$  if we observe two failures and two successes;
$\hat{p} = \tfrac{3}{4}$  if we observe three successes and one failure, or four successes.

Note that, for each point $(y_1, y_2, y_3, y_4)^T$, the estimate $\hat{p}$ is the value of the parameter p for which the joint mass function, treated as a function of p, attains its maximum (or its largest value).
Here, we treat the joint pmf as a function of the parameter p for a given y. Such a function is called the likelihood function and is denoted by $L(p\,|\,y)$.
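Treating the joint pmf as a function of p and maximizing it over the three candidate values can be written as a short grid search; a sketch in Python (the observed sample below is the $(0, 1, 1, 0)^T$ case from the example):

from fractions import Fraction

y = [0, 1, 1, 0]                                   # observed sample: two successes
candidates = [Fraction(1, 4), Fraction(2, 4), Fraction(3, 4)]

def likelihood(p, y):
    # joint Bernoulli pmf, treated as a function of p for fixed y
    s = sum(y)
    return p ** s * (1 - p) ** (len(y) - s)

p_hat = max(candidates, key=lambda p: likelihood(p, y))
print(p_hat, likelihood(p_hat, y))                 # 1/2 and 1/16 (= 16/256)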

Now we introduce a formal definition of the Maximum Likelihood Estimator (MLE).

Definition 2.11. The MLE($\theta$) is the statistic $T(Y) = \hat{\theta}$ whose value for a given y satisfies the condition
$$L(\hat{\theta}\,|\,y) = \sup_{\theta \in \Theta} L(\theta\,|\,y),$$
where $L(\theta\,|\,y)$ is the likelihood function for $\theta$.


Properties of MLE

The MLEs are invariant, that is,
$$\text{MLE}(g(\theta)) = g(\text{MLE}(\theta)) = g(\hat{\theta}).$$

MLEs are asymptotically normal and asymptotically unbiased. They are also asymptotically efficient, that is,
$$\text{eff}(\widehat{g(\theta)}) = \lim_{n \to \infty} \frac{\text{CRLB}(g(\theta))}{\text{var}\,\widehat{g(\theta)}} = 1.$$

In this case, for large n, $\text{var}\,\widehat{g(\theta)}$ is approximately equal to the CRLB. Therefore, for large n,
$$\widehat{g(\theta)} \sim N\big(g(\theta), \text{CRLB}(g(\theta))\big)$$
approximately. This is called the asymptotic distribution of $\widehat{g(\theta)}$.
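A quick simulation can illustrate these asymptotic properties. The sketch below (NumPy assumed; the true p, sample size and number of replications are arbitrary, hypothetical choices) uses the standard fact that the MLE of a Bernoulli success probability over the interval (0, 1) is the sample proportion, and compares its sampling variance to the CRLB $p(1-p)/n$.

import numpy as np

rng = np.random.default_rng(4)
p, n, reps = 0.3, 200, 5000                   # hypothetical true p, sample size, replications

# MLE of p in each replication: the sample proportion of successes
p_hats = rng.binomial(n, p, size=reps) / n

crlb = p * (1 - p) / n                        # Cramer-Rao lower bound for estimating p
print(p_hats.var(), crlb)                     # empirical variance is close to the CRLB
print(np.mean(np.abs(p_hats - p) <= 1.96 * np.sqrt(crlb)))   # about 0.95, consistent with approximate normality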

Example 2.20. Suppose that $Y_1, \ldots, Y_n$ are independent Poisson($\lambda$) random variables. Then the likelihood is
$$L(\lambda\,|\,y) = \prod_{i=1}^{n} \frac{\lambda^{y_i} e^{-\lambda}}{y_i!} = \frac{\lambda^{\sum_{i=1}^{n} y_i}\, e^{-n\lambda}}{\prod_{i=1}^{n} y_i!}.$$

We need to find the value of $\lambda$ which maximizes the likelihood. This value will also maximize $\ell(\lambda\,|\,y) = \log L(\lambda\,|\,y)$, which is easier to work with. Now, we have
$$\ell(\lambda\,|\,y) = \sum_{i=1}^{n} y_i \log\lambda - n\lambda - \sum_{i=1}^{n} \log(y_i!).$$

The value of $\lambda$ which maximizes $\ell(\lambda\,|\,y)$ is the solution of $d\ell/d\lambda = 0$. Thus, solving the equation
$$\frac{d\ell}{d\lambda} = \frac{\sum_{i=1}^{n} y_i}{\lambda} - n = 0$$
yields the estimator $\hat{\lambda} = T(Y) = \sum_{i=1}^{n} Y_i / n = \bar{Y}$, which is the same as the Method of Moments estimator. The second derivative is negative for all $\lambda$; hence, $\hat{\lambda}$ indeed maximizes the log-likelihood.
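As a sanity check, the Poisson log-likelihood can also be maximized numerically and the result compared with the closed-form answer $\bar{Y}$; a minimal sketch, assuming NumPy and SciPy are available and using made-up data:

import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

rng = np.random.default_rng(2)
y = rng.poisson(lam=4.0, size=200)           # hypothetical Poisson(4) sample

def neg_loglik(lam):
    # minus the log-likelihood: -(sum(y) log(lam) - n lam - sum(log(y_i!)))
    return -(np.sum(y) * np.log(lam) - len(y) * lam - np.sum(gammaln(y + 1)))

res = minimize_scalar(neg_loglik, bounds=(1e-6, 50.0), method="bounded")
print(res.x, y.mean())                       # the numerical maximizer agrees with Y-bar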

Example 2.21. Suppose that $Y_1, \ldots, Y_n$ are independent $N(\mu, \sigma^2)$ random variables. Then the likelihood is
$$L(\mu, \sigma^2\,|\,y) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(y_i - \mu)^2}{2\sigma^2}\right\} = (2\pi\sigma^2)^{-n/2} \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \mu)^2\right\}$$
and so the log-likelihood is
$$\ell(\mu, \sigma^2\,|\,y) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \mu)^2.$$

Thus, we have
$$\frac{\partial \ell}{\partial \mu} = \frac{1}{2\sigma^2} \sum_{i=1}^{n} 2(y_i - \mu) = \frac{1}{\sigma^2} \sum_{i=1}^{n} (y_i - \mu)$$
and
$$\frac{\partial \ell}{\partial \sigma^2} = -\frac{n}{2}\,\frac{1}{\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (y_i - \mu)^2 = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (y_i - \mu)^2.$$

Setting these derivatives to zero, we obtain
$$\frac{1}{\hat{\sigma}^2} \sum_{i=1}^{n} (y_i - \hat{\mu}) = 0 \quad\Longrightarrow\quad \sum_{i=1}^{n} y_i = n\hat{\mu},$$
so that $\hat{\mu} = \bar{Y}$ is the maximum likelihood estimator of $\mu$, and
$$-\frac{n}{2\hat{\sigma}^2} + \frac{1}{2\hat{\sigma}^4} \sum_{i=1}^{n} (y_i - \bar{y})^2 = 0 \quad\Longrightarrow\quad n\hat{\sigma}^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2,$$
so that $\hat{\sigma}^2 = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 / n$ is the maximum likelihood estimator of $\sigma^2$. These are the same as the Method of Moments estimators.
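The same kind of numerical cross-check works here: minimize the negative log-likelihood over $(\mu, \sigma^2)$ and compare with the closed-form estimators (a sketch only; NumPy/SciPy assumed, hypothetical data).

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
y = rng.normal(loc=1.5, scale=0.8, size=300)        # hypothetical sample

def neg_loglik(params):
    mu, sigma2 = params
    n = len(y)
    return 0.5 * n * np.log(2 * np.pi * sigma2) + np.sum((y - mu) ** 2) / (2 * sigma2)

res = minimize(neg_loglik, x0=[0.0, 1.0], bounds=[(None, None), (1e-8, None)])
print(res.x)                                        # numerical (mu, sigma2) MLEs
print(y.mean(), np.mean((y - y.mean()) ** 2))       # closed-form MLEs for comparison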

Exercise 2.12. Let $Y = (Y_1, \ldots, Y_n)^T$ be a random sample from a Gamma distribution, Gamma($\alpha, \beta$), with the following pdf:
$$f(y; \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, y^{\alpha - 1} e^{-\beta y}, \quad \text{for } y > 0.$$
Assume that the parameter $\alpha$ is known.

(a) Identify the complete sufficient statistic for $\beta$.

(b) Find the MLE$[g(\beta)]$, where $g(\beta) = \frac{1}{\beta}$. Is the estimator a function of the complete sufficient statistic?

(c) Knowing that $E(Y_i) = \alpha\beta^{-1}$ for all $i = 1, \ldots, n$, check that the MLE$[g(\beta)]$ is an unbiased estimator of $g(\beta)$. What can you conclude about the properties of the estimator?


2.3.3 Method of Least Squares


If $Y_1, \ldots, Y_n$ are independent random variables which have the same variance and higher-order moments, and if each $E(Y_i)$ is a linear function of $\beta_1, \ldots, \beta_p$, then the Least Squares estimates of $\beta_1, \ldots, \beta_p$ are obtained by minimizing
$$S(\beta) = \sum_{i=1}^{n} \{Y_i - E(Y_i)\}^2.$$

The Least Squares estimator of $\beta_j$ has minimum variance amongst all linear unbiased estimators of $\beta_j$ and is known as the best linear unbiased estimator (BLUE). If the $Y_i$'s have a normal distribution, then the Least Squares estimator of $\beta_j$ is the Maximum Likelihood estimator; it has a normal distribution and is the MVUE.
Example 2.22. Suppose that $Y_1, \ldots, Y_{n_1}$ are independent $N(\mu_1, \sigma^2)$ random variables and that $Y_{n_1+1}, \ldots, Y_n$ are independent $N(\mu_2, \sigma^2)$ random variables. Find the least squares estimators of $\mu_1$ and $\mu_2$.

Since
$$E(Y_i) = \begin{cases} \mu_1, & i = 1, \ldots, n_1, \\ \mu_2, & i = n_1 + 1, \ldots, n, \end{cases}$$
it is a linear function of $\mu_1$ and $\mu_2$. The Least Squares estimators are obtained by minimizing
$$S = \sum_{i=1}^{n} \{Y_i - E(Y_i)\}^2 = \sum_{i=1}^{n_1} (Y_i - \mu_1)^2 + \sum_{i=n_1+1}^{n} (Y_i - \mu_2)^2.$$

Now,
$$\frac{\partial S}{\partial \mu_1} = -2\sum_{i=1}^{n_1} (Y_i - \mu_1) = 0 \quad\Longrightarrow\quad \hat{\mu}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} Y_i = \bar{Y}_1$$
and
$$\frac{\partial S}{\partial \mu_2} = -2\sum_{i=n_1+1}^{n} (Y_i - \mu_2) = 0 \quad\Longrightarrow\quad \hat{\mu}_2 = \frac{1}{n_2}\sum_{i=n_1+1}^{n} Y_i = \bar{Y}_2,$$
where $n_2 = n - n_1$. So, we estimate the mean of each group in the population by the mean of the corresponding sample.
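In code this just amounts to averaging within each group; a tiny sketch (group sizes and data are hypothetical):

import numpy as np

rng = np.random.default_rng(5)
n1, n2 = 40, 60
y = np.concatenate([rng.normal(10.0, 2.0, n1),      # hypothetical group 1 sample
                    rng.normal(12.0, 2.0, n2)])     # hypothetical group 2 sample

mu1_hat = y[:n1].mean()      # least squares estimate of mu_1
mu2_hat = y[n1:].mean()      # least squares estimate of mu_2
print(mu1_hat, mu2_hat)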



Example 2.23. Suppose that $Y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$ independently for $i = 1, 2, \ldots, n$, where $x_i$ is some explanatory variable. This is called the simple linear regression model. Find the least squares estimators of $\beta_0$ and $\beta_1$.

Since $E(Y_i) = \beta_0 + \beta_1 x_i$, it is a linear function of $\beta_0$ and $\beta_1$. So we can obtain the least squares estimates by minimizing
$$S = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 x_i)^2.$$

Now,
$$\frac{\partial S}{\partial \beta_0} = -2\sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 x_i) = 0 \quad\Longrightarrow\quad \sum_{i=1}^{n} Y_i - n\hat{\beta}_0 - \hat{\beta}_1 \sum_{i=1}^{n} x_i = 0 \quad\Longrightarrow\quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
and
$$\frac{\partial S}{\partial \beta_1} = -2\sum_{i=1}^{n} x_i (Y_i - \beta_0 - \beta_1 x_i) = 0 \quad\Longrightarrow\quad \sum_{i=1}^{n} x_i Y_i - \hat{\beta}_0 \sum_{i=1}^{n} x_i - \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = 0.$$

Substituting the first equation into the second one, we have
$$\sum_{i=1}^{n} x_i Y_i - (\bar{y} - \hat{\beta}_1 \bar{x})\, n\bar{x} - \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = 0 \quad\Longrightarrow\quad \hat{\beta}_1 \left( n\bar{x}^2 - \sum_{i=1}^{n} x_i^2 \right) = n\bar{x}\bar{y} - \sum_{i=1}^{n} x_i Y_i.$$

Hence, we have the estimators
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{x} \qquad\text{and}\qquad \hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i Y_i - n\bar{x}\bar{Y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}.$$

These are the Least Squares estimators of the regression coefficients $\beta_0$ and $\beta_1$.
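The closed-form expressions translate directly into code; a minimal sketch (NumPy assumed, with made-up x and y vectors):

import numpy as np

# Hypothetical data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

n = len(x)
xbar, ybar = x.mean(), y.mean()

# Least Squares estimators from the formulas above
beta1_hat = (np.sum(x * y) - n * xbar * ybar) / (np.sum(x ** 2) - n * xbar ** 2)
beta0_hat = ybar - beta1_hat * xbar

print(beta0_hat, beta1_hat)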

Exercise 2.13. Given data $(x_1, y_1), \ldots, (x_n, y_n)$, assume that $Y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$ independently for $i = 1, 2, \ldots, n$ and that $\sigma^2$ is known.

(a) Show that the Maximum Likelihood Estimators of $\beta_0$ and $\beta_1$ must be the same as the Least Squares Estimators of these parameters.

(b) The quench bath temperature in a heat treatment operation was thought to affect the Rockwell hardness of a certain coil spring. An experiment was run in which several springs were treated under four different temperatures. The table below gives the values of the set temperatures (x) and the observed hardness (y, coded). Assuming that hardness depends on temperature linearly and that the variance of the random variables is constant, we may write the following model:
$$E(Y_i) = \beta_0 + \beta_1 x_i, \qquad \text{var}(Y_i) = \sigma^2.$$
Calculate the LS estimates of $\beta_0$ and $\beta_1$. What is the estimate of the expected hardness given the temperature $x = 40$?
Run |    1     2     3     4     5     6     7     8     9    10    11    12    13    14
 x  |   30    30    30    30    40    40    40    50    50    50    60    60    60    60
 y  | 55.8  59.1  54.8  54.6  43.1  42.2  45.2  31.6  30.9  30.8  17.5  20.5  17.2  16.9

In this section we considered so-called point estimators. We used various methods, such as the Method of Moments, Maximum Likelihood and Least Squares, to derive them. We may also construct estimators using the Rao-Blackwell Theorem. The estimators are functions
$$T(Y_1, \ldots, Y_n) \in \Theta,$$
that is, their values belong to the parameter space. However, the values vary with the observed sample. If the estimator is the MVUE, we may expect that on average the calculated estimates are very close to the true parameter and also that the variability of the estimates is the smallest possible.

Sometimes it is more appropriate to construct an interval which covers the unknown parameter with high probability and whose limits depend on the sample. We introduce such intervals in the next section. The point estimators are used in constructing the intervals.

