
Hypothesis Testing

CB: chapter 8; section 10.3



Hypothesis: a statement about an unknown population parameter.
Examples: The average age of males in Sweden is 27. (statement about a population mean)
The lowest time it takes to run 30 miles is 2 hours. (statement about a population minimum)
Stocks are more volatile than bonds. (statement about the variances of stock and bond returns)
In hypothesis testing, you are interested in testing between two mutually exclusive hypotheses, called the null hypothesis (denoted H0 ) and the alternative hypothesis
(denoted H1 ).
H0 and H1 are complementary hypotheses, in the following sense:
If the parameter being hypothesized about is $\theta$, and the parameter space (i.e., the set of possible values for $\theta$) is $\Theta$, then the null and alternative hypotheses form a partition of $\Theta$:
$H_0: \theta \in \Theta_0$
$H_1: \theta \in \Theta_0^c$ (the complement of $\Theta_0$ in $\Theta$).
Examples:
1. $H_0: \theta = 0$ vs. $H_1: \theta \neq 0$
2. $H_0: \theta \leq 0$ vs. $H_1: \theta > 0$

1 Definitions of test statistics

A test statistic, similarly to an estimator, is just some real-valued function $T_n \equiv T(X_1, \ldots, X_n)$ of your data sample $X_1, \ldots, X_n$. Clearly, a test statistic is a random variable.
A test is a function mapping values of the test statistic into $\{0, 1\}$, where
0 implies that you accept the null hypothesis $H_0$ / reject the alternative hypothesis $H_1$;
1 implies that you reject the null hypothesis $H_0$ / accept the alternative hypothesis $H_1$.

The subset of the real line R for which the test is equal to 1 is called the rejection
(or critical) region. The complement of the critical region (in the support of the
test statistic) is the acceptance region.
Example: let $\mu$ denote the (unknown) mean male age in Sweden.
You want to test: $H_0: \mu = 27$ vs. $H_1: \mu \neq 27$.
Let your test statistic be $\bar{X}_{100}$, the average age of 100 randomly drawn Swedish males.
Just as with estimators, there are many different possible tests, for a given pair of
hypotheses H0 , H1 . Consider the following four tests:

1. Test 1: $1\{\bar{X}_{100} \notin [25, 29]\}$
2. Test 2: $1\{\bar{X}_{100} \geq 29\}$
3. Test 3: $1\{\bar{X}_{100} \leq 35\}$
4. Test 4: $1\{\bar{X}_{100} \neq 27\}$
Which ones make the most sense?
Also, there are many possible test statistics, such as: (i) med100 (sample median); (ii)
max(X1 , . . . , X100 ) (sample maximum); (iii) mode100 (sample mode); (iv) sin X1 (the
sine of the first observation).
In what follows, we refer to a test as a combination of both (i) a test statistic; and
(ii) the mapping from realizations of the test statistic to {0, 1}.
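To make this concrete, here is a minimal illustrative sketch (added here; not part of the original notes) of a test as "test statistic plus mapping into {0,1}", using the Swedish-age example above. The simulated ages and the particular acceptance interval are arbitrary choices for illustration.

```python
import numpy as np

def test_statistic(sample):
    """Test statistic: the sample mean of the observed ages."""
    return np.mean(sample)

def test(sample, accept_interval=(25.0, 29.0)):
    """Mapping from the realized test statistic into {0, 1}:
    1 = reject H0: mu = 27, 0 = accept H0."""
    t = test_statistic(sample)
    lo, hi = accept_interval
    return 0 if lo <= t <= hi else 1

rng = np.random.default_rng(0)
ages = rng.normal(loc=27.0, scale=5.0, size=100)   # hypothetical sample of 100 ages
print(test_statistic(ages), test(ages))            # statistic and the {0,1} decision
```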
Next we consider some common types of tests.

1.1 Likelihood Ratio Test

Let $\vec{X} = (X_1, \ldots, X_n)$ be i.i.d. $f(X|\theta)$, with likelihood function $L(\theta|\vec{X}) = \prod_{i=1}^n f(x_i|\theta)$.
Define: the likelihood ratio test statistic for testing $H_0: \theta \in \Theta_0$ vs. $H_1: \theta \in \Theta_0^c$ is
$$\lambda(\vec{X}) \equiv \frac{\sup_{\theta \in \Theta_0} L(\theta|\vec{X})}{\sup_{\theta \in \Theta} L(\theta|\vec{X})}.$$
The numerator of $\lambda(\vec{X})$ is the restricted likelihood function, and the denominator is the unrestricted likelihood function.

The support of the LR test statistic is [0, 1].


Intuitively speaking, if $H_0$ is true (i.e., $\theta \in \Theta_0$), then $\lambda(\vec{X}) \approx 1$ (since the restriction $\theta \in \Theta_0$ will not bind). However, if $H_0$ is false, then $\lambda(\vec{X})$ can be small (close to zero).
So an LR test should be one which rejects $H_0$ when $\lambda(\vec{X})$ is small; for example, $1(\lambda(\vec{X}) < 0.75)$.

Example: $X_1, \ldots, X_n$ i.i.d. $N(\mu, 1)$.
Test $H_0: \mu = 2$ vs. $H_1: \mu \neq 2$.
Then
$$\lambda(\vec{X}) = \frac{\exp\left(-\frac{1}{2}\sum_i (X_i - 2)^2\right)}{\exp\left(-\frac{1}{2}\sum_i (X_i - \bar{X}_n)^2\right)}$$
(the denominator arises because $\bar{X}_n$ is the unrestricted MLE estimator for $\mu$).
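As an illustration (added; not part of the original notes), here is a minimal numerical sketch of this LR statistic; the function name, seed, and sample means used are arbitrary:

```python
import numpy as np

def lr_stat_normal(x, mu0=2.0):
    """LR statistic lambda(X) for H0: mu = mu0 vs H1: mu != mu0,
    when X_1,...,X_n are i.i.d. N(mu, 1) with known variance 1."""
    x = np.asarray(x, dtype=float)
    xbar = x.mean()                                 # unrestricted MLE of mu
    restricted = -0.5 * np.sum((x - mu0) ** 2)      # log of the numerator
    unrestricted = -0.5 * np.sum((x - xbar) ** 2)   # log of the denominator
    return np.exp(restricted - unrestricted)        # lies in (0, 1]

rng = np.random.default_rng(0)
x_null = rng.normal(loc=2.0, scale=1.0, size=100)   # data generated under H0
x_alt = rng.normal(loc=2.5, scale=1.0, size=100)    # data generated under H1
print(lr_stat_normal(x_null))   # close to 1: the restriction barely binds
print(lr_stat_normal(x_alt))    # much smaller: evidence against H0
```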
Example: $X_1, \ldots, X_n$ i.i.d. $U[0, \theta]$.
(i) Test $H_0: \theta = 2$ vs. $H_1: \theta \neq 2$.
Restricted likelihood function:
$$L(\vec{X}|2) = \begin{cases} \left(\tfrac{1}{2}\right)^n & \text{if } \max(X_1, \ldots, X_n) \leq 2 \\ 0 & \text{if } \max(X_1, \ldots, X_n) > 2. \end{cases}$$
Unrestricted likelihood function:
$$L(\vec{X}|\theta) = \begin{cases} \left(\tfrac{1}{\theta}\right)^n & \text{if } \max(X_1, \ldots, X_n) \leq \theta \\ 0 & \text{if } \max(X_1, \ldots, X_n) > \theta, \end{cases}$$
which is maximized at $\hat{\theta}_n^{MLE} = \max(X_1, \ldots, X_n)$.
Hence the denominator of the LR statistic is $\left(\frac{1}{\max(X_1, \ldots, X_n)}\right)^n$, so that
$$\lambda(\vec{X}) = \begin{cases} \left(\frac{\max(X_1, \ldots, X_n)}{2}\right)^n & \text{if } \max(X_1, \ldots, X_n) \leq 2 \\ 0 & \text{if } \max(X_1, \ldots, X_n) > 2. \end{cases}$$
The LR test would say: $1(\lambda(\vec{X}) \leq c)$. The critical region consists of two disconnected parts (graph).

(ii) Test $H_0: \theta \in [0, 2]$ vs. $H_1: \theta > 2$.
In this case, the restricted likelihood is
$$\sup_{\theta \in [0,2]} L(\vec{X}|\theta) = \begin{cases} \left(\frac{1}{\max(X_1, \ldots, X_n)}\right)^n & \text{if } \max(X_1, \ldots, X_n) \leq 2 \\ 0 & \text{otherwise,} \end{cases}$$
so
$$\lambda(\vec{X}) = \begin{cases} 1 & \text{if } \max(X_1, \ldots, X_n) \leq 2 \\ 0 & \text{otherwise.} \end{cases} \qquad (1)$$
(graph)
So now the LR test is $1(\lambda(\vec{X}) \leq c) = 1(\max(X_1, \ldots, X_n) > 2)$.
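A short numerical sketch of case (ii), added here for illustration (the true parameter values and sample size are arbitrary):

```python
import numpy as np

def lr_stat_uniform(x, theta0=2.0):
    """LR statistic for H0: theta in [0, theta0] vs H1: theta > theta0,
    for X_1,...,X_n i.i.d. U[0, theta]; equals 1 if max(x) <= theta0, else 0."""
    return 1.0 if np.max(x) <= theta0 else 0.0

rng = np.random.default_rng(1)
x_null = rng.uniform(0.0, 1.8, size=50)   # true theta = 1.8 (H0 holds)
x_alt = rng.uniform(0.0, 2.5, size=50)    # true theta = 2.5 (H1 holds)
for x in (x_null, x_alt):
    lam = lr_stat_uniform(x)
    reject = lam <= 0.5    # since lambda is 0 or 1, any cutoff c in (0,1) gives the same test
    print(f"max = {np.max(x):.3f}, lambda = {lam}, reject H0: {reject}")
```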

1.2 Wald Tests

Another common way to generate test statistics is to focus on statistics which are asymptotically normally distributed under $H_0$ (i.e., if $H_0$ were true).
A common situation is when the estimator for $\theta$, call it $\hat{\theta}_n$, is asymptotically normal with some asymptotic variance $V$ (e.g., the MLE). Let the null be $H_0: \theta = \theta_0$. Then, if the null were true,
$$\frac{\hat{\theta}_n - \theta_0}{\sqrt{V/n}} \overset{d}{\to} N(0, 1). \qquad (2)$$
The quantity on the LHS is the t-test statistic.
Note: in most cases the asymptotic variance $V$ will not be known and will also need to be estimated. However, if we have an estimator $\hat{V}_n$ such that $\hat{V}_n \overset{p}{\to} V$, then the statement
$$Z_n \equiv \frac{\hat{\theta}_n - \theta_0}{\sqrt{\hat{V}_n/n}} \overset{d}{\to} N(0, 1)$$
still holds (using the plim operator and Slutsky theorems). In what follows, therefore, we assume for simplicity that we know $V$.
We consider two cases:
(i) Two-sided test: $H_0: \theta = \theta_0$ vs. $H_1: \theta \neq \theta_0$.
Under $H_0$: the CLT holds, and the t-stat is $\sim N(0, 1)$.
Under $H_1$: assume that the true value is some $\theta_1 \neq \theta_0$. Then the t-stat can be written as
$$\frac{\hat{\theta}_n - \theta_0}{\sqrt{V/n}} = \frac{\hat{\theta}_n - \theta_1}{\sqrt{V/n}} + \frac{\theta_1 - \theta_0}{\sqrt{V/n}}.$$
The first term $\overset{d}{\to} N(0, 1)$, but the second (non-stochastic) term diverges to $+\infty$ or $-\infty$, depending on whether the true $\theta_1$ exceeds or is less than $\theta_0$. Hence the t-stat diverges to $+\infty$ or $-\infty$ with probability 1.
Hence, in this case, your test should be $1(|Z_n| > c)$, where $c$ should be some number in the tails of the $N(0, 1)$ distribution.
Multivariate version: $\theta$ is $K$-dimensional and asymptotically normal, so that under $H_0$ we have
$$\sqrt{n}(\hat{\theta}_n - \theta_0) \overset{d}{\to} N(0, \Sigma).$$
Then we can test $H_0: \theta = \theta_0$ vs. $H_1: \theta \neq \theta_0$ using the quadratic form
$$Z_n \equiv n(\hat{\theta}_n - \theta_0)' \Sigma^{-1} (\hat{\theta}_n - \theta_0) \overset{d}{\to} \chi^2_K.$$
The test takes the form $1(Z_n > c)$.
(ii) One-sided test: $H_0: \theta \leq \theta_0$ vs. $H_1: \theta > \theta_0$.
Here the null hypothesis specifies a whole range of true $\theta$ (namely $\Theta_0 = (-\infty, \theta_0]$), whereas the t-test statistic is evaluated at just one value of $\theta$.
Just as for the two-sided test, the one-sided t-stat is evaluated at $\theta_0$, so that $Z_n = \frac{\hat{\theta}_n - \theta_0}{\sqrt{V/n}}$.
Under $H_0$ and $\theta < \theta_0$: $Z_n$ diverges to $-\infty$ with probability 1. Under $H_0$ and $\theta = \theta_0$: the CLT holds, and the t-stat is $\sim N(0, 1)$.
Under $H_1$: $Z_n$ diverges to $+\infty$ with probability 1.
Hence, in this case, you will reject the null only for very large values of $Z_n$. Correspondingly, your test should be $1(Z_n > c)$, where $c$ should be some number in the right tail of the $N(0, 1)$ distribution.
Later, we will discuss how to choose $c$.
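For illustration (added; not from the notes), a minimal sketch of the scalar two-sided Wald/t-test with an estimated variance; the Bernoulli data-generating process and the 1.96 cutoff are arbitrary example choices:

```python
import numpy as np

def wald_test_two_sided(x, theta0, crit=1.96):
    """Two-sided Wald test of H0: theta = theta0, where theta is the mean of X
    and the asymptotic variance is estimated by the sample variance."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    theta_hat = x.mean()
    v_hat = x.var(ddof=1)                           # consistent estimator of V
    z = (theta_hat - theta0) / np.sqrt(v_hat / n)   # t-stat Z_n
    return z, int(abs(z) > crit)                    # (statistic, reject indicator)

rng = np.random.default_rng(2)
x = rng.binomial(1, 0.6, size=400)                  # Bernoulli(0.6) sample
print(wald_test_two_sided(x, theta0=0.5))           # should tend to reject H0: p = 0.5
```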

1.3 Score test

Consider a model with log-likelihood function $\log L(\theta|\vec{X}) = \frac{1}{n}\sum_i \log f(x_i|\theta)$.
Let $H_0: \theta = \theta_0$. The sample score function evaluated at $\theta_0$ is
$$S(\theta_0) \equiv \left.\frac{\partial}{\partial\theta}\log L(\theta|\vec{X})\right|_{\theta=\theta_0} = \frac{1}{n}\sum_i \left.\frac{\partial}{\partial\theta}\log f(x_i|\theta)\right|_{\theta=\theta_0}.$$
Define $W_i \equiv \left.\frac{\partial}{\partial\theta}\log f(x_i|\theta)\right|_{\theta=\theta_0}$. Under the null hypothesis, $S(\theta_0)$ converges to
$$E_{\theta_0} W_i = \int \frac{\left.\frac{\partial}{\partial\theta}f(x|\theta)\right|_{\theta=\theta_0}}{f(x|\theta_0)}\, f(x|\theta_0)\, dx = \frac{\partial}{\partial\theta}\int f(x|\theta)\, dx = \frac{\partial}{\partial\theta}\, 1 = 0$$
(the information inequality). Hence,
$$V_{\theta_0} W_i = E_{\theta_0} W_i^2 = E_{\theta_0}\left[\left.\frac{\partial}{\partial\theta}\log f(X|\theta)\right|_{\theta=\theta_0}\right]^2 \equiv V_0.$$
(Note that $\frac{1}{V_0}$ is the usual variance matrix for the MLE, which is the CRLB.)
Therefore, you can apply the CLT to get that, under $H_0$,
$$\frac{S(\theta_0)}{\sqrt{V_0/n}} \overset{d}{\to} N(0, 1).$$
(If we don't know $V_0$, we can use some consistent estimator $\hat{V}_0$ of it.)
So a test of $H_0: \theta = \theta_0$ could be formulated as $1\left(\left|\frac{S(\theta_0)}{\sqrt{\hat{V}_0/n}}\right| > c\right)$, where $c$ is in the right tail of the $N(0, 1)$ distribution.
Multivariate version: if $\theta$ is $K$-dimensional,
$$S_n \equiv n\, S(\theta_0)' V_0^{-1} S(\theta_0) \overset{d}{\to} \chi^2_K.$$
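A minimal illustrative sketch (added; not from the notes) of the score test for a Bernoulli likelihood, where the score and information are available in closed form; the function name and parameter values are arbitrary:

```python
import numpy as np

def score_test_bernoulli(x, p0, crit=1.96):
    """Score (LM) test of H0: p = p0 for i.i.d. Bernoulli data.
    Uses the closed-form score S(p0) = (xbar - p0) / (p0 (1 - p0)) and
    information V0 = 1 / (p0 (1 - p0)); only the restricted model (p0) is needed."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    score = (x.mean() - p0) / (p0 * (1.0 - p0))
    v0 = 1.0 / (p0 * (1.0 - p0))
    z = score / np.sqrt(v0 / n)          # asymptotically N(0,1) under H0
    return z, int(abs(z) > crit)

rng = np.random.default_rng(3)
print(score_test_bernoulli(rng.binomial(1, 0.5, 500), p0=0.5))  # tends to keep H0
print(score_test_bernoulli(rng.binomial(1, 0.6, 500), p0=0.5))  # tends to reject H0
```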



Recall that, in the previous lecture notes, we derived the following asymptotic equality for the MLE:
$$\sqrt{n}\left(\hat{\theta}_n - \theta_0\right) \overset{a}{=} \sqrt{n}\;\frac{\frac{1}{n}\sum_i \left.\frac{\partial \log f(x_i|\theta)}{\partial\theta}\right|_{\theta=\theta_0}}{-\frac{1}{n}\sum_i \left.\frac{\partial^2 \log f(x_i|\theta)}{\partial\theta^2}\right|_{\theta=\theta_0}} \overset{a}{=} \sqrt{n}\,\frac{S(\theta_0)}{V_0}. \qquad (3)$$
(The notation $\overset{a}{=}$ means that the LHS and RHS differ by some quantity which is $o_p(1)$.)
Hence, the above implies that (applying the information inequality, as we did before)
$$\underbrace{\sqrt{n}\left(\hat{\theta}_n - \theta_0\right)}_{(1)} \overset{a}{=} \underbrace{\sqrt{n}\,\frac{S(\theta_0)}{V_0}}_{(2)} \overset{d}{\to} N\!\left(0, \frac{1}{V_0}\right).$$
The Wald statistic is based on (1), while the Score statistic is based on (2). In this sense, these two tests are asymptotically equivalent. Note that the asymptotic variance of the Wald statistic, $1/V_0$, equals the reciprocal of $V_0$.
(Later, you will also see that the Likelihood Ratio Test statistic is asymptotically equivalent to these two.)

The LR, Wald, and Score tests (the "trinity" of test statistics) require different models to be estimated:
The LR test requires both the restricted and unrestricted models to be estimated.
The Wald test requires only the unrestricted model to be estimated.
The Score test requires only the restricted model to be estimated.
The applicability of each test then depends on the nature of the hypotheses. For $H_0: \theta = \theta_0$, the restricted model is trivial to estimate, and so the LR or Score test might be preferred. For $H_0: \theta \leq \theta_0$, estimating the restricted model is a constrained maximization problem, so the Wald test might be preferred.


2 Methods of evaluating tests

Consider $X_1, \ldots, X_n$ i.i.d. $(\mu, \sigma^2)$. Take the test statistic $\bar{X}_n$.
Test $H_0: \mu = 2$ vs. $H_1: \mu \neq 2$.
Why are the following good or bad tests?
1. $1(\bar{X}_n \neq 2)$
2. $1(\bar{X}_n \geq 1.2)$
3. $1(\bar{X}_n \notin [1.8, 2.2])$
4. $1(\bar{X}_n \notin [-10, 30])$
Test 1 rejects too often (in fact, for every $n$, you reject with probability 1). Test 2 is even worse, since it rejects even when $\bar{X}_n$ is close to 2. Test 3 is not so bad. Test 4 accepts too often.
Basically, we are worried when a test is wrong. Since the test itself is a random
variable, we cannot guarantee that a test is never wrong, but we can characterize
how often it would be wrong.
There are two types of mistakes that we are worried about:
Type I error: rejecting H0 when it is true. (This is the problem with tests 1
and 2.)
Type II error: Accepting H0 when it is false. (This is the problem with test
4.)

Let $T_n \equiv T(X_1, \ldots, X_n)$ denote the sample test statistic. Consider a test with rejection region $R$ (i.e., the test is $1(T_n \in R)$). Then:
$P(\text{Type I error}) = P(T_n \in R \mid \theta \in \Theta_0)$
$P(\text{Type II error}) = P(T_n \notin R \mid \theta \in \Theta_0^c)$

Example: $X_1, X_2$ i.i.d. Bernoulli with probability $p$.
Test $H_0: p = \frac{1}{2}$ vs. $H_1: p \neq \frac{1}{2}$.
Consider the test $1\left(\frac{X_1 + X_2}{2} \neq 1\right)$.
Type I error: rejecting $H_0$ when $p = \frac{1}{2}$.
$$P(\text{Type I error}) = P\left(\tfrac{X_1+X_2}{2} \neq 1 \,\Big|\, p = \tfrac{1}{2}\right) = P\left(\tfrac{X_1+X_2}{2} = 0 \,\Big|\, p = \tfrac{1}{2}\right) + P\left(\tfrac{X_1+X_2}{2} = \tfrac{1}{2} \,\Big|\, p = \tfrac{1}{2}\right) = \tfrac{1}{4} + \tfrac{1}{2} = \tfrac{3}{4}.$$
Type II error: accepting $H_0$ when $p \neq \frac{1}{2}$.
For $p \neq \frac{1}{2}$:
$$\frac{X_1 + X_2}{2} = \begin{cases} 0 & \text{with prob } (1-p)^2 \\ \tfrac{1}{2} & \text{with prob } 2p(1-p) \\ 1 & \text{with prob } p^2. \end{cases}$$
So $P\left(\frac{X_1+X_2}{2} = 1 \,\big|\, p\right) = p^2$. Graph.


Power function
More generally, Type I and Type II errors are summarized in the power function.
Definition: the power function of a hypothesis test with rejection region $R$ is the function of $\theta$ defined by $\beta(\theta) = P_\theta(T_n \in R)$.
Example: For the above example, $\beta(p) = P\left(\frac{X_1+X_2}{2} \neq 1 \,\big|\, p\right) = 1 - p^2$. Graph.
The power function gives the Type I error probabilities for any singleton null hypothesis $H_0: p = p_0$.
From $\beta(p)$, we see that if you are worried about Type I error, then you should only use this test when your null is that $p$ is close to 1 (because only for $p_0$ close to 1 is the power function low).
$1 - \beta(p)$ gives you the Type II error probabilities, for any point alternative hypothesis.
So if you are worried about Type II error, then $\beta(p)$ tells you that you should use this test when your alternative hypothesis postulates that $p$ is low (close to zero). We say that this test has good power against alternative values of $p$ close to zero.
Important: the power function is specific to a given test $1(T_n \in R)$, regardless of the specific hypotheses that the test may be used for.
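To make this concrete, here is a small sketch (added for illustration) that checks the formula $\beta(p) = 1 - p^2$ for the two-coin test by simulation; the grid of $p$ values and the number of replications are arbitrary:

```python
import numpy as np

def power_two_coin(p, reps=100_000, rng=None):
    """Monte Carlo rejection probability of the test 1{(X1+X2)/2 != 1}
    when X1, X2 are i.i.d. Bernoulli(p)."""
    if rng is None:
        rng = np.random.default_rng(4)
    x = rng.binomial(1, p, size=(reps, 2))
    return np.mean(x.sum(axis=1) != 2)        # reject unless both tosses equal 1

for p in (0.1, 0.5, 0.9):
    print(p, power_two_coin(p), 1 - p**2)     # simulated vs. analytic beta(p)
```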

Example: $X_1, \ldots, X_n$ i.i.d. $U[0, \theta]$.
Test $H_0: \theta \leq 2$ vs. $H_1: \theta > 2$. Derive $\beta(\theta)$ for the LR test $1(\lambda(\vec{X}) < c)$.
Recall our earlier derivation of the LR test in Eq. (1). Hence,
$$\beta(\theta) = P(\lambda(\vec{X}) < c \mid \theta) = P(\max(X_1, \ldots, X_n) > 2 \mid \theta) = 1 - P(\max(X_1, \ldots, X_n) < 2 \mid \theta) = \begin{cases} 0 & \text{for } \theta \leq 2 \\ 1 - \left(\frac{2}{\theta}\right)^n & \text{for } \theta > 2. \end{cases}$$
Graph.


In practice, researchers are often concerned about Type I error (i.e., they don't want to reject $H_0$ unless the evidence against it is overwhelming): a conservative bias?
But if this is so, then you want a test with a power function $\beta(\theta)$ which is low for $\theta \in \Theta_0$, but high elsewhere:

This motivates the definition of size and level of a test.


For $0 \leq \alpha \leq 1$, a test with power function $\beta(\theta)$ is a size $\alpha$ test if $\sup_{\theta \in \Theta_0} \beta(\theta) = \alpha$.
For $0 \leq \alpha \leq 1$, a test with power function $\beta(\theta)$ is a level $\alpha$ test if $\sup_{\theta \in \Theta_0} \beta(\theta) \leq \alpha$.
The $\theta \in \Theta_0$ at which the sup is achieved is called the least favorable value of $\theta$ under the null, for this test. It is the value of $\theta \in \Theta_0$ for which the null holds, but which is most difficult to distinguish (in the sense of having the highest rejection probability) from any alternative parameter $\theta \notin \Theta_0$.
Reflecting perhaps the conservative bias, researchers often use tests of size $\alpha = 0.05$ or $0.10$.
Example: $X_1, \ldots, X_n$ i.i.d. $N(\mu, 1)$. Then $\bar{X}_n \sim N(\mu, 1/n)$, and $Z_n(\mu) \equiv \sqrt{n}(\bar{X}_n - \mu) \sim N(0, 1)$.
Consider the test $1(Z_n(2) > c)$, for the hypotheses $H_0: \mu \leq 2$ vs. $H_1: \mu > 2$.
The power function is
$$\beta(\mu) = P(\sqrt{n}(\bar{X}_n - 2) > c \mid \mu) = P(\sqrt{n}(\bar{X}_n - \mu) > c + \sqrt{n}(2 - \mu) \mid \mu) = 1 - \Phi(c + \sqrt{n}(2 - \mu)),$$
where $\Phi(\cdot)$ is the standard normal CDF. Note that $\beta(\mu)$ is increasing in $\mu$.
Size of test $= \sup_{\mu \leq 2}\left[1 - \Phi(c + \sqrt{n}(2 - \mu))\right]$. Since $\beta(\mu)$ is increasing in $\mu$, the supremum occurs at $\mu = 2$, so that the size is $\beta(2) = 1 - \Phi(c)$.
Assume you want a test with size $\alpha$. Then you want to set $c$ such that $1 - \Phi(c) = \alpha \iff c = \Phi^{-1}(1 - \alpha)$.
Graph: $c$ is the $(1-\alpha)$-th quantile of the standard normal distribution. You can get these from the usual tables.
For $\alpha = 0.025$, $c = 1.96$. For $\alpha = 0.05$, $c = 1.64$.
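A quick illustrative check (added; not part of the notes) of this critical-value calculation and the resulting size, using scipy for the normal quantile; the sample size, α, and number of replications are arbitrary:

```python
import numpy as np
from scipy.stats import norm

alpha, n, reps = 0.05, 50, 100_000
c = norm.ppf(1 - alpha)                    # Phi^{-1}(1 - alpha), about 1.645
rng = np.random.default_rng(5)

# Simulate under the least favorable null value mu = 2: rejection rate should be ~ alpha.
xbar = rng.normal(loc=2.0, scale=1.0, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar - 2.0)
print(c, np.mean(z > c))                   # ~0.05

# Under mu = 1.5 < 2 (still in the null), the rejection rate is far below alpha.
xbar = rng.normal(loc=1.5, scale=1.0, size=(reps, n)).mean(axis=1)
print(np.mean(np.sqrt(n) * (xbar - 2.0) > c))   # ~0
```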



Now consider the above test, with $c = 1.64$, but change the hypotheses to $H_0: \mu = 2$ vs. $H_1: \mu \neq 2$.
The test still has size $\alpha = 0.05$.
But there is something intuitively wrong about this test. You are less likely to reject when $\mu < 2$. So the Type II error is very high for alternatives $\mu < 2$.
We wish to rule out such tests.
Definition: a test with power function $\beta(\theta)$ is unbiased if $\beta(\theta') \geq \beta(\theta'')$ for every pair $(\theta', \theta'')$ where $\theta' \in \Theta_0^c$ and $\theta'' \in \Theta_0$.
Clearly, the test above is biased for the stated hypotheses. What would be an unbiased test with the same size $\alpha = 0.05$?
Definition: Let $\mathcal{C}$ be a class of tests for testing $H_0: \theta \in \Theta_0$ vs. $H_1: \theta \in \Theta_0^c$. A test in class $\mathcal{C}$, with power function $\beta(\theta)$, is uniformly most powerful (UMP) in class $\mathcal{C}$ if, for every other test in class $\mathcal{C}$ with power function $\tilde{\beta}(\theta)$,
$$\beta(\theta) \geq \tilde{\beta}(\theta) \quad \text{for every } \theta \in \Theta_0^c.$$
Often, the classes you consider are tests of a given size $\alpha$. Graphically, the power function for a UMP test lies above the upper envelope of the power functions of all other tests in the class, for $\theta \in \Theta_0^c$:
(graph)

2.1 UMP tests in a special case: two simple hypotheses

It can be difficult in general to see what form a UMP test of a given size takes. But in the simple case where both the null and alternative are simple hypotheses, we can appeal to the following result.
Theorem 8.3.12 (Neyman-Pearson Lemma): Consider testing
$$H_0: \theta = \theta_0 \quad \text{vs.} \quad H_1: \theta = \theta_1$$

(so both the null and alternative are singletons). The pdf or pmf corresponding to $\theta_i$ is $f(\vec{X}|\theta_i)$, for $i = 0, 1$. The test has a rejection region $R$ which satisfies
$$\vec{X} \in R \ \text{ if } f(\vec{X}|\theta_1) > k\, f(\vec{X}|\theta_0), \qquad \vec{X} \in R^c \ \text{ if } f(\vec{X}|\theta_1) < k\, f(\vec{X}|\theta_0), \qquad (*)$$
and
$$\alpha = P(\vec{X} \in R \mid \theta_0).$$
Then:
Any test satisfying (*) is a UMP level $\alpha$ test.
If there exists a test satisfying (*) with $k > 0$, then every UMP level $\alpha$ test is a size $\alpha$ test, and every UMP level $\alpha$ test satisfies (*).
In other words, for simple hypotheses, a likelihood ratio test of the sort
$$1\left(\frac{f(\vec{X}|\theta_1)}{f(\vec{X}|\theta_0)} > k\right)$$
is UMP-size $\alpha$ (where $k$ is chosen so that $P\left(\frac{f(\vec{X}|\theta_1)}{f(\vec{X}|\theta_0)} > k \,\Big|\, \theta_0\right) = \alpha$).
Note that this LR statistic is different from $\lambda(\vec{X})$, which for these hypotheses is $\frac{f(\vec{X}|\theta_0)}{\max\left[f(\vec{X}|\theta_0),\, f(\vec{X}|\theta_1)\right]}$.
Example: return to the 2-coin toss again. Test $H_0: p = \frac{1}{2}$ vs. $H_1: p = \frac{3}{4}$.
The likelihood ratios for the three possible outcomes $\sum_i X_i = 0, 1, 2$ are:
$$\frac{f(0|p=\frac{3}{4})}{f(0|p=\frac{1}{2})} = \frac{1}{4}, \qquad \frac{f(1|p=\frac{3}{4})}{f(1|p=\frac{1}{2})} = \frac{3}{4}, \qquad \frac{f(2|p=\frac{3}{4})}{f(2|p=\frac{1}{2})} = \frac{9}{4}.$$
Let $l(\vec{X}) \equiv \frac{f(\sum_i X_i \mid p = 3/4)}{f(\sum_i X_i \mid p = 1/2)}$. Hence, there are 4 possible rejection regions, for values of $l(\vec{X})$:
(i) $\left(\frac{9}{4}, +\infty\right)$
(ii) $\left(\frac{3}{4}, +\infty\right)$
(iii) $\left(\frac{1}{4}, +\infty\right)$
(iv) $(-\infty, +\infty)$.
Hence, the NP test would have one of these four rejection regions, depending on the desired size:
a test with $R$ = (i) is UMP-size 0 (implemented by setting $k > \frac{9}{4}$);
a test with $R$ = (ii) is UMP-size $\frac{1}{4}$ (implemented by setting $\frac{3}{4} < k < \frac{9}{4}$);
a test with $R$ = (iii) is UMP-size $\frac{3}{4}$ (implemented by setting $\frac{1}{4} < k < \frac{3}{4}$);
a test with $R$ = (iv) is UMP-size 1 (implemented by setting $k < \frac{1}{4}$).
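The following illustrative sketch (added; not in the notes) reproduces these likelihood ratios and sizes directly from the binomial pmf; the particular thresholds are arbitrary points inside each of the four k-ranges:

```python
from scipy.stats import binom

p0, p1, n = 0.5, 0.75, 2
for k_threshold in (9/4 + 0.1, 3/4 + 0.1, 1/4 + 0.1, 1/4 - 0.1):
    # Reject when the likelihood ratio f(y|p1)/f(y|p0) exceeds k_threshold.
    size = sum(binom.pmf(y, n, p0)
               for y in range(n + 1)
               if binom.pmf(y, n, p1) / binom.pmf(y, n, p0) > k_threshold)
    print(f"k = {k_threshold:.2f}: size = {size:.2f}")
# Prints sizes 0.00, 0.25, 0.75, 1.00, matching the four rejection regions above.
```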
Example A: $X_1, \ldots, X_n \sim N(\mu, 1)$. Test $H_0: \mu = \mu_0$ vs. $H_1: \mu = \mu_1$. (Assume $\mu_1 > \mu_0$.)
By the NP Lemma, the UMP test has a rejection region characterized by
$$\frac{\exp\left(-\frac{1}{2}\sum_i (X_i - \mu_1)^2\right)}{\exp\left(-\frac{1}{2}\sum_i (X_i - \mu_0)^2\right)} > k.$$
Taking logs of both sides:
$$(\mu_1 - \mu_0)\sum_i X_i > \log k + \frac{n}{2}\left(\mu_1^2 - \mu_0^2\right) \iff \frac{1}{n}\sum_i X_i > \frac{\frac{1}{n}\log k + \frac{1}{2}\left(\mu_1^2 - \mu_0^2\right)}{\mu_1 - \mu_0} \equiv d,$$
where $d$ is determined such that $P(\bar{X}_n > d \mid \mu_0) = \alpha$, where $\alpha$ is the desired size. This makes intuitive sense: you reject the null when the sample mean is too large (because $\mu_1 > \mu_0$).
Under the null, $\sqrt{n}(\bar{X}_n - \mu_0) \sim N(0, 1)$, so for $\alpha = 0.05$ you want to set $d = \mu_0 + 1.64/\sqrt{n}$.

2.1.1 Discussion of NP Lemma

(From Amemiya, pp. 190-192.) First, consider a Bayesian approach to this testing problem. Assume that the decisionmaker incurs loss $\gamma_1$ if he mistakenly chooses $H_1$ when $H_0$ is true, and loss $\gamma_2$ if he mistakenly chooses $H_0$ when $H_1$ is true. Then, given data observations $\vec{x}$, he will
$$\text{Reject } H_0 \ (= \text{accept } H_1) \iff \gamma_1 P(\theta_0|\vec{x}) < \gamma_2 P(\theta_1|\vec{x}), \qquad (**)$$
where $P(\theta_0|\vec{x})$ denotes the posterior probability of the null hypothesis given data $\vec{x}$.
In other words, this Bayesian's rejection region $R_0$ is given by
$$R_0 = \left\{\vec{x} : \frac{P(\theta_1|\vec{x})}{P(\theta_0|\vec{x})} > \frac{\gamma_1}{\gamma_2}\right\}.$$
If we multiply and divide the ratio of posterior probabilities by $f(\vec{x})$, the (marginal) joint density of $\vec{x}$, and use the laws of probability, we get:
$$R_0 = \left\{\vec{x} : \frac{P(\theta_1|\vec{x}) f(\vec{x})}{P(\theta_0|\vec{x}) f(\vec{x})} > \frac{\gamma_1}{\gamma_2}\right\} = \left\{\vec{x} : \frac{L(\vec{x}|\theta_1) P(\theta_1)}{L(\vec{x}|\theta_0) P(\theta_0)} > \frac{\gamma_1}{\gamma_2}\right\} = \left\{\vec{x} : \frac{L(\vec{x}|\theta_1)}{L(\vec{x}|\theta_0)} > \frac{\gamma_1 P(\theta_0)}{\gamma_2 P(\theta_1)} \equiv c\right\}.$$
In the above, $P(\theta_0)$ and $P(\theta_1)$ denote the prior probabilities for, respectively, the null and alternative hypotheses. This provides some intuition for the likelihood-ratio form of the NP rejection region.
Next we see how this Bayesian-motivated rejection region is UMP. In the case of two simple hypotheses, the UMP property reduces to admissibility, in the sense that among all tests with size $\alpha$, the UMP/admissible test(s) must have the smallest Type II error $\beta$. For any testing scenario, the frontier (in $(\alpha, \beta)$ space) of potential admissible tests is convex. (Why?)
Hence we are going to show that the test derived above, with rejection region $R_0$, lies on the $(\alpha, \beta)$ frontier. Reconsider the (Bayesian) optimal test problem: choose the rejection region $R$ to minimize expected loss
$$\min_R \ \phi(R) \equiv \gamma_1 P(\theta_0 \mid \vec{X} \in R) P(\vec{X} \in R) + \gamma_2 P(\theta_1 \mid \vec{X} \notin R) P(\vec{X} \notin R).$$
We will show that the region $R_0$ defined above optimizes this problem. Consider any other region $R_1$. Recall that $R_0 = (R_0 \cap R_1) \cup (R_0 \cap \bar{R}_1)$. Then we can rewrite
$$\phi(R_0) = \gamma_1 P(\theta_0|R_0 \cap R_1)P(R_0 \cap R_1) + \gamma_1 P(\theta_0|R_0 \cap \bar{R}_1)P(R_0 \cap \bar{R}_1) + \gamma_2 P(\theta_1|\bar{R}_0 \cap R_1)P(\bar{R}_0 \cap R_1) + \gamma_2 P(\theta_1|\bar{R}_0 \cap \bar{R}_1)P(\bar{R}_0 \cap \bar{R}_1),$$
$$\phi(R_1) = \gamma_1 P(\theta_0|R_1 \cap R_0)P(R_1 \cap R_0) + \gamma_1 P(\theta_0|R_1 \cap \bar{R}_0)P(R_1 \cap \bar{R}_0) + \gamma_2 P(\theta_1|\bar{R}_1 \cap R_0)P(\bar{R}_1 \cap R_0) + \gamma_2 P(\theta_1|\bar{R}_1 \cap \bar{R}_0)P(\bar{R}_1 \cap \bar{R}_0).$$
The first and fourth terms of the two equations above are identical. From the definition of $R_0$ above, we know that $\phi(R_0)_{ii} < \phi(R_1)_{iii}$: for all $\vec{x} \in R_0 \cap \bar{R}_1 \subseteq R_0$, we know from (**) that $\gamma_1 P(\theta_0|\vec{x}) < \gamma_2 P(\theta_1|\vec{x})$. Similarly, we know that $\phi(R_0)_{iii} < \phi(R_1)_{ii}$, implying that $\phi(R_0) < \phi(R_1)$.
Moreover, using the laws of probability, we can re-express
$$\phi(R) = \gamma_1 P(\theta_0) P(R|\theta_0) + \gamma_2 P(\theta_1) P(\bar{R}|\theta_1) \equiv \delta_0\, \alpha_R + \delta_1\, \beta_R,$$
with $\delta_0 \equiv \gamma_1 P(\theta_0)$ and $\delta_1 \equiv \gamma_2 P(\theta_1)$. Hence, graphically, in $(\alpha, \beta)$ space, the optimal test (with region $R_0$) lies at the tangency of the $(\alpha, \beta)$ frontier with a line of slope $-\delta_0/\delta_1$. Hence, the optimal Bayesian test ($R_0$) is admissible.
Now consider a non-Bayesian statistician. He doesn't want to specify priors $P(\theta_0)$, $P(\theta_1)$, so $\delta_0, \delta_1$ are not fixed. But he is willing to specify a desired size $\alpha$. Hence the optimal test is the one with a rejection region characterized by the slope of the line tangent to the $(\alpha, \beta)$ frontier at $\alpha$. From the above calculations, we know that this slope equals $-\delta_0/\delta_1 = -c$. That is, the NP test with threshold $c$ corresponds to an optimal Bayesian test with $\delta_0/\delta_1 = c$.
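As an illustration of this correspondence (added here; the losses and priors are arbitrary), the sketch below computes the Bayes-optimal decision for the 2-coin example and checks that it coincides with the LR rule with threshold $c = \gamma_1 P(\theta_0) / (\gamma_2 P(\theta_1))$:

```python
from scipy.stats import binom

n, p0, p1 = 2, 0.5, 0.75
gamma1, gamma2 = 1.0, 1.0          # losses for Type I / Type II mistakes (arbitrary)
prior0, prior1 = 0.5, 0.5          # prior probabilities of H0 and H1 (arbitrary)
c = gamma1 * prior0 / (gamma2 * prior1)

for y in range(n + 1):
    like0, like1 = binom.pmf(y, n, p0), binom.pmf(y, n, p1)
    post0 = like0 * prior0 / (like0 * prior0 + like1 * prior1)
    post1 = 1.0 - post0
    bayes_reject = gamma1 * post0 < gamma2 * post1    # posterior-loss rule (**)
    lr_reject = like1 / like0 > c                     # NP likelihood-ratio rule
    print(y, bayes_reject, lr_reject)                 # the two rules agree
```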
2.2 Extension of NP Lemma: models with monotone likelihood ratio

The case covered by the NP Lemma, in which both the null and alternative hypotheses are simple, is somewhat artificial. For instance, we may be interested in the one-sided hypotheses $H_0: \theta \leq \theta_0$ vs. $H_1: \theta > \theta_0$, where $\theta$ is scalar. It turns out that a UMP test for one-sided hypotheses exists under an additional assumption on the family of density functions $\left\{f(\vec{X}|\theta)\right\}$:
Definition: the family of densities $f(\vec{X}|\theta)$ has monotone likelihood ratio (MLR) in $T(\vec{X})$ if there exists a function $T(\vec{X})$ such that for any pair $\theta < \theta'$ the densities $f(\vec{X}|\theta)$ and $f(\vec{X}|\theta')$ are distinct and the ratio $f(\vec{X}|\theta')/f(\vec{X}|\theta)$ is a nondecreasing function of $T(\vec{X})$. That is, there exists a nondecreasing function $g(\cdot)$ and a function $T(\vec{X})$ such that $f(\vec{X}|\theta')/f(\vec{X}|\theta) = g(T(\vec{X}))$.
For instance: let $\vec{X} = x$ and $T(x) = x$; then MLR in $x$ means that $f(x|\theta')/f(x|\theta)$ is nondecreasing in $x$. Roughly speaking: larger $x$'s are more likely under larger $\theta$'s.
Theorem: Let $\theta$ be scalar, and let $f(\vec{X}|\theta)$ have MLR in $T(\vec{X})$. Then:
(i) For testing $H_0: \theta \leq \theta_0$ vs. $H_1: \theta > \theta_0$, there exists a UMP level $\alpha$ test $\phi(\vec{X})$, which is given by
$$\phi(\vec{X}) = \begin{cases} 1 & \text{when } T(\vec{X}) > C \\ \gamma & \text{when } T(\vec{X}) = C \\ 0 & \text{when } T(\vec{X}) < C, \end{cases}$$
where $(\gamma, C)$ are chosen to satisfy $E_{\theta_0}\phi(\vec{X}) = \alpha$.
(ii) The power function $\beta(\theta) = E_\theta \phi(\vec{X})$ is strictly increasing at all points $\theta$ for which $0 < \beta(\theta) < 1$.
(iii) For every $\theta'$, the test described in part (i) is UMP for testing $H_0': \theta \leq \theta'$ vs. $H_1': \theta > \theta'$ at level $\alpha' = \beta(\theta')$.
(iv) For any $\theta < \theta_0$, the test minimizes $\beta(\theta)$ (the Type I error at $\theta$).

Proof: See Lehmann and Romano, Testing Statistical Hypotheses, p. 65. Sketch:
Consider testing $\theta_0$ vs. $\theta_1 > \theta_0$. By the NP Lemma, this depends on the ratio $f(\vec{X}|\theta_1)/f(\vec{X}|\theta_0)$. Given the MLR condition, this ratio can be written $g(T(\vec{X}))$, where $g(\cdot)$ is nondecreasing. Then the UMP test rejects when $f(\vec{X}|\theta_1)/f(\vec{X}|\theta_0)$ is large, which is when $T(\vec{X})$ is large; this is the test $\phi(\vec{X})$. Since this test does not depend on $\theta_1$, $\phi(\vec{X})$ is also UMP-size $\alpha$ for testing $H_0: \theta = \theta_0$ vs. $H_1: \theta > \theta_0$ (composite alternative).
Now since MLR holds for all pairs $(\theta', \theta'')$ with $\theta'' > \theta'$, the test $\phi(\vec{X})$ is also UMP-size $E_{\theta'}\phi(\vec{X})$ for testing $\theta'$ vs. $\theta''$. Hence $\beta(\theta'') \geq \beta(\theta')$, otherwise $\phi(\vec{X})$ could not be UMP (why?). Furthermore, the distinctness of $f(\vec{X}|\theta'')$ and $f(\vec{X}|\theta')$ rules out $\beta(\theta'') = \beta(\theta')$. Hence we get (ii).
Then from (ii), we can extend the UMP-size $\alpha$ feature of $\phi(\vec{X})$ for testing $\theta = \theta_0$ vs. $H_1: \theta > \theta_0$ to the composite hypotheses $H_0: \theta \leq \theta_0$ vs. $H_1: \theta > \theta_0$, which is (i).
(iii) and (iv) are immediate consequences.

Example A cont'd: From above, we see that for $\mu_1 > \mu_0$ the likelihood ratio simplifies to $\exp\left\{(\mu_1 - \mu_0)\sum_i X_i - \frac{n}{2}\left(\mu_1^2 - \mu_0^2\right)\right\}$, which is increasing in $\bar{X}_n$. Hence, this family satisfies MLR with $T(\vec{X}) = \bar{X}_n$.
Using the theorem above, the one-sided t-test which rejects when
$$\bar{X}_n > \mu_0 + \frac{1.64}{\sqrt{n}}$$
is also UMP of size $\alpha = 0.05$ for the one-sided hypotheses $H_0: \mu \leq \mu_0$ vs. $H_1: \mu > \mu_0$. Call this test 1.
Taking this example further, for the one-sided hypotheses $H_0: \mu \geq \mu_0$ vs. $H_1: \mu < \mu_0$, the one-sided t-test which rejects when $\bar{X}_n < \mu_0 - 1.64/\sqrt{n}$ will be UMP of size $\alpha = 0.05$. Call this test 2.
Now consider testing
$$H_0: \mu = \mu_0 \quad \text{vs.} \quad H_1: \mu \neq \mu_0. \qquad (*)$$
Note that the alternative hypothesis is equivalent to $\mu < \mu_0$ or $\mu > \mu_0$. Can we find a UMP size-$\alpha$ test?
For any alternative point $\mu_1 > \mu_0$, test 1 is UMP, implying that $\beta_1(\mu_1)$ is maximal among all size-$\alpha$ tests. Furthermore, by the second part of the NP Lemma, if a UMP size-$\alpha$ test exists for problem (*), it must have the same rejection region as test 1.
For $\mu_2 < \mu_0$, however, test 2 is UMP, implying that $\beta_2(\mu_2)$ is maximal. Furthermore, $\beta_1(\mu_2) < \alpha < \beta_2(\mu_2)$ from part (ii) of the theorem above, which contradicts the statement above that test 1 is UMP for this problem. Thus there is no UMP size-$\alpha$ test for problem (*).
Note, moreover, that both Test 1 and Test 2 are biased for the hypotheses (*). It turns out that the two-sided t-test which rejects when $\bar{X}_n > \mu_0 + 1.96/\sqrt{n}$ or $\bar{X}_n < \mu_0 - 1.96/\sqrt{n}$ is UMP among size-$\alpha$ unbiased tests. See the discussion in CB, pp. 392-395.
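The following sketch (added for illustration; the parameter values are arbitrary) traces out the three power functions analytically, making the bias of the one-sided tests for the two-sided problem visible:

```python
import numpy as np
from scipy.stats import norm

n, mu0 = 25, 0.0
mu_grid = np.linspace(-1.0, 1.0, 9)

# Analytic power functions under X_i ~ N(mu, 1), using the standard normal CDF.
shift = np.sqrt(n) * (mu_grid - mu0)
power_test1 = 1 - norm.cdf(1.64 - shift)           # reject if Xbar > mu0 + 1.64/sqrt(n)
power_test2 = norm.cdf(-1.64 - shift)               # reject if Xbar < mu0 - 1.64/sqrt(n)
power_twosided = (1 - norm.cdf(1.96 - shift)) + norm.cdf(-1.96 - shift)

for mu, b1, b2, b3 in zip(mu_grid, power_test1, power_test2, power_twosided):
    print(f"mu={mu:+.2f}  test1={b1:.3f}  test2={b2:.3f}  two-sided={b3:.3f}")
# Test 1 has power below alpha for mu < mu0 (biased there); the two-sided test does not.
```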


3 Large-sample properties of tests

In practice, we use large-sample theory (that is, LLNs and CLTs) in order to determine the approximate critical regions for the most common test statistics.
Why? Because finite-sample properties can be difficult to determine.
Example: $X_1, \ldots, X_n$ i.i.d. Bernoulli with probability $p$.
We want to test $H_0: p \leq \frac{1}{2}$ vs. $H_1: p > \frac{1}{2}$, using the test statistic $\bar{X}_n = \frac{1}{n}\sum_i X_i$.
Here $n$ is finite. The exact finite-sample distribution of $\bar{X}_n$ is that of a $B(n, p)$ random variable divided by $n$:
$$\bar{X}_n = \begin{cases} 0 & \text{with prob } \binom{n}{0}(1-p)^n \\ \frac{1}{n} & \text{with prob } \binom{n}{1}\, p\, (1-p)^{n-1} \\ \frac{2}{n} & \text{with prob } \binom{n}{2}\, p^2 (1-p)^{n-2} \\ \ \vdots & \\ 1 & \text{with prob } p^n. \end{cases}$$
Assume your test is of the form $1(\bar{X}_n > c)$, where the critical value $c$ is to be determined such that the size $\sup_{p \leq \frac{1}{2}} P(\bar{X}_n > c \mid p) = \alpha$, for some specified $\alpha$. This equation is difficult to solve for! (Indeed, the Clopper-Pearson (1934) confidence intervals for $p$ are based on inverting this exact finite-sample test.)
On the other hand, by the CLT, we know that $\frac{\sqrt{n}(\bar{X}_n - p)}{\sqrt{p(1-p)}} \overset{d}{\to} N(0, 1)$. Hence, consider the t-test statistic
$$Z_n \equiv \sqrt{n}\left(\bar{X}_n - \tfrac{1}{2}\right)\Big/\sqrt{\tfrac{1}{4}} = 2\sqrt{n}\left(\bar{X}_n - \tfrac{1}{2}\right).$$
Under any $p \leq \frac{1}{2}$ in the null hypothesis,
$$P\left(Z_n > \Phi^{-1}(1 - \alpha)\right) \lesssim \alpha$$
for $n$ large enough. (In fact, this holds with equality in the limit for $p = \frac{1}{2}$, and with strict inequality for $p < \frac{1}{2}$.)
Corresponding to this test, you can derive the asymptotic power function, which is $\beta_a(p) \equiv \lim_{n\to\infty} P(Z_n > c)$, for $c = \Phi^{-1}(1 - \alpha)$:
(Graph)
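An illustrative comparison (added; the sample sizes are arbitrary) of the exact binomial rejection probability at the least favorable null value $p = 1/2$ with the asymptotic $N(0,1)$ approximation:

```python
import numpy as np
from scipy.stats import binom, norm

alpha = 0.05
for n in (20, 100, 500):
    c = norm.ppf(1 - alpha)                          # asymptotic critical value for Z_n
    # Reject when Z_n = 2*sqrt(n)*(Xbar - 1/2) > c, i.e. when Y = n*Xbar > k_cut.
    k_cut = n / 2 + c * np.sqrt(n) / 2
    exact_size = binom.sf(np.floor(k_cut), n, 0.5)   # exact P(Y > k_cut | p = 1/2)
    print(n, round(exact_size, 4))                   # approaches alpha = 0.05 as n grows
```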

Note that the asymptotic power function is equal to 1 at all values under the alternative. This is the notion of consistency of a test: that it has asymptotic power 1 under every fixed alternative.
Note also that the asymptotic power (rejection probability) is zero under every $p$ in the null, except $p = \frac{1}{2}$.
(skip) Accordingly, we see that asymptotic power against fixed alternatives is not a sufficiently discerning criterion for distinguishing between tests. We can deal with this by considering local alternatives of the sort $p = \frac{1}{2} + h/\sqrt{n}$. Under additional smoothness assumptions on the distributional convergence of $\frac{\sqrt{n}(\bar{X}_n - p)}{\sqrt{p(1-p)}}$ around $p = \frac{1}{2}$, we can obtain asymptotic power functions under these local alternatives.

3.1 Likelihood Ratio Test Statistic: asymptotic distribution

Theorem 10.3.1 (Wilks' Theorem): For testing $H_0: \theta = \theta_0$ vs. $H_1: \theta \neq \theta_0$, suppose $X_1, \ldots, X_n$ are i.i.d. $f(X|\theta)$, and $f(X|\theta)$ satisfies the regularity conditions in Section 10.6.2. Then under $H_0$, as $n \to \infty$,
$$-2\log\lambda(\vec{X}) \overset{d}{\to} \chi^2_1.$$
Note: $\chi^2_1$ denotes a random variable with the Chi-squared distribution with 1 degree of freedom. By Lemma 5.3.2 in CB, if $Z \sim N(0, 1)$, then $Z^2 \sim \chi^2_1$. Clearly, $\chi^2$ random variables have only positive support.
Proof: Assume the null holds. Use a Taylor-series expansion of the log-likelihood function around the MLE estimator $\hat{\theta}_n$:
$$\sum_i \log f(x_i|\theta_0) = \sum_i \log f(X_i|\hat{\theta}_n) + \sum_i \left.\frac{\partial}{\partial\theta}\log f(X_i|\theta)\right|_{\theta=\hat{\theta}_n}(\theta_0 - \hat{\theta}_n) + \frac{1}{2}\sum_i \left.\frac{\partial^2}{\partial\theta^2}\log f(X_i|\theta)\right|_{\theta=\hat{\theta}_n}(\theta_0 - \hat{\theta}_n)^2 + \ldots \qquad (4)$$
$$= \sum_i \log f(X_i|\hat{\theta}_n) + \frac{1}{2}\sum_i \left.\frac{\partial^2}{\partial\theta^2}\log f(X_i|\theta)\right|_{\theta=\hat{\theta}_n}(\theta_0 - \hat{\theta}_n)^2 + \ldots,$$
where (i) the second term disappeared because the MLE $\hat{\theta}_n$ satisfies the first-order condition $\sum_i \left.\frac{\partial}{\partial\theta}\log f(X_i|\theta)\right|_{\theta=\hat{\theta}_n} = 0$; and (ii) the remainder term is $o_p(1)$. This is a second-order Taylor expansion.

Rewrite the above as
$$-2\log\left(\frac{\prod_i f(x_i|\theta_0)}{\prod_i f(X_i|\hat{\theta}_n)}\right) = \left[-\frac{1}{n}\sum_i \left.\frac{\partial^2}{\partial\theta^2}\log f(X_i|\theta)\right|_{\theta=\hat{\theta}_n}\right]\left[\sqrt{n}\left(\theta_0 - \hat{\theta}_n\right)\right]^2 + o_p(1).$$
Now
$$-\frac{1}{n}\sum_i \left.\frac{\partial^2}{\partial\theta^2}\log f(X_i|\theta)\right|_{\theta=\hat{\theta}_n} \overset{p}{\to} -E\left[\frac{\partial^2}{\partial\theta^2}\log f(X_i|\theta_0)\right] = \frac{1}{V_0(\theta_0)},$$
where $V_0(\theta_0)$ denotes the asymptotic variance of the MLE estimator (and is equal to the familiar information bound $1/I(\theta_0)$).
Finally, we note that $\frac{\sqrt{n}(\hat{\theta}_n - \theta_0)}{\sqrt{V_0(\theta_0)}} \overset{d}{\to} N(0, 1)$. Hence, by the definition of the $\chi^2_1$ random variable,
$$-2\log\lambda(\vec{X}) \overset{d}{\to} \chi^2_1.$$
In the multivariate case ($\theta$ being $k$-dimensional), the above says that
$$-2\log\lambda(\vec{X}) \overset{a}{=} n\left(\theta_0 - \hat{\theta}_n\right)' I(\theta_0) \left(\theta_0 - \hat{\theta}_n\right) \overset{d}{\to} \chi^2_k.$$
Hence, the LR statistic is asymptotically equivalent to the Wald and Score tests.

Example: $X_1, \ldots, X_n$ i.i.d. Bernoulli with probability $p$. Test $H_0: p = \frac{1}{2}$ vs. $H_1: p \neq \frac{1}{2}$. Let $Y_n$ denote the number of 1's.
$$\lambda(\vec{X}) = \frac{\binom{n}{y_n}\left(\frac{1}{2}\right)^{y_n}\left(\frac{1}{2}\right)^{n-y_n}}{\binom{n}{y_n}\left(\frac{y_n}{n}\right)^{y_n}\left(\frac{n-y_n}{n}\right)^{n-y_n}}.$$
For a test with asymptotic size $\alpha$:
$$\alpha = P(\lambda(\vec{X}) \leq c) \quad (c < 1)$$
$$= P(-2\log\lambda(\vec{X}) \geq -2\log c) = P(\chi^2_1 \geq -2\log c) = 1 - F_{\chi^2_1}(-2\log c)$$
$$\iff c = \exp\left(-\tfrac{1}{2} F^{-1}_{\chi^2_1}(1 - \alpha)\right).$$
For instance, for $\alpha = 0.05$, $F^{-1}_{\chi^2_1}(1-\alpha) = 3.841$; for $\alpha = 0.10$, $F^{-1}_{\chi^2_1}(1-\alpha) = 2.706$.
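A final illustrative sketch (added; the sample size and seed are arbitrary) computing this Bernoulli LR statistic and the chi-squared critical value, and checking the asymptotic size by simulation:

```python
import numpy as np
from scipy.stats import chi2

def neg2_log_lambda(y, n, p0=0.5):
    """-2 log lambda for H0: p = p0 in the Bernoulli model, with y = number of 1's."""
    phat = y / n
    # Handle the boundary cases y = 0 and y = n via the convention 0*log(0) = 0.
    loglik = lambda p: (y * np.log(p) if y > 0 else 0.0) + \
                       ((n - y) * np.log(1 - p) if y < n else 0.0)
    return -2.0 * (loglik(p0) - loglik(phat))

alpha = 0.05
crit = chi2.ppf(1 - alpha, df=1)            # 3.841
rng = np.random.default_rng(6)
n, reps = 200, 20_000
ys = rng.binomial(n, 0.5, size=reps)        # data generated under H0: p = 1/2
stats = np.array([neg2_log_lambda(y, n) for y in ys])
print(crit, np.mean(stats > crit))          # rejection rate close to 0.05
```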
