
Chapter 8: Hypothesis Testing

1 Hypotheses
A hypothesis is a statement about a population parameter. Often, there are
two complementary statements/hypotheses about θ, respectively called the
null hypothesis and alternative hypothesis.

Let Θ be the parameter space. Let Θ0 be a subset of the parameter


space, called the null region. A pair of hypotheses is denoted by H0 and H1 ,

H0 : θ ∈ Θ0
vs H1 : θ ∈ Θc0 .

Note Θ = Θ0 ∪ Θc0 .

Different Types of Hypotheses


(1) simple hypotheses: both H0 and H1 consist of only one probability
distribution.
H0 : θ = θ0 vs H1 : θ = θ1 .
(2) composite hypotheses: either H0 or H1 has more than one distribution:
• one-sided hypotheses: H0 : θ ≥ θ0 vs H1 : θ < θ0 .
• one-sided hypotheses: H0 : θ ≤ θ0 vs H1 : θ > θ0 .
• two-sided hypotheses: H0 : θ = θ0 vs H1 : θ ≠ θ0 .

Example 1:
An ideal manufacturing process requires that all products are non-defective.
This is seldom achievable in practice. The goal is to keep the proportion of defective items
as low as possible. Let p be the proportion of defective items, and 0.01 be
the maximum acceptable proportion of defective items.
statement 1: p ≥ 0.01 (the proportion of defectives is unacceptably high)
statement 2: p < 0.01 (acceptable quality)

Example 2:
Let θ be the average change in a patient’s blood pressure after taking a
drug. An experimenter might be interested in testing
H0 : θ = 0 (the drug has no effect on blood pressure)
H1 : θ ≠ 0 (there is some effect)

Rejection region
A hypothesis testing procedure or hypothesis test is a rule that specifies:
i. for which sample values H0 is accepted as true
ii. for which sample values H0 is rejected and H1 is accepted as true.

The subset of the sample space for which H0 will be rejected is denoted as
R and called the rejection or critical region.
The complement set Rc is called the acceptance region.

The rejection region R of a hypothesis test is usually defined through a test


statistic W (X), a function of the sample. For example,

R = {X : W (X) > b}.

If X ∈ R, one rejects H0 (or accepts H1 ); otherwise if X ∈ Rc , one accepts


H0 (or rejects H1 ).

Example 2: Let X̄ be the average change of blood pressure for the sampled
patients. We are interested in testing

H0 : θ = 0 vs H1 : θ ≠ 0.

The rejection region may look like R = {x : |X̄|/(S/√n) > 3}.

2 Two Types of Errors
When deciding to accept or reject H0 , we might make a mistake no matter
which decision is made:
Type I error: if H0 is true (θ ∈ Θ0 ), but the test incorrectly rejects H0 .
Type II error: if H0 is false (θ ∈ Θc0 ), but the test incorrectly accepts H0 .

Truth \ Decision     Accept H0            Reject H0
H0 true              Correct decision     Type I error
H1 true              Type II error        Correct decision

Power Function:
The power function β(θ) of a hypothesis test with rejection region R is
the function of θ defined by
β(θ) = Pθ (X ∈ R) = Pθ (reject H0 )

= Prob(committing Type I error), if θ ∈ Θ0 ,
= 1 − Prob(committing Type II error), if θ ∈ Θc0 .
Therefore,
Prob(committing Type I error) = β(θ), for θ ∈ Θ0
Prob(committing Type II error) = 1 − β(θ), for θ ∈ Θc0 .

Remarks:
• An ideal test should have the power function satisfying:
β(θ) = 0 for all θ ∈ Θ0 ; β(θ) = 1 for all θ ∈ Θc0 .

• But such a test is almost impossible unless the truth is known. In


practice, a good test should have the power function satisfying
β(θ) is near 0 (small) for most θ ∈ Θ0 ;
β(θ) is near 1 (large) for most θ ∈ Θc0 .

Example: (Binomial power function)
Let X ∼Bin(5, θ). Consider testing
H0 : θ ≤ 1/2 versus H1 : θ > 1/2.
Test 1: reject H0 if and only if all “successes” are observed, i.e., R = {5}.

(1) Compute and plot the power function.

(2) What is the maximal probability of making Type I error?

(3) What is the probability of making Type II error if θ = 2/3?
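Here β(θ) = Pθ (X = 5) = θ^5 , which is increasing in θ. A minimal numerical sketch of (1)–(3) (NumPy/SciPy assumed; not part of the original notes):

```python
# Illustrative sketch (NumPy/SciPy assumed, not part of the notes).
import numpy as np
from scipy.stats import binom

def power_test1(theta):
    """Power of Test 1: beta(theta) = P_theta(X = 5) = theta**5 for X ~ Bin(5, theta)."""
    return binom.pmf(5, 5, theta)

# (1) power function on a grid of theta values (plot beta vs thetas if desired)
thetas = np.linspace(0.0, 1.0, 101)
beta = power_test1(thetas)

# (2) maximal Type I error probability: beta is increasing, so the sup over
#     theta <= 1/2 is attained at theta = 1/2 and equals (1/2)**5 = 0.03125
max_type1 = power_test1(0.5)

# (3) Type II error probability at theta = 2/3: 1 - beta(2/3) = 1 - (2/3)**5, about 0.868
type2_at_two_thirds = 1 - power_test1(2/3)

print(max_type1, type2_at_two_thirds)
```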

Test 2: reject H0 if X = 3, 4, or 5.

(1) Compute and plot the power function.

(2) What is the maximum Type I error probability?

(3) What is the probability of making Type II error if θ = 0.8?
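For Test 2 the power function is β(θ) = Pθ (X ≥ 3); changing one line of the previous sketch answers (1)–(3) (again SciPy assumed, not part of the notes):

```python
# Same idea for Test 2 (SciPy assumed).
from scipy.stats import binom

def power_test2(theta):
    """Power of Test 2: beta(theta) = P_theta(X >= 3) for X ~ Bin(5, theta)."""
    return binom.sf(2, 5, theta)          # P(X >= 3) = 1 - P(X <= 2)

max_type1 = power_test2(0.5)              # = 0.5, the maximum Type I error probability
type2_at_08 = 1 - power_test2(0.8)        # about 0.058
print(max_type1, type2_at_08)
```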

3 Likelihood Ratio Tests (LRT)
Let L(θ|x) be the likelihood function of θ. The likelihood ratio test statistic
for testing H0 : θ ∈ Θ0 vs H1 : θ ∈ Θc0 is
λ(x) = supθ∈Θ0 L(θ|x) / supθ∈Θ L(θ|x) = L(θ̂0 |x) / L(θ̂|x),
where θ̂0 is the MLE of θ in Θ0 (“restricted” maximization); θ̂ is the MLE
of θ in the full set Θ = Θ0 ∪ Θc0 (“unrestricted” maximization).

A likelihood ratio test (LRT) is a test that has a rejection region


R = {x : λ(x) ≤ c},
where c is any number satisfying 0 ≤ c ≤ 1.

Comments:
• The numerator in λ(x) is the maximum probability of the observed
sample x computed over parameters in H0 . The denominator is the
maximum probability of the observed x over all possible parameters.
• θ̂0 is the value in Θ0 which makes the observation of data most likely;
θ̂ is the value in Θ which makes the observation of data most likely.
• If λ(x) is small, it implies that there are some parameter points in
H1 for which the observed sample is much more likely than for any
parameter in H0 . So the LRT suggests we reject H0 and accept H1 .
• The LRT statistic λ(x) is a function of x not a function of θ.
• 0 ≤ λ(x) ≤ 1.

About the cut-off value c


• Different choices of c ∈ [0, 1] give different tests and rejection regions.
• The smaller c is, the smaller the Type I error probability; the larger c is, the smaller the
Type II error probability.
• We will discuss the ideal choice of c later.

After finding an expression for λ(x), we should reduce the rejection region R to its
simplest form.

Example: (Normal One-sided LRT) X1 , . . . , Xn ∼ iid N (θ, σ 2 ) with θ un-
known and σ 2 known. Consider testing

H0 : θ ≤ θ0 versus H1 : θ > θ0 .

(i) Find the LRT and its power function.


(ii) Comment on the decision rules given by different c’s.
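For this model the closed form is λ(x) = exp(−n(x̄ − θ0 )²/(2σ²)) when x̄ > θ0 and λ(x) = 1 otherwise, since the unrestricted MLE is x̄ and the restricted MLE over θ ≤ θ0 is min(x̄, θ0 ). The sketch below checks this numerically by doing the restricted and unrestricted maximizations with an optimizer; NumPy/SciPy and the simulated data are assumptions of this illustration, not part of the notes.

```python
# Numerical check of the one-sided normal LRT (NumPy/SciPy and simulated data assumed).
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def log_lik(theta, x, sigma):
    """Log-likelihood of N(theta, sigma^2) for the sample x."""
    return np.sum(norm.logpdf(x, loc=theta, scale=sigma))

def lrt_stat(x, theta0, sigma):
    """lambda(x) = sup_{theta <= theta0} L(theta|x) / sup_{theta} L(theta|x)."""
    # restricted maximization over theta <= theta0
    res0 = minimize_scalar(lambda t: -log_lik(t, x, sigma),
                           bounds=(theta0 - 50.0, theta0), method="bounded")
    # unrestricted maximization (a wide bracket is enough for this smooth problem)
    res1 = minimize_scalar(lambda t: -log_lik(t, x, sigma),
                           bounds=(theta0 - 50.0, theta0 + 50.0), method="bounded")
    return np.exp(res1.fun - res0.fun)   # exp(restricted max loglik - unrestricted max loglik)

rng = np.random.default_rng(0)
x = rng.normal(loc=0.5, scale=1.0, size=20)
theta0, sigma, n = 0.0, 1.0, len(x)

closed_form = np.exp(-n * max(x.mean() - theta0, 0.0) ** 2 / (2 * sigma ** 2))
print(lrt_stat(x, theta0, sigma), closed_form)   # should agree up to optimizer tolerance
```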

Example: Let X1 , . . . , Xn be a random sample from a location-exponential
family
f (x|θ) = e−(x−θ) for x ≥ θ (and 0 otherwise), −∞ < θ < ∞.
Test H0 : θ ≤ θ0 versus H1 : θ > θ0 . Find the LRT and its power function.

LRT based on Sufficient Statistics
If T is sufficient for θ, then we can construct the LRT based on T and the
likelihood function L∗ (θ|t) = g(t|θ). Since T (x) contains all the information
about θ in x, the test based on T should be as good as the test based on x.
In fact, the tests are equivalent.

Theorem
If T (X) is a sufficient statistic for θ, λ∗ (t) is the LRT statistic based on
T , and λ(x) is the LRT statistic based on x, then

λ∗ (T (x)) = λ(x)

for every x in the sample space.


Proof:

Comment: The simplified expression for λ(x) should depend on x only


through T (x) if T (X) is a sufficient statistic for θ.

Example: (Normal Two-sided LRT) X1 , . . . , Xn ∼ iid N (θ, σ 2 ) with σ 2
known. Test
H0 : θ = θ0 versus H1 : θ ≠ θ0 .
Find the LRT and its power function.

LRT with Nuisance Parameters

Example: X1 , . . . , Xn ∼ iid N (θ, σ 2 ), both θ and σ 2 unknown. Test

H0 : θ ≤ θ0 versus H1 : θ > θ0 .

(i) Specify Θ and Θ0 .


(ii) Find the LRT and the power function.

4 Unbiased Test
Sometimes we would like a test to be more likely to reject H0 if θ ∈ Θc0 than
if θ ∈ Θ0 , i.e.,
Prob(reject H0 when H0 is false)≥ Prob(reject H0 when H0 is true).
A test with such a property is unbiased. Recall β(θ) = Prob(reject H0 ).

Definition: A test with power function β(θ) is unbiased if

β(θ′ ) ≥ β(θ′′ ), for every θ′ ∈ Θc0 and θ′′ ∈ Θ0 .

• In most problems, there are many unbiased tests.

Example: (Unbiased test for Binomial)


Let X ∼Bin(5, θ). Consider testing
H0 : θ ≤ 1/2 versus H1 : θ > 1/2
with the procedure:

reject H0 if X = 5.

Show that the test is unbiased.

Example: (Unbiased test for Normal)
Let X1 , ..., Xn ∼ N (θ, σ 2 ), with σ 2 known. Consider testing

H0 : θ ≤ θ0 versus H1 : θ > θ0 .

(1) Construct the LRT.


(2) Graph the power function, and show the LRT is unbiased.
(3) If we wish the maximum Type I error probability to be 0.1 and the maximum Type II
error probability to be 0.2 whenever θ > θ0 + σ, how should we choose c and n?

5 Controlling Type I error Probability

For a fixed sample size, it is usually impossible to make both types of error
arbitrarily small. It is common to first control Type I error probability at
a specified level, α:

Maximal probability of making Type I error ≤ α.

Typical choices of α are: 0.01, 0.05, 0.10.

Size α test: A test with power function β(θ) is a size α test if

supθ∈Θ0 β(θ) = α.

Level α test: A test with power function β(θ) is a level α test if

supθ∈Θ0 β(θ) ≤ α.

Example: Let X ∼Bin(5, θ). Consider testing H0 : θ ≤ 1/2 vs H1 : θ > 1/2
with the procedure: reject H0 if X = 5.

• Is this test a level 0.05 test?

• Is this test a size 0.05 test?

• What is the size α of this test?

Remark 1: How to specify H0 and H1 ?
• Remember: only the Type I error probability is controlled in our proce-
dures. If an experimenter expects an experiment to support a partic-
ular hypothesis but does not wish to make that assertion unless the
data give convincing evidence, the test should be set up so that

H1 is the hypothesis s/he expects the data to support and hopes to prove.

Example 1: H1 is sometimes called the research hypothesis. By using


a level α test with small α, the experimenter is guarding against saying
the data support the research hypothesis when it is false. For example,

H0 : new tech./drug is same as old one; H1 : new tech./drug is better.

• Sometimes it is believed “announcing that a new phenomenon has


been observed when in fact nothing has happened (the so-called null
hypothesis) is more serious than missing something new that has in
fact occurred”. This statement is not always persuasive though.
Example 2: Similarly, in the judicial system evidence is collected
to decide whether the accused is innocent or guilty. To limit the
possibility of penalizing an innocent person, the test should be set up
as
H0 : the accused is innocent; H1 : the accused is guilty

Remark 2: Generally, take the following two steps to construct a good test:
(i) first, construct the class of all level α tests;
(ii) second, within this class, search for the test whose Type II error probability
is as small as possible; equivalently, the test with the largest power for θ in the
alternative Θc0 .

Review on (1 − α)th quantile of a distribution


Use zα to denote the point having probability α to the right of it for a
distribution Z ∼ N (0, 1), i.e., P (Z > zα ) = α, or zα = Φ−1 (1 − α).
• Note zα = −z1−α .
• Commonly used cutoffs:
z0.05 = 1.645, z0.025 = 1.96, z0.01 = 2.33, z0.005 = 2.58.
z0.95 = −1.645, z0.975 = −1.96, z0.99 = −2.33, z0.995 = −2.58.
Also, P (Tn−1 > tn−1,α ) = α and P (χ2p > χ2p,α ) = α.
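These cutoffs can be reproduced numerically; the snippet below is only an illustration (SciPy assumed, with an arbitrary 10 degrees of freedom for the t and χ2 examples):

```python
# Illustration only (SciPy assumed; 10 degrees of freedom chosen arbitrarily).
from scipy.stats import norm, t, chi2

z_05  = norm.ppf(1 - 0.05)        # z_0.05 ~= 1.645, since P(Z > z_alpha) = alpha
t_05  = t.ppf(1 - 0.05, df=10)    # t_{10, 0.05}, upper 5% point of the t distribution
c2_05 = chi2.ppf(1 - 0.05, df=10) # chi^2_{10, 0.05}, upper 5% point of chi-square
print(z_05, t_05, c2_05)
```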

6 Choose c for LRT
Choose c such that the LRT has size α, i.e., its maximum Type I error probability equals α:

supθ∈Θ0 Pθ (λ(X) ≤ c) = α.

Example: n samples iid N (θ, σ 2 ), σ 2 known. Test H0 : θ ≤ θ0 vs H1 : θ > θ0 .


(1) Find the size α LRT test.
(2) Find the size 0.05 and size 0.01 tests.
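For this example the LRT reduces to the one-sided z-test: reject H0 when (X̄ − θ0 )/(σ/√n) > zα , which has size α. A minimal sketch (SciPy assumed; the data are illustrative, not from the notes):

```python
# Sketch of the size-alpha test (SciPy assumed; the data are illustrative).
import numpy as np
from scipy.stats import norm

def one_sided_z_test(x, theta0, sigma, alpha):
    """Reject H0: theta <= theta0 when (xbar - theta0)/(sigma/sqrt(n)) > z_alpha."""
    n = len(x)
    z = (np.mean(x) - theta0) / (sigma / np.sqrt(n))
    return z > norm.ppf(1 - alpha), z

x = np.array([0.3, 1.1, 0.8, -0.2, 0.9, 1.4])                   # hypothetical data
print(one_sided_z_test(x, theta0=0.0, sigma=1.0, alpha=0.05))   # z_0.05 ~= 1.645
print(one_sided_z_test(x, theta0=0.0, sigma=1.0, alpha=0.01))   # z_0.01 ~= 2.33
```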

Example: iid N (θ, σ 2 ), σ 2 known. Consider testing H0 : θ = θ0 vs
H1 : θ ≠ θ0 . Find the size α LRT test.

Example: Let X1 , . . . , Xn be iid from N (θ, σ 2 ), σ 2 unknown. Consider


testing H0 : θ = θ0 versus H1 : θ ≠ θ0 . Show that the LRT test that
rejects H0 if
|X̄ − θ0 | > tn−1,α/2 √(S 2 /n)
is a test of size α. (Homework 8.38)

Example: iid location-exponential dist. Consider testing H0 : θ ≤ θ0 vs
H1 : θ > θ0 . Find the size α LRT test.

7 Sample Size Calculation
For fixed n, it is usually impossible to make both types of error probabilities
arbitrarily small. But if we can increase n, it is possible to achieve the
desired power level.
Example: iid N (θ, σ 2 ), σ 2 known. Test H0 : θ ≤ θ0 vs H1 : θ > θ0 . The
LRT rejects H0 if (X̄ − θ0 )/(σ/√n) > C and has the power function

β(θ) = 1 − Φ( C + (θ0 − θ)/(σ/√n) ).

Note β(θ) is increasing in θ.

(1) The maximum Type I error is

supθ≤θ0 β(θ) = β(θ0 ) = 1 − Φ(C),

no matter what n is. To make the test have size α, we choose C = zα .

(2) After C is chosen, it is possible to increase β(θ) for θ > θ0 by increasing


n, and thus reduce the Type II error probability (recall that the Type I error
probability is already under control). Draw the picture of β(θ) for small and large
n. [Note: this is generally impossible when n is fixed.]

(3) Assume C = zα . What n should we choose such that the maximum


Type II error is 0.2 if θ ≥ θ0 + σ?

(4) Compute n if we choose α = 0.05 in (3).
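A sketch of the calculation in (3) and (4): with C = zα , β(θ0 + σ) = 1 − Φ(zα − √n), so requiring the Type II error at θ = θ0 + σ to be at most 0.2 gives √n ≥ zα + z0.2 , i.e., n ≥ (zα + z0.2 )². The code below (SciPy assumed; not part of the notes) just evaluates this bound:

```python
# Evaluate n >= (z_alpha + z_0.2)^2 (SciPy assumed).
import math
from scipy.stats import norm

def required_n(alpha, max_type2=0.2):
    """Smallest integer n with Type II error <= max_type2 at theta = theta0 + sigma."""
    z_alpha = norm.ppf(1 - alpha)
    z_beta = norm.ppf(1 - max_type2)
    return math.ceil((z_alpha + z_beta) ** 2)

print(required_n(0.05))   # (1.645 + 0.842)^2 is about 6.2, so n = 7
```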

Example:
Let X ∼Bin(n, θ). Testing:

H0 : θ ≥ 3/4 vs H1 : θ < 3/4.

The LRT test for this problem is: (Exercise 8.3)

reject H0 if X ≤ c.

Choose c and n such that the following are satisfied simultaneously:


(i) If θ = 3/4, we have P r(reject H0 |θ) = 0.01; (control Type I error)
(ii) If θ = 1/2, we have P r(reject H0 |θ) = 0.99. (control Type II error)
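Since X is discrete, the two probabilities cannot usually be matched exactly, so in practice one searches for the smallest n (and a cutoff c) with P (X ≤ c | θ = 3/4) ≤ 0.01 and P (X ≤ c | θ = 1/2) ≥ 0.99. A brute-force sketch (SciPy assumed; not part of the notes):

```python
# Brute-force search over n and c (SciPy assumed).
from scipy.stats import binom

def find_n_and_c(max_n=200):
    """Smallest n (and a cutoff c) meeting both error requirements."""
    for n in range(1, max_n + 1):
        for c in range(n + 1):
            type1 = binom.cdf(c, n, 0.75)   # P(reject H0) when theta = 3/4
            power = binom.cdf(c, n, 0.50)   # P(reject H0) when theta = 1/2
            if type1 <= 0.01 and power >= 0.99:
                return n, c, type1, power
    return None

print(find_n_and_c())
```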

8 Uniformly Most Powerful (UMP) Tests

A good class of hypothesis tests consists of those with a small probability (say, at most
α) of Type I error; a desirable test in such a class would also have a small Type II error
probability, or equivalently, a large power function for θ ∈ Θc0 .

Let C be a class of tests for H0 : θ ∈ Θ0 vs H1 : θ ∈ Θc0 .

A test in class C with power function β(θ) is uniformly most powerful (UMP) in


class C if
β(θ) ≥ β ′ (θ), ∀θ ∈ Θc0 ,
for every β ′ (θ) that is the power function of another test in C.

Consider the class
C = {all the level α tests}.
The UMP test in this class is called a UMP level α test. It is the best test
in the class C, or the most powerful level α test.

Interpretation: The power function β(θ) of the UMP level α test satisfies
β(θ) ≥ β ′ (θ) for θ ∈ Θc0 ,
where maxθ∈Θ0 β(θ) ≤ α and maxθ∈Θ0 β ′ (θ) ≤ α.

Test function:
For each testing procedure, define a test function on the sample space

φ(x) = 1 if x ∈ R, and φ(x) = 0 if x ∉ R.
Note φ(x) = I(x ∈ R). The expected value of φ is the power function:
Eθ [φ(X)] = Pθ (X ∈ R) = β(θ).

When do UMP tests exist and how do we find them?

For simple hypotheses,

H0 : θ = θ0 vs H1 : θ = θ1 ,

the UMP level α test always exists.

Neyman-Pearson Theorem:
Consider testing H0 : θ = θ0 versus H1 : θ = θ1 , where the pdf or pmf
corresponding to θi is f (x|θi ), i = 0, 1, using a test with rejection region R
that satisfies

x ∈ R if f (x|θ1 ) > kf (x|θ0 ),


x ∈ Rc if f (x|θ1 ) < kf (x|θ0 ), (1)

for some k ≥ 0, and


α = Pθ0 (X ∈ R). (2)
Then
a) (Sufficiency) Any test that satisfies (1) and (2) is a UMP level α test.
b) (Necessity) If there is a test satisfying (1) and (2) with k > 0, then
i) every UMP level α test is a size α test;
ii) every UMP level α test satisfies (1) except on a set A satisfying
Pθ0 (X ∈ A) = Pθ1 (X ∈ A) = 0.

Proof: (see textbook)

Example: (UMP binomial test)
Let X ∼ Bin(2, θ). We want to test H0 : θ = 1/2 versus H1 : θ = 3/4.
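A small numerical sketch of the Neyman-Pearson construction for this example (SciPy assumed; not part of the notes): the NP test rejects for large values of f (x|3/4)/f (x|1/2), and the three possible ratios can be listed directly.

```python
# Likelihood ratios for the Neyman-Pearson test (SciPy assumed).
from scipy.stats import binom

for x in range(3):
    ratio = binom.pmf(x, 2, 0.75) / binom.pmf(x, 2, 0.5)
    print(x, ratio)    # ratios 1/4, 3/4, 9/4 for x = 0, 1, 2

# Rejecting only when X = 2 (the largest ratio) gives size P(X = 2 | theta = 1/2) = 1/4;
# also rejecting when X = 1 raises the size to 3/4.  Without randomization these are the
# only attainable sizes.
```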

Sufficient statistic and UMP test
Consider H0 : θ = θ0 vs H1 : θ = θ1 . Suppose T (X) is sufficient for θ
and g(t|θi ) is the pdf or pmf of T corresponding to θi , i = 0, 1. Then any
test based on T with rejection region S is a UMP level α test if it satisfies

t ∈ S if g(t|θ1 ) > kg(t|θ0 ),


t ∈ Sc if g(t|θ1 ) < kg(t|θ0 ), (3)

for some k ≥ 0, where


α = Pθ0 (T ∈ S). (4)

Proof: (see textbook)

Example: (UMP normal test for mean)


Let X1 , . . . , Xn be iid from N (θ, σ 2 ) with σ 2 known. Consider testing
H0 : θ = θ0 versus H1 : θ = θ1 , where θ1 > θ0 . Find the UMP level α test.
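The derivation (Neyman-Pearson applied via the sufficient statistic X̄) leads to rejecting H0 when X̄ > θ0 + zα σ/√n. As a numerical illustration of what “most powerful” means (SciPy assumed; the comparison test using half the sample is an invented example, not from the notes), both tests below have level α, but the full-sample z-test has higher power at θ1 :

```python
# Comparing two level-alpha tests at theta1 (SciPy assumed; the "half sample" test is
# an invented comparison, not from the notes).
import numpy as np
from scipy.stats import norm

alpha, theta0, theta1, sigma, n = 0.05, 0.0, 0.5, 1.0, 25
z = norm.ppf(1 - alpha)

# Test A (the NP/UMP test): reject when xbar of all n observations > theta0 + z*sigma/sqrt(n)
power_A = 1 - norm.cdf(z - (theta1 - theta0) * np.sqrt(n) / sigma)

# Test B: same rule applied to only the first n//2 observations (still a level-alpha test)
m = n // 2
power_B = 1 - norm.cdf(z - (theta1 - theta0) * np.sqrt(m) / sigma)

print(power_A, power_B)    # power_A > power_B, as the UMP property requires
```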

Example: (UMP normal test for variance)
Let X1 , . . . , Xn be iid from N (0, σ 2 ) with σ 2 unknown. Consider testing

H0 : σ 2 = σ02 versus H1 : σ 2 = σ12 ,

where σ12 > σ02 . Find the UMP level α test.

9 Monotone Likelihood Ratio (MLR)
Question: When does the UMP test exist for one-sided composite hypothe-
ses? – Often when pdfs or pmfs have the monotone likelihood ratio property.
A family of pdfs or pmfs {g(t|θ) : θ ∈ Θ} for a univariate random variable
T with real-valued parameter θ has a monotone likelihood ratio (MLR) if
g(t|θ2 )/g(t|θ1 ) is an increasing function of t
for every θ2 > θ1 , on {t : g(t|θ1 ) > 0 or g(t|θ2 ) > 0}.

Examples:
• Normal, Poisson, Binomial all have the MLR property. (Exercise 8.25)

• If T is from an exponential family with the density


f (t|θ) = h(t)c(θ) exp{w(θ)t},
then T has an MLR if w(θ) is a nondecreasing function in θ.

• If X1 , . . . , Xn iid from N (µ, σ 2 ) with µ known, then T = (X1 − µ)2 + · · · + (Xn − µ)2
has an MLR.

Note: Monotone decreasing is similarly defined.
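As a quick numerical illustration of the Binomial claim (Exercise 8.25), one can check the ratio directly; SciPy and the chosen parameter values are assumptions of this sketch, not part of the notes:

```python
# Numerical check of the Binomial MLR property (SciPy assumed).
import numpy as np
from scipy.stats import binom

n, th1, th2 = 10, 0.3, 0.6                      # any pair with th2 > th1
t = np.arange(n + 1)
ratio = binom.pmf(t, n, th2) / binom.pmf(t, n, th1)
print(np.all(np.diff(ratio) > 0))               # True: the ratio increases in t
```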

Karlin-Rubin Theorem
Suppose T (X) is a sufficient statistic for θ and the family {g(t|θ) : θ ∈ Θ}
is an MLR family. Then

(1) For testing


H0 : θ ≤ θ0 vs H1 : θ > θ0 ,
the UMP level α test is given by

reject H0 if and only if T > t0 ,

where
α = Pθ0 (T > t0 ).

(2) For testing


H0 : θ ≥ θ0 vs H1 : θ < θ0 ,
the UMP level α test is given by

reject H0 if and only if T < t0 ,

where
α = Pθ0 (T < t0 ).

Proof: (see textbook)

Example: Let X1 , . . . , Xn be iid from N (θ, σ 2 ), σ 2 known.
(i) Find the UMP level α test for testing H0 : θ ≤ θ0 vs H1 : θ > θ0 .

(ii) Find the UMP level α test for testing H0 : θ ≥ θ0 vs H1 : θ < θ0 .

Example: Let X1 , . . . , Xn be iid from N (µ0 , σ 2 ), σ 2 unknown, µ0 known.
Find the UMP level α test for testing

H0 : σ 2 ≤ σ02 vs H1 : σ 2 > σ02 .
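By Karlin-Rubin with T = Σ(Xi − µ0 )², and since T /σ02 ∼ χ2n when σ 2 = σ02 , the level α cutoff for T is σ02 χ2n,α . A minimal sketch of the resulting test (SciPy assumed; the data are illustrative, not from the notes):

```python
# Sketch of the UMP variance test (SciPy assumed; the data are illustrative).
import numpy as np
from scipy.stats import chi2

def ump_variance_test(x, mu0, sigma0_sq, alpha):
    """Reject H0: sigma^2 <= sigma0^2 when T = sum (x_i - mu0)^2 > sigma0^2 * chi2_{n,alpha}."""
    T = np.sum((x - mu0) ** 2)
    cutoff = sigma0_sq * chi2.ppf(1 - alpha, df=len(x))
    return T > cutoff, T, cutoff

x = np.array([0.2, -1.3, 0.8, 2.1, -0.5])       # hypothetical data
print(ump_variance_test(x, mu0=0.0, sigma0_sq=1.0, alpha=0.05))
```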

Remark: Nonexistence of UMP test


• For many problems, there is no UMP level α test, because the class of
level α tests is so large that no single test dominates all the others in terms of
power. See Example 8.3.19 (textbook).
• Similar to UMVUE, we then search for a most powerful test within some subset of the
class of level α tests, for example, the subset of all unbiased tests.

10 p-value

One method of reporting the results of a hypothesis test is to report the size, α, of


the test used and the decision to reject H0 or accept H0 .
- If α is small, the decision to reject H0 is fairly convincing.
- If α is large, the decision to reject H0 is not very convincing because
the test has a large probability of incorrectly making that decision.

Two issues with this procedure:


- The choice of α is subjective. Different people may have different
tolerance levels α.
- The final answer does not show the strength of the decision (Is it a
strong rejection or a weak rejection? A strong acceptance or a weak acceptance?).

A p-value is the smallest possible level α̂ at which H0 would be rejected.


- A p-value is a test statistic, taking values 0 ≤ p(x) ≤ 1 for each sample x.
- Small values of p(X) give evidence that H1 is true.
- The smaller the p-value, the stronger the evidence for rejecting H0 .
- A p-value is valid if, for every θ ∈ Θ0 and every 0 ≤ α ≤ 1,
Pθ (p(X) ≤ α) ≤ α.

Compute p-value
Theorem: Let W (X) be a test statistic such that large values of W give
evidence that H1 is true. For each sample point x, define
p(x) = supθ∈Θ0 Pθ (W (X) ≥ W (x)).
Then p(X) is a valid p-value.
Proof: (see textbook)

p-value testing procedure:


(i) Compute the p-value based on the data x1 , ..., xn .
(ii) If the p-value is less than α, reject H0 at level α; otherwise accept H0 .

Example: (two-sided normal p-value)
Let X1 , . . . , Xn be iid from N (θ, σ 2 ), σ 2 unknown. Consider testing H0 :
θ = θ0 versus H1 : θ ≠ θ0 , using the LRT statistic W (X) = |X̄ − θ0 |/(S/√n).
(i) Compute the p-value of the LRT statistic W (X).

(ii) Assume n = 16 and we observed x̄ = 1.5, s2 = 1. Assume θ0 = 1.


Calculate the p-value. Do you reject the null hypothesis at level 0.05? at
level 0.1?
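A sketch of the computation in (ii) (SciPy assumed; not part of the notes): the two-sided p-value is 2 P (Tn−1 ≥ W (x)) with W (x) = |x̄ − θ0 |/(s/√n).

```python
# p-value for part (ii) (SciPy assumed).
import numpy as np
from scipy.stats import t

n, xbar, s2, theta0 = 16, 1.5, 1.0, 1.0
W = abs(xbar - theta0) / (np.sqrt(s2) / np.sqrt(n))   # = 2.0
p_value = 2 * t.sf(W, df=n - 1)                       # 2 * P(T_15 >= 2.0), about 0.064
print(p_value, p_value < 0.05, p_value < 0.1)         # reject at level 0.1, not at 0.05
```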

Example: (one-sided normal p-value)


In the above example, consider testing H0 : θ ≤ θ0 versus H1 : θ > θ0 .
