
18.650 – Fundamentals of Statistics

4. Hypothesis testing
Goals
We have seen the basic notions of hypothesis testing:
▶ Hypotheses H0/H1,
▶ Type 1/Type 2 errors, level and power,
▶ Test statistics and rejection regions,
▶ p-values.

Our tests were based on the CLT (and sometimes Slutsky)...
▶ What if the data is Gaussian, σ² is unknown and Slutsky does not apply?
▶ Can we use the asymptotic normality of the MLE?
▶ Tests about multivariate parameters θ = (θ1, ..., θd) (e.g.: θ1 = θ2)?
▶ More complex tests: "Does my data follow a Gaussian distribution?"
Parametric hypothesis testing

Clinical trials

Let us go through an example to review the main notions of hypothesis testing.
▶ Pharmaceutical companies use hypothesis testing to test whether a new drug is effective.
▶ To do so, they administer the drug to a group of patients (test group) and a placebo to another group (control group).
▶ We consider testing a drug that is supposed to lower LDL (low-density lipoprotein), a.k.a. "bad cholesterol", among patients with a high level of LDL (above 200 mg/dL).
Notation and modelling

▶ Let Δd > 0 denote the expected decrease of LDL level (in mg/dL) for a patient that has used the drug.
▶ Let Δc > 0 denote the expected decrease of LDL level (in mg/dL) for a patient that has used the placebo.
▶ We want to know if Δd > Δc.
▶ We observe two independent samples:
  ▶ X1, ..., Xn iid ~ N(Δd, σd²) from the test group and
  ▶ Y1, ..., Ym iid ~ N(Δc, σc²) from the control group.
Hypothesis testing

▶ Hypotheses:

  H0: Δd = Δc vs. H1: Δd > Δc

▶ Since the data is Gaussian by assumption, we don't need the CLT.
▶ We have

  X̄n ~ N(Δd, σd²/n) and Ȳm ~ N(Δc, σc²/m)

▶ Therefore

\[ \frac{\bar X_n - \bar Y_m - (\Delta_d - \Delta_c)}{\sqrt{\sigma_d^2/n + \sigma_c^2/m}} \sim \mathcal N(0, 1) \]
Asymptotic test
▶ Assume that m = cn for some constant c > 0 and n → ∞.
▶ Using Slutsky's lemma, we also have

\[ \frac{\bar X_n - \bar Y_m - (\Delta_d - \Delta_c)}{\sqrt{\hat\sigma_d^2/n + \hat\sigma_c^2/m}} \xrightarrow[n\to\infty]{(d)} \mathcal N(0, 1) \]

where

\[ \hat\sigma_d^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar X_n)^2 \quad\text{and}\quad \hat\sigma_c^2 = \frac{1}{m}\sum_{i=1}^m (Y_i - \bar Y_m)^2 \]

▶ We get the following test at asymptotic level α:

\[ R_\alpha = \left\{ \frac{\bar X_n - \bar Y_m}{\sqrt{\hat\sigma_d^2/n + \hat\sigma_c^2/m}} > q_\alpha \right\} \]

▶ This is a one-sided, two-sample test.
Asymptotic test

▶ Example: n = 70, m = 50, X̄n = 156.4, Ȳm = 132.7, σ̂d² = 5198.4, σ̂c² = 3867.0,

\[ \frac{156.4 - 132.7}{\sqrt{5198.4/70 + 3867.0/50}} = 1.57 \]

Since q5% = 1.645, we fail to reject H0 at asymptotic level 5%.
▶ We can also compute the p-value:

  p-value = IP(Z > 1.57) = 0.0582, where Z ~ N(0, 1).
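As a sanity check, here is a minimal Python sketch (assuming numpy/scipy; variable names are my own choosing) that reproduces the threshold and p-value from the observed value 1.57 of the test statistic:

```python
from scipy.stats import norm

t_obs = 1.57                 # value of the test statistic from the slide
alpha = 0.05

q = norm.ppf(1 - alpha)      # (1 - alpha)-quantile of N(0,1): 1.645
reject = t_obs > q           # one-sided rejection rule: False here
p_value = norm.sf(t_obs)     # P(Z > 1.57) = 0.0582

print(f"q5% = {q:.3f}, reject H0: {reject}, p-value = {p_value:.4f}")
```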
Small sample size

▶ What if n = 20, m = 12?
▶ We cannot realistically apply Slutsky's lemma.
▶ We needed it to find the (asymptotic) distribution of quantities of the form

\[ \frac{\bar X_n - \mu}{\sqrt{\hat\sigma^2/n}} \]

when X1, ..., Xn iid ~ N(μ, σ²).
▶ It turns out that this distribution does not depend on μ or σ², so we can compute its quantiles.
The χ² distribution

Definition
For a positive integer d, the χ² (pronounced "Kai-squared") distribution with d degrees of freedom is the law of the random variable Z1² + Z2² + ... + Zd², where Z1, ..., Zd iid ~ N(0, 1).

Examples:
▶ If Z ~ Nd(0, Id), then ‖Z‖₂² ~ χ²_d.
▶ χ²₂ = Exp(1/2).
[Figure: densities of the χ² distribution for df = 1, 2, 3, 4, 5, 10, 20; x-axis from 0 to 25, y-axis from 0.0 to 1.2.]
Properties of the χ² distribution (2)

Definition
For a positive integer d, the χ² (pronounced "Kai-squared") distribution with d degrees of freedom is the law of the random variable Z1² + Z2² + ... + Zd², where Z1, ..., Zd iid ~ N(0, 1).

Properties: If V ~ χ²_k, then
▶ IE[V] = k,
▶ var[V] = 2k.
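These two moments are easy to verify numerically; a quick sketch assuming numpy/scipy:

```python
import numpy as np
from scipy.stats import chi2

k = 5
print(chi2.stats(k, moments="mv"))        # exact mean and variance: (5.0, 10.0)

# Monte Carlo check straight from the definition: sum of k squared N(0,1)'s
rng = np.random.default_rng(0)
V = (rng.standard_normal((100_000, k)) ** 2).sum(axis=1)
print(V.mean(), V.var())                  # approximately 5 and 10
```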
Important example: the sample variance

▶ Recall that the sample variance is given by

\[ S_n = \frac{1}{n}\sum_{i=1}^n (X_i - \bar X_n)^2 = \frac{1}{n}\sum_{i=1}^n X_i^2 - (\bar X_n)^2 \]

▶ Cochran's theorem states that for X1, ..., Xn iid ~ N(μ, σ²), if Sn is the sample variance, then
  ▶ X̄n ⊥⊥ Sn;
  ▶ nSn/σ² ~ χ²_{n−1}.
▶ We often prefer the unbiased estimator of σ²:

\[ \tilde S_n = \frac{n}{n-1} S_n = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X_n)^2 \]
Student’s T distribution

Definition
For a positive integer d, the Student's T distribution with d degrees of freedom (denoted by t_d) is the law of the random variable Z/√(V/d), where Z ~ N(0, 1), V ~ χ²_d and Z ⊥⊥ V (Z is independent of V).
Who was Student?

This distribution was introduced by William Sealy Gosset (1876–1937) in 1908 while he worked for the Guinness brewery in Dublin, Ireland.
Student’s T test (one sample, two-sided)
▶ Let X1, ..., Xn iid ~ N(μ, σ²), where both μ and σ² are unknown.
▶ We want to test:

  H0: μ = 0 vs. H1: μ ≠ 0

▶ Test statistic:

\[ T_n = \sqrt{n}\,\frac{\bar X_n}{\sqrt{\tilde S_n}} \]

▶ Since √n X̄n/σ ~ N(0, 1) (under H0) and (n−1)S̃n/σ² ~ χ²_{n−1} are independent by Cochran's theorem, we have:

  Tn ~ t_{n−1}

▶ Student's test with (non-asymptotic) level α ∈ (0, 1):

  ψα = 1{|Tn| > q_{α/2}},

where q_{α/2} is the (1 − α/2)-quantile of t_{n−1}.
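A minimal sketch of this test on simulated data, assuming numpy/scipy (scipy.stats.ttest_1samp implements the same computation):

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(1)
x = rng.normal(loc=0.8, scale=2.0, size=15)    # small Gaussian sample
n = x.size

s_tilde = x.var(ddof=1)                        # unbiased variance estimator
Tn = np.sqrt(n) * x.mean() / np.sqrt(s_tilde)  # test statistic

alpha = 0.05
q = t.ppf(1 - alpha / 2, df=n - 1)             # (1 - alpha/2)-quantile of t_{n-1}
p_value = 2 * t.sf(abs(Tn), df=n - 1)          # two-sided p-value

print(f"Tn = {Tn:.3f}, reject H0: {abs(Tn) > q}, p-value = {p_value:.4f}")
```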
Student’s T test (one sample, one-sided)

▶ We want to test:

  H0: μ ≤ μ0 vs. H1: μ > μ0

▶ Test statistic:

\[ T_n = \sqrt{n}\,\frac{\bar X_n - \mu_0}{\sqrt{\tilde S_n}} \sim t_{n-1} \]

under H0 (at μ = μ0).
▶ Student's test with (non-asymptotic) level α ∈ (0, 1):

  ψα = 1{Tn > qα},

where qα is the (1 − α)-quantile of t_{n−1}.
Two-sample T-test
▶ Back to our cholesterol example. What happens for small sample sizes?
▶ We want to know the distribution of

\[ \frac{\bar X_n - \bar Y_m - (\Delta_d - \Delta_c)}{\sqrt{\hat\sigma_d^2/n + \hat\sigma_c^2/m}} \]

▶ We have approximately

\[ \frac{\bar X_n - \bar Y_m - (\Delta_d - \Delta_c)}{\sqrt{\hat\sigma_d^2/n + \hat\sigma_c^2/m}} \sim t_N \]

where

\[ N = \frac{\left(\hat\sigma_d^2/n + \hat\sigma_c^2/m\right)^2}{\dfrac{\hat\sigma_d^4}{n^2(n-1)} + \dfrac{\hat\sigma_c^4}{m^2(m-1)}} \;\ge\; \min(n, m) \]

(Welch–Satterthwaite formula)
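A sketch of the resulting two-sample test with the Welch–Satterthwaite degrees of freedom, assuming numpy/scipy; welch_one_sided is a name of my own choosing (scipy.stats.ttest_ind with equal_var=False performs the same computation):

```python
import numpy as np
from scipy.stats import t

def welch_one_sided(x, y, alpha=0.05):
    """Two-sample t-test of H0: means equal vs H1: mean(x) > mean(y)."""
    n, m = x.size, y.size
    vx, vy = x.var(ddof=1) / n, y.var(ddof=1) / m
    T = (x.mean() - y.mean()) / np.sqrt(vx + vy)
    # Welch-Satterthwaite approximation to the degrees of freedom
    N = (vx + vy) ** 2 / (vx ** 2 / (n - 1) + vy ** 2 / (m - 1))
    p_value = t.sf(T, df=N)
    return T, N, p_value, T > t.ppf(1 - alpha, df=N)

rng = np.random.default_rng(2)
x = rng.normal(10.0, 5.0, size=20)   # test group (small sample)
y = rng.normal(7.0, 4.0, size=12)    # control group
print(welch_one_sided(x, y))
```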
Non-asymptotic test
▶ Example: n = 70, m = 50, X̄n = 156.4, Ȳm = 132.7, σ̂d² = 5198.4, σ̂c² = 3867.0,

\[ \frac{156.4 - 132.7}{\sqrt{5198.4/70 + 3867.0/50}} = 1.57 \]

▶ Using the shorthand formula N = min(n, m) = 50, we get q5% = 1.68 and

  p-value = IP(t50 > 1.57) = 0.0614

▶ Using the W–S formula,

\[ N = \frac{\left(5198.4/70 + 3867.0/50\right)^2}{\dfrac{5198.4^2}{70^2(70-1)} + \dfrac{3867.0^2}{50^2(50-1)}} = 113.78, \]

which we round to 113.
▶ We get

  p-value = IP(t113 > 1.57) = 0.0596
Discussion

Advantage of Student's test: it is non-asymptotic and can be run on small samples.

Drawback of Student's test: it relies on the assumption that the sample is Gaussian (soon we will see how to test this assumption).
A test based on the MLE

▶ Consider an i.i.d. sample X1, ..., Xn with statistical model (E, (IPθ)θ∈Θ), where Θ ⊆ IR^d (d ≥ 1), and let θ0 ∈ Θ be fixed and given.
▶ Consider the following hypotheses:

  H0: θ = θ0 vs. H1: θ ≠ θ0.

▶ Let θ̂^MLE be the MLE. Assume the MLE technical conditions are satisfied.
▶ If H0 is true, then

\[ \sqrt{n}\, I\big(\hat\theta_n^{MLE}\big)^{1/2} \left( \hat\theta_n^{MLE} - \theta_0 \right) \xrightarrow[n\to\infty]{(d)} \mathcal N_d(0, I_d) \]
Wald’s test

▶ Hence,

\[ \underbrace{n \left( \hat\theta_n^{MLE} - \theta_0 \right)^\top I\big(\hat\theta_n^{MLE}\big) \left( \hat\theta_n^{MLE} - \theta_0 \right)}_{T_n} \xrightarrow[n\to\infty]{(d)} \chi_d^2 \]

▶ Wald's test with asymptotic level α ∈ (0, 1):

  ψ = 1{Tn > qα},

where qα is the (1 − α)-quantile of χ²_d (see tables).
▶ Remark: Wald's test is also valid if H1 has the form "θ > θ0" or "θ < θ0" or "θ = θ1"...
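A toy sketch of Wald's test in the Bernoulli model, assuming numpy/scipy; for Ber(θ), the MLE is the sample mean and the Fisher information is I(θ) = 1/(θ(1 − θ)):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
x = rng.binomial(1, 0.55, size=200)     # data from Ber(theta), theta unknown
theta0 = 0.5                            # H0: theta = theta0

theta_hat = x.mean()                    # MLE in the Bernoulli model (d = 1)
fisher = 1.0 / (theta_hat * (1.0 - theta_hat))   # I(theta_hat)

Tn = x.size * (theta_hat - theta0) ** 2 * fisher
q = chi2.ppf(0.95, df=1)                # (1 - alpha)-quantile of chi2_d
print(f"Tn = {Tn:.3f}, reject H0: {Tn > q}, p-value = {chi2.sf(Tn, df=1):.4f}")
```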
A test based on the log-likelihood
▶ Consider an i.i.d. sample X1, ..., Xn with statistical model (E, (IPθ)θ∈Θ), where Θ ⊆ IR^d (d ≥ 1).
▶ Suppose the null hypothesis has the form

\[ H_0: (\theta_{r+1}, \dots, \theta_d) = \big(\theta_{r+1}^{(0)}, \dots, \theta_d^{(0)}\big), \]

for some fixed and given numbers θ^(0)_{r+1}, ..., θ^(0)_d.
▶ Let

  θ̂n = argmax_{θ∈Θ} ℓn(θ)  (MLE)

and

  θ̂nᶜ = argmax_{θ∈Θ0} ℓn(θ)  ("constrained MLE")

where Θ0 = {θ ∈ Θ : (θ_{r+1}, ..., θ_d) = (θ^(0)_{r+1}, ..., θ^(0)_d)}.
Likelihood ratio test

Test statistic:

\[ T_n = 2\left( \ell_n(\hat\theta_n) - \ell_n(\hat\theta_n^c) \right). \]

Wilks' Theorem
Assume H0 is true and the MLE technical conditions are satisfied. Then,

\[ T_n \xrightarrow[n\to\infty]{(d)} \chi_{d-r}^2 \]

Likelihood ratio test with asymptotic level α ∈ (0, 1):

  ψ = 1{Tn > qα},

where qα is the (1 − α)-quantile of χ²_{d−r} (see tables).
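A sketch of the likelihood ratio test in a Poisson model with H0: λ = λ0 (so d = 1, r = 0 and the constrained MLE is just λ0), assuming numpy/scipy:

```python
import numpy as np
from scipy.stats import chi2, poisson

rng = np.random.default_rng(4)
x = rng.poisson(lam=2.3, size=100)    # data from Poisson(lambda), lambda unknown
lam0 = 2.0                            # H0: lambda = lam0

def loglik(lam):
    return poisson.logpmf(x, lam).sum()

lam_hat = x.mean()                    # unconstrained MLE of lambda
Tn = 2 * (loglik(lam_hat) - loglik(lam0))

q = chi2.ppf(0.95, df=1)              # quantile of chi2_{d-r}, here d - r = 1
print(f"Tn = {Tn:.3f}, reject H0: {Tn > q}, p-value = {chi2.sf(Tn, df=1):.4f}")
```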
Implicit hypotheses

▶ Let X1, ..., Xn be i.i.d. random variables and let θ ∈ IR^d be a parameter associated with the distribution of X1 (e.g. a moment, the parameter of a statistical model, etc.)
▶ Let g: IR^d → IR^k be continuously differentiable (with k < d).
▶ Consider the following hypotheses:

  H0: g(θ) = 0 vs. H1: g(θ) ≠ 0.

▶ E.g. g(θ) = (θ1, θ2) (k = 2), or g(θ) = θ1 − θ2 (k = 1), or...
Delta method

▶ Suppose an asymptotically normal estimator θ̂n is available:

\[ \sqrt{n}\left( \hat\theta_n - \theta \right) \xrightarrow[n\to\infty]{(d)} \mathcal N_d(0, \Sigma(\theta)). \]

▶ Delta method:

\[ \sqrt{n}\left( g(\hat\theta_n) - g(\theta) \right) \xrightarrow[n\to\infty]{(d)} \mathcal N_k(0, \Gamma(\theta)), \]

where Γ(θ) = ∇g(θ)ᵀ Σ(θ) ∇g(θ) ∈ IR^{k×k}.
▶ Assume Σ(θ) is invertible and ∇g(θ) has rank k. Then Γ(θ) is invertible and

\[ \sqrt{n}\, \Gamma(\theta)^{-1/2} \left( g(\hat\theta_n) - g(\theta) \right) \xrightarrow[n\to\infty]{(d)} \mathcal N_k(0, I_k). \]
Wald’s test for implicit hypotheses

▶ Then, by Slutsky's theorem, if Γ(θ) is continuous in θ,

\[ \sqrt{n}\, \Gamma(\hat\theta_n)^{-1/2} \left( g(\hat\theta_n) - g(\theta) \right) \xrightarrow[n\to\infty]{(d)} \mathcal N_k(0, I_k). \]

▶ Hence, if H0 is true, i.e., g(θ) = 0,

\[ \underbrace{n\, g(\hat\theta_n)^\top \Gamma^{-1}(\hat\theta_n)\, g(\hat\theta_n)}_{T_n} \xrightarrow[n\to\infty]{(d)} \chi_k^2. \]

▶ Test with asymptotic level α:

  ψ = 1{Tn > qα},

where qα is the (1 − α)-quantile of χ²_k (see tables).
Goodness of fit

Goodness of fit tests

Let X be a r.v. Given i.i.d. copies of X, we want to answer the following types of questions:
▶ Does X have distribution N(0, 1)? (Cf. Student's T distribution)
▶ Does X have distribution U([0, 1])?
▶ Does X have PMF p1 = 0.3, p2 = 0.5, p3 = 0.2?

These are all goodness of fit (GoF) tests: we want to know if the hypothesized distribution is a good fit for the data.

Key characteristic of GoF tests: no parametric modeling.
The zodiac sign of the most powerful people is....

Can your zodiac sign predict how successful you will be later in life? Fortune magazine collected the signs of 256 heads of the Fortune 500. (Fyi: 256/12 = 21.33.)

Sign         Count
Aries          23
Taurus         20
Gemini         18
Cancer         23
Leo            20
Virgo          19
Libra          18
Scorpio        21
Sagittarius    19
Capricorn      22
Aquarius       24
Pisces         29
The zodiac sign of the most successful people is....

In view of this data, is there statistical evidence that successful people are more likely to be born under some sign than others?
275 jurors with identified racial group. We want to know if the jury is representative of the population of this county.

Race                   White   Black   Hispanic   Other   Total
# jurors                 205      26         25      19     275
Proportion in county    0.72    0.07       0.12    0.09       1
Discrete distribution

Let E = {a1, ..., aK} be a finite space and (IPp)_{p∈ΔK} be the family of all probability distributions on E:

\[ \Delta_K = \left\{ p = (p_1, \dots, p_K) \in (0, 1)^K : \sum_{j=1}^K p_j = 1 \right\}. \]

▶ For p ∈ ΔK and X ~ IPp,

  IPp[X = aj] = pj,  j = 1, ..., K.
Goodness of fit test

▶ Let X1, ..., Xn iid ~ IPp, for some unknown p ∈ ΔK, and let p⁰ ∈ ΔK be fixed.
▶ We want to test:

  H0: p = p⁰ vs. H1: p ≠ p⁰

with asymptotic level α ∈ (0, 1).
▶ Example: If p⁰ = (1/K, 1/K, ..., 1/K), we are testing whether IPp is the uniform distribution on E.
Multinomial likelihood

▶ Likelihood of the model:

\[ L_n(X_1, \dots, X_n, p) = p_1^{N_1} p_2^{N_2} \cdots p_K^{N_K}, \]

where Nj = #{i = 1, ..., n : Xi = aj}.
▶ Let p̂ be the MLE:

  p̂j = Nj/n,  j = 1, ..., K.

  p̂ maximizes log Ln(X1, ..., Xn, p) under the constraint Σ_j pj = 1.
The χ² test

▶ If H0 is true, then √n(p̂ − p⁰) is asymptotically normal, and the following holds.

Theorem

\[ \underbrace{n \sum_{j=1}^K \frac{\big( \hat p_j - p_j^0 \big)^2}{p_j^0}}_{T_n} \xrightarrow[n\to\infty]{(d)} \chi_{K-1}^2. \]

▶ χ² test with asymptotic level α: ψα = 1{Tn > qα}, where qα is the (1 − α)-quantile of χ²_{K−1}.
▶ Asymptotic p-value of this test: p-value = IP[Z > Tn | Tn], where Z ~ χ²_{K−1} and Z ⊥⊥ Tn.
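Applied to the zodiac counts above (H0: p⁰ uniform over the 12 signs), a sketch assuming numpy/scipy; scipy.stats.chisquare computes the same statistic and p-value:

```python
import numpy as np
from scipy.stats import chi2

counts = np.array([23, 20, 18, 23, 20, 19, 18, 21, 19, 22, 24, 29])  # Fortune data
n, K = counts.sum(), counts.size
p_hat = counts / n                 # multinomial MLE
p0 = np.full(K, 1 / K)             # H0: uniform over the 12 signs

Tn = n * np.sum((p_hat - p0) ** 2 / p0)
q = chi2.ppf(0.95, df=K - 1)       # (1 - alpha)-quantile of chi2_{K-1}
p_value = chi2.sf(Tn, df=K - 1)
print(f"Tn = {Tn:.2f}, q = {q:.2f}, reject H0: {Tn > q}, p-value = {p_value:.3f}")
```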
CDF and empirical CDF
Let X1, ..., Xn be i.i.d. real random variables. Recall the cdf of X1 is defined as:

  F(t) = IP[X1 ≤ t], ∀t ∈ IR.

It completely characterizes the distribution of X1.

Definition
The empirical cdf of the sample X1, ..., Xn is defined as:

\[ F_n(t) = \frac{1}{n}\sum_{i=1}^n \mathbb{1}\{X_i \le t\} = \frac{\#\{i = 1, \dots, n : X_i \le t\}}{n}, \quad \forall t \in \mathbb{R}. \]
Consistency

By the LLN, for all t ∈ IR,

\[ F_n(t) \xrightarrow[n\to\infty]{a.s.} F(t). \]

Glivenko–Cantelli Theorem (Fundamental theorem of statistics)

\[ \sup_{t\in\mathbb{R}} |F_n(t) - F(t)| \xrightarrow[n\to\infty]{a.s.} 0. \]
Asymptotic normality

By the CLT, for all t ∈ IR,

\[ \sqrt{n}\,\big(F_n(t) - F(t)\big) \xrightarrow[n\to\infty]{(d)} \mathcal N\big(0, F(t)(1 - F(t))\big). \]

Donsker's Theorem
If F is continuous, then

\[ \sqrt{n}\, \sup_{t\in\mathbb{R}} |F_n(t) - F(t)| \xrightarrow[n\to\infty]{(d)} \sup_{0\le t\le 1} |\mathbb{B}(t)|, \]

where B is a Brownian bridge on [0, 1].
Goodness of fit for continuous distributions

▶ Let X1, ..., Xn be i.i.d. real random variables with unknown cdf F, and let F⁰ be a continuous cdf.
▶ Consider the two hypotheses:

  H0: F = F⁰ vs. H1: F ≠ F⁰.

▶ Let Fn be the empirical cdf of the sample X1, ..., Xn.
▶ If F = F⁰, then Fn(t) ≈ F⁰(t) for all t ∈ IR.
Kolmogorov-Smirnov test

▶ Let

\[ T_n = \sup_{t\in\mathbb{R}} \sqrt{n}\, \big| F_n(t) - F^0(t) \big|. \]

▶ By Donsker's theorem, if H0 is true, then Tn →(d) Z as n → ∞, where Z has a known distribution (the supremum of a Brownian bridge).
▶ KS test with asymptotic level α:

  ψ_α^KS = 1{Tn > qα},

where qα is the (1 − α)-quantile of Z (obtained in tables).
▶ p-value of KS test: IP[Z > Tn | Tn].
Computational issues

▶ In practice, how do we compute Tn?
▶ F⁰ is non-decreasing and Fn is piecewise constant, with jumps at ti = Xi, i = 1, ..., n.
▶ Let X(1) ≤ X(2) ≤ ... ≤ X(n) be the reordered sample.
▶ The expression for Tn reduces to the following practical formula:

\[ T_n = \sqrt{n} \max_{i=1,\dots,n} \max\left\{ \left| \frac{i-1}{n} - F^0(X_{(i)}) \right|, \left| \frac{i}{n} - F^0(X_{(i)}) \right| \right\}. \]
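A sketch of this formula in Python (assuming numpy/scipy), cross-checked against scipy.stats.kstest, which returns the unscaled statistic sup_t |Fn(t) − F⁰(t)|:

```python
import numpy as np
from scipy.stats import norm, kstest

rng = np.random.default_rng(5)
x = np.sort(rng.normal(size=100))   # reordered sample X_(1) <= ... <= X_(n)
n = x.size

# Practical formula for Tn with F0 = cdf of N(0,1)
F0 = norm.cdf(x)
i = np.arange(1, n + 1)
Tn = np.sqrt(n) * np.maximum(np.abs((i - 1) / n - F0), np.abs(i / n - F0)).max()

print(Tn)
print(np.sqrt(n) * kstest(x, norm.cdf).statistic)   # same value
```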
Pivotal distribution

▶ Tn is called a pivotal statistic: if H0 is true, the distribution of Tn does not depend on the distribution of the Xi's, and it is easy to reproduce it in simulations.
▶ Indeed, let Ui = F⁰(Xi), i = 1, ..., n, and let Gn be the empirical cdf of U1, ..., Un.
▶ If H0 is true, then U1, ..., Un iid ~ U([0, 1]) and

\[ T_n = \sup_{0\le x\le 1} \sqrt{n}\, |G_n(x) - x|. \]
Quantiles and p-values

▶ For some large integer M:
  ▶ Simulate M i.i.d. copies Tn¹, ..., Tn^M of Tn;
  ▶ Estimate the (1 − α)-quantile q_α^(n) of Tn by taking the sample (1 − α)-quantile q̂_α^(n,M) of Tn¹, ..., Tn^M.
▶ Test with approximate level α:

  ψα = 1{Tn > q̂_α^(n,M)}.

▶ Approximate p-value of this test:

\[ \text{p-value} \approx \frac{\#\{j = 1, \dots, M : T_n^j > T_n\}}{M}. \]
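A sketch of this simulation recipe, assuming numpy; it exploits the pivotal property, so the copies of Tn are drawn from uniform samples:

```python
import numpy as np

def ks_stat(u):
    """sqrt(n) * sup_x |Gn(x) - x| for a sample u from U([0,1])."""
    u = np.sort(u)
    n = u.size
    i = np.arange(1, n + 1)
    return np.sqrt(n) * np.maximum(np.abs((i - 1) / n - u), np.abs(i / n - u)).max()

rng = np.random.default_rng(6)
n, M, alpha = 20, 10_000, 0.05

sims = np.array([ks_stat(rng.uniform(size=n)) for _ in range(M)])  # M copies of Tn
q_hat = np.quantile(sims, 1 - alpha)       # estimated (1 - alpha)-quantile

t_obs = ks_stat(rng.uniform(size=n))       # observed statistic (here H0 holds)
p_value = (sims > t_obs).mean()            # approximate p-value
print(q_hat, t_obs > q_hat, p_value)
```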
K-S table

Kolmogorov–Smirnov Tables

Critical values d_α(n) of the maximum absolute difference between the sample cdf Fn(x) and the population cdf F(x).

Number of            Level of significance, α
trials, n       0.10      0.05      0.02      0.01
     1        0.95000   0.97500   0.99000   0.99500
     2        0.77639   0.84189   0.90000   0.92929
     3        0.63604   0.70760   0.78456   0.82900
     4        0.56522   0.62394   0.68887   0.73424
     5        0.50945   0.56328   0.62718   0.66853
     6        0.46799   0.51926   0.57741   0.61661
     7        0.43607   0.48342   0.53844   0.57581
     8        0.40962   0.45427   0.50654   0.54179
     9        0.38746   0.43001   0.47960   0.51332
    10        0.36866   0.40925   0.45662   0.48893
Other goodness of fit tests

We want to measure the distance between two functions, Fn(t) and F(t). There are other ways, leading to other tests:
▶ Kolmogorov–Smirnov:

\[ d(F_n, F) = \sup_{t\in\mathbb{R}} |F_n(t) - F(t)| \]

▶ Cramér–von Mises:

\[ d^2(F_n, F) = \int_{\mathbb{R}} [F_n(t) - F(t)]^2\, dt \]

▶ Anderson–Darling:

\[ d^2(F_n, F) = \int_{\mathbb{R}} \frac{[F_n(t) - F(t)]^2}{F(t)(1 - F(t))}\, dt \]
Composite goodness of fit tests

What if I want to test "Does X have a Gaussian distribution?" but I don't know the parameters?

Simple idea: plug in

\[ \sup_{t\in\mathbb{R}} \left| F_n(t) - \Phi_{\hat\mu, \hat\sigma^2}(t) \right| \]

where

  μ̂ = X̄n,  σ̂² = Sn

and Φ_{μ̂,σ̂²}(t) is the cdf of N(μ̂, σ̂²).

In this case, Donsker's theorem is no longer valid. This is a common and serious mistake!
Kolmogorov-Lilliefors test (1)

Instead, we compute the quantiles for the test statistic:

\[ \sup_{t\in\mathbb{R}} \left| F_n(t) - \Phi_{\hat\mu, \hat\sigma^2}(t) \right| \]

They do not depend on unknown parameters!

This is the Kolmogorov–Lilliefors test.
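Under H0 this statistic is again pivotal (standardizing by μ̂ and σ̂ removes the unknown parameters), so its quantiles can be simulated; a sketch assuming numpy/scipy (statsmodels.stats.diagnostic.lilliefors implements this test directly):

```python
import numpy as np
from scipy.stats import norm

def kl_stat(x):
    """sup_t |Fn(t) - Phi_{mu_hat, sigma_hat^2}(t)|, evaluated at the jumps of Fn."""
    x = np.sort(x)
    n = x.size
    z = norm.cdf(x, loc=x.mean(), scale=x.std(ddof=0))   # plug-in Gaussian cdf
    i = np.arange(1, n + 1)
    return np.maximum(np.abs((i - 1) / n - z), np.abs(i / n - z)).max()

rng = np.random.default_rng(7)
n, M = 30, 10_000
# Under H0 the law of the statistic does not depend on (mu, sigma^2),
# so simulating from N(0, 1) is enough.
sims = np.array([kl_stat(rng.standard_normal(n)) for _ in range(M)])
print(np.quantile(sims, 0.95))   # approximate critical value at level 5%
```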
K-L table
[Table: Kolmogorov–Lilliefors critical values of D = max_x |F*(x) − Sn(x)| by sample size and level of significance, obtained by Monte Carlo calculations using 1,000 or more samples for each value of N.]
Quantile-Quantile (QQ) plots (1)
▶ Provide a visual way to perform GoF tests.
▶ Not a formal test, but a quick and easy check of whether a distribution is plausible.
▶ Main idea: we want to check visually if the plot of Fn is close to that of F or, equivalently, if the plot of Fn⁻¹ is close to that of F⁻¹.
▶ More convenient to check if the points

\[ \Big( F^{-1}\big(\tfrac{1}{n}\big), F_n^{-1}\big(\tfrac{1}{n}\big) \Big), \Big( F^{-1}\big(\tfrac{2}{n}\big), F_n^{-1}\big(\tfrac{2}{n}\big) \Big), \dots, \Big( F^{-1}\big(\tfrac{n-1}{n}\big), F_n^{-1}\big(\tfrac{n-1}{n}\big) \Big) \]

are near the line y = x.
▶ Fn is not technically invertible, but we define

  Fn⁻¹(i/n) = X(i),

the i-th order statistic (the i-th smallest observation).
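A minimal sketch of a Gaussian QQ-plot, assuming numpy/scipy/matplotlib (scipy.stats.probplot produces a similar figure):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

rng = np.random.default_rng(8)
x = np.sort(rng.standard_t(df=15, size=100))   # sample to compare with N(0, 1)
n = x.size

# Points (F^{-1}(i/n), Fn^{-1}(i/n)) = (theoretical quantile, X_(i)), i < n
probs = np.arange(1, n) / n
plt.scatter(norm.ppf(probs), x[:-1], s=10)
plt.axline((0, 0), slope=1, color="red")       # reference line y = x
plt.xlabel("quantiles of N(0, 1)")
plt.ylabel("sample quantiles")
plt.show()
```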
Quantile-Quantile (QQ) plots (2)

[Figure: example QQ-plots.]

Quantile-Quantile (QQ) plots (3)

Figure 2: QQ-plots for samples of sizes 10, 50, 100, 1000, 5000, 10000 from a t15 distribution. The upper-left figure is for sample size 10, the lower-right is for sample size 10000.
