
18.650 – Fundamentals of Statistics

4. Hypothesis testing
Goals
We have seen the basic notions of hypothesis testing:
▶ Hypotheses H0/H1,
▶ Type 1/Type 2 errors, level and power,
▶ Test statistics and rejection regions,
▶ p-values.

Our tests were based on the CLT (and sometimes Slutsky)...
▶ What if the data is Gaussian, σ² is unknown and Slutsky does not apply?
▶ Can we use the asymptotic normality of the MLE?
▶ Tests about multivariate parameters θ = (θ1, ..., θd) (e.g.: θ1 = θ2)?
▶ More complex tests: "Does my data follow a Gaussian distribution?"
Parametric hypothesis testing

Clinical trials

Let us go through an example to review the main notions of hypothesis testing.
▶ Pharmaceutical companies use hypothesis testing to test whether a new drug is effective.
▶ To do so, they administer the drug to a group of patients (test group) and a placebo to another group (control group).
▶ We consider testing a drug that is supposed to lower LDL (low-density lipoprotein), a.k.a. "bad cholesterol", among patients with a high level of LDL (above 200 mg/dL).
Notation and modelling

▶ Let Δd > 0 denote the expected decrease of LDL level (in mg/dL) for a patient that has used the drug.
▶ Let Δc > 0 denote the expected decrease of LDL level (in mg/dL) for a patient that has used the placebo.
▶ We want to know if Δd > Δc.
▶ We observe two independent samples:
  ▶ X1, ..., Xn iid ~ N(Δd, σd²) from the test group and
  ▶ Y1, ..., Ym iid ~ N(Δc, σc²) from the control group.
Hypothesis testing

▶ Hypotheses:

  H0: Δd = Δc vs. H1: Δd > Δc

▶ Since the data is Gaussian by assumption, we don't need the CLT.
▶ We have

  X̄n ~ N(Δd, σd²/n) and Ȳm ~ N(Δc, σc²/m)

▶ Therefore

\[ \frac{\bar X_n - \bar Y_m - (\Delta_d - \Delta_c)}{\sqrt{\sigma_d^2/n + \sigma_c^2/m}} \sim \mathcal N(0, 1) \]
Asymptotic test
▶ Assume that m = cn for some constant c > 0 and n → ∞.
▶ Using Slutsky's lemma, we also have

\[ \frac{\bar X_n - \bar Y_m - (\Delta_d - \Delta_c)}{\sqrt{\hat\sigma_d^2/n + \hat\sigma_c^2/m}} \xrightarrow[n\to\infty]{(d)} \mathcal N(0, 1) \]

where

\[ \hat\sigma_d^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar X_n)^2 \quad\text{and}\quad \hat\sigma_c^2 = \frac{1}{m}\sum_{i=1}^m (Y_i - \bar Y_m)^2 \]

▶ We get the following test at asymptotic level α:

\[ R_\alpha = \left\{ \frac{\bar X_n - \bar Y_m}{\sqrt{\hat\sigma_d^2/n + \hat\sigma_c^2/m}} > q_\alpha \right\} \]

▶ This is a one-sided, two-sample test.
Asymptotic test

▶ Example: n = 70, m = 50, X̄n = 156.4, Ȳm = 132.7, σ̂d² = 5198.4, σ̂c² = 3867.0,

\[ \frac{156.4 - 132.7}{\sqrt{5198.4/70 + 3867.0/50}} = 1.57 \]

Since q5% = 1.645, we fail to reject H0 at asymptotic level 5%.
▶ We can also compute the p-value:

  p-value = IP(Z > 1.57) = 0.0582, where Z ~ N(0, 1).
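As a sanity check, here is a minimal Python sketch (assuming numpy/scipy; variable names are my own choosing) that reproduces the threshold and p-value from the observed value 1.57 of the test statistic:

```python
from scipy.stats import norm

t_obs = 1.57                 # value of the test statistic from the slide
alpha = 0.05

q = norm.ppf(1 - alpha)      # (1 - alpha)-quantile of N(0,1): 1.645
reject = t_obs > q           # one-sided rejection rule: False here
p_value = norm.sf(t_obs)     # P(Z > 1.57) = 0.0582

print(f"q5% = {q:.3f}, reject H0: {reject}, p-value = {p_value:.4f}")
```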
Small sample size

▶ What if n = 20, m = 12?
▶ We cannot realistically apply Slutsky's lemma.
▶ We needed it to find the (asymptotic) distribution of quantities of the form

\[ \frac{\bar X_n - \mu}{\sqrt{\hat\sigma^2/n}} \]

when X1, ..., Xn iid ~ N(μ, σ²).
▶ It turns out that this distribution does not depend on μ or σ², so we can compute its quantiles.
The χ² distribution

Definition
For a positive integer d, the χ² (pronounced "Kai-squared") distribution with d degrees of freedom is the law of the random variable Z1² + Z2² + ... + Zd², where Z1, ..., Zd iid ~ N(0, 1).

Examples:
▶ If Z ~ Nd(0, Id), then ‖Z‖₂² ~ χ²_d.
▶ χ²₂ = Exp(1/2).
[Figure: densities of the χ² distribution for df = 1, 2, 3, 4, 5, 10, 20; x-axis from 0 to 25, y-axis from 0.0 to 1.2.]
Properties of the χ² distribution (2)

Definition
For a positive integer d, the χ² (pronounced "Kai-squared") distribution with d degrees of freedom is the law of the random variable Z1² + Z2² + ... + Zd², where Z1, ..., Zd iid ~ N(0, 1).

Properties: If V ~ χ²_k, then
▶ IE[V] = k,
▶ var[V] = 2k.
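These two moments are easy to verify numerically; a quick sketch assuming numpy/scipy:

```python
import numpy as np
from scipy.stats import chi2

k = 5
print(chi2.stats(k, moments="mv"))        # exact mean and variance: (5.0, 10.0)

# Monte Carlo check straight from the definition: sum of k squared N(0,1)'s
rng = np.random.default_rng(0)
V = (rng.standard_normal((100_000, k)) ** 2).sum(axis=1)
print(V.mean(), V.var())                  # approximately 5 and 10
```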
Important example: the sample variance

▶ Recall that the sample variance is given by

\[ S_n = \frac{1}{n}\sum_{i=1}^n (X_i - \bar X_n)^2 = \frac{1}{n}\sum_{i=1}^n X_i^2 - (\bar X_n)^2 \]

▶ Cochran's theorem states that for X1, ..., Xn iid ~ N(μ, σ²), if Sn is the sample variance, then
  ▶ X̄n ⊥⊥ Sn;
  ▶ nSn/σ² ~ χ²_{n−1}.
▶ We often prefer the unbiased estimator of σ²:

\[ \tilde S_n = \frac{n}{n-1} S_n = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X_n)^2 \]
Student’s T distribution

Definition
For a positive integer d, the Student's T distribution with d degrees of freedom (denoted by t_d) is the law of the random variable Z/√(V/d), where Z ~ N(0, 1), V ~ χ²_d and Z ⊥⊥ V (Z is independent of V).
Who was Student?

This distribution was introduced by William Sealy Gosset (1876–1937) in 1908 while he worked for the Guinness brewery in Dublin, Ireland.
Student’s T test (one sample, two-sided)
▶ Let X1, ..., Xn iid ~ N(μ, σ²), where both μ and σ² are unknown.
▶ We want to test:

  H0: μ = 0 vs. H1: μ ≠ 0

▶ Test statistic:

\[ T_n = \sqrt{n}\,\frac{\bar X_n}{\sqrt{\tilde S_n}} \]

▶ Since √n X̄n/σ ~ N(0, 1) (under H0) and (n−1)S̃n/σ² ~ χ²_{n−1} are independent by Cochran's theorem, we have:

  Tn ~ t_{n−1}

▶ Student's test with (non-asymptotic) level α ∈ (0, 1):

  ψα = 1{|Tn| > q_{α/2}},

where q_{α/2} is the (1 − α/2)-quantile of t_{n−1}.
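A minimal sketch of this test on simulated data, assuming numpy/scipy (scipy.stats.ttest_1samp implements the same computation):

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(1)
x = rng.normal(loc=0.8, scale=2.0, size=15)    # small Gaussian sample
n = x.size

s_tilde = x.var(ddof=1)                        # unbiased variance estimator
Tn = np.sqrt(n) * x.mean() / np.sqrt(s_tilde)  # test statistic

alpha = 0.05
q = t.ppf(1 - alpha / 2, df=n - 1)             # (1 - alpha/2)-quantile of t_{n-1}
p_value = 2 * t.sf(abs(Tn), df=n - 1)          # two-sided p-value

print(f"Tn = {Tn:.3f}, reject H0: {abs(Tn) > q}, p-value = {p_value:.4f}")
```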
Student’s T test (one sample, one-sided)

▶ We want to test:

  H0: μ ≤ μ0 vs. H1: μ > μ0

▶ Test statistic:

\[ T_n = \sqrt{n}\,\frac{\bar X_n - \mu_0}{\sqrt{\tilde S_n}} \sim t_{n-1} \]

under H0 (at μ = μ0).
▶ Student's test with (non-asymptotic) level α ∈ (0, 1):

  ψα = 1{Tn > qα},

where qα is the (1 − α)-quantile of t_{n−1}.
Two-sample T-test
▶ Back to our cholesterol example. What happens for small sample sizes?
▶ We want to know the distribution of

\[ \frac{\bar X_n - \bar Y_m - (\Delta_d - \Delta_c)}{\sqrt{\hat\sigma_d^2/n + \hat\sigma_c^2/m}} \]

▶ We have approximately

\[ \frac{\bar X_n - \bar Y_m - (\Delta_d - \Delta_c)}{\sqrt{\hat\sigma_d^2/n + \hat\sigma_c^2/m}} \sim t_N \]

where

\[ N = \frac{\left(\hat\sigma_d^2/n + \hat\sigma_c^2/m\right)^2}{\dfrac{\hat\sigma_d^4}{n^2(n-1)} + \dfrac{\hat\sigma_c^4}{m^2(m-1)}} \;\ge\; \min(n, m) \]

(Welch–Satterthwaite formula)
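A sketch of the resulting two-sample test with the Welch–Satterthwaite degrees of freedom, assuming numpy/scipy; welch_one_sided is a name of my own choosing (scipy.stats.ttest_ind with equal_var=False performs the same computation):

```python
import numpy as np
from scipy.stats import t

def welch_one_sided(x, y, alpha=0.05):
    """Two-sample t-test of H0: means equal vs H1: mean(x) > mean(y)."""
    n, m = x.size, y.size
    vx, vy = x.var(ddof=1) / n, y.var(ddof=1) / m
    T = (x.mean() - y.mean()) / np.sqrt(vx + vy)
    # Welch-Satterthwaite approximation to the degrees of freedom
    N = (vx + vy) ** 2 / (vx ** 2 / (n - 1) + vy ** 2 / (m - 1))
    p_value = t.sf(T, df=N)
    return T, N, p_value, T > t.ppf(1 - alpha, df=N)

rng = np.random.default_rng(2)
x = rng.normal(10.0, 5.0, size=20)   # test group (small sample)
y = rng.normal(7.0, 4.0, size=12)    # control group
print(welch_one_sided(x, y))
```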
Non-asymptotic test
▶ Example: n = 70, m = 50, X̄n = 156.4, Ȳm = 132.7, σ̂d² = 5198.4, σ̂c² = 3867.0,

\[ \frac{156.4 - 132.7}{\sqrt{5198.4/70 + 3867.0/50}} = 1.57 \]

▶ Using the shorthand formula N = min(n, m) = 50, we get q5% = 1.68 and

  p-value = IP(t50 > 1.57) = 0.0614

▶ Using the W–S formula,

\[ N = \frac{\left(5198.4/70 + 3867.0/50\right)^2}{\dfrac{5198.4^2}{70^2(70-1)} + \dfrac{3867.0^2}{50^2(50-1)}} = 113.78, \]

which we round to 113.
▶ We get

  p-value = IP(t113 > 1.57) = 0.0596
Discussion

Advantage of Student's test: it is non-asymptotic and can be run on small samples.

Drawback of Student's test: it relies on the assumption that the sample is Gaussian (soon we will see how to test this assumption).
A test based on the MLE

▶ Consider an i.i.d. sample X1, ..., Xn with statistical model (E, (IPθ)θ∈Θ), where Θ ⊆ IR^d (d ≥ 1), and let θ0 ∈ Θ be fixed and given.
▶ Consider the following hypotheses:

  H0: θ = θ0 vs. H1: θ ≠ θ0.

▶ Let θ̂^MLE be the MLE. Assume the MLE technical conditions are satisfied.
▶ If H0 is true, then

\[ \sqrt{n}\, I\big(\hat\theta_n^{MLE}\big)^{1/2} \left( \hat\theta_n^{MLE} - \theta_0 \right) \xrightarrow[n\to\infty]{(d)} \mathcal N_d(0, I_d) \]
Wald’s test

▶ Hence,

\[ \underbrace{n \left( \hat\theta_n^{MLE} - \theta_0 \right)^\top I\big(\hat\theta_n^{MLE}\big) \left( \hat\theta_n^{MLE} - \theta_0 \right)}_{T_n} \xrightarrow[n\to\infty]{(d)} \chi_d^2 \]

▶ Wald's test with asymptotic level α ∈ (0, 1):

  ψ = 1{Tn > qα},

where qα is the (1 − α)-quantile of χ²_d (see tables).
▶ Remark: Wald's test is also valid if H1 has the form "θ > θ0" or "θ < θ0" or "θ = θ1"...
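A toy sketch of Wald's test in the Bernoulli model, assuming numpy/scipy; for Ber(θ), the MLE is the sample mean and the Fisher information is I(θ) = 1/(θ(1 − θ)):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
x = rng.binomial(1, 0.55, size=200)     # data from Ber(theta), theta unknown
theta0 = 0.5                            # H0: theta = theta0

theta_hat = x.mean()                    # MLE in the Bernoulli model (d = 1)
fisher = 1.0 / (theta_hat * (1.0 - theta_hat))   # I(theta_hat)

Tn = x.size * (theta_hat - theta0) ** 2 * fisher
q = chi2.ppf(0.95, df=1)                # (1 - alpha)-quantile of chi2_d
print(f"Tn = {Tn:.3f}, reject H0: {Tn > q}, p-value = {chi2.sf(Tn, df=1):.4f}")
```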
A test based on the log-likelihood
▶ Consider an i.i.d. sample X1, ..., Xn with statistical model (E, (IPθ)θ∈Θ), where Θ ⊆ IR^d (d ≥ 1).
▶ Suppose the null hypothesis has the form

\[ H_0: (\theta_{r+1}, \dots, \theta_d) = \big(\theta_{r+1}^{(0)}, \dots, \theta_d^{(0)}\big), \]

for some fixed and given numbers θ^(0)_{r+1}, ..., θ^(0)_d.
▶ Let

  θ̂n = argmax_{θ∈Θ} ℓn(θ)  (MLE)

and

  θ̂nᶜ = argmax_{θ∈Θ0} ℓn(θ)  ("constrained MLE")

where Θ0 = {θ ∈ Θ : (θ_{r+1}, ..., θ_d) = (θ^(0)_{r+1}, ..., θ^(0)_d)}.
Likelihood ratio test

Test statistic:

\[ T_n = 2\left( \ell_n(\hat\theta_n) - \ell_n(\hat\theta_n^c) \right). \]

Wilks' Theorem
Assume H0 is true and the MLE technical conditions are satisfied. Then,

\[ T_n \xrightarrow[n\to\infty]{(d)} \chi_{d-r}^2 \]

Likelihood ratio test with asymptotic level α ∈ (0, 1):

  ψ = 1{Tn > qα},

where qα is the (1 − α)-quantile of χ²_{d−r} (see tables).
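A sketch of the likelihood ratio test in a Poisson model with H0: λ = λ0 (so d = 1, r = 0 and the constrained MLE is just λ0), assuming numpy/scipy:

```python
import numpy as np
from scipy.stats import chi2, poisson

rng = np.random.default_rng(4)
x = rng.poisson(lam=2.3, size=100)    # data from Poisson(lambda), lambda unknown
lam0 = 2.0                            # H0: lambda = lam0

def loglik(lam):
    return poisson.logpmf(x, lam).sum()

lam_hat = x.mean()                    # unconstrained MLE of lambda
Tn = 2 * (loglik(lam_hat) - loglik(lam0))

q = chi2.ppf(0.95, df=1)              # quantile of chi2_{d-r}, here d - r = 1
print(f"Tn = {Tn:.3f}, reject H0: {Tn > q}, p-value = {chi2.sf(Tn, df=1):.4f}")
```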
Implicit hypotheses

▶ Let X1, ..., Xn be i.i.d. random variables and let θ ∈ IR^d be a parameter associated with the distribution of X1 (e.g. a moment, the parameter of a statistical model, etc.)
▶ Let g: IR^d → IR^k be continuously differentiable (with k < d).
▶ Consider the following hypotheses:

  H0: g(θ) = 0 vs. H1: g(θ) ≠ 0.

▶ E.g. g(θ) = (θ1, θ2) (k = 2), or g(θ) = θ1 − θ2 (k = 1), or...
Delta method

▶ Suppose an asymptotically normal estimator θ̂n is available:

\[ \sqrt{n}\left( \hat\theta_n - \theta \right) \xrightarrow[n\to\infty]{(d)} \mathcal N_d(0, \Sigma(\theta)). \]

▶ Delta method:

\[ \sqrt{n}\left( g(\hat\theta_n) - g(\theta) \right) \xrightarrow[n\to\infty]{(d)} \mathcal N_k(0, \Gamma(\theta)), \]

where Γ(θ) = ∇g(θ)ᵀ Σ(θ) ∇g(θ) ∈ IR^{k×k}.
▶ Assume Σ(θ) is invertible and ∇g(θ) has rank k. Then Γ(θ) is invertible and

\[ \sqrt{n}\, \Gamma(\theta)^{-1/2} \left( g(\hat\theta_n) - g(\theta) \right) \xrightarrow[n\to\infty]{(d)} \mathcal N_k(0, I_k). \]
Wald’s test for implicit hypotheses

▶ Then, by Slutsky's theorem, if Γ(θ) is continuous in θ,

\[ \sqrt{n}\, \Gamma(\hat\theta_n)^{-1/2} \left( g(\hat\theta_n) - g(\theta) \right) \xrightarrow[n\to\infty]{(d)} \mathcal N_k(0, I_k). \]

▶ Hence, if H0 is true, i.e., g(θ) = 0,

\[ \underbrace{n\, g(\hat\theta_n)^\top \Gamma^{-1}(\hat\theta_n)\, g(\hat\theta_n)}_{T_n} \xrightarrow[n\to\infty]{(d)} \chi_k^2. \]

▶ Test with asymptotic level α:

  ψ = 1{Tn > qα},

where qα is the (1 − α)-quantile of χ²_k (see tables).
Goodness of fit

Goodness of fit tests

Let X be a r.v. Given i.i.d. copies of X, we want to answer the following types of questions:
▶ Does X have distribution N(0, 1)? (Cf. Student's T distribution)
▶ Does X have distribution U([0, 1])?
▶ Does X have PMF p1 = 0.3, p2 = 0.5, p3 = 0.2?

These are all goodness of fit (GoF) tests: we want to know if the hypothesized distribution is a good fit for the data.

Key characteristic of GoF tests: no parametric modeling.
The zodiac sign of the most powerful people is....

Can your zodiac sign predict how successful you will be later in life? Fortune magazine collected the signs of 256 heads of the Fortune 500. (Fyi: 256/12 = 21.33.)

Sign         Count
Aries          23
Taurus         20
Gemini         18
Cancer         23
Leo            20
Virgo          19
Libra          18
Scorpio        21
Sagittarius    19
Capricorn      22
Aquarius       24
Pisces         29
The zodiac sign of the most successful people is....

In view of this data, is there statistical evidence that successful people are more likely to be born under some sign than others?
275 jurors with identified racial group. We want to know if the jury is representative of the population of this county.

Race                   White   Black   Hispanic   Other   Total
# jurors                 205      26         25      19     275
Proportion in county    0.72    0.07       0.12    0.09       1
Discrete distribution

Let E = {a1, ..., aK} be a finite space and (IPp)_{p∈ΔK} be the family of all probability distributions on E:

\[ \Delta_K = \left\{ p = (p_1, \dots, p_K) \in (0, 1)^K : \sum_{j=1}^K p_j = 1 \right\}. \]

▶ For p ∈ ΔK and X ~ IPp,

  IPp[X = aj] = pj,  j = 1, ..., K.
Goodness of fit test

▶ Let X1, ..., Xn iid ~ IPp, for some unknown p ∈ ΔK, and let p⁰ ∈ ΔK be fixed.
▶ We want to test:

  H0: p = p⁰ vs. H1: p ≠ p⁰

with asymptotic level α ∈ (0, 1).
▶ Example: If p⁰ = (1/K, 1/K, ..., 1/K), we are testing whether IPp is the uniform distribution on E.
Multinomial likelihood

▶ Likelihood of the model:

\[ L_n(X_1, \dots, X_n, p) = p_1^{N_1} p_2^{N_2} \cdots p_K^{N_K}, \]

where Nj = #{i = 1, ..., n : Xi = aj}.
▶ Let p̂ be the MLE:

  p̂j = Nj/n,  j = 1, ..., K.

  p̂ maximizes log Ln(X1, ..., Xn, p) under the constraint Σ_j pj = 1.
The χ² test

▶ If H0 is true, then √n(p̂ − p⁰) is asymptotically normal, and the following holds.

Theorem

\[ \underbrace{n \sum_{j=1}^K \frac{\big( \hat p_j - p_j^0 \big)^2}{p_j^0}}_{T_n} \xrightarrow[n\to\infty]{(d)} \chi_{K-1}^2. \]

▶ χ² test with asymptotic level α: ψα = 1{Tn > qα}, where qα is the (1 − α)-quantile of χ²_{K−1}.
▶ Asymptotic p-value of this test: p-value = IP[Z > Tn | Tn], where Z ~ χ²_{K−1} and Z ⊥⊥ Tn.
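Applied to the zodiac counts above (H0: p⁰ uniform over the 12 signs), a sketch assuming numpy/scipy; scipy.stats.chisquare computes the same statistic and p-value:

```python
import numpy as np
from scipy.stats import chi2

counts = np.array([23, 20, 18, 23, 20, 19, 18, 21, 19, 22, 24, 29])  # Fortune data
n, K = counts.sum(), counts.size
p_hat = counts / n                 # multinomial MLE
p0 = np.full(K, 1 / K)             # H0: uniform over the 12 signs

Tn = n * np.sum((p_hat - p0) ** 2 / p0)
q = chi2.ppf(0.95, df=K - 1)       # (1 - alpha)-quantile of chi2_{K-1}
p_value = chi2.sf(Tn, df=K - 1)
print(f"Tn = {Tn:.2f}, q = {q:.2f}, reject H0: {Tn > q}, p-value = {p_value:.3f}")
```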
CDF and empirical CDF
Let X1, ..., Xn be i.i.d. real random variables. Recall the cdf of X1 is defined as:

  F(t) = IP[X1 ≤ t], ∀t ∈ IR.

It completely characterizes the distribution of X1.

Definition
The empirical cdf of the sample X1, ..., Xn is defined as:

\[ F_n(t) = \frac{1}{n}\sum_{i=1}^n \mathbb{1}\{X_i \le t\} = \frac{\#\{i = 1, \dots, n : X_i \le t\}}{n}, \quad \forall t \in \mathbb{R}. \]
Consistency

By the LLN, for all t ∈ IR,

\[ F_n(t) \xrightarrow[n\to\infty]{a.s.} F(t). \]

Glivenko–Cantelli Theorem (Fundamental theorem of statistics)

\[ \sup_{t\in\mathbb{R}} |F_n(t) - F(t)| \xrightarrow[n\to\infty]{a.s.} 0. \]
Asymptotic normality

By the CLT, for all t ∈ IR,

\[ \sqrt{n}\,\big(F_n(t) - F(t)\big) \xrightarrow[n\to\infty]{(d)} \mathcal N\big(0, F(t)(1 - F(t))\big). \]

Donsker's Theorem
If F is continuous, then

\[ \sqrt{n}\, \sup_{t\in\mathbb{R}} |F_n(t) - F(t)| \xrightarrow[n\to\infty]{(d)} \sup_{0\le t\le 1} |\mathbb{B}(t)|, \]

where B is a Brownian bridge on [0, 1].
Goodness of fit for continuous distributions

▶ Let X1, ..., Xn be i.i.d. real random variables with unknown cdf F, and let F⁰ be a continuous cdf.
▶ Consider the two hypotheses:

  H0: F = F⁰ vs. H1: F ≠ F⁰.

▶ Let Fn be the empirical cdf of the sample X1, ..., Xn.
▶ If F = F⁰, then Fn(t) ≈ F⁰(t) for all t ∈ IR.
Kolmogorov-Smirnov test

▶ Let

\[ T_n = \sup_{t\in\mathbb{R}} \sqrt{n}\, \big| F_n(t) - F^0(t) \big|. \]

▶ By Donsker's theorem, if H0 is true, then Tn →(d) Z as n → ∞, where Z has a known distribution (the supremum of a Brownian bridge).
▶ KS test with asymptotic level α:

  ψ_α^KS = 1{Tn > qα},

where qα is the (1 − α)-quantile of Z (obtained in tables).
▶ p-value of KS test: IP[Z > Tn | Tn].
Computational issues

▶ In practice, how do we compute Tn?
▶ F⁰ is non-decreasing and Fn is piecewise constant, with jumps at ti = Xi, i = 1, ..., n.
▶ Let X(1) ≤ X(2) ≤ ... ≤ X(n) be the reordered sample.
▶ The expression for Tn reduces to the following practical formula:

\[ T_n = \sqrt{n} \max_{i=1,\dots,n} \max\left\{ \left| \frac{i-1}{n} - F^0(X_{(i)}) \right|, \left| \frac{i}{n} - F^0(X_{(i)}) \right| \right\}. \]
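A sketch of this formula in Python (assuming numpy/scipy), cross-checked against scipy.stats.kstest, which returns the unscaled statistic sup_t |Fn(t) − F⁰(t)|:

```python
import numpy as np
from scipy.stats import norm, kstest

rng = np.random.default_rng(5)
x = np.sort(rng.normal(size=100))   # reordered sample X_(1) <= ... <= X_(n)
n = x.size

# Practical formula for Tn with F0 = cdf of N(0,1)
F0 = norm.cdf(x)
i = np.arange(1, n + 1)
Tn = np.sqrt(n) * np.maximum(np.abs((i - 1) / n - F0), np.abs(i / n - F0)).max()

print(Tn)
print(np.sqrt(n) * kstest(x, norm.cdf).statistic)   # same value
```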
Pivotal distribution

▶ Tn is called a pivotal statistic: if H0 is true, the distribution of Tn does not depend on the distribution of the Xi's, and it is easy to reproduce it in simulations.
▶ Indeed, let Ui = F⁰(Xi), i = 1, ..., n, and let Gn be the empirical cdf of U1, ..., Un.
▶ If H0 is true, then U1, ..., Un iid ~ U([0, 1]) and

\[ T_n = \sup_{0\le x\le 1} \sqrt{n}\, |G_n(x) - x|. \]
Quantiles and p-values

▶ For some large integer M:
  ▶ Simulate M i.i.d. copies Tn¹, ..., Tn^M of Tn;
  ▶ Estimate the (1 − α)-quantile q_α^(n) of Tn by taking the sample (1 − α)-quantile q̂_α^(n,M) of Tn¹, ..., Tn^M.
▶ Test with approximate level α:

  ψα = 1{Tn > q̂_α^(n,M)}.

▶ Approximate p-value of this test:

\[ \text{p-value} \approx \frac{\#\{j = 1, \dots, M : T_n^j > T_n\}}{M}. \]
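A sketch of this simulation recipe, assuming numpy; it exploits the pivotal property, so the copies of Tn are drawn from uniform samples:

```python
import numpy as np

def ks_stat(u):
    """sqrt(n) * sup_x |Gn(x) - x| for a sample u from U([0,1])."""
    u = np.sort(u)
    n = u.size
    i = np.arange(1, n + 1)
    return np.sqrt(n) * np.maximum(np.abs((i - 1) / n - u), np.abs(i / n - u)).max()

rng = np.random.default_rng(6)
n, M, alpha = 20, 10_000, 0.05

sims = np.array([ks_stat(rng.uniform(size=n)) for _ in range(M)])  # M copies of Tn
q_hat = np.quantile(sims, 1 - alpha)       # estimated (1 - alpha)-quantile

t_obs = ks_stat(rng.uniform(size=n))       # observed statistic (here H0 holds)
p_value = (sims > t_obs).mean()            # approximate p-value
print(q_hat, t_obs > q_hat, p_value)
```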
K-S table

Kolmogorov–Smirnov Tables

Critical values d_α(n) of the maximum absolute difference between the sample cdf Fn(x) and the population cdf F(x).

Number of            Level of significance, α
trials, n       0.10      0.05      0.02      0.01
     1        0.95000   0.97500   0.99000   0.99500
     2        0.77639   0.84189   0.90000   0.92929
     3        0.63604   0.70760   0.78456   0.82900
     4        0.56522   0.62394   0.68887   0.73424
     5        0.50945   0.56328   0.62718   0.66853
     6        0.46799   0.51926   0.57741   0.61661
     7        0.43607   0.48342   0.53844   0.57581
     8        0.40962   0.45427   0.50654   0.54179
     9        0.38746   0.43001   0.47960   0.51332
    10        0.36866   0.40925   0.45662   0.48893
Other goodness of fit tests

We want to measure the distance between two functions, Fn(t) and F(t). There are other ways, leading to other tests:
▶ Kolmogorov–Smirnov:

\[ d(F_n, F) = \sup_{t\in\mathbb{R}} |F_n(t) - F(t)| \]

▶ Cramér–von Mises:

\[ d^2(F_n, F) = \int_{\mathbb{R}} [F_n(t) - F(t)]^2\, dt \]

▶ Anderson–Darling:

\[ d^2(F_n, F) = \int_{\mathbb{R}} \frac{[F_n(t) - F(t)]^2}{F(t)(1 - F(t))}\, dt \]
Composite goodness of fit tests

What if I want to test "Does X have a Gaussian distribution?" but I don't know the parameters?

Simple idea: plug in

\[ \sup_{t\in\mathbb{R}} \left| F_n(t) - \Phi_{\hat\mu, \hat\sigma^2}(t) \right| \]

where

  μ̂ = X̄n,  σ̂² = Sn

and Φ_{μ̂,σ̂²}(t) is the cdf of N(μ̂, σ̂²).

In this case, Donsker's theorem is no longer valid. This is a common and serious mistake!
Kolmogorov-Lilliefors test (1)

Instead, we compute the quantiles for the test statistic:

\[ \sup_{t\in\mathbb{R}} \left| F_n(t) - \Phi_{\hat\mu, \hat\sigma^2}(t) \right| \]

They do not depend on unknown parameters!

This is the Kolmogorov–Lilliefors test.
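Under H0 this statistic is again pivotal (standardizing by μ̂ and σ̂ removes the unknown parameters), so its quantiles can be simulated; a sketch assuming numpy/scipy (statsmodels.stats.diagnostic.lilliefors implements this test directly):

```python
import numpy as np
from scipy.stats import norm

def kl_stat(x):
    """sup_t |Fn(t) - Phi_{mu_hat, sigma_hat^2}(t)|, evaluated at the jumps of Fn."""
    x = np.sort(x)
    n = x.size
    z = norm.cdf(x, loc=x.mean(), scale=x.std(ddof=0))   # plug-in Gaussian cdf
    i = np.arange(1, n + 1)
    return np.maximum(np.abs((i - 1) / n - z), np.abs(i / n - z)).max()

rng = np.random.default_rng(7)
n, M = 30, 10_000
# Under H0 the law of the statistic does not depend on (mu, sigma^2),
# so simulating from N(0, 1) is enough.
sims = np.array([kl_stat(rng.standard_normal(n)) for _ in range(M)])
print(np.quantile(sims, 0.95))   # approximate critical value at level 5%
```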
K-L table
[Table: Kolmogorov–Lilliefors critical values of D = max_x |F*(x) − Sn(x)| by sample size and level of significance, obtained by Monte Carlo calculations using 1,000 or more samples for each value of N.]
Quantile-Quantile (QQ) plots (1)
▶ Provide a visual way to perform GoF tests.
▶ Not a formal test, but a quick and easy check of whether a distribution is plausible.
▶ Main idea: we want to check visually if the plot of Fn is close to that of F or, equivalently, if the plot of Fn⁻¹ is close to that of F⁻¹.
▶ More convenient to check if the points

\[ \Big( F^{-1}\big(\tfrac{1}{n}\big), F_n^{-1}\big(\tfrac{1}{n}\big) \Big), \Big( F^{-1}\big(\tfrac{2}{n}\big), F_n^{-1}\big(\tfrac{2}{n}\big) \Big), \dots, \Big( F^{-1}\big(\tfrac{n-1}{n}\big), F_n^{-1}\big(\tfrac{n-1}{n}\big) \Big) \]

are near the line y = x.
▶ Fn is not technically invertible, but we define

  Fn⁻¹(i/n) = X(i),

the i-th order statistic (the i-th smallest observation).
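A minimal sketch of a Gaussian QQ-plot, assuming numpy/scipy/matplotlib (scipy.stats.probplot produces a similar figure):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

rng = np.random.default_rng(8)
x = np.sort(rng.standard_t(df=15, size=100))   # sample to compare with N(0, 1)
n = x.size

# Points (F^{-1}(i/n), Fn^{-1}(i/n)) = (theoretical quantile, X_(i)), i < n
probs = np.arange(1, n) / n
plt.scatter(norm.ppf(probs), x[:-1], s=10)
plt.axline((0, 0), slope=1, color="red")       # reference line y = x
plt.xlabel("quantiles of N(0, 1)")
plt.ylabel("sample quantiles")
plt.show()
```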
Quantile-Quantile (QQ) plots (2)

[Figure: example QQ-plots.]

Quantile-Quantile (QQ) plots (3)

Figure 2: QQ-plots for samples of sizes 10, 50, 100, 1000, 5000, 10000 from a t15 distribution. The upper-left figure is for sample size 10, the lower-right is for sample size 10000.
