
KOLMOGOROV-SMIRNOV TEST

An alternative goodness-of-fit test, developed by A. Kolmogorov and N.V. Smirnov, that is based on a comparison of the observed sample cumulative relative frequency distribution with the hypothetical population cumulative distribution function specified by the null hypothesis.
A fully non-parametric test for comparing two distributions.

Advantages of Kolmogorov-Smirnov
It is non-parametric and hence robust.
It does not rely only on the location of the mean (as the t-test does).
It works for non-normal data (the t-test can fail if the data are too far from normal).
It is not sensitive to scaling.
It is more powerful than the χ² (chi-square) test.
However, it is less sensitive than the t-test if the data are indeed normal.

The Kolmogorov-Smirnov Test


Formula:
$D = \max_x |F(x) - S(x)|$

where:
F is the population cumulative distribution function,
S is the sample cumulative relative frequency distribution function,
and D is the largest absolute difference between the cumulative observed and theoretical frequencies.
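
As a minimal sketch of the formula, the Python function below computes D for a sample against any hypothesized cumulative distribution function. The name `ks_statistic` and its parameters are illustrative assumptions, not part of the original material. Because S is a step function that jumps at each observation, the sketch checks the gap on both sides of every jump.

```python
def ks_statistic(sample, cdf):
    """Largest absolute gap between a hypothesized CDF and the sample step function S."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # S jumps from (i - 1)/n to i/n at x, so compare F with both values.
        d = max(d, abs(f - (i - 1) / n), abs(i / n - f))
    return d
```

For example, `ks_statistic([0.1, 0.5, 0.9], lambda x: x)` compares three values with a uniform distribution on 0 to 1.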

If $D < D_\alpha$, we do not reject $H_0$; if $D > D_\alpha$, we reject $H_0$ in favor of $H_1$. Here $D_\alpha$ is the critical value at significance level $\alpha$.
The hypothesis regarding the distributional form is rejected at the chosen significance level (alpha) if the test statistic, D, is greater than the critical value obtained from a table.
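
The decision rule can be sketched in Python as follows. The function name `ks_decision` is hypothetical, and treating the boundary case (D exactly equal to the critical value) as a non-rejection is an assumption, since the rule above only covers strict inequalities.

```python
def ks_decision(d, critical_value):
    """Return the test decision for statistic d and a tabulated critical value."""
    # Assumption: d equal to the critical value is treated as "do not reject".
    if d > critical_value:
        return "reject H0"
    return "do not reject H0"
```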

Mr. Bond used a computer to generate n = 20 random numbers. The numbers were supposed to be uniformly distributed between 0 and 10. The 20 sample values $x_i$ were put in order and are shown in the table. Mr. Bond is worried that there might be an error in the computer program, and he wants to test the null hypothesis that the sample of data was selected from a uniform distribution.
Let $\alpha = .05$.

i     xi     F (xi/10)   S (i/20)
1     .8     .08         .05
2     1.6    .16         .10
3     1.7    .17         .15
4     1.9    .19         .20
5     2.3    .23         .25
6     4.0    .40         .30
7     4.5    .45         .35
8     4.7    .47         .40
9     5.3    .53         .45
10    5.4    .54         .50
11    6.2    .62         .55
12    6.4    .64         .60
13    6.7    .67         .65
14    6.8    .68         .70
15    7.9    .79         .75
16    8.4    .84         .80
17    9.0    .90         .85
18    9.1    .91         .90
19    9.7    .97         .95
20    9.8    .98         1.00

To test $H_0$, we need to calculate S and F for each observed value of the random variable X. For the i-th ordered value, S is the cumulative relative frequency i/20. If X is uniformly distributed between 0 and 10, then F represents the area under the uniform density function between 0 and x. The hypothesized uniform density function has a height of 1/10 and ranges from 0 to 10.

The cumulative distribution function F represents the area under the density function between 0 and x. This area is x/10, so under the null hypothesis we obtain $F(x) = x/10$ for $0 \le x \le 10$.
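
A short Python sketch of this hypothesized distribution function is given here. The name `uniform_cdf` and the `low` and `high` parameters are assumptions made for illustration; values outside the range are clamped to 0 and 1.

```python
def uniform_cdf(x, low=0.0, high=10.0):
    """Area under the uniform density (height 1/(high - low)) between low and x."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)
```

With the defaults, `uniform_cdf(4.0)` gives the tabulated value .40 for the sixth observation.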

The data indicate that the greatest difference occurs just prior to x = 4.0, where F approaches .40 while S is still .25, its value after the fifth observation.

Solution:

$D = \max_x |F(x) - S(x)|$
$D = .40 - .25$
$D = .15$
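
The arithmetic can be checked with a short Python sketch. The variable names are illustrative, and the data are the 20 ordered sample values from the example. The sketch compares F with S both just prior to and at each observation, which is an assumption about how the largest difference is located.

```python
# The 20 ordered sample values from the example.
xs = [0.8, 1.6, 1.7, 1.9, 2.3, 4.0, 4.5, 4.7, 5.3, 5.4,
      6.2, 6.4, 6.7, 6.8, 7.9, 8.4, 9.0, 9.1, 9.7, 9.8]
n = len(xs)

d = 0.0
for i, x in enumerate(xs, start=1):
    f = x / 10               # hypothesized uniform CDF on 0 to 10
    s_before = (i - 1) / n   # S just prior to the i-th value
    s_at = i / n             # S at the i-th value
    d = max(d, abs(f - s_before), abs(f - s_at))

print(round(d, 2))  # prints 0.15
```

The same gap of .15 also appears just prior to x = 4.5 (.45 - .30), so the maximum is attained at two points without changing the value of D.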

LEVEL OF SIGNIFICANCE FOR D = MAXIMUM |F0(X) - Sn(X)|

SAMPLE SIZE
(N)          .20       .15       .10       .05       .01
1            .900      .925      .950      .975      .995
2            .684      .726      .776      .842      .929
3            .565      .597      .642      .708      .828
4            .494      .525      .564      .624      .733
5            .446      .474      .510      .565      .669
6            .410      .436      .470      .521      .618
7            .381      .405      .438      .486      .577
8            .358      .381      .411      .457      .543
9            .339      .360      .388      .432      .514
10           .322      .342      .368      .410      .490
11           .307      .326      .352      .391      .468
12           .295      .313      .338      .375      .450
13           .284      .302      .325      .361      .433
14           .274      .292      .314      .349      .418
15           .266      .283      .304      .338      .404
16           .258      .274      .295      .328      .392
17           .250      .266      .286      .318      .381
18           .244      .259      .278      .309      .371
19           .237      .252      .272      .301      .363
20           .231      .246      .264      .294      .356
25           .210      .220      .240      .270      .320
30           .190      .200      .220      .240      .290
35           .180      .190      .210      .230      .270
OVER 35      1.07/√N   1.14/√N   1.22/√N   1.36/√N   1.63/√N
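
The last row of the table gives the critical value for samples over 35 as a constant divided by the square root of N. A minimal Python sketch of that row is shown below; the function name `large_sample_critical_value` and the dictionary name are hypothetical, and the sketch deliberately refuses smaller samples, for which the tabulated values should be read directly.

```python
import math

# Numerators from the "over 35" row of the table, keyed by significance level.
LARGE_SAMPLE_COEFFICIENTS = {0.20: 1.07, 0.15: 1.14, 0.10: 1.22, 0.05: 1.36, 0.01: 1.63}

def large_sample_critical_value(n, alpha):
    """Critical value of D for n over 35, using the coefficient / sqrt(n) row."""
    if n <= 35:
        raise ValueError("for n <= 35, read the critical value from the table")
    return LARGE_SAMPLE_COEFFICIENTS[alpha] / math.sqrt(n)
```

For instance, with n = 100 and a .05 level of significance the sketch returns 1.36/10, or about .136.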

Since D (.15) is less than the critical value obtained from the table (.294, for N = 20 at the .05 level of significance), we do not reject $H_0$.
