Академический Документы
Профессиональный Документы
Культура Документы
net/publication/283705188
Evaluating the Three Methods of Goodness of Fit Test for Frequency Analysis
CITATIONS READS
28 303
3 authors:
Jichun Wu
Nanjing University
296 PUBLICATIONS 3,431 CITATIONS
SEE PROFILE
Some of the authors of this publication are also working on these related projects:
General stochastic user equilibrium traffic assignment problem with link capacity constraints View project
Reactive chemical transport in natural media: heterogeneity, nonstationarity and uncertainty. DOD Army Research Office View project
All content following this page was uploaded by Jichun Wu on 05 March 2016.
Evaluating the Three Methods of Goodness of Fit Test for Frequency Analysis
Abstract
In hydrological statistics, the traditional assessment of goodness of fit test is interested in the testing precision to the sample
generated from supposed PDF. However, the ability to reject the hypotheses when the supposed PDF is different from real
PDF should be also emphasized. In addition, the sensitivity of test method to series length is important in hydrological
analysis. Three methods of goodness of fit test that include Chi-Square (C-S), Kolmogorov-Smirnov (K-S), and
Anderson-Darling (A-D) tests are applied in this study. The results of power test indicate that the most powerful tests for
normal, uniform, P3, and Weibull distribution are K-S, C-S, and A-D tests, respectively. The test method with the best
comprehensive power is C-S test, followed by K-S and A-D test.
pk P(tk 1 X ) 1 F0 (tk 1 )
testing precision to the sample from known
distribution function, but also emphasis on the ability (3) Constructing a statistics:
to reject the hypotheses when the supposed PDF is k
(ni npi )2
different from the real one. In this paper, three often 2 (2)
used goodness of fit test methods, C-S, K-S, and A-D i 1 npi
tests were assessed. The samples used for test were
which obeys Chi-square distribution with the degree
generated from 4 common functions: normal, P3,
of freedom m, m=k-1, or m=k-1-r when there are r
Weibull, and uniform.
independent parameters of F0(x) need to be estimated
The paper is organized as follows: An introduction
by samples. And specifying a significance level α, if
is first provided for the frequency analysis and
p (χ2 ≥ χ21-α) ≥ α, then accept the hypothesis, otherwise
goodness of fit test in hydrologic frequency analysis.
reject it.
It follows the description to the methods, including
Kolmogorov-Smirnov test
C-S, K-S, and A-D tests, the probability distribution
Kolmogorov-Smirnov (K-S) test measures the
functions and parameters estimation, and samples
greatest discrepancy between the observed and
generation. Next, the statistic results of test, power
hypothesized distribution. The process can be
comparison of the three test methods, and a
summarized as follows (Melo et al., 2009; Wang,
discussion on the application of the goodness of fit
Wang, 2010):
test in hydrologic frequency analysis are presented.
(1) Sorting the samples X (x1, x2, …, xn) by ascending
Finally, the main conclusions drawn from the analysis
order, and storing it to a new vector X′ (x′1, x′2, …, x′n),
are presented.
calculating the empirical distribution function:
2 Methods 0, x x1
k
The methods used in this paper include goodness Fn ( x) , xk x xk 1 (3)
of fit tests, C-S, K-S, A-D tests, parameter estimation n
1, x xn
for probability distribution function (PDF), and
samples generation. (2) Calculating the K-S statistics D(n):
D ( n ) max Fn ( x) F0 ( x)
x1 x xn
2.1 Goodness of fit test
i i 1 (4)
Assuming (x1, x2, …, xn) is the samples from max F0 ( xi) , F0 ( xi)
1i n n n
population X, making a hypothesis: H0: F(x) = F0(x),
where F0(x) is the alternative PDF (supposed PDF) Specifying a significance level α, if p (D(n) ≥ D(n)(1-α))
with the parameters estimated by samples. ≥ α, then accept the hypothesis, otherwise reject it.
Chi-Square test Anderson-Darling test
Chi-Square (C-S) test is a simple and convenient Anderson-Darling (A-D) test emphasizes
method for hypothesis test, it is related to the overall discrepancies in both tails of the distribution, and that
fit, the process can be written as follows (Zhang, Luo, is often of prime importance in hydrologic frequency
2000): analysis. The process can be written as follows
(1) Choosing k-1 numbers: -∞ < t1 < t2 < … < tk-1 < (Coronel-Brizio, Hernandez-Montoya, 2010; Zhang
+∞, k ≈ 1.87(n-1)0.4, and the number axis is et al., 2009):
partitioned into k intervals, (-∞, t1], (t1, t2], …, (tk-2, (1) Sorting the samples X (x1, x2, …, xn) by ascending
tk-1], (tk-1, +∞]. order, and storing it to a new vector X′ (x′1, x′2, …, x′n),
(2) Collecting the number of samples dropped into calculating the empirical distribution function (Eq. 3).
the i-th interval ni, i=1, 2, … , k, and then calculating (2) Calculating the A-D statistics A2n :
the probability of the population which obeys 1 n
alternative PDF fallen into the i-th interval: An2 n [(2i 1) log F ( xi)
n i 1 (5)
+ (2n 1 2i) log(1 F ( xi)) i 1, 2,..., n
Fig. 1. The probability density curves of normal, uniform, P3, and Weibull distributions.