pp. 235–245
© 2008 International Association of Hydraulic Engineering and Research
Keywords: Bootstrap resampling, canonical correlation analysis, extremes, flooding, risk, water level
1 Introduction
Revision received September 24, 2007/ Open for discussion until December 31, 2008.
[Figure: location map of the study area; labels: Baltic Sea, Gulf of Gdansk, Gdansk, mareograph gauge, Vistula, new outlet since 1895, Lagoon, benchmark station at Tczew, Poland; scale bar 10 km]
influenced by sea surges propagating up the river. The River Vistula meets the Gulf of Gdansk approximately 15 km southeast
of the first site. The regions surrounding the mouth of the river
were prone to flooding, caused by ice jams in the winter months
that constricted the flow of river water into the Gulf. In 1895 the
mouth of the Vistula was engineered to create a short, straight
route for the river to discharge into the Gulf as part of a flood
alleviation scheme. Prior to the scheme, the impact of coastal
surge events on upstream locations such as Tczew was effectively damped due to the much longer distance of the original
river course. With the scheme in place, coastal surges propagated much further upstream. The question therefore arises as to
what degree water levels on the coast and upstream are correlated through the common phenomenon of surge, and whether
the coastal monitoring station could be used to provide an early
warning of potential flood conditions upstream. From physical
considerations it might be considered that river discharges and
surges could be associated with a similar annual cycle of storms.
There is a significant amount of previous work devoted to the
problem of the joint probability of high tides combining with
large surges. Tides are, by and large, well understood and predictable, being essentially deterministic. Surges, in contrast, are
not so easy to predict and are of a more stochastic nature. Thus,
much of the research in this area was aimed at deconvolving the
deterministic and stochastic components of the total water level
(e.g., Pugh and Vassie, 1980; Tawn and Vassie, 1989).
In the Baltic Sea, the tidal component of the water level variation is negligible and the water levels have a purely stochastic
nature. On the open coast, water level fluctuations are due to
surge. Surge is a term used to describe the combined effects of
[Figure: flowchart of the analysis; box labels: statistical unknown, basic statistics, SSA (key patterns), CCA (joint behaviour), bootstrap resampling]
$$\hat{Z} = U S, \qquad (6)$$
where $\hat{Z}$ constitutes predictions of Z with the Y field. Using Eq. (5), this can be written in real space as
$$\hat{Z} = Y R S. \qquad (7)$$
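The two forms of the prediction, US in canonical space and YRS in real space, can be checked numerically with a minimal sketch. It assumes that the relation referred to as Eq. (5) is U = YR, which the text implies but does not show here. The matrices R and S below are made-up numbers rather than fitted CCA results, and `matmul` is a hypothetical helper.

```python
# Sketch only: R and S are illustrative numbers, not the result of a CCA fit.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Hypothetical sizes: nt = 3 observations, ny = 2 predictor points,
# nm = 2 canonical modes, nz = 2 predictand points.
Y = [[1.0, 2.0], [0.5, -1.0], [-1.5, 0.0]]   # predictor field (nt x ny)
R = [[0.6, -0.8], [0.8, 0.6]]                # (ny x nm), illustrative values
S = [[0.3, 0.1], [-0.2, 0.4]]                # (nm x nz), illustrative values

U = matmul(Y, R)                      # canonical predictor amplitudes (assumed Eq. (5))
Z_hat_modes = matmul(U, S)            # prediction written as U S
Z_hat_real = matmul(Y, matmul(R, S))  # same prediction written as Y (R S)
```

Both products have the shape nt × nz and agree to rounding error, which is simply the associativity of matrix multiplication.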
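$$F(x) = \exp\left\{-\left[1 - \xi\,\frac{x-\mu}{\sigma}\right]^{1/\xi}\right\} \qquad (8)$$
$$x_P = \mu + \frac{\sigma}{\xi}\left[1 - \left(-\ln(1-P)\right)^{\xi}\right], \qquad (9)$$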
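A return level can be evaluated directly from a fitted Gev once the parameterisation is fixed. The sketch below assumes the form F(x) = exp{−[1 − ξ(x − μ)/σ]^(1/ξ)} with location μ, scale σ and non-zero shape ξ; the symbol names, the function names and the parameter values are all assumptions made for illustration and are not taken from the paper's tables.

```python
import math

def gev_cdf(x, mu, sigma, xi):
    """Assumed Gev form: exp(-(1 - xi*(x - mu)/sigma)**(1/xi)), xi != 0."""
    t = 1.0 - xi * (x - mu) / sigma
    if t <= 0.0:
        # Outside the support: above the upper bound (xi > 0) or below the lower bound (xi < 0).
        return 1.0 if xi > 0 else 0.0
    return math.exp(-t ** (1.0 / xi))

def gev_return_level(p, mu, sigma, xi):
    """Level exceeded with probability p in one unit of time (return period 1/p)."""
    return mu + (sigma / xi) * (1.0 - (-math.log(1.0 - p)) ** xi)

# Hypothetical parameters, chosen only to exercise the functions.
MU, SIGMA, XI = 7.0, 1.0, 0.3
levels = {T: gev_return_level(1.0 / T, MU, SIGMA, XI) for T in (2, 10, 100)}
upper_bound = MU + SIGMA / XI   # finite upper limit when xi > 0 in this form
```

With a positive shape parameter in this convention the return levels increase with the return period but never exceed `upper_bound`, which is one way a distribution can reflect a physically limited process.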
[Figure: schematic of the bootstrap procedure; raw data (size n), bootstrap resamples X*1, X*2, X*3, ..., X*B, the best-fit distributions F*1(x), F*2(x), F*3(x), ..., F*B(x), and the corresponding bootstrap replications of the error norm]
$$E(\hat{\Delta}) = E\left[\max_{1 \le i \le n}\left|F_{\hat{\theta}}(x_i)^{h} - \left(\frac{i}{n+1}\right)^{h}\right|\right], \qquad (11)$$
where $x_i$ (i = 1, 2, ..., n) are the original water levels in increasing order of magnitude, $\hat{\theta}$ is the estimator of the
parameters in the approximating family for a particular bootstrap replication, and E is the expectation operator. In general,
analytical expressions for the expectation in Eq. (11) are not available, but a direct estimate may be obtained by using bootstrap
methods. Thus the expectation is estimated by computing the
average error norm over the B replications of the original data.
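The averaging over replications can be sketched as follows. Several choices here are assumptions rather than the authors' method: the error norm is taken as max_i |F(x_i)^h − (i/(n+1))^h|, the Gumbel family is fitted by the method of moments as a simple stand-in for the maximum likelihood fit, and the data are synthetic.

```python
import math
import random
import statistics

def fit_gumbel_moments(sample):
    """Method-of-moments Gumbel fit (stand-in for a maximum likelihood fit)."""
    scale = statistics.stdev(sample) * math.sqrt(6.0) / math.pi
    loc = statistics.mean(sample) - 0.5772156649 * scale
    return loc, scale

def gumbel_cdf(x, loc, scale):
    return math.exp(-math.exp(-(x - loc) / scale))

def error_norm(data_sorted, cdf, h):
    """Assumed form of the norm: max_i |F(x_i)^h - (i/(n+1))^h|."""
    n = len(data_sorted)
    return max(abs(cdf(x) ** h - (i / (n + 1)) ** h)
               for i, x in enumerate(data_sorted, start=1))

def expected_error_norm(data, h, n_boot=500, seed=0):
    """Average the error norm over n_boot bootstrap replications."""
    rng = random.Random(seed)
    data_sorted = sorted(data)          # original levels in increasing order
    total = 0.0
    for _ in range(n_boot):
        resample = [rng.choice(data) for _ in data]   # n draws with replacement
        loc, scale = fit_gumbel_moments(resample)     # refit on this replication
        total += error_norm(data_sorted, lambda x: gumbel_cdf(x, loc, scale), h)
    return total / n_boot

# Synthetic stand-in for 29 annual maxima (not the Vistula or Gdansk records).
_rng = random.Random(1)
data = [7.0 - 1.2 * math.log(-math.log(_rng.random())) for _ in range(29)]
norm_h1 = expected_error_norm(data, h=1.0, n_boot=200)
```

Each replication is refitted, but the fitted distribution is always compared with the original ordered sample, as described in the text. Repeating the call for several families and several values of h would give a table of expected error norms from which the family with the smallest value can be chosen.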
Finally, the parameters estimated by the maximum likelihood method for each family of distributions can be used to
estimate extreme return values corresponding to particular return
periods. Ordering the return values obtained from the bootstrap replications gives their confidence intervals in a straightforward manner. For
example, if 100 replications are performed, the 90% confidence
interval is the interval between the 5th and 95th ordered values.
The possibility of calculating directly the confidence intervals of
extreme values is a major advantage of the bootstrap technique
(Efron and Tibshirani, 1993). Here, the results are computed
from 500 replications. To test the sensitivity of the results to
the number of replications, the bootstrapping calculations for the
results shown in Tables 3 and 4 were repeated for 1000 and 2000
replications. Small changes in the numerical values of the error
norm were found but the relative ordering of the magnitudes was
unchanged, indicating that 500 replications provide stable and
robust estimates for this dataset.
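The ordering rule for confidence intervals can be written as a short sketch. The function name and the rounding of the ranks are assumptions; the input is taken to be one return value per bootstrap replication.

```python
def percentile_interval(replicated_values, level=0.90):
    """Interval between the ordered values at ranks B*(1-level)/2 and B*(1+level)/2."""
    ordered = sorted(replicated_values)
    b = len(ordered)
    lower_rank = max(1, round(b * (1.0 - level) / 2.0))   # 1-based rank
    upper_rank = min(b, round(b * (1.0 + level) / 2.0))
    return ordered[lower_rank - 1], ordered[upper_rank - 1]

# Toy input: 100 replications whose values are simply 1, 2, ..., 100,
# so the returned limits equal the ranks that were selected.
limits_90 = percentile_interval(list(range(1, 101)), level=0.90)
```

For 100 replications and a 90% level the function picks the 5th and 95th ordered values, matching the example in the text.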
[Figure: Vistula at Tczew, mean empirical statistical year of water level from the 1961-1989 records; curves: empirical mean, SSA; x-axis: day of year (Jan. to Dec.)]
[Figure: Gdansk harbour, mean empirical statistical year of seawater level from the 1961-1989 records; curves: empirical mean, SSA; x-axis: day of year (Jan. to Dec.)]
[Figure 5 panels: raw series and random deviations; x-axis: days, 1st Jan. 1961 to 31st Dec. 1970]
Figure 5 Fragment of CCA input versus centred raw data: Vistula (top),
Gulf of Gdansk (bottom)
quantities are equal to 0.25, and 0.31, respectively, again indicating the greater irregularity of this series. All in all, the random
deviations appear to be the only possible driver for joint extremes,
despite apparently insignificant individual contributions to water
level variability.
Bearing in mind the last two paragraphs, it was assumed that
joint extremes can only occur between November and April, so
separate CCA runs were performed for the random deviations
from seasonality corresponding to these months, with seawater levels as the predictors (Y-matrix) and Vistula levels as the predictands
(Z-matrix). The rows in these matrices contained the deviations for a given month (November, December, January,
February, March, April), starting from 1961 in the 1st row up to
1989 in the 29th row. The spatial locations, defined by
column numbers, therefore referred to the day of the month; e.g., for November
the term Y(6, 25) contained the predictor seawater component
from 25th November 1966. The prediction skills of the CCA runs,
i.e., percentages of variability of deviations from seasonality of
sea water level, explained by the variability of deviations from
seasonality of water levels in the River Vistula in consecutive
months, were low (Table 1).
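The row and column convention of the monthly matrices can be illustrated with a small sketch. The function names and the synthetic series are hypothetical, and the number of columns per month (`n_days`) is an assumption, since the text does not say how months of different lengths were handled.

```python
from datetime import date, timedelta

FIRST_YEAR, LAST_YEAR = 1961, 1989

def cca_index(d):
    """1-based (row, column) of a calendar date within its monthly matrix."""
    return d.year - FIRST_YEAR + 1, d.day

def build_month_matrix(series, month, n_days):
    """Rows are the years 1961..1989, columns are days 1..n_days of the month."""
    return [[series[date(year, month, day)] for day in range(1, n_days + 1)]
            for year in range(FIRST_YEAR, LAST_YEAR + 1)]

# Hypothetical daily deviations: here just the day count since 1 Jan 1961.
start = date(1961, 1, 1)
n_total = (date(1989, 12, 31) - start).days + 1
series = {start + timedelta(days=k): float(k) for k in range(n_total)}

Y_nov = build_month_matrix(series, month=11, n_days=30)
row, col = cca_index(date(1966, 11, 25))
```

Here `Y_nov` has 29 rows and 30 columns, and 25th November 1966 maps to row 6, column 25, in line with the example Y(6, 25) given in the text.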
Table 1
Month       Prediction skill
November    0.186
December    0.100
January     0.134
February    0.169
March       0.170
April       0.127
[Figure: two panels showing Mean and SD curves against day, from 1st Nov to 30th Apr, with the months Nov to Apr marked]
Model           Parameters
Normal          7.8203; 1.2551
Log-normal      2.0433; 0.1657
Gamma           37.4946; 0.2086
Exponential     7.8203
Weibull (II)    7.0069; 8.3548
Weibull (III)   3.6195; 4.4886; 3.7789
Gumbel          7.1909; 1.2170
Gev             0.3606; 7.4252; 1.2869
[Figure 7, panels (a) to (d): water level against reduced variate; legend entries in (a) and (c): data, Weibull(II), Weibull(III), Gumbel, Gev; legend entries in (b) and (d): data, Gev, 5% quantile of Gev, 95% quantile of Gev]
Figure 7 (a) Gumbel QQ plot showing annual maximum water levels in the Vistula and best-fit distributions, (b) best Gev fit with 95% confidence limits
together with annual maximum water levels in the Vistula, (c) Gumbel QQ plot showing annual maximum water levels in the Gulf of Gdansk and best-fit
distributions, and (d) best Gev fit with 95% confidence limits together with annual maximum water levels in the Gulf of Gdansk
Table 3 Computed expectation error norm (Eq. (11)) for Vistula data with 500 bootstrap replications
Model           h = 1.5   h = 1.25   h = 1.00   h = 0.75   h = 0.50   h = 0.25
Normal          0.1360    0.1292     0.1232     0.1179     0.1182     0.1118
Log-normal      0.1390    0.1316     0.1233     0.1181     0.1337     0.1506
Gamma           0.1381    0.1305     0.1225     0.1169     0.1256     0.1335
Exponential     0.3877    0.4311     0.4763     0.5172     0.5227     0.4124
Weibull (II)    0.1336    0.1291     0.1251     0.1205     0.1156     0.0972
Weibull (III)   0.1311    0.1237     0.1173     0.1151     0.1251     0.1367
Gumbel          0.1364    0.1300     0.1242     0.1207     0.1412     0.1737
Gev             0.1286    0.1208     0.1161     0.1135     0.1109     0.0993
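Selecting the best-fitting family for each weighting can be sketched as a column-wise minimum over the Table 3 values. The numbers below are read from Table 3 on the assumption that each block of eight values in the extracted table belongs to one value of h, in the model order listed; the function name is hypothetical.

```python
# Expected error norms for the Vistula data (Table 3), one list entry per h.
H_VALUES = [1.5, 1.25, 1.00, 0.75, 0.50, 0.25]
ERROR_NORMS = {
    "Normal":        [0.1360, 0.1292, 0.1232, 0.1179, 0.1182, 0.1118],
    "Log-normal":    [0.1390, 0.1316, 0.1233, 0.1181, 0.1337, 0.1506],
    "Gamma":         [0.1381, 0.1305, 0.1225, 0.1169, 0.1256, 0.1335],
    "Exponential":   [0.3877, 0.4311, 0.4763, 0.5172, 0.5227, 0.4124],
    "Weibull (II)":  [0.1336, 0.1291, 0.1251, 0.1205, 0.1156, 0.0972],
    "Weibull (III)": [0.1311, 0.1237, 0.1173, 0.1151, 0.1251, 0.1367],
    "Gumbel":        [0.1364, 0.1300, 0.1242, 0.1207, 0.1412, 0.1737],
    "Gev":           [0.1286, 0.1208, 0.1161, 0.1135, 0.1109, 0.0993],
}

def best_model_per_h(norms, h_values):
    """Return {h: model with the smallest expected error norm for that h}."""
    best = {}
    for col, h in enumerate(h_values):
        best[h] = min(norms, key=lambda model: norms[model][col])
    return best

best = best_model_per_h(ERROR_NORMS, H_VALUES)
```

Under that reading of the table, the Gev has the smallest norm for h from 1.5 down to 0.50, and the two-parameter Weibull is marginally smaller at h = 0.25.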
distributions at different parts of the distribution curve, indicating the Gev distribution is the best-fit for the annual maximum
water levels in the Vistula, based on the data over the interval
from 1961 to 1989. Apart from the Gev distribution, the error
norms of the Weibull (III) are also small (Table 3). The fact
that the best fit to the data is achieved by the three-parameter
distributions is perhaps not so surprising as these distributions
Table 4 Return values of maximum annual extremes in Vistula with 95% confidence limits in brackets
Return period (years)   Gumbel (m)   Gev (m)
2
5
10
20
50
100
200
Note: The return period is the expected (mean) time (usually in years) between exceedances of a particular extreme threshold
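The definition of the return period in the note can be made concrete with a short sketch relating it to exceedance probabilities. The second function assumes that annual maxima in different years are independent, which is a standard simplification and not a statement from the paper; both function names are hypothetical.

```python
def annual_exceedance_probability(return_period_years):
    """Probability P that the level is exceeded in any one year (T = 1/P)."""
    return 1.0 / return_period_years

def prob_at_least_one_exceedance(return_period_years, horizon_years):
    """Chance of one or more exceedances within the horizon, assuming independent years."""
    p = annual_exceedance_probability(return_period_years)
    return 1.0 - (1.0 - p) ** horizon_years

p_100 = annual_exceedance_probability(100)            # 0.01
risk_100_in_100 = prob_at_least_one_exceedance(100, 100)
```

Under the independence assumption, the 100-year level has roughly a 63% chance of being exceeded at least once in any 100-year window, so a return period is a mean recurrence time and not a guaranteed gap between events.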
Table 5 Computed expectation error norm for data in Gdansk with 500 bootstrap replications
Model           h = 1.5   h = 1.25   h = 1.00   h = 0.75   h = 0.50   h = 0.25
Normal          0.1736    0.1652     0.1530     0.1358     0.1193     0.1002
Log-normal      0.1686    0.1604     0.1487     0.1323     0.1159     0.0962
Gamma           0.1699    0.1615     0.1497     0.1331     0.1169     0.0976
Exponential     0.4769    0.5310     0.5822     0.6170     0.6020     0.4585
Weibull (II)    0.2146    0.2018     0.1845     0.1790     0.2025     0.1894
Weibull (III)   0.1348    0.12907    0.1223     0.1182     0.1406     0.1741
Gumbel          0.1321    0.1258     0.1192     0.1130     0.1139     0.1145
Gev             0.1258    0.1188     0.1119     0.1068     0.1091     0.1107
Return period (years)   Gumbel (m)   Gev (m)
2
5
10
20
50
100
200
for the extreme values are based on the 95% confidence interval
from the 500 bootstrap replications. The results determined from
the Gev, which performed best, are highlighted in Table 6.
It may be noted that the confidence intervals for the water level
in the Gulf of Gdansk are much smaller than those for the Vistula.
This is not surprising when considering the respective ranges and
standard deviations of the series at Gdansk and Tczew, see Sec. 4.
7 Conclusions
In this paper the results of an investigation of the extreme distributions of the annual maximum water levels were presented
both at Tczew on the River Vistula and in the Gulf of Gdansk
with 29 years of simultaneous readings covering the period from
1961 to 1989. The extreme probability distributions were validated with the univariate bootstrap resampling technique on both
of the datasets. Some key conclusions are:
(1) The SSA and CCA approach highlights the synergistic
effect of combining two statistical methods; key behavioural
patterns in the data can be identified (SSA) and their
interdependence scrutinised (CCA).
(2) The SSA and CCA results signify practical independence of
both signals, which allows their separate bootstrap analyses.
If they were highly correlated, a joint probability approach,
such as that proposed by Hawkes et al. (2002), should be
considered.
(3) The GEV distribution provides the best model for the distribution of extreme annual maximum water levels at both
locations. It has the best goodness-of-fit, it is an asymptotic
model for extremes and it detects the finite character of the
phenomenon. It is also fairly robust in that the confidence
limits are such that extreme value estimates are useful for
practical purposes when bootstrap resampling is applied.
(4) The three-parameter Weibull distribution also performed
well, but exhibited greater sensitivity to the replications than
the GEV distribution.
(5) The two datasets exhibit very different behaviour. The
Gdansk water level gauge data were moderately well
described by a Gumbel distribution. In contrast the Tczew
extreme water levels exhibit a distinct curve on the QQ
plot which is not well-captured by a Gumbel distribution. This suggests that there is a limiting process that
restricts the extreme values of the water levels at Tczew.
Acknowledgements
The work described in this publication was supported by the European Community's Sixth Framework Programme through the
grant to the budget of the Integrated Project FLOODsite, Contract
GOCE-CT-2004-505420. The paper reflects the authors' views
and not those of the European Community. Neither the European
Community nor any member of the FLOODsite Consortium is
liable for any use of the information in this paper.
Appendix
Definitions of distributions and their parameters used in this
paper.
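Normal distribution with location parameter $\mu$ and scale parameter $\sigma$
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}} \qquad (A1)$$
Log-normal distribution with location parameter $\mu$ and scale
parameter $\sigma$
$$f(x) = \frac{1}{\sigma x\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{\ln(x)-\mu}{\sigma}\right)^{2}} \qquad (A2)$$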
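Gamma distribution with shape parameter $\alpha$ and scale parameter $\beta$
$$f(x) = \frac{x^{\alpha-1}\, e^{-x/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)} \qquad (A3)$$
Exponential distribution with scale parameter $\lambda$
$$F(x) = 1 - e^{-x/\lambda} \qquad (A4)$$
Gev distribution with location parameter $\mu$, scale parameter $\sigma$ and shape parameter $\xi$
$$F(x) = \exp\left\{-\left[1 - \xi\,\frac{x-\mu}{\sigma}\right]^{1/\xi}\right\}, \qquad 1 - \xi\,\frac{x-\mu}{\sigma} > 0 \qquad (A5)$$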
The special cases of the Gev used in this work, the Weibull and Gumbel
distributions, are as follows:
Weibull (III) distribution with location parameter , scale
parameter and shape parameter
F(x) = e
F(x) = 1 e
x
(A6)
(A7)
Notation
. = Expected value
h = Weighting parameter
nm = Number of canonical modes
nt = Number of observations (realizations) of Y and Z fields
ny = Number of spatial points in Y predictor field
nz = Number of spatial points in Z predictand field
t = Time index
w = Average prediction error; a standard deviation of discrepancies (remainders) between predictand Z and predictions $\hat{Z}$
xP = Return level for return period 1/P units of time
y = Location index in Y
z = Location index in Z
I = Identity matrix
F(i)(x) = Best fit distribution function for ith bootstrap replication
Q (nz × nm) = Normalized eigenvectors of CCA system matrix with Z swapped for Y
R (ny × nm) = Eigenvectors of CCA system matrix, each scaled to unit length
S (nm × nz) = Matrix of regression coefficients relating canonical predictor amplitudes U to points in predictand Z, cf. CCA description
U (nt × nm) = Canonical predictor field, each row has a unit variance
V (nt × nm) = Canonical predictand field
X*i = ith bootstrap replication of original data sample X
Y (nt × ny) = Predictor field
Z (nt × nz) = Predictand field
$\hat{Z}$ (nt × nz) = Predictions of Z with Y
(nm) = Canonical correlations in CCA analysis
References
Chambers, J.M., Cleveland, W.S., Kleiner, B., Tukey, P.A.
(1983). Graphical Methods for Data Analysis, Duxbury,
Boston MA.
Coles, S.G., Tawn, J.A. (1994). Statistical Methods for Multivariate Extremes: An Application to Structural Design (with
discussion). Appl. Statistics 43, 1–48.
Cyberski, J., Wróblewski, A. (2000). Riverine Water Inflows and
the Baltic Water Volume 1901–1990. Hydrology and Earth
System Sciences 4(1), 1–11.
Efron, B. (1979). Bootstrap Methods: Another Look at the
Jackknife. Ann. Statist. 7, 1–26.
Efron, B., Tibshirani, R.J. (1993). An Introduction to the
Bootstrap, Chapman and Hall, New York.
Fisher, R.A., Tippett, L.H.C. (1928). Limiting Forms of the
Frequency Distributions of the Largest or Smallest Member of
a Sample. Proc. Camb. Phil. Soc. 24, 180–190.
Graham, N.E. (1990). Canonical Correlation Analysis. World
Meteorological Organization report. WMO review of climate
diagnostic models.
Green, P.E. (1978). Analyzing Multivariate Data, Holt, Rinehart
& Winston, Hinsdale IL.
Green, P.E., Carroll, J.D. (1978). Mathematical Tools for
Applied Multivariate Analysis, Academic Press, New York.
Hawkes, P.J., Gouldby, B.P., Tawn, J.A., Owen, M.W. (2002).
The Joint Probability of Waves and Water Levels in Coastal
Defence Design. J. Hydraul. Res. 40, 241–251.
Linhart, H., Zucchini, W. (1986). Model Selection, Wiley,
New York.
Ostrowski, R., Pruszak, Z., Szmytkiewicz, M. (2005). Red River
Delta (Vietnam) and Vistula Delta (Poland) – Similarities and
Differences. Proc. Seminar Sediment Transport in Rivers and
Transitional Waters, IBW PAN, Gdansk, pp. 68–72.
Pugh, D.T., Vassie, J.M. (1980). Applications of the Joint Probability Method for Extreme Sea Level Computations. Proc.
ICE, Part 2, 69, 959–975.
Reeve, D.E. (1996). Estimation of Extreme Indian Monsoon
Rainfall. Int. J. Climatology 16, 105–112.
Rozynski, G., Ostrowski, R., Pruszak, Z., Szmytkiewicz, M.,
Skaja, M. (2006). Data-Driven Analysis of Joint Coastal
Extremes Near a Large Non-Tidal Estuary in North Europe.
Estuarine, Coastal and Shelf Science 68(1/2), 317–327.
Tawn, J.A., Vassie, J.M. (1989). Extreme Sea Levels: The Joint
Probabilities Method Revisited and Revised. Proc. ICE, Part
2, 87, 429–442.