DOI 10.1007/s00477-010-0400-5
ORIGINAL PAPER
Jinzhong Yang
Abstract In risk analysis, a complete characterization of the concentration distribution is necessary to determine the probability of exceeding a threshold value. The most popular method for predicting the concentration distribution is Monte Carlo simulation, which samples the cumulative distribution function with a large number of repeated operations. In this paper, we first review the three most commonly used Monte Carlo (MC) techniques: standard Monte Carlo, Latin Hypercube sampling, and Quasi Monte Carlo. The performance of these three MC approaches is investigated. We then apply the stochastic collocation method (SCM) to risk assessment. Unlike MC simulations, the SCM does not require a large number of simulations of the flow and solute transport equations. In particular, the sparse grid collocation method and the probabilistic collocation method are employed to represent the concentration in terms of polynomials and unknown coefficients. The sparse grid collocation method takes advantage of Lagrange interpolation polynomials, while the probabilistic collocation method relies on polynomial chaos expansions. In both methods, the stochastic equations are reduced to a system of decoupled equations, which can be solved with existing solvers and whose results are used to obtain the expansion coefficients. The cumulative distribution function is then obtained by sampling the approximate polynomials. Our synthetic examples show that, among the MC methods, Quasi Monte Carlo gives the smallest variance for the predicted threshold probability owing to its superior convergence property, and that the stochastic collocation method is an accurate and efficient alternative to MC simulations.

Keywords Threshold probability · Monte Carlo simulation · Stochastic collocation · Interpolation

D. Zhang (✉) · H. Chang
Department of Energy and Resources Engineering, College of Engineering, Peking University, 100871 Beijing, China
e-mail: dxz@pku.edu.cn

D. Zhang · L. Shi · H. Chang
The Sonny Astani Department of Civil and Environmental Engineering, University of Southern California, Los Angeles, CA 90089, USA

L. Shi · J. Yang
National Key Laboratory of Water Resources and Hydropower Engineering Science, Wuhan University, 430072 Wuhan, China

1 Introduction

The risk of groundwater contamination has gained extensive attention in the design of underground tanks and landfills. Uncertainties in geological properties, hydrologic parameters, initial plume location, and leakage flux can cause the concentration to be overestimated or underestimated. Moreover, the potential risk associated with contaminant transport is of great importance in the design of an efficient monitoring network (Meyer et al. 1994; Storck et al. 1997; Yenigul et al. 2005; Bierkens 2006). A typical risk analysis addresses the following questions: "What can happen? How likely is it to happen? Given it occurs, what are the consequences?" (Bedford and Cooke 2003). In a groundwater system, the risk assessment consists of two main components: the likelihood of system failure (groundwater contamination) and the (health or economic) consequences of contamination. In particular, when the human health consequence is considered, behavioral and physiological uncertainty should also be incorporated (Maxwell and Kastenberg 1999). In this study, the focus is on the probability of exceeding a certain threshold value, also called the threshold probability, due to geological
972 Stoch Environ Res Risk Assess (2010) 24:971–984
uncertainty in the subsurface environment. The threshold probability at a given location $X$ and time $t$ is defined as $P(X, t) = P\{C(X, t) \ge C_{th}\}$, where $C_{th}$ is a threshold value. It is straightforward to read $P(X, t)$ from the cumulative distribution function (CDF) of $C(X, t)$, since for a continuous distribution $P(X, t) = 1 - F(C_{th})$, where $F$ is that CDF. Because an analytical CDF is difficult to obtain in practical situations, the theoretical CDF is usually approximated by a sample CDF, which converges to the theoretical CDF as the sample size grows to infinity. This sampling method is called Monte Carlo simulation. In practice, Monte Carlo simulation is the most popular method for coping with uncertainties in risk assessment (Vose 1996; Hamed and Bedient 1999; Kentel and Aral 2005). For example, the GMS software (Environmental Modeling Research Laboratory 2005) makes use of standard Monte Carlo and Latin Hypercube approaches, whereas all the random sources in GMS are treated as random variables. In the framework of standard Monte Carlo, for each simulation a random number is generated for each parameter according to the specified distribution using the mean, standard deviation, maximum, and minimum. As an alternative, the Latin Hypercube approach divides the probability distribution curve into several sub-intervals of equal probability, and the random numbers are selected within each sub-interval. Unlike the work of GMS, most Monte Carlo simulations introduce the concept of a random field to characterize the spatial heterogeneity of some properties (e.g., hydraulic conductivity) (Meyer et al. 1994; Yenigul et al. 2005).

Although several other numerical methods are available to estimate the uncertainty (and the accompanying threshold probability) in contaminant transport, most existing works focus on the first two moments, i.e., the mean concentration and the local variance (Zhang and Neuman 1996; Dagan and Fiori 1997; Fiori et al. 2002; Caroni and Fiorotto 2005). However, owing to the complexity of subsurface flow and solute transport, Goovaerts (1997) stressed the necessity of using the local probability distribution function. Under certain restrictions, it is possible to derive analytical concentration CDFs. Dagan (1982) found that, in the absence of pore-scale dispersion, the point concentration at a given location has a binary CDF at early times. Bellin et al. (1994) showed that, in a small sampling volume, the point concentration CDF follows a bimodal distribution, whereas it approaches the normal distribution as the sampling volume grows. Other work (Fiori 2001; Fiorotto and Caroni 2002; Caroni and Fiorotto 2005) illustrated the applicability of the Beta distribution in representing the variability of local concentration. Bellin and Tonina (2007) further gave a physical interpretation for the choice of the Beta distribution and validated their arguments with the Cape Cod tracer test data (Leblanc et al. 1991). However, all these conclusions were obtained by introducing only the spatial variability of hydraulic conductivity. Is the Beta distribution still a good model for characterizing the local uncertainty subject to multiple random sources? Is it possible to describe the probability of exceeding a given threshold with a single distribution, irrespective of the observation location and time? These questions have not been addressed.

Recently, the stochastic collocation technique has become increasingly popular in the computational community (Tatang et al. 1997; Xiu and Hesthaven 2005; Ganapathysubramanian and Zabaras 2006; Babuska et al. 2007; Foo et al. 2007). The stochastic collocation method is essentially an ensemble-based approach that is readily compatible with complicated conditions. Unlike the moment equation method (Zhang 2002), the stochastic spectral finite element method (Ghanem and Spanos 1991), and the KLME approach (Zhang and Lu 2004; Shi et al. 2008), the stochastic collocation method leads naturally to a decoupled system without any modification to the original partial differential equation. In this work, we apply the stochastic collocation method to describe the threshold probability. Because the concentration CDF is the key ingredient for risk assessment, we examine the performance of our model in sampling the CDF rather than only the mean and variance. Since the stochastic collocation technique performs the sampling in its postprocessing, it is not necessary to assign a distribution to the contaminant plume a priori.

Despite the attractiveness of the random field concept, only random variables are considered in this work; this is the special case of a random field with infinite correlation length. The work can be extended to random fields with the aid of Karhunen–Loeve expansions or other techniques (e.g., Li and Zhang 2007; Chang and Zhang 2009). We explore the ability of the stochastic collocation method to handle multiple random sources, and we present a detailed comparison between different Monte Carlo methods and the stochastic collocation method.

The remainder of the paper is organized as follows. In Sect. 2, we review Monte Carlo methods, including standard Monte Carlo simulation, Latin Hypercube Monte Carlo, and Quasi Monte Carlo. In Sect. 3, we treat the threshold probability with the stochastic collocation method. In Sect. 4, the performance of the proposed method is examined by numerical experiments and compared with Monte Carlo simulations. Concluding remarks are presented in Sect. 5.

2 Monte Carlo simulations

For completeness, we present a brief review of three well-studied Monte Carlo methods, namely standard Monte Carlo, Latin Hypercube sampling, and Quasi Monte Carlo. The convergence properties of these three methods are also reviewed to determine their applicability. Here we consider an $N$-dimensional stochastic problem (with $N$ independent random variables). The first step of all Monte Carlo simulations is
to construct sampling points $\xi_1, \ldots, \xi_M$, where $M$ is the sample size (the total number of realizations) and each vector $\xi_i$ has $N$ independent components $\xi_{i1}, \ldots, \xi_{iN}$. For consistency with most existing literature, we focus on sampling random variables that are uniformly distributed on $[0, 1]^N$. Random numbers for other distributions can be constructed by a probability distribution transformation; transformation rules for several distributions commonly used in engineering can be found in Johnson (1994).

2.1 Standard Monte Carlo (SMC) simulation

The basic idea of standard Monte Carlo simulation is to randomly draw samples from the uniform distribution on $[0, 1]^N$. The sampling process is trivial, and its convergence rate is proven to be $M^{-1/2}$ (regardless of the dimension $N$). In many cases, the sample size $M$ needed to obtain a reasonably small error becomes prohibitively large. This motivates the exploration of improved techniques that can yield the same error with less computational effort. Several other techniques have been developed in the literature, notably importance sampling (e.g., Lu and Zhang 2003), Latin Hypercube sampling (McKay et al. 1979), and Quasi Monte Carlo (Niederreiter 1992; William and Russel 1994). Note that these techniques are devised to generate $N$-dimensional random vectors that are uniformly distributed on $[0, 1]^N$ and have independent components.

2.2 Latin Hypercube sampling (LHS)

Latin Hypercube sampling (LHS) is a stratified sampling method and belongs to the variance reduction techniques, which reduce the variance of the estimated solution (Homem-de-Mello 2008). The basic idea of LHS is to partition the sample space into subspaces of equal probability and to generate samples in each subspace. Suppose we want to draw $M$ samples of a vector $\xi$ with $N$ independent components $\xi_1, \ldots, \xi_N$, each of which has a uniform distribution $U[0, 1]$. LHS then performs two steps for each dimension $i = 1, \ldots, N$:

1. Divide the region into $M$ sub-regions and generate
$$s_1 \sim U\left[0, \frac{1}{M}\right],\quad s_2 \sim U\left[\frac{1}{M}, \frac{2}{M}\right],\quad \ldots,\quad s_M \sim U\left[\frac{M-1}{M}, 1\right].$$
2. Set $\xi_{ij} = s_{P(j)}$ $(i = 1, \ldots, N;\ j = 1, \ldots, M)$, where $P$ is a random permutation of $1, \ldots, M$.

Although LHS does reduce the variance of the estimated solution (mean), it does not improve the convergence rate: it also converges at the rate $M^{-1/2}$, the same as standard Monte Carlo simulation.

2.3 Quasi Monte Carlo (QMC)

QMC is based on low-discrepancy sequences (LDS) (Morokoff and Caflisch 1994; Morokoff and Caflisch 1995). Low-discrepancy sequences include the Halton, Faure, Sobol, and Niederreiter sequences, all of which are based on the Van der Corput sequence. The Van der Corput sequence uses a prime number $b$ as its base and expresses any natural number $k$ as

$$k = d_j b^{j} + d_{j-1} b^{j-1} + \cdots + d_1 b + d_0 \qquad (1)$$

where $d_i \in \{0, 1, \ldots, b-1\}$ for $i = 0, 1, \ldots, j$. For example, if $b = 2$, the digits $d_j d_{j-1} \cdots d_0$ form the binary representation of $k$. Then we have

$$\phi_b(k) = \frac{d_0}{b} + \frac{d_1}{b^{2}} + \cdots + \frac{d_j}{b^{j+1}} \qquad (2)$$

In this way, each seed $k$ yields a different $\phi_b(k)$, and these values form a Van der Corput sequence (see Table 1, where $b = 2$). In this work, we use the Sobol sequence (Zsolt and Andras 2004), which reorders the Van der Corput sequence in each dimension, and we make sure that each column is independent of the others. The numerical implementation of QMC is quite similar to LHS, except that in step 1 QMC generates the random numbers by (2). We refer the reader to Homem-de-Mello (2008) for a thorough discussion of the convergence property of QMC.

3 Stochastic collocation methods

Although Monte Carlo simulation is straightforward to apply, since it only requires repetitive execution of deterministic solvers at each sampling point, the statistics typically converge at relatively slow rates. A class of stochastic collocation methods has been proposed (Xiu and Hesthaven 2005; Xiu 2007; Babuska et al. 2007; Nobile et al. 2008). By using existing theory on multivariate polynomial interpolation, stochastic collocation methods generally achieve fast convergence rates. Moreover, the numerical implementation of stochastic collocation methods is also straightforward, because existing codes for deterministic problems can be employed at each interpolation point. Stochastic collocation methods are classified by the way they generate collocation (interpolation) points. There are four routine methods available to generate the collocation (interpolation) points, including the full tensor product method (Tatang et al. 1997; Babuska et al. 2007), sparse

Table 1 Generation of the Van der Corput sequence (b = 2)
k                       1      2      3      4      5      6      7
d_j d_{j-1} ... d_0     1      10     11     100    101    110    111
φ_b(k)                  0.5    0.25   0.75   0.125  0.625  0.375  0.875
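The sampling schemes of Sect. 2 can be sketched in a few lines of Python with NumPy. This is an illustration under stated assumptions, not the implementation used in this study: `van_der_corput` follows Eqs. 1 and 2 and reproduces Table 1, `lhs` follows the two LHS steps of Sect. 2.2, and `halton` builds a Halton sequence (one prime base per dimension) as a simpler stand-in for the Sobol sequence used in this work. The response `g` and the threshold `c_th` in the demo are hypothetical.

```python
import numpy as np


def van_der_corput(k: int, b: int = 2) -> float:
    """Radical inverse of k in base b (Eq. 2): phi_b(k) = sum_i d_i / b**(i + 1)."""
    phi, denom = 0.0, 1.0
    while k > 0:
        k, d = divmod(k, b)  # strip the next digit d_0, d_1, ... of Eq. 1
        denom *= b
        phi += d / denom
    return phi


def smc(M: int, N: int, rng: np.random.Generator) -> np.ndarray:
    """Standard Monte Carlo: M independent points, uniform on [0, 1]^N."""
    return rng.random((M, N))


def lhs(M: int, N: int, rng: np.random.Generator) -> np.ndarray:
    """Latin Hypercube sampling with the two steps of Sect. 2.2."""
    out = np.empty((M, N))
    for i in range(N):
        # Step 1: one uniform draw inside each equal-probability stratum [j/M, (j + 1)/M).
        s = (np.arange(M) + rng.random(M)) / M
        # Step 2: assign strata to samples with an independent random permutation P.
        out[:, i] = s[rng.permutation(M)]
    return out


PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29)


def halton(M: int, N: int) -> np.ndarray:
    """Halton points: column i is the Van der Corput sequence in the i-th prime base.

    Assumption: used here as a simple stand-in for the Sobol sequence of Sect. 2.3.
    """
    if N > len(PRIMES):
        raise ValueError("add more prime bases for N > 10")
    return np.array([[van_der_corput(k, PRIMES[i]) for i in range(N)]
                     for k in range(1, M + 1)])


def threshold_probability(points: np.ndarray, g, c_th: float) -> float:
    """Sample estimate of P{g(xi) >= c_th} over the given points."""
    return float(np.mean(g(points) >= c_th))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g = lambda u: u.sum(axis=1)  # hypothetical response: sum of N = 3 uniform inputs
    # Table 1: [0.5, 0.25, 0.75, 0.125, 0.625, 0.375, 0.875]
    print("Table 1:", [van_der_corput(k) for k in range(1, 8)])
    for name, pts in [("SMC", smc(100, 3, rng)), ("LHS", lhs(100, 3, rng)),
                      ("QMC (Halton)", halton(100, 3))]:
        print(name, threshold_probability(pts, g, c_th=2.0))
```

For this hypothetical response (the sum of three independent U[0, 1] inputs) the exact threshold probability for c_th = 2 is 1/6, so rerunning the demo with different seeds or sample sizes gives a quick, if crude, way to compare the spread of the SMC and LHS estimates with the deterministic QMC estimate.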
After determining the coefficients in Eq. 9, it is quite easy to sample the possible realizations of $u$. By expanding Eq. 9, the level-2 ($k = 2$) sampling equation is given as

$$
\begin{aligned}
\tilde{u}_{k=2,N}(\xi) &= \tilde{u}_{k=2,N}(\xi_1, \ldots, \xi_N) \\
&= \frac{(N-1)(N-2)}{2}\, u(0, \ldots, 0) \\
&\quad - (N-1) \sum_{i=1}^{N} \sum_{k=1}^{3} \prod_{l=1, l \ne k}^{3} \frac{\xi_i - \xi_i^l}{\xi_i^k - \xi_i^l}\, u(0, \ldots, 0, \xi_i^k, 0, \ldots, 0) \\
&\quad + \sum_{i=1}^{N} \sum_{m=1}^{5} \prod_{n=1, n \ne m}^{5} \frac{\xi_i - \xi_i^n}{\xi_i^m - \xi_i^n}\, u(0, \ldots, 0, \xi_i^m, 0, \ldots, 0) \\
&\quad + \sum_{i=1}^{N} \sum_{k=1}^{3} \sum_{j=1, j>i}^{N} \sum_{r=1}^{3} \prod_{l=1, l \ne k}^{3} \prod_{s=1, s \ne r}^{3} \frac{\xi_i - \xi_i^l}{\xi_i^k - \xi_i^l}\, \frac{\xi_j - \xi_j^s}{\xi_j^r - \xi_j^s}\, u(0, \ldots, 0, \xi_i^k, 0, \ldots, 0, \xi_j^r, 0, \ldots, 0)
\end{aligned}
\qquad (11)
$$

where the $u(\cdot)$ are the coefficients for the corresponding collocation points, $\xi_i^k$ ($k = 1, 2, 3$; $i = 1, \ldots, N$) are the roots of $H_3(\xi_i) = \xi_i^3 - 3\xi_i$, and $\xi_i^m$ ($m = 1, \ldots, 5$) are the roots of $H_5(\xi_i) = \xi_i^5 - 10\xi_i^3 + 15\xi_i$. Similarly, the level-3 ($k = 3$) sampling equation can be written as

$$
\begin{aligned}
\tilde{u}_{k=3,N}(\xi) &= \tilde{u}_{k=3,N}(\xi_1, \ldots, \xi_N) \\
&= -\frac{(N-1)(N-2)(N-3)}{6}\, u(0, \ldots, 0) \\
&\quad + \frac{(N-1)(N-2)}{2} \sum_{i=1}^{N} \sum_{k=1}^{3} \prod_{l=1, l \ne k}^{3} \frac{\xi_i - \eta_i^l}{\eta_i^k - \eta_i^l}\, u(0, \ldots, 0, \eta_i^k, 0, \ldots, 0) \\
&\quad - (N-1) \sum_{i=1}^{N} \sum_{k=1}^{5} \prod_{l=1, l \ne k}^{5} \frac{\xi_i - \zeta_i^l}{\zeta_i^k - \zeta_i^l}\, u(0, \ldots, 0, \zeta_i^k, 0, \ldots, 0) \\
&\quad - (N-1) \sum_{i=1}^{N-1} \sum_{k=1}^{3} \sum_{j=1, j>i}^{N} \sum_{r=1}^{3} \prod_{l=1, l \ne k}^{3} \prod_{s=1, s \ne r}^{3} \frac{\xi_i - \eta_i^l}{\eta_i^k - \eta_i^l}\, \frac{\xi_j - \eta_j^s}{\eta_j^r - \eta_j^s}\, u(0, \ldots, 0, \eta_i^k, 0, \ldots, 0, \eta_j^r, 0, \ldots, 0) \\
&\quad + \sum_{i=1}^{N} \sum_{m=1}^{9} \prod_{n=1, n \ne m}^{9} \frac{\xi_i - \rho_i^n}{\rho_i^m - \rho_i^n}\, u(0, \ldots, 0, \rho_i^m, 0, \ldots, 0) \\
&\quad + \sum_{i=1}^{N} \sum_{k=1}^{3} \sum_{j=1, j \ne i}^{N} \sum_{r=1}^{5} \prod_{l=1, l \ne k}^{3} \prod_{s=1, s \ne r}^{5} \frac{\xi_i - \eta_i^l}{\eta_i^k - \eta_i^l}\, \frac{\xi_j - \zeta_j^s}{\zeta_j^r - \zeta_j^s}\, u(0, \ldots, 0, \eta_i^k, 0, \ldots, 0, \zeta_j^r, 0, \ldots, 0) \\
&\quad + \sum_{i=1}^{N-2} \sum_{k=1}^{3} \sum_{j=1, j>i}^{N-1} \sum_{r=1}^{3} \sum_{p=1, p>j}^{N} \sum_{w=1}^{3} \prod_{l=1, l \ne k}^{3} \prod_{s=1, s \ne r}^{3} \prod_{v=1, v \ne w}^{3} \frac{\xi_i - \eta_i^l}{\eta_i^k - \eta_i^l}\, \frac{\xi_j - \eta_j^s}{\eta_j^r - \eta_j^s}\, \frac{\xi_p - \eta_p^v}{\eta_p^w - \eta_p^v}\, u(0, \ldots, 0, \eta_i^k, 0, \ldots, 0, \eta_j^r, 0, \ldots, 0, \eta_p^w, 0, \ldots, 0)
\end{aligned}
\qquad (12)
$$

where the $u(\cdot)$ are coefficients; $\eta_i^k$ ($k = 1, \ldots, 3$; $i = 1, \ldots, N$) are the roots of $H_3(\eta_i) = \eta_i^3 - 3\eta_i$; $\zeta_i^k$ ($k = 1, \ldots, 5$; $i = 1, \ldots, N$) are the roots of $H_5(\zeta_i) = \zeta_i^5 - 10\zeta_i^3 + 15\zeta_i$; and $\rho_i^k$ ($k = 1, \ldots, 9$; $i = 1, \ldots, N$) are the roots of $H_9(\rho_i) = \rho_i^9 - 36\rho_i^7 + 378\rho_i^5 - 1260\rho_i^3 + 945\rho_i$.

By choosing nested points (i.e., $\theta(k, N) \subset \theta(k+1, N)$), it is possible to extend the sampling equation from level $k$ to $k + 1$. The interpolation expression is rewritten in the following nested form:

$$P^{u}_{k,N}(\xi) = P^{u}_{k-1,N}(\xi) + \sum_{|\mathbf{i}| = N+k} \Delta^{i_1} \otimes \cdots \otimes \Delta^{i_N} \qquad (13)$$

where $P^{u}_{0} = 0$ and $\Delta^{i} = P^{u}_{i} - P^{u}_{i-1}$. Thus, to go from level-2 to level-3 sampling in $N$ dimensions, one only has to add the terms related to points that are unique to level 3:

$$\tilde{u}_{k=3,N}(\xi) = \tilde{u}_{k=2,N}(\xi) + \sum_{|\mathbf{i}| = N+3} \Delta^{i_1} \otimes \cdots \otimes \Delta^{i_N} \qquad (14)$$

The main computational cost of the stochastic collocation method therefore lies in computing the coefficients of the interpolation polynomials; sampling with (11)–(14) only requires interpolation operations. Once the coefficients of the polynomials are known, concentration realizations can be obtained without solving stochastic partial differential equations. Standard sampling, Latin Hypercube sampling, and the Quasi Monte Carlo technique can all be used to sample the polynomials. We note that a one-dimensional Lagrange interpolating polynomial through $n$ points has degree at most $n - 1$. When high-degree polynomials are used, strong oscillation may occur between two interpolation points (although a perfect fit is observed at the interpolation points). The oscillation becomes strong when the interpolation point is close to the ends of the probability distribution interval, resulting in a polynomial that oscillates above and below the true function. This divergence between the interpolated function and the interpolating polynomial is known as Runge's phenomenon (Berrut and Trefethen 2004); it is illustrated in the next section. We also emphasize that higher-order interpolation is sensitive to the coefficients, so stringent numerical precision (double precision) is necessary to alleviate the interpolation error.

3.2 Probabilistic collocation method

With the same operation as Eq. 5, the probabilistic collocation method also leads to a series of decoupled deterministic partial differential equations. The probabilistic collocation method utilizes the finite-dimensional polynomial chaos expansion in the following form (Li and Zhang 2007; Huang et al. 2007; Shi et al. 2009),
$$
\begin{aligned}
u(x, t; \xi) &= a_0(x, t)\,\Gamma_0 + \sum_{i_1=1}^{N} a_{i_1}(x, t)\,\Gamma_1(\xi_{i_1}) + \sum_{i_1=1}^{N} \sum_{i_2=1}^{i_1} a_{i_1 i_2}(x, t)\,\Gamma_2(\xi_{i_1}, \xi_{i_2}) \\
&\quad + \sum_{i_1=1}^{N} \sum_{i_2=1}^{i_1} \sum_{i_3=1}^{i_2} a_{i_1 i_2 i_3}(x, t)\,\Gamma_3(\xi_{i_1}, \xi_{i_2}, \xi_{i_3}) + \cdots
\end{aligned}
\qquad (15)
$$

where $\Gamma_p(\xi_{i_1}, \ldots, \xi_{i_p})$ denotes a polynomial of order $p$, $\{\xi_{i_p}\}$ are random variables, and $a_{i_1 \ldots i_p}$ are coefficients. For a stochastic system with Gaussian random variables, the polynomials are Hermite polynomials in terms of multi-dimensional Gaussian random variables (Ghanem and Spanos 1991). The generalized polynomial chaos provides a more efficient way to represent non-Gaussian problems (Xiu and Karniadakis 2002). We highlight that, whereas the polynomial chaos expansion is often used in the literature to represent a random process or field, in this work it is employed to represent a smooth function of the random variables $\{\xi_{i_p}\}$. Formula (15) is expected to approximate any functional in $L_2$ and to converge in the $L_2$ sense (Cameron and Martin 1947).

Once the stochastic problem is cast as a Gaussian system (by transforming all the variables to Gaussian), the probabilistic collocation method selects points from combinations of the roots of the Hermite polynomial one order higher than the order of the orthogonal polynomial. If the polynomial order is $p$, then at most $(p + 1)^N$ points are available from tensor products. The total number of coefficients in the polynomial chaos of order $p$ and dimension $N$ is $M = (N + p)!/(N!\,p!)$, which is usually much smaller than $(p + 1)^N$. The probabilistic collocation method selects $M$ points by keeping as many variables of high probability as possible (Tatang et al. 1997). For clarity, a three-dimensional second-order polynomial chaos is shown here:

$$u(x, t; \xi) = a_0 + a_1 \xi_1 + a_2 \xi_2 + a_3 \xi_3 + a_{11}(\xi_1^2 - 1) + a_{12}\,\xi_1 \xi_2 + a_{13}\,\xi_1 \xi_3 + a_{22}(\xi_2^2 - 1) + a_{23}\,\xi_2 \xi_3 + a_{33}(\xi_3^2 - 1) \qquad (16)$$

The collocation points are then the combinations of the three roots $(-\sqrt{3}, 0, \sqrt{3})$ of $\Gamma_3(\xi_i) = \xi_i^3 - 3\xi_i$. We rank these roots in order of decreasing probability, $(0, \sqrt{3}, -\sqrt{3})$, since 0 has the highest probability for a standard Gaussian random variable. The first collocation point, $\{0, 0, 0\}$, is selected because it has the highest probability for each random variable, and the other points are then generated similarly. For non-normal (or non-lognormal) inputs, Askey polynomials (Xiu and Karniadakis 2002) have to be introduced, and the collocation points are constructed from combinations of the roots of the Askey polynomials. After selecting the collocation points and solving for the coefficients in (16) sequentially, we can accomplish the sampling by substituting the randomly generated $\{\xi_i\}_{i=1}^{3}$ into (16). We note that, since the number of collocation points depends on the number of coefficients in the polynomial chaos expansion, the probabilistic collocation method may be restricted by the "curse of dimensionality" when $N$ is very large.

4 Numerical examples

In this study, we focus on solute transport by advection and dispersion. Conservative solute transport in 2-D groundwater flow under advection and dispersion is given by

$$\frac{\partial c(x, t)}{\partial t} = \nabla \cdot \left(D_{ij}(x, t)\, \nabla c(x, t)\right) - \nabla \cdot \left(q(x, t)\, c(x, t)\right) \qquad (17)$$

where $c$ is the solute concentration, $D$ is the hydrodynamic dispersion tensor, $q(x, t)$ is the pore water velocity, $x$ is the spatial coordinate, and $t$ is time. The components of the dispersion tensor, $D_{ij}$, are given by

$$D_{ij} = \alpha_T\, |q(x, t)|\, \delta_{ij} + (\alpha_L - \alpha_T) \frac{q_i q_j}{|q(x, t)|} + D_d\, \delta_{ij} \qquad (18)$$

where $\alpha_L$ is the longitudinal dispersivity, $\alpha_T$ is the transverse dispersivity, $|q(x, t)|$ is the magnitude of the pore velocity, $\delta_{ij}$ is the Kronecker delta ($\delta_{ij} = 1$ if $i = j$, and $\delta_{ij} = 0$ if $i \ne j$), and $D_d$ is the molecular diffusion coefficient in free water. The pore velocity is computed by $q(x, t) = u(x, t)/n$, where $n$ is the porosity of the porous medium and $u$ is the Darcy velocity. Steady-state flow in saturated media can be described by

$$\nabla \cdot \left[K(x)\, \nabla h(x)\right] = g(x) \qquad (19)$$

where $K(x)$ is hydraulic conductivity, $h(x)$ is hydraulic head, and $g(x)$ is a source/sink term. In this work, in the absence of detailed field measurements, the conductivity, porosity, and dispersivity are regarded as random variables with known mean, variance, and probability distribution function.

4.1 One-dimensional case

In the one-dimensional case, constant heads are imposed at both boundaries. An instantaneous source with unit concentration (1 mg/l) is released over x = 4.75–5 m at t = 0. All the random parameters are assumed to be log-normal. The mean trends and variances of conductivity, porosity, and dispersivity are given a priori. Simulations are conducted in a mildly heterogeneous formation ($\sigma^2_{\ln K} = 0.1$) with minor log-porosity variance $\sigma^2_{\ln \phi} = 0.01$ and log-dispersivity variance $\sigma^2_{\ln \alpha} = 0.1$. The
corresponding coefficients of variation for conductivity, porosity, and dispersivity are 32, 10, and 32%, respectively. A single realization of the log-conductivity (or log-porosity, log-dispersivity) was generated by

$$Y_i(x) = \langle Y(x) \rangle + \sigma \xi_i \qquad (20)$$

where $\sigma$ is the standard deviation of $Y$, $\langle \cdot \rangle$ denotes the ensemble average, and $\xi_i$ is the random number corresponding to realization $i$. The mean conductivity is shown in Fig. 1. The mean porosity and dispersivity are 0.3 and 0.3 m, respectively. We also define a threshold concentration $C_{th} = 0.02$ mg/l. The flow equation is solved using the finite difference code Modflow (Harbaugh et al. 2000), and the transport equation is solved by MT3DMS (Zheng and Wang 1999).

Fig. 1 The mean conductivity of one-dimensional case (⟨K⟩ versus x/m)

We first compare all the Monte Carlo methods described above. The reference CDFs are obtained numerically with 1000 Monte Carlo simulations, which are found to have achieved convergence. In the following figures, the error bars associated with the reference solution are also plotted. Concentrations are observed at three time levels: t = 10 days, t = 20 days, and t = 30 days. The three Monte Carlo methods are evaluated with regard to their ability to approximate the CDFs; in particular, their performance is investigated with 100 realizations. Figure 2 presents the CDFs at x = 17.5 m from standard Monte Carlo (SMC) simulation, obtained from 50 sets of 100 samples. The simulated CDFs from different sets of 100 samples show mild variation at all time levels. In the following analysis, we only give the results at t = 10 days for Latin Hypercube sampling (LHS) and Quasi Monte Carlo (QMC). Figure 3a shows the estimated CDFs using LHS. The CDFs from different sets of 100 LHS realizations vary within a much narrower range than those from SMC. As pointed out by Homem-de-Mello (2008), LHS indeed reduces the estimation variance of the CDF. A small variation range is also observed in Fig. 3b for Quasi Monte Carlo. Of the three Monte Carlo methods studied, QMC has the smallest estimation variance, while SMC approximates the estimator with a considerably larger variance. Our further convergence analysis shows that QMC converges faster than SMC and LHS: to obtain the convergent solution for this problem, SMC, LHS, and QMC require about 300, 200, and 100 realizations, respectively. We also notice that the concentration CDFs at x = 17.5 m approach a log-normal distribution at late times (Fig. 2).

Figures 4 and 5 show the estimated threshold probability from the three Monte Carlo approaches; the results for LHS and QMC are shown at t = 10 days. Figure 4 shows that 50 sets of 100 standard Monte Carlo samples lead to a series of threshold probability curves that fluctuate around the reference solution. Figure 5 shows that both LHS and QMC perform attractively with 100 realizations, owing to their narrower fluctuations.

We then examine stochastic collocation methods for estimating the threshold probability. After solving for the coefficients in formulas (11) and (12), the interpolation operations are executed with random numbers. For clarity, we take the level-2 SGCM as an illustration; its coefficients and corresponding collocation points are listed in Table 2. Fifty realizations with the level-2 and level-3 interpolation are shown in Figs. 6 and 7, respectively. Most curves in Figs. 6 and 7 have nearly identical shapes, while the level-3 formula improves the interpolation accuracy for some points. For example, for the point $(\xi_1, \xi_2, \xi_3) = (-2.29, 0.63, -0.31)$, the corresponding realization (highlighted by the thick black lines in Figs. 6, 7) from the level-3 interpolation is more reasonable than that from the level-2 interpolation. However, even the level-3 interpolation may generate unphysical (negative) concentration values (see Fig. 7). A higher-level interpolation is expected to further reduce the number of negative concentrations. As mentioned before, some abnormal realizations may be generated by Runge's phenomenon. For example, realization 29 (highlighted by the thick red lines in Figs. 6, 7) is constructed from the random numbers $(\xi_1, \xi_2, \xi_3) = (1.26, 3.28, -1.82)$, each of which is located in a tail region of N(0, 1). We also note that these strongly abnormal realizations appear with very low probability. Figure 8 shows 2000 concentration realizations at x = 27.25 m; only two strongly abnormal values are generated among these 2000 realizations. The negative concentrations caused by an insufficient interpolation level and by Runge's phenomenon may appear in the CDF when low-probability (such as 0.01%) concentrations are of concern. We suggest using a large sample size (such as 10,000) to reduce the effect of negative concentrations on the CDF curve, although negative concentrations generated by interpolation cannot be avoided.

Unlike Monte Carlo simulation, these realizations are obtained from ordinary interpolation. The statistical properties of the obtained realizations can be examined by comparing their sample moments with the theoretical moments provided by
Xiu and Hesthaven (2005). The estimated CDFs at x = 17.5 m at t = 10 days from SGCM and PCM are shown in Fig. 9. Except for the second-order PCM (Fig. 9b), the other three match the reference CDF very well. The threshold probabilities simulated by the stochastic collocation methods are given in Fig. 10. The level-2 SGCM (31 runs) appears sufficient to obtain reasonable results, while the level-3 SGCM (111 runs) improves the accuracy further. The second-order PCM (p = 2, 10 runs; Fig. 10b) does not match the reference solution well, while the fourth-order PCM (p = 4, 35 runs) significantly improves the agreement. Compared with the results from 100 runs of Monte Carlo simulation (Figs. 4a, 5), the level-3 SGCM (Fig. 10a) yields more stable results at a similar computational cost. The same conclusion is obtained when comparing the level-2 SGCM with 31 Monte Carlo runs (no figure presented).

The numbers of collocation points as functions of the random dimensionality N for the second-order PCM, fourth-order PCM, level-2 SGCM, and level-3 SGCM are given in Table 3. The total number of collocation points for the probabilistic collocation method is M = (p + N)!/(N!p!). The number of collocation points for the sparse grid collocation method depends on the type of points (Gaussian abscissas in this study). The number of collocation points of PCM increases relatively slowly with dimensionality N, whereas that of the level-3 SGCM increases rapidly. Comparing all the stochastic collocation methods with respect to their accuracy and computational cost, the fourth-order PCM is preferred over the level-2 and level-3 SGCM. Under small input variances, the fourth-order PCM is expected to provide decent results at a lower computational cost than the level-3 SGCM, while giving better accuracy than the level-2 SGCM. This advantage of the fourth-order PCM is significant for high-dimensional problems (large N). In the two-dimensional case discussed next, we examine this point under mild input variances.
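The collocation-point counts quoted above (31 and 111 runs for the level-2 and level-3 SGCM, 10 and 35 runs for the second- and fourth-order PCM, all with N = 3) and the counts for N = 5 used in Sect. 4.2 (71, 351, and 126) can be reproduced with a short Python sketch. It assumes, as the terms of Eqs. 11 and 12 suggest, that the sparse grid is the Smolyak union of tensor grids built from 1-, 3-, 5-, and 9-point Gauss–Hermite rules (roots of H1, H3, H5, and H9); the helper names are hypothetical.

```python
from functools import lru_cache
from itertools import product
from math import comb

import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Assumed 1-D rule sizes for sparse-grid index 1, 2, 3, 4 (roots of H1, H3, H5, H9),
# matching the 1-, 3-, 5- and 9-point terms that appear in Eqs. 11 and 12.
RULE_SIZE = {1: 1, 2: 3, 3: 5, 4: 9}


@lru_cache(maxsize=None)
def nodes(i: int) -> tuple:
    """Probabilists' Gauss-Hermite abscissas for 1-D index i, rounded so that the
    shared node 0 compares equal across rules (+0.0 also turns -0.0 into 0.0)."""
    x, _ = hermegauss(RULE_SIZE[i])
    return tuple(float(v) + 0.0 for v in np.round(x, 12))


def sgcm_points(N: int, k: int) -> int:
    """Distinct points of the level-k Smolyak grid: union of the tensor grids with
    k + 1 <= |i| <= N + k, the index range that carries nonzero weight."""
    if not 1 <= k <= 3:
        raise ValueError("RULE_SIZE only covers levels 1 to 3")
    pts = set()
    for idx in product(range(1, k + 2), repeat=N):
        if k + 1 <= sum(idx) <= N + k:
            pts.update(product(*(nodes(i) for i in idx)))
    return len(pts)


def pcm_points(N: int, p: int) -> int:
    """Number of PCM collocation points, M = (N + p)! / (N! p!)."""
    return comb(N + p, p)


if __name__ == "__main__":
    print("N  PCM p=2  PCM p=4  SGCM k=2  SGCM k=3")
    for N in range(1, 7):
        print(N, pcm_points(N, 2), pcm_points(N, 4), sgcm_points(N, 2), sgcm_points(N, 3))
```

Building the grid explicitly (rather than using a closed-form count) keeps the sketch honest about node sharing: the Hermite rules share only the node 0, so nearly every node of a higher-order rule adds new points. Printing the table for a range of N shows how the cost of each scheme scales with the number of random variables.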
Stoch Environ Res Risk Assess (2010) 24:971–984 979
4.2 Two-dimensional case

In the two-dimensional case, parts of the left and right sides are assigned constant heads (see Fig. 11, where the constant-head segments are delineated with red lines), while the remaining boundaries are impervious. An instantaneous source with unit concentration (1 mg/l) is released over two cells (x = 7.25–7.5 m and y = 5–5.5 m) at t = 0. The domain of study is a confined aquifer with three facies, and the interfaces between the facies are known. We ignore the spatial heterogeneity within each facies but characterize the conductivity of each facies by one random variable, with a different mean and variance of hydraulic conductivity specified for each lithofacies. The mean log-conductivities for facies I, II and III are 2.3, 3.9 and 1.6, and the corresponding variances are 0.3, 0.5 and 1, respectively. A pumping well is located in region III with a pumping rate of 5 m³/d. Assuming that the conductivities of the three facies, the porosity and the longitudinal dispersivity are mutually independent, we introduce five random variables in this example. We also treat the longitudinal and transverse dispersivities as fully correlated parameters, with $\alpha_T = 0.1\,\alpha_L$. The threshold concentration is $2 \times 10^{-4}$ mg/l.

This problem is solved by SMC, LHS, QMC, SGCM and PCM. We do not present detailed comparisons among the three Monte Carlo approaches here; the numbers of realizations required to obtain a converged threshold probability with SMC, LHS and QMC are about 1000, 700 and 500, respectively. The reference probability distributions at t = 1 day, t = 5 days and t = 10 days (shown in Fig. 12) are computed from 1000 Quasi Monte Carlo simulations. The threshold probability distributions at t = 1 day, t = 5 days and t = 10 days from the level-2 SGCM (71 representations), level-3 SGCM (351 representations) and fourth-order PCM (126 representations) are given in Figs. 13, 14 and 15, respectively. All of these
Fig. 7 50 realizations from interpolating formula (12) (plot of concentration c (mg/l) versus x (m))

c (mg/l)   ξ1        ξ2        ξ3
0.040315   0         0         0
0.017468   1.7321    0         0
0.03401    0         1.7321    0
0.031309   0         0         1.7321
0.010477   -1.7321   0         0
0.039177   0         -1.7321   0
0.051097   0         0         -1.7321
0.003042   2.857     0         0
0.027288   0         2.857     0
0.026664   0         0         2.857
0.001338   -2.857    0         0
0.034631   0         -2.857    0
0.05852    0         0         -2.857
0.025138   1.3556    0         0
0.035894   0         1.3556    0
0.033082   0         0         1.3556
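The nonzero coordinates tabulated above are consistent, to the printed digits, with Gauss-Hermite nodes for standard normal variables: ±√3 ≈ ±1.7321 are the nonzero roots of He_3(x) = x³ - 3x, and ±√(5 - √10) ≈ ±1.3556 and ±√(5 + √10) ≈ ±2.8570 are the nonzero roots of He_5(x) = x⁵ - 10x³ + 15x. The check below is our own sketch, not the authors' procedure; it assumes that the three coordinate columns are standard normal random variables and that each listed point lies on a coordinate axis, as the zeros in the rows suggest.

```python
import math


def hermite_e(n, x):
    """Probabilists' Hermite polynomial He_n(x) via He_{k+1} = x*He_k - k*He_{k-1}."""
    if n == 0:
        return 1.0
    prev, cur = 1.0, x
    for k in range(1, n):
        prev, cur = cur, x * cur - k * prev
    return cur


# Positive roots in closed form: He_3 = x^3 - 3x, He_5 = x^5 - 10x^3 + 15x.
POSITIVE_ROOTS = {
    3: [math.sqrt(3.0)],
    5: [math.sqrt(5.0 - math.sqrt(10.0)), math.sqrt(5.0 + math.sqrt(10.0))],
}


def matches_table(printed, n, digits):
    """True if a (signed) root of He_n rounds to the printed table coordinate."""
    return any(round(sign * r, digits) == printed
               for r in POSITIVE_ROOTS[n] for sign in (1.0, -1.0))


for printed, n, digits in [(1.7321, 3, 4), (-1.7321, 3, 4),
                           (1.3556, 5, 4), (2.857, 5, 3), (-2.857, 5, 3)]:
    print(printed, n, matches_table(printed, n, digits))
```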
Fig. 12 The threshold probability at t = 1 day (a), t = 5 days (b) and t = 10 days (c) from Quasi Monte Carlo (QMC) with 1000 realizations
Fig. 13 The estimated threshold probability at t = 1 day (a), t = 5 days (b) and t = 10 days (c) from the level-2 (k = 2) SGCM (71 representations)
Fig. 14 The estimated threshold probability at t = 1 day (a), t = 5 days (b) and t = 10 days (c) from the level-3 (k = 3) SGCM (351 representations)
Fig. 15 The estimated threshold probability at t = 1 day (a), t = 5 days (b) and t = 10 days (c) from the fourth-order (p = 4) PCM (126 representations)
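The reference threshold probabilities in Fig. 12 come from 1000 Quasi Monte Carlo realizations of the two-dimensional case. A minimal sketch of that workflow, under stated assumptions: a Halton sequence is used as one common QMC choice (the section does not say which low-discrepancy sequence was used), and its points are mapped through the inverse normal CDF to the three facies log-conductivities with the means and variances given above. Porosity and longitudinal dispersivity are omitted because their distributions are not stated in this section. `concentration_model` is a hypothetical placeholder for the flow and transport solve (or for an SGCM/PCM surrogate evaluated at a sample), and the threshold probability is the fraction of realizations whose concentration exceeds 2 × 10⁻⁴ mg/l.

```python
import math
from statistics import NormalDist

# Facies I-III log-conductivity statistics from the two-dimensional case.
LOGK_MEAN = (2.3, 3.9, 1.6)
LOGK_VAR = (0.3, 0.5, 1.0)
THRESHOLD = 2e-4  # mg/l


def radical_inverse(n, base):
    """Van der Corput radical inverse of the integer n >= 1 in the given base."""
    inv, scale = 0.0, 1.0 / base
    while n > 0:
        n, digit = divmod(n, base)
        inv += digit * scale
        scale /= base
    return inv


def halton(n_points, bases):
    """First n_points Halton points (indices 1, 2, ...), one prime base per dimension."""
    return [[radical_inverse(i, b) for b in bases] for i in range(1, n_points + 1)]


def sample_log_conductivity(n_points):
    """Map Halton points in (0, 1)^3 to normal log-conductivities via the inverse CDF."""
    std_normal = NormalDist()
    return [
        [m + math.sqrt(v) * std_normal.inv_cdf(u)
         for m, v, u in zip(LOGK_MEAN, LOGK_VAR, row)]
        for row in halton(n_points, bases=(2, 3, 5))
    ]


def threshold_probability(samples, model, threshold=THRESHOLD):
    """Fraction of realizations whose predicted concentration exceeds the threshold."""
    return sum(1 for y in samples if model(y) > threshold) / len(samples)


def concentration_model(log_k):
    """HYPOTHETICAL placeholder for the flow/transport solve at one location and time."""
    return 1e-4 * math.exp(0.5 * (log_k[0] - LOGK_MEAN[0]))


samples = sample_log_conductivity(1000)
print(f"P(c > {THRESHOLD} mg/l) ~ {threshold_probability(samples, concentration_model):.4f}")
```

Only the sampling step distinguishes the three Monte Carlo variants; replacing `halton` with pseudo-random or Latin hypercube points in (0, 1) would give SMC or LHS estimates with the rest of the code unchanged.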
5. The stochastic collocation method is applicable to problems with non-normal (or non-lognormal) inputs. Because of the generality of Lagrange polynomials, it is straightforward to use the sparse grid collocation method to characterize non-normal inputs. For the probabilistic collocation method, however, the generalized polynomial chaos (Xiu and Karniadakis, 2002) may be needed to handle non-normal inputs.

As expected, the performance of the stochastic collocation methods depends on the number of random sources considered and their respective variances. It is also known that different parameters may have distinct impacts on the total uncertainty. The “small” and “mild” uncertainties used in this study are qualitative rather than quantitative descriptions. It should be noted that when random fields are involved in the threshold probability, additional techniques (such as the Karhunen-Loève expansion) have to be introduced to represent the random fields as combinations of random variables. The Lagrange polynomial interpolation and the polynomial chaos expansion would retain the same form as presented in this paper, although one has to generate independent random vectors (rather than variables) during the interpolation.

Acknowledgements This work is partially supported by the Natural Science Foundation of China (NSFC) under grants 50688901, 40672164 and 0620631. LS Shi would also like to acknowledge the support of the China Scholarship Council through grant 2007101645.
William JM, Russel EC (1994) Quasi-random sequences and their discrepancies. SIAM J Sci Comput 15(6):1251–1279
Xiu D (2007) Numerical integration formulas of degree two. Appl Numer Math 58(10):1515–1520
Xiu D, Hesthaven JS (2005) High-order collocation methods for differential equations with random inputs. SIAM J Sci Comput 27(3):1118–1139
Xiu D, Karniadakis GE (2002) The Wiener-Askey polynomial chaos for stochastic differential equations. SIAM J Sci Comput 24:619–644
Yenigul NB, Elfeki AMM, Gehrels JC, van den Akker C, Hensbergen AT, Dekking FM (2005) Reliability assessment of groundwater monitoring networks at landfill sites. J Hydrol 308:1–17
Zhang D (2002) Stochastic methods for flow in porous media: coping with uncertainties. Academic Press, San Diego
Zhang D, Lu Z (2004) An efficient, high-order perturbation approach for flow in random porous media via Karhunen-Loeve and polynomial expansions. J Comput Phys 194:773–794
Zhang D, Neuman SP (1996) Effect of local dispersion on solute transport in randomly heterogeneous media. Water Resour Res 32(9):2715–2723
Zheng C, Wang PP (1999) MT3DMS: a modular three-dimensional multispecies model for simulation of advection, dispersion and chemical reactions of contaminants in groundwater systems; documentation and user's guide, Contract Rep. SERDP-99-1, U.S. Army Eng Res and Dev Cent, Vicksburg
Zsolt S, Andras P (2004) Alternative sampling methods for estimating multivariate normal probabilities. J Econ 120(2):207–234