
SEQUENTIAL GAUSSIAN SIMULATION

SEMINAR REPORT

Submitted by

Parag Jyoti Dutta


Roll No: 09406802
in partial fulfillment for the award of the degree
of

MASTER OF TECHNOLOGY
IN
GEOEXPLORATION
At

DEPARTMENT OF EARTH SCIENCES


INDIAN INSTITUTE OF TECHNOLOGY BOMBAY
MUMBAI - 400076
NOVEMBER 2009

Seminar

Contents

Chapter 1
Introduction

Chapter 2
Estimation versus Simulation
Reproducing model statistics by simulation
Using the spatial uncertainty model

Chapter 3
Monte-Carlo Simulation
Modeling spatial uncertainty

Chapter 4
The MultiGaussian RF Model
Normal Score Transform

Chapter 5
The Sequential Simulation Genre
Remarks
Implementation

Chapter 6
Sequential Gaussian Simulation
Limitations

Bibliography


Chapter 1

Introduction
Spatial interpolation concerns how to estimate the variable under study at an un-sampled
location given sample observations at nearby locations. This process of estimation (kriging)
aims at computing the minimum error variance (optimal) estimate of the unknown value and the
associated error variance at the unsampled location.
In many applications, however, we are more interested in modeling the uncertainty about the
unknown rather than deriving a single estimate. Uncertainty is modeled through conditional
probability distributions. The distribution function $F(x; z \mid (n)) = \text{Prob}\{Z(x) \le z \mid (n)\}$, made
conditional to the information available $(n)$, fully models that uncertainty in the sense that
probability intervals can be derived, such as $\text{Prob}\{Z(x) \in (a, b] \mid (n)\} = F(x; b \mid (n)) - F(x; a \mid (n))$.
It is worth noting that these probability intervals are independent of any particular estimate z*(x) of
the unknown value z(x). Indeed uncertainty depends on information available (n), and not on the
particular optimality criterion retained to define an estimate. Such a model of local uncertainty
allows one to evaluate the risk involved in any decision-making process, such as delineation of
rich zones of mineralization where a drill-core sampling programme needs to be planned. From
the model of uncertainty, one can also derive estimates optimal for different criteria, customized
to the specific problem at hand, instead of retaining the least-squares error (kriged) estimate.
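
As a minimal sketch of how a ccdf yields probability intervals, assume the ccdf at some location is Gaussian. The mean 2.0, the standard deviation 0.5 and the threshold 3.0 are hypothetical values chosen for illustration; they do not come from the report.

```python
from statistics import NormalDist


def interval_probability(ccdf, a, b):
    """Prob{Z(x) in (a, b] | (n)} = F(x; b | (n)) - F(x; a | (n))."""
    return ccdf(b) - ccdf(a)


# Hypothetical local ccdf: Gaussian with mean 2.0 and standard deviation 0.5.
local = NormalDist(mu=2.0, sigma=0.5)

# Probability that the unknown value lies within one standard deviation of the mean.
p_interval = interval_probability(local.cdf, 1.5, 2.5)

# Probability of exceeding a (hypothetical) threshold of 3.0.
p_exceed = 1.0 - local.cdf(3.0)
```

Neither probability requires choosing an estimate z*(x) first, which is the point made above: the interval depends only on the ccdf.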
Each conditional cumulative distribution function (ccdf) $F(x; z \mid (n))$ provides a measure of local
uncertainty relating to a specific location x. However, a series of single-point ccdfs does not provide
any measure of multiple-point or spatial uncertainty, such as the probability that a string of
locations jointly exceeds a given threshold value. Most applications require a measure of the joint
uncertainty about attribute values at several locations taken together. Such spatial uncertainty is
modeled by generating a set of multiple equiprobable realizations $\{z^{(l)}(x), x \in A\}$, $l = 1, 2, \ldots, L$,
of the joint distribution of attribute values in space, a process known as stochastic simulation.
The set of alternative realizations provides a visual and quantitative measure (a model) of
spatial uncertainty. All of these realizations reasonably match the same sample statistics and
exactly match the conditioning data. Each realization reproduces the variability of the input data
in the multivariate sense and is hence said to represent the geological texture or true spatial variability
of the phenomenon.

Chapter 2

Estimation versus Simulation


The objective of estimation is to provide, at each point x, an estimator z*(x) which is as close as
possible to the true unknown value of the attribute $z_0(x)$. The criteria for measuring the quality of
estimation are unbiasedness and minimal estimation variance $E\{[Z(x) - Z^*(x)]^2\}$. There is no
reason, however, for these estimators to reproduce the spatial variability of the true values $\{z_0(x)\}$.
In the case of kriging, for instance, the minimization of the estimation variance involves a
smoothing of the true dispersions. Typically, small values are overestimated, whereas large
values are underestimated. Another drawback of estimation is that the smoothing is not uniform.
Rather, it depends on the local data configuration: smoothing is minimal close to the data
locations and increases as the location being estimated gets farther away from the data
locations. A map of kriging estimates appears more variable in densely sampled areas than in
sparsely sampled areas.
On the other hand, the simulation $\{z^{(l)}(x)\}$, with l denoting the lth realization, has the same first
two experimentally found moments (mean and covariance/variogram), as well as the same histogram,
as the true values $\{z_0(x)\}$, i.e., it identifies the main dispersion characteristics of these true values.
However, at each point x, the simulated value $z^{(l)}(x)$ is not the best possible estimator of $z_0(x)$. In
the case of conditional simulation, in particular, the estimation variance of $z_0(x)$ by the conditionally
simulated value $z_c^{(l)}(x)$ is exactly twice the kriging variance.
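
The factor of two can be seen from a short derivation, sketched here under an assumption the report does not state explicitly: given the data (n), the true value $Z_0(x)$ and the conditionally simulated value $Z_c^{(l)}(x)$ are treated as independent draws from the same ccdf, whose mean is the kriging estimate and whose variance is the kriging variance $\sigma_K^2(x)$.

$$
\begin{aligned}
E\{[Z_0(x) - Z_c^{(l)}(x)]^2 \mid (n)\}
&= \text{Var}\{Z_0(x) \mid (n)\} + \text{Var}\{Z_c^{(l)}(x) \mid (n)\} + \left[E\{Z_0(x) \mid (n)\} - E\{Z_c^{(l)}(x) \mid (n)\}\right]^2 \\
&= \sigma_K^2(x) + \sigma_K^2(x) + 0 = 2\,\sigma_K^2(x)
\end{aligned}
$$

By comparison, the kriging estimate itself carries an estimation variance of only $\sigma_K^2(x)$, which is why a single simulated value is a poorer local estimator than the kriged value.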
In general, the objectives of simulation and estimation are not compatible. Conditional
simulation is preferred for a better reproduction of the variability of the attribute where too much
information would otherwise be lost through the smoothing effect of kriging. Therefore, we do not
simulate if our purpose is estimation. Estimation is preferable to locate and estimate reserves,
while conditional simulation is preferred to study the dispersion of the characteristics of these
reserves, remembering that in practice the real values are known only at the experimental
points $x_\alpha$. A suite of conditional simulations also provides a measure of uncertainty about the
spatial distribution of the attributes of interest.


Smooth interpolated maps should never be used for applications sensitive to the presence of
extreme values and their patterns of continuity. Let us consider the example of a problem of
assessing groundwater travel-times from a nuclear repository to the surface. A smooth map of
estimated transmissivities would fail to reproduce critical features, such as strings of large or
small values that form flow paths or barriers. The processing of a kriged transmissivity map
through a flow simulator may yield inaccurate travel-times. Similarly, the risk of soil pollution by
heavy metals would be underestimated by a kriged map of metal concentrations that fails to
reproduce clusters of large concentrations above the tolerable maximum.

Reproducing model statistics by simulation


Instead of a map of local best estimates, stochastic simulation generates a map or a realization
of z values over the study area A, say, $\{z^{(l)}(x), x \in A\}$ with l denoting the lth realization, which
reproduces statistics deemed most consequential for the problem in hand. Typical requisites for
such simulated maps are as follows:
1. Data values are honoured at their locations:

$$z^{(l)}(x) = z(x_\alpha) \quad \forall\, x = x_\alpha, \quad \alpha = 1, \ldots, n$$

The realization is then said to be conditional (to the data values).


2. The histogram of simulated values reproduces closely the declustered sample
histogram.
3. The covariance model $C(h)$ or, better, the set of indicator covariance models $C_I(h; z_k)$ for
various thresholds $z_k$ are reproduced.
4. Spatial correlation with a secondary attribute or multiple-point statistics may also be
reproduced.
Figure 1: (a, left) Locations of the 29 sample data taken from a true field; (b, right) sample variogram (broken line) of the 29
data versus the variogram of the true field (more continuous line), plotted against distance.

Figure 2 shows (a) the true field along with the corresponding histogram (b) kriged estimates
based on the 29 data of Figure 1 (smoother than true field); the variance of the kriged estimate
is less than the actual variance, (c, d) two sequential Gaussian simulations constrained to the
29 data; the histograms of the Gaussian simulations are similar to the true field.
Figure 2: (a) True field and histogram, (b) kriging estimates (smoother than true field); notice that the variance of the
kriging estimate is less than the actual variance, (c, d) two sequential Gaussian simulations conditioned to the data;
the histograms of the Gaussian simulations are similar to that of the true field. Each panel pairs an East-North map
with a frequency histogram.

Summary statistics shown on the histogram panels of Figure 2 (2500 values in each panel):

                  (a) True field  (b) Kriging map  (c) Realization  (d) Realization
mean                        2.58             2.75             3.08             2.53
std. dev.                   5.15             2.46             4.83             4.35
coef. of var                2.00             0.90             1.57             1.72
maximum                   102.70            22.75            30.00            30.00
upper quartile              2.56             2.89             3.02             2.52
median                      0.96             1.96             1.37             1.00
lower quartile              0.34             1.48             0.42             0.34
minimum                     0.01             0.09             0.01             0.01


Using the spatial uncertainty model


Generating alternative realizations of the spatial distribution of an attribute is rarely a goal per se.
Rather, these realizations serve as inputs to complex transfer functions such as flow simulators
in reservoir engineering. Flow simulators consider all locations simultaneously rather than one
at a time. The processing of input realizations yields a unique value for each response, for
example, a unique value for the groundwater travel time from one location to another or
remediation cost. The histogram of the L response values, corresponding to those L input
realizations provides a measure of the response uncertainty resulting from our imperfect
knowledge of the distribution of the phenomenon (z) in space. That measure can be used in
subsequent risk analysis and decision-making. In the mining industry, simulations of the spatial
distribution of an attribute can be used for studying the technical and economic effects of
complex mining operations; for instance, complex geometries in underground mining or testing
various mining schedules on several different simulations. Thus simulations provide an
appropriate platform to study any problem relating to variability, for example risk analysis, in a
way that estimates cannot.
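
The workflow described above (L realizations in, L response values out, then a histogram of responses) can be sketched as follows. Everything specific here is an assumption made for illustration: the realizations are stand-in lists of independent lognormal values rather than true conditional simulations, and the transfer function is simply a harmonic mean standing in for a flow simulator.

```python
import random
from statistics import harmonic_mean, quantiles


def response_distribution(realizations, transfer):
    """Apply a transfer function to each realization; return the sorted responses."""
    return sorted(transfer(r) for r in realizations)


rng = random.Random(7)

# Stand-in input: 200 realizations of 50 positive values each.
# These are NOT conditional simulations; they only play that role in the sketch.
realizations = [[rng.lognormvariate(0.0, 1.0) for _ in range(50)]
                for _ in range(200)]

# Hypothetical transfer function: harmonic mean of the values in a realization.
responses = response_distribution(realizations, harmonic_mean)

# Deciles of the response distribution summarize the response uncertainty.
deciles = quantiles(responses, n=10)
p10, p50, p90 = deciles[0], deciles[4], deciles[8]
```

The spread between `p10` and `p90` is the kind of response-uncertainty measure that can feed a subsequent risk analysis.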


Chapter 3

Monte-Carlo Simulation
Let $F(x; z \mid (n))$ be the conditional cumulative distribution function (ccdf) modeling the uncertainty
about the unknown $z_0(x)$ at the point x. Rather than deriving a single estimated value z*(x) from
that ccdf, one may draw from it a series of L simulated values $z^{(l)}(x)$, $l = 1, \ldots, L$. Each value $z^{(l)}(x)$
represents a possible realization of the RV Z(x) modelling the uncertainty at the location x.
The Monte-Carlo simulation proceeds in two steps:
1. A series of L independent random numbers $p^{(l)}$, $l = 1, \ldots, L$, uniformly distributed in [0,1], is
drawn.
2. The lth simulated value $z^{(l)}(x)$ is identified with the $p^{(l)}$-quantile of the ccdf (Fig 3):

$$z^{(l)}(x) = F^{-1}(x; p^{(l)} \mid (n)), \quad l = 1, \ldots, L$$

The L simulated values $z^{(l)}(x)$ are distributed according to the conditional cdf. Indeed,

$$
\begin{aligned}
\text{Prob}\{Z^{(l)}(x) \le z\} &= \text{Prob}\{F^{-1}(x; p^{(l)} \mid (n)) \le z\} && \text{from the previous definition,} \\
&= \text{Prob}\{p^{(l)} \le F(x; z \mid (n))\} && \text{since } F(x; z \mid (n)) \text{ is monotonic increasing,} \\
&= F(x; z \mid (n)) && \text{since the } p^{(l)} \text{ are uniformly distributed in } [0,1].
\end{aligned}
$$

This property of ccdf reproduction allows one to approximate any moment or quantile of the
conditional distribution by the corresponding moment or quantile of the histogram of many
realizations $z^{(l)}(x)$.
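
The two-step Monte-Carlo procedure can be sketched in a few lines. The sketch assumes, purely for illustration, that the ccdf at the location is Gaussian with mean 2.0 and standard deviation 0.5; the function name `monte_carlo_draws` is hypothetical.

```python
import random
from statistics import NormalDist, mean, pstdev


def monte_carlo_draws(inverse_ccdf, n_draws, seed=0):
    """Return n_draws values z = F^{-1}(p), with p drawn uniformly in (0, 1)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        p = rng.random()
        while p == 0.0:          # the quantile function is undefined at p = 0
            p = rng.random()
        draws.append(inverse_ccdf(p))
    return draws


# Hypothetical Gaussian ccdf at location x: mean 2.0, standard deviation 0.5.
ccdf = NormalDist(mu=2.0, sigma=0.5)
z_sim = monte_carlo_draws(ccdf.inv_cdf, n_draws=20000)

# Moments of the simulated values approximate the moments of the ccdf.
sim_mean = mean(z_sim)
sim_sd = pstdev(z_sim)
```

With enough draws, `sim_mean` and `sim_sd` approach the mean and standard deviation of the assumed ccdf, which is the ccdf-reproduction property stated above.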

Figure 3: Monte-Carlo simulation from a conditional cdf $F(x; z \mid (n))$ (vertical axis: $F(x; z \mid (n))$; horizontal axis: z-value, with the simulated value $z^{(l)}(x)$ marked).

Modeling spatial uncertainty


The basic idea is to generate a set of equiprobable realizations of the joint (spatial) distribution
of attribute values at several locations and to use differences among simulated maps as a
measure of uncertainty. Rather than modeling the uncertainty at one location, a set of simulated
maps $\{z^{(l)}(x_j), j = 1, \ldots, N\}$, $l = 1, \ldots, L$, can be generated by sampling the N-variate or N-point
ccdf that models the joint uncertainty at the N locations $x_j$:

$$F(x_1, x_2, \ldots, x_N; z_1, z_2, \ldots, z_N \mid (n)) = \text{Prob}\{Z(x_1) \le z_1, Z(x_2) \le z_2, \ldots, Z(x_N) \le z_N \mid (n)\}$$


Inference of the above N-point conditional cdf requires knowledge of, or stringent hypotheses about,
the spatial law (multi-variate distribution) of the RF Z(x). Ccdfs can be modeled using either a
parametric approach (a model is assumed for the multi-variate distribution) or a non-parametric (indicator)
approach. In the parametric approach, the multi-Gaussian RF model is commonly adopted
because it is one model whose spatial law is fully determined by the z-covariance function; it
underlies several simulation algorithms such as the LU decomposition algorithm, sequential
Gaussian simulation and turning bands simulation. Other Gaussian-related techniques include
truncated Gaussian and pluriGaussian simulation algorithms.
Two shortcomings of the parametric approach are:
1. The spatial uncertainty assessment becomes very complex as the number of grid nodes
increases.
2. It is cumbersome to check in practice the validity of the Gaussian assumption, and data
sparsity prevents us from performing such checks for more than two locations at a time.


Chapter 4

The MultiGaussian RF Model


The spatial law of the RF Z(x) as derived by the assumed model must be congenial enough so
that all ccdfs $F(x; z \mid (n))$, $x \in A$, have the same analytical expression and are fully specified
through a few parameters. The problem of determining the ccdf at location x thus reduces to
that of estimating a few parameters, say, the mean and variance. The multivariate Gaussian RF
model is most widely used because its extremely congenial properties render the inference of
the parameters of the ccdf straightforward. The approach typically requires a prior normal score
transform of data to ensure that at least the univariate distribution (histogram) is normal. The
normal score ccdf then undergoes a back-transform to yield the ccdf of the original variable.
If $\{Y(x), x \in A\}$ is a standard multivariate Gaussian RF with covariance function $C_Y(h)$, then the
following are true (Goovaerts, 1997):
1. All subsets of that RF, e.g., $\{Y(x), x \in D \subset A\}$, are also multivariate normal.
2. The univariate cdf of any linear combination of RV components is normal:
$U = \sum_{\alpha=1}^{n} \omega_\alpha Y(x_\alpha)$ is normally distributed, for any choice of n locations
$x_\alpha \in A$ and any set of weights $\omega_\alpha$.
3. The bivariate distribution of any pair of RVs $Y(x)$ and $Y(x + h)$ is normal and fully
determined by the covariance function $C_Y(h)$.
4. If two RVs $Y(x)$ and $Y(x')$ are uncorrelated, i.e., if $\text{Cov}\{Y(x), Y(x')\} = 0$, they are also
independent.
5. All conditional distributions of any subset of the RF $Y(x)$, given realizations of any other
subsets of it, are (multi-variate) normal. In particular, the conditional distribution of the
single variable $Y(x)$ given the n data $y(x_\alpha)$ is normal and fully characterized by its two
parameters, mean and variance, which are the conditional mean and the conditional
variance of the RV $Y(x)$ given the information (n):

$$G(x; y \mid (n)) = G\left(\frac{y - E\{Y(x) \mid (n)\}}{\sqrt{\text{Var}\{Y(x) \mid (n)\}}}\right)$$

where $G(\cdot)$ is the standard normal cdf.


Under the multiGaussian model, the mean and variance of the ccdf at any location x are
identical to the simple kriging (SK) estimate $y^*_{SK}(x)$ and SK variance $\sigma^2_{SK}(x)$ obtained from the n
data $y(x_\alpha)$ (Journel and Huijbregts, 1978). The ccdf is then modelled as

$$[G(x; y \mid (n))]^*_{SK} = G\left(\frac{y - y^*_{SK}(x)}{\sigma_{SK}(x)}\right)$$

with

$$y^*_{SK}(x) = m(x) + \sum_{\alpha=1}^{n} \lambda^{SK}_\alpha(x)\,[y(x_\alpha) - m(x_\alpha)]$$

$$\sigma^2_{SK}(x) = C(0) - \sum_{\alpha=1}^{n} \lambda^{SK}_\alpha(x)\,C(x_\alpha - x)$$
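
The two simple kriging expressions can be illustrated with a small one-dimensional sketch. It rests on several assumptions that are not in the report: an exponential covariance model with unit sill and practical range 10, a constant zero mean, three made-up normal-score data, and a hand-written linear solver used only to keep the example free of dependencies.

```python
import math


def exp_cov(h, sill=1.0, a=10.0):
    """Hypothetical exponential covariance C(h) = sill * exp(-3|h|/a)."""
    return sill * math.exp(-3.0 * abs(h) / a)


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (M[r][n] - s) / M[r][r]
    return w


def simple_kriging(x0, xs, ys, cov=exp_cov, mean=0.0):
    """Return the SK estimate and SK variance at x0 from data (xs, ys)."""
    K = [[cov(xa - xb) for xb in xs] for xa in xs]   # data-to-data covariances
    k = [cov(xa - x0) for xa in xs]                  # data-to-target covariances
    lam = solve(K, k)                                # SK weights
    est = mean + sum(l * (y - mean) for l, y in zip(lam, ys))
    var = cov(0.0) - sum(l * c for l, c in zip(lam, k))
    return est, var


xs = [1.0, 4.0, 9.0]     # hypothetical data locations
ys = [0.8, -0.3, 1.2]    # hypothetical normal scores
est_at_datum, var_at_datum = simple_kriging(4.0, xs, ys)
est_far, var_far = simple_kriging(100.0, xs, ys)
```

At a datum location the estimate equals the datum and the variance vanishes; far from all data the estimate reverts to the mean and the variance to the sill C(0). These two limits bracket the Gaussian ccdf parameters used during simulation.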

Normal Score Transform


The multiGaussian approach is very convenient: the inference of the ccdf reduces to solving a
simple kriging system at any location x. The trade-off cost is the assumption that data follow a
multiGaussian distribution, which implies first that the one point distribution of data (histogram)
is normal. However, many variables in earth sciences show an asymmetric distribution with a
few very large values (positive skewness). Thus the multiGaussian approach starts with an
identification of the standard normal distribution and involves the following steps:
1. The original z-data are first transformed into y-values with a standard normal histogram.
Such a transform is referred to as a normal score transform, and the y-values
$y(x_\alpha) = \phi(z(x_\alpha))$ are called normal scores.
2. Provided the biGaussian assumption is not invalidated, the multiGaussian model is
applied to the normal scores, allowing the derivation of the Gaussian ccdf at any
unsampled location x:

$$G(x; y \mid (n)) = \text{Prob}\{Y(x) \le y \mid (n)\}$$

3. The ccdf of the original variable is then retrieved as

$$F(x; z \mid (n)) = \text{Prob}\{Z(x) \le z \mid (n)\} = \text{Prob}\{Y(x) \le y \mid (n)\} = G(x; \phi(z) \mid (n))$$

under the condition that the transform function $\phi(\cdot)$ is monotonic increasing.
The normal score transform function $\phi(\cdot)$ can be derived through a graphical correspondence
between the univariate cdfs of the original and standard normal variables (Figure 4).
Let $F(z)$ and $G(y)$ be the stationary univariate cumulative distribution functions (cdfs) of the original
RF $Z(x)$ and the standard normal RF $Y(x)$:

$$F(z) = \text{Prob}\{Z(x) \le z\}, \qquad G(y) = \text{Prob}\{Y(x) \le y\}$$

The transform that allows one to go from a RF $Z(x)$ with cdf $F(z)$ to a RF $Y(x)$ with standard
Gaussian cdf $G(y)$ is depicted by arrows in Figure 4 and is written as

$$Y(x) = \phi(Z(x)) = G^{-1}[F(Z(x))]$$

where $G^{-1}(\cdot)$ is the inverse Gaussian cdf or quantile function of the RF $Y(x)$.

Figure 4: Graphical procedure for transforming the cumulative distribution F(z) of the original z-values into the
standard normal distribution G(y) of the y-values, called normal scores, with $y = \phi(z)$.

In practice, the normal score transform proceeds in three steps:
1. The original data $\{z(x_\alpha), \alpha = 1, \ldots, n\}$ are ranked in ascending order. Since the normal
score transform must be monotonic, ties in z-values must be broken.
2. The sample cumulative distribution function $F^*(z)$ of the original variable is
calculated.
3. The normal score transform of the z-datum with rank k is matched to the $p_k^*$-quantile
of the standard normal cdf:

$$y(x_\alpha) = G^{-1}[F^*(z(x_\alpha))] = G^{-1}(p_k^*)$$
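
The three steps above can be sketched as follows. Two choices in the sketch are assumptions rather than statements from the report: the sample cumulative frequency is taken as $p_k^* = (k - 0.5)/n$ for equally weighted data, and ties are broken by original data order (random tie-breaking would be another option). The data values are made up.

```python
from statistics import NormalDist


def normal_scores(z_values):
    """Rank the data, then map each sample cumulative frequency to a normal quantile."""
    n = len(z_values)
    # Sorting on (value, index) breaks ties by original order (an assumption).
    order = sorted(range(n), key=lambda i: (z_values[i], i))
    std_normal = NormalDist()
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        p_k = (rank - 0.5) / n               # assumed p_k* for equal weights
        scores[i] = std_normal.inv_cdf(p_k)  # G^{-1}(p_k*)
    return scores


z = [0.3, 12.5, 1.1, 0.9, 4.2]   # hypothetical positively skewed data
y = normal_scores(z)
```

The transform preserves ranks, so the largest z-value receives the largest normal score, while the strongly skewed z-histogram becomes symmetric about zero.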

Chapter 5

The Sequential Simulation Genre


The wide class of simulation algorithms known under the generic name sequential simulation is
essentially based on the same underlying theory: instead of modeling the N-variate ccdf, a
univariate ccdf is modeled and sampled at each of the N nodes visited along a random
sequence. To ensure reproduction of the z-covariance model, each univariate ccdf is made
conditional not only to the original n data but also to all values simulated at previously visited
locations.

Let $\{Z(x_j), j = 1, \ldots, N\}$ be a set of random variables defined at N locations $x_j$ within the study
area A. These locations need not be gridded. The objective is to generate several joint
realizations of these N RVs, $\{z^{(l)}(x_j), j = 1, \ldots, N\}$, $l = 1, \ldots, L$, conditional to the data set
$\{z(x_\alpha), \alpha = 1, \ldots, n\}$.
Let us consider the joint simulation of z-values at two locations only, say, $x_1$ and $x_2$. A set of
realizations $\{z^{(l)}(x_1), z^{(l)}(x_2)\}$, $l = 1, \ldots, L$, can be generated by sampling the bivariate ccdf:

$$F(x_1, x_2; z_1, z_2 \mid (n)) = \text{Prob}\{Z(x_1) \le z_1, Z(x_2) \le z_2 \mid (n)\}$$

An alternative approach is provided by Bayes' axiom, whereby any bivariate ccdf can be
expressed as a product of two univariate ccdfs:

$$F(x_1, x_2; z_1, z_2 \mid (n)) = F(x_2; z_2 \mid (n+1)) \cdot F(x_1; z_1 \mid (n))$$

where "$\mid (n+1)$" denotes conditioning to the n data $z(x_\alpha)$ and to the realization $Z(x_1) = z^{(l)}(x_1)$.
The above decomposition allows one to generate the pair $\{z^{(l)}(x_1), z^{(l)}(x_2)\}$ in two steps: the
value $z^{(l)}(x_1)$ is first drawn from the ccdf $F(x_1; z_1 \mid (n))$, then the ccdf at location $x_2$ is
conditioned to the realization $z^{(l)}(x_1)$ in addition to the original data (n), and its sampling yields
the correlated value $z^{(l)}(x_2)$. The idea is to trade the sampling, hence modeling, of the bivariate
ccdf for the sequential sampling of two univariate ccdfs that are easier to infer, hence the generic name
sequential simulation algorithm.
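
A minimal sketch of the two-step draw, under simplifying assumptions that are not in the report: the two variables are standard Gaussian with a hypothetical correlation of 0.7, and there are no original data (n = 0). The second ccdf is then Gaussian with mean $\rho\, y_1$ and variance $1 - \rho^2$.

```python
import random
from statistics import mean


def simulate_pair(rho, rng):
    """Draw (y1, y2) sequentially from a standard biGaussian with correlation rho."""
    # Step 1: draw y1 from its univariate distribution.
    y1 = rng.gauss(0.0, 1.0)
    # Step 2: draw y2 from the distribution conditioned to the realization y1.
    cond_mean = rho * y1
    cond_sd = (1.0 - rho * rho) ** 0.5
    y2 = rng.gauss(cond_mean, cond_sd)
    return y1, y2


rng = random.Random(42)
rho = 0.7                                   # hypothetical correlation
pairs = [simulate_pair(rho, rng) for _ in range(20000)]

# The sequential draws reproduce the target covariance between the two locations.
m1 = mean(p[0] for p in pairs)
m2 = mean(p[1] for p in pairs)
cov12 = mean((a - m1) * (b - m2) for a, b in pairs)
var2 = mean((b - m2) ** 2 for _, b in pairs)
```

Although only univariate distributions are ever sampled, the pairs carry the intended correlation because the second draw is conditioned on the first.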
The sequential principle can be generalized to more than two locations. By recursive application
of Bayes' axiom, the N-variate ccdf can be written as the product of N univariate ccdfs:

$$F(x_1, \ldots, x_N; z_1, \ldots, z_N \mid (n)) = F(x_N; z_N \mid (n+N-1)) \cdot F(x_{N-1}; z_{N-1} \mid (n+N-2)) \cdots F(x_2; z_2 \mid (n+1)) \cdot F(x_1; z_1 \mid (n))$$

where, for example, $F(x_N; z_N \mid (n+N-1))$ is the ccdf of $Z(x_N)$ given the set of n original data
values and the (N-1) realizations $Z(x_j) = z^{(l)}(x_j)$, $j = 1, \ldots, N-1$.

The above decomposition allows one to generate a realization of the random vector
$\{Z(x_j), j = 1, \ldots, N\}$ in N successive steps:

- Model the cdf at the first location $x_1$, conditional to the n original data $z(x_\alpha)$:

$$F(x_1; z \mid (n)) = \text{Prob}\{Z(x_1) \le z \mid (n)\}$$

- Draw from that cdf a realization $z^{(l)}(x_1)$, which becomes a conditioning datum for all
subsequent drawings.
- At the ith node $x_i$ visited, model the conditional cdf of $Z(x_i)$ given the n original data and
all the (i-1) values $z^{(l)}(x_j)$ simulated at previously visited locations $x_j$, $j = 1, \ldots, i-1$:

$$F(x_i; z \mid (n+i-1)) = \text{Prob}\{Z(x_i) \le z \mid (n+i-1)\}$$

- Draw from that ccdf a realization $z^{(l)}(x_i)$, which becomes a conditioning datum for all
subsequent drawings.
- Repeat the two previous steps until all the N nodes are visited and each has been given
a simulated value.

The resulting set of simulated values $\{z^{(l)}(x_j), j = 1, \ldots, N\}$ represents just one realization of
the RF $\{Z(x), x \in A\}$ over the N nodes $x_j$. Any number L of such realizations
$\{z^{(l)}(x_j), j = 1, \ldots, N\}$, $l = 1, \ldots, L$, can be obtained by repeating L times the entire sequential
process with possibly different paths to visit the N nodes.

Remarks:
1. The sequential simulation algorithm requires the determination of a conditional cdf at
each location being simulated. Two major classes of sequential simulation algorithms
can be distinguished, depending on whether the series of conditional cdfs are
determined using the multi-Gaussian or the indicator formalisms.
2. Sequential simulation ensures that data are honored at their locations (conditional).
Indeed, at any datum location $x_\alpha$, the simulated value is drawn from a zero-variance,
unit-step ccdf with mean equal to the z-datum $z(x_\alpha)$ itself. If large measurement errors

render questionable the exact matching of data values, one should allow the simulated
values to deviate somewhat from data at their locations. If the errors are normally
distributed, the simulated value could be drawn from a Gaussian ccdf centered on the
datum value and with a variance equal to the error variance.
3. The sequential principle can be extended to simulate several continuous or categorical
attributes.

Implementation


Search strategies
The sequential simulation algorithm requires the determination of N successive conditional cdfs
$F(x_1; z \mid (n)), \ldots, F(x_N; z \mid (n+N-1))$, with an increasing level of conditioning information.
Correspondingly, the size of the kriging system(s) to be solved to determine these ccdfs
increases and quickly becomes prohibitive as the simulation progresses. The data closest to the
location being estimated tend to screen the influence of more distant data. Thus, in the practice
of sequential simulation, only the original data and those previously simulated values closest to
the location x being simulated are retained. Good practice consists of using the semi-variogram
distance $\gamma(x - x_\alpha)$ so that the conditioning data are preferentially selected along the direction of
maximum continuity.
As the simulation progresses, the original data tend to be overwhelmed by the large number of
previously simulated values, particularly when the simulation grid is dense. A balance between
the two types of conditioning information can be preserved by separately searching the original
data and the previously simulated values (two-part search): at each location x, a fixed number
n(x) of closest original data are retained no matter how many previously simulated values are
in the neighborhood of x.
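
A two-part search can be sketched as below. The sketch is one-dimensional and ranks neighbours by plain Euclidean distance, which is a simplifying assumption (the text recommends the semi-variogram distance); the function name, locations and neighbour counts are all hypothetical.

```python
def two_part_search(x0, data_locs, sim_locs, n_data=2, n_sim=2):
    """Retain the n_data closest original data and the n_sim closest simulated nodes."""
    def distance(x):
        return abs(x - x0)

    nearest_data = sorted(data_locs, key=distance)[:n_data]
    nearest_sim = sorted(sim_locs, key=distance)[:n_sim]
    return nearest_data, nearest_sim


data_locs = [0.0, 10.0, 50.0]               # hypothetical original data locations
sim_locs = [19.0, 20.5, 21.5, 22.0, 30.0]   # hypothetical previously simulated nodes

nearest_data, nearest_sim = two_part_search(20.0, data_locs, sim_locs)

# For contrast: a single pooled search for the four closest values of either type.
pooled = sorted(data_locs + sim_locs, key=lambda x: abs(x - 20.0))[:4]
```

In this configuration the pooled search retains only previously simulated values, whereas the two-part search still keeps two original data, which is the balance the text describes.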

Visiting sequence
In theory, the N nodes can be simulated in any sequence. However, because only neighboring
data are retained, artificial continuity may be generated along a deterministic path visiting the N
nodes. Hence, a random sequence or path is recommended.
When generating several realizations, the computational time can be reduced considerably by
keeping the same random path for all realizations. Indeed, the N kriging systems, one for each
node $x_j$, need be solved only once since the N conditioning data configurations remain the
same from one realization to another. The trade-off cost is the risk of generating realizations that
are too similar. Therefore, it is better to use a different random path for each realization.


Multiple grid simulation


The use of a search neighborhood limits reproduction of the input covariance model to the
radius of that neighborhood. Another obstacle to reproduction of long-range structure is the
screening of distant data by too many data closer to the location being simulated.
The multiple-grid concept (the attribute values are first simulated on a coarse grid and then
on a finer grid) allows one to reproduce long-range correlation structures without
having to consider large search neighborhoods with too many conditioning data. The previously
simulated values on the coarse grid are used as data for simulation on the fine grid. A random
path is followed within each grid. The procedure can be generalized to any number of
intermediate grids; the number depends on the number of structures with different ranges to be
reproduced and the final grid spacing.
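
The visiting order implied by the multiple-grid concept can be sketched for a one-dimensional grid. The halving of the grid spacing between levels, the function name and the grid size are assumptions made for illustration; only the ordering is shown, not the kriging or drawing at each node.

```python
import random


def multigrid_path(n_nodes, n_levels, seed=0):
    """Visit 1D nodes 0..n_nodes-1 from the coarsest grid to the finest grid."""
    rng = random.Random(seed)
    visited = set()
    path = []
    for level in range(n_levels - 1, -1, -1):
        step = 2 ** level                    # assumed spacing: halves at each level
        nodes = [i for i in range(0, n_nodes, step) if i not in visited]
        rng.shuffle(nodes)                   # random path within each grid
        path.extend(nodes)
        visited.update(nodes)
    return path


path = multigrid_path(n_nodes=17, n_levels=3)
coarse, middle, fine = path[:5], path[5:9], path[9:]
```

Nodes on the coarse grid are simulated first and then act as conditioning data for the intermediate and fine grids, so long-range structure is set before the details are filled in.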

Chapter 6

Sequential Gaussian Simulation


Implementation of the sequential principle under the MultiGaussian RF model is referred to as
sequential Gaussian simulation (sGs). Several algorithms exist: algorithms for simulating a
single attribute using only values of that attribute, with modifications to account for secondary
information as well as for joint simulation of several correlated attributes. Here, only the first
case, i.e., accounting for a single attribute, is considered.

Let us consider the simulation of the continuous attribute z at N nodes $x_j$ of a grid (not
necessarily regular) conditional to the data set $\{z(x_\alpha), \alpha = 1, \ldots, n\}$.

Sequential Gaussian simulation proceeds as follows:

1. First step: check the appropriateness of the multiGaussian RF model, which calls for a
prior transform of z-data into y-data with a standard normal cdf using the normal score
transform. Normality of the bivariate distribution of the resulting normal score variable
$Y(x) = \phi(Z(x))$ is then checked. In practice, if indicator semivariograms or ancillary
information do not invalidate the biGaussian assumption, the multiGaussian formalism is
adopted.
2. If the multiGaussian RF model is retained for the y-variable, sequential Gaussian
simulation is performed on the y-data:
- Define a random path visiting each node of the grid only once.
- At each node x, determine the parameters (mean and variance) of the Gaussian
ccdf $G(x; y \mid (n))$ using SK with the normal score variogram model $\gamma_Y(h)$. The
conditioning information (n) consists of a specified number n(x) of both normal score
data $y(x_\alpha)$ and values $y^{(l)}(x_j)$ simulated at previously visited grid nodes.
- Draw a simulated value $y^{(l)}(x)$ from that ccdf, and add it to the data set.
- Proceed to the next node along the random path, and repeat the two previous
steps.
- Loop until all N nodes are simulated.
3. The final step consists of back-transforming the simulated normal scores
$\{y^{(l)}(x_j), j = 1, \ldots, N\}$ into simulated values for the original variable, which amounts to
applying the inverse of the normal score transform to the simulated y-values:

$$z^{(l)}(x_j) = \phi^{-1}(y^{(l)}(x_j)), \quad j = 1, \ldots, N$$

with $\phi^{-1}(\cdot) = F^{-1}(G(\cdot))$, where $F^{-1}(\cdot)$ is the inverse cdf or quantile function of the variable Z,
and $G(\cdot)$ is the standard Gaussian cdf. That back-transform allows one to identify the original
z-histogram $F(z)$. Indeed,

$$
\begin{aligned}
\text{Prob}\{Z^{(l)}(x) \le z\} &= \text{Prob}\{\phi^{-1}(Y^{(l)}(x)) \le z\} \\
&= \text{Prob}\{Y^{(l)}(x) \le \phi(z)\} && \text{since } \phi(\cdot) \text{ is monotonic increasing,} \\
&= G[\phi(z)] = F(z) && \text{from the definition of the normal score transform.}
\end{aligned}
$$

Other realizations $\{z^{(l')}(x_j), j = 1, \ldots, N\}$, $l' \ne l$, are obtained by repeating steps 2 and 3 with a
different random path.
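
The steps above can be put together in a compact one-dimensional sketch. It is an illustration under stated assumptions, not a reference implementation: the covariance is a hypothetical exponential model with unit sill and range 10, the data are made up, the biGaussian check of step 1 is skipped, neighbours are chosen by Euclidean distance in a single pooled search, and the back-transform is a plain step function on the sorted data with no interpolation or tail extrapolation.

```python
import math
import random
from statistics import NormalDist


def cov(h, a=10.0):
    """Hypothetical exponential covariance of the normal scores (unit sill)."""
    return math.exp(-3.0 * abs(h) / a)


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (M[r][n] - s) / M[r][r]
    return w


def sk(x0, xs, ys):
    """Simple kriging mean and variance at x0 (zero mean, as for normal scores)."""
    K = [[cov(xa - xb) for xb in xs] for xa in xs]
    k = [cov(xa - x0) for xa in xs]
    lam = solve(K, k)
    mean = sum(l * y for l, y in zip(lam, ys))
    var = max(1.0 - sum(l * c for l, c in zip(lam, k)), 0.0)
    return mean, var


def normal_scores(z):
    """Rank-based normal score transform with p_k = (k - 0.5)/n (assumed)."""
    order = sorted(range(len(z)), key=lambda i: (z[i], i))
    y = [0.0] * len(z)
    for rank, i in enumerate(order, start=1):
        y[i] = NormalDist().inv_cdf((rank - 0.5) / len(z))
    return y


def sgs(nodes, data, n_max=8, seed=0):
    """One realization in normal-score space; data maps location -> normal score."""
    rng = random.Random(seed)
    known = dict(data)                       # original data plus simulated values
    path = [x for x in nodes if x not in known]
    rng.shuffle(path)                        # random path, each node visited once
    for x0 in path:
        near = sorted(known, key=lambda x: abs(x - x0))[:n_max]
        mean, var = sk(x0, near, [known[x] for x in near])
        known[x0] = rng.gauss(mean, math.sqrt(var))   # draw from the Gaussian ccdf
    return [known[x] for x in nodes]


def back_transform(y, z_sorted):
    """z = F^{-1}(G(y)), with F^{-1} read as a step function from the sorted data."""
    p = NormalDist().cdf(y)
    k = min(int(p * len(z_sorted)), len(z_sorted) - 1)
    return z_sorted[k]


locs = [3.0, 9.0, 15.0, 21.0, 26.0]      # hypothetical data locations
z_data = [0.4, 2.8, 0.9, 7.5, 1.4]       # hypothetical skewed z-data
y_data = normal_scores(z_data)
nodes = [float(i) for i in range(30)]
y_sim = sgs(nodes, dict(zip(locs, y_data)), seed=1)
z_sim = [back_transform(y, sorted(z_data)) for y in y_sim]
```

Calling `sgs` again with a different `seed` gives another realization along a different random path. Nodes that coincide with data locations are never re-simulated, so the data are honoured exactly, both in normal-score space and after the back-transform.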


The basic steps of the sGs algorithm are illustrated in the following flow chart.

Non-stationary behaviors could be accounted for using algorithms other than simple kriging to
estimate the mean of the Gaussian ccdf: ordinary kriging or universal kriging of order k. However,
Gaussian theory requires that the simple kriging variance of normal scores be used for the variance
of the Gaussian ccdf (Journel, 1980).

Limitations:
Various limitations and shortcomings can be attributed to sequential Gaussian simulation:
1. sGs relies on the assumption of multi-variate Gaussianity, an assumption that can never be
fully checked in practice, yet always seems to be taken for granted. Multi-Gaussianity leads
to simulated realizations that have maximally disconnected extremes (maximum entropy), a
property that often conflicts with geological reality.
2. sGs requires a transformation into Gaussian space before simulation and a corresponding
back-transformation after simulation is finished. However, often the primary variable to be
simulated has to be conditioned to a secondary variable that is a linear or non-linear volume
average of the primary variable. Normal-score transforms are non-linear transforms, hence
they destroy the possible linear relation that exists between primary and secondary variable,
or, they change the non-linearity if that relation is non-linear.
3. sGs reproduces, by theory, only the normal score variogram, not the original variogram
model. Usually reproduction of the normal score variogram entails reproduction of the
original data variogram if the data histogram is not too skewed. However in case of high
skewness, the reproduction of the variogram model after back-transformation is not
guaranteed at all.
Actually, reproduction of the covariance model CY (h) does not require the successive ccdf
models to be Gaussian; they can be of any type as long as their means and variances are
determined by simple kriging (Journel, 1994). This result leads to an important extension of the
sequential simulation paradigm whereby the original z-attribute values are simulated directly
without any normal score transform. This algorithm is called direct sequential simulation (dssim).
In the absence of a normal score transform and back-transform, there is, however, no control on
the histogram of simulated values. Reproduction of a target histogram can be achieved by post
processing the dssim realization.

Bibliography
Goovaerts, P., 1997. Geostatistics for Natural Resources Evaluation. Oxford Univ. Press, New
York, 512 pp.
Journel, A.G., Huijbregts, C.J., 1978. Mining Geostatistics. Academic Press, New York, 600 pp.
Journel, A.G., 1980. The lognormal approach to predicting local distributions of selective mining
unit grades. Mathematical Geology, 12(4), 285-303.
Journel, A.G., 1994. Modeling uncertainty: Some conceptual thoughts. Geostatistics for the
Next Century, pages 30-43. Kluwer, Dordrecht.
