
Practical Geostatistics 2000-2

Spatial Statistics


TABLE OF CONTENTS

Part 1 The Spatial Aspect
    Spatial Relationships
        Including Location as well as Value
        Spatial Relationships
    Inverse Distance Estimation
        Inverse Distance Estimation
    Worked Examples
        Worked Examples
        Coal Project, Calorific Values
        Iron Ore Project
        Wolfcamp Aquifer
        Scallops Caught

Part 2 The Semi-Variogram
    The Experimental Semi-Variogram
        The Semi-Variogram
        The Experimental Semi-Variogram
        Irregular Sampling
        Cautionary Notes
    Modelling the Semi-Variogram Function
        Modelling of the Semi-Variogram Function
        The Linear Model
        The Generalised Linear Model
        The Spherical Model
        The Exponential Model
        The Gaussian Model
        The Hole Effect Model
        Paddington Mix Model
        Judging How Well the Model Fits the Data
        Equivalence to Covariance Function
        The Nugget Effect
    Worked Examples
        Worked Examples
        Silver Example
        Coal Project, Calorific Values
        Wolfcamp Aquifer

Part 3 Estimation and Kriging
    Introduction
        Estimation and Kriging
    Estimation Error
        Estimation Error
        One Sample Estimation
        Another Single Sample
        Two Sample Estimation
        Another Two Sample Estimation
        Three Sample Estimator
    Choosing the Optimal Weights
        Choosing the Optimal Weights
        Three Sample Estimation
        The General Form for the 'Optimal' Estimator
        Confidence Levels and Degrees of Freedom
        Simple Kriging
    Ordinary Kriging
        Ordinary Kriging
        'Optimal' Unbiassed Estimator
        Alternate Form: Matrices
        Alternate Form: Covariance
        Three Sample Estimation
    Cross-Validation
        Cross Validation
        Cross Cross Validation
    Worked Examples
        Worked Examples
        Coal Project, Calorific Values
        Iron Ore Example
        Wolfcamp, Residuals from Quadratic Surface

Part 4 Areas and Volumes
    The Impact on the Distribution
        Areas and Volumes
        The Impact on the Distribution
        Iron Ore, Normal Example
        Geevor Tin Mine, Lognormal(ish) Example
    The Impact on Kriging
        The Impact on Kriging
        The Use of Auxiliary Functions
        Iron Ore Example, Page 95
        Wolfcamp Aquifer, Quadratic Residuals

Part 5 Other Kriging Approaches
    Universal Kriging
        Other Kriging
        Universal Kriging
        Wolfcamp Aquifer
    Lognormal Kriging
        Lognormal Kriging
        The Lognormal Transformation
        Geevor Tin Mine, Grades
        SA Gold Mine
    Indicator and Rank Uniform Kriging
        Indicator Kriging
        Rank Uniform Kriging
        Summary of Part 6


Part 1 The Spatial Aspect


Spatial Relationships
Including Location as well as Value
Apart from the last couple of applications of Least Squares regression in Part 5 of the previous course, all of our discussions so far have considered only the measured values at each sample location. We have concentrated on assessing the 'global' qualities of our variables and on estimating population parameters, be they means and standard deviations or correlations and relationships. However, our original problem, as defined in the Introduction, was to produce 'maps' of the values at unsampled locations. That is, to estimate unknown values at locations which have not been sampled.

We will use a set of data published in the original Practical Geostatistics (Clark (1979)), which was a simulation based on an actual iron ore project (Iron Ore). The values are the average quantity of iron (%Fe) in borehole samples taken through the whole intersection of the 'economic mineralisation'. Several boreholes have been drilled, all at the same angle, on a regular 100 foot grid. The resulting values are shown in the borehole layout. Notice that some of the boreholes have not (yet?) been drilled. Within the terms of our original question, we might ask 'what would be the value at the indicated location where no borehole has been drilled?'.

In our statistical analyses in the previous course we have, tacitly, assumed that the measured values are drawn randomly from some idealised population of all possible samples. It was that population in which we were interested. No reference was made to location. We have to amend our basic assumptions if we redefine our problem to refer to a particular potential measurement rather than the whole population. The first two assumptions are retained:

- sample values are measured precisely and are reproducible;
- sample values are measured accurately and represent the true value at that location.

Just to remind ourselves, the other two essential assumptions were:

- these samples constitute a very small part of a large homogeneous population of all possible samples;
- these samples were gathered randomly and independently from that large population.


For our redefined problem we need to replace these last two assumptions with the following:

- The samples are collected from a physically continuous, homogeneous population of all possible samples. In simpler terms, the phenomenon we have measured at the sample locations also exists at all the unsampled locations within the study area, with no sudden changes in characteristic. For example, if we were dealing with a coal seam, we assume that the coal seam is present at all potential drilling sites within the study area and that there are no faults, washouts or burnt areas within that area. As another example, if we are counting weeds in a field, we assume that there are no areas where, say, the farmer has spread extra weedkiller and no significant changes in soil type which might affect weed growth.

- The values at unsampled locations are related to the values at the sampled locations. If there is no relationship between samples and unsampled values, then we are back to our 'random' concept and the best estimate for an unknown value (T) would be the average of the population.

Our 'worst case scenario', then, is a random phenomenon, which would give us an estimated value of:

T^* = \mu^* = \bar{g}

where:

i. T denotes the unknown value at the unsampled location;
ii. T* is any estimate of that unknown value;
iii. μ* is our standard notation for the estimate of the true average of the population, and
iv. ḡ is the simple arithmetic average of our sample values, 36.4 %Fe.

Effectively, T is just a g we haven't measured yet. If we want to find confidence limits for T, and g has a Normal distribution, then we can say that T comes from a Normal distribution with mean μ and standard deviation σ. We can state that we are 90% confident that:

\mu - 1.6449\,\sigma \le T \le \mu + 1.6449\,\sigma


Of course, we don't have μ and σ, so we would have to substitute ḡ and s as estimates. In Part 2 of the previous course we have seen that replacing σ by s means that the 1.6449 (from Table 2) has to be replaced by the relevant value from Table 4 of Student's t distribution. Our estimate was produced from 47 samples, so (presumably) the degrees of freedom associated with this estimate are 47 − 1 = 46. Checking Table 4, we find a value of approximately 1.68, so that the above expression becomes:

\bar{g} - 1.68\,s \le T \le \bar{g} + 1.68\,s

That is, we can be 90% certain that the true value at the unsampled location lies between 30.2 %Fe and 42.6 %Fe. Of course, the whole data set only ranges between 28 and 44 %Fe, so this is a pretty safe statement.

In summary, then, if the values are completely random, our unknown value at the unsampled location is simply another number drawn at random from the population of all possible samples. If there is some relationship between sample values and unsampled values which depends on location, we ought to be able to do better than simply using the overall mean and getting very wide confidence intervals.
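As a quick numerical check on this 'purely random' interval, the sketch below recomputes it with scipy. The mean (36.4 %Fe) and the 47 samples are quoted above; the sample standard deviation is not reproduced here, so the value used is a back-calculated placeholder and should be replaced by the real s.

```python
from scipy.stats import t

g_bar = 36.4     # arithmetic mean of the borehole values (%Fe), quoted in the text
s = 3.7          # placeholder standard deviation (%Fe): back-calculated, not quoted above
n = 47           # number of boreholes

t_90 = t.ppf(0.95, n - 1)               # 5% in each tail for a 90% interval
print(f"t = {t_90:.3f}; 90% interval: {g_bar - t_90 * s:.1f} to {g_bar + t_90 * s:.1f} %Fe")
```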

Spatial Relationships
In order to produce an estimate for the value at a specified location which is better than a random guess, we need to assume that there is some sort of relationship between values in the area which depends on the location of the samples. There are many different ways to approach this, the major factor being what kind of relationship we are willing to accept. All mapping packages are based on this assumption, with some packages (such as Surfer) offering several different algorithms to produce estimated grid values. As an example of the sort of assumption available, a bicubic spline mapping method assumes that we are trying to map a smooth continuous surface which needs to be 'differentiable' at every point.

Basically, all mapping methods assume that the 'unknown' value tends to be related to sample values which are close to it. We tend to assume that if the locations are close together then the values will be close together.

Conceptually, we assume that an estimator put together from neighbouring samples will be more useful than one which includes more distant sampling.

In this book we tackle the estimation methods grouped under the title of 'geostatistics'. What differentiates geostatistical estimation from other mapping methods is, simply, the form of the relationship which is assumed to be present between values at different locations in the study area. In the next three parts of this course, we will see how 'kriging' methods evolve directly from this basic concept of spatial relationship.

Looking at the plot of the borehole values in our example data set, we can see that there is some sort of 'continuity' in the values. Most of the samples in the middle of the area are in the mid 30s, shading to the 40s towards the northwest and the 20s towards the southeast. The point we have indicated where we want to estimate the value is towards the northwest corner. On the basis of apparent continuity in values, we would expect this borehole to have a value in the high 30s or low 40s. If we had selected a point in the south for our attention, we would expect a value in the low 30s or high 20s.

Our estimator for the unknown value, T, will be some sort of combination of the neighbouring sample values. Let us consider the simplest of all combinations: the linear combination or weighted average. We take the local samples where we have values and combine those sample values with weighting factors influenced by how close they are to the unsampled location. That is:

T^* = \sum_{i=1}^{m} w_i g_i

where m is the number of samples we want to include in the estimation and the gi are the values of those samples. The wi are the 'weights' which are attached to each sample.

Using the above example, let us home in on the area around the unsampled location of interest. For this illustration, we will consider the seven samples surrounding the unsampled grid node. The estimate for the unknown value becomes:

T^* = w_1 g_1 + w_2 g_2 + w_3 g_3 + w_4 g_4 + w_5 g_5 + w_6 g_6 + w_7 g_7

where the wi should be chosen according to how close each sample is to T. Samples 1, 3, 5 and 7 are 141 feet from the unsampled location. Samples 2, 4 and 6 are 100 feet from T. Intuitively, weights 2, 4 and 6 should be greater than weights 1, 3, 5 and 7. The only question is, how much greater?

It is interesting that we have no direct measure of how 'close' two locations are. We can measure the distance between them but not the closeness. We will, therefore, have to assume that closeness is some inverse function of distance. For example, we could suppose that:

w_i = \frac{1}{d_i}

that is, the weighting is inversely proportional to the distance from the unsampled location. For our example, this would produce a weight of 0.01 for samples 2, 4 and 6 and a weight of 0.007071 for the diagonal samples. The resulting estimator would be:

This does not make a lot of sense. We wanted a value in %Fe and we have a value in %Fe per foot. We expected an estimate between 37 and 44 %Fe and we have an estimate of just over 2.0. Obviously we are doing something wrong here. The problem is in the distance units. We need to remove the units of distance to get weights which are 'pure numbers'. We also need to do this in a way which gives us a sensible answer.

The simplest way to do this is to choose a set of weights which add up to 1. At the moment our weights add up to 0.051213. If all of the samples had a value of 37 %Fe, our estimator would be 0.051213 of 37 %Fe. If all our samples had a value of 37 %Fe, surely our estimator should be 37 %Fe? The only way to guarantee this is to choose weights which add up to 1. To calculate weights which add up to 1 is pretty straightforward: find out what they do add up to and then divide each one by that number. Our estimator for the unsampled location is 39.6 %Fe, if we use 'inverse distance' estimation.
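A minimal sketch of this weight-rescaling step, using the two grid distances quoted above but purely hypothetical %Fe values (so the printed numbers will not reproduce the 39.6 %Fe of the worked example):

```python
import numpy as np

# Grid neighbours at 100 ft and diagonal neighbours at about 141.4 ft, as above.
distances = np.array([141.4, 100.0, 141.4, 100.0, 141.4, 100.0, 141.4])
# Hypothetical %Fe values; the real ones come from the borehole post plot.
values = np.array([40.0, 42.0, 39.0, 41.0, 37.0, 38.0, 36.0])

raw_w = 1.0 / distances              # inverse-distance weights, with units of 1/feet
w = raw_w / raw_w.sum()              # rescale so the weights are pure numbers summing to 1

print("raw weights sum to:", raw_w.sum())
print("estimate T* =", np.dot(w, values), "%Fe")
```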

Inverse Distance Estimation


Inverse Distance Estimation
We have produced an estimator for unsampled locations which is a weighted average of the neighbouring sample values. This estimator is based on the assumptions that:

- the values form a continuous 'surface' across the whole area;
- the relationship between values depends on the distance between their locations.


In some cases, we might want to extend that last assumption to include direction. For example, sediments in a river would be expected to be more continuous downstream than across the river bed. The general form of the estimator is:

T^* = \sum_{i=1}^{m} w_i g_i

where

w_i = \frac{f(d_i)}{\sum_{j=1}^{m} f(d_j)}

and f in this context denotes a function of the distance value, d. The concepts and calculation of this estimator are intuitively attractive and very straightforward. However, in practice, the use of such an estimator gives rise to more questions than it answers.
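A short sketch of this general estimator, with the distance function f passed in as a parameter; the 1/d² default and the sample data in the usage line are only illustrative:

```python
import numpy as np

def idw_estimate(distances, values, f=lambda d: 1.0 / d**2):
    """T* = sum(w_i g_i) with w_i = f(d_i) / sum_j f(d_j); f defaults to 1/d**2
    purely as an example of an inverse distance function."""
    d = np.asarray(distances, dtype=float)
    g = np.asarray(values, dtype=float)
    raw = f(d)
    return float(np.dot(raw / raw.sum(), g))

# Hypothetical usage with three neighbouring samples:
print(idw_estimate([100.0, 100.0, 141.4], [41.0, 38.5, 36.0]))
```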

In the following sections, we pose some of these questions and, in the next few sections, we will attempt to answer them. The order of the questions is not an indication of their relative importance.

1. What function of distance should we use in any given application? In the illustration above, we have used the function 1/d to produce our estimator. We could just as easily have used 1/d², 1/d³, an exponential decay such as e^(−d), an 'R − d' type function based on a search radius R, or any other function which is an inverse function of distance. The higher the power of the function, the more weight will be given to closer samples.

2. How do we handle different continuity in different directions? We have said above that there may be phenomena which have different relationships in different directions. There may be physical controls on how values were produced. This is known in geostatistics as 'anisotropy'. In pollution studies, we may have flow directions or plume shapes to deal with. In fishing, the offshore direction may have a different level of predictability from the longshore direction.


3. How many samples should we include in the estimation? In the previous sections on statistical analyses, we have seen that the more samples we use the better our estimator becomes. Is this true in inverse distance type estimation? Well, no. The more samples we include, the thinner we have to spread the available weight. Remember that the weights have to add up to one, so if you include more samples the weight for those has to come off the closer samples. Of course, you can compensate for this by changing the power of the function you use.

4. How do we compensate for irregularly spaced or highly clustered sampling? In the simple example above, we have a grid of samples. In other applications (see, for example, the Wolfcamp data) we have significant irregularities and clustering in sample locations. It is natural for a cattle rancher to sink a new water well into an aquifer where he has had good pressure in the past. It is almost inevitable in mining that a geologist will schedule more sampling in the rich areas than in the poor ones. It is difficult to obtain a budget from a project manager to sink holes in the 'waste' purely to balance your statistics.

5. How far should we go to include samples in our estimation process? This is not the same question as in 2 above. This is a question about the continuity of the phenomenon we are studying. For example, if you have a coal seam, you might believe that there is a relationship between the quality of coal at samples more than a kilometre apart. If you have a gold reef, on the other hand, you will be lucky if there is any relationship more than 100 metres away. Rainfall is a pretty continuous phenomenon, especially over oceans, but not at the same scale over mountainous terrain.

6. Should we honour the sample values? Presumably we are not (in the real world) going to do all these calculations by hand. If we wish to produce a map, we usually lay a grid of nodes over the area and estimate the value at each grid node. The contours are then produced on the basis of the gridded values. A computer program or spreadsheet application can be used to produce the grid node estimates. However, at some of those nodes we will already know the value because we will have a sample there. What happens when d becomes zero? None of the 1/d type functions can be calculated if d is zero. The exponential and R−d type functions can be, but will not give all of the weight to the sample at that point. We have to make a special case for the calculation at the sample locations. So we have to answer this question. On the basis of our fundamental assumptions (precise and accurate sampling) the answer has to be yes. We return to this problem in the next few sections and see what a profound impact it can have on our results and our perception of our confidence in the final estimates.

7. How reliable is the estimate when we have it? In the previous sections, we have seen how we can produce confidence intervals for estimators. Can we do the same here? The estimator is a linear combination of the sample values. If the values come from a Normal distribution, then so does the linear combination. We can work out the mean and standard deviation of that combination:

E(T^*) = E\left(\sum_i w_i g_i\right) = \sum_i w_i E(g_i) = \mu \sum_i w_i = \mu

provided that the weights sum to 1 and all of the samples come from the same Normal population. The variance of T* is given by:

\mathrm{Var}(T^*) = E\left[\left(\sum_i w_i g_i - \mu\right)^2\right]

expanding out the square of the bracket produces a table of terms as follows:

\left(\sum_i w_i g_i - \mu\right)^2 = \left(\sum_i w_i g_i\right)\left(\sum_j w_j g_j\right) - 2\mu \sum_i w_i g_i + \mu^2

multiplying through:

= \sum_i \sum_j w_i w_j g_i g_j - 2\mu \sum_i w_i g_i + \mu^2

This horrendous list of terms has to be averaged over the whole population. We can simplify this a bit by remembering that the average of all the gs is μ and that the weights add up to 1. This means that:

E\left(\sum_i w_i g_i\right) = \sum_i w_i \mu = \mu

and

E\left(-2\mu \sum_i w_i g_i + \mu^2\right) = -2\mu^2 + \mu^2 = -\mu^2

so that all of the terms with μ in them boil down to a single −μ². The remainder of the huge expression is then a list of all of the possible cross-products:

\sum_i \sum_j w_i w_j g_i g_j

for which the population average must be found. Using another algebraic trick, we could show that the sum of all the terms wiwj is also 1. Try it if you do not believe us. If this is true then

\mu^2 = \sum_i \sum_j w_i w_j \mu^2

so that the general expression above becomes:

\mathrm{Var}(T^*) = \sum_i \sum_j w_i w_j \left[E(g_i g_j) - \mu^2\right]

or, with a very little jiggery pokery:

\mathrm{Var}(T^*) = \sum_i \sum_j w_i w_j E\left[(g_i - \mu)(g_j - \mu)\right]

but we know from the previous course that:

E\left[(g_i - \mu)(g_j - \mu)\right]

is the covariance between gi and gj, so all of these terms must be covariances between each pair of samples, σij. When i = j, of course, the covariance is equal to the variance of the g values, σ²g. So we can write the variance of the estimator T* as

\mathrm{Var}(T^*) = \sum_i \sum_j w_i w_j \sigma_{ij}

You may be wondering why this is so much more complicated than in the previous course, where we found the confidence intervals for the arithmetic mean using the standard error of the mean. Two reasons:

a. all of the weights were equal at (in this case) 1/m;
b. the sample values were assumed to be random and independent, so that all of those covariance terms were zero.

Under these circumstances, the above expression would reduce to:

\mathrm{Var}(T^*) = \frac{\sigma_g^2}{m}

which is the result we had in the previous course for estimating μ from the sample mean, ḡ. In this case, we definitely do not want the sample values to be uncorrelated, so the covariance terms must be non-zero. If we want to evaluate how reliable our estimate is (to put confidence limits round it) we are going to have to be able to calculate the covariances between pairs of samples. We have assumed that the relationship between the samples is a function of distance. We now know what kind of relationship it is that interests us: the covariance. All of the above algebraic gymnastics have served to explain to us that the covariance between two samples a given distance apart should depend only on that distance (and possibly the direction). This sounds remarkably like Krige's basic assumption for his weighted average template approach, which we discussed in Part 5 of the previous course. We will return to this in the next section. A short numerical sketch of this covariance form of the estimation variance is given at the end of this list of questions.

8. Why is our final map too smooth? A weighted average estimator cannot produce values which are larger than the largest single sample value or smaller than the smallest sample value. We know from the previous course that taking averages of sample values reduces the standard deviation of the answers. This means that weighted average estimates must, by definition, have a smaller range of possible values than the actual individual values from the population. This is another instance of what Krige called the 'regression effect' back in the 1950s. For mapping purposes, we get the opposite effect to that seen when planning mining blocks: a weighted average will tend to under-estimate high values and over-estimate low values, as we will see in Part 5. Simulation is one way of quantifying just how smooth the predicted maps are.

9. What happens if our sample data is not Normal? We have seen in Part 3 of the previous course that using arithmetic mean calculations with highly skewed data can produce seriously erroneous results. Is it, therefore, at all sensible to use a weighted average of sample values if the data is from a highly skewed distribution, whether positively or negatively skewed? We will discuss this problem at greater length in Part 6.

10. What happens if there is a strong trend in the values? In this context, a 'trend' or 'drift' is taken to mean a consistent change in the expected value of the phenomenon as we move across the study area. This is a trend as discussed in the previous section. A weighted average estimator will only be effective in the presence of trend if:

a. the data is taken on a regular spacing in all directions, and
b. the form of the trend is a simple increase or decrease in one particular direction.

If the change in values is more complex (perhaps peaking or forming troughs in values), a weighted average will smooth out the 'dips' and 'humps' and leave us with an even smoother map than that discussed in point 8 above. Coping with trend will be discussed (to some extent) in the next section and in Part 6.

11. How do we estimate average values over areas or volumes? This point is only made here to suggest its relevance to our original problem. In mining applications, in particular, values are generally required for mining blocks or stoping areas. Rarely is a mine planned on the basis of 'point' values. Average values over an area or volume can be produced by estimating a grid of point values within the area and averaging the resulting estimates. We will see in Part 5 that there are simpler and quicker methods to obtain a direct estimate for the average over a volume or area.
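As promised under question 7, this sketch evaluates the covariance form of the estimation variance, Var(T*) = Σi Σj wi wj σij, for a hypothetical set of weights and covariances, and shows how it collapses to σ²/m when the samples are independent and equally weighted:

```python
import numpy as np

def estimation_variance(weights, cov):
    """Var(T*) = sum_i sum_j w_i w_j sigma_ij for a weighted average of sample values;
    `cov` is the covariance matrix between the samples (sigma_ii is the sample variance)."""
    w = np.asarray(weights, dtype=float)
    return float(w @ np.asarray(cov, dtype=float) @ w)

# Hypothetical three-sample case: variances of 4.0 and some positive covariances.
w = np.array([0.5, 0.3, 0.2])
cov = np.array([[4.0, 2.5, 1.0],
                [2.5, 4.0, 1.5],
                [1.0, 1.5, 4.0]])
print(estimation_variance(w, cov))

# Independent samples with equal weights 1/m collapse to sigma**2 / m:
m = 3
print(estimation_variance(np.full(m, 1 / m), np.eye(m) * 4.0))   # 4.0 / 3
```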

Worked Examples
Worked Examples
This section contains worked examples using the following datasets:

- Coal Project
- Iron Ore
- Wolfcamp
- Scallops

Coal Project, Calorific Values


If we are to consider location, the first thing we need to do is to get an idea of the layout of the samples in two or three dimensions. The simplest way to do this is to draw a 'post plot' of the sample data. This is simply a map showing the locations of the samples and their measured values. Post plots can be labelled, as in the figure here, or coloured or shaded by value.

For this illustration we wish to estimate the grid point which has not been drilled and is labelled on the post plot. We have 5 points in the immediate neighbourhood. We might want to include the sample 300 metres to the east and that 300 metres to the north to fill in the gaps. Let us try both ways. We will use the simple weighting function of inverse distance squared. Our calculation table would be as follows for the 5 sample case, numbering samples clockwise from North:

so that our estimator for the unsampled location would be:

Using seven samples, we would obtain:

giving us a slightly lower estimator at 24.944 MJ. The impact of changing the search radius is illustrated in Figures 2 and 3 below. Both mapping exercises use simple inverse distance weighting. Figure 2 uses a search radius of around 390 metres, which was evaluated on the basis of getting an average of 20 samples within the search circle. Figure 3 was produced using a search radius of 250 metres, which basically takes the first 'ring' of samples around each unsampled grid point.
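The post plot and the calculation tables are not reproduced here, so the sketch below uses hypothetical coordinates and calorific values purely to show the mechanics: inverse distance squared weighting restricted to a search circle, first taking the five nearest samples and then widening the radius to pull in the two more distant ones.

```python
import numpy as np

def idw2_within_radius(x0, y0, xs, ys, vals, radius):
    """Inverse distance squared estimate at (x0, y0) using only the samples
    falling inside the search radius."""
    xs, ys, vals = map(np.asarray, (xs, ys, vals))
    d = np.hypot(xs - x0, ys - y0)
    keep = (d > 0) & (d <= radius)
    w = 1.0 / d[keep] ** 2
    return float(np.dot(w / w.sum(), vals[keep]))

# Hypothetical calorific values (MJ) around an undrilled node at (0, 0) on a 100 m grid,
# plus two more distant holes 300 m to the east and north:
xs   = [   0, 100,    0, -100,  100,  300,    0]
ys   = [ 100,   0, -100,    0,  100,    0,  300]
vals = [25.1, 24.6, 25.4, 24.9, 24.7, 23.8, 24.2]
print(idw2_within_radius(0, 0, xs, ys, vals, radius=150))   # five nearest samples
print(idw2_within_radius(0, 0, xs, ys, vals, radius=310))   # includes the two distant ones
```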

Iron Ore Project


For this example we will use inverse distance methods to map the values over the area of the Iron Ore project data. We have chosen to estimate a grid of points every 10 metres. In the two examples shown, we use simple inverse distance squared with various search distances. The area of interest is 400 metres square, containing 50 samples. To obtain an average of 20 samples for each inverse distance estimation, we need a search radius of around 140 metres.

This map is intuitively unappealing. It looks ridiculous. The overlapping of the search 'circles' can be seen clearly as we move across and down the grid of estimated points. A solution to this would be to widen the search radius. Figure 5 shows the result when we use a search radius of 250 metres. The 'moon crater' effect seems to have disappeared, but so has most of the variation in the data. We can see clearly where the algorithm is struggling to honour individual data points which are quite inconsistent with the draconian weighted average in the same area. From the sublime to the even more ridiculous, we reduced the search radius to 100 metres (for reasons which will become plain in Part 2). Figure 6 shows the disastrous result of that exercise.

It would seem, basically, that inverse distance squared is not the most effective way to map this particular set of sample data. We could spend many happy hours trying different distance functions and search radii.

Wolfcamp Aquifer
In complete contrast to the above example, we try inverse distance with the Wolfcamp data and get almost the same map no matter what function or what search radius we choose. Figure 7 shows the map obtained with inverse distance squared and a search radius of 58 miles. Lengthening the radius to 75 miles or shortening it to 35 miles produces only minor changes in the contours. We can make it look pretty rough with a search radius of 25 miles and an average of 5 samples per estimated point!

Scallops Caught
A rough post plot of Scallops samples shows the layout of the whole data set. This data is obviously irregularly spaced, possibly because of the difficulty in sampling fishing beds on a regular grid. For an inverse distance type estimator, we will 'home in' on a section in the centre of the sampled area.


The point of interest in this illustration is at longitude 72.6, latitude 39.8. Distances are calculated by Pythagoras' theorem. The estimated value at the unsampled location is almost 1,053 scallops in total. Notice that over half of this value comes from a single sample, number 6. This sample is the closest, but it also has a very high value, at almost twice that of the next highest sample value in this area (number 3). The sample is weighted at almost 0.3, but the contribution to the estimate is almost 60%. Sample 2, at roughly half the weight, contributes less than 4% of the final value. Samples 3 and 4 together have roughly the same weight as sample 6, but contribute only 30% of the estimated value.

The reason for the apparent inequity in sample contribution is simply that the distribution of the sample values is highly skewed. We have seen in the previous course that taking arithmetic means of highly skewed data produces absurd estimates for the population mean. The smaller the number of samples, the worse the effect becomes. Estimating a lognormal-ish value from a weighted average of highly skewed sample values has to be tantamount to senselessness.

Another problem is illustrated in this case: that of 'anisotropy'. If we do the complete mapping exercise, we can see both the smearing due to the skewed nature of the data and the impact of direction on scallop growth. Two examples are illustrated below:

- Figure 10 shows inverse distance results with an isotropic search radius designed to pick up an average of 20 samples;
- Figure 11 shows inverse distance with the same search radius in the northwest direction and one-quarter of the distance in the northeast direction. Weights are scaled by the relative anisotropy.

On a note of caution, many mapping packages offer algorithms with anisotropic search ellipses. Make sure that the package actually changes the weighting factors with direction too. Some packages use an anisotropic search but still weight a sample, say, 50 m away the same in all directions.
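One way to check (or implement) direction-dependent weighting is to rescale the separation vector before it goes into the distance function, as in this sketch; the 4:1 ratio mirrors the 'one-quarter of the distance' search used in Figure 11, but the function itself is only an illustration.

```python
import numpy as np

def anisotropic_distance(dx, dy, angle_deg, ratio):
    """Effective distance after stretching the minor axis by `ratio`, so that a sample
    a given distance away in the 'short' direction counts as if it were further away.
    angle_deg is the orientation of the major (most continuous) axis from the x axis."""
    a = np.radians(angle_deg)
    u = dx * np.cos(a) + dy * np.sin(a)       # component along the major axis
    v = -dx * np.sin(a) + dy * np.cos(a)      # component along the minor axis
    return float(np.hypot(u, ratio * v))

# With a 4:1 ratio, 50 m along the major axis stays 50 m,
# but 50 m along the minor axis behaves like 200 m in the weighting:
print(anisotropic_distance(50, 0, angle_deg=0, ratio=4.0))
print(anisotropic_distance(0, 50, angle_deg=0, ratio=4.0))
```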


Part 2 The Semi-Variogram


The Experimental Semi-Variogram


The Semi-Variogram
In Part 1, we looked at the problem of estimating an unknown value at a particular location within a study area. We laid out a set of essential assumptions for producing such an estimate and chose a weighted average method of estimation. To recap, if our unknown value is denoted by T, then our estimator T* is expressed as:

T^* = \sum_{i=1}^{m} w_i g_i

where:

- gi are the values of the samples included in the estimation;
- wi are the weights given to each of the samples;
- m is the number of samples included, and

\sum_{i=1}^{m} w_i = 1

to ensure that the resulting estimator is unbiased.

We assume that the relationship between known and unknown values depends on the distance between their locations and, possibly, the direction between them. We used a simple simulated example to illustrate how an 'inverse distance' estimator is produced and discussed the questions which arose when such an estimator was considered. We produced a list of 11 such questions, all of which are important but some more immediately relevant than others.

In most of the previous course, we have produced estimates for unknown population parameters of one kind or another. In almost all of those cases we have been able to quantify how 'reliable' the estimate is as a reflection of what is actually going on in the population. We have seen that it is not enough to produce an estimate: we must also provide confidence levels to attach to the estimation process. If we can answer question 7 in our list ('how reliable is the estimate when we have it?') then we could answer many of the questions posed in Part 1. For example, we could find the confidence limits for simple inverse distance and compare them to those for inverse distance squared. Presumably, the better estimation method would be the one which gives the 'narrowest' confidence intervals.

We have seen in Part 1 that the production of confidence intervals is a non-trivial problem when we have relationships between the known and the unknown values. The standard deviation for the estimation error (or 'standard error' as it is often called) is a function of all of the cross-covariance values between every pair of samples. Our problem is twofold:

- how do we estimate the covariance between a single pair of samples?, and
- how do we estimate the covariance between a known sample and the unknown value?

Let us simplify the situation a little and look at the simulated example (Iron Ore) we used in Part 1. We need to find an estimate for the value at the unsampled location on the grid. Zooming in on the problem, we would probably use the closest seven samples to produce such an estimate. Our estimator would, therefore, become:

T^* = w_1 g_1 + w_2 g_2 + \cdots + w_7 g_7

where the weights would be calculated using some inverse function of distance. The 'reliability' can be measured quite simply as the difference between the estimated value T* and the actual value, T. This is the same approach we used in finding confidence levels for the estimation of the 'global' population mean. We can define our 'error of estimation' as:

\varepsilon = T^* - T = \sum_{i=1}^{7} w_i g_i - T

Now, our weights add up to 1, so we could rewrite this as:

\varepsilon = \sum_{i=1}^{7} w_i g_i - \sum_{i=1}^{7} w_i T

which we could also write as:

\varepsilon = \sum_{i=1}^{7} w_i (g_i - T)

If we rephrase this in words, the logic goes something like this:

- the error we make in the estimation is the difference between the estimator and the actual value;
- this is the difference between a weighted average of the samples and the actual value;
- this is the weighted average of the individual differences between each sample and the unknown value.

In other words, the error on a weighted average is simply the weighted average of the individual errors. If we can quantify one of these simple differences, we can bag the lot. Let us consider just the first sample and the unsampled location. The difference between these two is:

g_1 - T

Of course, we do not know what this value is, since we do not know what value T has, so we cannot calculate it. This is where the statistical training comes in handy. We have made an assumption that the relationship between g1 and T depends on the distance, 141 feet (and possibly the direction, northeast/southwest), between them. If this is so, then we should look at our available information to find other pairs of known values this distance apart (in this direction). If our assumption is correct, these pairs should have the same kind of relationship as the one pair in which we are interested.

For this data set, we have 31 pairs of samples 141 feet apart in a northeast/southwest direction. In statistical terms, we have 31 samples from the population 'pairs of samples 141 feet apart, NE/SW' and we can calculate the difference for each pair. From these samples we could calculate the average difference and estimate the standard deviation of the differences:

\bar{d} = \frac{1}{N_{141,NE}} \sum (g_i - g_j)

where N141,NE is the number of pairs found at a distance of 141 feet in the NE direction and

(g_i - g_j)

is the difference in value between the two samples found in each pair. We would estimate the variance of the differences using

s_{diff}^2 = \frac{1}{N_{141,NE} - 1} \sum \left[(g_i - g_j) - \bar{d}\right]^2

If our original samples came from a distribution with mean μ and standard deviation σ, then d̄ is an estimate for the average difference:

E(g_i - g_j) = \mu - \mu = 0

That is, if our samples all come from the same underlying population, the true (population) average difference between any two samples is zero, by definition. μdiff will only be non-zero if the 'expected' value of the samples changes from place to place in the study area. For example, if there were a trend or 'drift' in values over the area, the expected value would change and the mean difference would not (necessarily) be zero. On the assumption of no trend, d̄ is an estimate of zero. Seems a bit of a waste of time to actually calculate an estimate for zero! If we accept the assumption of no trend for the moment, then

\mu_{diff} = 0

and

s_{diff}^2 = \frac{1}{N_{141,NE}} \sum (g_i - g_j)^2

which is simply the average of the squares of the differences between the sample values. Notice that we do not lose the usual 1 degree of freedom, because we are not estimating the mean from the samples. We will see later in this part of the course (cf. Wolfcamp) what happens when our assumption of 'no trend' is wrong. For the distance and direction in question, we would get a variance of 7.72 %Fe², so that

s_diff = √7.72 = 2.778 %Fe


To summarise: the difference between two values whose locations are 141 feet apart in a northeast/southwest direction comes from a population of similar differences with a true mean of zero (in the absence of trend) and an estimated standard deviation of 2.778 %Fe. If our original samples come from a single Normal distribution, the differences will also be Normal, with these parameters. We could state with 95% confidence that:

g_1 - t_{0.025,31}\, s_{diff} \le T \le g_1 + t_{0.025,31}\, s_{diff}

where t0.025,31 is the Student's t value read from Table 4 with 31 degrees of freedom, approximately 2.04. Now g1 = 40, so

40 - 2.04 \times 2.778 \le T \le 40 + 2.04 \times 2.778

and our 95% confidence interval for the true value at the unsampled location would be between 34.3 and 45.7 %Fe.
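The same interval can be reproduced with a few lines of code; the 2.778 %Fe, the 31 pairs and the sample value of 40 %Fe are taken from the text above.

```python
from scipy.stats import t

g1 = 40.0         # %Fe at the sample 141 feet away (from the text)
s_diff = 2.778    # estimated standard deviation of the 141 ft NE/SW differences (%Fe)
pairs = 31        # pairs used; no degree of freedom is lost because the mean is taken as zero

t_95 = t.ppf(0.975, pairs)              # two-sided 95% value with 31 degrees of freedom
half = t_95 * s_diff
print(f"t = {t_95:.2f}; 95% interval for T: {g1 - half:.1f} to {g1 + half:.1f} %Fe")
```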

The Experimental Semi-Variogram


We have seen, now, how we can begin to answer the question posed. To find the difference between the weighted average estimator and the actual value T, we need to look at all the individual samples and repeat this calculation for each one of them. Of course, we can reduce this task slightly. The relationship between T and sample 1 should be the same as that between T and sample 5. If direction is not a factor, both of these should be the same as the pairs {T, g3} and {T, g7}. Similarly, pairs {T, g2} and {T, g6} should behave alike, whilst {T, g4} will be similar to them if direction is not a factor. Once we have all of these values, we can begin to estimate any missing point on the grid.

Of course, as we pointed out in Part 1, we do not just want to estimate points actually on the grid: we want to estimate values all over the study area. Using this approach, how would we get a standard deviation for, say, 150 feet? And what would we do in a situation where our original samples were not on a grid and we would have trouble finding many pairs of samples at a specified distance in a specified direction? We need to generalise this process somehow so that we can produce a routine 'algorithm' for the calculations.

Let us restate the situation in more general terms. Let h denote a specified distance and direction. For that h, we can find all of the possible pairs of samples. Assuming a true mean difference of zero, we can estimate the variance of the differences as:

s_{diff}^2(h) = \frac{1}{N(h)} \sum (g_i - g_j)^2

where the sum runs over all N(h) pairs of samples separated by h.


We repeat this calculation for as many different values of h as the sample data will support. The results can be tabulated or displayed in a graph. Before we do that, however, a little historical background.

This type of approach was investigated by many different workers in widely different fields of application, from Gandin in Russia studying meteorology (Gandin (1963)) to Matérn in Sweden applying similar methods to forestry problems (Matérn (1960)). There is a good paper by Noel Cressie in Mathematical Geology (Cressie (1993)) which discusses the origins of the techniques which we will cover in the rest of this book. The particular work, notation and nomenclature which we will follow was laid out by Georges Matheron in his seminal work The Theory of Regionalised Variables, in the early 1960s (Matheron (1965)). In his work, he suggests the above calculation but with a slight modification. He defines the quantity he is interested in as one-half of the variance of the differences and uses a different symbol for the result:

\gamma^*(h) = \frac{1}{2N(h)} \sum (g_i - g_j)^2

We will see in the next section one of the reasons this was proposed. Generally, using half the variance instead of the whole variance simplifies the mathematics a little. It is (intuitively) more pleasing to have some terms which are '2x' than to have lots of terms which are '0.5x'. This 'semi-variance' is calculated for each direction and each distance and the results are tabulated for our example in Table 1.

In Part 1, we saw that the values of the samples in this simplistic example vary much faster in the north/south direction than they do in the east/west direction. We now have a quantitative measure of that in a semi-variance of 5.35 %Fe² north/south and only 1.46 %Fe² in the east/west direction. This is an indication of how 'anisotropic' the continuity is. In the process of trying to answer question 7, we seem to have come up with the beginnings of an answer to question 2.

This is a very small and regular example and already we have a fairly complex table of results to interpret. Matheron suggested that the easiest way to interpret the results was to plot them as a graph of the semi-variance versus the distance between the samples. Directions are indicated by different symbols or different colours. Because we assume precise and accurate sampling (sic), we have an added point on our graph at zero on both axes. That is, there is no difference between two samples at the same location. Since this is a graph of the semi-variance it is generally referred to as a 'semi-variogram'. In recent times, with the spread of geostatistics, authors increasingly refer to the graph as a 'variogram'. We find this confusing (not to say sloppy) and will use the full form, semi-variogram, throughout this book.

The semi-variogram graph is a picture of the relationship (difference) between sample values versus the distance between their locations. This is, effectively, an approximation to the distance function based on the sample data. Once again, in an attempt to answer question 7 in Part 1, we have come up with a way of answering question 1: 'what function of distance should we use in any given application?'. It would seem sensible, given the foundations which we have built in the previous course, to assess the function of distance by considering the relationships which exist between the samples that we do have. If we can (reliably) produce a distance function from the available data, we will have some basis for applying that to situations where we do not have the other sample in the pair: the {T, g1} problem.

This calculated or 'experimental' semi-variogram is an illustration of the relationships which exist amongst the sample values. The graph will verify for us whether there actually is a relationship with distance. If our basic assumption is incorrect, then the graph will be a scatter of points more or less around a horizontal line. There are few things more illogical than the person who insists 'we could not get a good semi-variogram graph, so we used an inverse distance weighting method'. If there is no distance relationship, how can you weight by a distance function?
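A minimal sketch of the experimental calculation for one distance and direction at a time, using a small hypothetical grid of %Fe values:

```python
import numpy as np

def semivariance(values, coords, lag_vector, tol=1e-6):
    """Experimental semi-variance for one separation h (a distance in a direction):
    gamma*(h) = 1/(2 N(h)) * sum of (g_i - g_j)**2 over all pairs separated by lag_vector."""
    values = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)
    sq_diffs = []
    for i in range(len(values)):
        for j in range(len(values)):
            if np.allclose(coords[j] - coords[i], lag_vector, atol=tol):
                sq_diffs.append((values[i] - values[j]) ** 2)
    if not sq_diffs:
        return None
    return sum(sq_diffs) / (2.0 * len(sq_diffs))

# Hypothetical 3 x 3 grid of %Fe values on a 100 ft spacing (row by row from the south):
coords = [(x, y) for y in (0, 100, 200) for x in (0, 100, 200)]
values = [34, 35, 37, 36, 36, 38, 38, 39, 40]
print(semivariance(values, coords, lag_vector=(100, 0)))   # east-west, one grid step
print(semivariance(values, coords, lag_vector=(0, 100)))   # north-south, one grid step
```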

Irregular Sampling
The calculation of a semi-variogram differs slightly if sampling has been carried out on an irregular or highly clustered basis. In these cases (see, for instance, the Wolfcamp or Scallops data sets) specifying an exact distance and direction will give very few pairs of samples for any given point on the semi-variogram graph. In this situation, we take the same approach as we do with a histogram rather than a 'bar chart': we group the sample pairs into intervals and use the average value within the interval. For example, in the Wolfcamp data, which is discussed in detail later in this part, we choose an interval of 5 miles with a tolerance of 2.5 miles. That is, any pair of samples which is between 2.5 miles and 7.5 miles apart gets amalgamated into a single interval. The next point on the graph would be the average squared difference between all pairs of samples 7.5 to 12.5 miles apart, and so on. The semi-variance can then be plotted against the average distance between the pairs included in that interval. It is often necessary to experiment with interval widths and numbers of intervals, particularly if your data is very irregularly spaced.
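A sketch of the binned ('lagged') version for irregularly spaced data, grouping pairs into distance intervals exactly as described above; the lag width and the data names in the final comment are placeholders.

```python
import numpy as np

def binned_semivariogram(coords, values, lag_width, n_lags):
    """Omni-directional experimental semi-variogram with distance 'bins', as described
    above for irregular data: bin k collects all pairs whose separation lies within
    k*lag_width +/- lag_width/2. Returns (mean distance, semi-variance, pair count) per bin."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(values)
    dists, sqdiffs = [], []
    for i in range(n):
        for j in range(i + 1, n):
            dists.append(np.hypot(*(coords[i] - coords[j])))
            sqdiffs.append((values[i] - values[j]) ** 2)
    dists, sqdiffs = np.array(dists), np.array(sqdiffs)
    points = []
    for k in range(1, n_lags + 1):
        centre = k * lag_width
        mask = (dists > centre - lag_width / 2) & (dists <= centre + lag_width / 2)
        if mask.any():
            points.append((dists[mask].mean(), sqdiffs[mask].mean() / 2.0, int(mask.sum())))
    return points

# Hypothetical call, e.g. a 5 mile interval (2.5 mile tolerance) for Wolfcamp-like data:
# binned_semivariogram(wolfcamp_xy, wolfcamp_head, lag_width=5.0, n_lags=20)
```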

Cautionary Notes

- Remember that the semi-variogram graph is an illustration of the variance of differences in sample values. This graph will only be stable if the standard deviation is a sensible measure for the variability of the values. As an example, consider the lognormal distribution. If your sample values come from a lognormal, calculating the variance of the values from 'raw' untransformed data values is tantamount to outright stupidity. The variance on a lognormal is a measure of skewness, not of variability. If you want to measure variability or continuity, you must do it with some transformation which will produce a better behaved base distribution: a logarithm or a rank uniform transform, for example.

- Unless you are absolutely certain that there is no anisotropy in your data, always calculate directional semi-variograms. You can combine them later if you need to. Isobel was once presented with an 'omni-directional' experimental semi-variogram for data with two spatial co-ordinates and variation through time. That is, the 'distance' between the samples was a function of X, Y and time. When she asked how many metres were equivalent to a minute, she was met with complete incomprehension.

- Remember that one of our basic assumptions is physical continuity of the phenomenon being measured. In geology, contain your area within fault blocks and relatively homogeneous mineralisations. In environmental studies, check that there are no factors which affect the spread of the substance of interest. As an extreme case, think of measuring air temperature in an area containing the Grand Canyon or Victoria Falls.

- If you take the time to stop and 'listen' to your experimental semi-variograms, you can often pick up inconsistencies in your data or structural factors in your values. Too many people applying geostatistics want to surge onto the next section and let the computer do the scut work. Your semi-variogram is a picture of your data spatially and will give you a lot of information about relationships you may not even have thought about.

- Remember that each point on your semi-variogram graph is an estimate of one-half of the variance and depends heavily on how many pairs of samples were available for its calculation. You might want to consider imposing a minimum number of pairs of samples before a point is included on the graph.


Modelling the Semi-Variogram Function


Modelling of the Semi-Variogram Function
When we looked at classical statistics, we drew a histogram or probability plot of our data and we then proposed some theoretical function for the distribution of values within the whole population. We now have the equivalent of a spatial histogram for the sample data and need to theorise about what this graph would look like if we had the whole population of all possible pairs of values over the whole study area.

There are many possible models for the idealised 'population' semi-variogram, γ(h). As with probability distributions, there are mathematical restrictions on the models which can be applied, mostly designed to ensure that we do not end up with answers involving, say, negative variances. You can invent your own semi-variogram models if you wish, but remember the restrictions. We will present here a set of the most commonly used semi-variogram models. This is not an exhaustive set, but you will find all of these in the software associated with this book (Practical Geostatistics 2000 software). There are other models in general use which are not included here, such as the de Wijsian model favoured by some South African gold mining companies (Krige (1979)). If we stick to the documented models, we should have few problems with the mathematical constraints.

Remember that the major purpose for fitting a model is to give us an algebraic formula for the relationship between values at specified distances. This will be equivalent to the 'distance function' discussed in Part 1 and will allow us to produce weighting factors for our samples based on the actual relationship between their values and that of the unsampled location.

The Linear Model


This is the simplest model for a semi-variogram graph, being a straight line with a positive slope and a positive (or zero) intercept with the semi-variogram axis. The formula and shape for this model are shown below:

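\gamma(h) = C_0 + p\,h \quad (h > 0), \qquad \gamma(0) = 0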
where γ(h) represents the value on the semi-variogram axis and h the distance between the two points of interest. The parameter p represents the slope of the line and C0 the nugget effect, the intercept on the γ axis. This intercept is common to many semi-variogram models and has been dubbed the 'nugget effect' or 'nugget variance'. It reflects the difference between samples which are very close together but not in exactly the same position. This has been interpreted in various ways, but is generally accepted to be due either to sampling errors or to the inherent variability of the mineralisation.

The Generalised Linear Model


This is a generalisation of the Linear Model for a semi-variogram graph, being a line with a positive slope and a positive (or zero) intercept with the semi-variogram axis. The 'generalisation' lies in the fact that the distance values are raised to a specified power rather than entering linearly. The formula and shape for this model are shown below:

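Writing the power as λ (the symbol is ours, the parameter is described in the next paragraph):

\gamma(h) = C_0 + p\,h^{\lambda} \quad (h > 0), \qquad 0 \le \lambda < 2, \qquad \gamma(0) = 0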
where γ(h) is again the value on the semi-variogram axis and h the distance between the two points of interest. Added to the parameter p representing the slope of the line and C0 the nugget effect on the γ axis, we have introduced a power parameter, written here as λ, for the power to which distance is raised. For mathematical reasons, this power can only take values in the range 0 ≤ λ < 2. The accompanying diagram shows two generalised linear models, with λ = 0.5 and λ = 1.5, for illustration.

The Spherical Model


This is a model first proposed by Matheron and represents the non-overlap of two spheres of influence. The formula is a cubic one since it represents volumes, and relies on two parameters: the range of influence (radius of the sphere) and the sill (plateau) which the graph reaches at the range. In addition to these, there may be a positive intercept on the γ axis, the 'nugget effect' described above. The formula and shape for this model are shown below:

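\gamma(h) = C_0 + C\left(\frac{3h}{2a} - \frac{h^3}{2a^3}\right) \quad 0 < h \le a

\gamma(h) = C_0 + C \quad h > a

\gamma(0) = 0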

where γ(h) is the semi-variogram and h the distance between the two points of interest. The parameter a represents the range of influence of the semi-variogram. We generally interpret the range of influence as that distance beyond which pairs of sample values are unrelated. C is the sill of the Spherical component and C0 the nugget effect on the γ axis. You will note that the final height of the semi-variogram model is C0+C.

Unlike the previous two models, there are modifications which you can make to the standard Spherical model. There are often cases where the semi-variogram graph reaches a definite 'sill' but does not quite match the shape of a single Spherical model. In this case you may mix Spherical components with different ranges of influence and/or sill values in order to achieve the correct shape. Remember that, if you have (say) three component Sphericals, the final height of the graph will be C0+C1+C2+C3. The formula for such a model is simply the combination of ordinary Spherical models, remembering to stop each one as it reaches its range of influence:

The Exponential Model


This is a model developed to represent the notion of exponential decay of 'influence' between two samples. It relies on two major parameters: the range of influence (a scaling parameter) and the sill (plateau) which the graph tends towards at large distances. There is also a possible 'nugget effect'. The formula and shape for the Exponential model are shown below:

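\gamma(h) = C_0 + C\left(1 - e^{-h/a}\right) \quad (h > 0), \qquad \gamma(0) = 0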

where γ(h) is the semi-variogram and h the distance between the two points of interest. The parameter a represents the so-called 'range of influence' of the semi-variogram, C the sill of the Exponential component and C0 the nugget effect on the γ axis. You will note that the asymptotic height of the semi-variogram model is C0+C. You may also note that, although parameter a is referred to as the range of influence, it is not possible to interpret it in the same way as for the Spherical model. This distance a is not the distance at which samples become 'independent' of one another. In fact, the exponential model reaches about two-thirds of its height at a distance a, and must go to four or five times this distance to come close to its asymptotic 'sill'.

The Gaussian Model


This model represents phenomena which are extremely continuous or similar at short distances. Although illustrated with a nugget effect, this is almost an oxymoron. This sort of model occurs in topographic applications or where samples are very large compared to the spatial continuity of the values being measured. The formula for this curve is similar to that for a cumulative Normal distribution (hence the name Gaussian); it does not imply that the sample values must be Normal:

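One common parameterisation (packages differ in how they scale the exponent) is:

\gamma(h) = C_0 + C\left(1 - e^{-h^2/a^2}\right) \quad (h > 0), \qquad \gamma(0) = 0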
where γ(h) is the semi-variogram and h the distance between the two points of interest. The parameter a represents the so-called 'range of influence' of the semi-variogram, C the sill of the Gaussian component and C0 the nugget effect on the γ axis. You will note that the asymptotic height of the semi-variogram model is C0+C. You may also note that, although parameter a is referred to as the range of influence, it is not possible to interpret it in the same way as for the Spherical model. This distance a is not the distance at which samples become 'independent' of one another. In fact, the Gaussian model reaches about two-thirds of its height at a distance a, and must go to four or five times this distance to come close to its asymptotic 'sill'.

The Hole Effect Model


This is a model developed to represent a cyclic or periodic relationship between two samples. It relies on two major parameters: the cycle distance (a full cycle of the periodicity) and the sill (plateau) which the graph tends to oscillate around. There is also a possible 'nugget effect'. In many cases, a fourth parameter is added to these three, known as a 'damping' or decay parameter. Without this parameter the cyclic effect would continue on to infinity. In practical circumstances, the relationship usually tends to trail off and this may be reflected by including the damping parameter. The formula for the Hole Effect model with damping is:

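One commonly used damped form, writing ω for the full cycle distance and d for the damping parameter (the exact parameterisation used in the PG2000 software may differ slightly), is:

\gamma(h) = C_0 + C\left(1 - \cos\!\left(\frac{2\pi h}{\omega}\right) e^{-h/d}\right) \quad (h > 0), \qquad \gamma(0) = 0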

where γ(h) is the semi-variogram and h the distance between the two points of interest. The parameter ω is the cycle interval (distance), d represents the so-called 'decay' or damping parameter of the semi-variogram, C the sill of the Hole Effect component and C0 the nugget effect on the γ axis. You will note that the asymptotic height of the semi-variogram model is C0+C. The damping on the model is inverse to the magnitude of the damping parameter: the bigger d is, the less the damping effect. It is also scaled by the distance h, and so will be relative to the scale at which we calculate the graph.

Note of Caution: the hole effect model is not mathematically stable and can lead to some weird results if used without care.

Paddington Mix Model


This model is included mostly as an illustration of how you can reflect the geology or structure of the measurements by combining components of various different shapes. In this case we have a fairly continuous phenomenon which has a weaker cyclic component present. The model was first used (by us) in an Australian gold deposit which was 'shear enhanced'. That is, the gold was present throughout the mining area but values were higher close to a quartz shear. Since shears in rock tend to occur with great regularity, the obvious Spherical structure for gold grade was modified by a 'ripple' effect due to the presence of shears. The shape of the graph changed according to whether the direction was parallel to or across the direction of the quartz shears.

Other applications for this type of model include:


potholes in platinum reefs;
diamonds occurring on the sea-bed (genuine ripples);
plant or tree yields where fields have been trenched; and
occurrences of species which show 'competition' effects.

The full formula for the Paddington Mix model would be:

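A sketch of the combined form, writing Sph(h; a) for the standard Spherical shape scaled to a sill of one and keeping the damped hole effect notation used above:

\gamma(h) = C_0 + C\,\mathrm{Sph}(h; a) + C_{hef}\left(1 - \cos\!\left(\frac{2\pi h}{\omega}\right) e^{-h/d}\right)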
where γ(h) is the semi-variogram and h the distance between the two points of interest. The parameter a represents the range of influence of the semi-variogram and C the sill of the Spherical component. The parameter ω is the cycle interval (distance), d represents the so-called 'decay' or damping parameter of the semi-variogram, Chef the sill of the Hole Effect component and C0 the nugget effect on the γ axis. You will note that the asymptotic height of the semi-variogram model is C0+C+Chef.

Judging How Well the Model Fits the Data


This is a tough one. There have been many attempts to develop automatic model fitting techniques, least squares methods and other confidence or sensitivity studies over the last 35 years. Noel Cressie came up with a very nice 'goodness of fit' statistic in the late 1980s which goes a long way to measuring how well the model fits the data (Cressie (1993)). We (your present authors) are still a little conservative on this matter and prefer a combination of statistical and visual assessment. The Cressie goodness of fit statistic is calculated as follows. For each point on the semi-variogram graph, calculate:

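For point j on the graph, writing γ̂_j for the experimental value, γ_j for the model value at that distance and N_j for the number of pairs, the term is of the form:

N_j\left(\frac{\hat\gamma_j}{\gamma_j} - 1\right)^2 = N_j\,\frac{\left(\hat\gamma_j - \gamma_j\right)^2}{\gamma_j^{\,2}}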
and sum the terms. This is analogous to, but not the same as, the χ² goodness of fit test, even allowing for the weighting by the number of pairs. This statistic allows bigger deviations between the experimental and model semi-variogram as the model becomes higher. It demands closer fitting at the lower levels, usually the lower distances, than at higher ones. It also demands better fitting where you have more pairs of samples in a point. The fit is directly weighted by the number of pairs of samples.

In the software, you will find that this statistic has been modified slightly. The actual magnitude of the statistic depends on the total number of pairs of samples found during the calculation. Now, not all samples are paired the same number of times in all different directions. This means that a Cressie statistic of a certain size in one direction is not necessarily equivalent to the same value in another direction. To adjust for this, we suggest a modification to remove the scaling by total number of pairs. Simply:

divide the statistic through by the total number of pairs of samples in that particular semi-variogram fit.

Equivalence to Covariance Function


Matheron showed that, if the semi-variogram model has a sill or final asymptote, then the final height of the semi-variogram is theoretically equal to the population variance of measured values. That is, as h → ∞, γ(h) → σ². If the semi-variogram has a sill, this tends to support the assumption that the samples come from a 'stationary' distribution with a fixed mean and standard deviation. In this case, some workers prefer to use the covariance function rather than the semi-variogram function. The relationship between the two is:

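\gamma(h) = \sigma^2 - \mathrm{cov}(h)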
where cov(h) is used here as shorthand for the covariance between sample values at distance (and direction) h, and is estimated by:

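writing N_h for the number of pairs a distance h apart and ḡ for the sample mean:

\widehat{\mathrm{cov}}(h) = \frac{1}{N_h}\sum_{\text{pairs } h \text{ apart}} \left(g_i - \bar{g}\right)\left(g_j - \bar{g}\right)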
There does not seem to be any indication in the geostatistical literature that N_h − 1 might be more appropriate here, since we have to use the sample mean ḡ (or some other estimator) to estimate the population average, µ. This shortcoming in the literature has been pointed out rather forcefully by Dr. Jan Merks in some moderately intemperate articles (Merks (1992-1994)).

The Nugget Effect


The nugget effect or discontinuity at short distances in the semi-variogram is another cause of much dissension in the geostatistical world. If we stand by our initial assumptions that sample values are measured precisely and accurately (or, if you prefer, that they are reproducible and representative), then the semi-variogram model must go to zero at zero distance. That is,

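\gamma(0) = 0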
The interpretation of the nugget effect then becomes one of the physical nature of the phenomenon being measured. The term nugget effect (or nugget variance) was coined on the basis of the interpretation of gold mineralisation. No matter how close the samples get, there will be large differences in value between the samples because of the 'nuggety' occurrence of the gold. We have to get down to the scale of a gold nugget to have values which are the same. Any further apart than the size of a nugget and one sample is inside a nugget whilst the other is outside, and the values are very different.

If we accept that the semi-variogram takes the value zero at zero distance, then the nugget effect represents the difference between two samples right next to one another: contiguous quadrats, two halves of a borehole core, fish swimming together and so on. If we do not accept 'zero at zero' then what we are basically saying is that we do not believe the data values. If the nugget effect is treated as an intercept on the γ axis, so that

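\gamma(0) = C_0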
this implies that two samples taken at the same location could have a variance between them of 2C0. We will see in Part 3 the impact that these two different assumptions have on our analysis. Embarrassing question to ask your software vendor: What does your package do with the semi-variogram at zero distance? Some software allows the user to fit a semi-variogram model but then uses the covariance function for the estimation process. You can lay good odds that, if your package does this, the semi-variogram model does not go through zero at zero distance.

Worked Examples
Worked Examples
This section contains worked examples using:


Silver Example
Coal Project
Wolfcamp

Silver Example
We illustrate some of the problems with fitting semi-variogram models with a simple example which appeared in Practical Geostatistics 1979 (Clark (1979)). A tunnel was driven horizontally into a base metal sulphide deposit in Southern Africa. Samples were chipped from the walls of the drive every one metre. The drive was 400 metres long. The calculation of the semi-variogram is extremely simple and the results are listed in Table 1. The graph of this experimental semi-variogram is shown in Figure 1.

We see what looks like an ideal semi-variogram shape for the first 70 metres or so. This graph starts at zero, rises gradually, slows down and levels off to a 'sill'. After about 80 metres, it surges into a parabolic rise. This last part of the graph is an indication of a trend in the values on the larger scale. We have calculated the semi-variogram on the assumption of no trend in the values. That is:

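writing N_h for the number of pairs of samples a distance h apart:

\hat\gamma(h) = \frac{1}{2N_h}\sum_{i=1}^{N_h}\left[g(x_i) - g(x_i + h)\right]^2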
where we assumed that the average difference, diff, between the paired values at that distance is zero. An exactly equivalent formula would be:

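\hat\gamma(h) = \frac{1}{2N_h}\sum_{i=1}^{N_h}\left[g(x_i) - g(x_i + h) - \overline{\mathrm{diff}}\right]^2 + \tfrac{1}{2}\,\overline{\mathrm{diff}}^{\;2}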
By assuming a zero mean difference we have failed to subtract a positive quantity from the semi-variogram calculation. If the mean difference is not zero, the graph shows the actual semi-variogram plus this squared component, a parabola. Thus, parabolic behaviour in a semi-variogram is an instant diagnostic of trend or drift in the values. This is one reason why we are not allowed to fit generalised linear models with powers of 2 or higher. In this case, the trend only becomes apparent after 75 metres. If we restrict our interpretation to within this distance, we should be safe enough assuming no trend.


Figures 2, 3 and 4 show fitted models of the Exponential, Spherical and Paddington Mix variety. In all three cases there are some desirable and some not so desirable characteristics. Of the three, the Exponential model gives the best Cressie statistic, but is possibly the least pleasing visually. This example highlights one of the problems with using a goodness of fit statistic: it depends on which points we choose to include in the calculation. If we only look at the first 30 metres, we would judge the fits quite differently:

Coal Project, Calorific Values


In Part 1 we showed a post plot of the calorific values from the Coal Project data set. This is a straightforward set of Normally distributed data on a regular 150 metre grid with some gaps. The calculation of the experimental semi-variogram is straightforward and we have used it for classroom exercises very successfully. We find that if each student calculates the point for a specified distance and direction, the construction of the graph becomes a team effort in which every student can contribute. It also gives a better intuitive idea of the calculation process.

The experimental semi-variograms for the four main points of the compass, north/south, east/west and the two diagonal directions, are shown in Table 2 and in Figure 5. It is clear from this graph that there is no significant difference between the directional semi-variograms. When judging the difference between directional semi-variograms, you should bear in mind that each point is an estimate of (one-half of) a variance. In ideal circumstances, you should be able to take each point and put a confidence interval around it, as we did in Part 2 of the previous course for variances and standard deviations. Also remember that, when you compare the two different estimates for a particular semi-variance (say, the east/west and the north/south at the same distance), this is the equivalent of an F ratio test for variances.

The combined 'omni-directional' experimental semi-variogram is very well behaved and exhibits a slight curve upwards with no sign of a sill on the graph. This is an ideal candidate for a generalised linear semi-variogram model. Fitting a generalised linear is very similar to fitting a straight line provided you can take logarithms. For our first estimates, we want a line which goes through, say,

Now, our generalised linear model is:

where C0 is the nugget effect, p the slope of the curve and λ the power for the distance. Because of mathematical restrictions, λ must have a value between zero and 2. From our experimental points, we need:

In our case we have an apparent zero nugget effect, but we will leave it in here for generality. If we rewrite this as:

then a log transform of both sides would give:

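\log_e\left[\gamma(h) - C_0\right] = \log_e p + \lambda\,\log_e h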
We estimate our nugget effect in this case to be zero. This reduces the equations to:

Solving for λ and loge p we find:

These values can be substituted back into the model equation to give us a 'model' value for each experimental point on the semi-variogram graph. The Cressie goodness of fit statistic can be calculated and the visual fit between model and data assessed. The parameters will need to be adjusted to get a 'best fit' model. Of course, those of you with fancy statistical packages will be able to use all of the experimental points in a weighted least squares solution to minimise the Cressie statistic. After a few adjustments, our best model was found to have:

The final Cressie calculation is shown in Table 3 and the fitted model along with the experimental semi-variogram in Figure 6.
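For readers who want to experiment, the weighted least squares idea mentioned above can be sketched in a few lines of Python. This is only an illustration: the arrays below are made-up values, not the Coal Project data, and the use of numpy/scipy is our assumption, not part of the PG2000 software.

# Sketch: fit a generalised linear model gamma(h) = C0 + p*h**lam to an
# experimental semi-variogram by minimising the (modified) Cressie statistic.
import numpy as np
from scipy.optimize import minimize

h = np.array([150.0, 212.0, 300.0, 335.0, 424.0])    # lag distances (m), illustrative
gamma_hat = np.array([0.85, 1.30, 2.05, 2.35, 3.10])  # experimental semi-variances, illustrative
n_pairs = np.array([220, 180, 160, 140, 120])         # number of pairs per point, illustrative

def cressie(params):
    c0, p, lam = params
    model = c0 + p * h**lam
    stat = np.sum(n_pairs * (gamma_hat / model - 1.0) ** 2)
    return stat / n_pairs.sum()   # divide by total pairs: the 'modified' statistic

res = minimize(cressie, x0=[0.0, 0.002, 1.0],
               bounds=[(0.0, None), (1e-6, None), (0.01, 1.99)])
c0, p, lam = res.x
print(f"C0={c0:.4f}, p={p:.5f}, lambda={lam:.3f}, modified Cressie={res.fun:.4f}")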

Wolfcamp Aquifer
The Wolfcamp data set has been seen to exhibit several undesirable traits as far as classical statistical analysis is concerned. In particular, the samples are highly clustered spatially and there is a significant downward trend in the values from southwest to northeast. A post plot confirms this visually.

Before we can calculate a semi-variogram, we have to choose the distance intervals between points on the graph and how many of those intervals we will want to see. One of the simplest (if not exactly the quickest) ways to assess the distance between the sample locations is to look at a 'nearest neighbour' distribution. For each sample, we find the nearest sample to that location. The distance is recorded and the process repeated for all of the sample points in turn. A histogram can be constructed of the nearest neighbour distances. If the sample locations are totally random, this histogram would follow a negative exponential form. If the data was on a strict grid, all of the nearest neighbour distances would be identical. This is also a useful timing exercise, since the nearest neighbour analysis takes exactly twice as long as the maximum calculation time for a semi-variogram graph. A histogram of the results is shown in Figure 8. There is a clear mode at around 3 miles and a slow tail off into the larger distances for the more isolated points.

Calculating the Semi-variogram


From the above nearest neighbour analysis, we see that the minimum feasible interval for a semi-variogram calculation is around 2.5-3 miles. However, we must balance this 'natural' sampling interval against the need to acquire reasonable estimates for the semi-variance, that is, a reasonable number of pairs of samples for each point on the graph. It is often necessary to experiment with the data to achieve the optimum compromise between the number of points on the graph and the reliability of each point. The maximum number of intervals should be chosen with this in mind. A general rule of thumb is to choose the maximum interval at around half the geographical extent of the study area. This ensures that we do not run out of pairs of samples.

When a significant trend is present in the sample values, this shows up in the semi-variogram calculation as a 'parabolic' component. It is particularly noticeable if you construct semi-variograms in different directions. For example, if we calculate semi-variograms at 5 mile intervals in four major directions (north/south, east/west, northeast/southwest and northwest/southeast) before taking out the trend, we get Figures 9 and 10. These are just two different ways of displaying the same graphs. Note that the shape of the semi-variograms in the different directions exactly reflects the shape of the trend in the values. In the northwest/southeast direction, the semi-variogram is comparatively low, with relatively small differences between the sample values. In the northeast/southwest direction, the differences between the sample values get larger and larger as the square of the distance. This is the diagnostic for a trend in the values: values change significantly more in one direction than in another. To complete the picture and assure us of a consistent diagnosis, we find that the east/west and north/south semi-variograms lie neatly between the direction of maximum difference and that of minimum difference.

These semi-variograms tell us that there is a trend and that the contours probably run northwest/southeast. They do not tell us whether the values are rising or falling to the northeast. The trend surface analysis tells us that. The semi-variogram also does not tell us what form the trend takes. In this case we would expect it to be of a fairly low order, since one direction (northwest/southeast) appears to have no trend at all.

Trend Surface Analysis

The calculated coefficients for each term in the three trend surfaces are listed in Figure 11. X represents the first co-ordinate (left-right on maps), and Y the second (bottom-top). It is difficult to judge, from the coefficients listed in Figure 11, just which of these
surfaces might 'best' describe the Wolfcamp data. The residual variation is the difference between what the trend says is there (the equation) and what the actual value was. We usually choose a surface which makes the residual variation as small as possible. A traditional method used by geologists since the mid 1950s is simply to calculate the sum of the squared residuals and compare this as a percentage of the original variation. A more formal method used by statisticians to judge the suitability of a Least Squares fit is the Analysis of Variance. This analysis requires that the residuals should be independent of one another. However, we feel that the Analysis of Variance is another way of getting an intuitive 'feel' for which surface (if any) may be best. This produces the results to the right: The final column is the important one. Under statistical assumptions of Normality and independence, the statistics shown in this last column would follow the F distribution. Reference tables 5(a) and 5(b) (below) of F distribution statistics at various levels of 'significance' are given in this book. The first item in the last column in the table is a value of 338.76. This statistic compares the variation on the original set of sample data with that left after fitting a linear (planar) surface. Looked at very simplistically we might say that fitting a linear trend surface has reduced the variation amongst the sample values by a factor of over 300. In other words, a simple surface described by two coefficients 'explains' a significant proportion of the variability in the original sample values. These two coefficients are the ones shown in the equation above multiplying the terms X and Y in the linear equation.

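In general terms, each of these F statistics is a ratio of explained to unexplained variation of the form sketched below; the exact degrees of freedom depend on which surfaces are being compared:

F = \frac{\left(\text{sum of squares explained by the extra coefficients}\right)/\left(\text{number of extra coefficients}\right)}{\left(\text{residual sum of squares}\right)/\left(\text{residual degrees of freedom}\right)}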
Now, when you look at the next division in the table there are two figures in the right hand column. The first compares the fitting of a quadratic surface with its five coefficients with assuming that there is no trend. In this case the figure is 165.94 indicating that a very significant proportion of the original variation can be 'explained' by a quadratic trend surface. This figure is lower than the linear F statistic, partly because we have had to calculate more coefficients. However, it might be because the quadratic is not a lot better than the linear surface. To compare these directly we produce the second figure. This is a measurement of how much more variation is explained by the quadratic than was already explained by the linear. In short, the amount of extra variation explained by the inclusion of three extra coefficients. The F statistic calculated for this Wolfcamp data was 6.37. In our simplistic interpretation we can interpret this statistic as comparing:

the variation explained by the extra terms in the trend equation, with
the variation remaining after the trend has been removed.

That is, the extra coefficients in the quadratic trend are explaining six times the remaining (residual) variation. This would seem to be significant.

The last division in the table shows similar statistics for the cubic surface. Comparing the cubic surface with 'no trend at all' gives an F statistic of 100.02, which would appear to be significant. Comparing the cubic surface with the quadratic, however, gives 2.45. If our residuals were independent of one another we could compare this figure to the F tables 5(a) and 5(b) (reference tables above) with 4 and 75 degrees of freedom. These tables would show us that a value of 2.45 could be encountered purely by chance even if there was no trend. This is what is meant by the statisticians' phrase 'not significant'. Thus we have no real evidence that the cubic surface is actually any better than the quadratic.

Another intuitive measure, popular with geologists, is simply to ratio the 'sum of squares' explained by a trend surface with the total sum of squares before a surface is fitted. These statistics should be used with caution. It is possible to produce a completely random set of data which will show impressive 'percentages explained'. This can lead to spurious conclusions.

None of these criteria will give you a hard-and-fast objective decision factor as to which surface is the best of the three fitted. It is always possible that a different style or order of trend is necessary. A visual assessment of the trend is often a great help in deciding which one is most significant. To see the actual form of the trend, we plot a map of the surface over the study area. A grid of points is laid over the area and the trend equation calculated at each grid 'node'. This can be plotted as a contour or shaded map.

One of the items of interest to us is how much the individual data values differ from the trend value. A 'statistical' assessment of this is included in the Analysis of Variance table. A visual assessment is obtained by simply posting the data on top of the trend surface plot. Values which lie above or below the local trend can be seen readily in this way. As an example, we plot the linear trend surface for this data. Recall that the equations for the three surfaces were:

where X and Y are the co-ordinates of a specified point and g*trend denotes the value predicted by the trend equation at that location.

Note that the 'contours' in Figure 13 show as straight lines, as would be expected with a linear equation. Note also that the negative coefficients for X and Y are reflected in a drop in value to the north and to the east. You can also see the differences between the original sample values and the local trend for each data point. The magnitude of the 'residuals' can be judged by the difference in colour and shading between contours and posted data. Very few of the data points show residuals more than one contour level out.

When we look at the quadratic plot in Figure 14, you can see the increasing complexity of the surface fitted. The plot of the cubic surface in Figure 15 shows that there is little visual difference between this and the quadratic. This bears out the findings in the ANOVA table, which suggest that the improvement in adding the extra 4 coefficients is marginal.

Recalculating the Semi-variogram

The (probably) most useful test to see if we have removed all of the necessary trend is to calculate the residual values (Figure 16):

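\text{residual} = g(x) - g^{*}_{\text{trend}}(x)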
and then re-calculate the semi-variogram. If no significant trend in the geostatistical sense remains, the parabolic behaviour on the semi-variogram would have disappeared. The same four directional semi-variograms as before were calculated on (a) the residuals from the quadratic trend (Figure 17) and (b) the residuals from the cubic trend (Figure 18). Comparing the two Figures 17 and 18 shows little difference between the quadratic and cubic residuals except, perhaps, for a few wild points on the quadratic graph. Note that the scale on the semi-variogram axis is now less than 10% of the scale on the semi-variograms calculated from the original sample data. This ties in nicely with the 'percentage sum of squares' reduction calculated previously. It is also apparent that we have removed any apparent 'anisotropy' and that all of the directions are now clearly comparable.

Modelling the Semi-variogram

Before we can proceed to any type of geostatistical estimation, we need to fit a theoretical model to the semi-variogram graph. Since we have decided that the residuals are now isotropic, we can fit the model to the 'omni-directional' semi-variogram. This semi-variogram will also be the most reliable, since it contains all possible pairs in all possible directions and, hence, confidence in each point is maximised. Figure 19 shows a classic Spherical model with a nugget effect of 12,000 ft², a range of influence of 60 miles and a sill for the Spherical component of 23,000 ft². This produces a modified Cressie goodness of fit statistic of 0.035, indicating a pretty good fit to the data. This model will be discussed in Part 3 and more fully in Part 6. Patience!



Part 3 Estimation and Kriging



Introduction
Estimation and Kriging
In Part 1 we laid out the conditions under which we could pose and attempt to solve our basic question: "how can we estimate the likely value at an unsampled location given measured values at neighbouring sampled locations?" We found that the assumptions we had to make were, in summary:

sample values were measured precisely and could be reproduced,
sample values were measured accurately and were representative of the actual value at that location,
the locations at which we wish to estimate values exist as part of a physically continuous and reasonably homogeneous 'surface' of potential samples, and
the values at those locations are related to one another in a way which is dependent on the distance (and possibly the relative direction) between their locations.

This set of assumptions leads us to the concept of a 'weighted average' or 'linear' estimator, where neighbouring samples are combined to provide an estimated value for the unsampled location. If we denote the measured sample values as gi as usual and the value at the unsampled location as T, then a weighted average estimator of T would be written as:

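T^{*} = \sum_{i=1}^{m} w_i\,g_i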
where m is the number of samples we want to include in the estimation and the wi are the 'weights' which are attached to each sample. We also found that, to produce a sensible and unbiassed estimate:

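\sum_{i=1}^{m} w_i = 1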
that is, the weighting factors must sum to 1. Using this approach, we produced a class of estimators known generically as 'inverse distance' estimation methods. These are simple and straightforward (if tedious) to calculate and are intuitively attractive. To be less pompous, they make sense. However, they also lead to a whole lot more questions, 11 of which were listed in part 1.


In Part 2, we set out to answer just one of those questions, question 7: "How reliable is the estimate when we have it?". In the course of Part 2, we developed answers to several of the other questions, viz:

1. What function of distance should we use in any given application? We discovered that, in trying to answer question 7, we developed a graph from the data which illustrated an empirical distance function. In the 'experimental semi-variogram' we calculated points on a graph of 'distance between samples' versus 'relationship between samples'. The semi-variogram, to be pedantic, measures the magnitude of the 'difference in sample value' for a given distance (and direction, where necessary). The 'covariance graph', sometimes dubbed a 'covariogram', is analogous to the semi-variogram under certain conditions and illustrates the covariance between values for a given distance (and direction) between their locations. It is, therefore, only sensible to use this semi-variogram graph as the basis for a 'model' for the required distance function for a given set of sample measurements.

2. How do we handle different continuity in different directions? Again this question has been answered by the development of the semi-variogram modelling process. If the continuity and, therefore, the predictability of the measurements is different for different directions, we will fit different semi-variogram models to those directions. Intermediate directions can be infilled by modelling, if sufficient sample information is available. More commonly, a simple form such as an ellipsoidal change in semi-variogram parameters is assumed and only major and minor axes modelled.

5. How far should we go to include samples in our estimation process? If our semi-variogram model has a 'sill' then we have gone some way to answering this question. For example, if our model is Spherical, the range of influence is obviously the maximum distance which would sensibly be considered as a 'search radius'. Beyond this distance, sample values become unrelated or, at least, have reached some background threshold of relationship which does not change. If we have an exponential model, the position is not so clear cut. The exponential reaches 98% of its sill value at around three times the 'distance scaling parameter', sometimes referred to in geostatistical literature as the 'range of influence'. For a Gaussian, the equivalent point is at 3 times the distance scaling parameter. For a hole effect model, it is sensible to stop looking for samples before you reach the peak of the first cycle. This is, generally, pretty unrealistic in practice. Semi-variogram models which have no sill have no clear-cut answer to this question.

6. Should we honour the sample values? We have not, as yet, answered this question. However, we have defined how we indicate our decision to the semi-variogram model. If we insist that the semi-variogram pass through the point:

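(h, \gamma) = (0, 0), \qquad \text{that is,} \qquad \gamma(0) = 0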

we are, effectively, forcing our model to accept the sample value as accurately representative of the 'true' value at that location. If we believe, as some geostatisticians do, that the nugget effect on our model is a measure of the error made during the sampling process, then:

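\gamma(0) = C_0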
and the nugget 'variance' of 2C0 is seen as a measure of the unreliability of our sample values. A more realistic perspective (perhaps) would be to assume that some of the nugget effect is due to sampling error and some is due to the inherent variability of the process which produced the sample values. This is certainly true in geological applications and has been demonstrated in pollution studies such as contaminated groundwater.

The philosophical distinction here is between two samples at exactly the same location and two samples right next to one another. For example, in gold mineralisation, if you take two contiguous samples the values can be different by orders of magnitude no matter how accurate, precise or tightly controlled the sampling process. One of the greatest problems in determining which school of thought is appropriate lies in the fact that, mostly, it is physically impossible to take two samples at exactly the same location. Once you have chipped a sample off the gold reef it just is not there any more. Replicate sampling, in the statistical sense, is not possible for the majority of spatial sampling. In reality, then, we should probably use:

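\gamma(0) = p\,C_0 \qquad (0 \le p \le 1)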
where p is some proportion of the total nugget effect. We shall see, later in this part of the course, what impact this decision has on our final estimation methods and their results.

9. What happens if our sample data is not Normal? The short answer to this question (so far) is 'make it Normal'. This is a question which will be dealt with in more detail in Part 6.

10. What happens if there is a strong trend in the values? We have seen, illustrated by the Wolfcamp data, that we can identify when a trend or drift is strong enough to interfere with our semi-variogram modelling. In the case of the Wolfcamp data, we solved this problem by removing a broad scale quadratic trend surface. We will discuss this problem in rather more detail in Part 6. For the remainder of this part of the course and the next, we will assume that any trend present has been removed before we produce the semi-variogram model.

This list of achievements is quite impressive if we forget that we actually set out to answer question 7.

Estimation Error
Estimation Error
Question 7: "How reliable is the estimate when we have it?". Rather than answer this question in the abstract, let us consider a particular example and then generalise from our findings at a later stage. The example from Practical Geostatistics 1979 (Clark (1979)) using a simulated iron ore deposit was anisotropic in nature and introduces a complication that we can do without for the present discussion. Let us, rather, take the ubiquitous Coal Project which has more desirable characteristics:

Normal distribution,
estimated overall mean 24.624,
estimated standard deviation 2.458,
Generalised linear semi-variogram model, no nugget effect, slope of 0.0016 MJ²/m, power 1.25, and
regular grid of samples at 150 m grid spacing.

To simplify the discussion, let us home in on the unsampled location and those samples which immediately surround it. Five samples lie in the first 'ring' around our missing sample. The estimator we propose to use will be:

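T^{*} = \sum_{i=1}^{5} w_i\,g_i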
where

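\sum_{i=1}^{5} w_i = 1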
We can display our problem graphically with the particular sample values (left) or more generally using gi (right).


We have seen in Part 1, when discussing question 7, that trying to assess the estimation error of a weighted average when the individual samples are highly related is a non-trivial task. To facilitate development of the mathematics, therefore, we shall approach the problems in stages of increasing complexity. If you want that in plain (British) English, we will start with a simpler case and sneak up on the final answer.

One Sample Estimation


The simplest case we can think of is where we only have one measured sample value available to carry out the estimation. Let us consider how well we could estimate the unknown value if we only had sample 1. In this case, T* = g1 = 21.86 MJ. The weight adds up to 1, so the estimator should be unbiassed. Maybe we should just check that, along with our attempt to answer question 7. We wish to evaluate the likely error of estimation associated with this process, so that we can produce a confidence interval to go with our estimate. We define estimation error as:

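\varepsilon = T^{*} - T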
where ε, the Greek letter epsilon (e), stands for estimation error. If you prefer to define it as

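\varepsilon = T - T^{*}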
go ahead. Our way simply makes the algebra look tidier. It in no way affects the results. We do not know the value of ε for this estimation, because we do not know the value of T. However, in the construction of our semi-variogram, we found lots of pairs which were the same distance apart and, according to our assumptions, should have the same behaviour. Following the same logic as that used in Part 2, our particular ε value is just one of all the possible εs we would get if we repeated this exercise in every possible location over the study area. Because our original g values follow a Normal distribution, our T* will also (presumably) be Normal. The difference between the two values will, therefore, be Normal. The ε values can be assumed to come from a Normal distribution with a mean and a standard deviation. The mean value is:

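\mu_\varepsilon = E\{\varepsilon\} = E\{T^{*} - T\}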
where E{} was introduced in the previous course to denote the 'expectation' or population average for a particular variable. In this case:

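E\{\varepsilon\} = E\{g_1 - T\} = E\{g_1\} - E\{T\}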

if we assume that the average of a difference is equal to the difference of the averages. There are some circumstances under which this is not true, but none which are relevant to our current discussion. If you are unsure of the veracity of this statement, construct a spreadsheet as follows:

find all of the pairs in this data set which are 212 metres apart (the example is isotropic, so you can use all directions);
write the first sample of the pair in one column of your spreadsheet;
write the second sample in the second column;
make the third column one minus the other;
at the bottom of each column, add up the samples and divide by the number of pairs;
if the difference between the first two column averages does not equal the average of the third, check your typing.

Now, gi comes from a Normal population with mean µ. So does T. So:

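E\{\varepsilon\} = \mu - \mu = 0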
assuming there is no trend or population change between the sample location and the unsampled location. This implies that the average of the error distribution is zero. In other words, our estimator is unbiassed in the absence of trend or other 'non-stationarity'. The other parameter which we need to describe the distribution of the errors is the standard deviation. This we can obtain by studying the variance first and then taking the square root. It is a general rule that we have to do the mathematics with the variance and then get the standard deviation afterwards. The variance of the errors is the average of the squared deviations about the mean. Using the E notation to stand for 'average over the population', we get:

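\sigma_\varepsilon^2 = E\{(\varepsilon - E\{\varepsilon\})^2\} = E\{\varepsilon^2\}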

since we have just shown that we believe E{ε} is equal to zero. Expanding this a bit we get:

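\sigma_\varepsilon^2 = E\{(T^{*} - T)^2\} = E\{(g_1 - T)^2\}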
That is, the variance of the error made in this estimation is the average of the squared difference in value between pairs of potential samples with the same relative position as g1 and T. That is, the average squared difference between two potential samples 212 metres apart. Remember that we are not just considering the samples we do have here but the whole population of all possible samples and all possible pairs of samples. Of course, we do not have all of the possible samples, so we cannot calculate this variance. However, we do have a theoretical model for it or, rather, for half of it. We have a semi-variogram model, which can be expressed as:

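2\gamma(h) = E\{(g_i - g_j)^2\}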
where gi and gj are the values of any two samples a distance h apart. Matching this with our requirements above gives us:

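\sigma_\varepsilon^2 = E\{(g_1 - T)^2\} = 2\gamma(212)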
From our model for the calorific values in this coal project we know that:

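\gamma(212) = 0.0016 \times 212^{1.25} \approx 1.29\ \mathrm{MJ}^2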
See Table 1 for this figure. So the estimation error in this case has a variance:

so that the standard deviation is


After all that mathematics we have the results as follows: if we use only sample 1 to estimate the unknown value at the unsampled locations, we will have an estimator of 21.86 MJ. This estimate will be unbiassed if our assumption of 'no trend' is justified. The error made in the estimation process will follow a Normal distribution with a mean of zero and a standard deviation of 1.61 MJ. To avoid confusion, the standard deviation of the estimation error is often referred to as a 'standard error'.

We can say, then, that our estimate will be within 1.6449 standard deviations of the 'true' value of T with a confidence of 90%, where the 1.6449 was acquired from reference table 2 (above) in the usual manner. We can be 90% confident that:

Juggling this around in the same way that we did in the previous course, we find that:

We can be 90% confident that the true value at the unsampled location lies between 19.21 and 24.51 MJ. Question: What sort of confidence interval would you get if you used sample 2 instead of sample 1?

Another Single Sample


According to our assumptions and our semi-variogram model for this case, samples which are closer together are more similar than those at greater distances. Let us try the same approach for estimating T using only sample 3.


In this case, T* = g3 = 25.61 MJ. The estimation error is ε = T* - T and the same logic as before will show us that (in the absence of trend) the average estimation error will be zero. The variance of the estimation error, by the same reasoning, will be:

since

and the standard error becomes:

which is around 80% of the previous standard error. So, if we use sample 3, we get an estimator of 25.61 MJ with a 90% confidence interval of:

which gives us a 90% confidence interval on the 'true' value of T of 23.47 to 27.75 MJ. This confidence interval is 20% narrower than the previous one. It is also centered around quite a different value. We now have two estimates and two confidence intervals for the same value. If we looked at all five samples, we would have five estimators and five confidence intervals:

It would be nice to pretend we could take this table and say that we can now be 90% confident that the true value of T lies somewhere between the averages of the two columns, but life is never that simple. Apart from anything else, we do not want to weight all of the samples equally so we would have to consider weighting the confidence intervals also. Let us take one giant leap for geostatistical kind and consider taking two of the five samples.


Two Sample Estimation


Let us now consider what happens when we use both sample 1 and sample 3 in the estimation of the unknown value T. Our estimator is:

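T^{*} = w_1 g_1 + w_3 g_3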
where

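w_1 + w_3 = 1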
Now, our estimator is still a linear combination of things we believe to be Normal, so the estimator and the estimation error will also be Normal. The mean of the estimation error will be:

using the fact that the average of 45% of a bunch of values is (usually) 45% of the average of the bunch of values. Try it on a spreadsheet if you need convincing. Of course, g1, g3 and T are (we believe) all samples from the same Normal distribution with mean µ. This means that:

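E\{\varepsilon\} = w_1\mu + w_3\mu - \mu = (w_1 + w_3 - 1)\,\mu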
Remember that E{} is just short-hand for the average value over the population. Since w1+w3 = 1, the above again reduces to zero. That is, our estimation errors will average zero and our estimator be unbiassed provided that our original samples all come from the distributions with the same mean and that there is (by definition) no trend. The estimation variance becomes:

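\sigma_\varepsilon^2 = E\{(w_1 g_1 + w_3 g_3 - T)^2\}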

Unfortunately we cannot do the simple separation trick with these terms; we have to multiply out the brackets first. Bearing in mind that we could use the semi-variogram model to give us values to put into the previous calculations, we really want to get this expression down into terms which involve the square of differences between two 'sample' values. With this in mind, we rearrange the terms as follows:

which we can do so long as the weights add up to 1. Grouping the terms differently, we can simplify to:

This now begins to look like something that can relate to the semi-variogram. Squaring out the bracket produces:

This is becoming a little involved, so let us consider the terms separately. The first term is:

where γ(g1,T) is a new notation introduced to stand for 'the semi-variogram value at the distance between sample 1 and the unsampled location'. Remember that the semi-variogram model is the theoretical measure for one half of the average squared difference between values at any given distance. In our example, γ(g1,T) is the same as writing γ(212). In the same way, the second term becomes:

where

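\gamma(g_3, T)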

is the semi-variogram value for the distance between sample 3 and the unsampled location. Those terms are pretty straightforward. The difficult one is the third term (as we have written the equation) the cross product:

since the weights and the '2' are simple numbers, the average we need is simply the 'average over the whole population' of terms which look like:

Now, this looks remarkably like a covariance type calculation. Unfortunately we have neither semi-variogram model nor covariance function for the cross product of two differences, only for differences squared. With a little bit of rearranging, however, we can juggle these terms into something we can handle. If you are (by now) confident in our results and do not feel the need to follow the mathematical intricacies, please feel free to skip ahead to the bullet list at the bottom of the next page. There is a classical algebraic trick called 'completing the square' which is what we will use here:

Looking at the expression carefully, we can see the terms:

which look almost like

Similarly,

is almost

But, of course, we cannot go adding in terms just because it is convenient. If we add them in we have to take them away again. To balance our equation then, we need to write:


which reduces to

and

which now involves:


the semi-variogram between the first sample and the unsampled location,
the semi-variogram between sample 3 and the unsampled location, and
the semi-variogram between the two sampled locations.

The first two of these are no surprise, since we know that the error we make is related to the difference between the sample values and the value which we are trying to estimate. What this term is telling us is that the error we make also depends on the relationship between the two samples. We will come back to this point shortly. Hold that thought. Taking these new semi-variogram terms and putting them back into our expression, we find:

Grouping like terms with like terms gives:

which can be simplified further into:


But we know that w1+w3= 1, so this reduces to:

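\sigma_\varepsilon^2 = 2 w_1\,\gamma(g_1, T) + 2 w_3\,\gamma(g_3, T) - 2 w_1 w_3\,\gamma(g_1, g_3)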
which is a pretty short answer to a very long process. Pulling all the bits together, we summarise our findings.

We use both sample 1 and sample 3 in the estimation of the unknown value T. Our estimator is:

where

Provided that there is no trend or other 'non-stationarity' present, the average estimation error will be zero and the variance of the estimation error can be calculated by:

which is the same weighted average of the variogram values (twice the semi-variogram) between sample 1 and T and between sample 3 and T, minus the variogram between the two samples multiplied by the weights for both samples. There is quite a nice pattern in this equation, in that wherever a sample appears its associated weight also appears. Plugging in actual values for this example we find the following:

The first need, here, is to choose the weights for the two samples. We will return, later in this part of the course, to how best to choose the weights. For the moment we will choose arbitrary weights of 0.45 for sample 1 and 0.55 for sample 3. This is not as arbitrary as it looks, since this choice gives approximately 20% less weight to sample 1. You may remember from the previous exercise that sample 3 gives us confidence intervals 20% narrower than that associated with sample 1.


The error associated with this estimator has a Normal distribution with theoretical mean zero and theoretical variance:

giving a theoretical standard error based on the semi-variogram model of:

which is 25% lower than when we just used sample 3 and almost 40% lower than just using sample 1. Our 90% confidence interval for the 'true' value at the unsampled location is now between 22.31 and 25.52 MJ. Our conclusions in this part of the course so far are:

if you use a closer sample you get more confidence in your estimate, and
if you use two samples you have more confidence than if you just use one.

One has to question whether we needed all this algebra to actually come to these conclusions. Let us take an example whose result may be a little less intuitively obvious.

Another Two Sample Estimation

Once again, we have two samples one of which is 212 metres from T and one of which is 150 metres from T. Our estimator will be of the form:

where

Provided that there is no trend or other 'non-stationarity' present, the average estimation error will be zero and the variance of the estimation error can be calculated by:

which is the same weighted average of the variogram values (twice the semivariogram) between sample 2 and T, sample 3 and T minus the variogram between the two samples multiplied by the weights for both samples. Substituting actual values for this example we find the following:

and if we use w2 = 0.45 and w3 = 0.55 as before, this becomes:

with an estimation variance of:

and a standard error of

which is virtually identical to what we get when we only used sample 3 to estimate T. The only thing which changed in the calculation of the magnitude of the estimation error was the semi-variogram term between the two samples. This result tells us that using two samples is not necessarily better than using one sample if they are close together. In attempting to answer question 7, we have run up against question 4 "how do we compensate for irregularly spaced or highly clustered sampling?". The last term in the estimation variance is actually a term which calculates how closely the samples are related to one another. There is a balance in the system between getting the samples as close to T as we can but keeping them as far from one another as is possible. Clustered samples do not give us as much information about unsampled locations as sampling which is regularly spread out across the study area.


Exercises for the reader:

If you want to 'reinforce' this section, try all combinations of two samples to estimate T and find out which gives the 'best' estimator.

Three Sample Estimator


There is actually one other way in which we might improve that last estimator. Even if we are restricted to using those two samples in the estimation, we still have control over what weightings we apply to each sample. Part of the problem with confidence in this estimator is that we have not chosen 'optimal' weights for the samples. However, before we move on to how to best choose those weights, let us generalise what we have so far. In order to develop a 'general' solution we need to consider a slightly more complex situation than just two samples. In this next example, we will use all three of the above samples to obtain an estimate of the value at the unsampled location. For this example we will use

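T^{*} = w_1 g_1 + w_2 g_2 + w_3 g_3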
where

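w_1 + w_2 + w_3 = 1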
As before, our error will follow a Normal distribution with mean

provided that the weights add up to 1 and there is no trend. The estimation variance becomes:

Without working through all the algebra, it should be fairly obvious that we will have terms involving:


a) semi-variograms between each sample in turn and the unsampled location, and
b) semi-variograms between every pair of sampled locations.

We could write this as:

You can see, to some extent, how the added terms are introduced. We find it easier (and more rigorous) to write out the equation as follows:

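A sketch of that layout, writing every pair explicitly (the table in the original may group the terms slightly differently):

\sigma_\varepsilon^2 = 2w_1\gamma(g_1,T) + 2w_2\gamma(g_2,T) + 2w_3\gamma(g_3,T)
\quad - \big[\, w_1 w_1\gamma(g_1,g_1) + w_1 w_2\gamma(g_1,g_2) + w_1 w_3\gamma(g_1,g_3)
\qquad + w_2 w_1\gamma(g_2,g_1) + w_2 w_2\gamma(g_2,g_2) + w_2 w_3\gamma(g_2,g_3)
\qquad + w_3 w_1\gamma(g_3,g_1) + w_3 w_2\gamma(g_3,g_2) + w_3 w_3\gamma(g_3,g_3) \,\big] - \gamma(T,T)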
Now, you are probably thinking that is far more complicated than the other way. This layout has two major advantages:

by laying out all the inter-sample pairs in a table, you make sure you do not 'lose' any in the process;
this is a lot simpler to generalise to more samples.

Suppose our next step would be to consider all five samples in this example. By laying the equation out this way, we can see that we only need to add two terms to the top line: 2w4γ(g4,T) and 2w5γ(g5,T). Inside the 'big bracket' we would need to add two terms to each row and another two rows. The terms are simply the weights for that row and column multiplied by the semi-variogram between the sample for that row and the sample for that column. Those of you who are comfortable with matrices will already be amazed by the simplicity of this equation.

Before we leap off into the wide blue yonder of ever increasing sample numbers, let us first solve this three sample problem. We need to define weights for each sample. We have three, two at 212 metres and one at 150 metres. We will use an arbitrary set of weights as follows:

to keep the calculation simple. Our estimator becomes:


with an expected average error of zero in the absence of trend and an estimation variance of:

Remembering that:

for the calorific value in the coal example, we reduce this to:

so that

which is marginally worse than when we used samples 1 and 3 together. Our conclusion has to be that, if we use these samples with these weights, more samples is not better. It is becoming obvious that we are not making the best of our sample information. Since the only thing we can change in the standard error calculation is the weighting factors, it would seem prudent to turn our attention to this matter before we try any more complicated examples.

Choosing the Optimal Weights


In the previous section, we have seen that we can quantify the error on our estimation as follows. If we have a weighted average estimator of the general type:

T* = w1g1 + w2g2 + w3g3 + ...... + wmgm

where: gi are the measured values at each sample site i; wi are the weights accorded to each sample; m is the number of samples to be included in the estimation and T* is our estimator of the unknown value, T, at the unsampled location. We have seen further reasons why we need the weighting factors to sum to 1. In the absence of trend or any other kind of 'non-stationarity', scaling the sum of the weights to 1 ensures that T* is an unbiassed estimator of the unknown value.

Under these conditions, we find that we can derive the variance of the estimation error, often shortened to 'estimation variance', as follows:

or, in a more succinct form:

Looking at it either way, the confidence we can have in our estimator depends on:

a) the relationship between values at different locations as modelled in the semi-variogram;
b) the relationship between each sample and the unknown value to be estimated;
c) the relationship between every possible pair of samples (including each with itself);
d) the weighting factors chosen for each of the samples.

In working out the estimation variance, we literally form every pair of locations and include it somewhere in the equation. This includes taking each sample with itself and taking the unsampled location with itself. We have, therefore, quite a few terms which include the expression γ(0). To be precise, we have from the above:

In this example, of course, γ(0) is zero, because we have a zero nugget effect on our semi-variogram model. Later in this part of the course we will discuss an example where the nugget effect is non-zero and examine the effect this can have on the resulting estimation variance.
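To make the structure concrete, here is a small sketch of the full estimation variance for an arbitrary set of weights, written so that the γ(0) terms (each sample paired with itself, and T paired with itself) appear explicitly. The sample layout, target location and model parameters are hypothetical; with a zero nugget effect the γ(0) terms contribute nothing, exactly as described above.

```python
# Sketch of the full estimation variance for ANY weighted average estimator.
# Coordinates and model parameters are hypothetical placeholders.
import numpy as np

def gamma(h, nugget=0.0, sill=1.0, rng=250.0):
    h = np.asarray(h, float)
    g = np.where(h >= rng, nugget + sill,
                 nugget + sill * (1.5 * h / rng - 0.5 * (h / rng) ** 3))
    return np.where(h == 0.0, 0.0, g)   # convention here: gamma(0) = 0

def estimation_variance(coords, target, w):
    coords = np.asarray(coords, float); w = np.asarray(w, float)
    d_iT = np.linalg.norm(coords - target, axis=1)
    d_ij = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    # 2*sum_i w_i g(i,T)  -  sum_ij w_i w_j g(i,j)  -  g(T,T)
    return 2 * w @ gamma(d_iT) - w @ gamma(d_ij) @ w - float(gamma(0.0))

coords = [(0, 0), (30, 0), (150, 0)]
target = np.array([100.0, 100.0])
equal = np.ones(3) / 3
d = np.linalg.norm(np.asarray(coords, float) - target, axis=1)
inv_d = (1 / d) / (1 / d).sum()          # inverse distance weights, for comparison
print(estimation_variance(coords, target, equal))
print(estimation_variance(coords, target, inv_d))
```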

Three Sample Estimation


Let us return to our coal example and the attempt to use 3 samples to estimate the unknown value at the unsampled grid location.

The estimator we wish to use is:

where

Our error will follow a Normal distribution with mean

provided that the weights add up to 1 and there is no trend. The estimation variance was:

Substituting in the distances and semi-variogram model for the calorific values:


Using the appropriate semi-variogram model for the calorific value in the coal example:

This reduces to:

or, gathering the terms together:

This is, at last, a fairly simple expression which is determined by the location of the samples, the location of the unsampled site and the semi-variogram between any two locations. All of those terms in the equations are simply numbers which can be read off or calculated from the semi-variogram model. The only 'variable' quantities in this expression are the weights which we choose to give to each sample. So, we can see, that the confidence we will have in our estimator is dictated by the weights we choose. Now, we could try many different sets of weights and calculate the estimation variance for each set. We could, for example, choose different distance weighting functions and compare to see which one gives us the 'best' answer. Of course, in this context, the logical definition of 'best' is the one which gives us the narrowest confidence intervals the one with the smallest estimation variance. Two drawbacks to that approach: it is an awful lot of work and there is no guarantee that we have actually tried the truly optimal answer. We can find the 'best' of the set we try, but there may be another set of weights out there which would do even better. Matheron's approach to this was simple. We want the minimum value of the estimation variance. The estimation variance is a function of the weights chosen. How do we find the minimum of a function? We differentiate it and set the differential to zero and hope it is not a maximum! Life is not quite that simple, since we have three weights to determine and must, therefore, set three differentials to zero simultaneously. On the other hand, the
differentiation is not exactly an onerous task if we bear in mind that when we are considering the differential for w1 then w2 and w3 are treated like fixed numbers. From

we get:

Rewriting and tidying up a bit:

Solving these equations, we eliminate any one of the weights, say w1:

Subtracting one from the other gives:

and, of course, we still have equation (i):

Using the same approach to eliminate w2 we find:

Subtracting one from the other produces:


Substituting back into the previous equations, to find w2:

Using any one of the earlier equations we find that, for example:

These results are somewhat disturbing. For a start, one of them is negative. For a finish, they do not add up to 1 but 0.9444. Now, we did not insist that our weights should be positive, we only said we wanted them to add up to 1. We can interpret a negative weight as being told that the system just does not want to use that sample. Given that including this sample in previous calculations tended to make matters worse, this seems to be a support to our previous intuitive feeling that this sample does not give us any extra information about T. However, we should be somewhat concerned by the fact that the weights do not add up to one. This would mean that the expected value of our estimation error would be:

and we would need to adjust our estimator by 0.0556 times the population mean to make it unbiassed. That is, our estimator would actually have to be:

to be unbiassed and have the minimum estimation variance of:


and a standard error of

The General Form for the 'Optimal' Estimator


The general form for the above process would be expressed as follows:

which gives an unbiassed estimator with estimation variance:

where the weights are found by solving the equations:

for i = 1, 2, 3, ..., m. Each equation may also be written out in detail as:

so that we have m equations to solve to obtain m weights. Note: It is worth pointing out, at this point, that this is where the mathematical restrictions come in for the semi-variogram model. It is necessary for the semi-variogram model to be 'conditionally positive definite' to guarantee that the estimation variance above is positive. Matheron called this process of 'optimal' estimation 'simple kriging' in honour of Danie Krige's earlier work with the moving template and empirical covariances. The reason is perhaps clearer if we express the equations in terms of the covariance function rather than the semi-variogram. If we write:

then the simultaneous equations change to:


or, if you prefer:

where σij represents the theoretical covariance between samples i and j and σiT represents the covariance between sample i and the unsampled location. In this form, the term σii is the covariance between a sample and itself. If we believe our sample values

and, therefore,

If we do not believe our samples and use:

then the equivalent in the covariance form is

As indicated before, we will return to this question using an illustrative example with a non-zero nugget effect in Part 4.
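As a concrete illustration of the unconstrained system just described, the sketch below solves the m equations Σj wj γ(gi,gj) = γ(gi,T) for a handful of samples and then puts the shortfall in the weights onto a 'known' population mean, exactly as in the adjustment above. Everything here — the coordinates, the values, the model and the mean — is a hypothetical stand-in rather than the coal data, so the printed numbers illustrate the mechanics only.

```python
# Sketch of the unconstrained ('simple kriging') system: solve
# sum_j w_j * gamma(i,j) = gamma(i,T), i = 1..m, then top up with the mean.
import numpy as np

def gamma(h, nugget=0.0, sill=1.0, rng=250.0):
    h = np.asarray(h, float)
    g = np.where(h >= rng, nugget + sill,
                 nugget + sill * (1.5 * h / rng - 0.5 * (h / rng) ** 3))
    return np.where(h == 0.0, 0.0, g)

coords = np.array([(0.0, 0.0), (30.0, 0.0), (150.0, 0.0)])
values = np.array([23.1, 24.7, 25.2])      # hypothetical sample values
target = np.array([100.0, 100.0])
pop_mean = 24.0                            # assumed 'known' population mean

A = gamma(np.linalg.norm(coords[:, None] - coords[None, :], axis=2))  # gamma(i,j)
b = gamma(np.linalg.norm(coords - target, axis=1))                    # gamma(i,T)
w = np.linalg.solve(A, b)                  # works for two or more distinct samples

estimate = w @ values + (1.0 - w.sum()) * pop_mean   # shortfall goes on the mean
# with these weights the estimation variance collapses to sum w_i*gamma(i,T) - gamma(T,T)
sk_variance = w @ b - gamma(0.0)
print(w, w.sum(), estimate, sk_variance)
```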

Confidence Levels and Degrees of Freedom


The covariance form is identical to the approach used by Krige with one major exception. Instead of using covariances derived directly from historical data, we are using a covariance model derived from all available sample data. The estimation technique which has been developed is a theoretical one based on a theoretical model. It is heavily dependent on the assumption that the semivariogram model is the correct model for the population continuity within the study area. Because of this, confidence levels are determined by Normal theory and relevant percentage points read from Reference Table 2 (below) rather than the Student's t from Reference Table 4 (below). The concept of degrees of freedom is not relevant to this calculation, because we do not derive the estimation variance or the standard error directly from the sample data.


Of course, this argument is specious since the semi-variogram model is not the 'true' population semi-variogram but an estimate of it. However, the model is derived from all possible pairs of samples available during its construction. Remembering that we did not estimate a mean from the samples (either), the relevant degrees of freedom would, therefore, probably be

n(n - 1)/2

if we wanted to get picky about it (Cressie, personal communication). In the coal project example, we started with 96 samples. The degrees of freedom applicable to this estimator would, by this argument, be 96 x 95/2 = 4,560. Since Table 4 (above) only goes up to 120 degrees of freedom before reverting to Table 2 (above), we can probably use the Normal theory without too much fear of inaccuracy.

Simple Kriging
We have done rather a lot of jumping around in the previous couple of sections. Let us return to our estimation process and, in particular, our three sample coal example.

Our 'optimal' estimator turned out to be:

with a standard error of

where the three weights were 0.3885, -0.0623 and 0.6182 respectively. This set of weights makes some sort of sense intuitively. The greatest proportion of the weight (almost two-thirds) is on the closest sample. Sample 1, which is further away but in a totally different direction, gets a weight over 60% of the weight given to sample 3. Sample 2, which appears to be worse than useless in this case, actually gets a small, but negative, weight. This estimator should, theoretically, have the smallest possible estimation variance and, hence, the narrowest confidence band of any linear estimator using these samples. However, even if we put the question of a negative weight aside, there remains one enormous problem. To ensure that the estimator remains unbiassed we have had to introduce a proportion of the population mean to make up the shortfall in the weights. Now, we have to say that, if we knew what the population mean was, we would not be writing text books on geostatistics; we would be down at the local stock exchange making millions. Simple kriging is sometimes referred to as 'kriging with known mean'. What happens if we put an estimate of the population mean in there? If we use

then the standard error is no longer

since this takes no account of the error made in estimating the population mean. There are various schools of thought on how we modify the estimation variance to reflect the uncertainty on the mean estimate. There are also various schools of thought on how we actually get an estimate of the population mean and whether it should be 'global' or 'local'. In a text book like this, we do not wish to become enmeshed in the ongoing controversies and opinions surrounding simple kriging. Especially since there is a much simpler way around the problem.

Ordinary Kriging
Our original intention was to produce a weighted average estimator purely from the neighbouring samples. There was no thought in our original formulation of introducing the 'global' population mean as a mitigating factor. In fact, given the regression problem discussed as question 8 in Part 1 "Why is our final map too smooth?" one of the least desirable things in our estimator would be to pull it towards the average value. Our estimates are already not variable enough. Putting a portion of the weight, no matter how small, onto the overall mean will reduce high estimates and increase lower estimates making our map even less variable than before. Once again, in trying to do the logical thing, we have introduced more problems than we started with. Matheron tackled this problem and we will follow his logic here. We want an estimate like:

where

We developed the equation for the estimation variance of such an estimator:

or, in a more succinct form:

We wanted to find the minimum of this function and decided the easiest way to do that was to differentiate, set the differential to zero and assume we had a minimum and not a maximum. In fact, you can 'prove' algebraically that it will be a minimum. This gave us m equations to solve for m weights. The problem is that there are actually m+1 equations, because we also want:

Now, solving m+1 equations when you only have m unknowns is pretty much impossible except under pathological circumstances. If you have m+1 equations you really need to have m+1 unknowns to solve for. We need to introduce another unknown. This is a similar type of trick to the one we did when 'completing the square' to get something we could actually calculate. There is a mathematical technique called the 'method of Lagrangian multipliers' which you will find in any Calculus 101 text book. It gives us a way of minimising (or maximising) a function subject to a linear constraint. We introduce another unknown factor in such a way that it contributes nothing to our function to be minimised. Look carefully at the following function and see if you agree that it is exactly the same as the estimation variance σε²:


Your reaction should be, 'yes it is, provided the weights add up to 1'. If it isn't, go back and read the last couple of sections again. The term λ (the Greek letter lambda) is an unknown quantity. Whatever its value, the second term is zero so long as we make the weights add up to 1. To make the results simpler, we actually minimise the function:

but this is pretty academic. Following the same principle as before, we differentiate this function by each weight in turn to get equations which must be solved simultaneously. These equations will look exactly as before, except that there will be a λ in each one. Let us look at any one equation in detail:

dividing through by the (irrelevant) 2 and tidying up we get:

If we do this for each weight in turn, we will generate m equations with m + 1 unknowns. Still not balanced. The last equation comes from minimising the expression with respect to λ. If we differentiate by λ, the whole of the σε² term vanishes, the λ itself disappears and we are left with an equation which says:

w1 + w2 + w3 + ...... + wm - 1 = 0

or

w1 + w2 + w3 + ...... + wm = 1
which is exactly what we want. This process is called 'ordinary kriging' or, as Matheron called it originally, 'krigeage ordinaire'.
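Collecting the pieces of the derivation, the complete set of ordinary kriging equations can be written compactly as follows. This is simply a restatement in the notation of this section — γ(gi,gj) for the model semi-variogram between samples i and j, γ(gi,T) for that between sample i and the unsampled location — not a new result:

```latex
% Ordinary kriging system: m equations from the weights, one from lambda
\begin{aligned}
\sum_{j=1}^{m} w_j\,\gamma(g_i,g_j) + \lambda &= \gamma(g_i,T), \qquad i = 1,2,\ldots,m,\\
\sum_{j=1}^{m} w_j &= 1.
\end{aligned}
```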

'Optimal' Unbiassed Estimator


The above process of logic supplies us with a template to find the weighted average estimator which will provide an unbiassed estimator with the smallest estimation variance given the constraint that we insist that the weights sum to 1. This estimation variance will not be as small as the one from simple kriging. On the other hand we do not have to pretend we know the population mean to get the optimal estimator. If the overall mean is not known, there is no guarantee that

simple kriging will give an estimator with a true estimation variance less than that of ordinary kriging or 'kriging with unknown mean'. The general layout of a kriging exercise is as follows. Set up the ordinary kriging equations:

1) Work out all of the distances between all of your pairs of samples (left-hand side) and all the distances between each sample and the unsampled location (right-hand side). 2) Plug these into your semi-variogram equation to get the γ terms. 3) Solve the equations to get the weights {w1, w2, w3, ..., wm} and λ. 4) Use the weights to give you the 'best linear unbiassed estimator', sometimes known as 'BLUE'.

We can also work out the estimation variance for the estimator to provide confidence intervals on our estimated value. One of the nice things about this system is that we can use yet another algebraic trick to simplify the calculation of the estimation variance. With a quick bit of mental algebra, you can show from the above set of equations that:

Σi Σj wiwj γ(gi,gj) = Σi wi γ(gi,T) - λ

which means that the estimation variance for ordinary kriging can be expressed as:

σOK² = Σi wi γ(gi,T) + λ - γ(T,T)

which is rather less tedious to calculate.
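The whole recipe — build the system, solve it, form the estimate, then the kriging variance from the shortened expression — fits in a few lines of code. The sketch below is a generic implementation; the sample coordinates, values and semi-variogram parameters at the bottom are hypothetical placeholders for whichever data set you are working on.

```python
# Sketch of the ordinary kriging recipe: build the (m+1) system, solve for the
# weights and the Lagrangian multiplier, then form the estimate and variance.
import numpy as np

def gamma(h, nugget=0.0, sill=1.0, rng=250.0):
    h = np.asarray(h, float)
    g = np.where(h >= rng, nugget + sill,
                 nugget + sill * (1.5 * h / rng - 0.5 * (h / rng) ** 3))
    return np.where(h == 0.0, 0.0, g)

def ordinary_krige(coords, values, target, model=gamma):
    coords = np.asarray(coords, float)
    m = len(values)
    A = np.ones((m + 1, m + 1))
    A[:m, :m] = model(np.linalg.norm(coords[:, None] - coords[None, :], axis=2))
    A[m, m] = 0.0
    b = np.ones(m + 1)
    b[:m] = model(np.linalg.norm(coords - target, axis=1))
    sol = np.linalg.solve(A, b)            # w_1..w_m and lambda
    w, lam = sol[:m], sol[m]
    estimate = w @ np.asarray(values, float)
    # sigma_OK^2 = sum_i w_i*gamma(g_i,T) + lambda - gamma(T,T)
    ok_variance = w @ b[:m] + lam - float(model(0.0))
    return estimate, ok_variance, w, lam

est, var, w, lam = ordinary_krige([(0, 0), (30, 0), (150, 0)],
                                  [23.1, 24.7, 25.2], (100.0, 100.0))
print(est, var, np.sqrt(var), w, lam)
```

For a real exercise you would substitute the actual sample positions, the measured values and the fitted semi-variogram model; the printed weights should then sum to 1 (to within rounding) and the square root of the variance gives the kriging standard error used for the confidence intervals.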


Alternate Form: Matrices


If you are partial to matrices (which we are not) or use a package such as Matlab, ordinary kriging can be reduced very elegantly as follows:

then the ordinary kriging equations become:

and the 'solution vector' is:

The estimator can be expressed as:

where g represents the array of sample values used, completed with a zero in the (m+1)th element. The estimation variance can be written as:

Alternate Form: Covariance


The ordinary kriging equations can be equally well expressed in terms of the covariance function assuming, of course, that the semi-variogram model has a sill of some kind. The layout of the equations is identical to the semi-variogram form, with covariances substituted wherever we currently have semi-variograms. The only change is that the sign on the lagrangian multiplier must be changed. All references to +λ in the equations must be changed to -λ. Alternatively, you can change the sign of λ in the calculation of the kriging variance. The formula for the estimate itself remains unchanged, but the ordinary kriging variance becomes

if you used -λ in the equations (or vice versa). The terms σiT denote the covariance function value between each sample and the unsampled location.
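For completeness, here is one way of writing the covariance form out in full. It is a sketch under the assumption of a model with a finite total sill C, so that σ(h) = C - γ(h), and it keeps the same λ as in the semi-variogram form; substituting C - σ for γ in the equations above and using the fact that the weights sum to 1 gives:

```latex
% Covariance form of ordinary kriging (sketch, assuming sigma(h) = C - gamma(h))
\begin{aligned}
\sum_{j=1}^{m} w_j\,\sigma_{ij} - \lambda &= \sigma_{iT}, \qquad i = 1,2,\ldots,m,\\
\sum_{j=1}^{m} w_j &= 1,\\
\sigma^2_{OK} &= \sigma_{TT} - \sum_{i=1}^{m} w_i\,\sigma_{iT} + \lambda .
\end{aligned}
```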

Three Sample Estimation


Let us return one last time to that three sample example which has dogged us for most of this part of the course.

Substituting all of the distances between all of the pairs of samples (left hand side) and all the distances between each sample and the unsampled location (right hand side):

Plugging these into the semi-variogram equation to get the γ terms:

gives:


Solving the equations to get the weights {w1, w2, w3} and λ, we get:

The estimation variance for ordinary kriging can be found from:

giving a standard error of

and a 90% confidence interval on the 'true' value of T of 24.06 +/- 1.59 MJ, that is, from 22.47 to 25.65 MJ. This is a marginally higher estimation variance than that given by simple kriging. However, it requires no knowledge of the true population mean to calculate this estimate. If we use the semi-variogram form, we do not need to know the population standard deviation either, or even assume that it is constant over the study area.

Cross Validation
The geostatistical estimation methods which are grouped under the generic heading of 'kriging' are different from almost all other mapping techniques, in that they depend on a 'model' of the spatial continuity which changes with study area and variable being studied. For example, the sulphur values in the coal seam
will not have the same semi-variogram model as the calorific values unless, of course, the depositional mechanism and the source of material were highly related. We would expect a similar structure between ash content and calorific value because these are extremely highly correlated. However you will see, if you calculate the experimental semi-variogram for ash content, it is a different shape from that of the calorific values. We have, therefore, a framework under which the estimation of unsampled locations depends heavily on the semi-variogram model as the representation of the true spatial structure for that measurement in that area. All of the results discussed in this part of the course are based on the assumption that the semivariogram model, (h), is the true semi-variogram for the whole population in this area. This is probably the strongest reason for considering a geostatistical estimation technique above any more automated method. The next strongest reason for a geostatistical analysis is, probably, the measure of confidence which we can produce once a 'kriged' estimate has been found. In fact, it is possible to produce confidence levels for any weighted average or linear estimator using the full form of the estimation variance even inverse distance maps. There is one major potential weakness to this 'custom built' approach to estimation. The optimality of the results depends on our assumption that the semi-variogram model is the correct one. There is (as yet) no simple way to attach confidence limits or sensitivities to that model and include this in the assessment of confidence in our final estimates. There are numerous studies into confidence on the semi-variogram model and into certain jack-knifing approaches to sensitivity. However, these have not as yet worked their way down to routine practice, such as we discuss in this text book. It cannot be stressed too strongly that an inappropriate semi-variogram model will lead to inappropriate and potentially misleading estimates. Kriging only gives the 'best' answers if our model is correct. In 30 years of applying geostatistics to real situations, we have seen many cases where failure to consider the underlying process which produced the phenomenon resulted in completely spurious estimates. In one classic case in the early 1980s, a 25 metre wide orebody was turned into a 900 metre wide ore deposit because a consultant failed to construct directional semi-variograms and fed an omni-directional model into an automated computer package. An error as gross as this is, fortunately, fairly easy to spot. Over the years, as awareness of this limitation has risen, practitioners have become more careful of modelling. One of the methods which is suggested for checking on the semi-variogram model is called 'cross validation'. This is a very straightforward procedure and bears some little resemblance to classical jackknifing. It is, essentially, an attempt to see whether the estimates produced by the kriging process resemble those which really exist within the specified confidence levels. The process goes as follows:


1) Remove a sample from the data set. This value is no longer considered to be a gi but is now a Ti. 2) Using the remaining samples, produce an estimator, T*i, and its associated standard error, σi. Any of the kriging techniques or even inverse distance can be used for this process, so long as the standard error is calculated based on the semi-variogram model. 3) The actual error incurred during the estimation can be calculated:

In theory, if our semi-variogram model is correct and the kriging is carried out correctly, this is one sample from a Normal distribution with a zero mean and standard deviation σi. 4) This is repeated for each sample in turn. There are differences between practitioners as to what is done with the final list of ei values. Some practitioners compare the average of the ei² with the average of the σi², presumably in some kind of empirical F ratio test. Since each ei has (potentially) a different σi, this does not seem very stable in a statistical sense. An alternative approach is to calculate:

for each sample. This xi is now a standardised statistic which should, if all has gone well, come from a distribution with a mean of zero and a standard deviation of 1. That is, all of the xi should come from the same population. The mean of that population can be estimated by x̄ and the standard deviation by sx, where:

If the semi-variogram model and kriging technique are appropriate, this mean and standard deviation should be approximately equal to zero and 1 respectively. As an example, a semi-variogram was calculated from the Organics data set. The model fitted to this semi-variogram was Spherical with a nugget effect of 1.5 (g/kg)², a range of influence of 120 metres and a sill on the Spherical component of 7 (g/kg)². The modified Cressie statistic for this fit was 0.0359, indicating good agreement. A cross validation exercise produced an average statistic of -0.0088 with an estimated standard deviation of 1.0561. There remain a few questions.

Is the use of (n-1) in the calculation of sx appropriate? Is it logical to assume that the estimation error at one location is independent of the estimation error at a neighbouring point, when they are both derived from combinations of essentially the same samples? What are the real degrees of freedom associated with this exercise? How far from zero and 1 can these two estimates get before we start to become concerned? If we do not know the correct degrees of freedom, we cannot use the classical statistical tests to tell us when the calculated statistics become significantly different from their expected values. If we do get x̄ = 0 and sx = 1, does that 'prove' we have the correct model? This one we can answer: no, it doesn't. You can use an entirely inappropriate model and still get zero and 1. The logic does not work in reverse. So, why do cross validation? For one thing it is always nice to have some reassurance behind the visual fit and the Cressie statistic. For another, this is the only time in any analysis where you will have the chance to check on whether your errors follow a Normal distribution and, therefore, whether you can use Normal confidence intervals. A probability plot of the standardised errors should look reasonably Normal to justify all of the confidence interval calculations we have blithely carried out through this part of the course. Drawing a line on the plot with the ideal mean of zero and standard deviation of 1 will also tell us whether there are major deviations from the ideal behaviour. This assessment needs no knowledge of degrees of freedom or such to help us make a decision. There are other very useful results of a cross validation exercise. One of these is the ability to pick up 'outliers' in the sample data. We saw in the previous course that there was a problem with a seemingly aberrant sample value. When we look at the cross validation statistics in the probability plot, we can see that there are actually two samples which seem to have wildly different values from their neighbours. 102 of the 104 samples lie nicely along the Normal line with mean zero and standard deviation 1. The other two have very large, negative cross validation statistics. This implies that their actual values (T) are much lower than the values (T*) predicted from their neighbours. Looking at these two samples more closely, we find:


The first of these two is the one we spotted in the previous course as a probable typographic error. The second has a perfectly reasonable value of 11.6. The 'typo' sample lies right in the centre of the sampled area and is clearly extremely different from its neighbours. The estimated value at this location from the other samples is 12.68. According to Table 2 (right), the chances of a Normal statistic taking a value of magnitude -3.88 are around 0.005%. There is pretty obviously something wrong with this data value. Perhaps more interesting is the second sample: 11.6 g/kg is not an extreme value in this area, lying well within the usual range of sample values. However, this sample, which lies on the eastern edge of the sampled area, has a value well below all of its neighbours. A statistic of -3.11 would be expected around one time in one thousand. If we had a couple of thousand or so samples, we might expect an odd statistic to be this low. With 104 samples, it seems a little extreme. Certainly, we would recommend that this sample was re-examined for measurement error or incompatibility with the general population before it could be used with confidence in this study. Further uses of cross validation include visual assessment of the exercise. For example, we can compare the actual sample values with their estimated values in a scattergram. This gives us a pretty good idea of how well we can do when we know the answer. If there is little correlation between actual and predicted values, we know that kriging is not going to help us fill in the gaps in our knowledge. We can also plot the actual value against the standardised error statistic. You can see from the graph on the next page that there is a correlation of 0.638 between these two quantities. That is, the error we make is directly related to the value we are trying to estimate. High (positive) error statistics correspond to under-estimates. The graph shows that we tend to under-estimate locations which have high actual values. Similarly we tend to over-estimate locations at which we really have low values. This is a clear illustration of the regression effect which we discussed in the previous course. A similar plot of error statistic against estimated value would show no correlation at all. Note: The following was found in a consultant's report: "The cross validation exercise was somewhat less than satisfactory. However, this is of little concern, since kriging always provides the best linear unbiassed estimator". Be warned.
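A bare-bones version of the cross validation loop looks like the sketch below: each sample is removed in turn, kriged back from the others, and the error is standardised by the kriging standard error. The coordinates, values and semi-variogram model are simulated stand-ins (the Organics data are not reproduced here), so the printed mean and standard deviation simply illustrate the mechanics.

```python
# Sketch of leave-one-out cross validation with ordinary kriging.
import numpy as np

def gamma(h, nugget=0.0, sill=1.0, rng=250.0):
    h = np.asarray(h, float)
    g = np.where(h >= rng, nugget + sill,
                 nugget + sill * (1.5 * h / rng - 0.5 * (h / rng) ** 3))
    return np.where(h == 0.0, 0.0, g)

def ordinary_krige(coords, values, target):
    m = len(values)
    A = np.ones((m + 1, m + 1)); A[m, m] = 0.0
    A[:m, :m] = gamma(np.linalg.norm(coords[:, None] - coords[None, :], axis=2))
    b = np.ones(m + 1); b[:m] = gamma(np.linalg.norm(coords - target, axis=1))
    sol = np.linalg.solve(A, b)
    return sol[:m] @ values, sol[:m] @ b[:m] + sol[m]

rng_ = np.random.default_rng(0)
coords = rng_.uniform(0, 400, size=(25, 2))      # hypothetical sample layout
values = 30 + 5 * rng_.standard_normal(25)       # hypothetical sample values

x_stats = []
for i in range(len(values)):
    keep = np.arange(len(values)) != i
    est, var = ordinary_krige(coords[keep], values[keep], coords[i])
    sig = np.sqrt(max(float(var), 1e-12))        # kriging standard error
    x_stats.append((values[i] - est) / sig)      # standardised error x_i

x = np.array(x_stats)
print(x.mean(), x.std(ddof=1))   # ideally close to 0 and 1
```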


Cross Cross Validation


A useful variation on the classic cross validation procedure is a 'cross cross validation'. We use this slightly clumsy phrase to suggest that the cross validation is actually being carried out between two different data sets. That is, one set of data supplies the 'actual' values, T, and the other provides the estimated values, T*. We have applied this technique successfully to many different situations, such as:

comparing different sampling methods to find out whether the results are biassed or more/less variable in one set; comparing new sampling against historical information available from other sources; comparing samples suspected to be from different populations and so on. The major advantage of this approach over the classical hypothesis testing methods is that it includes the spatial aspect of the data. For example, a new drilling campaign in a project area in the CIS revealed results which were significantly lower than that of the sampling carried out by a Russian company before the dissolution of the USSR. A cross cross validation exercise revealed that the results were spurious because the new drilling had all been carried out within a limited area within the deposit. Comparing these results with the whole deposit was inappropriate, because the area where the new sampling took place was known to be of lower grade. As a brief illustration of cross cross validation, we turn to the Geevor tin mine data set. A variation of this exercise was reported in a paper in 1979 (Clark, 16th APCOM). We have, in the previous course, commented on the apparent difference between the width of the tin vein revealed by development sampling and by the stope values in the mined out areas. There was a little doubt expressed as to whether we could do a direct comparison, since the development covers the whole study area, whilst the mined out areas are around 60% of that. This argument will be answered by a cross cross validation exercise. We construct the experimental semi-variogram and model it using the development samples only. We then carry out a standard cross validation exercise, except that the stope samples (zone code 2 on data file) are taken as the actual values, whilst the development samples (zone code 1) are used to produce the estimates. The results are summarised for the 2,715 stope samples as follows:


We can see clearly from this table that the development samples consistently over-estimate the value which was later found in the stope after mining took place. We can also see that the standard deviation of the error statistics is over 15% lower than the ideal. This implies that the magnitude of the errors in the stopes is rather less than the variability experienced in the development. From this exercise, we have identified a significant bias and a difference in variability (possibly in continuity) between the samples used to plan mining and those values found when actual mining was carried out.

Worked Examples
This section contains worked examples using:

Coal Project
Iron Ore
Wolfcamp

Coal Project, Calorific Values


Just to complete the exercise, we will work through the same process with the whole five point example.


The estimator will be a weighted average of all five samples, where the weights will be determined using ordinary kriging. The semi-variogram model for calorific values is:

The ordinary kriging equations will be:

Determining all of the distances between the pairs of points, gives:

Calculating the semi-variogram values from the model equation:

Solving these equations produces the following results:

so that


so that σOK = 0.81 MJ. Our best linear unbiassed estimate for the value at the unsampled location based on 5 samples is 24.86 MJ. The standard error gives us 90% confidence of being within 1.33 MJ of the true value, so that our confidence interval for the value of T is between 23.53 and 26.19 MJ.

Iron Ore Example


In Part 1 we discussed a set of simulated sample data which was provided in Practical Geostatistics 1979 (Clark, (1979)) and is generally referred to as the Iron Ore - Page95 data set. This data comprises 50 samples taken at random over an area 400 by 400 metres in a simulated iron ore deposit. During that discussion, we came to the conclusion that inverse distance weighting was not an appropriate estimation technique for this data set. The maps produced by inverse distance techniques were either totally smoothed or ragged and discontinuous. We might suspect, then, that the structure of this phenomenon was not a distance dependent one. However, when we construct an experimental semivariogram, we find that we can model it with a classic Spherical model. There is no nugget effect in this model, a range of influence of 120 metres and a sill of 24 %Fe2. Given that the graph is only based on 50 samples, this would seem a pretty well behaved sample set. A modified Cressie goodness of fit statistic of 0.0552 seems to verify our feeling of stability in the model. To map values over this area, we laid a grid of points at 5 metre intervals over the 400 metre square study area. This gives us a total of 81 by 81 grid points to estimate. This is the same resolution we used in the inverse distance estimation. Since the range of influence of the semi-variogram model is 120 metres, we use this as a default search radius for inclusion of samples within the estimation of a grid point. That is, all samples within 120 metres of the grid point are included in the kriging system. For each grid point, a set of kriging equations is set up and solved, producing an estimated values at that location and a kriging standard error associated with the estimate. Because of this, a kriging exercise produces not one map but two: the interpolated map and a companion map of the standard errors. This latter map can visually highlight where the map is least reliable and where more samples would be of greater benefit.
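A sketch of that gridding loop is given below. The semi-variogram parameters are the ones quoted above for this data set (Spherical, no nugget, range 120 m, sill 24 %Fe²); the fifty sample locations and values are simulated stand-ins, since the data set itself is not listed here. The result is the pair of arrays described in the text: one of kriged estimates and one of kriging standard errors.

```python
# Sketch of kriging onto a 5 m grid over a 400 m square, using a 120 m search
# radius (the range of influence). Sample data below are simulated stand-ins.
import numpy as np

SILL, RANGE = 24.0, 120.0

def gamma(h):
    h = np.asarray(h, float)
    g = np.where(h >= RANGE, SILL, SILL * (1.5 * h / RANGE - 0.5 * (h / RANGE) ** 3))
    return np.where(h == 0.0, 0.0, g)

def ordinary_krige(coords, values, target):
    m = len(values)
    A = np.ones((m + 1, m + 1)); A[m, m] = 0.0
    A[:m, :m] = gamma(np.linalg.norm(coords[:, None] - coords[None, :], axis=2))
    b = np.ones(m + 1); b[:m] = gamma(np.linalg.norm(coords - target, axis=1))
    sol = np.linalg.solve(A, b)
    return sol[:m] @ values, sol[:m] @ b[:m] + sol[m]

rng_ = np.random.default_rng(1)
coords = rng_.uniform(0, 400, size=(50, 2))          # hypothetical sample layout
values = 34.5 + 4.9 * rng_.standard_normal(50)       # hypothetical %Fe values

grid = np.arange(0, 401, 5.0)                        # 81 x 81 grid nodes
estimate = np.full((81, 81), np.nan)
std_err = np.full((81, 81), np.nan)
for ix, x in enumerate(grid):
    for iy, y in enumerate(grid):
        node = np.array([x, y])
        near = np.linalg.norm(coords - node, axis=1) <= RANGE
        if near.any():                               # skip nodes with no sample in range
            est, var = ordinary_krige(coords[near], values[near], node)
            estimate[ix, iy] = est
            std_err[ix, iy] = np.sqrt(max(float(var), 0.0))
print(np.nanmean(estimate), np.nanmean(std_err))
```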


Wolfcamp, Residuals from Quadratic Surface


In Part 2, we saw that the Wolfcamp sample values could be modelled by a Spherical semi-variogram once a quadratic trend had been removed. This semivariogram model had a nugget effect of 12,000 ft2, a range of influence of 60 miles and a sill on the Spherical component of 23,000 ft2. All of the examples we have discussed so far in this part of the course, have had zero nugget effects. We will use the Wolfcamp quadratic residuals to demonstrate the impact of the decision on what to do with the semi-variogram model at zero distance. When we laid out our original assumptions, we stated that the sample values must be precise and accurate. If we accept this assumption, then it follows logically that

no matter what size the nugget effect is. We interpret the nugget effect as being the essential difference between two samples taken very close together but not at exactly the same location. Effectively, we have a discontinuity at the origin of our semi-variogram model, where we force it to go through zero at zero distance. We will take a small example to demonstrate the impact of the value of the semivariogram model at zero. We have selected a point to be estimated at {X = -10, Y = 90} and chosen five surrounding samples for the illustration. The sample locations and values are shown in the table. Remember that these are residuals from the quadratic trend surface, not absolute values. Some samples will have values above the local trend and some will have them below.

All of the sample/sample distances and the sample/unsampled location distances are calculated. These distances are substituted in the standard kriging equations using the semi-variogram model:


results in the ordinary kriging equations:

Solving these equations for the weights and the lagrangian multiplier produces:

The estimator is T* = 108.1 ft, the kriging variance is 21667 ft² and the kriging standard error is 147 ft. In summary, if we accept the assumption that our sample values were measured precisely and accurately, we would expect to get a potentiometric value of 108.1 feet above the local trend in a well drilled at this site. We can express the reliability of this estimated value by constructing (say) a 90% confidence interval around it of 1.6449 x 147 feet. That is, we are 90% confident that the true difference between the pressure level in the potential well and the general trend in that area will lie in the interval

108.1 +/- 241.8 ft, that is, from -133.7 to +349.9 ft
Now, suppose we did not believe the data values were precise and accurate. Many practitioners of geostatistics (and many computer packages) allow the semivariogram model to intersect the axis at the nugget effect, so that:

What difference does this make to our kriging system and the results from it? The equations are almost identical, with only the diagonal elements changing:

solving these equations produces the following:


Notice the massive shift in weighting towards the closest sample. Notice also the enormous change in the lagrangian multiplier. Remember that this value is a balance between the efficiency of the samples in estimating T and the clustering (or not) of the data. This value is added to the estimation variance, so a large negative figure is extremely desirable. The kriging variance is:

giving a kriging standard error of 81.6 feet. This is 55% of the kriging standard error obtained when we trust our data implicitly. The estimated value at the unsampled location is now 73.8 feet above the local trend value. Notice that the γ(T,T) term must also equal the nugget effect since it is equal to γ(0). If we did not subtract this figure, the kriging standard error would be 137 feet which is still almost 7% lower than the 147 ft given when we accepted γ(0) = 0. These results are totally counter-intuitive. If we trust our sample values, we have less confidence in our estimator than if we do not trust our sample values. This just does not make sense. Whether we set γ(0) = 0 or γ(0) = C0, the kriging system will honour the data. But, telling the kriging system that the sampled value is in error results in significantly narrower confidence intervals. This is a nonsensical result and cannot be accepted in any real application. There are two possible solutions which are simple to implement: 1) Force the semi-variogram model to go through zero at zero distance. If your software uses the nugget effect at zero distance, introduce another component to your semi-variogram model. Instead of a nugget effect of C0, give your software a nugget effect of 0 and an extra Spherical component with a very short range of influence and the sill C0. If you do not know what your software does, try this and see if it makes a difference to your answers. 2) Carry out your kriging with the nugget effect at zero distance. Then acknowledge that you do not trust your data and add 2C0 to all of your kriging variances. In our example above, this would boost the kriging variance up to 30655 ft² and produce a kriging standard error of 175.1 ft. This is now almost 20% higher than the case where we trust the data, which makes a lot more sense.
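The sketch below reproduces the comparison mechanically: the same point is kriged twice, once with γ(0) = 0 on the diagonal and once with γ(0) = C0, and then the 2C0 correction from option 2 is applied. The Spherical model parameters are the Wolfcamp ones quoted above; the five sample locations and residual values are hypothetical, since the table itself is not reproduced here, so the printed numbers will differ from those in the text although the pattern should be the same.

```python
# Sketch: effect of the value assigned to gamma(0) on ordinary kriging.
import numpy as np

C0, SILL, RANGE = 12000.0, 23000.0, 60.0      # ft^2, ft^2, miles

def gamma(h, g0):
    """Spherical model; g0 is the value assigned at zero distance."""
    h = np.asarray(h, float)
    g = np.where(h >= RANGE, C0 + SILL,
                 C0 + SILL * (1.5 * h / RANGE - 0.5 * (h / RANGE) ** 3))
    return np.where(h == 0.0, g0, g)

def ordinary_krige(coords, values, target, g0):
    m = len(values)
    A = np.ones((m + 1, m + 1)); A[m, m] = 0.0
    A[:m, :m] = gamma(np.linalg.norm(coords[:, None] - coords[None, :], axis=2), g0)
    b = np.ones(m + 1); b[:m] = gamma(np.linalg.norm(coords - target, axis=1), g0)
    sol = np.linalg.solve(A, b)
    w, lam = sol[:m], sol[m]
    return w @ values, w @ b[:m] + lam - g0    # subtract gamma(T,T) = g0

coords = np.array([(-25.0, 80.0), (-5.0, 110.0), (10.0, 85.0),
                   (-30.0, 120.0), (15.0, 60.0)])          # hypothetical locations
values = np.array([150.0, 60.0, 95.0, -40.0, 120.0])       # hypothetical residuals, ft
target = np.array([-10.0, 90.0])

for g0, label in [(0.0, "trusted, gamma(0)=0"), (C0, "not trusted, gamma(0)=C0")]:
    est, var = ordinary_krige(coords, values, target, g0)
    print(label, round(float(est), 1), round(float(np.sqrt(var)), 1))

# recommended fix for the gamma(0)=C0 case: add 2*C0 back onto the variance
est, var = ordinary_krige(coords, values, target, C0)
print("corrected std error:", round(float(np.sqrt(var + 2 * C0)), 1))
```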


Note: Apart from this discussion, all kriging carried out in this course is carried out with γ(0) = 0.


Part 4 Areas and Volumes




Areas and Volumes
In the previous parts of the course, we have discussed geostatistical methods as if the samples had no characteristics other than value and 'position'. We have ignored the size and shape of the sample, the way in which it may have been taken and/or measured, and so on. We have effectively assumed that the sample values were located at 'points' within the deposit. In this part of the course we will see what effect those other characteristics collectively called 'support' have on the sample value itself, and hence on the semi-variogram. Before we continue, however, it is worth making one point. When the first Practical Geostatistics was written, there was little general access to computer services for students and workers 'in the field'. Nowadays, with the prevalence of personal computers, many calculations have become much simpler and certainly faster. Because of this, we would like to concentrate on explaining the principles of extending geostatistical methods to areas and volumes, without going into excessive detail on the computational aspects. Those of you who require more than we give here can always download the 'old version' from the Web or consult more weighty tomes on geostatistics. For the bulk of this part of the course, we will illustrate our discussion with the Iron Ore - Page95 data in Practical Geostatistics 1979 (Clark, (1979)). In the previous course, we found that these samples appeared to come from a Normal distribution with mean 34.50 %Fe and an estimated standard deviation of 4.75 %Fe. The application of geostatistical estimation to unsampled locations was covered in Part 4. An experimental semi-variogram was calculated and a Spherical model fitted with no nugget effect, a range of influence of 120 metres and a sill of 24 %Fe2. If we square root the sill value, we get 4.90 %Fe, which is a little higher than our 'statistically' estimated standard deviation of 4.75 %Fe. Bear in mind that these are two different estimates of the same quantity the population standard deviation and are based on two different sets of assumptions. The square root of the sill is probably a better estimate for than s is, since the estimation of the sill value does not depend on the assumption that samples were taken randomly and independently. One of the things which distinguishes Matheron's work from almost all other workers in the field barring Krige himself is that he extended his considerations to dealing with average values over areas and volumes. In mining, in particular, we are concerned on a day to day basis with areas and volumes. There are not too many mines in the world who produce ore by hand picking rocks or chipping from the exposed face of the mineralisation. Generally, mining is planned and executed in terms of 'stopes' mining areas and volumes or in terms of mining blocks. The term 'stope' is usually given to mining areas within underground 'hard rock' metalliferous mines. In coal, mining proceeds by
longwalls or by 'room and pillar' plans. Open cast mines (open pits) are planned with mining blocks which are generally, but not always, of similar size. The question which Matheron addressed was "how do we estimate and produce confidence on the average value over a mining block?". This is not a trivial question in the context of mine planning. Consider, for example, the grade/tonnage curve. According to our calculations in the previous course, 83% of our area lay above a cutoff of 30 %Fe and the average value of this 'payable' material was 35.96 %Fe. The area under study covers a square 400 metres on a side. If we plan to mine this area in 20 metre blocks, how many of those 400 blocks will be payable? If we plan on 50 metre blocks, how many of the 64 blocks will be payable? What average value will the payable blocks have in each case? Before launching into the geostatistics, let us just explain why the answers to these questions will depend on the size and shape of the block. Consider a 50 metre square within the Iron Ore - Page95 study area. In Figure 1 we see two typical cases which illustrate our problem. On the left, we see a block which is, overall, payable. On average, this block has a value of almost 32.5 %Fe. However, we can see clearly that there are areas within the block which lie below 30 %Fe. If we choose to mine this block because the average is above cutoff we will include material which is below that cutoff. In the block on the right, we see a block whose value is only 28.8 %Fe on average. This block would not be mined because its average is below 30 %Fe. However, there is payable material within the block which will be left behind. The overall effect of selecting on block averages is that we will, generally, end up selecting a higher proportion as 'payable' but the average value of that increased proportion will be commensurately lower than if we could achieve 'perfect selection'. Once we start averaging values over areas or volumes, we change the characteristics of our defined problem. Average values are not as variable as single 'point' values. Matheron dubbed this the 'volume/variance effect'. We have already seen evidence of this problem in Krige's 'regression effect' in the previous course. In the geostatistical literature, this effect has been given many names. Probably the most widely spread is the 'conditional bias'.

The Impact on the Distribution


Using the Iron Ore - Page95 data set, we will illustrate what happens to the distribution of the values when we consider averages over an area. We will show how you can theoretically determine the reduction in variance and how to apply this to assessment of the conditional bias. By averaging within an area, we are effectively 'removing' the variation of the values within the block. You can see from Figure 1 that, even in this fairly continuous data set, the variation of values within a block can be extensive. It is fairly obvious that, the bigger the area we take, the smaller the standard

deviation of the average values will become. How can we quantify this reduction in variance? Let us consider a block of ground:

The area can be considered to be a lot of potential sample sites. Each 'point' within the area (or volume) has a value which we do not know because we have not sampled it. We want to derive the variability amongst all these potential sample values, without having to go in and actually take them all. The quantity we need is the variation between the values the differences between all of these individual values. This is the variation we remove when we average them all. Now, we cannot calculate the differences, because we do not have the actual values. However, we do have a model for the differences between any two locations, provided we can calculate the distance between them (and, where relevant, the direction). That model for differences is the semi-variogram function. We can use the formula for the semi-variogram model to provide the 'expected' difference between the values at any two specified locations within the block. If we work out this distance, we can get the magnitude of the difference in value from (h). Matheron did some theoretical investigation and found the following startling result: the variance of values within an area or volume is the average of all of the semivariograms between all of the possible pairs of potential samples within that area (or volume). In theory, we take every single point within the area. We pair each one up with every point within the area (including itself). We calculate the semi-variogram value between them. We add up all of these terms and divide by the total number of terms to get an average semi-variogram value. This is written, quite simply, as:

Where A stands for 'the area or volume over which we wish to average the values'. In more familiar notation, you may recognize this as:

where T is now the average value over the area, not the single value at an unsampled location. Most geostatistics books provide tables and charts to help us calculate these values for idealised cases. In this book, we provide Tables 12 and
13 (below) for the Spherical model, solely for this illustration. In practice, this type of calculation is handled by software. It is, perhaps, appropriate at this stage to discuss the limitations of calculating this and other average semi-variogram values at this point in the text.

We have stated that the γ(T,T) quantity is the average semi-variogram between every possible pair of points within the area (or volume). Of course, in theory, there are an infinite number of 'points' in any area. Computers are not terribly happy with infinite numbers of anything, countable or not. So, in practice, we do not use all of the points, only a representative subset. This, in the nature of things, raises yet another question. How many points are enough to be representative of a whole area? Studies dating back to 1975 show that we can get within 1% of the correct average semi-variogram value with 64 points, regardless of the size of the area. This may sound a little strange, but it works. If we use 100 points to represent an area, we can get better than 0.5% accuracy on the γ(T,T) terms. This process of representing the area (or volume) by a finite number of points has become known in the geostatistical world as 'discretisation'. The important criteria in discretisation are to keep all of your points inside the area (not on the edge) and to have the same number of points in all directions, that is, have an 8 by 8 or 10 by 10 grid, even if that means the spacing is different in the two directions.
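The discretisation calculation itself is short. The sketch below approximates γ(T,T) for a rectangular block by averaging the model semi-variogram over a 10 by 10 grid of interior points, using the Spherical parameters quoted for the Iron Ore - Page95 data; the result should land close to the tabulated figure used in the next section (0.3173 x 24 ≈ 7.62 %Fe² for a 50 m square block), and the same function can be used to check the 20 metre block in the exercise further on.

```python
# Sketch of the discretised within-block variance gamma-bar(T,T).
import numpy as np

SILL, RANGE = 24.0, 120.0

def gamma(h):
    h = np.asarray(h, float)
    g = np.where(h >= RANGE, SILL, SILL * (1.5 * h / RANGE - 0.5 * (h / RANGE) ** 3))
    return np.where(h == 0.0, 0.0, g)

def within_block_variance(length, breadth, n=10):
    """Average semi-variogram over an n x n grid of points inside the block."""
    # keep the points strictly inside the block, not on its edges
    xs = (np.arange(n) + 0.5) * length / n
    ys = (np.arange(n) + 0.5) * breadth / n
    pts = np.array([(x, y) for x in xs for y in ys])
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    return gamma(d).mean()       # every point paired with every point, itself included

gbar = within_block_variance(50.0, 50.0)
print("within block variance :", round(float(gbar), 2))
print("between block variance:", round(float(SILL - gbar), 2))
print("block std deviation   :", round(float(np.sqrt(SILL - gbar)), 2))
```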

Iron Ore, Normal Example


Let us return to the Iron Ore - Page 95 data to illustrate the calculation of the γ(T,T) terms. This quantity measures the variability of values within the area being averaged. It is usually called the 'within block variance'. The remaining variation, after we have averaged over the blocks, is known as the 'between block variance'. The two are generally assumed to be additive. That is:

within block variance + between block variance = total variance (σ²)

There are many different notations for these quantities, which we will not bore you with here. We will use γ(T,T) throughout for the within block variance and σT² for the variance between the true block averages. The above equation then becomes:

γ(T,T) + σT² = σ²

which means we can get the variance between the block averages by:

σT² = σ² - γ(T,T)
The Iron Ore - Page 95 data was found to have a Spherical semi-variogram, no nugget effect, range of influence 120 metres and a sill of 24 %Fe². We will take this sill as the best estimate for our population variance, σ². This implies that the 'point' values in this area have a Normal distribution with mean 34.50 %Fe and a standard deviation of 4.90 %Fe. Let us assume a planning block of 50 metres on a side. The within block variation is calculated by evaluating the semi-variogram between all possible pairs of points within this 50 metre square. Fortunately for us, someone worked out the theoretical formula for this quantity, though only for rectangular blocks. Table 12 (right) lists values for a function called F(l,b). This function is the average semi-variogram value within a rectangular block. Table 12 (right) is standardised in a similar way to Normal distribution tables. The range of influence and the sill of the Spherical semi-variogram model are standardised to 1, so that we only need one table to serve for all possible Spherical models. If you remember the Spherical formula:

you will realise that everywhere the distance (h) appears, it is divided by the range of influence (a). If we simply divide all of our distances by the range of influence, we standardise to a range of 1 'unit'. In the Reference Table (above), therefore,

If, as in our example, we have a 50 by 50 metre block with a range of influence of 120 metres, we need:

F(50/120, 50/120) = F(0.4167, 0.4167)
You may note that the F(l,b) table is symmetrical, so it doesn't matter which way round you read it. Note:
If you have an anisotropic situation, you need to align the blocks with the axes of the anisotropy to use these tables. Otherwise, use a computer. Reading values from Table 12 (below), and interpolating to get F(0.4167,0.4167), gives us a value of 0.3173:

Now, this value is relative to a sill of 1, so we need to multiply it by the sill of our Spherical component, in this case 24 %Fe²:

plus the nugget effect if it was non-zero. We have a within block variance of 7.62 %Fe², giving us a theoretical standard deviation of values inside the block of 2.76 %Fe. The variance of block averages, the between block variance, is, theoretically:

σT² = 24 - 7.62 = 16.38 %Fe², giving a standard deviation of block averages of 4.05 %Fe
Try this one:

Show that a 20 metre block would give a within block variance of 3.146 %Fe² and a between block standard deviation of 4.57 %Fe.

We have concluded that 'point' values (original samples) have a standard deviation of 4.90 %Fe whilst 50 metre square blocks have a standard deviation of 4.05 %Fe. The blocks and point samples will, of course, have the same average value. We cannot change the average value over a deposit by changing the size of the unit, unfortunately. The two distributions will look like Figure 4. You may be wondering where the 'bias' comes in if both distributions have the same mean. Well, the bias is a conditional one. In mining, we are (hopefully) only interested in those areas which are economically mineable. We apply a 'cutoff' or 'pay limit' to the values and plan to mine those areas above that cutoff only. We
mentioned earlier a cutoff value for this example of 30 %Fe. Looking at the two distributions, you should be able to see that more of the blocks will be above the cutoff than were found in the sample distribution. As a consequence, the average value actually mined will be lower than expected. In mining terms, we have higher costs for a lower revenue per ton. Of course, we do recover more metal. In the previous course, we saw how to calculate grade/tonnage quantities for a Normal model. We repeat those calculations here for: 1) mean 34.50 %Fe, standard deviation 4.90 %Fe 2) mean 34.50 %Fe, standard deviation 4.05 %Fe. The results are shown in Figure 5. From this graph we can see the 'conditional bias' quite clearly. For cutoff values below the mean, the tonnage is higher and the average grade lower. For cutoffs above the mean, the tonnage selected is lower than that suggested by point sampling. In addition, the average grade of the recovered material is also lower. This means that, for high cutoffs, we mine less tonnage and lower grade resulting in less metal overall. In many mines, this effect is quantified as a 'Mine Call Factor'. It should be fairly obvious that the curves will change with the size and shape of the proposed mining block. The larger the block, the more impact on the curves. The conditional bias is a fact of life and cannot be ignored with impunity when planning a future project.
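The grade/tonnage figures behind Figure 5 follow from standard Normal results, and a few lines of code reproduce the pattern. The sketch below uses the mean of 34.50 %Fe with the point standard deviation (4.90 %Fe) and the 50 metre block standard deviation (4.05 %Fe) derived above; the list of cutoff values is illustrative.

```python
# Sketch of the Normal grade/tonnage comparison: points versus 50 m blocks.
import math

MEAN = 34.5

def phi(z):      # standard Normal density
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):      # standard Normal cumulative
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def grade_tonnage(cutoff, sd):
    z = (cutoff - MEAN) / sd
    prop = 1.0 - Phi(z)                       # proportion above cutoff
    avg = MEAN + sd * phi(z) / prop           # average grade of that proportion
    return prop, avg

for cutoff in (30.0, 32.5, 35.0, 37.5):
    p_pt, g_pt = grade_tonnage(cutoff, 4.90)  # 'point' selection
    p_bl, g_bl = grade_tonnage(cutoff, 4.05)  # 50 m block selection
    print(f"cutoff {cutoff:4.1f}: points {p_pt:.2f} @ {g_pt:.2f} %Fe,"
          f"  blocks {p_bl:.2f} @ {g_bl:.2f} %Fe")
```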

Geevor Tin Mine, Lognormal(ish) Example


It should be apparent that the conditional bias or volume/variance effect will be present whatever the distribution of the sample values. In this section we will discuss briefly the added complications when the distribution is skewed. To illustrate the problems, we will use the Geevor data set which is not exactly lognormal but is certainly highly skewed. One of the major advantages of this data set is that large areas have already been mined out. This means that we can actually find the averages over blocks of different sizes and study the histograms and so on. We can also compare theory directly with what was found 'in the ground'. Figure 6 shows a post plot of the sample locations and 100 foot blocks laid over the study area. Averages were also calculated over 50 foot blocks or panels. The descriptive statistics from these three data sets are shown in the table.


There are minor differences in the observed average value caused mostly by panels with few interior samples affecting the overall results. In spite of this, it can be seen clearly that the distributions are becoming much less variable and more nearly symmetrical as the panel size increases.

Since the values are highly skewed, it makes more sense to calculate and model the semi-variogram on the logarithms of the values. This approach is discussed in more detail in Part 6, The Lognormal Transformation. The experimental and model semi-variogram for the logarithm of all non-zero grade values in the Geevor data set is shown as Figure 9. The fitted model has a nugget effect of 0.55 (loge%SnO2)2 and two Spherical components. The ranges of influence are 20 and 150 feet with sills of 0.475 (loge%SnO2)2 and 0.425 (loge%SnO2)2 respectively. We treat the nugget effect as having a range of influence of zero. Using Table 12 (above) we find that the within panel variance for a 100 foot square panel, for each component is:

giving a total for the 'within panel variance' of 1.2205 (loge%SnO2)2 and a 'between block variance' of:
1.45 − 1.2205 = 0.2295 (loge%SnO2)2
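If you do not have the auxiliary function tables to hand, the within panel variance can be approximated numerically by discretising the panel, as in the sketch below. This is illustrative code (the 20 by 20 grid density is an arbitrary choice), not the PG2000 implementation, but for the Geevor logarithmic model it should land close to the tabled total of about 1.22 (loge%SnO2)2.

```python
# Approximate gamma-bar(T,T), the within-block variance, for a square block by
# averaging the semi-variogram over all pairs of points on a regular grid.
import numpy as np

def spherical(h, sill, rng):
    """Spherical semi-variogram component: sill and range of influence."""
    h = np.asarray(h, dtype=float)
    return np.where(h < rng, sill * (1.5 * h / rng - 0.5 * (h / rng) ** 3), sill)

def within_block_variance(side, components, nugget, n=20):
    xs = (np.arange(n) + 0.5) * side / n
    xx, yy = np.meshgrid(xs, xs)
    pts = np.column_stack([xx.ravel(), yy.ravel()])
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    gbar = nugget                      # nugget (range zero) lies entirely within the block
    for sill, rng in components:
        gbar += spherical(d, sill, rng).mean()
    return gbar

# Geevor logarithmic model: nugget 0.55, Spherical (0.475, 20 ft) and (0.425, 150 ft)
print(within_block_variance(100.0, [(0.475, 20.0), (0.425, 150.0)], 0.55))
```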

If we assume that the distribution of the samples is lognormal and that the distribution of panel averages is lognormal, we can carry out grade/tonnage calculations to evaluate the likely 'conditional bias'. It should be noted that these lognormal samples do not guarantee lognormal panel averages. There is no theoretical reason why the average of a set of lognormal values should be lognormal. However, it does seem to work out in practice that, if samples are lognormal, panels tend to be lognormal too. Things get even more complicated if there is an additive constant! Using an assumption of lognormality for both samples and panels, we can repeat the calculations for grade and tonnage as described in Part 3 of the previous course, with: 1) average grade of 50 %SnO2 (general estimate of average), logarithmic variance 1.45 (loge%SnO2)2. This gives a logarithmic mean of 3.187 and a standard deviation of 1.2042 to be used in the calculations. 2) average grade of 50 %SnO2, logarithmic variance 0.2295 (loge%SnO2)2. Deriving logarithmic parameters gives 3.7973 and 0.4791 for mean and standard deviation respectively. To refresh your memories, the above figures are derived using the relationship:
mean grade = exp( μ + σ²/2 ),    or equivalently    μ = loge(mean grade) − σ²/2
where the left hand side is the average of the lognormal distribution (the grade), μ is the mean of the logarithms and σ² is the variance of the logarithms. One important thing to note is that, if you change the variance of the logarithms, you have to change the mean of the logarithms in order to keep the mean grade constant. The two parameters are not independent of one another as they are in a Normal distribution. The amount by which you shift the mean of the logarithms is simply:
( σ² − σT² ) / 2  =  γ̄(T,T) / 2
where σ² represents the logarithmic variance for 'point' samples, σT² the between panel variance for logarithmic values and γ̄(T,T) the variance of logarithmic values within the panel. To show that this shift in the logarithmic mean is not simply an artifact introduced by the mathematics, we took the Geevor Tin Mine data and calculated averages over 100 foot panels. As shown in Figure 6, we laid a grid of 100 foot squares over the whole sampled area. Within each of those squares, we averaged all of the samples (both drive and stope) which we could find. However, we did this in two different ways:

arithmetic mean of the grades of all samples within the block (average grade);


arithmetic mean of the logarithms of grade for all of the samples within the block (average logarithm).

To compare these directly, we then took the logarithm of the average grade for each block. We now have two figures for each block: the average of the logarithms and the logarithm of the average. A scatterplot would show that these two figures are almost perfectly correlated but not the same. Perhaps a better illustration is to show you the histogram of the two sets of numbers. The block averages, when logged, have an average value of 3.7306 (loge%SnO2). If we take logarithms of all of the original sampling and make block averages of the logarithms, these blocks then average 3.2112 (loge%SnO2). This is exactly analogous to the standard lognormal relationship between the mean of the logarithms (the median) and the mean of the lognormal values (see Part 3 in the previous course). It is easy to see the difference in the impact of using the same cutoff value on each of these histograms. Figure 11 shows the comparison between the two theoretical grade/tonnage calculations. The discrepancy between the expected average grade and the recovered grades is alarming. Even the vast increase in tonnage will not compensate for the major differences between expected costs and expected revenues. It is this 'conditional bias' that we discussed in Part 5 of the previous course under the heading of 'regression effect'. It is this same factor that leads to falling Mine Call Factors when cutoffs are raised without regard to regression corrections. In general, the more skewed the distribution and the higher the nugget effect on the semi-variogram model, the worse the conditional bias will appear. It is imperative that selection calculations are carried out with the size and shape of the selection unit as an integral part of the process.
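As a check on the logarithmic parameters quoted above, the short sketch below applies the two relationships just given. It is illustrative arithmetic only; the 50 %SnO2 average and the variances are the figures assumed in the text.

```python
# Deriving logarithmic means for 'points' and for 100 ft panels from a fixed
# mean grade, and confirming that the shift equals half the within-panel variance.
import math

mean_grade = 50.0            # %SnO2, the general estimate of the average
point_log_var = 1.45         # logarithmic variance of 'point' samples
within_panel_var = 1.2205    # gamma-bar(T,T) for 100 ft panels, logarithmic

panel_log_var = point_log_var - within_panel_var          # 0.2295
mu_point = math.log(mean_grade) - point_log_var / 2.0     # about 3.187
mu_panel = math.log(mean_grade) - panel_log_var / 2.0     # about 3.797

print(mu_point, math.sqrt(point_log_var))    # 3.187 and 1.204
print(mu_panel, math.sqrt(panel_log_var))    # 3.797 and 0.479
print(mu_panel - mu_point, within_panel_var / 2.0)   # the two shifts agree
```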

The Impact on Kriging


The other area in which the size and shape of the area or volume under study has a great impact is in the estimation process itself. If we want to estimate the average value over an area, we have two choices: 1) estimate a grid of points over the area, superimpose our selected shape and then average all of the points inside the area.

2) directly estimate the average over the area by modifying the kriging equations. There are two major advantages to the second approach: (a) you only have to solve one set of kriging equations and (b) the standard error given by the kriging system refers to that average value. 'Direct kriging' of areas and volumes leads to confidence levels on the final estimates which cannot be found from a 'grid intersection' approach. In mining applications, particularly in recent years, it has become more important to classify resources and reserves into categories such as measured and indicated. This can only be done if confidence levels are available on a stope by stope or block by block basis. Let us consider the kriging system developed in Part 3 and see how it would change with the direct estimation of an area or volume. The ordinary kriging equations were:
Σj wj γ(gi, gj) + λ = γ(gi, T)    for each sample i,    with    Σj wj = 1
where the proposed estimator was
T* = Σi wi gi
and the estimation variance was
σk² = Σi wi γ(gi, T) + λ − γ̄(T,T)
Now, suppose that T is the average over an area; how will these terms change? For the 'right hand side' of our equations and for the estimation variance, we require the semi-variogram between each sample and the 'location' at which the value is unknown. The location is now an area or volume. Let us consider one sample together with the panel to be estimated. It turns out that the semi-variogram between a point and an area is simply the average of all the semi-variogram values between the single point and every point in the area. As before, we simply pair up the sample with each point in the panel in turn, add up all the terms and divide by however many we have. Conceptually, the generalisation to estimation of an area or volume is a piece of cake. Computationally, of course, life is a little more complicated. Once again we turn to the 'discretisation' of the area.

To get the relationship between a sample and the area to be estimated, we choose (say) 100 points to represent our panel and calculate 100 semi-variogram values between those points and our sample location. These 100 semi-variogram values are averaged to provide the term:
γ̄(gi, T)
to go on the right hand side of the equations and to substitute into the estimation variance. This latter will end up being:
σk² = Σi wi γ̄(gi, T) + λ − γ̄(T,T)
Notice, particularly, the γ̄(T,T) term. This was a little derisory in Part 3 but comes into its own with a vengeance at this point. The larger the block being estimated, the larger this term will be. This term is subtracted from the estimation variance. What this means in plain words is that the estimation variance for an area will be lower than that for (say) the central point in that area. In (British) English, this says "You can estimate the average over an area with far more confidence than you can say 'sample here and you will get T* '." Note: Once you get the hang of this, you can see that you can handle large samples (or clusters of samples) in a similar way. Instead of calculating γ(gi,gj) if your samples are more than one 'point', you simply calculate γ̄(gi,gj) and read this term as 'the average semi-variogram between all the points in sample i and all the points in sample j'. Your kriging equations will look almost identical except that every γ will be a γ̄.
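The discretisation described above is easy to sketch in code. The fragment below is illustrative only: the panel position and sample location are stand-ins chosen for the example, and the Spherical model simply echoes the Page95 parameters used in the worked example that follows. It averages point semi-variogram values between a sample and a grid of points in a panel to give γ̄(gi, T), and between the grid and itself to give γ̄(T,T).

```python
# Point-to-block and block-to-block average semi-variograms by discretisation.
import numpy as np

def spherical(h, sill, rng):
    """Spherical semi-variogram: zero at h = 0, sill beyond the range."""
    h = np.asarray(h, dtype=float)
    return np.where(h < rng, sill * (1.5 * h / rng - 0.5 * (h / rng) ** 3), sill)

def discretise(x0, y0, side, n=10):
    """Regular n x n grid of points covering a square panel."""
    s = x0 + (np.arange(n) + 0.5) * side / n
    t = y0 + (np.arange(n) + 0.5) * side / n
    xx, yy = np.meshgrid(s, t)
    return np.column_stack([xx.ravel(), yy.ravel()])

panel = discretise(0.0, 0.0, 50.0, n=10)      # 100 points representing a 50 m panel
sample = np.array([12.0, 38.0])               # hypothetical sample location (metres)
sill, rng = 24.0, 120.0                       # Spherical model, Page95-style

# gamma-bar(sample, panel): average of the 100 point semi-variogram values
g_bar_sT = spherical(np.linalg.norm(panel - sample, axis=1), sill, rng).mean()
# gamma-bar(T, T): average over all pairs of points within the panel
d = np.linalg.norm(panel[:, None, :] - panel[None, :, :], axis=2)
g_bar_TT = spherical(d, sill, rng).mean()
print(g_bar_sT, g_bar_TT)
```

The 10 by 10 grid here echoes the 100 points mentioned above; a finer grid simply costs more computing time.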

The Use of Auxiliary Functions


In Practical Geostatistics 1979 (Clark (1979)) we spent a lot of time on examples of how to estimate areas using tables of 'auxiliary functions'. These are similar to the F(L,B) we discussed in previous sections. They help us to calculate the required average semi-variograms for certain idealised situations. Even with access to modern, fast computing power, estimation can be accelerated by the use of appropriate auxiliary functions. For manual exercises, these standardised average semi-variograms can help to 'reinforce' the principles and illustrate how the kriging equations optimise the estimation and include the size and shape of the area or volume to be estimated. We have included tables for the Spherical model for assistance in desktop exercises. Tables for other semi-variogram models can be found in other text books and journals. Or you can generate your own in spreadsheet or software applications. The tables included here cover the commonly used auxiliary functions:

The auxiliary function H(l,b) is very versatile in that it can be used for two apparently different situations. The standard function is the average semi-variogram between a point on the corner of a panel and the whole rectangular panel. However, the same mathematical expression is found when deriving the average semi-variogram between two 'line' segments at right angles to one another. In a mining context this might be a development drive and a raise, or two drives in (for example) a room and pillar operation. The two auxiliary functions χ(l;b) and γ̄(l;b) differ from the others in that the order of the arguments does matter: χ(l;b) does not take the same value as χ(b;l) if l and b have different values. χ(l;b) is the average semi-variogram value between one side of a rectangular panel and the whole panel. This function is useful for the relationship between (say) a drive or raise and a stoping panel. The γ̄(l;b) is the average semi-variogram between two 'lines' a specified distance apart. The lines (drives, raises, borehole cores) must be the same length and start and stop at the same levels. As a very brief example on the use of auxiliary functions, we select a single 50 metre square panel within the Iron Ore - Page95 study area. This block contains two internal samples, which we will use to estimate the block average.


For the kriging equations, we will need the average semi-variogram between each sample and the block. The auxiliary function H(l,b) can be found using Table 15 (below) since we have a Spherical model. The standard function gives the average semi-variogram for a point on the corner of the block. We need to split the panel up into four sections and recombine the auxiliary functions to get the overall average. That is, the average semi-variogram between the point sample and the whole panel will be the average semi-variogram for each of the four component panels recombined in proportion to their areas.

The semi-variogram model for the Page95 data is Spherical, range 120 metres, sill 24 %Fe2. Taking the northern-most sample first, we have four panels:

The overall H(L,B) function will be:

which works out at 0.2479. Multiplying this by the sill of 24 %Fe2, we get:
γ̄(g1, T) = 0.2479 × 24 = 5.95 %Fe2
Similarly, the second sample will give us:


The overall H(L,B) function will be:

which works out at 0.3023. Multiplying this by the sill of 24 %Fe2, we get:
γ̄(g2, T) = 0.3023 × 24 = 7.26 %Fe2
The distance between the two samples is 11.2 metres, so that:
γ(g1, g2) = 24 × [ 1.5 × (11.2/120) − 0.5 × (11.2/120)³ ] = 3.35 %Fe2
Setting up the kriging equations:

That is:

solving the equations gives:

The estimated value is:

with a kriging variance of:


producing a kriging standard error of 1.91 %Fe and a 90% confidence interval on the true block average of 39.88 to 46.18 %Fe, which is fairly wide. However, given that we only have two samples in a total of 2,500 m2, maybe this is not surprising.
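The sketch below re-solves this little system numerically as a check. The two right hand side terms and the inter-sample semi-variogram are the values calculated above; the within-block term γ̄(T,T) is not quoted in this section, so we assume a value of roughly 24 − 4.05², based on the 50 metre block standard deviation of 4.05 %Fe quoted at the start of this part. With that assumption the sketch reproduces a standard error close to the quoted 1.91 %Fe.

```python
# Re-solving the two-sample ordinary kriging system for the 50 m block.
import numpy as np

g12 = 3.35                      # gamma between the two samples, 11.2 m apart
rhs = np.array([5.95, 7.26])    # gamma-bar(sample, block) from the H(L,B) working above
gTT = 24.0 - 4.05 ** 2          # assumed within-block variance for a 50 m block (%Fe2)

# unknowns: w1, w2 and the lagrangian multiplier
A = np.array([[0.0,  g12, 1.0],
              [g12,  0.0, 1.0],
              [1.0,  1.0, 0.0]])
b = np.array([rhs[0], rhs[1], 1.0])
w1, w2, lam = np.linalg.solve(A, b)

kriging_variance = w1 * rhs[0] + w2 * rhs[1] + lam - gTT
print(w1, w2, lam, np.sqrt(kriging_variance))
```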

Iron Ore Example, Page 95


To return very briefly to the full Iron Ore - Page95 data set, we present here estimates for 50 metre panels across the whole study area. Compare the estimated blocks in Figure 9 with the kriged map in Part 4. The standard errors for these panels are shown in Figure 10. Notice how much lower these standard errors are than the individual 'point' standard errors in Part 4.

Wolfcamp Aquifer, Quadratic Residuals


The principles described in this part of the course are conceptually simple to generalise to irregularly shaped areas. We present (without adornment) an example here of a polygonal shape superimposed on a portion of the Wolfcamp aquifer study area. This outline is found in the software package as boundary file county.bln. The polygonal area is covered by a grid of points chosen so that at least 100 of the 'nodes' fall inside the polygon. All semi-variogram terms for the kriging equations are found by averaging over these internal points. A single set of kriging equations is set up and solved. The results are a single estimate for the average over the polygon and a standard error which applies to that estimate. In this example, bear in mind that the original Wolfcamp data had a strong trend. We removed the quadratic trend and constructed a semi-variogram on the residuals. The estimate over this polygon is the residual from the trend over that area. The estimated value is +62.8, implying that this area is expected to have a value above the general trend. The kriging standard error is 46.9 feet. A 90% confidence interval on the 'true' quadratic residual over this polygonal area would, therefore, be 62.8 ± 46.9 × 1.6449, that is, from −14.4 to 139.9 feet.


Notice how much narrower this confidence interval is than any equivalent interval for the estimate of a value at a single 'point' location.
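For readers implementing the 'at least 100 interior nodes' discretisation described above, the sketch below shows one way to generate such a grid for an arbitrary polygon. The polygon here is a made-up outline, not the county.bln boundary, and the use of matplotlib's point-in-polygon test is simply a convenience.

```python
# Lay a regular grid over a polygon's bounding box and keep the interior nodes,
# refining the grid until at least 100 nodes fall inside the polygon.
import numpy as np
from matplotlib.path import Path

def interior_nodes(polygon, min_nodes=100):
    path = Path(polygon)
    (xmin, ymin), (xmax, ymax) = polygon.min(axis=0), polygon.max(axis=0)
    n = 10
    while True:
        xs = np.linspace(xmin, xmax, n)
        ys = np.linspace(ymin, ymax, n)
        xx, yy = np.meshgrid(xs, ys)
        nodes = np.column_stack([xx.ravel(), yy.ravel()])
        inside = nodes[path.contains_points(nodes)]
        if len(inside) >= min_nodes:
            return inside
        n += 5                      # refine the grid and try again

# hypothetical irregular outline (coordinates in miles)
outline = np.array([[0.0, 0.0], [40.0, 5.0], [55.0, 30.0], [30.0, 50.0], [5.0, 35.0]])
nodes = interior_nodes(outline)
print(len(nodes))   # every gamma-bar term in the kriging system is averaged over these nodes
```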



Part 6 Other Kriging Approaches



Other Kriging
In this part of the course, we will discuss, very briefly, some of the more common and longest established variations on ordinary and simple kriging. In a text book of this level, we do not feel it appropriate to consider the more sophisticated methods or those which are new and unproven, which are unstable in practice or which are more suitable for advanced post-graduate studies. You will find that, in the geostatistical world, there are many and varied approaches to kriging and its extensions. Many of these are referred to by initials or acronyms. There is actually a bumper sticker which reads "Geostatisticians do it OK", and Stanford University brought one out shortly after the introduction of 'indicator kriging' which read "I'd rather be IKing", with a rather fetching picture of a duck in hiking gear. Our own modest contribution to this proliferation was 'Multivariable Universal Co-Kriging' (work it out). If there is enough demand amongst you, the readership, we will include this in a second volume along with such useful techniques as geostatistical simulation and maybe even non-linear geostatistics. For the moment we will discuss the following:

Universal kriging, to deal with polynomial type trend applications
Lognormal kriging, to deal with (some) positively skewed situations
Indicator kriging, for various reasons
Rank uniform kriging, for intractable data sets

illustrated copiously with the data sets which we have studied throughout the rest of the book.


Universal Kriging
When a trend is present in the sample values, we cannot guarantee that an Ordinary Kriging estimation method will produce unbiassed estimates. 'Ordinary kriging' is based on the assumption of no trend in the data and includes the restriction that the weights must sum to 1 to ensure unbiassed results. For data with a trend, we have to ensure that the trend components balance also. We will give the general form here and illustrate the process using the Wolfcamp aquifer. Suppose we have a set of sample data from an area where there is known to be a strong trend of the polynomial type (see Part 5 of the previous course). If we use {X,Y} to denote the geographical co-ordinates, we might express the sample values as:
gi = b0 + b1Xi + b2Yi + zi
if the trend was of a linear form, or:
gi = b0 + b1Xi + b2Yi + b3Xi² + b4XiYi + b5Yi² + zi
if the trend was of a quadratic form and so on. The zi value here represents the 'detrended' residual value. Obviously, the complexity of the trend equation would be dictated by the complexity of the way the 'expected' values change over the study area. Now, let us consider our weighted average estimator:
T* = Σi wi gi
where
Σi wi = 1
If we put this in context with the fact that each gi contains a trend component, our estimator can be expressed as follows:
T* = Σi wi ( b0 + b1Xi + b2Yi + zi )
if we had a simple linear type trend. Remember that the bj are the coefficients of the trend equation, that is, simply numbers determined by the form of the overall trend. This means that they are constants and can be taken outside the summation sign:
T* = b0 Σi wi + b1 Σi wi Xi + b2 Σi wi Yi + Σi wi zi
Remembering also that the weights add up to 1, this reduces further to:
T* = b0 + b1 Σi wi Xi + b2 Σi wi Yi + Σi wi zi
Now, the quantity we are trying to estimate is:
T = b0 + b1XT + b2YT + zT
Compare T with T*. Both contain the constant term b0, so that is catered for. If we have removed all of the trend, the term
Σi wi zi
will be an unbiassed estimator of zT provided the weights add up to 1. The only real problem is the main trend terms. We need to ensure that these balance out, in order to ensure that T* is still an unbiassed estimator for T. That is, we have to match:
b1 Σi wi Xi + b2 Σi wi Yi    with    b1XT + b2YT
The simplest way would seem to be to match each of the terms in turn. That is:
Σi wi Xi = XT    and    Σi wi Yi = YT
would ensure that the trend in the estimator exactly matched the trend component in the unsampled value. Notice that we do not, at the estimation stage, need to know what value any of the b coefficients has. Notice also, that if our trend has more terms in it, we just add more conditions to this list. The quadratic trend solution would need 5 extra equations. To summarise, we need to minimise the estimation variance subject to three conditions:
Σi wi = 1,    Σi wi Xi = XT,    Σi wi Yi = YT
When we had one condition on our estimation variance, we introduced a lagrangian multiplier to get the extra equation and an extra unknown into the system. We simply generalise that approach here. Our original λ now becomes λ0 and we introduce two new λs to cope with the two new conditions. If we had more terms in the trend, we would have more λs. We minimise:


with respect to all of the weights and each of the λs in turn. This will give us (m+3) equations for a linear trend (in two dimensions), (m+6) for a two dimensional quadratic, (m+10) for a three dimensional quadratic and so on. If the trend gets too complex, we will add so many conditions to the estimator that there will be no room to actually produce a local estimate! It should also be borne in mind that the more conditions we put upon our estimate, the more sub-optimal the result will be. To return to our minimisation: before you (dear reader) start getting nervous about complicated calculus, let us just point out that this is identical to the ordinary kriging case except for the extra conditions. When we differentiate each of the extra terms, all of the bits will disappear except the ones directly related to that particular weight. The new kriging equations will look remarkably similar to the old ones. Bear in mind, we are currently assuming a linear trend.
Σj wj γ(gi, gj) + λ0 + λ1Xi + λ2Yi = γ(gi, T)    for each sample i
Σj wj = 1
Σj wj Xj = XT
Σj wj Yj = YT
In fact, ordinary kriging is actually a special case of this with a constant, unchanging trend. Matheron called this 'universal kriging' in his original Theory of Regionalised Variables. The semi-variogram model in the kriging equations is, of course, the one derived from the residuals from the trend. There are some major differences between this approach and that of removing the trend surface and carrying out geostatistics on the residuals. Universal kriging estimates the whole of our measurement in one go, that is, it makes a local estimate of trend and residual at the same time. Our estimator is our usual one, which is a weighted average of the original sample values, not the detrended residuals. This means that we are 'fitting' the trend each time we do the kriging, on the scale of the local kriging exercise, not fitting a global surface to the whole data set. The implication is that we can generally use a lower order surface, because we are only estimating the trend at a specific location each time. The other major difference is that the universal kriging standard error includes the error on estimating the trend component in addition to that for the residual. That is, our confidence intervals will acknowledge that we need to estimate the trend at that location as well as the detrended residual. Kriging the residuals and adding back in a global trend equation implicitly assumes that there was no error in getting the original trend expression.

In short, universal kriging is only marginally more difficult than ordinary kriging, and it saves you all the algebra of adding the trend equation back on in addition to giving more realistic confidence intervals. The shortened expression for the universal kriging variance (assuming a linear trend) is:
σUK² = Σi wi γ̄(gi, T) + λ0 + λ1XT + λ2YT − γ̄(T,T)
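As an illustration of how the extra unbiasedness conditions enter the matrix, the sketch below assembles and solves a universal kriging system with a linear drift for a single point. The sample coordinates and target are invented, and the Spherical model simply echoes the residual model quoted for the Wolfcamp aquifer below; it is a sketch of the mechanics, not the PG2000 code.

```python
# Universal kriging (linear drift) for a single point: m weights plus three
# lagrangian multipliers, giving an (m+3) x (m+3) system.
import numpy as np

def spherical(h, nugget, sill, rng):
    h = np.asarray(h, dtype=float)
    sph = np.where(h < rng, sill * (1.5 * h / rng - 0.5 * (h / rng) ** 3), sill)
    return np.where(h > 0, nugget + sph, 0.0)

def uk_point(xy, target, nugget, sill, rng):
    m = len(xy)
    A = np.zeros((m + 3, m + 3))
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    A[:m, :m] = spherical(d, nugget, sill, rng)
    A[:m, m] = 1.0                  # sum of weights = 1
    A[m, :m] = 1.0
    A[:m, m + 1] = xy[:, 0]         # sum of wi Xi = XT
    A[m + 1, :m] = xy[:, 0]
    A[:m, m + 2] = xy[:, 1]         # sum of wi Yi = YT
    A[m + 2, :m] = xy[:, 1]
    b = np.zeros(m + 3)
    b[:m] = spherical(np.linalg.norm(xy - target, axis=1), nugget, sill, rng)
    b[m:] = [1.0, target[0], target[1]]
    sol = np.linalg.solve(A, b)
    w, lags = sol[:m], sol[m:]
    variance = w @ b[:m] + lags @ b[m:]   # point support, so gamma-bar(T,T) = 0
    return w, variance

xy = np.array([[0.0, 0.0], [40.0, 10.0], [15.0, 55.0], [60.0, 45.0], [30.0, 30.0]])
w, var = uk_point(xy, np.array([25.0, 20.0]), 12000.0, 23000.0, 60.0)
print(w, np.sqrt(var))
```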
Wolfcamp Aquifer
We illustrate universal kriging and, in particular, the selection of the relevant order of trend component using the Wolfcamp aquifer data. We have seen that this data has a very strong trend, best characterised by a quadratic equation. We developed the semi-variogram by subtracting the trend surface and calculating the experimental graph on the residuals from this surface. The model was a Spherical one, with a nugget effect of 12,000 ft2, a range of influence of 60 miles and a sill on the Spherical component of 23,000 ft2. A cross validation exercise was carried out using universal kriging with a quadratic trend component. One of the things which becomes immediately apparent when you start using universal kriging is that you need to increase the search radius (unless you have very dense data) to get enough samples to satisfy all of the conditions. With a quadratic surface, we have 6 extra equations, so we need enough samples within the search radius to evaluate that number of weights plus an extra six lagrangian multipliers. For these exercises, we increased the search radius to 75 miles in all cases. In the cross validation exercise, we obtained an average cross validation statistic of 0.0099 and a standard deviation of 1.1119. The average is acceptable but some practitioners would find the standard deviation a little high. We must also remember that the standard deviation is produced from a variance divided by (n-1) when it would certainly be more appropriate to divide by (n-6). That would make it closer to 1.15 than 1.11. Leaving this issue aside for the moment, we krige a 5 mile grid of points across the Wolfcamp area using the increased search radius of 75 miles. We can see from the map in Figure 1 that we rapidly run out of enough samples to keep the equations stable and can produce no estimates for values in the 'corners' of this map. Trying to fit a quadratic equation to a small number of samples and produce an optimal weighted average is just too much for such a small data set. It would be advantageous if we could drop the order of the trend needed to krige the grid points at the scale of an area with a 75 mile radius. Remember that the full quadratic surface was fitted to an area 250 by 180 miles. Surely a linear surface would be sufficient on a smaller scale. We use cross validation to help us make this decision.

A cross validation exercise with the same semi-variogram model and linear trend constraints was carried out. In this case we only solve (m+3) equations each time. The average and standard deviation (with n-1) of the cross validation statistics were found to be 0.0230 and 1.1137 respectively. These are almost identical to the statistics obtained using a local quadratic fit, while using fewer extra equations. A new kriged map was produced and is shown in Figure 2. It is immediately apparent that this map is more stable and (probably) more realistic, especially around the edges. The trend component ensures that, if values are rising towards the edges of the map, they keep rising and do not level off as they would with ordinary kriging or inverse distance interpolation. The standard errors for this map are shown in Figure 3. These standard errors contain the reliability measure for the local trend fitting as well as that for the estimation of the 'stationary' residual component. Universal kriging can also be used to estimate values over areas and volumes as discussed in Part 5. The major factor to be remembered here is that we have to construct an average for the trend over the desired area (or volume) as well as the averages for the semi-variogram values. The trend terms XT and YT have to be replaced by the average of all of the co-ordinates within the area to be estimated. With linear trends such as this, the requisite figures are simply the centre of gravity of the area. With higher order trends, the averages for each term must be calculated over the area to be estimated. This is another good reason to keep the order of the trend surface down.

Lognormal Kriging
Another problem arises when samples are expected to follow a lognormal or other positively skewed distribution. Whilst the construction of the experimental semi-variogram and the estimation procedures produced by geostatistics do not depend on what distribution the samples follow, there are one or two 'side-effects' which become apparent when dealing with positively skewed samples. It was noted in Part 3 of the previous course that the mean and standard deviation of the lognormal distribution are not independent of one another. The consequence of this is that, when we take different sets of samples from the same lognormal distribution, the standard deviation of the samples is directly proportional to their mean. Consequently, the sample variance, and hence the sill of any semi-variogram calculated on these samples, is proportional to the square of the mean of the samples.

The graphs below illustrate the proportional effect using samples contained within 100 by 100 foot blocks over the Geevor data set. If we take the untransformed, lognormal(ish) values, we find that the standard deviation of samples within each block is almost perfectly correlated with the average of the samples within that block. If, on the other hand, we consider the logarithms of the samples, we find no correlation between the mean and standard deviation of samples in the same block.
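The block-by-block check described above is easy to reproduce. The sketch below uses simulated lognormal values rather than the Geevor database (so the exact correlations will differ), but it shows the mechanics: group the samples into 100 foot blocks and correlate the block mean with the block standard deviation, first on the raw values and then on their logarithms.

```python
# Proportional effect check: block mean versus block standard deviation.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 5000
df = pd.DataFrame({
    "x": rng.uniform(0, 1000, n),                        # feet
    "y": rng.uniform(0, 1000, n),
    "grade": rng.lognormal(mean=3.0, sigma=1.1, size=n)  # skewed 'grades'
})
df["block"] = (df["x"] // 100).astype(int).astype(str) + "_" + (df["y"] // 100).astype(int).astype(str)
df["log_grade"] = np.log(df["grade"])

raw = df.groupby("block")["grade"].agg(["mean", "std"])
logs = df.groupby("block")["log_grade"].agg(["mean", "std"])
print(raw["mean"].corr(raw["std"]))    # clearly positive for skewed values
print(logs["mean"].corr(logs["std"]))  # close to zero once logarithms are taken
```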

If experimental semi-variograms are constructed on different sets of samples within a deposit, this 'proportional effect' can have a radical effect on the individual experimental semi-variograms. Examples of proportional effect in the geostatistical literature usually concern cases where, in order to construct experimental semi-variograms in different directions, it has been necessary to use different sets of data in each. This produces directional semi-variograms which can differ by orders of magnitude in total sill value. In many textbooks and other publications, the differing heights of the semi-variograms have been taken as evidence of anisotropy in the spatial variation. This type of anisotropy, where sills are different, has been dubbed 'zonal anisotropy'. There are various methods recommended to remove the proportional effect. One of the most popular, in the past, has been to calculate a 'relative semi-variogram':
γrel(h) = γ(h) / [ m(h) ]²
where m(h) represents the average of the samples which were actually used in the calculation at a specific 'lag' distance (h) in the semi-variogram. There are variations on this, including one where each difference is divided by the average of the individual pair of samples. The resulting semi-variogram is then 'relative to the mean (squared)'. In the early 1990s, Cressie showed that the relative semi-variogram is generally analogous to the semi-variogram calculated on the logarithm of the sample values. Since this is computationally simpler, many practitioners now use logarithmic semi-variogram calculations rather than relative ones. However, it should be noted that relative semi-variograms are still widely used in some regions, particularly in the USA and Canada. Examples of the relative semi-variogram approach can be found in Michel David's definitive text book.


The Lognormal Transformation


If our data is positively skewed (whether or not it is lognormal) we will probably find a much more sensible picture if we take logarithms of the sample values. By at least approximately Normalising the data, the calculation of 'semi-variance' becomes a lot more sensible and more stable. Several of the data sets provided with this book are positively skewed and produce very nice semi-variograms when logarithms are taken. The experimental semi-variogram is calculated on the logarithms of the sample values (adjusted by an additive constant if necessary). The model can be fitted to this graph and cross validation carried out. Kriging (ordinary, simple or universal) can be carried out on the logarithms, as usual. The only potential problem is in the 'backtransformation' of the logarithmic estimates to the original sample value scale. Even when we are certain that the value distribution is lognormal, there appears to be some disagreement in the general geostatistical literature as to how this backtransform should be carried out. According to Cressie (book and personal communications), the correct theoretical backtransform is:

in the absence of trend. If trend is present, the usual trend terms replace the single lagrangian multiplier in the above formula. We use our consistent notation here: T* is the kriging estimator; σk² is the estimation variance produced by the ordinary or universal kriging system; λ is the lagrangian multiplier produced as part of the solution to the ordinary kriging equations, or

in the case of universal kriging. γ̄(T,T) is the within block variance discussed in Part 5 and included throughout Parts 3, 4 and 5 in the estimation variance. If we have used simple kriging rather than ordinary or universal kriging, then the formula can be expressed as:


in which the logarithmic variance of the actual values is:

where σ² represents the total sill or 'point' variance. The variance of the estimators is:

that is, the total variance less the variance of values amongst the samples in the weighted average estimator. In either case, all of the required terms are produced in setting up and solving the relevant kriging equations. No extra computation is required other than the actual backtransformation. Confidence levels on the estimate can be found by referring to Sichel's theory (see Part 3 of the previous course), where the confidence factors are produced using the theoretical formula:

where NF is the percentage point read from Reference Table 2 (below).
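As a purely illustrative sketch of the backtransformation step: one point-support form widely quoted for ordinary lognormal kriging, and usually attributed to Cressie, is exp(T* + σk²/2 − λ). Treat the formula and the numbers below as an assumption for illustration, not as a restatement of the tables or software that accompany this book.

```python
# Lognormal backtransform of an ordinary kriging estimate made on loge(grade).
import math

def lognormal_backtransform(t_star, kriging_variance, lagrangian):
    """exp(T* + sigma_k^2 / 2 - lambda): assumed point-support ordinary kriging form."""
    return math.exp(t_star + kriging_variance / 2.0 - lagrangian)

# hypothetical values straight out of a kriging run on logarithms
t_star, sigma_k2, lam = 3.40, 0.62, 0.05
print(lognormal_backtransform(t_star, sigma_k2, lam))
```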

Geevor Tin Mine, Grades


The tin values at Geevor are highly skewed, although not exactly lognormal. Logarithms of the tin grades were calculated with no additive constant. The experimental semi-variogram is shown in Figure 12.5, together with a Spherical type model. This model has two components, in addition to a substantial nugget effect at almost 40% of the total sill. The two ranges of influence are 20 and 150 feet respectively. A cross validation exercise was carried out on the 5,399 available samples.

The mean and standard deviation of the standardised error statistics were -0.0023 and 1.0183 respectively, which is very respectable. However, the acid test is whether we can reproduce the actual grades, rather than the logarithms. During the cross validation process, each estimate was backtransformed so that we can also compare the actual un-logged value with the final estimate. Standardising the error on the backtransform makes no sense, since it is obviously not Normal. We simply compare the overall statistics:

It looks like we can adequately reproduce the average tin grade, but we are still plagued somewhat by the regression effect. The standard deviation of the estimates is still much smaller than the true 'point' standard deviation, so we will tend to under-estimate high values and over-estimate low values if we use lognormal kriging as a mapping tool. There is an empirical method, sometimes referred to as the 'lognormal shortcut', in which we could use the logarithmic semi-variogram to krige untransformed (raw) grades. This was first formally proposed in a paper by Thurston and Armstrong at the 1987 APCOM (20th). The logic is that, if we have a range of influence of 150 feet on the logarithms, presumably we have the same range of influence on the 'raw' grades. This (apparently) works well with 'moderately skewed data', but we are given no guidance on what constitutes moderation. We selected the central portion of a single level within the tin vein for illustration. This has been extensively sampled, so the results are not indicative of sparse sampling variation. In Figure 4 you will see the same area estimated by ordinary kriging and by lognormal kriging. This picture is patched together from two screen dumps: the upper estimated by ordinary kriging and the lower estimated by lognormal kriging. The logarithmic semi-variogram model was used for both, and Figure 5 shows the associated logarithmic kriging standard errors and the sample information used in the kriging. Even with this closely controlled exercise, you can see the 'smearing' of high values which is the result of ordinary kriging with highly skewed data.

SA Gold Mine
SA Gold Mine is an extensive set of simulated samples based on a South African, Wits type, gold mine that has been used to illustrate several papers by Clark in recent years. We present a very brief discussion here to illustrate the difference between using the average of peripheral samples as against a full lognormal kriging exercise.


The 'samples' data set is shown in the post plot in Figure 6. This area covers 3,200 metres east/west and 2,300 metres north/south. The reef is very narrow and quite shallow dipping and can be thought of as a thin sheet. The samples are chipped from faces exposed by tunnels driven to access the ore and from stopes mined during production. For those of you unfamiliar with South African gold mining, this is a very small proportion of the mine and of the available sampling information. Databases in RSA gold mines can run to millions of samples. This area was divided into 100 metre square blocks of ground. The samples in the 5 metres along one side of the block were classified as the 'face' sampling. Standard estimation procedure in many RSA gold mines is still to use the average of available face samples to estimate the content of the block. The graph below shows a comparison between the face value and the block value over this study area. The face averages can be a combination of up to 30 samples. The 'actual' panel value is taken to be the average of all of the sampling information available within the 100 metre square, which can be up to 250 samples. For the kriging exercise, only face samples were used. However, the software was allowed to access up to 80 face samples in the area and to use lognormal kriging, since the distribution is known to be two parameter lognormal (see Part 3 of the previous course). The comparison between actual block values and the lognormal kriging results is shown above. It is clear that lognormal kriging produces estimates in which we can have more confidence than the face average.

Indicator and Rank Uniform Kriging


Indicator Kriging
In 1982, at the 17th APCOM symposium in Colorado, Andre Journel presented the earliest concepts of "the indicator approach to estimation of spatial distribution". Many variations have grown out of the early, relatively simple ideas. We will only lay out the basics here, as we feel the more sophisticated uses of the indicator approach, such as multi-indicator kriging, probability kriging and soft kriging, are inappropriate for an introductory textbook. The concept of indicator transforms is one of the simplest and (possibly) most elegant in modern geostatistics. It can be generalised into many applications which will not be discussed here. The idea is this: we specify some selection criterion, usually a discriminator value in which we are interested. This value should not be confused with an economic cutoff or some critical level of toxicity; it would more probably be something which affects the depositional mechanism of the variable we are measuring.


Perhaps this is easier to explain around an example. Let us consider the Velvetleaf weed data set. When we looked at the probability plot of the weed counts (repeated as Figure 1), we found (a) that it was highly skewed and (b) that it appeared to have more than one component distribution associated with it. There seemed to be at least two separate distributions making up the weed probability plot, split at around 40 per quadrat. Above this count, the line breaks slope and becomes more shallow (lower standard deviation), so that it does not reach the high values we would expect from a single lognormal population. We are dealing with a set of samples from a population which seems to consist of at least three components: 1. quadrats with no weeds, 2. quadrats with up to 40 or so weeds, and 3. quadrats with weed counts of 40 or higher. Of course, this is still a simplification since, if there are two populations of weed count, they will overlap to some extent. Quadrats with values around 40 could come from either component. The indicator approach is as follows. Select your discriminator value, say 40. All samples with measurements above this value are coded as '1'. All samples below are coded as '0'. We now have a new set of sample values which are all zero or one. No extreme highs, no skewness, no complicated shapes, just 0 and 1. This new measurement is an 'indicator': yes/no, presence/absence, etc. Note: Some practitioners, including Journel, code the indicator the other way round. That is, '1' for below and '0' for above. It makes no difference to the mathematics or to the semi-variograms. It does make a huge difference in your interpretation of the results, though. So be sure, when you are looking at indicators, which way your software goes. Ours does '0' for below and '1' for above. We recoded the velvetleaf data with an indicator transform at 40. Actually, we tried lots and found that 40 gave us the best looking semi-variograms. The experimental semi-variogram and fitted model are shown in Figure 2. Each pair of samples that goes into the experimental semi-variogram will be:

{0,0} if both samples are below the discriminator value, giving a difference of zero

{1,1} if both samples are above the 'cutoff', giving a difference of zero
{0,1} or {1,0} if one sample is above and the other is below, giving a difference of one.

The calculated graph will be the average of these differences and represents the predictability of being above or below the 'cutoff' of 40 velvetleaf weeds per quadrat, that is, how well we can predict whether we are in the main (variable) population or the 'high value', less variable areas. It is probably worth saying that the sill on an indicator semi-variogram is not really a measure of variability in any absolute terms. It is, rather, a reflection of the proportional split between the above and below populations. The highest indicator semi-variogram will (theoretically) be the one constructed on the median value, where 50% of the samples become '0' and 50% become '1' and we have, therefore, the maximum number of differences. The real assessment of how predictable the indicators are is the ratio of sill to nugget effect and the length of the range of influence. In this example, the nugget effect is 30% of the total height of the semi-variogram model, suggesting that there will be some uncertainty at short distances about whether we stay in the same population or not. Given that uncertainty, we can then have some confidence in continuity up to around 55 metres. Figure 3 shows the kriged map of the indicator values. This map can be variously interpreted. The most common way is to read the predicted values as the "probability that the unknown value is above the cutoff value". With this interpretation in mind, we can look at Figure 3 and see that there is a zone in the south-east which is almost guaranteed to hold more than 40 velvetleaf weeds per quadrat. This area should probably be studied in detail or, perhaps, just sprayed rather more heavily than the rest of the field! From Figure 3, we can judge where the 'above 40' population is predominant. In the original analysis, we also saw a break between the quadrats with and without observable velvetleaf weeds. We repeated the above procedure for a 'discriminator value' of 1 weed per quadrat. Figure 4 shows the semi-variogram and Figure 5 the kriged map for what is, essentially, the presence of velvetleaf weeds. We see the same sort of continuity at this cutoff, with a nugget effect of 30% of the total sill height. The continuity is better, however, at 65 metres. One other complication shows up on the experimental semi-variogram: there seems to be much better continuity in the north/south direction than in any other direction considered. For the purposes of this brief illustration, we ignored this phenomenon.


However, if you look at Figure 5, you will see that there is definitely some structure in this area that affects the growth of the weeds. The eastern half of this field is almost totally infested with weeds; the probability of weed presence is over 0.85 for almost all of the 'right hand side' of the map. On the western side, the likely incidence of weed is much lower, with an almost total absence in the central western area. It is also now quite obvious that there is some control on the weed population which runs north/south within the field. Further study in this area would be aimed at identifying the controls on this north/south structure and on the difference between the western and eastern halves of the field. Note: Indicator coding can also be extremely useful when dealing with non-quantitative or categorical variables. For example, if you wanted to know whether land usage affected a particular variable, you could code a specific land use as '1' and all of the others as '0' and see what sort of semi-variogram you get. In geological applications, one can use indicators for mineralisation types, host rock types and so on.
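The indicator transform and its experimental semi-variogram are only a few lines of code. The sketch below uses randomly generated 'weed counts' rather than the Velvetleaf data set, and a simple omni-directional lag binning, so the numbers are purely illustrative of the mechanics described above.

```python
# Indicator transform at a discriminator of 40, then an omni-directional
# experimental semi-variogram on the 0/1 indicators.
import numpy as np

rng = np.random.default_rng(0)
coords = rng.uniform(0, 200, size=(300, 2))             # quadrat locations, metres
counts = rng.lognormal(mean=2.0, sigma=1.2, size=300)   # skewed stand-in 'weed counts'

indicator = (counts > 40).astype(float)   # '1' above the discriminator, '0' below

def experimental_semivariogram(xy, values, lag, nlags):
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    sqdiff = (values[:, None] - values[None, :]) ** 2
    gammas = []
    for k in range(1, nlags + 1):
        mask = (d > (k - 0.5) * lag) & (d <= (k + 0.5) * lag)
        gammas.append(0.5 * sqdiff[mask].mean() if mask.any() else np.nan)
    return np.arange(1, nlags + 1) * lag, np.array(gammas)

lags, gamma = experimental_semivariogram(coords, indicator, lag=10.0, nlags=15)
print(np.column_stack([lags, gamma]))
```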

Rank Uniform Kriging


If all else fails, we can always try the 'non-parametric' or distribution free approach. In Part 5 of the previous course, we saw that the Scallops data had a rather complex distribution and could probably be treated better with an approach which did not assume a model for the distribution. We transformed the data values into a uniform distribution by 'ranking' them. To refresh our memories, we rank the data by re-ordering them according to value from lowest to highest. The lowest sample value is replaced by '1', the second lowest by '2' and so on. The highest value sample becomes 'n' and the next highest is 'n-1'. This transforms any shape of distribution into a uniform distribution where all possibilities are equally likely. In our software, we standardise the range of the values to a percentage by multiplying the actual rank by 100/(n+1). This is purely so that we do not have to remember how many samples we started with at all times. The resulting figures can be thought of as 'percentiles'. Remember, if you are ranking your samples, to keep the co-ordinates with the relevant samples! Ranking the 'total scallops caught', we find that there is a strong trend in the values. A standard trend surface analysis (see Part 5 of the previous course) produces an Analysis of Variance table which is shown below.
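A minimal sketch of the rank (percentile) transform just described is given below. The tie-handling convention (average ranks) is our own assumption; the text does not say how tied values are treated.

```python
# Rank uniform (percentile) transform: replace each value by rank * 100 / (n + 1).
import numpy as np
from scipy.stats import rankdata

values = np.array([0.0, 3.0, 3.0, 12.0, 250.0, 7.0, 0.5])
percentiles = rankdata(values, method="average") * 100.0 / (len(values) + 1)
print(percentiles)   # roughly uniform on 0-100 whatever the original distribution
```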


It is fairly obvious from this table that the trend in the values is very strong and, perhaps, more complex than the cubic equation reflects. However, since the software only goes up to cubic, let us go with the flow for this illustration. Figure 8 shows a shaded plot of the cubic trend surface equation and the sample values in contrast. As usual, some samples take values higher than the general trend and some lower. Subtracting the cubic equation from each sample value gives us a set of measurements which (we hope) are now 'detrended'. We construct an experimental semi-variogram graph on the detrended values. If directional semi-variograms are plotted, we can see that the cubic trend has removed any apparent 'anisotropy' in the data values. The omni-directional semi-variogram is plotted and we fit a model to it. This is shown as Figure 9. The model fitted to the residuals from the cubic trend on rank values is a single Spherical component with a range of influence of 0.275 (degrees, latitude and longitude) and a sill of 425 (percentiles)2. Added to this is a nugget effect of 75 (percentiles)2. What this model is telling us is that we can predict, with a fair amount of certainty, the rank (percentile) of an unsampled location up to 0.275 degrees away. This argues a fair amount of continuity in rank, since this is the result after we have already removed a large amount of continuity in the form of trend. A cross validation exercise using this model, a cubic trend and the rank percentiles gives an average and standard deviation for the error statistics of 0.0074 and 0.9552 respectively. The low standard deviation suggests that we may be being rather pessimistic in the size of the nugget effect. Cross validation tends to test the very short distance part of the semi-variogram model rather than the longer distance points on the graph. For the sake of this illustration, we will proceed with this model, which gives a modified Cressie statistic of 0.0198. Since we have a strong trend of cubic form, we also checked the cross validation using a quadratic trend and obtained a mean and standard deviation on the error statistic of -0.0196 and 0.9907 respectively. Somewhat encouraged by this, we dropped the local trend in the kriging to a linear expression. This gave error statistics with average and standard deviation equal to 0.0117 and 1.034 respectively. For comparison, we used universal kriging with two sets of conditions:


1. cubic trend and a search radius of 0.6, much larger than the range of influence of 0.275;
2. linear trend and a search radius of 0.275.

The results are shown as Figures 10 and 11. The visual difference is impressive. It is obvious that the lower order trend, even with a shorter search radius, is far more stable than trying to fit a higher order cubic surface locally over each grid point. The standard errors for the two maps are virtually identical. Use the accompanying software to verify this. Figure 12 shows the final standard errors for kriging using the rank uniform (percentile) transformation, the Spherical semi-variogram model for the residuals and the linear trend locally.

Summary of Part 6
We would not like you to think that these are anywhere near a comprehensive set of variations on the basic kriging techniques. Over the four decades since Matheron presented his Theory of Regionalised Variables (and before) there have been many adaptations and extensions to the techniques of simple, ordinary and universal kriging. In this introductory textbook we can only discuss a very small subset of the methods which are available under the general heading of 'geostatistics'. If you are interested in more complex, sophisticated or just modern types of kriging and similar techniques, we urge you to read the conference proceedings, the journals and the other books in this field (see Bibliography). If you have suggestions to improve this course, feel free to e-mail us at courses@kriging.com or plug into the Web and post your comments at www.kriging.com. Positive comments are more likely to elicit a polite response.

