Most of you are familiar with topographic contour maps. Those squiggly lines represent locations on the map of equal elevation. Many of you have probably seen a similar mode of presentation for scientific data; the "iso-ness" of those lines is comparable. What many of you are probably not familiar with are the mathematics that lie behind the creation of those maps and their uses.
products (gradients, etc.) and is compute-intensive. Furthermore, if you were to sample at different locations you would get a different contour map.
Figure 5.1: An example of what a triangularization grid looks like. Choosing the optimal way to draw the connecting lines is a form of the Delaunay triangularization problem.
Better, then, to put your data onto a regular, rectangular grid. A regular grid is easier for the computer to use, but more difficult for the user to generate. But the benefits to be gained for this extra trouble are large.
Equations 5.1 and 5.2 are the bare bones of the nearest-neighbor formulation. Sometimes this method is augmented by using the N nearest neighbors (see Fig 5.2):

\hat{Z}_P = \frac{\sum_{i=1}^{N} Z_i / d_i}{\sum_{i=1}^{N} 1 / d_i}    (5.3)

where d_i is the distance from the grid point to data point i.
This method of generating grids is of particular use for filling in gaps in data already on a regular grid, or very nearly so. Bilinear Interpolation: a method frequently referred to as "good enough for government work." The value at the grid point is an interpolation product (\hat{Z}) from the following formulas for a 2-dimensional case:

\hat{Z}(x, y_1) = Z_{11} + t\,(Z_{21} - Z_{11}), \qquad \hat{Z}(x, y_2) = Z_{12} + t\,(Z_{22} - Z_{12})    (5.7)

\hat{Z}(x, y) = \hat{Z}(x, y_1) + u\,[\hat{Z}(x, y_2) - \hat{Z}(x, y_1)]    (5.8)

with t = (x - x_1)/(x_2 - x_1) and u = (y - y_1)/(y_2 - y_1),
where the Z_{ij} are the actual data points surrounding the grid point (sometimes called a node). But this method is best used for interpolating between data already on a grid. This method can be augmented, and there are logical extensions such as bicubic interpolation, which yields higher-order accuracy but suffers from over- and under-shooting the target more frequently. Inverse distance: is actually a class of methods that weight each data point's contribution to the grid point by the inverse of the distance between the grid point and the data point (sometimes this weight is raised to a power, 2, 3, or even higher if there is a reason). This is basically Eqn 5.3 with the distances raised to the power mentioned. This method is fast, but has a tendency to generate bull's-eyes around the actual data points. Kriging: is a method to determine the best linear unbiased estimate of the grid points. We will discuss this in greater detail in Section 5.4. This method is very flexible, but requires the user
Figure 5.2: An example of a regular grid. The lines connecting the Z-points are the distances that could be calculated; the dashed lines indicate distances too large for the data to be expected to have any significant influence on the grid point value. For example, the N-nearest-neighbors estimation of grid point \hat{Z} would use only the nearest few points; the simpler nearest-neighbor method uses only the single closest point. This is a two-dimensional example with axes x_1 and x_2.
to bring a priori information about the data to the problem. This information takes the form of a variogram of the semivariances, and there are several models of variograms that can be used. Typically, real data are best dealt with using a linear variogram unless there is a reasonable amount of data from which to derive a robust variogram (more in Sections 5.3 and 5.4).
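The inverse-distance scheme described above is easy to sketch in code. The fragment below is in Python (the course itself uses MATLAB m-files, but the algorithm is language-independent); the function name and sample values are ours, for illustration only:

```python
import numpy as np

def idw_estimate(grid_pt, data_xy, data_z, power=2):
    """Inverse-distance-weighted estimate at one grid point.

    Each observation contributes with weight 1/d**power, so the
    nearest data dominate; raising power to 2 or 3 sharpens the
    bull's-eye tendency the text warns about.
    """
    d = np.linalg.norm(data_xy - grid_pt, axis=1)
    if np.any(d == 0):                      # grid point falls on a datum
        return data_z[np.argmin(d)]
    w = 1.0 / d**power
    return np.sum(w * data_z) / np.sum(w)

# Three observations around a grid node at the origin
xy = np.array([[1.0, 0.0], [0.0, 2.0], [-3.0, 0.0]])
z = np.array([10.0, 20.0, 30.0])
est = idw_estimate(np.array([0.0, 0.0]), xy, z)
```

Note how the nearest observation (z = 10 at distance 1) dominates the estimate, pulling it well below the simple mean of the three values.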
5.1.4 Splines
We don't plan on covering splines per se. Like many of the topics covered in this course, splines are a course unto themselves. But we would be remiss if we did not mention them here. Splines got their start as long flexible pieces of wood or metal. They were used to fit curvilinearly smooth shapes when the mathematics and/or the tools were not available to machine the shapes directly (e.g. hull shapes and the curvature of airplane wings). Since then, a mathematical equivalent has grown up around their use, and they are extremely useful in fitting a smooth line or surface to irregularly spaced data points. They are also useful for interpolating between data points. They exist as piecewise polynomials constrained to have continuous derivatives at the joints between segments. By piecewise we mean: if you don't know how/what to do for the entire data array, then fit pieces of it one at a time. Essentially then, splines are piecewise functions for connecting points in 2 or 3 dimensions. They are not analytical functions nor are they statistical models; they are purely empirical and devoid of any theoretical basis.

The most common spline (there are many of them) is the cubic spline. A cubic polynomial can pass through any four points at once. To make sure that it is continuously smooth, a cubic spline is fit to only two of the data points at a time. This allows for the use of the other information to maintain this smoothness. If you consider Fig. 5.3, there are four data points. Cubic polynomials are fit to only two data points at a time (the first pair, then the second pair, etc.). By requiring the tangent of one segment at a shared point to equal the tangent of the next segment there, we can write a series of simultaneous equations and solve for the unknown coefficients. See Davis (1986) for more details and MATLAB's spline toolbox (based on deBoor, 1978).

There are a number of known problems with splines. Extrapolating beyond the edges of the data domain quite often yields wildly erratic results.
This is because there is no information beyond the data domain to constrain the extrapolation, and splines are essentially higher-order polynomials, which will grow to large values (positive or negative). Closely spaced data points can develop "aneurysms": in an attempt to squeeze a higher-order polynomial into a tight space, large over- and under-shoots of the true function can occur. These problems also occur in 3-D applications of splines. However, if a smooth surface is what you are looking for, frequently a spline (see spline relaxation in other texts) will give you a good, usable smooth fit to your data.
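A minimal natural cubic spline can be sketched in a few lines of Python (MATLAB's spline toolbox does this and much more; this stripped-down version just solves the tridiagonal system for the curvatures at the knots and evaluates the piecewise cubics — all names are ours):

```python
import numpy as np

def natural_cubic_spline(x, y, xq):
    """Evaluate a natural cubic spline through the knots (x, y) at xq.

    Piecewise cubics S_i(t) = y_i + b_i*t + c_i*t^2 + d_i*t^3 with
    continuous first and second derivatives at the interior knots and
    zero second derivative at both ends (the "natural" condition).
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    h = np.diff(x)
    # Tridiagonal system for c, half the second derivative at each knot
    A = np.zeros((n, n))
    rhs = np.zeros(n)
    A[0, 0] = A[-1, -1] = 1.0              # natural ends: c = 0
    for i in range(1, n - 1):
        A[i, i - 1] = h[i - 1]
        A[i, i] = 2.0 * (h[i - 1] + h[i])
        A[i, i + 1] = h[i]
        rhs[i] = 3.0 * ((y[i + 1] - y[i]) / h[i]
                        - (y[i] - y[i - 1]) / h[i - 1])
    c = np.linalg.solve(A, rhs)
    out = []
    for xv in np.atleast_1d(np.asarray(xq, float)):
        i = int(np.clip(np.searchsorted(x, xv) - 1, 0, n - 2))
        t = xv - x[i]
        b = (y[i + 1] - y[i]) / h[i] - h[i] * (2 * c[i] + c[i + 1]) / 3
        d = (c[i + 1] - c[i]) / (3 * h[i])
        out.append(y[i] + b * t + c[i] * t**2 + d * t**3)
    return np.array(out)

# Four knots, as in Fig 5.3; evaluate between the middle pair
xk = np.array([0.0, 1.0, 2.0, 3.0])
yk = np.array([0.0, 1.0, 8.0, 27.0])
y_mid = natural_cubic_spline(xk, yk, 1.5)
```

The matching of first and second derivatives at the interior knots is exactly the "equal tangents" requirement described above, expressed as a set of simultaneous equations.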
Figure 5.3: A cubic polynomial is fit piecewise, from one pair of points to the next, along the curve. Because only two points are used at any one time, the additional information from the other points can be used to constrain the tangents to be equal at the intersections of the piecewise polynomials.
\hat{Z}_P = \sum_{i=1}^{n} w_i Z_i    (5.10)

where the grid estimate (\hat{Z}_P) is the sum of a weighting scheme (w_i) times the actual observations (Z_i). The nature of the weights varies, as we have seen in the first part of this segment (N-nearest neighbors, inverse of the distance, inverse of the square of the distance, etc.).
Figure 5.4: A hypothetical study area divided into nine equal sub-areas or blocks. The red points represent actual data sampling locations, and the blue cross represents just one of the several grid points for which an estimate is desired. As shown in Eqn 5.11, the value at the blue cross can be estimated as the weighted sum of the means of the surrounding blocks.
An estimator for the center of this design is then given by Equation 5.11, and each sub-area can be estimated by making it the center of its own 3-by-3 block:

\hat{Z}_P = \sum_{k=1}^{9} w_k \bar{Z}_k    (5.11)

Here the w_k's are the weights applied to the block means \bar{Z}_k. These weights are determined by a number of methods, some of which are outlined in Section 5.1.3, or from field data that allow the inversion of the system of equations in Eqn 5.11. One drawback to this approach is that although the mean of the block is relatively independent of the size of the block (once the block is above a certain, data-dependent, size), the variance of the
block estimate tends to increase with increasing block size. It is quite possible that the variance of the block estimate may be too large to make the estimate of much use in your investigation. Your block size can go wrong in either direction: smaller, and there is not enough data to be realistic; larger, and all the structure is averaged out (see the discussion of stationarity below).
Figure 5.5: The same study area as in Fig 5.4, but with the area surrounding the point to be estimated divided into four new areas indicated by the shaded areas. Instead of having only nine block averages to work with, Eqn 5.11 will have 13. The red data points have been removed from the figure for clarity, and the new blocks have not been numbered.
It is left to the reader's imagination as to how other geometries could be used to divide and re-divide the study area into blocks for estimating the blue cross. Keep in mind that only one blue cross was shown for demonstration purposes, but each block has its own blue cross that is estimated in a fashion similar to the one we just discussed.
The averages from the blocks, the \bar{Z}_k's, can also be weighted, or windowed. The results, in histogram form, are shown in Fig 5.6. Now in Eqn 5.11 the weights are not equal over the entire block, but rather are a function of how distant the block centroid is from the point to be estimated.
Figure 5.6: The effects of two kinds of windowing on averaging (there are many forms of windowing; these are just two examples). In a) a simple boxcar type of windowing was applied to the data; in this case the windows are all of equal width, producing a classical-looking histogram. In b) the windows are tapered, like a Gaussian, making the data points closer to the point of estimation more important (of greater weight) than points farther away. As an example, compare the shaded areas, which enclose the data and their weights used to estimate the point being estimated.
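The two windowing schemes in Fig 5.6 amount to different weight functions of distance. A short sketch in Python (function and variable names are ours, for illustration):

```python
import numpy as np

def windowed_mean(dist, values, width, kind="boxcar"):
    """Weighted mean of values as a function of their distance from
    the estimation point, using a boxcar or a Gaussian window."""
    dist = np.asarray(dist, float)
    if kind == "boxcar":
        w = (dist <= width).astype(float)   # equal weight inside, zero outside
    else:                                   # Gaussian taper: nearer points count more
        w = np.exp(-0.5 * (dist / width) ** 2)
    return np.sum(w * np.asarray(values, float)) / np.sum(w)

dist = np.array([0.5, 1.0, 3.0])
vals = np.array([10.0, 12.0, 40.0])
box = windowed_mean(dist, vals, width=2.0, kind="boxcar")      # drops the far point
gauss = windowed_mean(dist, vals, width=2.0, kind="gaussian")  # down-weights it
```

The boxcar simply excludes the distant value, while the Gaussian keeps it but with a small weight, giving an estimate between the boxcar mean and the unweighted mean.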
In these cases it is possible to make your study variable a function of your coordinate system. You are, in fact, fitting a trend surface to your data in terms of the coordinates that you use to locate your samples. It can be a trend surface of any order and rank, meaning that the trend can be first order (a straight line, a flat plane) or second order (quadratic curve, surface, or hyper-surface). The order refers to the highest power any independent variable is raised to; the rank refers to the dimensionality. You can set up the equations and solve them with either normal equations or the design matrix; in certain advanced cases you may need to apply the non-linear fitting technique of Levenberg-Marquardt. Or, in most cases, you can use a handy little m-file Bill Jenkins wrote up called surfit.m. It uses the repetitious nature of higher and higher order polynomials and the SVD solution to the normal equations to fit surfaces to your data of the form:

\hat{Z}(x_1, x_2, \ldots) = \sum_{i,j,\ldots} b_{ij\ldots}\, x_1^{i} x_2^{j} \cdots    (5.12)

Most grid generation schemes work best when the data contain no trend; in order to accomplish this it is important to remove any trend surface from your data first. At the very least you should remove a first order, m-dimensional surface (m refers to the rank of your coordinate system) from your data before proceeding to run your grid generation routine. You can always add it back in to your grid estimation points, because you now have an analytical equation that relates your property to the coordinate system of your study area. Higher order, m-dimensional, surfaces can also be fitted. The higher order you go, the better your fit will be, regardless of what you use as a goodness-of-fit parameter. But keep in mind the better fit may not be statistically significant, and you can use ANOVA to test for this (see also Davis, 1986, pp 419-425).
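We don't reproduce surfit.m here, but the core of any trend-surface fit is just a least-squares polynomial in the coordinates. A sketch in Python (numpy's lstsq uses the SVD internally, in the same spirit; the function name and synthetic data are ours):

```python
import numpy as np

def fit_trend_surface(x, y, z, order=1):
    """Least-squares polynomial trend surface z ~ sum b_ij x^i y^j
    with i + j <= order, solved via the SVD-based lstsq."""
    cols = [x**i * y**j for i in range(order + 1)
                        for j in range(order + 1 - i)]
    G = np.column_stack(cols)               # design matrix
    b, *_ = np.linalg.lstsq(G, z, rcond=None)
    return b, G @ b                         # coefficients, fitted surface

# Synthetic data with a known planar (first order) trend: z = 2 + 3x - y
rng = np.random.default_rng(0)
x, y = rng.uniform(0, 10, 50), rng.uniform(0, 10, 50)
z = 2 + 3 * x - y
b, z_fit = fit_trend_surface(x, y, z, order=1)
residuals = z - z_fit       # detrended data to hand to the gridding routine
```

The residuals, not the raw data, are what you would pass on to the gridding or kriging step; the analytic trend is added back to the gridded estimates afterwards.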
5.3 Variograms
At the heart of kriging is the semivariogram, or structure function, of the regionalized variables that you are trying to estimate. This amounts to the a priori information that you must supply to the software in order to make a regular grid out of your irregularly spaced data. Basically the idea is to have an estimate of the distance one would need to travel before data points separated by that much distance are uncorrelated. This information is usually presented in the form of the variogram, in which the semivariance is a function of distance or lag (h).
(where the lag is a separation distance in your coordinate space, whatever its dimensionality). A regionalized variable is typically represented as Z and the grid point estimate of it as \hat{Z}. A regionalized variable, then, seems to have two contradictory characteristics: a local, random, erratic aspect, which calls to mind the notion of a random variable; and a general (or average) structured aspect, which requires a certain functional representation. Hence we are dealing with a naturally occurring property (variable) that has characteristics intermediate between a truly random variable and a completely deterministic variable. In addition, this variable (property) can have what is known as a drift associated with it. These drifts are generally handled with trend surface analysis, and can be analyzed for and subtracted out of the data much the same way an offset can be subtracted out of a data set.
5.3.2 Semivariance
First remember the definition of variance:

s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Z_i - \bar{Z})^2    (5.13)

In most cases the variance of a data set is a number (scalar). The semivariance is a curve (vector) derived from the data according to:

\gamma^*(h) = \frac{1}{2\,n(h)} \sum_{i=1}^{n(h)} \left[Z(x_i) - Z(x_i + h)\right]^2    (5.14)

where the asterisk indicates an experimental variogram computed from the data, n(h) is the number of data pairs separated by the lag h, and h is the lag distance between data point pairs. There also are theoretical semivariograms which model the structure of the underlying correlation between data points, such as the exponential model:

\gamma(h) = \gamma_0 + s\,(1 - e^{-h/a})    (5.15)

where
Range (a): The scalar that controls the degree of correlation between data points, usually represented as a distance. Sill (s): The value of the semivariance as the lag (h) goes to infinity; it is equal to the total variance of the data set. Given the two parameters range and sill and the appropriate model of semivariogram, the semivariances can be calculated for any h. These quantities can best be visualized in Fig 5.7, a simple exponential model of semivariance.
Figure 5.7: A simple exponential semivariogram with a range of 5 and a sill of 10.
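The experimental semivariogram of Eqn 5.14 is straightforward to compute by brute force. A Python sketch (the bin edges and the O(n²) pair loop are chosen for clarity, not efficiency):

```python
import numpy as np

def experimental_semivariogram(xy, z, bin_edges):
    """Experimental semivariogram: for each lag bin, half the mean
    squared difference over all data pairs whose separation falls
    in that bin (Eqn 5.14 with distance binning)."""
    n = len(z)
    d, sq = [], []
    for i in range(n):
        for j in range(i + 1, n):
            d.append(np.linalg.norm(np.asarray(xy[i]) - np.asarray(xy[j])))
            sq.append((z[i] - z[j]) ** 2)
    d, sq = np.array(d), np.array(sq)
    gamma = np.full(len(bin_edges) - 1, np.nan)
    for k in range(len(bin_edges) - 1):
        sel = (d >= bin_edges[k]) & (d < bin_edges[k + 1])
        if sel.any():
            gamma[k] = 0.5 * np.mean(sq[sel])
    return gamma

# Sanity check on a 1-D transect where z equals position:
xy = np.array([[0.0], [1.0], [2.0], [3.0]])
z = np.array([0.0, 1.0, 2.0, 3.0])
g = experimental_semivariogram(xy, z, [0.5, 1.5, 2.5])
```

For this perfectly linear transect, the semivariance grows as h²/2 with lag, which is what a trend (a failure of stationarity) looks like in a variogram.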
The constant offset (\gamma_0) added to the theoretical semivariance models is known as the nugget effect. This constant accounts for the influence of high-concentration centers in the data that prevent the experimental semivariogram from passing through the origin. This model has its beginnings with mining geologists who were looking for nuggets of gold, which were rarely sampled directly; hence the unresolved, or sub-sampling-grid-scale, variability. There are several models of semivariance to pick from; the trick is to pick the one that best fits your data. We will mention, later on in our discussions of kriging and cokriging, that if you are estimating the semivariogram experimentally (i.e. from actual data), often the linear model seems to give the best results. But there seems to be quite a bit of debate over what is the universal model. You have already seen the exponential model; there are also the:
spherical model - which rises to the sill value more quickly than the exponential model; the general equation for it looks like:

\gamma(h) = s\left[\frac{3h}{2a} - \frac{h^3}{2a^3}\right] \ \ (h \le a); \qquad \gamma(h) = s \ \ (h > a)    (5.16)

Gaussian model - a semivariogram model that displays parabolic behavior near the origin (unlike the previous models, which display linear behavior near the origin). The formula that describes a Gaussian model is:

\gamma(h) = s\left(1 - e^{-(h/a)^2}\right)    (5.17)

linear model - in this model the data do not support any evidence for a sill or a range, and rather appear to have increasing semivariance as the lag increases. This is a key sign that the proper choice is the linear model. In these cases the linear model is concerned with the slope and intercept of the experimental semivariogram. It is given simply as:

\gamma(h) = \gamma_0 + b\,h    (5.18)

and the slope (b) is nothing more than the ratio of the sill (s) to the range (a).
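The model semivariograms above can be written out as short Python functions of lag h, sill s, and range a (the nugget is folded in as an optional intercept for the linear model; these are standard textbook forms, so check them against whichever geostatistics package you actually use):

```python
import numpy as np

def exponential_model(h, sill, rng_):
    # rises toward the sill asymptotically, linear near the origin
    return sill * (1.0 - np.exp(-np.asarray(h, float) / rng_))

def spherical_model(h, sill, rng_):
    # reaches the sill exactly at h = range, then stays flat
    h = np.asarray(h, float)
    g = sill * (1.5 * h / rng_ - 0.5 * (h / rng_) ** 3)
    return np.where(h < rng_, g, sill)

def gaussian_model(h, sill, rng_):
    # parabolic near the origin (very smooth fields)
    return sill * (1.0 - np.exp(-(np.asarray(h, float) / rng_) ** 2))

def linear_model(h, slope, intercept=0.0):
    # no sill, no range: semivariance keeps growing with lag
    return intercept + slope * np.asarray(h, float)
```

Note the different behavior at the range: the spherical model reaches the sill there exactly, while the exponential and Gaussian models only approach it asymptotically.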
5.3.4 2nd Order Stationarity
Data fields are said to be first order stationary when there is no trend, i.e. the mean of the field is the same in all sub-regions. This is easily accomplished by fitting and removing a trend surface to/from the data (if you know what the trend is in the first place). Second order stationary data fields are realized when the variance is constant from one sub-region to the next. We say the data (actually, really, the residuals) are homoscedastic, that is to say, equally scattered about a mean of zero.
Figure 5.8: Two semivariograms showing the presence of anisotropies in the data. In this case the range and sill for the east-west direction are 5 and 8, but in the north-south direction they are 3 and 8.
\bar{\gamma}(h) = \frac{\left[\frac{1}{n(h)} \sum_{i=1}^{n(h)} \left|Z(x_i) - Z(x_i + h)\right|^{1/2}\right]^4}{2\left(0.457 + 0.494/n(h)\right)}    (5.19)

While somewhat overwhelming looking, upon inspection we see that this is just Eqn 5.14 modified. By taking the absolute value of the difference between two data points separated by a distance h, then taking its square root, averaging over the number of data pairs separated by that distance, and then raising the result to the fourth power, we diminish the effects of these outliers. The denominator is nothing more than a normalization to make gamma unbiased. This form of the experimental semivariogram is very useful in cases where we have a lot of data to estimate the semivariogram from and outliers can become an irksome problem, although this equation also works at lower data densities.
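The robust estimator is equally short in code. This Python sketch implements the fourth-power form for a single lag bin, alongside the classical form for comparison (function names are ours):

```python
import numpy as np

def robust_semivariance(diffs):
    """Robust (outlier-resistant) semivariance for one lag, from the
    raw differences z(x_i) - z(x_i + h) over all n(h) pairs at that
    lag: square root first, then average, then fourth power, with
    the denominator correcting the bias."""
    diffs = np.asarray(diffs, float)
    n = len(diffs)
    return np.mean(np.sqrt(np.abs(diffs))) ** 4 / (2.0 * (0.457 + 0.494 / n))

def classical_semivariance(diffs):
    """The classical estimator applied to the same pair differences."""
    diffs = np.asarray(diffs, float)
    return 0.5 * np.mean(diffs**2)

pairs = np.array([1.0, 1.0, 1.0, 10.0])      # one outlying pair
robust = robust_semivariance(pairs)
classic = classical_semivariance(pairs)      # dragged upward by the outlier
```

Because the square root is taken before averaging, the single large difference contributes far less leverage to the robust estimate than its square does to the classical one.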
5.4 Kriging
Kriging is a method devised by geostatisticians to provide the best local estimate of the mean value of a regionalized variable. Typically this was an ore grade, and the method was motivated by the desire to extract the most value from an ore deposit with the minimum amount of capital investment. The technique and theory of geostatistics has grown since those early days into a field dedicated to finding the best linear unbiased estimator (BLUE) of the unknown characteristic being studied.
That is to say, you want to know the difference between what you estimated (\hat{Z}_P) and what it really is (Z_P), a quantity we usually don't know. There is a way to do this by requiring that the weights sum to one; this will result in an unbiased estimate if there is no trend. You can then calculate the error variance as:

s_\epsilon^2 = E\left[(Z_P - \hat{Z}_P)^2\right]    (5.22)

It seems only logical that the closer a data point is to the grid point you wish to estimate, the more weight it should carry. The weights used (w_i) and the error of estimate are related through the semivariogram. So, if we had three data points from which to estimate one grid point (as in Fig 5.9), we would have:

\hat{Z}_P = w_1 Z_1 + w_2 Z_2 + w_3 Z_3    (5.23)

for the estimate and:

w_1 + w_2 + w_3 = 1    (5.24)

for the weights. The question that remains is: how do we find the best set of w's? Consider Fig 5.9: here we have three data (control) points, and from them we wish to make a best linear unbiased estimate of the Z-field at grid point P. Using the semivariogram we can create the following sets of equations:
w_1\,\gamma(h_{11}) + w_2\,\gamma(h_{12}) + w_3\,\gamma(h_{13}) = \gamma(h_{1P})    (5.25)
w_1\,\gamma(h_{21}) + w_2\,\gamma(h_{22}) + w_3\,\gamma(h_{23}) = \gamma(h_{2P})
w_1\,\gamma(h_{31}) + w_2\,\gamma(h_{32}) + w_3\,\gamma(h_{33}) = \gamma(h_{3P})    (5.26)

where \gamma(h_{ij}) is the semivariance over the distance between control points i and j, and \gamma(h_{iP}) is the semivariance over the distance between control point i and the grid point P. With Eqn 5.24 we have three unknowns and four equations (remember Eqn 5.24), and to force Eqn 5.24 to always be true we add a slack variable \lambda, resulting in a matrix set of equations like:

\begin{bmatrix} \gamma(h_{11}) & \gamma(h_{12}) & \gamma(h_{13}) & 1 \\ \gamma(h_{21}) & \gamma(h_{22}) & \gamma(h_{23}) & 1 \\ \gamma(h_{31}) & \gamma(h_{32}) & \gamma(h_{33}) & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ w_3 \\ \lambda \end{bmatrix} = \begin{bmatrix} \gamma(h_{1P}) \\ \gamma(h_{2P}) \\ \gamma(h_{3P}) \\ 1 \end{bmatrix}    (5.27)
Figure 5.9: Showing the layout of three control points and the grid point to be estimated in Example 5.1. The distances (in km) between control points (dashed lines) used to calculate the left-hand side of Eqn 5.29 (h_{12} = 3.4, h_{13} = 2.9, h_{23} = 4.8) and the distances from control points to \hat{Z}_P used to calculate the right-hand side (h_{1P} = 1.0, h_{2P} = 3.3, h_{3P} = 2.0) are given.
Solving this system gives the weights, and

s_\epsilon^2 = \sum_{i=1}^{3} w_i\,\gamma(h_{iP}) + \lambda    (5.28)

yields the error of estimate. Now we want to point out that something really cool is happening here. If you stop and think about it, you may wonder why the weights should apply to both the data points and the semivariances. We shouldn't have any problem considering Eqn 5.23; after all, it's just the best linearly weighted combination of the surrounding data points. But what about Eqn 5.25? Why should these also be true? Well, strictly they aren't, not until you add the slack variable (\lambda) that allows Eqn 5.24 to always be true. What insight does this give you into the nature of regionalized variables? We'll let you ponder that for a while. Now it sometimes happens that you don't want to, or can't, remove the trend surface prior to kriging. It is still possible to come up with a best linear unbiased estimate of your grid points using Universal Kriging; the matrix you form is even more complicated than the one in Eqn 5.27 and is covered in Davis, Chapter 5.
Modeling Methods for Marine Science

Table 5.1: Example 5.1

Point       x_1 Coordinate (km)   x_2 Coordinate (km)
Z_1         3.0                   4.0
Z_2         6.3                   3.4
Z_3         2.0                   1.3
\hat{Z}_P   3.0                   3.0
5.4.3 An example
This example is taken from Davis, Chapter 5, and addresses only the concept of punctual kriging. Suppose you wanted to dig a well and wanted a good estimate of the elevation of the water table before you began digging. Suppose further that you had three wells already dug, distributed about your proposed site much in the same fashion as the control points in Fig 5.9. Given the data in Table 5.1, you can use punctual kriging to make a best linear unbiased estimate of the water table elevation at your proposed site. From this information and a structure analysis (semivariogram) you can fill out the equations in Eqn 5.27 and then solve for the water table elevation at your site. The semivariogram analysis revealed a linear semivariogram out to 20 km with an intercept of zero and a slope of 4 m²/km. So the matrices in Eqn 5.27 look like:
\begin{bmatrix} 0 & 13.4 & 11.5 & 1 \\ 13.4 & 0 & 19.1 & 1 \\ 11.5 & 19.1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ w_3 \\ \lambda \end{bmatrix} = \begin{bmatrix} 4.0 \\ 13.3 \\ 7.9 \\ 1 \end{bmatrix}    (5.29)
The numbers on the left-hand side of the equation come from the semivariances between control points, calculated by knowing the distance between them and the linear semivariogram model. The numbers on the right-hand side of the equation come from knowing the distance between the proposed site and each control point and the linear semivariogram model. If the condition number of the matrix on the left-hand side isn't too bad, you can invert directly to solve for the w's and \lambda; otherwise you can use SVD to solve for the answer. Either way, in this case you get a column vector of:
\begin{bmatrix} w_1 \\ w_2 \\ w_3 \\ \lambda \end{bmatrix} = \begin{bmatrix} 0.60 \\ 0.09 \\ 0.31 \\ -0.73 \end{bmatrix}    (5.30)
which, when multiplied through Eqn 5.23, gives an estimate of 125.3 m, and using Eqn 5.28 yields an error estimate of 5.28 m². The square root of this number represents one standard deviation (2.30 m), which represents the bounds of 68% confidence. So plus or minus two times this standard deviation yields the elevation of the water table at your proposed site with 95% confidence. MATLAB's answers are a little different from the ones in Davis (1986), but we attribute that to the fact that Davis does not use singular value decomposition to invert his matrices (see Davis, 1986, Chap 3).
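The whole worked example fits in a few lines of Python. One caveat: the well elevations (120, 103, and 142 m) do not appear in Table 5.1 as reproduced above; they are our reconstruction, chosen because they are consistent with the 125.3 m estimate quoted from Davis:

```python
import numpy as np

# Control-point coordinates (km) from Table 5.1 and the proposed site
pts = np.array([[3.0, 4.0], [6.3, 3.4], [2.0, 1.3]])
site = np.array([3.0, 3.0])
# Water-table elevations (m) at the wells; reconstructed, see lead-in
elev = np.array([120.0, 103.0, 142.0])

gamma = lambda h: 4.0 * h        # linear semivariogram, slope 4 m^2/km

# Left-hand side: semivariances between control points, bordered by
# the ones (and zero corner) that enforce sum(w) = 1 via lambda
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
A = np.ones((4, 4))
A[:3, :3] = gamma(d)
A[3, 3] = 0.0
# Right-hand side: semivariances from each control point to the site
b = np.append(gamma(np.linalg.norm(pts - site, axis=1)), 1.0)

w_lam = np.linalg.solve(A, b)    # [w1, w2, w3, lambda]
w, lam = w_lam[:3], w_lam[3]
estimate = w @ elev              # water-table estimate at the site (m)
error_var = w @ b[:3] + lam      # estimation variance (m^2), Eqn 5.28
```

Solving directly as above (or via SVD, as the text suggests for poorly conditioned matrices) recovers weights near (0.60, 0.09, 0.31), an estimate near 125.3 m, and an error variance near 5.28 m².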
(5.31)

The cross-semivariance is given by:

\gamma^*_{uv}(h) = \frac{1}{2\,n(h)} \sum_{i=1}^{n(h)} \left[Z_u(x_i) - Z_u(x_i + h)\right]\left[Z_v(x_i) - Z_v(x_i + h)\right]    (5.32)

where n(h) refers to the number of data pairs that are separated by the same distance h, as in the definition of the semivariogram. One interesting thing about the cross-semivariance is that it can take on negative values. The semivariance must, by definition, always be positive; the cross-semivariance can be negative because the value of one property may be increasing while the other in the pair is decreasing.
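The sign behavior of the cross-semivariance is easy to demonstrate. A tiny Python sketch for a single lag (the function name is ours), taking the paired differences of two properties u and v over the same point pairs:

```python
import numpy as np

def cross_semivariance(du, dv):
    """Experimental cross-semivariance for one lag, from the paired
    differences of properties u and v over the same n(h) point pairs.
    Unlike the ordinary semivariance, this can be negative when one
    property increases where the other decreases."""
    du, dv = np.asarray(du, float), np.asarray(dv, float)
    return np.mean(du * dv) / 2.0

# u increases exactly where v decreases -> negative cross-semivariance
neg = cross_semivariance([1.0, 2.0], [-1.0, -2.0])
# setting v = u recovers the ordinary (always positive) semivariance
pos = cross_semivariance([1.0, 2.0], [1.0, 2.0])
```

With v = u the expression reduces to Eqn 5.14, which is why the ordinary semivariance is a special case of the cross-semivariance.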
about the use of a linear model. As stated in the help information of cokri.m, the ranges in a linear model are arbitrary, so that when they are divided into the also-arbitrary sill values they produce the slope of the linear semivariogram model being used in that direction. c is the variable containing the sills of the coregionalization model in use, one sub-matrix (properties by properties) per variogram model being applied to the problem. For example, one might wish to combine the effects of a nugget model with a linear model for three properties in 3 dimensions; then model is a 2 by 4 matrix, and c is a 6 by 3 matrix of numbers. A nugget model is indicated when the intercept of a semivariogram model is not zero, and that intercept value is put in the first sub-matrix of c to correspond to the first row of the model variable. itype is a scalar variable indicating the type of cokriging to be done. In five different values just about everything is covered, from simple cokriging to universal cokriging with a trend surface of order 2. In general, simple cokriging should be used when the mean of the data is known and the data field is globally stationary in its mean as well as locally stationary in its variance. avg: Marcotte, in his paper (Marcotte, 1991), states that this variable is not used, but later in one of his examples he uses it. We cannot get the program to run unless we provide a row vector of the averages of the individual properties being cokriged when doing simple cokriging. block is a vector (one entry per spatial dimension) of the size of the blocks to be cokriged. If we were certain of the volume of our individual samples, we could use something other than point kriging; i.e. any positive values will work in that case. nd is a vector of the discretization grid for cokriging; if using point cokriging, make them all ones. ival is a scalar describing whether or not cross-validation should be done, and how.
We find it easier and quicker to run the program with ival set to zero for no cross-validation. nk is a scalar indicating the number of nearest neighbors of the input matrix to use in estimating the cokriged grid point. This is a difficult parameter to give hard and fast rules for deciding how large to make. You may wish cokri.m to use all of the data points and set this scalar to a very large number; on the other hand, you may wish for only local effects to factor into the weighted estimates for the grid point. If you don't get satisfactory results the first time around, increase or decrease this number. rad is a scalar that describes the radius of search for the nk nearest neighbors; clearly the two are interrelated, and one helps constrain the other. Additionally, it is clear here that the coordinates all need to be in the same units; if not, standardization helps.
ntok is a scalar describing how many groups of grid points will be cokriged as one. When ntok is greater than one, the points inside the search radius will be found from the centroid location. Output:
x0s is, of course, your answer. It is a matrix of the grid point estimates: the first columns correspond to the grid point coordinates, and the remaining columns correspond to the estimates of the properties at those grid point coordinates. s is a matrix of the error estimates of the grid points. This is the big benefit to kriging, in that it provides you with not only an estimate of a property's value at a grid point, but also an estimate of the uncertainty in that estimate. sv is a row vector of the variances of points in the universe. id is a matrix of the identifiers of the weights (or, in Davis, W's) for the last cokriging system solved (i.e. the last grid point system of equations). l is a matrix with the weights and Lagrange multipliers of the last cokriging system solved; the number of Lagrange multipliers refers to the number of constraints applied to the cokriging system. A word of caution: for some reason, Marcotte has set up cokri.m to turn off case sensitivity. When the program is finished running, variables Axb and axb are considered the same, and making reference to a variable such as Axb will generate a variable-or-function-not-found error. Simply issue the command casesen and case sensitivity will be restored. We have modified the code we provide to you by simply commenting out the casesen off command with %casesen off, so you needn't worry about this at first (but it is available; just remove the % sign).
(= data minus trend). Just because you calculated the semivariances in different directions doesn't mean you haven't used all of the data points; since you've used all of the data, and the sill represents the total variance contained in the data (anomalies), the sills will also be equal regardless of direction. 3. This last one may seem a little odd. The ranges should all be the same in a given direction, regardless of the property or cross-property. Think of it this way: the decorrelation scale length is always the same in a given direction; the medium (seawater, granitic batholith, etc.) doesn't change even though the property might. Now, of course, we've told you how difficult it is to render your data second order stationary, and the above insights might not be strictly, numerically true. Your options are to return to your trend surface analysis and see if you can't find a better filter to remove the large-scale trend that is contaminating your anomalies. Or, if the fitted parameters are close in value (remember that nlleasqr.m gives you error estimates of these parameters), averaging them can still yield useful results. Remember, the sill and nugget are averaged over directions, but the ranges are averaged over properties.
5.6 Problems
All of your problem sets are served from the web page:
http://eos.whoi.edu/12.747/problem_sets.html
which can be reached via a number of links from the main course web page. In addition, the date the problem set comes out, the date it is due, and the date the answers will be posted are also available in a number of locations (including the one above) on the course web page.
References
Clark, I., 1979, Practical Geostatistics, Elsevier, New York, 129 p.

Cressie, N.A., 1993, Statistics for Spatial Data, Wiley-Interscience, New York, 900 p.

Davis, J.C., 1986, Statistics and Data Analysis in Geology, 2nd Edition, John Wiley and Sons, New York, 646 pp.

deBoor, C., 1978, A Practical Guide to Splines, Springer-Verlag, New York, 392 p.

Marcotte, D., 1991, Cokriging with MATLAB, Comp. and Geosci., 17(9): 1265-1280.