R. V. S. Wright
Prehistoric and Historical Archaeology, University of Sydney, NSW 2006, Australia
Kernel density estimates, which at their simplest can be viewed as a smoothed form of histogram, have been widely
studied in the statistical literature in recent years but used hardly at all within archaeology. They provide an effective
method of data presentation for univariate and particularly bivariate data and this is illustrated with a range of
examples. The methodology can be used as an informal approach to spatial cluster analysis, and one example suggests
that it is competitive with other approaches in this area. A reason for the lack of use of kernel density estimates by
archaeologists may be the lack of accessible software. The analyses described here were undertaken in the MATLAB
package using routines developed by the second author, and are available on request. © 1997 Academic Press Limited
appearance may be crucially affected both by the point at which the histogram is started (the origin) and the width of the intervals used, or "bin-width". Good computer software packages will make automatic and sensible choices for the origin and bin-width, but it should be possible to vary these and this will affect the results obtained.

Let the origin of the histogram be $m_0$, with subsequent interval boundaries at $m_1$, $m_2$, etc. and assume that $(m_j - m_{j-1}) = c$ for some constant $c$ for $j = 1, 2, \ldots$ (i.e. intervals are of equal width). Let $\delta$ and $q$ be values such that $\delta$ is small and $q\delta = c$. It is then possible to imagine the construction of successive histograms with origins at $(m_0 + i\delta)$ for $i = 0, 1, \ldots, q-1$. If the $q$ histograms so obtained are averaged then an average shifted histogram (ASH) (Scott, 1992) is obtained. The appearance of the ASH will not be dependent on the choice of $m_0$. Its smoothness will depend on $c$, and increases as $c$ increases. The limiting form of the ASH, as $\delta \to 0$, is a kernel density estimate. An example is given in Baxter & Beardah (1995b).

Another way to think of KDEs is as follows. Given $n$ points $X_1, X_2, \ldots, X_n$ situated on a line, a KDE can be obtained by placing a "bump" at each point and then summing the height of each bump at each point on the X-axis. The shape of the bump is defined by a mathematical function, the kernel $K(x)$, that integrates to 1. The spread of the bump is determined by a window- or band-width, $h$, that is analogous to the bin-width, $c$, of a histogram. The kernel is usually a symmetric probability density function.

The shape of the resulting KDE does not depend on a choice of origin and is relatively insensitive to the exact form of $K(x)$, which is taken to be a normal density function in the rest of the paper. The choice of $h$ is more critical and will be considered shortly.

We have presented two simple ways of conceptualising what a KDE is. Mathematically, the latter approach gives the KDE as

$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right),$$

where $\hat{f}(x)$ is an estimate of the density underlying the data.

Large values of $h$ over-smooth, while small values under-smooth the data. A variety of approaches can be used to select $h$, including subjective choice, and it may often be sensible to look at KDEs for several values of $h$.

More objective or data-driven choices of $h$ can be made, and a wide range of methods have been proposed for this. These are described in detail in Wand & Jones (1995) and in summary form in Baxter & Beardah (1995b). An outline of a subset of these methods is given here.

The data can be thought of as a sample of $n$ from an underlying and unknown true density, $f(x)$. It is possible to define a measure of "closeness" between the KDE and the true density, leading to an estimate of $h$ that "maximizes" the closeness. If it is assumed that the true density is normal then it can be shown that an optimal choice of $h$ is

$$h = 1.06\, n^{-1/5}\, \hat{\sigma},$$

where $\hat{\sigma}$ is an estimate (possibly robust) of $\sigma$, the S.D. of the normal distribution. This is the normal scale rule and will typically over-smooth the data if the underlying density is not normal.

The estimate of $h$ depends, in general, on properties of the true density that are unknown, and in particular on a quantity that may be interpreted as the "roughness" of the density. A family of direct plug-in (DPI) estimates can be defined in which an estimate of $h$ can be obtained by "plugging-in" an estimate of roughness into the equation that defines $h$. More details are given in the Appendix.

A related approach is the "solve the equation" (STE) method, in which an equation that relates $h$ to a function of the unknown density is defined. In essence, an initial estimate of $h$ leads to an estimate of the density, that in turn leads to a new value for $h$ and a new density estimate. The process continues until the estimate of $h$ converges. Wand & Jones (1995: 96) suggest that a suitable data analytic strategy is to look at several different estimates of $h$, but that if a single value is required DPI and STE estimates appear to be among the more suitable.

The prime purpose of the paper is to illustrate the use of bivariate KDEs and the generalization to these is relatively straightforward. By analogy with the previous discussion of univariate KDEs we may think in terms of $n$ points in a plane defined by co-ordinates $X(i) = (X_i, Y_i)$, for $i = 1, 2, \ldots, n$. Locating a "bump" at each point corresponds in this case to centering a three-dimensional bump or "hill" at each point and then, at each point in the plane, summing the height of the bumps. The bump, or kernel, is taken in this paper to be a bivariate normal distribution.

For two variables, X and Y, a bivariate normal distribution is defined by the means of X and Y, taken to be zero; their S.D.; and their correlation, which determines the orientation of the bump. If this correlation is taken to be zero, as we do here, then smoothing will be in the direction of the coordinate axes and the degree of smoothing is determined by the S.D. One will often not lose much by taking the correlation to be zero, whereas smoothing equally in both directions, by using the same window-widths, is not generally to be recommended (Wand & Jones, 1995: 108).

The theory underlying the optimal choice of window-widths is not as well developed for the bivariate as for the univariate case. The examples in this paper use window-widths for the X and Y directions determined as for the univariate case, using either STE estimates or the normal scale estimates.
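The univariate and bivariate estimates described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' MATLAB routines: the function names are hypothetical, the data are synthetic, the sample S.D. stands in for the (possibly robust) estimate of the S.D., and only the normal scale rule is implemented (not the STE or DPI estimates).

```python
import numpy as np

def normal_scale_bandwidth(x):
    """Normal scale rule: h = 1.06 * n**(-1/5) * sigma_hat."""
    x = np.asarray(x, dtype=float)
    sigma_hat = x.std(ddof=1)  # assumption: plain sample S.D., not a robust estimate
    return 1.06 * len(x) ** (-1 / 5) * sigma_hat

def gaussian(u):
    """Standard normal density, used as the kernel K."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

def kde_1d(grid, x, h):
    """Univariate KDE: sum of normal bumps of width h centred at the data."""
    grid = np.asarray(grid, dtype=float)
    x = np.asarray(x, dtype=float)
    u = (grid[:, None] - x[None, :]) / h
    return gaussian(u).sum(axis=1) / (len(x) * h)

def kde_2d(gx, gy, x, y, h1, h2):
    """Bivariate KDE with an uncorrelated normal kernel (a product of two
    univariate normal bumps), evaluated on the grid gx by gy."""
    kx = gaussian((gx[:, None] - x[None, :]) / h1)  # shape (len(gx), n)
    ky = gaussian((gy[:, None] - y[None, :]) / h2)  # shape (len(gy), n)
    return kx @ ky.T / (len(x) * h1 * h2)           # shape (len(gx), len(gy))

# Synthetic two-group data, used only to exercise the functions.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 150), rng.normal(3, 0.5, 100)])
y = rng.normal(0, 2, x.size)

# Separate window-widths for the X and Y directions, each chosen as in
# the univariate case.
h1, h2 = normal_scale_bandwidth(x), normal_scale_bandwidth(y)
gx = np.linspace(x.min() - 3 * h1, x.max() + 3 * h1, 200)
gy = np.linspace(y.min() - 3 * h2, y.max() + 3 * h2, 200)
density = kde_2d(gx, gy, x, y, h1, h2)
```

Using separate values `h1` and `h2` follows the recommendation quoted above against smoothing equally in both directions; the product form of the kernel corresponds to taking the correlation to be zero.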
Kernel Density Estimates 349
where $h_1$ and $h_2$ are the window-widths in the X and Y directions.

An attraction of using KDEs is that they can be used as a basis for producing contour plots of the data and this leads to graphical representations of data of a kind that archaeologists should find familiar. The following discussion of how contouring can be used is based on the paper by Bowman & Foster (1993).

After a bivariate KDE has been obtained each (two-dimensional) data point is associated with a density height that may be ranked from largest to smallest. The first 50% ranked observations, for example, may be used to define contours that enclose the densest 50% of the data. The level of contouring can be varied to contain any specified proportion of the data, and several contours can be superimposed on a plot, with the original data if this is helpful. Bowman & Foster (1993: 173) note that in some ways this provides a two-dimensional analogy to the one-dimensional boxplot, and also that the approach is useful for looking for modes or clusters in the data.

A further extension, noted in the same paper, occurs when the data points can be classified, by period or context for example. In this case a particular contour level such as 75% might be selected and then contours at this level drawn for each group separately, to reveal how similar or distinct they are. This will also be illustrated in the next section.

Examples

There are many ways in which univariate KDEs might be used in archaeology, and several of these have been illustrated in our previous work. Data presentation for a single data set and comparison between the distributions of different data sets are obvious uses. It is worth remarking that the boxplot, another good way of looking at and comparing univariate data, does not work well with multi-modal data. Bounded data, in the sense that certain values are impossible, and data affected by outliers can be handled using boundary kernels and adaptive estimates respectively, and this is discussed and illustrated in Beardah & Baxter (1995).

For practical purposes a distinction may be drawn between kernel density estimation as applied to simple, or simply transformed, variables, and as applied to composite variables such as those derived in principal component and other forms of multivariate analysis. This latter greatly extends the potential for the use of KDEs and is illustrated in Examples 1, 3 and 4.

Example 1

Principal component analysis is one of the more commonly used multivariate methods in archaeology and a detailed account and bibliography is given in Baxter (1994). Typically, data are standardized and an analysis results in new, linear combinations of the original variables, called principal components, that can be inspected for structure using plots (usually) based on the first two or three components. If there is structure in the data it will often show in the first component and it can be useful to examine this using a KDE.

The data used for the first example are 105 specimens of Roman waste glass, with a principal component analysis based on their chemical composition with respect to 11 oxides. The data are given, and extensively analysed, in Baxter (1994). The specimens come from two sites and the statistical analyses suggest that there are perhaps three clusters in the data that are related to, but do not exactly coincide with, the site classification.

As a first illustration of kernel density estimation Figure 1 shows two KDEs for the principal component scores, based on the normal scale estimate of h and an STE estimate of h. The normal scale estimate over-smooths the data, as expected, and misses the central and smaller mode suggested by the STE approach.

Figure 1. Two univariate kernel density estimates for scores on the first principal component of an analysis of the chemical composition of 105 specimens of Romano-British waste glass. Solid line: STE rule; dashed line: normal scale rule.

The usual bivariate component plot can be represented by a KDE in various ways. Figure 2 shows a scatter plot of the scores on the first two components and Figure 3 shows a KDE using the STE estimate of h. Three main concentrations are evident. For this example inspection of the scatterplot has led one of us (Baxter, 1994) to the same conclusion, so that a KDE is not essential. In Examples 3 and 4 much larger data sets are used for which the scatterplot is a less useful tool.
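The inclusion-level contouring attributed earlier to Bowman & Foster (1993), in which each data point is ranked by its density height, can be sketched as follows. This is an illustrative reading of that description rather than their published procedure: the function names, the synthetic data and the window-widths are all hypothetical, and a product-normal kernel is assumed.

```python
import numpy as np

def kde_at_points(px, py, x, y, h1, h2):
    """Height of a product-normal bivariate KDE at the points (px, py)."""
    ux = (px[:, None] - x[None, :]) / h1
    uy = (py[:, None] - y[None, :]) / h2
    bumps = np.exp(-0.5 * (ux ** 2 + uy ** 2)) / (2 * np.pi)
    return bumps.sum(axis=1) / (len(x) * h1 * h2)

def inclusion_levels(heights, proportions):
    """Density height at which to contour so that the densest proportion p
    of the data points lies on or above the contour, for each p given."""
    ranked = np.sort(heights)[::-1]  # largest density height first
    n = len(ranked)
    levels = {}
    for p in proportions:
        k = max(1, int(np.ceil(p * n)))  # number of points to enclose
        levels[p] = ranked[k - 1]        # height of the k-th densest point
    return levels

# Synthetic scatter, used only to exercise the functions.
rng = np.random.default_rng(1)
x = rng.normal(0, 1, 200)
y = rng.normal(0, 1, 200)
h1 = h2 = 0.4  # hypothetical window-widths chosen only for illustration

# Density height at each data point, then the heights for 25-100% inclusion.
heights = kde_at_points(x, y, x, y, h1, h2)
levels = inclusion_levels(heights, [0.25, 0.5, 0.75, 1.0])
```

The resulting heights could then be passed, in increasing order, as the contour levels of a plotting routine applied to the KDE evaluated on a grid. For grouped data the same calculation would be repeated within each group at a single proportion such as 0.75.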
350 M. J. Baxter et al.
Figure 2. Principal component plot for the first two components from an analysis of the chemical composition of 105 specimens of Romano-British waste glass.

Figure 3. A KDE estimate, based on an STE rule for the selection of h, for the data.

Figure 4. A KDE of the Mask Site data using the normal scale rule. The contours are for 25, 50, 75 and 100% inclusion levels.

Figure 5. As for Figure 4 but using an STE estimate.

Figure 6. A density plot of the Mask Site data using the STE estimate.

Figure 7. A scatterplot of the scores from a principal component analysis of the Irish body and craniofacial measurement data. Based on 7214 individuals, the purpose is to illustrate that such plots are of limited use for looking at large data sets.

How real is the structure suggested? In fact the location of hearths, activity areas and features such as rocks is known, and Blankholm provides a map of these that can be overlaid on his figures. There are five hearths and two of them are associated with concentrations detected in all analyses, those to the left of our figures. Two other hearths that are adjacent, and at the bottom left, are associated with the third main concentration. Only our STE analysis suggests two subdivisions of this group. The fifth hearth is associated with a less dense area of bone splinters in the upper right of the diagram and is suggested by our STE analysis and some of those reported by Blankholm.

From this discussion we conclude that, for this example at least, the KDE approach is competitive with other statistical approaches to spatial analysis in archaeology of the kind that seeks clustering in artefact scatters.

From the foregoing discussion it is obvious that contouring of artefact scatters can be undertaken without reference to kernel density estimation. The merits, or otherwise, of different approaches will be discussed in the concluding section. It will also be obvious that KDEs can be used as an informal means of cluster analysis for this kind of data, and in this sense compete with more formal methods such as k-means cluster analysis. It is known that k-means analysis has a tendency to produce spherical clusters, whether or not the real structure has this form. This difficulty is avoided by the clusters (or contours) suggested by KDEs. Determining the number of clusters is a problem for any clustering approach. With KDEs it should be informative to examine contours at different levels of inclusion as a means of looking for structure at different scales of spatial resolution.

Figure 6 shows the alternative representation, for the STE estimate, of the KDE as a three-dimensional diagram. There are four clear concentrations, or modes, with a much gentler "hillock", visible behind the front peaks, that is associated with the fifth hearth.

Example 3

This third example is based on anthropometric rather than archaeological data, but is ideal for showing how KDEs can be used to illuminate the message of large data sets. The data are discussed and analysed in Relethford & Crawford (1995) and consist of 17 body and craniofacial measurements from 7214 male adults in 31 birth counties in Ireland. The data were originally used to investigate the genetic distances between the populations defined by the counties.

It was of interest for one of us (RVSW) to investigate the performance of a principal component analysis, in order to see how the first two principal components relate to geography. Some strong correlations of this sort, but for different data, have been reported by Wright (1992). An obvious problem, in terms of the usual component plots presented from such an analysis, is that there are too many data points to plot the data sensibly in the usual way. Here we concentrate on what KDEs have to offer in terms of handling such a mass of data, without going into aspects of substantive interpretation, and note that any two-dimensional scatter of data can be handled in a similar way.

Figures 7–10 show four different representations of these data. Figure 7 is an attempted scatter plot that shows how hopeless it is to try and discern structure in the data in this way; Figure 8 is a three-dimensional plot along the lines of Figures 3 & 6; and Figure 9 is a contour plot along the lines of Figure 5. An STE estimate has been used. The interesting feature of these last two plots is that there are no interesting features; there is no evidence of any kind of grouping in the data. The final plot in Figure 10 shows separate 75% contours for three of the counties, and there is no evidence of any difference between them. Although the plot becomes very crowded this remains the case if all
counties are represented in a similar way on the same plot. The plots indicate that the correlation (if any) that exists between the principal components and geography must be a low one.

Figure 8. The data of Figure 7 represented as a density plot.

Figure 9. The data of Figure 7 represented as a contour plot showing 25, 50, 75 and 100% levels of inclusion.

Figure 10. The data of Figure 7 showing 75% contours for three of 31 counties. The contours largely overlap, and this is the case however the counties are selected.

Example 4

The final example is similar in kind to the previous example, and arises from a correspondence analysis originally undertaken by one of us. It possesses additional features of interest.

The data are the frequencies of eight different types of archaeological site, recorded for 4712 km² in the Australian state of Victoria. The resulting 4712 correspondence analysis "object" scores are plotted on Figure 11.

Figure 11. A scatterplot based on the first two components of a correspondence analysis of the Victorian sites data. Many of the points displayed represent multiple occurrences.

It is not possible to get a sensible looking plot here because the structure of the data is such that many of the points represent multiple occurrences. This overprinting happens because many of the kilometre squares have identical frequencies of sites, though the multiple occurrences are not evident from the plot. Solutions such as "jittering" exist, in which points are displaced by a small and random amount, which would tend to give a separate point for each site, but this then leads to problems similar to that evidenced in Figure 7.

An alternative approach is to use a KDE and contour it at some suitable level in order to see where the points "pile up", and this is done in Figure 12. Here 90, 95 and 100% contours are shown using an STE estimate, and are suggestive of, perhaps, nine groups.

Discussion

For simple data presentation and comparison of univariate data KDEs can be regarded as an alternative to the histogram. We think that there are aesthetic and
use to which these data sets have been put in the paper, and interpretations, rests with the present authors.

References
varying degrees of "refinement", leads to the family of direct plug-in (DPI) estimates. Details are given in Wand & Jones (1995).

Solve-the-equation (STE) estimates are closely related to DPI estimates. The formula for $h_{\mathrm{AMISE}}$ is the starting point, and $R(f'')$ is replaced by an estimate that depends on $h$ and can be determined for an initial choice of $h$. This leads to the estimate of $h_{\mathrm{AMISE}}$ that in turn is used to estimate a new $R(f'')$ and a new $h_{\mathrm{AMISE}}$. This process continues until $h$ converges.

These and other techniques have been implemented in the MATLAB package by one of the authors (CCB) and are freely available to anyone who wants them (email c.beardah@maths.ntu.ac.uk). The routines were developed because our interest in KDEs occurred at a time when nothing else was obviously and easily available to us. We believe that kernel density estimation is a valuable tool for data analysis that can be fruitfully deployed by archaeologists. We are also aware that software to implement the ideas involved, including our own, is not readily available and is expensive. It is likely that this situation will change, and that kernel density estimation will become available in accessible software packages. Our hope is that the present paper will encourage the use of such methodology when it becomes more readily available.
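As a worked illustration of how the roughness term enters the window-width formula, the following assumes the standard AMISE-optimal expression from the kernel smoothing literature (e.g. Wand & Jones, 1995); it is an assumption here, since the Appendix formula itself is not reproduced in this section. With a normal kernel and a normal true density with S.D. $\sigma$, the expression reduces to the normal scale rule quoted in the main text.

```latex
% Assumed standard form of the AMISE-optimal window-width, with
% R(g) = \int g(x)^2 \, dx  and  \mu_2(K) = \int x^2 K(x) \, dx.
\[
  h_{\mathrm{AMISE}} = \left[ \frac{R(K)}{\mu_2(K)^2 \, R(f'') \, n} \right]^{1/5}
\]
% Normal kernel:  R(K) = \frac{1}{2\sqrt{\pi}}, \quad \mu_2(K) = 1.
% Normal density with S.D. \sigma:  R(f'') = \frac{3}{8\sqrt{\pi}\,\sigma^5}.
\[
  h_{\mathrm{AMISE}}
  = \left[ \frac{1}{2\sqrt{\pi}} \cdot \frac{8\sqrt{\pi}\,\sigma^5}{3n} \right]^{1/5}
  = \left( \frac{4}{3} \right)^{1/5} \sigma \, n^{-1/5}
  \approx 1.06 \, \sigma \, n^{-1/5}
\]
```

The DPI and STE estimates differ from this special case only in that $R(f'')$ is estimated from the data rather than computed from an assumed normal density.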