
RAREFACTION CURVES

BACKGROUND

When one samples information about a population of people, machines, butterflies, stamps, trilobites, and nearly anything else, it is a natural tendency that as more objects are obtained, the number of distinct kinds of objects increases. Rarefaction is a sampling technique used to compensate for the effect of sample size on the number of groups observed in a sample and can be important in comparisons of the diversity of populations. Starting from a sample of units classified into groups, the rarefaction technique provides the expected number of groups still present when a specified proportion of the units are randomly discarded. In this way a large sample can be rarefied, or made smaller, to facilitate comparison with a smaller sample.

For example, suppose that I spend two months collecting specimens and find 102 distinct species represented among the 748 individuals collected. If you then spend a week collecting specimens at another location and find only 49 species among 113 specimens, can we conclude by comparing your 49 to my 102 species that your population was less diverse than mine? Of course not. We need to correct for sample size to do a proper comparison because if you had collected for a longer time, you would probably have obtained a larger number of samples and of species. Applying the techniques of rarefaction to the detailed data (the method needs to know how many individuals are in each species), the rarefied number of species in my sample might turn out to be 50.14 species for subsamples of size 113. When compared to your count of 49 species, we could then conclude that the population diversities are not very different.

A brief history of rarefaction begins with work by Sanders [12], who developed a technique to compare deep-sea diversity to shallow-water habitats. Problems of overestimation were noted by Hurlbert [8], Fager [3], and Simberloff [14], with an improved formulation given by Hurlbert and by Simberloff. A formula for the variance of the number of groups present in a rarefied sample was provided by Heck et al. [7], who also considered the determination of sufficient sample size for data collection. Sampling properties of the rarefaction measure were explored by Smith and Grassle [15]. Rarefaction has been applied extensively in the study of diversity throughout the fossil record by Raup [10,11] and by others. Some criticisms and suggestions relating to the application of rarefaction methods were made by Tipper [18]. Upper and lower bounds on rarefaction curves were developed by Siegel and German [13]. Because it is based on sampling from the data, rarefaction is related to the bootstrap method of Efron [1]; this connection is explored by Smith and van Belle [16].

Rarefaction is related to the ideas of diversity and evenness in populations. A recent overview of diversity measurement may be found in Patil and Taillie [9] with discussion by Good [4] and Sugihara [17]. Rarefaction may be considered as an interpolation process, compared to the more difficult problem of extrapolation as considered by Good and Toulmin [5] and by Efron and Thisted [2], in which the goal is to estimate the number of additional groups that would be observed if a larger sample could be obtained.

DEFINITION AND PROPERTIES OF RAREFACTION

Suppose that we have a situation in which $N$ items are classified into $K$ groups in such a way that each item is in exactly one group and each group contains at least one item. For example, the items might be individual specimens that have been collected and the groups might represent the various species present; for analysis at a higher taxonomic level, the items might be species grouped according to genus.

To describe the situation completely, let the number of items in group $i$ be denoted $N_i$.
Encyclopedia of Statistical Sciences, Copyright 2006 John Wiley & Sons, Inc.


The data may be described as follows:

$N$ = total number of items
$K$ = total number of groups
$N_i$ = number of items in group $i$ ($i = 1, \ldots, K$).

To facilitate computation, we will define $M_j$ to be the number of groups containing exactly $j$ units ($j \ge 1$):

$M_j$ = number of $N_i$ equal to $j$.

From these definitions, it follows that

$$\sum_{i=1}^{K} N_i = N, \qquad \sum_{j \ge 1} M_j = K, \qquad \sum_{j \ge 1} j M_j = N.$$

Now consider a rarefied sample, constructed by choosing a random subsample of $n$ from $N$ items without replacement. Some of the groups may be absent from this subsample. Let $X_n$ denote the (random) number of groups that still contain at least one item from the rarefied sample:

$X_n$ = number of groups still present in a subsample of $n$ items.

It must be true that $X_n \le K$ with strict inequality whenever at least one group is missing from the rarefied sample.

The rarefaction curve, $f(n)$, is defined as the expected number of groups in a rarefied sample of size $n$, and can be computed in several ways:

$$f(n) = E[X_n] = K - \binom{N}{n}^{-1} \sum_{i=1}^{K} \binom{N - N_i}{n} = K - \sum_{j \ge 1} M_j \binom{N - j}{n} \binom{N}{n}^{-1}.$$

It is always true that $0 \le f(n) \le K$, $f(0) = 0$, $f(1) = 1$, and $f(N) = K$. Moreover, $f$ is monotone increasing and concave downward.

Because these binomial coefficients can become large and overflow when computers are used, it is preferable to compute directly with the ratio of the two coefficients, which is always between 0 and 1. This ratio may easily be updated using a multiply and a divide to obtain successive terms:

$$\binom{N - (j + 1)}{n} \binom{N}{n}^{-1} = \frac{N - n - j}{N - j} \binom{N - j}{n} \binom{N}{n}^{-1}.$$

The rarefaction values $f(n)$ are often displayed as a continuous curve even though they are actually discrete values. Consider, for example, the rarefaction curve for $N = 748$ units (species) within $K = 102$ groups (families) of bivalves from Siegel and German [13], Fig. 1. (Data were collected by Gould and are described in ref. 6.)

Figure 1.

SAMPLING PROPERTIES

In many situations it is more realistic to suppose that the observed values of items and groups are not fixed but instead represent a sample from a multinomial population. The expected number of groups represented in a sample of $n$ items from this population can be used as a measure of the population diversity. Based on the observed data, the rarefaction curve value $f(n)$ can be used as an estimate of this population diversity measure. Within this context, Smith and Grassle [15] have proven that the rarefaction value is a minimum variance unbiased estimate (MVUE). They also provide an unbiased estimate of the variance of the estimate which takes into account the sampling variability of the process that generated the data.

REFERENCES

1. Efron, B. (1982). The Jackknife, the Bootstrap, and Other Resampling Plans. SIAM, Philadelphia.
2. Efron, B. and Thisted, R. (1976). Biometrika, 63, 435–448.
3. Fager, E. W. (1972). Amer. Naturalist, 106, 293–310.
4. Good, I. J. (1982). J. Amer. Statist. Ass., 77, 561–563.
5. Good, I. J. and Toulmin, G. H. (1956). Biometrika, 43, 45–63.
6. Gould, S. J., Raup, D. M., Sepkoski, J. J., Schopf, T. J. M., and Simberloff, D. S. (1977). Paleobiology, 3, 23–40.
7. Heck, K. L., Jr., van Belle, G., and Simberloff, D. S. (1975). Ecology, 56, 1459–1461.
8. Hurlbert, S. H. (1971). Ecology, 52, 577–586.
9. Patil, G. P. and Taillie, C. (1982). J. Amer. Statist. Ass., 77, 548–561, 565–567.
10. Raup, D. M. (1975). Paleobiology, 1, 333–342.
11. Raup, D. M. (1979). Science, 206, 217–218.
12. Sanders, H. L. (1968). Amer. Naturalist, 102, 243–282.
13. Siegel, A. F. and German, R. Z. (1982). Biometrics, 38, 235–241.
14. Simberloff, D. S. (1972). Amer. Naturalist, 106, 414–418.
15. Smith, W. and Grassle, J. F. (1977). Biometrics, 33, 283–292.
16. Smith, E. P. and van Belle, G. (1984). Biometrics, 40, 119–129.
17. Sugihara, G. (1982). J. Amer. Statist. Ass., 77, 564–565.
18. Tipper, J. C. (1979). Paleobiology, 5, 423–434.

See also BOOTSTRAP; DIVERSITY INDICES; ECOLOGICAL STATISTICS; and LOGARITHMIC SERIES DISTRIBUTION.

ANDREW F. SIEGEL
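
A computational note on the section DEFINITION AND PROPERTIES OF RAREFACTION: the Python sketch below shows one possible way to evaluate the rarefaction curve, f(n) = K minus the sum over j of M_j C(N - j, n) / C(N, n), where C denotes a binomial coefficient. It works with the ratio of the two binomial coefficients and updates that ratio with one multiply and one divide per step, as the article recommends, so the large coefficients are never formed. The function name `rarefaction_curve` and the example group sizes are hypothetical and chosen only for illustration; they are not data from the article.

```python
from collections import Counter


def rarefaction_curve(counts, n):
    """Expected number of groups present in a random subsample of n items.

    counts: non-empty list of group sizes N_i (each at least 1).
    n:      subsample size, with 0 <= n <= N where N = sum(counts).
    """
    N = sum(counts)
    K = len(counts)
    if not 0 <= n <= N:
        raise ValueError("n must lie between 0 and N")

    # M[j] = number of groups containing exactly j items.
    M = Counter(counts)

    missing = 0.0  # expected number of groups absent from the subsample
    ratio = 1.0    # C(N - j, n) / C(N, n), which equals 1 at j = 0

    for j in range(max(counts)):
        # Step the ratio from j to j + 1 with one multiply and one divide.
        # The numerator is clamped at 0 because C(N - j - 1, n) vanishes
        # once fewer than n items remain outside the group.
        ratio *= max(N - n - j, 0) / (N - j)
        missing += M[j + 1] * ratio

    return K - missing


# Hypothetical example: 10 items in 4 groups of sizes 5, 3, 1, and 1.
example_counts = [5, 3, 1, 1]
curve = [rarefaction_curve(example_counts, n)
         for n in range(sum(example_counts) + 1)]
```

The properties stated in the article, f(0) = 0, f(1) = 1, and f(N) = K, give a quick sanity check on any such implementation.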
