
Measurement of Faunal Similarity in Paleontology

Author(s): David M. Raup and Rex E. Crick


Source: Journal of Paleontology, Vol. 53, No. 5 (Sep., 1979), pp. 1213-1227
Published by: SEPM Society for Sedimentary Geology
Stable URL: http://www.jstor.org/stable/1304099 .
Accessed: 08/10/2013 15:57


SEPM Society for Sedimentary Geology is collaborating with JSTOR to digitize, preserve and extend access to
Journal of Paleontology.

http://www.jstor.org

This content downloaded from 165.123.34.86 on Tue, 8 Oct 2013 15:57:31 PM


All use subject to JSTOR Terms and Conditions
JOURNAL OF PALEONTOLOGY, V. 53, NO. 5, P. 1213-1227, 9 TEXT-FIGS., SEPTEMBER 1979

MEASUREMENT OF FAUNAL SIMILARITY IN PALEONTOLOGY

DAVID M. RAUP AND REX E. CRICK1


Department of Geology, Field Museum of Natural History, Chicago, Illinois 60605 and
Museum of Paleontology, University of Michigan, Ann Arbor, Michigan 48109

ABSTRACT-A probabilistic index of faunal similarity is proposed which compares the number of taxa
common to two faunas with the number that would be expected to be in common if the taxa were
distributed randomly. Departures of observed from expected numbers in common express the level
of similarity or dissimilarity. The frequency of taxa in the whole data set is used to adjust for the
differing probability of occurrence of taxa (cosmopolitan versus endemic). The new index can be used
to determine whether similarities or dissimilarities between faunas are statistically significant.
The index is tested with 1) modern biogeography of echinoids, 2) environmental distribution of
modern foraminifera in Santa Monica Bay, and 3) Ordovician biogeography of nautiloids. In each
case, the proposed index is more effective than traditional indexes of faunal similarity (Simpson,
Jaccard, and Dice coefficients) in addition to the advantage of making possible rigorous assessment
of statistical confidence. The index should also be useful in a biostratigraphic context. The computer
program used for calculating the index is available from the authors.

INTRODUCTION

A COMMON PROBLEM throughout paleoecology, biostratigraphy, and paleobiogeography is that of comparing faunal lists to evaluate their similarities and differences. If two lists have no taxa in common, it can be assumed that something was different. The possible causes vary from ecological differences (marine vs. fresh water; shallow vs. deep water, etc.) to temporal differences (complete evolutionary turnover) to biogeographic differences (provinciality, separation by geographic barriers, etc.). If the two lists are identical, on the other hand, ecological, temporal, and biogeographic unity is assumed (at some scale, at least). The problems arise when some but not all taxa are shared. Ordinarily, two rather different approaches are taken in this intermediate situation: 1) assessment of similarity on the basis of well informed intuition, and 2) computation of a numerical index of similarity.

The intuitive approach is not without value. The experienced practitioner recognizes that some taxa are more common than others and gives less weight to their joint occurrence in two or more assemblages. Many paleobiogeographers, for example, discount cosmopolitan forms. In biostratigraphy, the joint occurrence of long-ranging taxa is given much less weight than that of short-ranging taxa. Judgment is applied in evaluating the sizes of the assemblages being compared. For example, if two assemblages have 15 taxa each, the presence of 10 taxa in common clearly indicates a greater fundamental similarity than if the two assemblages had had 100 taxa each. Questions and ambiguities occur, however, and these have given rise to attempts to quantify the process of comparison.

Many quantitative measures of faunal similarity have been proposed and several are in common use by paleontologists. Perhaps the one most widely applied is the Simpson Coefficient (Simpson, 1943, 1947) which may be defined as follows:

S = 100k/B,

where: k = the number of taxa common to two assemblages A and B,
       B = the total taxa found in the smaller assemblage (B < A).

The Simpson Coefficient varies from zero to 100 and thus implies percentage similarity. Although it is an easily managed index, authors have pointed out (most recently, Henderson and Heron, 1977), that the Simpson and comparable indexes have important shortcomings. Text-figure 1 illustrates one of the prime difficulties. A series of Venn diagrams shows several cases which yield identical values of 'S' yet which are intuitively very different. In each case, the area of the largest circle

1 Present address: Dept. Geology, Univ. Texas, Arlington 76019.
Copyright © 1979, The Society of Economic Paleontologists and Mineralogists. 0022-3360/79/0053-1213$03.00


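[Text-figure 1: four Venn diagrams, cases 1 to 4. Values printed beside the diagrams: Simpson = 20 in all four cases; Jaccard = .11, .02, .11, .11; Dice = .20, .04, .20, .20. Formulas printed beneath the diagrams: SIMPSON = 100k/B; JACCARD = k/(A + B - k); DICE = 2k/(A + B).]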
TEXT-FIG. 1-Venn diagrams showing hypothetical cases wherein two faunal assemblages (A and B) are
drawn from a pool of taxa (N). The number of taxa (k) common to A and B is indicated by the overlap
of the two smaller circles. For each case, the Simpson, Jaccard, and Dice similarity measures have been
calculated.
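
As a minimal sketch of the three coefficients named in the caption of Text-figure 1, the formulas can be written out as below. The function names are ours, and the assemblage sizes in the usage lines are hypothetical values chosen only for illustration; they are not read from the figure.

```python
def simpson(k: int, a: int, b: int) -> float:
    """Simpson Coefficient: 100 * k / (taxa in the smaller assemblage)."""
    return 100.0 * k / min(a, b)


def jaccard(k: int, a: int, b: int) -> float:
    """Jaccard coefficient: k / (A + B - k)."""
    return k / (a + b - k)


def dice(k: int, a: int, b: int) -> float:
    """Dice coefficient: 2k / (A + B)."""
    return 2.0 * k / (a + b)


# Hypothetical sizes (an assumption, not taken from the figure):
# equal assemblages of 5 taxa sharing 1 taxon.
print(simpson(1, 5, 5), round(jaccard(1, 5, 5), 2), dice(1, 5, 5))  # 20.0 0.11 0.2

# Same k and same smaller assemblage, but a much larger A:
# Simpson is unchanged while Jaccard and Dice drop.
print(simpson(1, 40, 5), round(jaccard(1, 40, 5), 2), round(dice(1, 40, 5), 2))  # 20.0 0.02 0.04
```

The second call shows the behaviour the text criticizes: the Simpson value depends only on k and the smaller assemblage, so it cannot distinguish the two situations.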

indicates the total pool (N) of taxa which could occur in an assemblage and the two smaller circles (A and B) are assemblages drawn from this pool. The overlap zone (k) between the smaller circles represents the number of taxa found in both assemblages. In all cases, the Simpson Coefficient is 20 yet one's intuition suggests that the four cases do not indicate the same similarity in the sense of process (meaning ecological, temporal or geographic similarity).

In recognition of these and other difficulties, many authors have proposed alternate schemes. Two of the coefficients most commonly applied in paleontology are the Jaccard and Dice. Cheetham and Hazel (1969) have provided an excellent comparative review of these and about 20 other similarity coefficients. Other critical reviews of selected coefficients exist in the literature (see, for example, papers by Henderson and Heron, 1977, and Simberloff, 1978). Most of these authors have emphasized that it is important to have valid measures of faunal similarity because of the increasing use of multivariate statistical analysis of large masses of distributional (presence/absence) data. The multivariate analysis can be only as good as the matrix of similarity values that forms its input!

At the risk of oversimplification, one can


argue that most existing similarity coefficients suffer from two main problems. First, they have not been derived in a mathematically rigorous way: that is, they have been 'thought up' rather than built on sound mathematical principles. Their validity has all too often been tested by whether they seem to work in practice. Second, they have not been tied to clearly defined null hypotheses; as a result, statistically meaningful comparisons between values of a coefficient are impossible. It has been impossible to say whether two assemblages are similar (or dissimilar) at the 95% level of confidence, for example. The discussion by Simberloff (1978) includes a particularly good treatment of this point.

Henderson and Heron (1977) recognized and discussed many of the problems just described and made an attempt to produce a rigorous and statistically valid similarity measure. The present effort takes a slightly different tack in the hope of developing a yet more robust approach to the similarity question. Our approach is similar to that of Simberloff (1978), but our objectives and the resulting technique are substantially different.

THE APPROPRIATE NULL HYPOTHESIS

Suppose that taxa are sprinkled randomly in space and time and that species lists are made up from the taxa that happen, by chance, to fall in certain areas and in certain stratigraphic intervals. Most of the species lists will differ from one another just because of the vagaries of sampling but they will have an average similarity which is predictable from the numbers of taxa, areas, and stratigraphic intervals involved. As will be shown below, it is possible under the random sprinkling hypothesis to predict how many species should be expected to be shared ('k') and the expected variation in this number. The expected 'k' and its probable variation constitute the appropriate null hypothesis for assessing faunal similarity.

In the real world, the distribution of taxa in space and time is generally non-random. Trilobites are confined to a small portion of geologic history (the Paleozoic), echinoderms are confined to marine environments, and so on. When we use the temporal confinement of trilobites or the ecological confinement of echinoderms to make other interpretations, we are tacitly rejecting the null hypothesis of random sprinkling. In the trilobite and echinoderm examples the departure from chance expectations is so obvious that sophisticated statistical testing is unnecessary. But most cases of interest are more subtle and rigorous treatment is obligatory, which is to say that one cannot rely on intuition alone.

In actual cases, it makes no difference whether the null hypothesis of randomness is rejectable 10% or 90% of the time. We wish to use it only as a standard of comparison and as a means of assessing the probability that two assemblages had different ecologic, temporal, or geographic settings. When dealing with assemblages from radically different facies or from different continents, one would expect to be able to reject the null hypothesis most of the time. On the other hand, when dealing with assemblages from the same formation in a local area, one would expect not to reject the null hypothesis and to conclude that the compositional differences between assemblages are just the result of chance differences in sampling.

We propose to use a comparison between the observed number of taxa common to two assemblages (or faunas) and the probability distribution of the expected number of common taxa as a measure of the similarity of the two assemblages. Assemblages which are more similar than predicted by the null hypothesis will be interpreted as indicating a positive bias in the make-up of the assemblages. That is, ecologic, temporal, or geographic factors must have limited the taxa available for those assemblages. Conversely, assemblages less similar than predicted will be interpreted as indicating a negative bias.

Simberloff (1978) had a quite different objective. Working with modern species distributions in the Galapagos Islands, he was asking whether the total distribution represents a significant departure from the null hypothesis of random sprinkling. That is, he was asking whether the array of species lists is consistent with the proposition that differences in composition result solely from sampling error (in dispersal of species) and not from real biogeographic effects.

METHODS

Consider the Venn diagrams in Text-figure 1 and assume, as before, that the areas of the circles correspond to the numbers of species in


[Text-figure 2: four panels, cases 1 to 4, each plotting probability (ordinate) against k_exp (abscissa); the abscissa runs from 0 to 5 in cases 1 and 2 and from 0 to 20 in cases 3 and 4. The observed value k_obs is marked on each curve.]

TEXT-FIG. 2-Curves showing solutions to equation (4) for all possible values of 'k' in the four cases illustrated in Text-figure 1. The point marked k_obs is the number of taxa observed in common in Text-figure 1. Note that the relationships are not continuous functions: only integer values of 'k' are possible.

the total pool and two assemblages drawn from that pool. Assume further that all species in the pool have an equal chance of being chosen for each of the assemblages. This is a simplistic assumption because it is well known that species vary in their abundance so that some have a much higher probability of occurring in any given assemblage than others. But this is a convenient scenario with which to introduce a methodology and is the one used by Henderson and Heron (1977) and by Simberloff (1978).

The situation just presented can also be thought of in the classic context of an 'urn problem.' Assume that a large urn contains many balls, each numbered differently, and that this collection of balls constitutes the pool of species (N). Now draw 'A' balls at random from the urn to define assemblage A. Then replace them and draw 'B' balls to form assemblage B. The question is: how many of the same balls will be found in both A and B? This problem was solved by Henderson and Heron (1977) by a logical series of steps culminating in their equation (4) and in a slightly different form by Simberloff (1978), (Null Hypothesis I). The solution presented here is more straightforward and more flexible than either of the previous efforts.

The total number of different 'A' assemblages that can be drawn from the pool (N) is the number of combinations of N things taken A at a time, or

$${}_{N}C_{A} = \frac{N!}{(N - A)!\,A!} \qquad (1)$$
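
The urn experiment and the count in equation (1) can be tried out directly. The following is only an illustrative sketch: the function names and the pool and assemblage sizes are our assumptions, and Python's standard `random` and `math` modules stand in for whatever the authors used.

```python
import math
import random


def count_shared(n_pool: int, size_a: int, size_b: int, rng: random.Random) -> int:
    """One urn trial: draw A balls, replace them, draw B balls, count balls in both."""
    balls = range(n_pool)                     # each ball (species) is numbered differently
    a = set(rng.sample(balls, size_a))        # assemblage A, drawn without repeats
    b = set(rng.sample(balls, size_b))        # balls replaced, then assemblage B drawn
    return len(a & b)


def n_assemblages(n_pool: int, size: int) -> int:
    """Equation (1): combinations of N things taken `size` at a time."""
    return math.comb(n_pool, size)


rng = random.Random(0)                        # fixed seed so the trial is repeatable
print(count_shared(100, 5, 5, rng))           # one possible value of k for N=100, A=B=5
print(n_assemblages(100, 5))                  # 75287520
```

Repeating `count_shared` many times gives an empirical picture of how k varies under random sprinkling, which is the distribution the following derivation obtains exactly.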


TABLE 1-Probabilities calculated from equation (4) for the four cases shown in Text-figure 2: 'k_obs' is the number of species observed to be in common and 'k_exp' is the number expected to be in common on the assumption of random sprinkling of species.

                                            Cases:
                                            1       2       3       4
Probability that k_exp is less than k_obs:  .77     .07     .39     .005
Probability that k_exp equals k_obs:        .21     .26     .24     .012
Probability that k_exp exceeds k_obs:       .02     .67     .37     .983
Total:                                      1.00    1.00    1.00    1.000

Similarly, the number of possible 'B' assemblages is ${}_{N}C_{B}$. Therefore, the total number of different pairs of assemblages is the product:

$${}_{N}C_{A} \cdot {}_{N}C_{B}.$$

The probability that some particular number of species ('k') will occur in both assemblages may be expressed as follows:

$$\mathrm{Prob}[k \text{ species in common}] = \frac{\text{total ways of obtaining } k \text{ species in common}}{\text{total number of different assemblage pairs}} \qquad (2)$$

The denominator in this expression is the product developed above. The numerator can be constructed by inspection of the Venn diagrams in Text-figure 1, as follows: all species in the 'k' area must also belong to 'B' which, in turn, must belong to 'N.' Therefore, all possible compositions of the 'k' area can be expressed by:

$${}_{N}C_{B} \cdot {}_{B}C_{k}.$$

There remain only the species that belong to 'A' but not to 'k.' All possible compositions of this group can be expressed by:

$${}_{(N-B)}C_{(A-k)}.$$

The numerator of the probability expression is thus the product:

$${}_{N}C_{B} \cdot {}_{B}C_{k} \cdot {}_{(N-B)}C_{(A-k)}.$$

The terms for ${}_{N}C_{B}$ in numerator and denominator cancel and we are left with:

$$\mathrm{Prob}[k \text{ species in common}] = \frac{{}_{B}C_{k} \cdot {}_{(N-B)}C_{(A-k)}}{{}_{N}C_{A}} \qquad (3)$$

When this is evaluated using factorials as in equation (1), the result is:

$$\mathrm{Prob}[k \text{ species in common}] = \frac{A!\,B!\,(N - A)!\,(N - B)!}{N!\,k!\,(A - k)!\,[(N - B) - (A - k)]!\,(B - k)!} \qquad (4)$$

For any set of A, B, and N values, this equation can be solved for the series of 'k' values ranging from 0 to B so that a precise probability distribution can be developed for variation in the expected number of species in common. This is illustrated in Text-figure 2 for the four cases shown in Text-figure 1. The equation has also been applied to the cases treated by Henderson and Heron (1977, fig. 3) and the results are comparable though not identical to theirs.

In Text-figure 2, the values of 'k' observed in Text-figure 1 are indicated by 'k_obs' and the several theoretically possible values of 'k' by 'k_exp.' The ordinate is the probability of each particular k_exp occurring by chance. Note that the distributions are of markedly different shapes depending on the size of the pool and the sizes of the two assemblages. The distributions are often highly skewed, in contrast to the examples shown by Henderson and Heron. In fact, skewing is probably typical of real world data because N is generally much larger than A or B.

The numerical data from Text-figure 2 are summarized in Table 1. For Case 1, there is only a .02 probability of finding more than the observed number of species in common following the hypothesis of random sprinkling. For Case 4, on the other hand, the observed value will be exceeded by chance more than 98% of the time. These relations can be used to define a similarity measure, as follows:

INDEX OF SIMILARITY = 1 minus the probability that k_exp will be greater than k_obs.

This is equivalent to:

INDEX OF SIMILARITY = the probability that k_exp will be less than or equal to k_obs.

The four values of the INDEX are thus .98, .33, .63, and .02, respectively.

At this point, we can ask if any of the above figures are statistically significant in the sense that the null hypothesis of random sprinkling can be rejected. Questions concerning statistical significance must be framed carefully. It is tempting to say that the INDEX OF SIMILARITY in Case 1 allows for the rejection of the null hypothesis with 98% confidence. But note that the observed number in common in Case 1 is expected to occur 21% of the time under the null hypothesis so that the existence of this number in common is not startling. In Case 4, on the other hand, a 'k' greater than that observed is expected more than 98% of the time and we can conclude that the null hypothesis can be rejected. We can go further and say that because k_obs is significantly low, the two assemblages are significantly dissimilar, which is to say that something influenced the selection of species from the pool such that fewer than expected occur in common. None of the other cases show significant departure from chance expectations at the 95% level of confidence. A significant similarity would be represented by a number greater than or equal to 0.95 in the first row of Table 1. Thus, the INDEX OF SIMILARITY as defined here cannot be used as a direct test of statistical significance but the data contributing to it can be so used.

The foregoing scheme is unfortunately not appropriate for general application because it uses the simplifying assumption that all species are present in equal numbers in the pool and thus have an equal chance of occurring in any assemblage. When equation (4) is used with actual assemblage pairs, the observed 'k' values are usually much higher than would be expected. Most assemblages appear to be significantly similar to each other. This is because most related faunas contain a few common, nearly ubiquitous species which have the effect of elevating the observed 'k' values. This problem can be avoided by explicitly accounting for differences in the relative probability of each species occurring in an assemblage. In other words, the species pool can be constructed with species of differing frequencies. This is analogous to an urn which has more of some kinds of balls than of others. Simberloff (1978) solved this problem in the formulation of his Null Hypothesis II. He used the actual frequencies of species in a region as an explicit measure of the probability of these species appearing in any one assemblage drawn from that region. We will follow the same approach.

The formation of the pool can be illustrated by using one of the examples of actual data that we will employ later in this paper as a test of the methodology. We will use data on the present biogeography of the 222 genera of living echinoid echinoderms. Their distribution is expressed in terms of their presence or absence in 40 sampling areas in the present-day oceanic world. Some of the areas are arbitrarily defined and some are based on traditional biogeographic divisions. The basic data set thus consists of lists of genera for each of the 40 sampling areas. Some genera are endemic to a single area while others are found in many areas. In this data set, 60 of the 222 genera are endemics and the most 'cosmopolitan' genus is found in 20 of the 40 areas. The number of genera occurring in a given area ranges from 1 to 120.

We will assume, à la Simberloff, that the probability of occurrence of a genus in a sampling area is directly proportional to the number of areas in which that genus occurs. Therefore, a genus which occurs in only one area is seen as having a probability of 1/40 of occurring in any given area. A genus which is found in 10 areas has a probability of 10/40 of occurring, and so on. There is a definite hint of circularity in this reasoning but Simberloff (1978) has presented convincing arguments for the lack of significant circularity. The main point is that the reasoning does not make any demands on the spatial distribution of occurrences within the whole region under study: it does not preclude a concentration of occurrences in one part of the region. Therefore, the null hypothesis of random sprinkling is a valid null hypothesis and can be falsified.

The pool of taxa from which local faunas are formed has as many occurrences of each genus as there are occurrences of that genus in the total data set. When assemblages are selected at random from such a pool, the cosmopolitan genera are more likely to occur and are thus more likely to be genera common to both members of a pair of assemblages. Local endemics (those with probabilities of 1/40, in the echinoid case) can occur in two assemblages but the probability of this event is low.

One would naturally like to be able to derive an equation equivalent to equation (4) which would predict values of k_exp under the conditions described above. We have been unable to derive this equation. Therefore, we have had to rely on monte carlo simulations, just as Simberloff did for his purposes. Our method is as follows:

1) For each pair of assemblage sizes in the real world data set construct an imaginary pair of assemblages (A and B) by drawing species from the pool. This is accomplished by computer with a random number generator. Having made the two assemblages, the lists of genera are compared and the number of genera in common is recorded. This number is one outcome of sampling under the random sprinkling hypothesis: that is, one point in a k_exp probability distribution.

2) The same procedure is repeated many times with the number of taxa shared by each pair of assemblages being recorded.

3) A frequency distribution of the results is an estimate of the probability distribution of k_exp under the specified conditions of A and B.

4) The number of taxa actually shared (k_obs) by assemblages of these sizes in the real world is compared with the monte carlo generated distribution and the INDEX OF SIMILARITY is computed as in the simplified case described earlier.

An actual example of this procedure is illustrated in Text-figure 3 for a pair of sampling areas in the echinoid data set: these areas had 19 and 36 genera, respectively. A frequency distribution of 50 simulated assemblage pairs is shown in Text-figure 3 along with the actual number observed in common (5). In this case, k_obs falls nearly at the center of the simulated distribution and the null hypothesis cannot be rejected. But the percentage of simulations having 'k' less than or equal to k_obs may be used as the INDEX OF SIMILARITY.

When this procedure was followed with the entire echinoid data set, most cases fell between the 5% tails of their distributions (as in Text-fig. 3).

[Text-figure 3: frequency curve of simulated k_exp values (abscissa, 0 to 19) for A = 36 and B = 19, with k_obs marked and the portion of the curve at or below k_obs ruled.]

TEXT-FIG. 3-Example of treatment of a comparison between two echinoid faunas. k_obs is the number of genera actually observed to be in common. The curve shows the percent of simulations yielding each value of k_exp. The ruled portion indicates the number of simulations having 'k' values less than or equal to the observed value.

A special problem arises where the smaller assemblage (B) is very small. In one echinoid pair, for example, assemblage B contained only two genera and thus the only possible values of 'k' are 0, 1, and 2. Fifty simulations produced the following k's:

k = 0    40 (80%),
k = 1     9 (18%),
k = 2     1 (2%).
Total    50

There were no genera actually common to the two areas (k_obs = 0). Thus, using the procedure described earlier, computing the INDEX would yield a value of 0.80. But this implies a higher similarity than may exist. In other words, we do not know where the value of k_obs falls within the 80%. Therefore, an arbitrary convention was adopted: the INDEX is computed on the basis of the midpoint of the string of simulated 'k' values which are equal to the observed 'k.' The INDEX in this case is recorded as 0.40. The same convention was followed throughout. In the case illustrated in Text-figure 3, the INDEX was recorded as 0.39. This was arrived at by summing the percentages of the simulations that gave 'k' values less than k_obs (2 + 6 + 20 = 28) and adding one-half the percentage of simulations where 'k' equaled k_obs (1/2 x 22 = 11).
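
The simulation steps and the midpoint convention can be sketched as follows. This is not the authors' program, which is not reproduced in the paper: the function names, the toy occurrence counts, and in particular the rule for handling repeat draws (draw occurrences from the pool and ignore a genus already chosen until the required number of distinct genera is reached) are our assumptions.

```python
import random


def draw_assemblage(pool: list, size: int, rng: random.Random) -> set:
    """Draw occurrences from the pool until `size` distinct genera are in hand.

    Assumption: repeat draws of a genus already chosen are simply ignored.
    """
    if size > len(set(pool)):
        raise ValueError("assemblage larger than the number of genera in the pool")
    chosen = set()
    while len(chosen) < size:
        chosen.add(rng.choice(pool))          # frequent genera are drawn more often
    return chosen


def simulate_k(occurrences: dict, size_a: int, size_b: int,
               n_sims: int = 50, seed: int = 0) -> list:
    """Steps 1 to 3: simulated numbers of genera in common for one pair of sizes."""
    rng = random.Random(seed)
    # The pool holds one entry per occurrence of each genus in the whole data set.
    pool = [genus for genus, n in occurrences.items() for _ in range(n)]
    ks = []
    for _ in range(n_sims):
        a = draw_assemblage(pool, size_a, rng)
        b = draw_assemblage(pool, size_b, rng)
        ks.append(len(a & b))
    return ks


def midpoint_index(sim_ks: list, k_obs: int) -> float:
    """Step 4 with the midpoint convention: P(k < k_obs) + 0.5 * P(k = k_obs)."""
    below = sum(1 for k in sim_ks if k < k_obs)
    equal = sum(1 for k in sim_ks if k == k_obs)
    return (below + 0.5 * equal) / len(sim_ks)


# The small-assemblage example from the text: 40, 9 and 1 simulations gave
# k = 0, 1 and 2, and k_obs = 0.
print(midpoint_index([0] * 40 + [1] * 9 + [2] * 1, 0))   # 0.4

# Toy data (hypothetical genera and counts of areas occupied).
toy = {"genus_a": 20, "genus_b": 10, "genus_c": 3, "genus_d": 1, "genus_e": 1}
print(midpoint_index(simulate_k(toy, 3, 2), 1))
```

With the midpoint convention the recorded INDEX for the small-assemblage example is 0.40 rather than 0.80, matching the value given in the text.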


The use of monte carlo methods calls for considerable computation time, much more than would be required if an analytical expression comparable to equation (4) were available. But the results are just as accurate, given enough simulations. In the echinoid case we used 50 simulations for each pair of assemblage sizes. 100 or 1,000 simulations per pair would have produced more precise distributions but 50 was chosen as the best compromise with the limitations of computer budgets. The important point is that the simulation technique does not sacrifice rigor unless the number of simulations is too low.

The computer program used for the echinoid and other analyses is available from the authors. It is a relatively expensive program to run. The cost depends on the number of assemblages and the variation in their sizes. The echinoid data set described here is unusually large and requires about 26 cpu minutes on an IBM 360/65 to produce the similarity matrix plus a complete record of the 19,750 simulations required for the job. A variety of techniques could be used to reduce the cost but they would sacrifice accuracy.

APPLICATIONS

The method described in this paper can be applied to any data set consisting of presence and absence of taxa. In other words, any situation which yields floral or faunal lists is appropriate. Each list may represent a single collecting locality or a composite of information from a group of geographically, ecologically or stratigraphically related localities.

In order to test the methodology, we will present three quite different examples: 1) global biogeography of living echinoid echinoderms, 2) distribution of benthic foraminifera in Santa Monica Bay, California, and 3) global biogeography of Ordovician nautiloid cephalopods. It should be emphasized that the proposed INDEX OF SIMILARITY, like all other similarity measures, is a purely descriptive tool. Its purpose is to measure similarities and differences between taxonomic lists and to assess the statistical significance of these similarities and differences. It does not interpret the results in the sense of telling us the biological or geological factors responsible for the similarities or differences.

Echinoid biogeography.-The data set used for this test was presented briefly above. The distributional data all come from Mortensen's Monograph of the Echinoidea (1928-51) which provides a consistent and authoritative taxonomic base. Of the forty geographic sampling areas used for the present study, most are relatively shallow water coastal or insular areas where distributions of taxa tend to reflect regional climate. The others have uniformly cold water faunas: the non-insular ocean areas of the North Pacific, South Pacific, North Atlantic, Central Atlantic, and South Atlantic and the Arctic and Antarctic Oceans.

As indicated earlier, the data set consists of 222 genera which range from local endemics to those found in as many as 20 of the 40 sampling areas. In keeping with the philosophy of the method, no data were discarded because of endemism or cosmopolitanism and no areas were excluded because of small sample size.

The basic computer program was run to assess similarity between the members of all possible pairs of the 40 generic lists. Fifty simulations were used for each pair of assemblage sizes. The output consisted of 1) the tabulated results of all simulations (number of genera in common) and 2) a matrix of values of the computed INDEX OF SIMILARITY. Various analyses were performed on the output, some of which will be described below.

Text-figure 4 shows how one of the sampling areas compares with the other thirty-nine. The reference area (marked by an 'X') on the west coast of Central America was chosen arbitrarily and other choices produce comparable results. Values of the INDEX OF SIMILARITY are contoured and show decrease in similarity with distance from the reference area. Contouring was straightforward; that is, extreme contortion of contour lines was not necessary. Furthermore, the resulting pattern is plausible and interpretable in biogeographic terms. The map shows clearly that the echinoids of the Eastern Pacific are much more similar to those of the Western Atlantic than to those of the Western Pacific and Indian Ocean regions. In fact, the presence of the Central American barrier is not evident in the pattern. (This would not be the case at the species level where virtually no echinoid species are common to both sides of the Isthmus of Panama.) While some details of the pattern may reflect sampling error, there is no reason to believe that Text-figure 4 is not a


TEXT-FIG. 4-Analysis of echinoid biogeographic data. Dotted outlines separate the sampling areas: the numbers indicate the INDEX OF SIMILARITY of each area with respect to an arbitrarily chosen reference area (marked with X: Pacific coast of Central America). Numbers enclosed in circles indicate significant similarity to the reference area; numbers in boxes indicate significant dissimilarity to the reference area. Intermediate values of the INDEX are contoured.

valid description of generic distribution in this group of echinoderms.

Similar plots have been made using the Simpson, Dice, and Jaccard similarity indexes. The results are approximately the same although contouring was more difficult. (In particular, the Simpson Coefficient data contained several unexplained anomalies.) The good results produced by the conventional indexes are not surprising in view of the extremely high quality of Mortensen's distributional data and the basic simplicity of echinoid biogeography at the generic level.

If the production of a map such as Text-figure 4 were the only purpose, the other indexes would probably be adequate and the saving in computation time would be substantial. But the similarity measure proposed here allows one to evaluate differences between faunas in a rigorous probabilistic way. In Text-figure 4, seven of the similarities are significantly high (at the 95% level of confidence): these are the areas whose INDEX values are circled. In these cases, the number of genera observed to be in common is greater than in at least 48 of the 50 simulations. The values of the INDEX contained in squares are significantly low: that is, the number of taxa observed in common is less than in at least 48 of the 50 simulations. The distribution of circled and boxed similarity values is the expected one. The intermediate cases indicate a probability of faunal similarity but do not lead to the rejection of the null hypothesis. We can say, for example, that the Alaskan echinoids appear to be different from those of Central America but the difference is not statistically significant and thus could be caused by chance


[Ordination plot, horizontal axis PC I. Legible labels include Coastal Tropical/Subtropical Indo-Pacific, North Atlantic, Sea of Japan, South Pacific, Sea of Okhotsk, and Antarctic.]
TEXT-FIG. 5-Multivariate analysis of echinoid biogeographic data. The first two principal components (PCI and PCII) are plotted for the 40 sampling areas. PCI separates coastal areas of the Indo-Pacific from other areas and from the cold water, open ocean areas. PCII reflects water temperature.

sampling error. The orderliness of the contour lines strongly suggests, of course, that the Alaskan and Central American echinoids are in fact different in the sense that they do not represent random sprinkling from the same pool.

Text-figures 5 and 6 show 2-dimensional ordination plots of the first three principal component axes, which together represent 98.5 percent of the variation in the data set. The principal components, PCI, PCII, and PCIII, account for 51, 28, and 19.5 percent of the variation, respectively. The sampling areas which form tight, natural groups are shown as solid dots and the groups are labeled. Others are shown as open circles and identified individually.

Text-figure 5 is an ordination of PCI and PCII. PCI clearly separates the coastal areas of the Indo-Pacific region from those of the Eastern Pacific and the Atlantic. This separation is the result of the East Pacific Barrier (Ekman, 1953), an 1,810 km expanse of open ocean separating the islands of Outer Polynesia and the tropical/subtropical coast of America. Under ordinary oceanic conditions, echinoid larvae are not capable of crossing this barrier and faunas on either side of the barrier are significantly different below the family level. Separation of the South Australian and New Zealand regions from the tropical/subtropical regions of the Indo-Pacific illustrates the cold-temperate character of the South Australian and New Zealand faunas. Although geographically proximal, the echinoid faunas of the Sea of Japan and the Sea of Okhotsk are remarkably different. The echinoid fauna of the Sea of Japan consists of shallow water, warm-temperate genera derived from the subtropical Indo-Pacific via the warm Kuroshio Current while the echinoid fauna of the Sea of


[Ordination plot, horizontal axis PC I. Legible labels include Indian Ocean, North Atlantic, Central Atlantic, South Atlantic, Sea of Okhotsk, South Pacific, Sea of Japan, Coastal Tropical/Subtropical Indo-Pacific, Coastal Eastern Pacific, North Pacific, Patagonia, Antarctic, and Coastal Western Atlantic.]
TEXT-FIG. 6-Multivariate analysis of echinoid biogeographic data. The first and third principal components (PCI and PCIII) are plotted for the 40 sampling areas. PCIII serves to separate the coastal areas of the Eastern Pacific and Atlantic into distinct regions.

Okhotsk consists of cold-temperate genera derived from the north via the cold Oyashio Current. Any chance mixing of the faunas is reduced by a shallow submarine sill separating the two bodies of water. Deep water and high latitude faunas tend to cluster in the lower right corner of Text-figure 5. Text-figure 6 shows that the coastal areas of the Eastern Atlantic, Western Atlantic, and Eastern Pacific are separated along PCIII. Naturally, statistical significance cannot be assessed in the results of the multivariate analysis but the ordination plots yield considerable information of biogeographic interest.

Foraminifera in Santa Monica Bay.-In a superbly detailed study, Zalesny (1959) recorded and interpreted the distribution of the foraminifera in the bottom sediments of Santa Monica Bay, California. Seventy bottom samples were analyzed to produce distributional data on 96 foraminiferal species and subspecies. The samples covered an area of approximately 100 square kilometers in water depths ranging from 10 to 828 meters. All occurrence data (in terms of percentage abundance) were tabulated in the Zalesny paper. For the present study, these were converted to simple presence and absence of taxa and the INDEX OF SIMILARITY was computed for all pairs of the 70 sampling localities.

Text-figure 7 shows a contour map of raw similarity data comparable to that for echinoids (Text-fig. 4). The reference fauna (Zalesny's sample #3110) is in the left central part of the map and is indicated by an 'X.' Contours reflect decreasing similarity of the other 69 localities with respect to the reference locality. Numerical values of the INDEX OF SIMILARITY are not shown in this case but the location of each site is shown by a small symbol. Those indicated by triangles are the ones that are significantly similar to sample #3110, those indicated by open circles are significantly dissimilar, and the solid dots represent localities which are not significant in either direction. Also included are the bathymetric contours for 10, 50, and 100 fathoms. The shelf edge is well defined in Santa Monica Bay and the continental slope is steep. The shelf is indented by two major submarine canyons: Santa Monica Canyon (near the reference locality) and the Redondo Canyon (southeast corner of the mapped area).

TEXT-FIG. 7-Analysis of foraminiferal assemblages from Santa Monica Bay. Solid contours indicate variation of the INDEX OF SIMILARITY with respect to an arbitrary reference fauna (#3110). Triangles represent assemblages which are significantly similar to the reference fauna; open circles are assemblages significantly dissimilar to the reference fauna; solid circles are assemblages not significantly similar or dissimilar to the reference fauna.

The contours of faunal similarity follow the bathymetry with remarkable faithfulness. The shelf edge is clearly defined and both canyons are evident. The one major anomaly is the small 'bump' on the inner shelf produced by sample #3348. Similarity between this sample and the deep water reference fauna is substantially higher than is found between the other shelf faunas and the reference fauna. It is not, however, a statistically significant anomaly. In fact, when #3348 is compared with the three closest localities, statistically significant similarity is found! #3348 may therefore be a simple chance departure from the overall pattern of the contoured similarity surface or it may


TEXT-FIG. 8-Analysis of Arenigian biogeography of nautiloids. Symbols are as in Text-figure 7 and show the locations of 52 sampling areas, one of which (Siberia) was used as the reference fauna. The paleogeographic reconstruction is an interpolation between published maps (Scotese et al., 1979) provided by C. R. Scotese and A. M. Ziegler.

result from a minor habitat difference between #3348 and other shelf sites. The latter suggestion is likely in view of the fact that Zalesny's sediment maps show that #3348 comes from a small patch of silt on the shelf surface otherwise covered with sand, gravel, or rock. All the deep water sampling sites are in silty sediments. It is not surprising therefore, that the silt patch on the shelf should yield an assemblage with relatively high similarity to the deep-water reference fauna.

It should be emphasized that Text-figure 7 does not in itself require a bathymetric or sediment interpretation. As Zalesny (1959) points out, many other ecological parameters such as temperature and salinity parallel changes in depth and sediment type. The contoured INDEX OF SIMILARITY only provides a statistical framework for interpretation.

Text-figure 7 can be used also to investigate another aspect of faunal similarity. The null hypothesis of random sprinkling predicts that about 5% of the sites should appear to be significantly similar to the reference fauna and that about 5% should be significantly dissimilar and that these cases of apparent statistical significance should be randomly distributed over the area. In this instance, therefore, 10% or about seven of the 69 assemblages should show statistical significance in one direction or the other. In fact, 21 or 30% show statistical significance and their distribution is obviously non-random over the geographic area. This means that the null hypothesis of random sprinkling can be rejected when considering foraminiferal assemblages of the whole bay. This is not surprising in the Santa Monica Bay case and is certainly substantiated by Zalesny's detailed analysis of the distributions of individual taxa. It illustrates how the method being presented here can be used to explore the question of whether distribution of taxa is purely stochastic or whether it is biased by deterministic biological and/or physical factors. Even though the stochastic model can be rejected easily in this case, the INDEX based on the null hypothesis of random distribution is still a valuable aid to ecological interpretation.

Multivariate analysis was also carried out on the foraminiferal data. Bivariate ordination plots are eminently contourable and follow bathymetry.

Ordovician nautiloid biogeography.-This data set consists of 182 genera of Arenigian nautiloid cephalopods which range from endemics to those found in as many as 20 of 52 sampling areas. The data are taken from a broader study of Ordovician biogeography (Crick, 1978). The location of sampling areas


TEXT-FIG. 9-Multivariate analysis of Arenigian biogeographic data. A plot of the first two principal components (PCI and PCII) separates the major geographic elements of the early Ordovician.

is shown in Text-figure 8 on a reconstruction of Ordovician paleogeography developed by Scotese et al. (1979).

The faunal relationships were measured with the computer program used in the preceding examples. Contouring of the similarity values with respect to Siberia as an arbitrary reference area (Text-fig. 8) shows the expected decrease in similarity away from the reference area. Contouring the same data on a map of modern geography (not shown) reveals substantial anomalies which reflect the differences between modern and Ordovician geography. For example, the Bear Island fauna is significantly similar (at the 95% level) to faunas from Arctic Canada and Scotland but it is not significantly similar to Norway, Sweden, or Estonia. The latter three areas are closest to Bear Island at the present time but were separated (as part of Baltica) from Bear Island in the Ordovician.

Patterns in 2-dimensional ordinations of the principal component axes are not quite as easily interpreted as were comparable plots of Recent echinoid and foraminiferal data. This reflects loss of information about physical environments and a certain amount of geographic uncertainty. However, information on associated faunas and sediments, along with knowledge of tectonic setting, does make the multivariate plots understandable. Text-figure 9 shows a plot of PCI and PCII for the 52 sampling areas. Clusters showing the principal geographic regions (plates) are indicated. Plots including PCIII (not shown) show separation of two important Ordovician facies; the platform facies characterized by shelly faunas and the slope deposits (graptolitic facies). More detail on this aspect is given elsewhere (Crick, 1978).

DISCUSSION

The similarity measure presented here is somewhat cumbersome and expensive because of the simulation technique. The rewards may be worth the extra effort, however. These may be summarized as follows:

1) Distributional data are weighted on the basis of frequency so that widespread taxa do not have a disproportionate influence on measurement of similarity.
2) There is no need to discard taxa on the a priori grounds that they are too widespread or too localized.
3) The similarity or dissimilarity of any two faunas can be tested for statistical significance. Such tests are robust assuming that enough simulations have been run.
4) Because the evaluation of similarity does not presume any particular shape for the probability distribution of expected numbers of taxa in common, the results may be considered precise and not dependent upon generalizations drawn from computed variances of the probability distribution.
5) An entire faunal realm or data set can be investigated for significance of the observed departures from a random sprinkling (stochastic) model of taxon distribution.

The three examples that have been described do not include one in a biostratigraphic context but biostratigraphic applications should be straightforward and follow logically from the biogeographic/ecological cases used here. For example, the probable stratigraphic position of a new fossil assemblage could be assessed by comparing it with a large number of assemblages in a standard (possibly composite) sequence. This could be done in the fashion of the contour maps of Text-figures 4, 7, and 8 except that it would be a one-dimensional instead of a two-dimensional problem. The highest INDEX OF SIMILARITY would be centered on the assemblages in the standard sequence most similar to the new assemblage. This would not demand that a temporal correlation be made at that point, of course, because the similarity might be due to ecological or biogeographic proximity rather than temporal identity. But this is an ever-present problem in biostratigraphy which must be dealt with regardless of the method used to assess similarity. In the biostratigraphic context, tests of statistical significance could be performed in the manner of the echinoid and foraminiferal examples.

ACKNOWLEDGMENTS

This work was supported in part by the Earth Sciences Section, National Science Foundation, NSF Grant DES75-03870. We would also like to thank Richard K. Bambach and Alan H. Cheetham for helpful reviews of the manuscript.

REFERENCES

Cheetham, A. H. and J. E. Hazel. 1969. Binary (presence-absence) similarity coefficients. J. Paleontol. 43:1130-1136.
Crick, R. E. 1978. Ordovician nautiloid biogeography: a probabilistic and multivariate analysis. Ph.D. Dissertation, Univ. Rochester, 166 p.
Ekman, S. 1953. Zoogeography of the Sea. Sedgwick & Jackson Ltd., London, 417 p.
Henderson, R. A. and M. L. Heron. 1977. A probabilistic method of paleobiogeographic analysis. Lethaia 10:1-15.
Mortensen, T. 1928-1951. A Monograph of the Echinoidea. C. A. Reitzel, Copenhagen. 5 vols., 4469 p.
Rohlf, F. J., J. Kishpaugh and D. Kirk. 1971. NT-SYS. Numerical Taxonomy System of Multivariate Statistical Programs. Tech. Rep. State Univ. New York at Stony Brook, New York.
Scotese, C. R., R. K. Bambach, C. Barton, R. Van Der Voo and A. M. Ziegler. 1979. Paleozoic base maps. J. Geol. 87:217-277.
Simberloff, D. S. 1978. Using island biogeographic distributions to determine if colonization is stochastic. Am. Naturalist 112:713-726.
Simpson, G. G. 1943. Mammals and the nature of continents. Am. J. Sci. 241:1-31.
Simpson, G. G. 1947. Holarctic mammalian faunas and continental relationships during the Cenozoic. Geol. Soc. Am. Bull. 58:613-688.
Zalesny, E. R. 1959. Foraminiferal ecology of Santa Monica Bay, California. Micropaleontol. 5:101-126.

MANUSCRIPT RECEIVED FEBRUARY 17, 1979
REVISED MANUSCRIPT RECEIVED APRIL 12, 1979

The Field Museum of Natural History contributed $500 in support of this article.
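The significance criterion quoted for Text-figure 4 (an observed number of taxa in common that is greater, or less, than in at least 48 of 50 simulations) can be illustrated with a minimal Python sketch. It is not the program used for this study. The function names, the sequential weighted draw without replacement, and the use of raw occurrence counts as weights are all assumptions made here for illustration.

```python
import random


def weighted_draw(taxa, weights, k, rng):
    """Draw k distinct taxa; each pick is proportional to the taxon's weight.

    Assumption: weights[t] is the number of sampling areas in which
    taxon t occurs, so cosmopolitan taxa are drawn more often.
    """
    pool = list(taxa)
    w = [weights[t] for t in pool]
    drawn = set()
    for _ in range(k):
        i = rng.choices(range(len(pool)), weights=w, k=1)[0]
        drawn.add(pool.pop(i))  # remove the taxon so it cannot be drawn twice
        w.pop(i)
    return drawn


def simulate_shared(fauna_a, fauna_b, occurrences, n_sim=50, seed=0):
    """Compare the observed shared-taxa count with n_sim random sprinklings.

    Returns (observed, below, above): the observed count, and the number
    of simulations whose shared count fell below or above it.
    """
    rng = random.Random(seed)
    taxa = sorted(occurrences)
    observed = len(fauna_a & fauna_b)
    below = above = 0
    for _ in range(n_sim):
        sim_a = weighted_draw(taxa, occurrences, len(fauna_a), rng)
        sim_b = weighted_draw(taxa, occurrences, len(fauna_b), rng)
        shared = len(sim_a & sim_b)
        below += shared < observed
        above += shared > observed
    return observed, below, above


def classify(below, above, n_sim=50, level=0.95):
    """Apply the 95% rule: at least 48 of 50 runs on one side of the observed."""
    cutoff = level * n_sim  # 47.5 for 50 runs, so 48 or more qualifies
    if below >= cutoff:
        return "similar"
    if above >= cutoff:
        return "dissimilar"
    return "not significant"
```

With the default of 50 runs the cutoff corresponds to the 48-of-50 criterion in the text. Raising `n_sim` would sharpen the test at the cost of computation, which is the trade-off noted in the Discussion.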
