
Developments in Numerical Ecology

NATO ASI Series


Advanced Science Institutes Series
A series presenting the results of activities sponsored by the NATO Science
Committee, which aims at the dissemination of advanced scientific and
technological knowledge, with a view to strengthening links between scientific
communities.
The Series is published by an international board of publishers in conjunction with
the NATO Scientific Affairs Division

A  Life Sciences                 Plenum Publishing Corporation
B  Physics                       London and New York

C  Mathematical and              D. Reidel Publishing Company
   Physical Sciences             Dordrecht, Boston, Lancaster and Tokyo

D  Behavioural and               Martinus Nijhoff Publishers
   Social Sciences               Boston, The Hague, Dordrecht and Lancaster
E  Applied Sciences

F  Computer and                  Springer-Verlag
   Systems Sciences              Berlin Heidelberg New York
G  Ecological Sciences           London Paris Tokyo
H  Cell Biology

Series G: Ecological Sciences Vol. 14


Developments in Numerical
Ecology

Edited by

Pierre Legendre
Departement de Sciences biologiques
Universite de Montreal, C.P. 6128, Succ. A
Montreal, Quebec H3C 3J7, Canada

Co-editor for the Working Group Reports:

Louis Legendre
Departement de Biologie, Universite Laval
Ste-Foy, Quebec G1K 7P4, Canada

Springer-Verlag
Berlin Heidelberg New York London Paris Tokyo
Published in cooperation with NATO Scientific Affairs Division
Proceedings of the NATO Advanced Research Workshop on Numerical Ecology
held at the Station marine de Roscoff, Brittany, France, June 3-11, 1986

ISBN-13: 978-3-642-70882-4 e-ISBN-13: 978-3-642-70880-0


DOI: 10.1007/978-3-642-70880-0

Library of Congress Cataloging in Publication Data. NATO Advanced Research Workshop on Numerical
Ecology (1986: Station marine de Roscoff) Developments in numerical ecology. (NATO ASI series. Series
G, Ecological sciences; vol. 14) "Proceedings of the NATO Advanced Research Workshop on Numerical
Ecology held at the Station marine de Roscoff, Brittany, France, June 3-11, 1986"-T.p. verso. "Published
in cooperation with NATO Scientific Affairs Division." Includes Index. 1. Ecology-Mathematics-
Congresses. 2. Ecology-Statistical methods-Congresses. I. Legendre, Pierre, 1946- . II. Legendre,
Louis. III. North Atlantic Treaty Organization. Scientific Affairs Division. IV. Title. V. Series. QH541.15.M34N38
1986 574.5'0724 87-16337

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in other ways, and storage in data banks. Duplication of this publication or
parts thereof is only permitted under the provisions of the German Copyright Law of September 9, 1965, in
its version of June 24, 1985, and a copyright fee must always be paid. Violations fall under the prosecution
act of the German Copyright Law.
© Springer-Verlag Berlin Heidelberg 1987
Softcover reprint of the hardcover 1st edition 1987

Table of Contents

Foreword by Pierre Legendre and Louis Legendre. . . . . . . . . . . . . . . . . . . . . . . . . ix

I. Invited Lectures
Scaling techniques

John C. Gower
Introduction to ordination techniques .............................. 3
J. Douglas Carroll
Some multidimensional scaling and related procedures devised at Bell
Laboratories, with ecological applications . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Yves Escoufier
The duality diagram: a means for better practical applications . . . . . . . . . . . . . . 139
Jan de Leeuw
Nonlinear multivariate analysis with optimal scaling . . . . . . . . . . . . . . . . . . . . 157
Willem J. Heiser
Joint ordination of species and sites: the unfolding technique . . . . . . . . . . . . . . 189

Clustering under a priori models

James C. Bezdek
Some non-standard clustering algorithms. . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
Pierre Legendre
Constrained clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
Leonard P. Lefkovitch
Species associations and conditional clustering ....................... 309

Fractal theory
Serge Frontier
Applications of fractal theory to ecology . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335

Path analysis for mixed variables


Jan de Leeuw
Path analysis with optimal scaling 381

Spatial analysis

Brian Ripley
Spatial point pattern analysis in ecology. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
Robert R. Sokal and James D. Thomson
Applications of spatial autocorrelation in ecology. . . . . . . . . . . . . . . . . . . . . . 431

II. Working Group Reports

Manfred Bölter (Chairman)


Numerical ecology: developments for microbial ecology . . . . . . . . . . . . . . . . . . . 469
John G. Field (Chairman)
Numerical ecology: developments for studying the benthos. . . . . . . . . . . . . . . . . . 485
Jordi Flos (Chairman)
Data analysis in pelagic community studies 495
Louis Legendre (Chairman)
Numerical ecology:
developments for biological oceanography and limnology. . . . . . . . . . . . . . . . . . . 521
Robert Gittins (Chairman)
Numerical methods in terrestrial plant ecology. . . . . . . . . . . . . . . . . . . . . . . . . . . 529
Daniel Simberloff (Chairman)
Novel statistical analyses in terrestrial animal ecology:
dirty data and clean questions ..................................... 559

List of participants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 573

Subject index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 577


NATO Advanced Research Workshop on Numerical Ecology
Station marine de Roscoff, Brittany, France, June 3-11, 1986.

1 - Michele Scardi, 2 - Marie-Josee Fortin, 3 - Willem J. Heiser, 4 - Leonard P. Lefkovitch, 5 - Pierre Legendre, 6-
Louis Legendre, 7 - J. Douglas Carroll, 8 - Pierre Lasserre, 9 - Bruno Scherrer, 10 - Shmuel Amir, 11 - Frederic
Ibanez, 12 - Fortunato A. Ascioti, 13 - Serge Dallot, 14 - Jean-Luc Dupouey, 15 - Jordi Flos, 16 - Richard L.
Haedrich, 17 - Alain Laurec, 18 - David W. Tonkyn, 19 - Julie Sokal, 20 - Steve H. Cousins, 21 - Robert R.
Sokal, 22 - Daniel Simberloff, 23 - Carol D. Collins, 24 - Rebecca Goldburg, 25 - John G. Field, 26 - Clarice M.
Yentsch, 27 - Serge Frontier, 28 - John C. Gower, 29 - Marta Estrada, 30 - James C. Bezdek, 31 - Janet W.
Campbell, 32 - Daniel Wartenberg, 33 - Marinus J. A. Werger, 34 - Marc Troussellier, 35 - Robert Gittins, 36 -
Eugenio Fresi, 37 - Peter Schwinghamer, 38 - Richard A. Park, 39 - Manfred Bölter, 40 - Brian H. McArdle, 41 -
S. Edward Stevens, Jr., 42 - Philippe Gros, 43 - Paul Berthet, 44 - Francisco A. de L. Andrade, 45 - Vincent Boy.
Not pictured: Michel Amanieu, Jan de Leeuw, Yves Escoufier, Roger H. Green, Jean-Marie Hubac, Michael Meyer,
Brian Ripley.
Foreword

During the Sixties and the Seventies, most community ecologists joined the general trend
of collecting information in a quantitative manner. This was mainly driven by the need for testing
implicit or explicit ecological models and hypotheses, using statistical techniques. It rapidly
became obvious that simple univariate or bivariate statistics were often inappropriate, and that
community ecologists should resort to multivariate statistical analyses. In addition, some methods
that are not traditionally considered as statistical (e.g., clustering) were sometimes used
as alternatives to, or in conjunction with, statistical techniques. The first attempts were not always
conclusive, because straightforward applications of both statistical and nonstatistical multivariate
methods often led to unsatisfactory or trivial ecological results. This was either due to the fact that
ecologists did not fully grasp the complexities of the numerical techniques they used, or more
often because the specific nature of ecological data was not taken into account in the course of the
numerical analysis.

Despite these difficulties, community ecology acquired a mathematical framework, with
three consequences: it could develop as an exact science; it could be applied operationally as a
computer-assisted science to the solution of environmental problems; it could exchange
information with other disciplines, using the language of mathematics. This new framework has
evolved from an unstructured set of independent results, into a comprehensive, formal system of
thought coupled with an integrated methodology, known as numerical ecology. Numerical
ecology is the field of quantitative ecology devoted to the numerical analysis of ecological data
sets. The objective of this analysis is to determine and interpret their multidimensional and/or
process structures.

As numerical ecology progressively developed, during the last decade, it proposed various
ways of integrating several multivariate techniques into analytical schemes, and it specified sets of
rules that state how conventional methods should be used within the context of community
ecology. Some methods were also modified to better fit multivariate ecological data sets. In the
last few years, however, it has become apparent that existing approaches in numerical ecology
often could not answer the increasingly complex questions raised by community ecologists, and
that a large body of ecological information remained unexploited by lack of appropriate numerical
methods. This was the main incentive for organizing a NATO Advanced Research
Workshop on Numerical Ecology, where community ecologists could meet with proponents
of new methods for the analysis of numerical data, and explore with them how these could be
applied to community ecology.

As stated above, numerical ecology typically combines several numerical methods and
models, of complementary character, to probe data sets describing processes that occur within
ecosystems. New mathematical models (e.g., fractals and fuzzy sets) and methods (generalized
scalings, nonlinear multivariate analyses, spatial analyses, etc.) have recently been developed by
mathematicians, or by statisticians and methodologists working in related fields (e.g.,
psychometrics). The first purpose of the Workshop was to bring methodologists and community
ecologists to the same conference room. The Workshop was designed as follows. Mathematicians
and methodologists presented their theories during morning sessions: Scaling techniques (I, II and
III); Clustering with models, including fuzzy sets; Fractal theory; Qualitative path analysis; Spatial
analysis. During the afternoons, six working groups representing various branches of community
ecology met with the methodologists to discuss the applicability of these methods to the following
fields of specialization: Micro-organisms; Benthic communities; Pelagic communities; Dynamic
biological oceanography and limnology; Terrestrial vegetation; Terrestrial fauna. The Workshop
was also one of the first opportunities offered to numerical ecologists from the various disciplines
(aquatic and terrestrial; botany, microbiology, and zoology) to meet and work towards a common
goal.

The NATO Advanced Research Workshop on Numerical Ecology took place at the Station
marine de Roscoff, France, from 2 to 11 June 1986. There were 51 participants (listed at the end
of the book), originating from 14 countries: Australia, Belgium, Canada, France, Federal
Republic of Germany, Israel, Italy, the Netherlands, New Zealand, Portugal, South Africa,
Spain, the United Kingdom, and the United States of America. The International Organising
Committee for the Workshop was: Pierre Legendre and Louis Legendre (co-directors, Canada),
Michel Amanieu (France), John G. Field (South Africa), Jordi Flos (Spain), Serge Frontier
(France), John C. Gower (United Kingdom), Pierre Lasserre (France), and Robert R. Sokal
(USA).

This book of proceedings comprises the invited lectures, as well as the working group
reports. Lectures contributed by the participants are not included and will eventually appear
elsewhere. The published versions of the papers are often quite different from the oral
presentations in Roscoff, because the authors took into account the discussions that followed their
lectures, as well as criticisms and suggestions by external peer reviewers. As editors, we are
pleased to stress the good spirit and collaboration from all the authors during this critical phase of
paper improvement.

The meeting was sponsored and funded by the Scientific Affairs Division of the North
Atlantic Treaty Organization (NATO). France provided additional financial support, through the
PIREN and PIROcean programs of the Centre national de la Recherche scientifique (grants to
Prof. Michel Amanieu), and the Ministere des Affaires etrangeres (grant to Prof. Pierre Lasserre);
the Station marine de Roscoff also contributed significant non-monetary support. We are sure that
the participants would want us to express their particular thanks to Prof. Pierre Lasserre and his
staff, for local arrangements and superb food, and to Marie-Josee Fortin who very ably assisted
the co-directors with administrative matters before, during and after the meeting, in addition to
being herself an active scientific participant.

In addition to the Editors, several colleagues, listed below, refereed manuscripts for this
book of proceedings: J. Douglas Carroll, Serge Dallot, William H. E. Day, Yves Escoufier, Scott
D. Ferson, Eugenio Fresi, Robert Gittins, Leonard P. Lefkovitch, Benoit B. Mandelbrot, Brian
H. McArdle, F. James Rohlf, Michele Scardi, Bruno Scherrer, Peter Schwinghamer, Daniel
Simberloff, Robert R. Sokal, Marc Troussellier and Daniel Wartenberg. Their assistance is
gratefully acknowledged.

Pierre Legendre Louis Legendre


Departement de sciences biologiques Departement de biologie
Universite de Montreal Universite Laval
I. Invited Lectures

Scaling techniques
INTRODUCTION TO ORDINATION TECHNIQUES

John C. Gower
Rothamsted Experimental Station
Harpenden, Herts. AL5 2JQ, UK

Abstract - The main ordination techniques used in ecology to display data on
species and/or sites are described and attention is drawn to three areas of
confusion whose clear understanding governs proper use. These are
(i) the relevance of different types of data and measurement-scales
(e.g. presence/absence, abundance, biomass, counts, ratio-scales,
interval-scales);
(ii) the different implicit models that underlie what superficially may
seem to be similar kinds of display but which are to be interpreted
differently (e.g. through distance, angle or asymmetry);
(iii) the distinction between a two-way table and a multivariate sample
(units x variables).
Against this background the following methods are briefly described:
Principal Component Analysis; Duality, Q and R-mode Analysis; Principal
Coordinates Analysis; Classical Scaling; Metric Scaling via Stress and Sstress;
Multidimensional Unfolding; Non-metric Multidimensional Scaling; the effects of
closure; Horseshoes; Multiplicative Models; Asymmetry Analysis; Canonical
Analysis; Correspondence Analysis; Multiple Correspondence Analysis;
Comparison of Ordinations; Orthogonal Procrustes Analysis; Generalised
Procrustes Analysis; Individual Differences Scaling and other three-way
methods.
The more important methods, not discussed in greater detail elsewhere in
this volume, are illustrated by examples and the provenance of suitable
software is given.

1. INTRODUCTION

In this paper I shall review the more common ordination techniques that
have found applications in ecology, together with related techniques, mainly
developed by psychometricians and generally termed multidimensional scaling,
that are of potential use to ecologists. Some of the methods covered are
developed in detail by other contributors to this volume. In the interest of
giving a cohesive account, I shall include some introductory comments on such
methods but refer the reader to subsequent chapters for more detailed
expositions. Examples of ecological applications of the methods illustrate the
various techniques; these examples have been drawn entirely from a
forthcoming book "Multivariate Analysis of Ecological Communities" by Digby
and Kempton (1986) and I am grateful to them and to their publisher, Chapman
and Hall, for giving permission. The reader is referred to the book for details
of background information and for many further examples.
Just as a scatter diagram gives a useful graphical representation of a
bivariate sample that allows salient features such as outliers, clusters and
collinearities to be picked out by eye, ordination methods aim to exhibit the
main features of multivariate samples in a few dimensions - ideally two. Thus
the emphasis is on informal graphical displays and not on problems of
inference. Formal inferential procedures are not usually available for the
methods discussed and indeed in my experience are rarely of interest in this
context. However when the effects of sampling variation are deemed relevant
the data-analytic techniques of jack-knifing and boot-strapping will usually be
available and will suffice to give an indication of the stability of displays and
associated confidence in their utility.
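The bootstrap follows the same pattern whatever statistic underlies a display: resample the units with replacement and examine the spread of the recomputed result. A minimal sketch (illustrative only; the data and the choice of the mean as the statistic are hypothetical, not taken from this chapter):

```python
import random
import statistics

random.seed(0)  # reproducible resampling
abundances = [15.5, 2.3, 4.6, 4.0, 2.5, 1.0, 2.8, 1.0]  # hypothetical sample

boot_means = []
for _ in range(1000):
    # Resample the units with replacement and recompute the statistic.
    resample = [random.choice(abundances) for _ in abundances]
    boot_means.append(statistics.mean(resample))

# The spread of the bootstrap replicates indicates the stability of the
# statistic, and hence of any display derived from it.
print(round(statistics.stdev(boot_means), 2))
```

In practice the resampled statistic would be the ordination itself, and the replicate configurations would be compared after suitable matching.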
Underlying all graphical displays are informal or implicit models that allow
the coordinates of the points to be estimated and plotted. I shall try to make
clear the nature of these informal models. There is, of course, no claim that
the parameters of these models have any special ecological significance; they
are merely a mathematical contrivance that allows the data to be presented
conveniently. Occasionally patterns perceived in a display will suggest the
operation of some biological/ecological process that can be modelled more
formally. When this happens the classical statistical theories of estimation and
inference come into their own.
My aim is to describe the various ordination techniques in general terms,
indicating the assumptions made and how to interpret the graphical results.
This will entail using a little algebra from time to time but this will be kept to
an essential minimum. It is certainly not my aim to explain how to do the
calculations or how to construct suitable algorithms and thus develop computer
programs. For most of the methods discussed software is internationally
available and the provenance of specialised programs is given in the text; the
other methods are readily accommodated by good general-purpose statistical
languages and packages such as Genstat (Alvey et al. 1983).

2. THE STRUCTURE OF DATA

Table 1. Relative abundance of plot species (% total dry matter yield per plot).

                         Plot 1   Plot 2   Plot 3   Plot 4

Agrostis tenuis            15.5      2.3      4.6      4.0
Alopecurus pratensis        2.5      1.0      2.8      1.0
Anthoxanthum odoratum       7.2      6.5      9.6     13.1
Taraxacum officinale        0.1      0.3      0.6      0.4
Tragopogon pratensis        0.0      0.1      0.0      0.0

Plot Yield (t/ha)           0.8      1.6      2.8      2.3
Soil pH                     5.2      6.8      5.2      6.8

For the most part we shall be concerned with data, as in Table 1, whose
n rows refer to species and whose p columns refer to sites. It is tempting for
mathematicians to refer to such a table as an nxp matrix X and then to ignore
its detailed structure. In this way crucial information may be ignored. Thus
in Table 1, the sites are plots which have each had different fertilizer
treatments and some of which have been limed and others not. In ecology the
sites are often spatially contiguous or they may fall into groups from
geographically different regions. The same species may have been repeatedly
sampled so that data for each species may occur in several rows of the table.
The whole table may have been sampled on several occasions or the different
sites may refer to the same site successively resampled. Such structural
information is vital to any sensible interpretation of the data.
Of equal importance is the type of information given in the body of the
table. In Table 1, a variable "relative abundance of plot species" is given.
This is a quantitative variable whose values, by definition, sum to 100% for
every plot (i.e. for every column). Apart from abundance, typical quantitative
variables of interest to ecologists are measurements (e.g. length of some plant
characteristic in centimetres, total biomass per site in grams per square metre
and counts, such as number of petals). As well as quantitative variables,
qualitative variables also are important. A typical qualitative variable may take
on one of a finite number of disjoint categories (e.g. black, white, green or
blue); the terms categorical, nominal and meristic variable also are used to
describe qualitative variables. Some qualitative variables may be ordinal
having an underlying notion of a natural ordering (e.g. smooth, textured,
rough). Of special importance are binary qualitative variables that take two
values (e.g. black/white, or presence/absence). In the latter example, absence
has a different logical status from presence and it may be wise to take
cognisance of the fact.
With quantitative variables we have already noted that some may be
counts, and hence dimensionless, while others are measured on scales that
carry with them definite units of measurement. These are of two principal
kinds, ratio-scales and interval-scales. Weight is an example of a ratio-scale,
where all weights are expressed as multiples of a standard Kilogram kept in
Paris; ratio-scales have a well-defined zero. Interval-scales are exemplified by
temperature, where two points on the scale are identified (e.g. the melting
point of ice and the boiling point of water) and the scale is then divided into
an equal number of steps; interval-scales do not have a well-defined zero (e.g.
zero Fahrenheit and Celsius are not equivalent). Weaker information is also of
importance in certain fields such as psychometrics. Thus with
paired-comparisons it is known only that one item is preferred to another; with
similarity data it is known that item A is more similar to item B than it is to
item C; with confusion data it may be recorded that the ordered sequence A,B
was identified nAB times and that this differs from nBA.
The above merely hints at some of the problems addressed in the major
discipline of the theory of measurement. However I hope it will suffice to
indicate their importance and that there are problems that ecologists should
think about before embarking on what may seem to be routine statistical
calculations. We have seen that a single variable may be exhibited in a
two-way table such as a species x sites table but in a more typical multivariate
sample the columns of the table/matrix X each refer to a different variable and
these different variables will often comprise a mixture of qualitative and
quantitative types, the qualitative variables being of differing numbers of
levels and the quantitative variables measured in different units. The
problems outlined above are thus compounded and the different interpretations
to be associated with a matrix X are extended.
In the following we shall see some of the more simple ways of handling
the difficulties associated with different types of data and different structures
of data. Ecologists have long recognised that the raw data will often require
some form of transformation or pre-processing before progress can be made.
Thus with the Braun-Blanquet scale, percentage cover is approximately
transformed to an additive scale in the range 1-5. By contrast the
Hult-Sernander-Du Rietz scale is a logarithmic one where 1 corresponds to less
than 6¼% cover, 2 to 6¼-12½% cover, 3 to 12½-25% cover, 4 to 25-50% cover and 5
to 50-100% cover. For insects Lowe (1984) has suggested another logarithmic
scale where 1 corresponds to one individual, 2 to 2-3 individuals, 3 to 4-7
individuals, 4 to 8-15 individuals and so on. These scales are chosen so that
particularly high abundance should not dominate subsequent analyses. When
working with computers it is probably more straightforward to do a logarithmic
transformation rather than to use such scales. The reasons for transformations
include the following:
To ensure independence from scales of measurement
To ensure independence from arbitrary zeros
To eliminate size effects
To eliminate abundance effects.
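A class scale of this kind is easy to compute directly. The sketch below (illustrative only; the function name is invented) implements Lowe's insect scale, where the class for a count n is 1 + ⌊log2 n⌋:

```python
import math

def lowe_class(n):
    """Logarithmic abundance class for a count of n individuals:
    class 1 = 1 individual, 2 = 2-3, 3 = 4-7, 4 = 8-15, and so on."""
    if n < 1:
        raise ValueError("count must be a positive integer")
    return 1 + int(math.log2(n))

# Class boundaries double at each step, so a particularly high
# abundance cannot dominate subsequent analyses.
print([lowe_class(n) for n in (1, 2, 3, 4, 7, 8, 15, 16)])  # [1, 2, 2, 3, 3, 4, 4, 5]
```

As noted above, an equivalent effect is obtained more simply on a computer by a direct logarithmic transformation of the counts.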

Table 2. Some useful transformations of data.

(i)     xij - xi.              (deviations from the species means)
(ii)    xij - x.j              (deviations from the site means)
(iii)   xij - xi. - x.j + x..  (a combination of (i) and (ii))
(iv)    xij/xi.                (proportion of the ith species at the jth site;
                               often expressed as percentages)
(v)     xij/x.j                (proportion of the jth site containing the ith
                               species; often expressed as percentages)
(vi)                           (see Correspondence Analysis, Section 8.1)
(vii)   log xij                (converts measurement scales into an additive
                               constant for ratio scales)
(viii)  xij/rj                 (where rj = range or standard error;
                               eliminates scale)
(ix)    (xij - x.j)/rj         (eliminates origin and scale)
(x)     categories             (see Multiple Correspondence Analysis,
                               Section 8.2)
(xi)    monotonic              (see Non-metric Multidimensional Scaling,
        transformation         Section 6)

Writing xij for a typical entry in X, xi. for the mean of the ith species
and x.j for the mean at the jth site, then Table 2 lists some basic
transformations that are sometimes useful. Some of these transformations will
occur naturally in the following and will be discussed in their proper place.
However numbers (vii), (viii) and (ix) need some immediate comment. Number
(vii) is particularly attractive for ratio scales because the result of the
transformation is unaffected by the values of other items of data for the same
variable. This is not so for (viii) and (ix) unless rj is chosen as the ~ priori
range rather than the range in the sample. When rj is chosen as the sample
standard deviation, which is a very common choice, there are difficulties.
These arise because most ecological samples are likely to embrace mixtures of
several biological populations and the value of r j then depends on the mixing
proportions so it is not an estimate of any identifiable statistic. If the samples
are from a homogeneous population they probably have little interest; there is
also the problem that the usual formula for evaluating standard errors,
although unbiased, will with long-tailed distributions normally give gross
underestimates, balanced by occasional gross overestimates. In such cases
some preliminary transformation such as a logarithm or square-root is
indicated.
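Transformations (i), (ii) and (vii) of Table 2 amount to simple row and column operations on the data matrix. A minimal sketch (illustrative only; the data are the first three species of Table 1):

```python
import math

# Rows = species, columns = sites (first three species of Table 1).
X = [[15.5, 2.3, 4.6, 4.0],
     [2.5, 1.0, 2.8, 1.0],
     [7.2, 6.5, 9.6, 13.1]]
n, p = len(X), len(X[0])

row_means = [sum(row) / p for row in X]                             # species means xi.
col_means = [sum(X[i][j] for i in range(n)) / n for j in range(p)]  # site means x.j

dev_species = [[X[i][j] - row_means[i] for j in range(p)] for i in range(n)]  # (i)
dev_sites   = [[X[i][j] - col_means[j] for j in range(p)] for i in range(n)]  # (ii)
log_X       = [[math.log(X[i][j]) for j in range(p)] for i in range(n)]       # (vii)

# After (i), every species (row) has zero mean across the sites.
print(all(abs(sum(row)) < 1e-9 for row in dev_species))  # True
```

Transformation (vii) of course requires strictly positive entries; zero abundances (as for Tragopogon pratensis in Table 1) must be handled separately, e.g. by adding a small constant before taking logarithms.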

3. PRINCIPAL COMPONENTS ANALYSIS

Consider a data-matrix X giving information for each of n samples on each
of p quantitative variables. It is assumed that X has first been transformed to
reduce or eliminate some of the difficulties discussed in the previous section;
the transformations (vii), (viii) and (ix) of Table 2 will be especially relevant.
The ith sample has values (xi1, xi2, ..., xip) which may be regarded as the
coordinates of a point Pi referred to p orthogonal coordinate axes. The n
samples will then generate a cloud of n points, P1, P2, ..., Pn. Figure 1 shows
just two of these points, Pi and Pj, referred to p = 3 axes labelled x1, x2 and
x3.

Figure 1. Diagram to illustrate Principal Components Analysis. The point Pi
has coordinates (xi1, xi2, xi3), being the values of three variables
x1, x2, x3 for the ith sample; similarly for Pj. The axes for the
three variables are assumed to be orthogonal. The cloud of
points Pi (i = 1, 2, ..., n) representing n samples has a best-fitting
plane, here represented by the axes labelled I and II. The
orthogonal projection of Pi onto the best-fitting plane is
labelled Qi and the plane is chosen to minimise Σ(i=1..n) Δ²(PiQi).
Because of the choice of method for representing the sample the distance dij
between Pi and Pj is given by:

    d²ij = Σ(k=1..p) (xik - xjk)².   (1)
k=l
This is essentially Pythagoras' theorem and hence this form of distance is
termed Pythagorean or Euclidean distance; the latter terminology is unfortunate
because, as is discussed in Section 4, other definitions of distance also satisfy
the Euclidean conditions. Other choices of distance will be discussed but it is
important to realise that the use of Principal Components Analysis brings with
it the particular choice of distance (1). This may be thought of as the basic
model behind Components Analysis.
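Equation (1) translates directly into code; a brief sketch (illustrative only; the function name and values are invented):

```python
import math

def pythagorean_distance(xi, xj):
    """Distance of equation (1): the square root of the sum of
    squared differences over the p variables."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

# Two samples measured on p = 3 variables (hypothetical values).
print(pythagorean_distance([1.0, 2.0, 2.0], [4.0, 6.0, 2.0]))  # 5.0
```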

3.1. The Ordination of the Units

Suppose now that it is wished to approximate all the pairwise distances
dij by distances δij between points Qi and Qj in some lower-dimensional space,
say a two-dimensional plane. In figure 1 axes defining such a plane are shown
as dimensions labelled I and II. One way of choosing Qi is as the orthogonal
projection of Pi onto the chosen plane. Thus the distance dij = Δ(PiPj) is
approximated by δij = Δ(QiQj) and, because of the use of projections, δij ≤ dij
for all pairs (i, j). The question is, can one choose the plane in some optimal
sense? In Principal Components Analysis the choice is made that minimises the
sum-of-squares of residuals Σ(i=1..n) Δ²(PiQi). Because of Huygens' principle, the
best-fitting plane must pass through the centroid of the points Pi. Thus we
must work in terms of a matrix Y (say) of deviations from the mean; thus
Y = X(I-N), where N contains only elements of value 1/n. It turns out that
directions I and II are given as the first two eigenvectors of the corrected
sums-of-squares and products matrix Y'Y. The eigenvectors are the first two
columns of the matrix L that satisfies

    Y'YL = LΛ,   (2)

where Λ = diag(λ1, λ2, ..., λp) is the non-negative matrix of the eigenvalues
arranged so that λ1 ≥ λ2 ≥ λ3 ≥ ... ≥ λp. In general one may require an approximation
in k rather than two dimensions, in which case the directions determining the
space are given by the first k columns of L. Because the columns of L are to
be regarded as the direction-cosines of axes like I and II, L must be
normalised so that L'L = LL' = I, which can always be done because the
eigenvectors of a symmetric matrix are always orthogonal. This implies that
axes like I and II of figure 1 may be chosen orthogonally.
Thus the eigenvector calculations give the directions of the required axes,
which are termed the principal axes, and the projections Qi onto these axes,
often referred to as the component scores, are readily given by YL, giving a
set of coordinates that may be easily plotted when k=2. The plot of the points
Qi is said to be an ordination, a term originating in the special case where k=1
which, being one-dimensional, gives an ordering of the species.
If now in figure 1 it is assumed that the origin O has been chosen to be
at the centroid which, as we have seen, must lie in the fitted plane, then it
may be shown that

    Σ(i,j) d²ij = Σ(i,j) δ²ij + 2n Σ(i=1..n) Δ²(PiQi),

where the latter sum is the minimised residual sum-of-squares. This shows that the criterion

    Σ(i,j) (d²ij - δ²ij)   (3)

has been minimised subject to the constraint that the distances δij arise as
orthogonal projections of the distances dij. The scores on the ith principal
axis are the projections given by Yli, where li is the ith column of L. These
have sums-of-squares li'Y'Yli which from (2) is li'li λi or, from the orthogonality
of L, simply λi. Thus the total sum of squares in the fitted k-dimensional
plane is Σ(i=1..k) λi and the residual sum of squares orthogonal to the fitted plane
is Σ(i=k+1..p) λi. Now Σ(i=1..p) λi = Trace Λ = Trace L'Y'YL = Trace Y'Y, which is the total
sum-of-squares of the elements yij or, what is the same thing, the total
sum-of-squares of the elements xij expressed as deviations from their sample
means x.j. Hence the usual phraseology that the proportion of the total sum-of-
squares accounted for in the k-dimensional approximation is Σ(i=1..k) λi / Σ(i=1..p) λi.
For a good approximation one would expect this ratio to be fairly high, say at least
60%, but any short-comings in this respect can be overcome to some extent by
supplying supplementary information on the ordination diagrams, as will be
discussed later.
To recapitulate, we have so far approximated the distances dij by
distances δij that are obtained by projecting the points Pi onto a
k-dimensional plane, to give points Qi, in such a way that the residual
sum-of-squares Σ_{i=1}^n Δ²(PiQi) is minimised. The directions of orthogonal
axes in this fitted plane are given by the first k columns of L, the
eigenvectors of Y'Y, and the coordinates of the point Qi are given by the ith
row of the first k columns of YL. This ordination contains information only on
the n samples; now we shall examine the possibility of including information on
the p variables.
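The recapitulated procedure can be sketched numerically. The following is a minimal illustration, not from the chapter, assuming NumPy; the small random matrix X simply stands in for a centred ecological data table.

```python
import numpy as np

# Minimal sketch of the Components Analysis recapitulated above:
# centre X, eigendecompose Y'Y, form the scores YL, and report the
# proportion of the total sum-of-squares retained in k dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))          # 10 samples, 3 variables (invented)

# Centre the columns: Y = (I - N) X.
Y = X - X.mean(axis=0)

# Eigendecomposition of Y'Y; eigh returns eigenvalues in ascending order,
# so sort them into the conventional descending order.
lam, L = np.linalg.eigh(Y.T @ Y)
order = np.argsort(lam)[::-1]
lam, L = lam[order], L[:, order]

scores = Y @ L                        # component scores YL

# Sum-of-squares on axis i equals lambda_i; trace Y'Y equals their sum.
k = 2
proportion = lam[:k].sum() / lam.sum()
```

The assertion that the per-axis sums-of-squares equal the eigenvalues is exactly the relation li'Y'Yli = λi derived above.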

3.2. Including Information on the Variables

Figure 2. Diagram to illustrate how information on variables may be included


in a Principal Components Analysis. The Xi-axis projects into a
line in the best-fitting plane; this line is an axis labelled Yi in
the figure.

Consider any point X1 on the x1-axis (see figure 2). X1 may be projected
into Y1 in the I-II plane, in exactly the same way as were Pi and Pj projected
into Qi and Qj. Indeed any point on the x1-axis will project into a point on
the line joining O and Y1, so that this line, labelled y1 in figure 2, may be
taken to approximate the x1-axis in the principal component plane. Points that
represent samples with positive (negative) deviations from the mean and close
to an x-axis will project into points with positive (negative) deviations from
the mean and close to the corresponding y-axis. Any sample that has values
of x2 and x3 close to the means of these variables, but a value of x1
substantially different from its mean, will project into a point close to the
y1-axis and this can, with caution, be used to aid interpretation. The caution
is necessary because any point on X1Y1 will project into Y1, so that although
it is necessary for points close to X1 to project into points close to Y1, it
is by no means correct to infer that points close to Y1 necessarily arise from
points close to X1. The axes y2 and y3 may be similarly derived to represent
approximations to the x2- and x3-axes.
Thus far X1 has been chosen arbitrarily and it is only the direction of y1
that is important. Some additional information can be obtained by choosing X1
to be one unit, or one standard deviation, along the x1-axis, so that the end
points Y1, and similarly Y2 and Y3, have significance. Then the differences in
the lengths Δ(OY1), Δ(OY2), Δ(OY3) etc. can be used to infer the degrees of
distortion in the representation of the x-axes in the ordination given by the
principal axes. Suppose we choose Xi to be one unit from the mean (i.e. the
origin); then the points X1, X2, X3, ... may be regarded as pseudo-samples
whose values are given in a unit matrix I of order p. Just as YL gives the
component scores for ordinary samples, IL = L gives the scores for these
pseudo-samples, so that the coordinates of Yi are merely given by the first k
elements of the ith row of L. The points X1, X2, X3, ... are the vertices of a
regular simplex in p−1 dimensions, so there is no hope that Y1, Y2, Y3, ...
will all be accurately represented in k dimensions when k is very much less
than p; this underlines the caution needed when using interpretations appealing
to the yi-axes. An ordination that contains information on both units and
variables is often referred to as a biplot, a terminology introduced by
Gabriel (1971).
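The pseudo-sample device above is easy to verify numerically. The sketch below is illustrative only (NumPy, invented data): the unit pseudo-samples score as IL = L, so the biplot coordinates of variable i are simply the first k elements of row i of L, and the projected axis lengths Δ(OYi) measure the distortion just described.

```python
import numpy as np

# Sketch: projecting the unit points on the variable axes into the
# principal plane. The pseudo-samples are the rows of the p x p identity
# matrix, and their scores IL = L are the rows of L.
rng = np.random.default_rng(1)
X = rng.normal(size=(12, 4))          # invented data, 12 samples x 4 variables
Y = X - X.mean(axis=0)

lam, L = np.linalg.eigh(Y.T @ Y)
order = np.argsort(lam)[::-1]
lam, L = lam[order], L[:, order]

k = 2
sample_coords = (Y @ L)[:, :k]        # points Q_i
variable_coords = L[:, :k]            # points Y_i: first k columns of L

# A unit vector can only shrink under orthogonal projection, so each
# projected axis length Delta(O Y_i) is at most one; shorter lengths
# signal greater distortion of that variable's axis.
axis_lengths = np.sqrt((variable_coords**2).sum(axis=1))
```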

3.3. Approximating Correlations

Instead of, or as well as, plotting the rows of L as coordinates we may plot
those of LΛ^{1/2}. The ith row of this matrix, when regarded as the
coordinates of a point Ri, does not lie on the yi-axes, although its first k
dimensions do lie in the space of the first k principal axes. Thus although
the points Ri could be plotted in the same ordination as that containing the
projections Qi and Y1, Y2, Y3, ..., it is best to plot them separately. The
interest in the plot of the points Ri arises from the algebraic identity

    Y'Y = (LΛ^{1/2})(LΛ^{1/2})' = LΛL'                      (4)

which shows that what is being approximated is now not a distance but an
inner-product. Geometrically the (i,j)th element of Y'Y is approximated in a
k-dimensional representation by Δ(ORi)Δ(ORj)cos(RiORj). The approximation is
again optimal in the least-squares sense (see Section 3.4, below), in that no
other k-dimensional representation will have a smaller sum-of-squares than

    Trace(Y'Y − LΛkL')² = Σ_{i=k+1}^p λ²i

where Λk differs from Λ in having zero diagonal
values λi when i > k.
expressed in terms of sums-of-squares of the original eigenvalues, rather than
their sums as previously. Normally Y' Y is the corrected sample
variance-covariance matrix of X, but when X has been normalised to eliminate
the effects of differing measurement scales by using the transformation (viii)
of Table 2, with rj set to the standard error of the jth variable, y'y will be
the product-moment correlation matrix of X. The inner-products will then
approximate the correlations between the variates, and the distances of each
point from the origin should all approximate the common unit variance. Thus
when examining such plots one should be looking for orthogonal pairs ORi, ORj
(suggesting zero correlation between Xi and Xj) or coincident directions ORb
ORj (unit correlation, but Ri and Rj should coincide and be close to the
desired unit distance from 0). Additional to the usual caveats concerning
caution when interpreting projections, extra caution is needed with
correlations. Correlations have well-defined meanings in linear situations such
as arise when data can be considered approximately multinormal. However this
is rarely the case with ecological samples. It should not be forgotten that
even exact non-linear relationships will not give high correlations, thus the
absence of correlation should not be taken to imply the absence of an
interesting relationship. My advice is that rather than examine plots of LA", it
is often better to examine all the pairwise scatter plots of Xi with xjo
The distances approximated by 4(RiRj) in the plots of LA" when y'y is
a correlation matrix with elements rij. is J2(I-rij). Clearly this form of
analysis may be regarded as an analysis of correlation. However it is
misleading to view the fundamental plots of Components Analysis in this light,
for the Pythagorean distance (1) takes no account whatsoever of the possible
correlations between the variables.
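Identity (4) can be checked directly. The sketch below (NumPy, invented standardised data) forms the coordinates LΛ^{1/2} and confirms that their inner products reproduce the correlation matrix exactly in full dimensionality; truncating to k columns gives the least-squares approximation discussed above.

```python
import numpy as np

# Sketch of Section 3.3: rows of L Lambda^(1/2) as points R_i whose
# inner products approximate Y'Y. Columns are scaled to unit
# sum-of-squares so that Y'Y is the correlation matrix.
rng = np.random.default_rng(2)
X = rng.normal(size=(30, 4))
Y = X - X.mean(axis=0)
Y = Y / np.sqrt((Y**2).sum(axis=0))   # unit diagonal for Y'Y

R = Y.T @ Y                           # correlation matrix
lam, L = np.linalg.eigh(R)
order = np.argsort(lam)[::-1]
lam, L = lam[order], L[:, order]

coords = L * np.sqrt(np.maximum(lam, 0))   # rows are the points R_i

# Full-dimensional inner products reproduce R exactly (identity (4));
# keeping k columns gives the best k-dimensional approximation.
R_full = coords @ coords.T
k = 2
R_approx = coords[:, :k] @ coords[:, :k].T
```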

3.4. The Eckart-Young Theorem

It is instructive to present the algebra of Components Analysis in a


slightly different form to that given above. We know from algebra that any
real rectangular matrix Y may be expressed in its singular value decomposition
form as follows
    Y = UΣV'                                                (5)

where Y is of order (n×p), U is orthogonal (n×n), V is orthogonal (p×p) and Σ
is of order (n×p) with zero terms except on the "diagonal", where
σ1 ≥ σ2 ≥ ... ≥ σs ≥ 0 and s = Min(n,p). The term orthogonal is that usually
used in the current context but more properly the term orthonormal should be
used, to indicate that U'U and UU' are both unit matrices, and similarly for
V. The non-negative quantities σi are termed the singular values of Y. Thus
from (5) Y'Y = VΣ'ΣV' and we may identify the previous orthogonal matrix L
with V and the diagonal matrix Λ with Σ'Σ (i.e. λi = σ²i). Thus the previous
expression for the component scores, YL, may be written as UΣV'V = UΣ. It
follows that the singular value decomposition may be written
Y = (UΣ)V' = (YL)L', simultaneously giving the component scores and loadings.
Further LΛ^{1/2} corresponds exactly with VΣ. The decomposition (5) is
important, for a result proved by Eckart and Young (1936) states that Yr, the
best rank r approximation to Y (i.e. the one that minimises
Σ_{i=1}^n Σ_{j=1}^p (yij − y(r)ij)², where y(r)ij is the (i,j)th element of
Yr), is obtained by replacing Σ by Σr, where Σr is the same as Σ except that
σi = 0 for all i > r. With this change only the first r columns of U and V
are effective. Whereas we may write Y = Σ_{i=1}^s σi ui vi', we have that
Yr = Σ_{i=1}^r σi ui vi', where ui and vi are the vectors that are the ith
columns of U and V respectively. Clearly the residual sum of squares after
fitting Yr to Y is given by Σ_{i=r+1}^s σ²i.
The Eckart-Young theorem, together with the equivalence of (YL)L' to the
singular value decomposition of Y, shows that in Components Analysis the inner
product between the component scores and the plots of L gives an approximation
to the data Y; that is, yij ≈ Δ(OQi)Δ(OYj)cos(QiOYj). Also, because Y'Y is a
symmetric matrix with non-negative eigenvalues, (4) gives its singular value
decomposition, and the Eckart-Young theorem shows why taking the first k
columns of LΛ^{1/2} gives the best k-dimensional approximation to the
correlation, or covariance, matrix.
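The Eckart-Young result is easy to demonstrate numerically. A minimal sketch (NumPy, invented matrix): truncate the singular value decomposition at rank r and check that the residual sum-of-squares equals the sum of the squared discarded singular values.

```python
import numpy as np

# Sketch of the Eckart-Young theorem: the best rank-r least-squares
# approximation Yr comes from truncating the SVD, with residual
# sum-of-squares equal to the squared singular values discarded.
rng = np.random.default_rng(3)
Y = rng.normal(size=(8, 5))

U, sigma, Vt = np.linalg.svd(Y, full_matrices=False)

r = 2
# Yr = sum of the first r terms sigma_i u_i v_i'.
Yr = U[:, :r] @ np.diag(sigma[:r]) @ Vt[:r, :]

residual = ((Y - Yr)**2).sum()
```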

3.5. Duality, Q and R-mode Analyses

One further property of Components Analysis is relevant to later


discussions. Equation (2) may be pre-multiplied by Y to give

    YY'(YL) = (YL)Λ                                         (6)

This shows that YL = M (say) gives the eigenvectors of the n×n matrix YY'
and that diag(Λ) again gives the eigenvalues. Because of the previous
normalisation L'L = I, and from equation (2), the normalisation of the
eigenvectors M is given by M'M = L'Y'YL = L'LΛ = Λ, i.e. the ith column of M
is scaled to have sum-of-squares λi. Thus, finding the eigenvectors of YY' and
scaling them as indicated, the component scores are found immediately; the
vectors L may then be determined by pre-multiplying M by (Y'Y)⁻¹Y'. The
operation on the n×n matrix YY' is sometimes referred to as a Q-technique, as
opposed to the R-technique of operating on the p×p matrix Y'Y. The two
approaches give the same results and should be viewed as alternative methods
of computation. Usually p is much smaller than n so the R-technique will be
preferable.
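The duality is readily verified. The sketch below (NumPy, invented data) runs both techniques and checks that, up to the arbitrary signs of eigenvectors, the Q-technique scores scaled to have column sums-of-squares λi coincide with the R-technique scores YL.

```python
import numpy as np

# Sketch of the Q/R duality of Section 3.5: the R-technique works on
# the p x p matrix Y'Y, the Q-technique on the n x n matrix YY', and
# both yield the same component scores M with M'M = Lambda.
rng = np.random.default_rng(4)
X = rng.normal(size=(9, 3))
Y = X - X.mean(axis=0)                # rank 3 after centring (p = 3)

# R-technique: eigenvectors L of Y'Y, scores YL.
lam_R, L = np.linalg.eigh(Y.T @ Y)
order = np.argsort(lam_R)[::-1]
lam_R, L = lam_R[order], L[:, order]
scores_R = Y @ L

# Q-technique: eigenvectors of YY', scaled so that column i has
# sum-of-squares lambda_i, give the scores directly.
lam_Q, W = np.linalg.eigh(Y @ Y.T)
order = np.argsort(lam_Q)[::-1]
lam_Q, W = lam_Q[order], W[:, order]
k = 3                                  # number of non-zero eigenvalues
scores_Q = W[:, :k] * np.sqrt(np.maximum(lam_Q[:k], 0))

# Eigenvector signs are arbitrary; align one column sign at a time.
signs = np.sign((scores_R * scores_Q).sum(axis=0))
```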
Occasionally, there is no clear distinction whereby the variables can be
associated with one direction (the rows) and the units with another direction
(the columns). Then we may wish to regard the points Pi as p points referred
to n coordinate axes. The best fitting plane then passes through the point
representing the row-means (species means in Table 1) and X has to be
replaced not by (I−N)X but by X(I−P), where P is p×p with all elements equal
to 1/p. This too generates both a Q-technique and an equivalent R-technique
but the distance dij is now defined between columns and not between rows and
will therefore generate a different analysis from the one discussed above.
When the columns refer to well-defined variables, the evaluation of row means
is
invalid, for it implies summing quantities with disparate units of measurement.
Some alleviation of this difficulty can be achieved by normalising the variates
to dimensionless forms, as in transformations (vii) and (viii) of Table 2, but I
do not believe that such transformations are sufficient to legitimise the
process. In Section 7.3 a model is discussed where rows and columns have
equal standing.
Legendre and Legendre (1983) suggest that the mode of sampling may be
used to distinguish the units from the variables. The sampling units are then
the units of a Components Analysis and the descriptors of the samples are the
variables. Thus we have to consider carefully how a table, like Table 1, has
been compiled. There are three possibilities:
(i) individual plants are sampled, in which case species
name would be one (categorical) variable, pH a
quantitative variable etc. When all variables are
quantitative we have the classical set-up for
Components Analysis; when all variables are categorical
we have the classical set-up for Multiple
Correspondence Analysis (Section 8.2), which can also
handle a mixture of both types of variable;
(ii) the sites are sampled, in which case presence/absence
or abundance of the several species, or indeed other
categorical or quantitative descriptors of the sites,
might be regarded as the variables;
(iii) the species are sampled, in which case the properties of
the species, including the sites at which they occur,
could be regarded as the variables.
In my view sampling considerations might be useful as a guide but are
not decisive. The essential thing to consider is whether or not the distance
given by (1) is sensible and if it is, whether it is interesting. With Table 1,
the distance between rows gives a measure of difference between species-pairs
based on their propensity to take advantage of different nutrients. The
distance between columns measures differences between nutrients based on the
responses of the different species. In this case both Q and R techniques
might be of interest.

3.6. An Example of Components Analysis

We have already focussed on the difficulty that Table 1 is not in the
fundamental form required for a Components Analysis, and this seems true of
many ecological data-matrices. To be precise, Table 1 is a two-way table of a
single quantitative variable (relative abundance) classified by two
categorical variables (site and species). Viewed as a data-matrix it refers to
only three variables (p=3) and two of these are categorical and therefore
cannot be handled by Components Analysis, although they might be by other
methods (see Section 8.2, below). We may, however, proceed by treating the
table as if it were a data-matrix, either by treating the species as variates,
which implies a Pythagorean distance between sites, or by treating the sites
as variables, which implies a Pythagorean distance between species.

Figure 3. Components Analysis of log-abundance data from Park Grass (see
Table 1). The first two axes are shown, accounting for 40.4% and 12.6% of the
total dispersion. The percentages for the first four axes are: 40.4, 12.6,
11.1 and 9.6. The underlined species are the six dominant grass species.
Points representing species other than grasses are not named.

The latter has been done using the data of Table 1, first transformed to
logarithms of relative abundance and then with row and column means removed
(see Table 2 (iii)). This transformation reduces the effect of the more
abundant species, which would otherwise dominate the analysis. The space of
the first two components is given in figure 3. The names of the most
abundantly occurring grasses have been underlined. The two-dimensional
space accounts for only 53% of the total dispersion but a third dimension
increases this to 64%. This third dimension is given in figure 4, where it is
seen that only Festuca rubra contributes significantly to the enlarged space.
Figure 3 may be converted into a biplot by superimposing the vectors given in
figure 5.

Figure 4. As for figure 3 but showing first and third principal axes and
only the six dominant grass species underlined in figure 3.

Figure 5. Biplot, to augment figure 3. For simplicity, only six of the 38
variates (i.e. field-plot treatments) are shown. Before projection,
all six vectors are of equal length and the labelled points form
the vertices of a regular simplex. After projection, as in the
figure, considerable distortion has occurred. The letters a,b,c,d
are explained in the caption to figure 6.

The directions given in figure 5 refer to sites but, because in the Park Grass
experiment sites receive fertilizer treatments, it is more informative to label the
vectors by the treatment names. We note that plots with treatment N2 PK and
liming seem to be associated with Arrhenatherum elatius and Alopecurus
pratensis while recent liming is associated with Holcus lanatus. Unmanured
plots are most closely associated with Festuca rubra of the dominant grasses
and with herbaceous species that are unnamed in the figures. The direction of
the first component is associated with increased abundance of species per plot
so that the effect of liming and fertilizers is to decrease the number of
species
and increase productivity. This latter point may be examined in more detail by
doing a Components Analysis on the sites. This still uses logarithm of relative

abundance but Pythagorean distance is now defined between sites rather than
between species; it should be recalled that the two forms of analysis are not
simply related. In figure 6 the points plotted refer to sites and these have
again been labelled by their treatments and joined in pairs by directed
lines indicating those sites with increasing levels of liming. It can readily be
seen that pH increases in a roughly NE/SW direction (in the figure, not in the
field). Also plotted on figure 6 are contours defining regions of increasing
biomass (dry matter in tonnes per hectare). Productivity increases in a
direction roughly running from SE to NW. This interpretation is that of Digby
and Kempton (1986) and indicates how Components Analysis may be usefully
enhanced by adding relevant information not directly used in the analysis.


Figure 6. Principal components of the 38 sites. Production increases with
increasing nutrients and generally with increasing applications of lime.
------------ joins sites with the same fertilizer treatments
and four levels of liming.
----------- joins sites with the same fertilizer treatments
and two levels of liming.
b. Limed every fourth year plus a boost in 1965.
a. Limed every fourth year.
c. Unlimed except for boost in 1965.
d. Unlimed.
N,P,K are the usual nutrients; suffices refer to increasing
levels of application and + to additional nutrients.

4. MEASURES OF DISSIMILARITY, DISTANCE AND SIMILARITY

Suppose a Components Analysis were done not on a table of quantitative values


like those of Table 1 but on presence/absence data. Let 1 denote the presence
of a species at a site and 0 its absence; then the squared distance between
species i and j is given by bij + cij, where bij is the number of sites
containing species i but not j and cij is the number of sites containing
species j but not i. Denoting the number of sites containing both species by
aij and the number of sites containing neither species by dij (note that dij
in this section is not now a distance), Table 3 may be constructed.

Table 3. The numbers of co-occurrences of species i and j at p sites.

                              Species j
                        present     absent      Total

  Species i  present      aij         bij         xi
             absent       cij         dij       p − xi

             Total         xj       p − xj        p

The quantity (aij + dij)/p is termed the simple matching similarity
coefficient between species i and j, because it expresses the proportion of
0/0 and 1/1 matches for these two species. A similarity coefficient takes
values between zero and unity and is unity only when both species have the
same pattern of occurrences at all sites. A zero value generally indicates no
relevant co-occurrences, as when aij = dij = 0. Thus a Components Analysis of
0/1 data is equivalent to assuming a distance proportional to √(1−sij), where
sij is the simple matching coefficient. This is often a perfectly sensible
choice of distance. Even the superposition of the axes yi still carries some
useful information. The only data-points that can occur on the xi-axis are
those for 0 and 1, the latter representing a rare species occurring only at
site i. Taking this as the point Xi, in the notation of the previous section,
it will project into a point Yi as before. The other points on the yi-axis
seem to have little meaning. The plotting of correlations is also of dubious
value.

4.1. Coefficients for Binary Variables

The ecological difference between comparing the absence of a species at


two sites and that of comparing the presence of a species that might occur in
two forms has already been noted. Thus the simple matching coefficient, which
includes the term dij, may be unacceptable and other coefficients that exclude
dij might be required. There are very many coefficients of both kinds; some of
their properties are discussed by Gower
and Legendre (1986), where further references may be found. Table 4 lists
just a few of these coefficients that have found application in ecology; the
suffices i and j have been dropped to help readability.

Table 4. Some typical binary similarity coefficients.

  Similarity coefficient                        Name

  (a + d)/p                                     Simple Matching

  a/(a + b + c)                                 Jaccard

  2a/(2a + b + c)                               Sørensen

  a/√((a + b)(a + c))                           Ochiai

  (ad − bc)/√((a + b)(a + c)(d + b)(d + c))     Pearson's φ

Similarity coefficients Sij calculated between all pairs of n species may be


arranged into a symmetric matrix S with unit diagonal. Dissimilarity is merely
the complement 1-Sij of similarity and these may similarly be arranged into a
symmetric matrix but with zero diagonal. The question then arises whether or
not the dissimilarities may be regarded as Euclidean distances, for if they can
be we have a set-up, similar to that of Principal Components Analysis, in which
the samples are represented by a cloud of points Pi (i = 1,2,...,n) but where
dij = Δ(PiPj) now represents a dissimilarity, or perhaps some function such as
the square root of dissimilarity, rather than Pythagorean distance. The answer
is that sometimes we can, in which case the points are said to be imbedded in
a Euclidean space, and sometimes we cannot. When we cannot the
dissimilarities may nevertheless be metrics; that is the triangle inequality
dij + dik ≥ djk holds for all triplets (i,j,k). The metric property is weaker
than that of Euclidean distance: all distances are metrics but not all metrics
are distances. When the triangle inequality is valid for all triplets then all
the triangles may be drawn, but higher dimensional Euclidean representations
need not exist. This is most easily seen by considering a tetrahedron whose
base ABC is an equilateral triangle with side 2 units and whose apex D is
equidistant d units from A, B and C. When d=1 all triangle inequalities are
valid (with equality except for ABC) and D has to lie simultaneously at the
mid-points of AB, BC and AC. This is clearly impossible in a Euclidean space.
As d increases D moves away from the mid-points but must still occupy three
positions simultaneously, until a true Euclidean representation occurs when
d = 2/√3 and D coincides with the centroid of ABC. As d increases further, D
moves out of the plane of ABC to give a normal three-dimensional Euclidean
representation of the tetrahedron.
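The tetrahedron argument can be checked numerically. In the sketch below (NumPy, with d = 1), every triangle inequality holds, yet the doubly centred matrix of −½d²ij values, of the kind used later for Principal Coordinates Analysis, has a negative eigenvalue, so no Euclidean embedding exists.

```python
import itertools
import numpy as np

# Tetrahedron example: equilateral base ABC with side 2, apex D at
# distance delta = 1 from A, B and C. Metric, but not Euclidean.
delta = 1.0
D = np.array([[0.0,   2.0,   2.0,   delta],
              [2.0,   0.0,   2.0,   delta],
              [2.0,   2.0,   0.0,   delta],
              [delta, delta, delta, 0.0]])

# Triangle inequality for every ordered triple of the four points.
metric = all(D[i, j] + D[i, k] >= D[j, k]
             for i, j, k in itertools.permutations(range(4), 3))

# Doubly centred matrix of -1/2 d_ij^2; a Euclidean embedding exists
# exactly when this matrix is positive semi-definite.
n = 4
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D**2) @ J
eigenvalues = np.linalg.eigvalsh(B)
euclidean = eigenvalues.min() > -1e-9
```

Raising delta to 2/√3 or beyond makes the smallest eigenvalue non-negative, matching the geometric description above.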

Consider the similarity coefficients

    Tθ = a / (a + θ(b+c))      and      Sθ = (a+d) / (a+d+θ(b+c)),      θ ≥ 0.

The family Tθ excludes negative-matches and the family Sθ includes them.
Many coefficients commonly used in ecology are defined for specific values of
θ (e.g. θ = 1, 1/2, 2). It can be shown (Gower and Legendre 1986) that as θ
increases from zero, the dissimilarity coefficients 1−Tθ, √(1−Tθ), 1−Sθ and
√(1−Sθ) pass thresholds θM and θE at or above which the coefficients always
give, respectively, metric and Euclidean dissimilarity matrices. The explicit
results are as follows:

    Coefficient        θM            θE

    1−Tθ               1
    √(1−Tθ)            1/2
    1−Sθ
    √(1−Sθ)            1/3           1

These bounds define regions where the matrices are always metric or always
Euclidean. They do not imply, for example, that when θ < 1/3 all matrices of
√(1−Sθ) fail to be metric or Euclidean; the only claim is that matrices cannot
be guaranteed metric when θ < θM and cannot be guaranteed Euclidean when
θ < θE.
Another interesting property of both families arises from noting that if
Sθ(i,j) ≥ Sθ(k,l) then (bij+cij)/(aij+dij) ≤ (bkl+ckl)/(akl+dkl), and
conversely. The converse implies that Sφ(i,j) ≥ Sφ(k,l) for any positive value
of φ. In other words the ordering induced by the coefficients Sθ does not
depend on θ. The same is true for the family Tθ.
The import of this result is that for many of the classical ordination
techniques discussed in Section 5, Euclidean dissimilarity is a desideratum,
though perhaps not essential, and the coefficient should be chosen
accordingly. However for those ordination techniques that use only ordinal
information on dissimilarities (see Section 6), all choices among Sθ and all
choices among Tθ are exactly equivalent. This result implies either that
discussions about the relative merits of coefficients that are members of the
same family are futile or that ordinal methods are sacrificing crucial
information. It is demonstrated below (Section 9.2) that choosing coefficients
from different families seems to give bigger differences than choosing an
ordinal rather than a non-ordinal method.
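The invariance of the ordering can be demonstrated directly. A small sketch (plain Python, invented 2×2 counts): the ranking of species pairs by Sθ is identical for every positive θ tried, which is what makes all choices within the family equivalent for ordinal methods.

```python
# Sketch of the monotonicity property: changing theta rescales but
# never reorders the coefficients S_theta, so ordinal ordination
# methods are unaffected by the choice of theta within the family.
def s_theta(a, b, c, d, theta):
    return (a + d) / (a + d + theta * (b + c))

# Four invented species pairs, each with its own (a, b, c, d) counts.
pairs = [(5, 1, 2, 4), (3, 3, 3, 3), (6, 0, 1, 5), (2, 4, 4, 2)]

def ranking(theta):
    # Indices of the pairs sorted by increasing similarity.
    return sorted(range(len(pairs)), key=lambda i: s_theta(*pairs[i], theta))

# The induced ordering is identical for every positive theta tried.
orders = {tuple(ranking(theta)) for theta in (0.5, 1.0, 2.0, 5.0)}
```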

Table 5. Number of joint occurrences of species.

   1. Agrostis        29
   2. Alopecurus      22   30
   3. Anthoxanthum    28   27   35
   .
  19. Ranunculus      16   18   19   ...   19
  20. Taraxacum       23   28   28   ...   19   31

                       1    2    3   ...   19   20

To calculate similarity based on binary variables is a trivial matter, but it
is worth noting that the information given by all n(n−1)/2 tables like Table 3
can be assembled into a single symmetric matrix A, as in Table 5, whose
off-diagonal values are aij and whose diagonal values are xi. Thus the ith
diagonal element of A contains the number of sites with species i. We then
have that bij + cij = xi + xj − 2aij and dij = p + aij − xi − xj, which allow
all coefficients in the families Sθ and Tθ to be calculated.
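This device is a one-liner with a 0/1 data matrix. A sketch (NumPy, invented species-by-sites matrix Z): A = ZZ' holds every aij off the diagonal and every xi on it, and the two identities above recover b + c and d.

```python
import numpy as np

# Sketch of the Table 5 device: assemble all pairwise counts from a
# single symmetric matrix A = Z Z' built from 0/1 data.
rng = np.random.default_rng(5)
Z = (rng.random((6, 10)) < 0.5).astype(int)   # 6 species x 10 sites
p = Z.shape[1]

A = Z @ Z.T                  # a_ij off the diagonal, x_i on the diagonal
x = np.diag(A)

i, j = 0, 1
a = A[i, j]
b_plus_c = x[i] + x[j] - 2 * a               # b_ij + c_ij
d = p + a - x[i] - x[j]                      # d_ij
```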

4.2. Coefficients for Quantitative and Qualitative Variables

So far only similarities involving binary variables have been discussed.


The remainder of this section covers briefly how to deal with multi-level
qualitative and with quantitative variables, and how to combine information on
different kinds of variable.
To evaluate the similarity between unit i and unit j, based on a set of
multi-level qualitative variables, is a simple extension of the simple matching
coefficient. One merely has to evaluate the proportion of matches to
comparisons. Thus a variable with levels Red, Green, Yellow and Blue would
score a match if both units were the same colour, else not. The problem of
negative matches is not relevant because all colour-levels have equal logical
status. This need not always be the state of affairs as when a colour, White
(say), may signify a lack of a gene controlling colour. When negative matches
of this kind are recognised they may be easily disregarded in the manner of
the Jaccard coefficient. A more specifically ecological example might be one in
which the levels represent not colours but different kinds of disease, with
negative matches generated by pairs of samples with no disease. Ordered
categorical variables get no special treatment with this method. They may be
handled as quantitative variables (see the next paragraph) either by treating
ordinal numbers as if they are cardinal numbers or by assigning optimal
scores (see Section 8.2), a method available for all qualitatively defined
categories, or even categories with quantitative boundaries.

Table 6. Some typical quantitative dissimilarity coefficients.
For the Minkowski metric the quantities rk are normalisers
introduced to eliminate the effects of differing units of
measurement.

  Dissimilarity coefficient                               Name

  dij = { Σ_{k=1}^p ( |xik − xjk| / rk )^m }^{1/m}        Minkowski metric, with special
                                                          cases (i) m=1, rk=1: City-block or
                                                          Manhattan metric and (ii) m=2,
                                                          rk=1: Pythagorean distance

  dij = Σ_{k=1}^p |xik − xjk| / Σ_{k=1}^p (xik + xjk)     Bray-Curtis

  dij = Σ_{k=1}^p |xik − xjk| / (xik + xjk)               Canberra metric

Table 6 lists a few of the many suggestions that have been made for defining
dissimilarities when all variables are quantitative. Once again the question
arises as to whether or not n×n matrices of these coefficients are metric,
Euclidean or neither. An additional complication is whether or not to admit
negative values for xij. Although negative quantities are rarely, if ever,
observed in ecology, they may easily arise as the result of preliminary
transformations such as some of those listed in Table 2. Gower and Legendre
(1986) list the properties of ten different coefficients defined for quantitative
variables, both when negative values are allowed and when they are not.
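The Table 6 coefficients are straightforward on non-negative abundance data. A sketch (NumPy, invented vectors; the zero-denominator convention for Canberra is my assumption, not stated in the text):

```python
import numpy as np

# Sketch of the Table 6 coefficients on two invented non-negative
# abundance vectors for species i and j across four sites.
xi = np.array([3.0, 0.0, 5.0, 2.0])
xj = np.array([1.0, 2.0, 4.0, 2.0])

bray_curtis = np.abs(xi - xj).sum() / (xi + xj).sum()

# Canberra: terms with x_ik = x_jk = 0 are dropped here to avoid 0/0
# (a common convention, assumed rather than taken from the text).
nz = (xi + xj) > 0
canberra = (np.abs(xi - xj)[nz] / (xi + xj)[nz]).sum()

manhattan = np.abs(xi - xj).sum()             # Minkowski, m = 1, rk = 1
pythagorean = np.sqrt(((xi - xj)**2).sum())   # Minkowski, m = 2, rk = 1
```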

4.3. Similarity with Different Kinds of Variables

In practice, a combination of binary, qualitative and quantitative variables
often is encountered. Gower (1971a) suggested that this could be readily
handled by assigning a weight wijk and a score sijk to the comparison
between units i and j for the values of the kth variable. Specifically,
similarity is defined by:

    Sij = Σ_{k=1}^p sijk wijk / Σ_{k=1}^p wijk              (7)

Table 7. Examples of the scoring and weighting systems.

        Values
  Unit i      Unit j      Score              Weight
   xik         xjk         sijk               wijk

    +           +           1                 1 (Simple Matching, Jaccard);
                                              2 (Czekanowski)
    +           −           0                 1 (Simple Matching, Jaccard
    −           +           0                   and Czekanowski)
    −           −           1                 1 (Simple Matching);
                                              0 (Jaccard, Czekanowski)
    A           A           1                 1 (but 0 if AA is recognised as
                                                a negative match)
    A           B           0                 1 (categorical values)
   xik         xjk    1 − |xik−xjk|/rk        1 (quantitative values)
    *           *           0                 0 (missing values)

Normally Wijk=l, unless double-negatives are to be excluded, or data are


missing on the kth variable for either or both units, in which case Wijk=O.
The metric and Euclidean properties of dissimilarity coefficients are established
only for complete data; when values are missing there is no guarantee that the
results remain true and it is known that often they do not. Table 7 gives the
weights and scores for some of the coefficients discussed above.
From Table 7 it is evident that not only can binary, qualitative and
quantitative variables be combined in the one coefficient but also that each of
the p variables may, if desired, be weighted differently. Thus some binary
variables may be treated in Jaccard form, others in Simple Matching form and
others in Sørensen form, and so on. With qualitative variables, some may
recognise the possibility of eliminating negative matches and others not. Some
quantitative variables may be handled in City-block form and others as
Pythagorean. Note that for quantitative variables the scaling rk must be
chosen sufficiently large for sijk never to become negative; the sample range
or the population range are two valid choices.
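Coefficient (7) with the Table 7 scores and weights can be sketched as follows. This is an illustrative implementation only: the record names (unit_i, size_range) are invented, None marks a missing value, and the binary variable is handled in Jaccard form (double negatives given weight zero).

```python
# Sketch of Gower's general similarity coefficient (7) for one pair of
# units mixing a binary, a qualitative and a quantitative variable.
# All names here are invented for illustration.
unit_i = {"present": 1, "colour": "red", "size": 4.0}
unit_j = {"present": 0, "colour": "red", "size": None}   # size is missing
size_range = 10.0    # r_k, large enough that the score cannot go negative

scores, weights = [], []

# Binary variable, Jaccard form: a 0/0 comparison gets weight 0.
bi, bj = unit_i["present"], unit_j["present"]
if bi == 0 and bj == 0:
    scores.append(1.0); weights.append(0.0)
else:
    scores.append(1.0 if bi == bj == 1 else 0.0); weights.append(1.0)

# Qualitative variable: score 1 on a match, 0 otherwise.
scores.append(1.0 if unit_i["colour"] == unit_j["colour"] else 0.0)
weights.append(1.0)

# Quantitative variable: s_ijk = 1 - |x_ik - x_jk| / r_k, unless missing
# (missing values get weight 0, as in the last row of Table 7).
if unit_i["size"] is None or unit_j["size"] is None:
    scores.append(0.0); weights.append(0.0)
else:
    scores.append(1.0 - abs(unit_i["size"] - unit_j["size"]) / size_range)
    weights.append(1.0)

similarity = sum(s * w for s, w in zip(scores, weights)) / sum(weights)
```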

5. METRIC SCALING

In Section 3 it was demonstrated that Principal Components Analysis is an


ordination method that, by minimising a residual sum-of-squares, projects a
cloud of points Pi in a Euclidean space, whose interdistances are Pythagorean,
to give a k-dimensional approximation. In Section 4 it was shown that
dissimilarity between units can be defined in many ways, some at least of
which give Euclidean distances which, as with Components Analysis, may be
imagined as being generated by distances within a cloud of points Pi. Just as
in Components Analysis these distances may be approximated by distances
between points Qi that are projections of the Pi onto k dimensions. This is
the model behind Principal Coordinates Analysis (PCO), alias Classical
Scaling. Thus Principal Coordinates Analysis generalises Components Analysis
to handle any Euclidean distances, not necessarily Pythagorean, derived from
the data X. It will be seen that even the Euclidean assumption may be relaxed.
Thus the starting point is a set of distances or dissimilarities, dij,
arranged in a symmetric matrix which will have a zero diagonal. It turns out
to be more convenient to consider the matrix D defined to contain values
(−½d²ij). Then it can be shown (see e.g. Gower (1984a)) that coordinates Y
that generate the distances dij, and which are referred to their principal
axes, may be obtained from the spectral decomposition of

    (I−N)D(I−N) = YY'                                       (8)

where the eigenvectors Y are scaled such that Y'Y = Λ, the diagonal matrix of
the eigenvalues of the left-hand side of (8). Compare (8) with (6); the
relationship is exact when dij is Pythagorean distance defined from the
data-matrix X; indeed in that case, and only in that case, component loadings
may be calculated from (Y'Y)⁻¹Y', as previously described. With general
distances dij, the principal coordinates in k dimensions Qi (i = 1,2,...,n)
are given by the n rows and first k columns of Y. Using the matrix N in (8)
ensures that the coordinates lie in a plane containing the centroid, as
required by Huygens' principle. This is easily checked by pre- and
post-multiplying (8) by vectors of ones to give (1'Y)² = 0, and noting that
1'Y = 0 is the condition for each axis to be centred at the centroid.
The reason that the decomposition (8) has the desired effect is that any symmetric matrix A may be written in the form A = YY' and Y will be real when A is positive semi-definite. If now the rows of Y are treated as coordinates of points, the squared distance between the ith and jth points is given by aii + ajj - 2aij, which when A is the left-hand side of (8) becomes d²ij as desired. When (8) is replaced by (I-N)S(I-N), where S is a similarity matrix with unit diagonal, the decomposition gives squared-distances 2(1 - sij), so that distance is then proportional to the square root of dissimilarity. This shows that Principal Coordinates Analysis may operate directly on a similarity matrix to give ordinations that approximate a perfectly acceptable distance.
When the dij are not Euclidean distances some of the eigenvalues of Λ = diag(λ1, λ2, ..., λn) will be negative and the scaling Y'Y = Λ is not achievable in real numbers. When only some of the smaller eigenvalues are negative the k-dimensional solution will still be real and there is little problem (see Sibson (1979) for a precise analysis of this situation). When the larger eigenvalues are negative and have to be retained, we may appeal to the Eckart-Young theorem for justification. The resulting ordination is not Euclidean but may nevertheless give a useful approximation. However it should be realised that whenever negative eigenvalues occur, the points Pi cannot have a Euclidean representation and the least-squares rationale based on projections is invalid. The justification via the Eckart-Young theorem is based on a different least-squares criterion that coincides with the projection criterion only when no eigenvalues are negative.
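In code, the decomposition (8) amounts to a double centring followed by an eigendecomposition. The following is a minimal illustrative sketch, not from the original text; numpy is assumed to be available, and negative eigenvalues are simply clipped to zero so that the retained axes stay real:

```python
import numpy as np

def principal_coordinates(D, k=2):
    """Classical scaling (PCO): embed an n x n matrix of distances D
    in k dimensions via the spectral decomposition (8) of the doubly
    centred matrix of values -1/2 d_ij^2."""
    n = D.shape[0]
    A = -0.5 * D ** 2                    # entries -1/2 d_ij^2
    J = np.eye(n) - np.ones((n, n)) / n  # centring matrix I - N
    B = J @ A @ J                        # (I-N)D(I-N) of equation (8)
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1]    # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # scale so that Y'Y = Lambda; negative eigenvalues are clipped
    Y = eigvecs[:, :k] * np.sqrt(np.maximum(eigvals[:k], 0.0))
    return Y, eigvals

# Four collinear points at 0, 1, 2, 3: a single positive eigenvalue,
# and the first coordinate axis reproduces the distances exactly
x = np.array([0.0, 1.0, 2.0, 3.0])
D = np.abs(np.subtract.outer(x, x))
Y, eigvals = principal_coordinates(D, k=2)
```

For genuinely Euclidean input, as here, all but the leading eigenvalues vanish and the recovered coordinates reproduce the given distances.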

[Figure: two-dimensional Principal Coordinates ordination; the first axis, PCO-I, accounts for 33.3%.]

Figure 7. Principal Coordinates Analysis of association between species of moss (Yarranton, 1966). The lettered regions denote species-groups identified by Yarranton. Because the analysis generates negative eigenvalues, goodness-of-fit is indicated in terms of squared eigenvalues (see text).

Figure 7 gives the result of a Principal Coordinates Analysis of data on associations between species of moss (Yarranton, 1966), where the association coefficient dij is based on nij, the number of times that species i is sampled as a nearest neighbour of species j. Digby & Kempton (1986) give further details but the effective measure of association is dij = log nii + log njj - 2 log nij. The diagram is very like those of Principal Components Analysis but no information on variables is included, nor does it exist. The two-dimensional fit satisfactorily reproduces the species-groups recognised by Yarranton and it suggests a gradient of increasing shade and moisture. The biggest and smallest eigenvalues of the Principal Coordinates Analysis are 23.4, 16.5, 15.6, 11.3, ..., -2.1, -2.7, -2.9, -3.7. Goodness-of-fit is hence expressed in terms of the squares of the singular values (in this case the same thing as the squares of the eigenvalues) and the percentage sums-of-squares fitted are then 33.3, 16.5, 14.8, 7.8, ..., 0.3, 0.4, 0.5, 0.8.

Other forms of metric scaling have found some application in ecology. In these methods the objective is once again the basic one of approximating given dissimilarities dij by Euclidean distances δij arising from a set of points Qi (i=1,2,...,n). Now however there is no appeal to the notion of projections and hence no implicit assumption that the distances dij have a Euclidean representation. The two most important of the criteria that are used to estimate the Qi are:
    Stress  = Σ wij (dij - δij)²
    Sstress = Σ wij (d²ij - δ²ij)²
which have to be minimised. The quantities wij are assumed to be given weights, commonly unity. Thus the objective is to find points Qi in some specified number, k, of dimensions. A very brief introduction to the properties of such methods and the algorithms used to minimise Stress and Sstress is given by Gower (1984a). De Leeuw, this volume, gives a more detailed discussion (including the possibility of transforming the values dij) and examples. The ordination methods based on eigenstructure have associated with them excellent computer algorithms and well-understood mathematical properties, the most convenient of which are: (i) solutions for k=1,2,...,K are nested and can be computed in one pass; and (ii) the number of local minima is exactly p and they are associated with the eigenvalues. Methods based on Stress and Sstress are much less well-understood mathematically. Iterative computer algorithms are continually improving but the mathematical fact that solutions are not nested and the lack of information on the occurrence of local optima are a problem. Thus two- and three-dimensional solutions (etc.) have to be recalculated ab initio, and the calculations have to be repeated with different starting configurations to protect against accepting sub-optimal solutions. It is not even known whether two solutions both close to optimal necessarily arise from similar configurations. It is known that in situations analogous to those that occur when Metric Scaling has some near-equal eigenvalues, different configurations can give similar optima, but it is not known whether this is a more general problem. The methods are nevertheless of great interest and have certain advantages that include (i) easy accommodation of missing values by merely omitting the relevant terms from the summations in the stress criteria, (ii) the easy ability to handle weights, (iii) the more robust nature of Stress, which operates on distances rather than squared distances, and (iv) the possibility of transforming the values dij (to be discussed below in Section 6).
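To make the iterative character of these methods concrete, here is a deliberately crude sketch of Stress minimisation; it is not any of the published programs, and plain fixed-step gradient descent with unit weights (and numpy) are assumptions made purely for illustration:

```python
import numpy as np

def stress(X, D, W):
    """Stress = sum over pairs i<j of w_ij (d_ij - delta_ij)^2."""
    delta = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    iu = np.triu_indices(len(D), k=1)
    return np.sum(W[iu] * (D[iu] - delta[iu]) ** 2)

def minimise_stress(D, k=2, steps=500, lr=0.01, seed=0):
    """Fixed-step gradient descent on Stress from a random start;
    published programs use far cleverer update schemes."""
    rng = np.random.default_rng(seed)
    n = len(D)
    W = np.ones_like(D)
    X = rng.normal(size=(n, k))
    for _ in range(steps):
        diff = X[:, None, :] - X[None, :, :]
        delta = np.linalg.norm(diff, axis=-1)
        np.fill_diagonal(delta, 1.0)          # avoid dividing by zero
        coef = W * (D - delta) / delta
        np.fill_diagonal(coef, 0.0)
        grad = -2.0 * (coef[:, :, None] * diff).sum(axis=1)
        X -= lr * grad
    return X

# Recover the corners of a unit square from their distances
pts = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], float)
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
X = minimise_stress(D, k=2)
```

Note that the solution depends on the random start, which is exactly why the text recommends repeating the calculation from several configurations.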

The weights wij may be provided externally or they may be chosen as functions of dij or, in fancy versions, as functions of the fitted distances δij. When long distances are to be represented accurately wij might be set to dij or d²ij. When local distance is felt to be important wij might be set to the inverse of dij or the inverse of d²ij. The choice wij = dij⁻¹ with the Stress criterion has been termed Non-Linear-Mapping (Sammon, 1969) while the choice of wij = dij⁻⁶, with the Sstress criterion, is approximately that of Parametric-Mapping (Shepard and Carroll, 1966). However, in the latter case the criterion was originally expressed in correlational form, which gives an alternative formulation of sums-of-squares criteria in most instances (see Gower, 1984a).
The effect of the transformation on the left-hand-side of (8) is to generate a matrix B = (I-N)D(I-N). When the dij are Euclidean the geometrical interpretation of bij is that it is equal to Δ(OPi)Δ(OPj)cos(PiOPj). The Eckart-Young theorem is then concerned with finding values b̂ij that minimise
    Strain = Σ wij (bij - b̂ij)²
where wij = 1. The general form of Strain, with weights, summation over a selection of elements and transformation of bij may be accommodated within a similar framework to that of Stress and Sstress.
The special case where the values of dij are held in a pxq array may also
be accommodated in the Stress/Sstress framework by regarding the array as a
corner of a complete (p+q)x(p+q) symmetric array with the pxp and qxq
sections missing. Summation in the criteria then occurs only over the
non-missing portion of the whole array, to give coordinates for the p rows and
q columns. This technique is termed Multidimensional Unfolding and is
described in detail by Heiser, this volume.

6. NON-METRIC MULTIDIMENSIONAL SCALING

In the previous section the possibility was mentioned of allowing transformation of the elements dij in the definitions of Stress and Sstress.
When such transformations are monotonic, the class of methods so defined is termed Non-metric Multidimensional Scaling. It may be thought that some particular choice of similarity or dissimilarity might not give particularly satisfactory numerical information but that the ordinal values are more reliable. We have already seen in Section 4 that the families Sθ and Tθ of similarity coefficients are both monotonic in θ and this might encourage us to seek an ordination that is independent of any particular choice of θ. Suppose we have a putative solution of coordinates Y that generate fitted values δij; then we may plot dij against δij, as in figure 8.

[Figure: scatter of observed dissimilarity (vertical axis) against fitted distance (horizontal axis), with a monotone step-line through the points.]

Figure 8. Monotonic regression of dij on δij, showing a typical point (δij, dij), the value (δ̂ij, dij) fitted by the monotonic regression, and the corresponding residual (δij - δ̂ij), which is a single contribution to the minimised Stress criterion.

The relationship between dij and δij is not exactly monotonic, so a best-fitting monotonic regression of dij against δij has been plotted as in figure 8. In this regression we are especially interested in the residuals from the monotone line parallel to the δij-direction. Corresponding to the point (δij, dij) is the value (δ̂ij, dij) fitted by the monotone regression, so the relevant residual is δij - δ̂ij and the quantity to be minimised is Σ(δ̂ij - δij)², which is the modified form of Stress often used with monotonic transformations. Weights may be introduced if desired and, by replacing dij, δij and δ̂ij by d²ij, δ²ij and δ̂²ij, a modified form for Sstress can be found. By defining the residuals from a monotonic regression, it is clear from examining figure 8 that the modified forms of Stress and Sstress are invariant to monotonic transformations of the

dij which involve stretchings orthogonal to the residuals. The computational problem is one of fitting the monotonic regression, so that modified Stress or Sstress can be calculated, and of iteratively adjusting the current version of Y to improve the fit. A good introductory account of the methodology is given by Kruskal and Wish (1978); Carroll and de Leeuw, this volume, give further details, extensions and examples. Internationally available computer programs for the general class of Non-metric Multidimensional Scaling include KYST (Kruskal, Young, Shepard and Torgerson), ALSCAL (de Leeuw, Takane and Young), MULTISCALE (Ramsay), MINISSA (Guttman, Lingoes and Roskam) and the programs of the Gifi System (de Leeuw 1984) - see Kruskal and Wish (1978) for addresses of software distributors. These programs embrace a variety of variants of criteria and algorithms and all embody a great amount of experience. For example MULTISCALE is based on a stochastic model whose parameters are estimated by maximum likelihood and which models monotonic regression in terms of B-splines; MINISSA assesses monotonicity through a criterion which is of correlational form and which replaces δij by a fitted value termed the rank image transformation (Guttman 1968).
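The monotone-regression step at the heart of all these programs can be illustrated with the classical pool-adjacent-violators algorithm. This is a sketch for unit weights only; the function name is an invention for the illustration:

```python
def monotone_regression(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit to y
    (unit weights).  Each block stores [sum, count]; adjacent blocks
    are pooled while their means violate the ordering."""
    blocks = []
    for v in y:
        blocks.append([v, 1])
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

# Observed dissimilarities taken in order of increasing fitted distance
print(monotone_regression([1.0, 3.0, 2.0, 4.0]))  # [1.0, 2.5, 2.5, 4.0]
```

The out-of-order pair (3.0, 2.0) is pooled to its mean 2.5, giving the step function whose residuals contribute to the modified Stress.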

[Figure: two-dimensional KYST ordination with labelled points for the grasses Holcus lanatus, Poa pratensis, Agrostis tenuis, Arrhenatherum elatius, Poa trivialis, Festuca rubra, Alopecurus pratensis, Dactylis glomerata, Helictotrichon pubescens and Anthoxanthum odoratum, together with unlabelled plot points.]
Figure 9. Non-metric Scaling (KYST) of Park Grass abundance data, based on
the same Pythagorean distance as used in the Components
Analysis shown in figure 3.

[Figure: monotone regression plot of observed dissimilarity (vertical axis) against fitted distance (horizontal axis).]

Figure 10. Monotonic regression for best-fitting two-dimensional representation of the Park Grass data (Table 1) given by KYST.

Figure 9 illustrates a Non-metric Scaling, using KYST, of the abundance data of Table 1 based, as before, on Pythagorean distance of the log-transformed values. The plot is very similar to the Components Analysis of the same data (figure 3), which is not very surprising as figure 10 shows that in this case the monotonic transformation between dij and δij differs little from being linear.

7. MISCELLANEOUS METRIC METHODS - Closure, Horseshoes, Multiplicative Models, Asymmetry and Canonical Analysis

The methods discussed in this section are described as miscellaneous not because they are unimportant but because they do not readily fit into the framework of this presentation; they nevertheless deserve mention.

7.1. Closure in Components Analysis

In many applications the rows of the data-matrix may be constrained to sum to unity (or 100 per cent). For example in geology one may have the proportions of various minerals found in a sample of ore, and in ecology the proportion of a species occurring at a set of sites or the proportion of each of several species occurring at a site. Thus typically we have Σ_{j=1}^{p} xij = 1 for i=1,2,...,n. It has been argued that when variables are constrained in this way then the usual formulae for estimating correlations are biased and need some kind of correction.
estimates of parameters of a constrained multi-normal distribution and it is
desired to estimate the principal axes of its density (see Chayes 1971 for
appropriate correction formulae). However it has already been pointed out that
most applications of Components Analysis, certainly those in ecology, need not
and should not appeal to distributional assumptions that will almost certainly
be false. A misunderstanding seems to have arisen because Components
Analysis commonly operates on a matrix of correlations but, as was shown in
Section 3, the correlation matrix arises as but one convenient way of
organising the calculations and has nothing to do with distributional
assumptions.
The geometry of the situation is very simple. The constraint is such that the cloud of points Pi (i=1,2,...,n) lies in the plane x1 + x2 + ... + xp = 1. When all observations are positive, as they will be with proportions, this means that all points lie in the positive orthant and on the plane. The plane must cut the xi axis at the point Ei whose ith coordinate value is unity, else zero. For the three-variable case the position is illustrated in figure 11. The equilateral triangular region E1E2E3 may be exhibited as in figure 11, where the distance of Pi from the edge opposite E1 is proportional to xi1. The quantities (λxi1, λxi2, λxi3) are termed barycentric coordinates; the proportionality factor may be ignored (in fact when p=3, λ = 2^½). The thing to notice is that although the barycentric representation looks novel it is equivalent to the usual Euclidean representation referred to orthogonal axes. Distance may be measured in the triangle in the usual way, principal components found in the usual way and angles measured in the usual way. The only difference is that because the points lie in a plane, dimensionality has dropped by one, but this reduction in dimensionality is one of the things required of a Components Analysis, so this property is a good one. Thus my advice on Components Analysis with


Figure 11. The shaded region is the plane x1 + x2 + x3 = 1 in which all points must lie when the closure constraint is satisfied. This plane may be exhibited in general as a regular simplex, and when p=3 as an equilateral triangle, as above, where (λxi1, λxi2, λxi3) are known as barycentric coordinates.

[Figure: barycentric triangle with vertices MM (100%), MN (100%) and NN (100%); points for Aborigines, Inuit, American Indians, Indians and Chinese lie close to the dashed parabola marking Hardy-Weinberg equilibrium (p²:2pq:q²).]

Figure 12. Frequencies of genotypes of MN blood-groups for five racial groups represented in barycentric coordinates. A good fit is shown with the parabolic curve representing Hardy-Weinberg equilibrium.

closure-data is to ignore the constraint. This is not to say that transformations, such as those of Table 2, might not be required to reduce other undesirable effects such as undue influence from particularly rare species giving poorly determined proportions that require down-weighting. Figure 12 gives a barycentric coordinate representation of the MN blood-group genotypes for a range of human populations. If p is the probability of the M gene (and q that of the N gene) then under Hardy-Weinberg equilibrium the proportions of genotypes should occur in the ratio p²:2pq:q². Thus in the full space x1 + x2 + x3 = (p+q)² = 1 and x2² = 4x1x3, showing that the equilibrium surface cuts the closure-plane in a parabola. This too is shown in figure 12 and clearly represents the data very well. Although p varies from population to population, and from major human group to major human group, equilibrium is always maintained.
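Both the barycentric plotting and the parabola condition x2² = 4x1x3 are easy to check numerically. The following is an illustrative sketch; the particular triangle vertices E1=(0,0), E2=(1,0), E3=(½, √3/2) are a choice made for this example, not taken from the text:

```python
import math

def barycentric_to_xy(x1, x2, x3):
    """Plane coordinates of a point of the closure plane x1+x2+x3 = 1
    drawn as an equilateral triangle with vertices E1=(0,0),
    E2=(1,0) and E3=(1/2, sqrt(3)/2)."""
    return x2 + 0.5 * x3, (math.sqrt(3) / 2) * x3

def hardy_weinberg(p):
    """Genotype proportions MM, MN, NN under equilibrium p^2:2pq:q^2."""
    q = 1 - p
    return p * p, 2 * p * q, q * q

# Any equilibrium point satisfies the closure constraint and lies on
# the parabola x2^2 = 4 x1 x3 cut out of the closure plane
x1, x2, x3 = hardy_weinberg(0.6)
```

Varying p and plotting `barycentric_to_xy(*hardy_weinberg(p))` traces out the parabola of figure 12.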
This particular set of data has a natural representation in two dimensions and would not benefit from a Components Analysis. Aitchison (1983), in one of an important series of papers on the closure problem, has discussed Components Analysis. His general philosophy is to use logistic transformations to transform proportions that are naturally restricted to representation in the positive orthant to a set of points that occupy the whole of Euclidean space. With Components Analysis this implies that data in the form of proportions should first be transformed logarithmically and the covariance matrix W then calculated. The principal components corrected for closure can then be shown to be the eigenvectors of (I-N)W(I-N). It has been suggested that, amongst other things, this process will linearise relationships that are curved in the barycentric representation, but that this need not be so can be seen from considering the example illustrated by figure 12. With the Hardy-Weinberg equilibrium the transformed coordinates become -y1 = 2 log p, -y2 = log p + log q + log 2, -y3 = 2 log q (where negative logs have been used for the convenience of keeping y1, y2 and y3 positive). In place of the plane x1 + x2 + x3 = 1 the transformed points lie in the plane y1 + y3 = 2y2 + 2 log 2, and its intersection with the surface e^(-½y1) + e^(-½y3) = 1 is the new, non-linear, expression of closure. The effect is to transform the parabola into another curve, albeit one that occupies a greater part of Euclidean space. I cannot see why such transformations can be expected to improve linearity; they may, but they may also make matters worse, as when a straight line in the barycentric plane will be transformed into a curve in a higher dimensional space.

7.2. Horseshoes

The previous example shows one way in which horseshoes or arches may arise.
A simple ecological example giving another way is derived from the following
table.

Table 8. Example to illustrate one way that horseshoes occur.

Site                    Species
        1  2  3  4  5  6  7  8  9 10 11 12
  1     1  1  1  1  1
  2        1  1  1  1  1
  3           1  1  1  1  1
  4              1  1  1  1  1
  5                 1  1  1  1  1
  6                    1  1  1  1  1
  7                       1  1  1  1  1
  8                          1  1  1  1  1

In Table 8 there is no overlap of species between sites more than four steps apart, and thus any similarity coefficient will be zero (and any dissimilarity coefficient at its maximum) for pairs of sites such as (1, 6), (1, 7) and (1, 8) and also (8, 1), (8, 2) and (8, 3). The ordination therefore has to set all these distances as equal to the maximum allowable value. The inevitable effect is that points 6, 7 and 8 are close to a circle centred at point 1 and points 1, 2 and 3 are close to a circle centred at point 8. To accommodate all such constraints the horseshoe effect appears.
Ecologists do not seem to be satisfied with the ordering implicit in
ordinations such as that of figure 12 and, regarding data like that in Table 8
as representing a linear gradient, expect a linear ordination. They have
developed linearisation methods such as Detrended Correspondence Analysis
(Hill and Gauch, 1980) and the Step-Across method of Williamson (1978).
Transformations to straighten horseshoes are discussed by Heiser, this volume,
in the context of Multidimensional Unfolding.
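The ceiling effect that produces the horseshoe can be seen numerically from the banded layout of Table 8. This is an illustrative sketch; the band-of-five incidence structure and the use of the Jaccard coefficient are assumptions made for the demonstration:

```python
# Sites 1..8, each occupied by a band of five consecutive species
# (species run from 1 to 12), as in Table 8
sites = [set(range(i, i + 5)) for i in range(1, 9)]

def jaccard_dissimilarity(a, b):
    """1 - |intersection|/|union|: reaches its maximum of 1 exactly
    when two sites share no species."""
    return 1 - len(a & b) / len(a | b)

# Sites more than four steps apart share no species, so the
# dissimilarity is already at its ceiling for all such pairs
print(jaccard_dissimilarity(sites[0], sites[5]))  # 1.0
```

Every pair of sites more than four steps apart gets the same maximal dissimilarity, and it is the attempt to honour all these tied maxima simultaneously that bends the ordination into an arch.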

7.3. Multiplicative Models

This topic finds a place here because of its mathematical relationship to Components Analysis, with which it is often confused, and because of its close links with Correspondence Analysis, to be discussed in Section 8. The problem is to fit a model with additive and multiplicative terms, sometimes referred to as a bilinear model, to a two-way table X of order pxq. Specifically we require to fit
    xij = μ + αi + βj + γiδj.
The least-squares solution for the additive terms is exactly the same as when the multiplicative terms are absent.

Thus  μ̂ = x̄..
      α̂i = x̄i. - x̄..
and   β̂j = x̄.j - x̄..
The residuals
      zij = xij - μ̂ - α̂i - β̂j = xij - x̄i. - x̄.j + x̄..
may be found and assembled into a pxq matrix
      Z = (I-P)X(I-Q)                                        (9)
where P has elements 1/p and Q has elements 1/q. The least-squares estimates of γ and δ are then obtained from the singular-value decomposition Z = UΣV' (c.f. equation 5), where γ̂ is set to be proportional to the first column of U and δ̂ is set to be proportional to the first column of V. The factors of proportionality are not arbitrary but must have product σ1, the first singular value of Z. Because there is generally no reason to favour rows rather than columns, it is usual to set:
      γ̂ = σ1^½ u1 and δ̂ = σ1^½ v1.
Further multiplicative terms may be included in the model by writing
      xij = μ + αi + βj + Σr γi(r) δj(r)
and estimated by
      γ̂(r) = σr^½ ur and δ̂(r) = σr^½ vr
from the rth singular value and the rth columns of U and V.
The quantities γ̂(r), r=1,2,...,t (usually t=2) and δ̂(r), r=1,2,...,t may be simultaneously plotted in the manner of a biplot (in t dimensions). If Pi represents the ith row-point and Qj the jth column-point, then interpretation is via the inner-product
      zij ≈ Δ(OPi)Δ(OQj)cos(PiOQj)
which derives directly from (9) and its decomposition
      Z = (UΣ^½)(VΣ^½)'.
We saw in Section 3 how a very similar decomposition of the matrix Y could be used to express a Principal Components Analysis; hence the temptation to regard the process just described as a Components Analysis of Z. Indeed if we write Z = (UΣ)V' the resulting plots still satisfy the inner-product interpretation and the distances between the plotted row-points approximate the Pythagorean distances between the rows of Z. Note that by plotting UΣ and VΣ one can simultaneously present both row and column least-squares approximations in the same diagram, but at the expense of losing the inner-product interpretation.
It should be noted that (9) represents the kind of preliminary transformation of a data-matrix that might precede a Components Analysis (see Table 2 (iii)). Such transformations are valid only when it is legitimate to take means across both rows and columns and this is not usually acceptable for the multivariate data-matrix of classical Components Analysis. Much ecological data is in the form of a two-way table (see e.g. the species abundances of Table 1) and there is then no problem. This form of analysis is best thought of as distinct from Components Analysis.
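The fitting recipe just described (row and column means for the additive part, then an SVD of the doubly centred residual matrix Z) can be sketched as follows, assuming numpy; the function name is an invention for the illustration:

```python
import numpy as np

def bilinear_fit(X, t=1):
    """Fit x_ij = mu + alpha_i + beta_j + sum_r gamma_i(r) delta_j(r):
    additive terms from row/column means, multiplicative terms from
    the SVD of the residual matrix Z = (I-P)X(I-Q)."""
    mu = X.mean()
    alpha = X.mean(axis=1) - mu
    beta = X.mean(axis=0) - mu
    Z = X - mu - alpha[:, None] - beta[None, :]    # equation (9)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    # symmetric scaling: gamma and delta each absorb sigma_r^(1/2)
    gamma = U[:, :t] * np.sqrt(s[:t])
    delta = Vt[:t].T * np.sqrt(s[:t])
    return mu, alpha, beta, gamma, delta

# A 5 x 4 table: its doubly centred residual Z has rank at most 3,
# so taking t = 3 multiplicative terms reproduces X exactly
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 4))
mu, alpha, beta, gamma, delta = bilinear_fit(X, t=3)
fitted = mu + alpha[:, None] + beta[None, :] + gamma @ delta.T
```

For a biplot in t = 2 dimensions one would plot the rows of gamma and delta together and read off the inner products.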

7.4. Canonical Variates Analysis

This is perhaps the most simple form of analysis in which the data-matrix
has structure imposed on the units. The units are supposed to belong to
known populations and it is convenient to assume that they have been
arranged so that the first n1 units are samples from population number 1, the second n2 units are samples from population number 2, ..., and the last nk units are samples from population number k. Thus we may study variation both between and within the k populations. Suppose B is the sample between-population dispersion matrix and Wi is the sample dispersion matrix within the ith population. When the populations have homogeneous dispersions we may assume that the Wi are all estimates of the same dispersion and combine the separate dispersions to form a pooled within-population dispersion matrix W given by:
      W = Σ_{i=1}^{k} (ni - 1)Wi / (n - k).
In Canonical Variates Analysis, the principal interest is in an ordination of the k populations rather than in the n samples. An optimal measure of distance between populations i and j is given by the Mahalanobis D-statistic whose square is:
      D²ij = (x̄i - x̄j) W⁻¹ (x̄i - x̄j)',
where x̄i is the row-vector giving the p means of the variables in the ith

population. Ordination may now proceed by using Principal Coordinates Analysis on the kxk matrix giving all the Mahalanobis D²-statistics. Usually additional steps are undertaken that allow loadings, akin to those of Components Analysis, to be calculated and used to place the individual samples in the canonical space containing the population means. Further, if one is willing to assume multinormality with equal dispersion for all populations, confidence regions can be placed around each population mean. If the ordination has been correctly scaled, not only are the Mahalanobis distances approximated but also the confidence regions are circular.
The technique has many ramifications and is fully stochastic, supporting the usual apparatus of statistical inference - indeed it is the only such technique described in this chapter. Full details and many examples are given in the recent book by Gittins (1984).
Ecologists may feel unwilling to accept all the assumptions of classical Canonical Variates Analysis. This will be especially so when their data are not in quantitative form, let alone multinormal, or when there is clear lack of homogeneity of dispersion between populations. The following method, given by Digby and Gower (1981), which might be, and perhaps has been, termed Canonical Coordinate Analysis, may then be useful.

[Figure: an n x n squared-distance matrix blocked by the samples of Populations 1-3 into submatrices D11, D12, D13; D21, D22, D23; D31, D32, D33, with the corresponding 3x3 matrix of block averages shown on the right.]

Figure 13. Between-units squared-distance matrix in a form blocked for three populations. The elements of the 3x3 matrix on the right-hand-side are formed by averaging the elements within the corresponding blocks on the left-hand-side. The quantity -½(D̄ii + D̄jj - 2D̄ij) gives the squared distance between the centroids of the points in the ith and jth populations.

We assume that a dissimilarity matrix giving the dissimilarities, expressed as squared distances, between all pairs of units is available and that this is

presented in blocked form so that all the units within each population occur in consecutive rows/columns. Figure 13 shows the situation for three populations. The n sample units may be imagined as generating a cloud of points in a Euclidean space. The ni points of population number i will then have a centroid Gi. The squared distance Δ²(GiGj) between Gi and Gj may be obtained as follows. First form D̄pq, the average of the np nq elements of Dpq, the matrix giving the squared distances between all members of population p and all members of population q. For the diagonal block-matrix Dpp this averaging process includes the zero elements on its own diagonal. A kxk symmetric matrix D̄ is formed with elements D̄pq. Then
      Δ²(GiGj) = -½(D̄ii + D̄jj - 2D̄ij).
Thus a Principal Coordinates Analysis of D̄ gives an ordination of the population centroids. It is then a simple matter to add the individual samples to the ordination display. Something like a confidence region can then be formed for the points in the ordination that represent the ith population. This can be done either by calculating convex hulls or minimal covering ellipses (Green 1981, Silverman and Titterington 1980).
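The block-averaging recipe of figure 13 is short to state in code. A sketch assuming numpy, with a function name invented for the illustration:

```python
import numpy as np

def centroid_squared_distances(D2, groups):
    """From an n x n matrix of squared distances D2 and a list of
    index arrays (one per population), form the block averages
    Dbar_pq (zero diagonals included) and return the squared
    centroid distances -1/2 (Dbar_ii + Dbar_jj - 2 Dbar_ij)."""
    k = len(groups)
    Dbar = np.empty((k, k))
    for p in range(k):
        for q in range(k):
            Dbar[p, q] = D2[np.ix_(groups[p], groups[q])].mean()
    G = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            G[i, j] = Dbar[i, j] - 0.5 * (Dbar[i, i] + Dbar[j, j])
    return G

# Two populations of two points each, centroids (0,1) and (10,1);
# the squared distance between the centroids should come out as 100
pts = np.array([[0, 0], [0, 2], [10, 0], [10, 2]], float)
D2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
G = centroid_squared_distances(D2, [np.array([0, 1]), np.array([2, 3])])
```

A Principal Coordinates Analysis of the resulting matrix then ordinates the centroids, as the text describes.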

7.5. Asymmetry

So far ordination has been concerned with analysing symmetric matrices, usually of dissimilarities, proximities or distances. It is true that Non-metric Multidimensional Scaling methods often admit non-symmetric square data by summing over the complete matrix rather than operating on only the lower or upper triangular parts. However distances, which are symmetric by definition, are fitted and the effect is to fit to the average ½(aij + aji) of the asymmetric elements, if any. Thus any structure there may be in the asymmetry is ignored. This is unfortunate, for departures from symmetry can be important and interesting. Some of the methods already discussed do not insist on symmetry. These include Multidimensional Unfolding (Heiser, this volume) and the Multiplicative Analysis of a two-way table (Section 7.3, above). However these methods operate on general rectangular arrays and do not allow for the special structure often found in square arrays. This structure might be described as that found in a square array whose rows and columns are described by the same things but in different modes. Thus, outside ecology, typical row and column-factors might be import/export, immigration/emigration,

father's occupation/son's occupation. In ecology we have already met "the number of species present in site i but not in site j" which is, of course, different from "the number of species present in site j but not in site i". Digby and Kempton (1986) discuss an example from animal ecology describing the pecking behaviour of hens. Thus the question of asymmetry in ecology is of interest.
Often it is worth separating the symmetric and skew-symmetric components of a square array A to give
      aij = mij + nij
where mij = ½(aij + aji) and nij = ½(aij - aji). Thus the elements mij form a symmetric matrix M and the elements nij form a skew-symmetric matrix N. The symmetric matrix M may usefully be analysed by any of the ordination methods already discussed. Gower (1977) discusses the application of the Eckart-Young theorem to give a least-squares analysis of a skew-symmetric matrix N. Details will not be given here (see Constantine and Gower 1978, 1982 for applications). Suffice it to say that N may be represented as a set of points in a set of planes (sometimes referred to as bimensions). When one bimension gives a good approximation to N then the points Qi (i=1,2,...,n) have a special non-Euclidean interpretation. In previous ordination diagrams it is the distances Δ(QiQj) or the inner-product Δ(OQi)Δ(OQj)cos(QiOQj) that give the approximations that are used for interpretation. With a skew-symmetric matrix it is the area of the triangle OQiQj that approximates nij. The area is zero either when Qi and Qj coincide or when O, Qi and Qj are collinear, in which case the distance between Qi and Qj may be great. Thus the temptation to use distance in this representation should be avoided. Collinearities play an important part in interpretation, as do parallel lines. In particular when all points are approximately collinear then nij has the most simple form of skew-symmetry nij = ni - nj and this form has often appeared in applications. A complete analysis of A would attempt to unify the analyses of M and N (see Gower 1980, and Constantine and Gower 1982). Attention is drawn to this approach for its potential usefulness.
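The split itself is elementary, and the paired singular values of N (one equal pair per bimension, plus a zero for odd n) are easy to verify numerically. A sketch assuming numpy; the 3x3 array is invented data:

```python
import numpy as np

# Invented square array, e.g. "species in site i but not in site j"
A = np.array([[0., 5., 1.],
              [3., 0., 4.],
              [7., 2., 0.]])

M = 0.5 * (A + A.T)   # symmetric part       m_ij = 1/2 (a_ij + a_ji)
N = 0.5 * (A - A.T)   # skew-symmetric part  n_ij = 1/2 (a_ij - a_ji)

# The singular values of a skew-symmetric matrix come in equal
# pairs, each pair spanning one plane ("bimension")
s = np.linalg.svd(N, compute_uv=False)
```

M can then be passed to any symmetric ordination method, while the columns of the SVD of N give the coordinates whose triangle areas approximate the nij.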

7.6 A Note on Interpreting Ordination Diagrams

[Figure: three panels headed Distance, Inner-product and Skew-symmetry, each showing the origin O and points Pi and Pj with the loci described in the caption.]

Figure 14. -.-.-.-. Locus of all points that have the same relationship
with Pi as does Pj.
........ Locus of all points having a null relationship
with Pi (for distance this is only Pi itself).
-------- Locus of all points that have the same relationship
with Pi as does Pj, but with opposite sign.

All ordination techniques produce a diagram containing n points, usually in two dimensions. There is a temptation to interpret them all in the same way, yet it has been shown that some approximate distances, some approximate inner-products and some approximate skew-symmetry.
Figure 14 shows the main interpretative tools for the three cases, which Gower (1984b) shows are components in most, if not all, ways of displaying multidimensional information. The same disposition of Pi and Pj is shown for each of the three cases. With distance the locus of all points equidistant from Pi as is Pj is clearly a circle with centre Pi and radius Δ(PiPj). The only point zero distance from Pi is Pi itself. The origin and axes play no direct

part in the interpretation; however scaling the axes differentially badly affects
the distance interpretation.
For the inner-product Δ(OPi)Δ(OPj)cos(PiOPj) to be constant as Pj varies
requires Δ(OPj)cos(PiOPj) to be constant. This is merely the projection of Pj
onto OPi, showing that the locus of points with equal relationships with Pi is
the line through Pj orthogonal to OPi. To be zero the locus must pass
through the origin. Thus in this case there are many points with a null
relationship with Pi, and the origin plays a central role. Now, however, the
coordinates may be rescaled without affecting interpretation provided that the
scaling applied to one set of points is balanced by the inverse scaling applied
to the other. This follows from the simple formula xixj + yiyj for the
inner-product in terms of the coordinates Pi(xi,yi) and Pj(xj,yj): clearly
xixj + yiyj = (λxi)(xj/λ) + (λyi)(yj/λ).
Another thing to remember is that negative values can occur. The locus of
points with the same magnitude but opposite sign to that given by the locus of
Pj is a parallel line equidistant on the other side of the origin.
If the Eckart-Young theorem is used to give an optimal rank r fit to a
matrix A then it is the inner-product interpretation that generates the
least-squares estimates âij of the individual elements aij of A. Included in
these estimates are the diagonal elements, and therefore from the cosine
formula we have that Δ²(PiPj) = âii + âjj - 2âij. Thus, provided the approximations
are good, Δ(PiPj) itself approximates (aii + ajj - 2aij)^(1/2), but not in a direct
least-squares sense. This argument shows how certain distances as well as
inner-products are approximated in the same diagram.
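This relationship is easy to check numerically. The sketch below (Python with numpy; the matrix A is arbitrary made-up data, not taken from the text) forms the rank-2 Eckart-Young fit of a matrix and confirms that inner-products of the plotted row and column coordinates reproduce the least-squares estimates âij, with residual sum-of-squares given by the discarded singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))           # an arbitrary matrix to approximate

U, s, Vt = np.linalg.svd(A)
r = 2                                     # dimensionality of the ordination

# Rank-r Eckart-Young least-squares fit of A
A_hat = (U[:, :r] * s[:r]) @ Vt[:r, :]

# Row and column points sharing the singular values equally
row_pts = U[:, :r] * np.sqrt(s[:r])
col_pts = Vt[:r, :].T * np.sqrt(s[:r])

# Inner-products between row and column points recover A_hat exactly
assert np.allclose(row_pts @ col_pts.T, A_hat)

# The residual sum-of-squares equals the sum of discarded squared singular values
assert np.isclose(((A - A_hat) ** 2).sum(), (s[r:] ** 2).sum())
```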
With skew-symmetry it is the area of the triangle OPiPj that gives the
approximation. The locus of Pj that keeps the area constant is a line through Pj
parallel to OPi. Zero areas are given by all the points on OPi. The coordinates
may be scaled as for inner-products, and negative skew-symmetry is given by the
locus parallel to OPi, equidistant but on the opposite side of the origin.
Thus, although ordinations may look alike superficially, one has to be
clear about the exact form of approximation being used and bear in mind the
interpretive tools outlined above. Although in a good approximation "close" Pi
and Pj can safely be interpreted as representing similar points, "distant"
cannot safely be interpreted as dissimilar. Indeed, with skew-symmetry
Δ(PiPj) approximates the distance between the ith and jth rows of the
skew-symmetric matrix N. This distance can be small only if nik - njk is small
for all k, which implies that nij must itself be small. Thus when Pi and Pj
are distant points on a line through the origin it can be deduced that nij is
small and that nik differs significantly from njk for at least one value of k.

8. CORRESPONDENCE ANALYSIS

Correspondence Analysis covers much the same ground as Components
Analysis but is concerned with qualitative (categorical) variables rather than
with quantitative variables. There is precisely the same ambivalence as to
whether one is handling a two-way table (which is the concern of simple
Correspondence Analysis) or a multivariate sample of (categorical) variables
(which is the concern of Multiple Correspondence Analysis). However the
linkage between the two methods is more direct for Correspondence Analysis
than it is for Components Analysis. Suppose we have two categorical variables,
colour and shape (say), then we can form a two-way contingency table X whose
(i,j)th entry gives the number of values in the ith colour-category and the jth
shape-category (green and circular, say). Thus the entries in the table
contain information arising solely from the two categorical variables; we have
seen that in the quantitative case a two-way table also contains information on
the quantitative variable (abundance, say) that is classified by two categorical
variables. It follows that the set-up for simple Correspondence Analysis is the
special case of Multiple Correspondence Analysis with two categorical variables.

8.1. Simple Correspondence Analysis

We are given a two-way table X of counts (i.e. a contingency table). This
could be analysed precisely as described in Section 7.3. However, when some
rows/columns contain high counts relative to other rows/columns some
adjustment of the row counts may be appropriate; we shall see below one way
that this occurs in ecology. In Correspondence Analysis the elements of the
table are inversely weighted by the square roots of the product of the
corresponding marginal totals. Thus writing xi. and x.j for the totals of the
ith row and jth column, xij is transformed to xij/√(xi. x.j). In matrix terms
this may be written

Y = R^(-1/2) X C^(-1/2)     (10)

where R is the matrix whose diagonal values contain the row-totals and C is the
matrix whose diagonal values contain the column-totals; the non-diagonal values
of Rand C are zero. Correspondence Analysis is concerned with the least-
squares approximation to Y and hence with the Eckart-Young theorem and the
singular value decomposition of Y. This decomposition takes a special form

because of the definition (10). This follows because

Y(C^(1/2)1) = R^(-1/2)X1 = R^(1/2)1
(1'R^(1/2))Y = 1'XC^(-1/2) = 1'C^(1/2)

which jointly show that 1 is a singular value of Y corresponding to singular
vectors C^(1/2)1 and R^(1/2)1. These vectors have to be normalised to unit sums-of-
squares, since they are columns of the orthogonal matrices of the singular
value decomposition. The sums-of-squares are 1'R1 = 1'C1 = x.., the total of
the elements of X. Thus Y always has a unit singular value associated
with the vectors C^(1/2)1/√x.. and R^(1/2)1/√x.., and we may extract the first term from
the decomposition and write it in the form:

Y = R^(1/2)11'C^(1/2)/x.. + Σ_{i=2}^{p} σi ui vi'

or, what is the same thing:

Y - R^(1/2)11'C^(1/2)/x.. = Σ_{i=2}^{p} σi ui vi'     (11)

where the right-hand-side is the singular value decomposition of the
left-hand-side, with elements:

(xij - xi. x.j/x..)/√(xi. x.j)     (12)

which, recalling that the values xij are elements of a contingency table, will be
recognised as the square root of a term contributing to Pearson's Chi-square
for the independence of the margins of X, apart from a scaling factor 1/√x.. .
Thus simple Correspondence Analysis may be viewed as an analysis of
Chi-square.
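The special structure just described can be verified directly. In the sketch below (numpy; the 3×3 table of counts is made-up illustrative data) Y of (10) is formed, the leading singular value comes out as exactly 1, and the remaining singular values carry Pearson's Chi-square.

```python
import numpy as np

# A small contingency table of hypothetical counts
X = np.array([[20., 10.,  5.],
              [ 8., 16.,  6.],
              [ 2.,  4., 12.]])

r = X.sum(axis=1)                         # row totals  xi.
c = X.sum(axis=0)                         # column totals x.j
n = X.sum()                               # grand total x..

# Y = R^(-1/2) X C^(-1/2), eq. (10)
Y = X / np.sqrt(np.outer(r, c))

U, s, Vt = np.linalg.svd(Y)
# The leading singular value is always 1, with singular vectors
# proportional to R^(1/2)1 and C^(1/2)1.
assert np.isclose(s[0], 1.0)

# The remaining singular values carry Pearson's Chi-square:
# sum over i >= 2 of sigma_i^2  =  chi2 / x..
expected = np.outer(r, c) / n
chi2 = ((X - expected) ** 2 / expected).sum()
assert np.isclose((s[1:] ** 2).sum(), chi2 / n)
```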
From the point of view of ordination it is the vectors ui and vi scaled by
σi that are plotted, usually in two dimensions (i.e. for i = 2 and 3). As with
biplots of a quantitative table we could plot σi^(1/2)ui and σi^(1/2)vi and this would
give an ordination in which the inner-product between the ith row and jth
column points would give a least-squares approximation to (12). If we wish to
approximate the distances between rows of (11) we would plot σi ui and vi, and
for column-distances ui and σi vi. The row and column distances of (12) have no
great interest, but there is another representation where they do, and this
arises from the ecological problem of associating scores with the rows (species)
and columns (sites) in the hope of finding ecological gradients. These scores
should be adjusted to account for variation in species abundance and site
richness. The row and column scores p, q therefore satisfy:

R^(-1)Xq = σp
p'XC^(-1) = σq'     (13)

The equations (13) may be used as the basis for calculating the values of p
and q by iterating on initial values until convergence; this algorithm is
termed reciprocal averaging. From (13) it is clear that R^(1/2)p and C^(1/2)q are
singular vectors of Y. The first singular value σ1 = 1 corresponds to R^(1/2)p = R^(1/2)1
and C^(1/2)q = C^(1/2)1, i.e. p = 1, q = 1, which contain no useful information. The scores
are therefore obtained from the second singular vectors to give p = R^(-1/2)u2 and
q = C^(-1/2)v2. Subsequent vectors may be similarly determined, leading to the
simultaneous plotting of (σi R^(-1/2)ui, σi C^(-1/2)vi). Now the squared distance between
the ith and jth row points is

(xi/xi. - xj/xj.) C^(-1) (xi/xi. - xj/xj.)'     (14)

where xi denotes the ith row of X. This is termed the Chi-square distance
between the ith and jth rows of X. A similar Chi-square distance is defined
between pairs of columns. From (14) it is clear that two rows with the same
proportions relative to their row totals are represented by points which are
coincident; similarly for column points.
Thus in this representation distances have a useful interpretation, which
however is not approximated in the usual least-squares sense in
low-dimensional ordinations. That, of course, can always be done by using (14)
in a Principal Coordinates Analysis; then however row-distances and
column-distances have to be presented in separate diagrams. A further
property of this form of presentation derives directly from (13) which shows
that the means of the column-scores weighted by the column-proportions are
proportional to the row-scores; similarly the means of the row-scores weighted
by the row-proportions are proportional to the column-scores. Because of
these properties, the consequences of the formulae (13) are often termed the
barycentric principle.
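Reciprocal averaging is easy to sketch (numpy; the contingency table is made-up illustrative data). Each cycle averages column scores into row scores and back, removes the trivial constant solution, and renormalises; the column scores converge, up to sign, to q = C^(-1/2)v2 from the singular value decomposition of Y.

```python
import numpy as np

X = np.array([[20., 10.,  5.],
              [ 8., 16.,  6.],
              [ 2.,  4., 12.]])
r, c = X.sum(axis=1), X.sum(axis=0)

q = np.array([1., 0., -1.])               # arbitrary starting column scores
for _ in range(300):
    p = (X @ q) / r                       # row scores: weighted averages of column scores
    q = (X.T @ p) / c                     # column scores: weighted averages of row scores
    q -= (c @ q) / c.sum()                # project out the trivial constant solution
    q /= np.sqrt(q @ (c * q))             # normalise so that q'Cq = 1

# Compare against the scores derived from the SVD of Y
Y = X / np.sqrt(np.outer(r, c))
U, s, Vt = np.linalg.svd(Y)
q_svd = Vt[1] / np.sqrt(c)                # q = C^(-1/2) v2, already q'Cq = 1
assert np.allclose(np.abs(q), np.abs(q_svd), atol=1e-6)
```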

[Figure 15: two-dimensional scatter of species points on axes CA-I (.803) and CA-II; among the labelled species are Holcus lanatus, Poa pratensis, Anthoxanthum odoratum, Agrostis tenuis, Poa trivialis, Alopecurus pratensis, Arrhenatherum elatius, Dactylis glomerata, Festuca rubra and Helictotrichon pubescens.]

Figure 15. Correspondence Analysis of the Park Grass data. The same
species are labelled as in figure 3. This diagram gives the
ordination of species; figure 16 gives the ordination of sites.

Figure 15 shows the row-points (of species) of a Correspondence Analysis
of Table 1. The method ought to be applied to a contingency table, which
Table 1 is not. However it can operate on any two-way table of positive
values or, strictly speaking, on any two-way table whose margins are positive.
The interpretations outlined above then have to be modified.

[Figure 16: two-dimensional ordination of sites on axes CA-I (.803) and CA-II; treatment labels such as N3 PK are attached to the site points.]

Figure 16. Correspondence Analysis of the Park Grass data. The labelling is
as in figure 6. This diagram gives the ordination of sites; figure
15 gives the ordination of species.

Figure 16 shows the column-points (sites) of the same analysis. Because
the sites have associated treatments, these are also shown. Both sets of
information could be shown on the same diagram but in this case would
overload the ordination with too much information. The interpretation is
similar to that of the Principal Components Analysis of figures 3, 5 and 6.
Normalising by row and column means has given the scarcer species more
weight so these play a more prominent role than they do in Components
Analysis. It should be clear that although simple Correspondence Analysis of a
two-way table is concerned with the least-squares representation of the
left-hand-side of (11), there are several ways of presenting the ordination that
differ not only in scaling but also in interpretation (see e.g. Greenacre 1984
and Lebart et al. 1984). Thus in applications it is important to state precisely
in what form the results are presented. In figures 15 and 16 it is the
coordinates (σi R^(-1/2)ui, σi C^(-1/2)vi) that have been plotted and hence Chi-square
distances for both species and sites are relevant.

8.2. Multiple Correspondence Analysis

At the beginning of this section it was remarked that Multiple
Correspondence Analysis is an analysis of a multivariate sample of categorical
variables. Thus, as for Components Analysis, X has n rows (the samples) and
p columns (the variables). An entry xij records the category taken by the ith
sample on the jth variable and hence has a nominal rather than quantitative
value. This
information can be coded in what is termed an indicator matrix, G. Suppose
the jth categorical variable has lj levels (e.g. if the levels are green, yellow,
red, blue then for this categorical variable lj = 4). G has n rows and Σ_{j=1}^{p} lj
columns, so that the original p variables are expanded into Σ lj pseudo- or
indicator variables. In the lj columns for the jth variable each column is
assigned to one of the levels and is scored one if that level occurs in the ith
sample and zero otherwise. Thus the total score for every sample is p, and the
column totals of G give the number of occurrences of each level of each
categorical variable. Table 9 illustrates an indicator matrix G for five samples
and three categorical variables with 2, 4 and 3 levels respectively.

Table 9. An indicator matrix for n=5 samples, p=3 variables with
l1=2, l2=4 and l3=3 levels.

Sample    Sex     Age Group    Nationality    Total
  1       0 1     0 1 0 0      1 0 0            3
  2       0 1     1 0 0 0      1 0 0            3
  3       1 0     1 0 0 0      0 0 1            3
  4       0 1     0 0 0 1      0 1 0            3
  5       1 0     0 0 1 0      0 0 1            3

Total     2 3     2 1 1 1      2 1 2           15

Indicator matrices like that of Table 9 are the raw material of Multiple
Correspondence Analysis. Note that quantitative information, such as
age-group in Table 9, may be presented in categorical form. This can be
useful when the effects of quantitative variables are believed to operate in
non-linear form.
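Constructing G is mechanical. A sketch (numpy) that rebuilds the indicator matrix of Table 9 from the raw categories, coded 0-based as (sex, age-group, nationality), might run as follows.

```python
import numpy as np

# The five samples of Table 9, coded 0-based: (sex, age-group, nationality)
samples = [(1, 1, 0), (1, 0, 0), (0, 0, 2), (1, 3, 1), (0, 2, 2)]
levels = [2, 4, 3]                         # l1, l2, l3

n, p = len(samples), len(levels)
offsets = np.cumsum([0] + levels[:-1])     # first column of each variable's block
G = np.zeros((n, sum(levels)), dtype=int)
for i, cats in enumerate(samples):
    for j, cat in enumerate(cats):
        G[i, offsets[j] + cat] = 1         # one 1 per variable per sample

assert (G.sum(axis=1) == p).all()          # every row totals p = 3
assert (G.sum(axis=0) == [2, 3, 2, 1, 1, 1, 2, 1, 2]).all()   # Table 9 column totals
assert G.sum() == n * p                    # grand total np = 15
```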

A direct Correspondence Analysis of G leads from (10) to the singular
value decomposition of

p^(-1/2) G C^(-1/2) = UΣV',

the simplification occurring because in this case R = pI. The vectors V are
simply derived from p^(-1)C^(-1/2)G'GC^(-1/2) = VΣ²V'. The symmetric matrix G'G is
known as the Burt matrix. It has a simple p×p block structure (Xij) in which
the block Xij is the two-way contingency table of the ith and jth categorical
variables. The diagonal blocks Xii are themselves diagonal matrices giving the
column totals C of G. Thus from Table 9 we derive X11 = diag(2,3),
X22 = diag(2,1,1,1) and X33 = diag(2,1,2). The block

X13 = ( 0 0 2 )
      ( 2 1 0 ).


Thus the off-diagonal blocks of the Burt matrix are just the contingency tables
of ordinary Correspondence Analysis and the diagonal blocks give their row
and column totals. Thus VΣ²V' = p^(-1)(Yij), in which the blocks Yij are now
precisely the row- and column-corrected forms equivalent to (10) and the diagonal
blocks Yii are unit matrices. Plotting VΣ is therefore a direct generalisation
of ordinary Correspondence Analysis except that not only are all the two-way
contingency tables being approximated but also the unit matrices on the
diagonal. Algorithms might be derived in which the unwanted approximation to
the diagonal is omitted. Note that only two-way contingency tables occur in
the Burt matrix, so that three-way and higher-order interactions cannot be
accommodated by this method.
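The block structure is easily confirmed. The sketch below (numpy) computes the Burt matrix G'G for the Table 9 indicator matrix and checks the three diagonal blocks together with the off-diagonal block X13 quoted above.

```python
import numpy as np

# Indicator matrix G of Table 9 (n=5 samples; blocks of 2, 4 and 3 columns)
G = np.array([[0,1, 0,1,0,0, 1,0,0],
              [0,1, 1,0,0,0, 1,0,0],
              [1,0, 1,0,0,0, 0,0,1],
              [0,1, 0,0,0,1, 0,1,0],
              [1,0, 0,0,1,0, 0,0,1]])

B = G.T @ G                                # the Burt matrix

# Diagonal blocks are diagonal matrices of the column totals
assert np.array_equal(B[0:2, 0:2], np.diag([2, 3]))
assert np.array_equal(B[2:6, 2:6], np.diag([2, 1, 1, 1]))
assert np.array_equal(B[6:9, 6:9], np.diag([2, 1, 2]))

# Off-diagonal block X13: the sex-by-nationality contingency table
assert np.array_equal(B[0:2, 6:9], np.array([[0, 0, 2],
                                             [2, 1, 0]]))
```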
An alternative approach, closely related to the row and column scores
method of (13), is as follows. We seek to find scores for all the categories in
a vector x (of length Σ_{j=1}^{p} lj) with certain optimal properties. With these
scores x the total scores for the n units are given by y = Gx, and the sum-of-
squares (uncorrected) between the units is x'G'Gx, with a total sum-of-squares
of scores x'Cx. The optimal scores are chosen to minimise the variation within
units, x'Cx - x'G'Gx/p, and hence the term Homogeneity Analysis often used (see
Gifi 1981). Because the criterion value depends on the scaling of x, some form
of constraint is needed and it is usual to choose x'Cx = 1 for the total sum-of-
squares. The problem then becomes that of maximising x'G'Gx subject to
x'Cx = 1, which has the simple solution

G'Gx = λCx

or equivalently

C^(-1/2)G'GC^(-1/2)(C^(1/2)x) = λ(C^(1/2)x)     (15)

as above for Multiple Correspondence Analysis. Reference to Table 9 will
readily convince that this has the trivial solution in which the elements of x are

constant and, because of the constraint x'Cx = 1, are equal to 1/√(np). With
constant scores, within-unit variation is zero and hence fully homogeneous. As
with Correspondence Analysis, the vector C^(1/2)x corresponding to the second
largest eigenvalue may then be selected and, because of the trivial vector C^(1/2)1,
it will satisfy 1'Cx = 0. This is sometimes considered as an additional constraint
but it may also be viewed as a consequence of the criterion being optimised.
Other choices of constraints are discussed in the test-score literature (Healy &
Goldstein 1976). For example, it may be desired that the mean score for each
test be zero, which for Table 9 would imply the constraints x1+x2 = x3+x4+x5+x6
= x7+x8+x9 = 0. Another possibility is that the scores for the lowest and highest
levels may be required to be zero and one, which for Table 9 yields x1 = x3 = x7 = 0
and x2 = x6 = x9 = 1. For an example of Multiple Correspondence Analysis set in the
Homogeneity Analysis framework see de Leeuw, this volume.
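The eigenproblem G'Gx = λCx can be solved through its symmetrised form, as in (15). The sketch below (numpy, again with the Table 9 indicator matrix) confirms that the largest eigenvalue p belongs to the constant scores 1/√(np), and that the second solution automatically satisfies 1'Cx = 0.

```python
import numpy as np

# Indicator matrix G of Table 9 (n=5 samples, p=3 variables)
G = np.array([[0,1, 0,1,0,0, 1,0,0],
              [0,1, 1,0,0,0, 1,0,0],
              [1,0, 1,0,0,0, 0,0,1],
              [0,1, 0,0,0,1, 0,1,0],
              [1,0, 0,0,1,0, 0,0,1]], dtype=float)
n, p = G.shape[0], 3
c = G.sum(axis=0)                          # column totals: the diagonal of C

# Symmetrised form of G'Gx = lambda Cx:  S z = lambda z  with  z = C^(1/2) x
S = (G / np.sqrt(c)).T @ (G / np.sqrt(c))
vals, vecs = np.linalg.eigh(S)
vals, vecs = vals[::-1], vecs[:, ::-1]     # largest eigenvalue first

# Trivial solution: constant scores x = 1/sqrt(np), with eigenvalue p
assert np.isclose(vals[0], p)
x0 = vecs[:, 0] / np.sqrt(c)               # back-transform z to x
assert np.allclose(np.abs(x0), 1 / np.sqrt(n * p))

# The second solution satisfies 1'Cx = 0 as a consequence of orthogonality
x1 = vecs[:, 1] / np.sqrt(c)
assert np.isclose(c @ x1, 0)
```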

9. COMPARISON OF ORDINATIONS

Very often two or more ordinations may be done on the same samples
(sites, species, etc.) using either different ordination methods or different
variables, or both. Clearly there is an interest in asking to what extent the
different ordinations are giving similar information. If they differ, can the
differences be identified as mainly arising from some subset of samples or are
they of a more general nature? In this section methods are discussed that
address questions of this kind. Suppose we have matrices Y1, Y2, ..., Yk whose
rows give coordinates of points arising from k different ordinations of n
samples. Here Yj is of order n×rj and without loss of generality it can be
assumed that r1 = r2 = ... = rk = r (say); if this is not initially so, zero columns may
be appended to those Yj for which rj < max(r1, r2, ..., rk) = r. It is important that
the ith row of every Yj corresponds to the same ith sample. Alternatively we
may be given k distance matrices D1, D2, ..., Dk from which the matrices
Yi (i = 1, 2, ..., k) may be derived by some form of ordination, or which may be
operated on directly.

9.1. Orthogonal Procrustes Analysis

Here there are just two ordinations, Y1 and Y2, which may be regarded as
two sets of points P1, P2, ..., Pn and Q1, Q2, ..., Qn in r-dimensional Euclidean space.
In Orthogonal Procrustes Analysis the aim is to fit Y2 to Y1 using the "rigid
body motions" of translation and rotation in such a way that

m²12 = Σ_{i=1}^{n} Δ²(PiQi) is minimised.

Translation is readily handled by requiring the centroids of the two
configurations to be superimposed. This is equivalent to subtracting the
column-means from Y1 and Y2, and it will be assumed that this has been done.
Rotations are represented mathematically by orthogonal matrices, which also
may allow for reflections. Thus to minimise the criterion m²12, an orthogonal
matrix H must be found that minimises Trace((Y1-Y2H)(Y1-Y2H)'). The solution
to this problem turns out to be related to the singular value decomposition of
the matrix Y1'Y2 = UΣV'. In fact H = VU' and

m²12 = Trace(Y1Y1') + Trace(Y2Y2') - 2 Trace(Σ).

Thus the computational problem is a simple one. A difficulty is that two
ordinations will usually be on different scales, so we may wish also to estimate
a scaling factor ρ applied to Y2. The estimation of ρ proceeds independently
of that of H to give

ρ = Trace(Y2HY1')/Trace(Y2Y2') = Trace(Σ)/Trace(Y2Y2').

For technical details see Schönemann and Carroll (1970) and Gower (1971b).
The residual sum-of-squares is then given by:

m²12 = Trace(Y1Y1') - (Trace Σ)²/Trace(Y2Y2').


There is now a problem because the scaling of fitting Y1 to Y2 is not the
inverse of that of fitting Y2 to Y1, and the value of m²12 differs from m²21.
A solution is to normalise both Y1 and Y2 to have unit sum-of-squares, in
which case we have that

m²12 = m²21 = 1 - (Trace Σ)²,

which does not depend on the order of matching. To examine the contributions
of each sample to the total m²12, one only has to examine the individual
residuals Δ(PiQi). This provides a complete solution to the Orthogonal
Procrustes problem.
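The computation really is simple. In the sketch below (numpy; the configurations are made-up, with Y2 a rotated and shifted copy of Y1 so that a perfect fit exists), H = VU' recovers the rotation exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
Y1 = rng.standard_normal((10, 2))
Y1 -= Y1.mean(axis=0)                      # translate to the centroid

# Y2: a rotated, shifted copy of Y1, so a perfect fit exists
theta = 0.7
H_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
Y2 = Y1 @ H_true + np.array([3.0, -1.0])
Y2 -= Y2.mean(axis=0)                      # remove the translation

# Orthogonal Procrustes: H = VU' from the SVD Y1'Y2 = U Sigma V'
U, s, Vt = np.linalg.svd(Y1.T @ Y2)
H = Vt.T @ U.T

# The residual m^2 is zero because Y2 H reproduces Y1 exactly
residual = np.trace((Y1 - Y2 @ H) @ (Y1 - Y2 @ H).T)
assert np.isclose(residual, 0.0, atol=1e-10)
```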

9.2. Ordination of Procrustes Statistics

With k ordinations Y1, Y2, ..., Yk, each normalised to unit sum-of-squares as
described above, all pairs of ordinations Yi, Yj may be compared by evaluating
the statistics mij. These values may then be assembled into a k×k symmetric
matrix M which has zero diagonal. Clearly M itself may be treated as a
distance-matrix and presented as an ordination, using any desired method. It
can be shown that the mij are metrics, though not necessarily Euclidean
metrics (see Gower 1971b). In the resulting ordination the k original
ordinations are represented by points M1, M2, ..., Mk whose inter-distances
approximate mij. Figure 17 is an example of such an analysis in which the
twelve ordinations listed in Table 10 have been compared. The twelve
ordinations are all of Table 1 and differ by measuring dissimilarity in different
ways and by using different ordination techniques (Principal Coordinates
Analysis, Non-metric Multidimensional Scaling via KYST, and Correspondence
Analysis). The most interesting observation is that there is more difference
within techniques using different coefficients than there is between different
techniques using the same coefficients. It is surprising that the non-metric
method seems to be, if anything, more sensitive to the choice of coefficient
than is the metric method. However one should avoid generalising from a
single analysis of this kind; the point is that figure 17 draws attention to
features that merit further study.

Table 10. Ordination Methods.

PI Simple Matching Coefficient


P2 Jaccard Coefficient
P3 City Block on log relative abundances PCO
P4 As P3, ignoring joint abundances
P5 Pythagorean on log relative abundances
Nl-N5 As Pl-P5 but using KYST
Cl Correspondence Analysis +/-
C2 Correspondence Analysis species abundance

[Figure 17: two-dimensional scatter of the twelve method-points; visible labels include P1-P5, N1-N3, N5, C1 and C2.]

Figure 17. Comparison of ordinations using the m²-statistic. Three basic
methods: P (Principal Coordinates Analysis), N (Non-metric Scaling
by KYST) and C (Correspondence Analysis). The figures refer to
different coefficients described in Table 10.

9.3. Generalised Procrustes Analysis

Generalised Procrustes Analysis gives another way of comparing k ordinations
based on the Orthogonal Procrustes idea. Now we simultaneously translate,
rotate and scale Y1, Y2, ..., Yk to minimise the residual sum-of-squares

Σ_{i=1}^{n} Σ_{s<t} Δ²(Pis Pit), where Pis is the point in Ys representing the ith sample.

To minimise this requires a common centroid for the configurations, followed by
a straightforward iterative technique in which the two-matrix Orthogonal
Procrustes solution described in Section 9.1 plays a central role; details are
given by Gower (1975). The k points (Pi1, Pi2, ..., Pik) that represent the
different ordinations will have a centroid Gi, and the representation of all n

points G1, G2, ..., Gn may be taken to estimate an average ordination; I have used
the term consensus ordination to describe this average but it has been pointed
out that this is an abuse of language. Because each Yi is fitted to these
centroid positions, the order in which the matrices are fitted is irrelevant, and
scaling does not present the problem it does with the two-matrix Orthogonal
Procrustes problem; indeed it can be shown that when k = 2, Generalised
Procrustes Analysis is equivalent to scaling Y1 and Y2 each to unit sum-of-
squares as described in Section 9.1. The residual sum-of-squares criterion
may be written as k Σ_{i=1}^{n} Σ_{s=1}^{k} Δ²(Pis Gi). To examine the contributions of the
different ordinations to this criterion, examine the residuals Δ(Pis Gi). These can
be split up in two ways: (i) for ordination s (fixed), examine the contributions
of the different samples; and (ii) for sample i (fixed), examine the contributions
of the different ordinations.
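A bare-bones sketch of the iteration (numpy; rotation and translation only, omitting the scaling step, on made-up configurations) alternates two-matrix Procrustes rotations towards the current centroid configuration with re-averaging.

```python
import numpy as np

rng = np.random.default_rng(1)
base = rng.standard_normal((14, 2))
base -= base.mean(axis=0)

# k = 3 "ordinations": rotated copies of one configuration plus small noise
Ys = []
for _ in range(3):
    a = rng.uniform(0, 2 * np.pi)
    H = np.array([[np.cos(a), -np.sin(a)],
                  [np.sin(a),  np.cos(a)]])
    Y = base @ H + 0.01 * rng.standard_normal((14, 2))
    Ys.append(Y - Y.mean(axis=0))          # common centroid at the origin

def rotate_to(Y, target):
    # Two-matrix Orthogonal Procrustes rotation of Y towards target
    U, s, Vt = np.linalg.svd(Y.T @ target)
    return Y @ (U @ Vt)

G = Ys[0]                                  # initial centroid configuration
for _ in range(20):                        # alternate rotation and averaging
    Ys = [rotate_to(Y, G) for Y in Ys]
    G = np.mean(Ys, axis=0)

residual = sum(((Y - G) ** 2).sum() for Y in Ys)
assert residual < 0.1                      # only the added noise remains
```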

[Figure 18: two-dimensional ordination; site centroids labelled by letters, with the six individual year-points shown for sites A, D and K.]

Figure 18. Generalised Procrustes Analysis for 14 sites and 6 years. The
year information is shown only for sites A, D and K. The letters
give the ordination of the 14 sites averaged over the ordinations
for the six separate years.

In figure 18, ordinations for 14 sites were done for each of six years. Thus
n = 14 and k = 6. The centroids Gi of the six year-points for each site are
labelled by letters; to avoid overloading the figure the individual year-values
are given only for sites A, D and K. From figure 18 it can be seen that there
is much more year-to-year variation at site D than there is at sites A and K.
The centroids summarise the six original ordinations in a single average.

9.4. Other 3-way methods

We now focus on methods that operate directly on the matrices D1, D2, ..., Dk,
which may be imagined as holding either distances or squared distances. Just
as Generalised Procrustes Analysis gives an average ordination whose
coordinates may be held in an n×r matrix X, so do all the methods described in
this section. They differ in how the differences from this average are
modelled and in the criteria that have to be optimised in the fitting process.
Suppose that the (i,j)th element of Ds is dijs; then the aim is to fit
(squared) distances δijs, in some specified (and usually small) number of
dimensions.
In Individual Differences Scaling (Carroll and Chang 1970) the model is

δ²ijs = (xi - xj) Ws (xi - xj)',

where xi is the ith row of X and Ws is an r×r diagonal matrix of positive
weights associated with the sth ordination. In the associated computer
program, INDSCAL, this model is fitted by a version of the Strain criterion
(see Section 5). Thus Ds (now with values -½d²ijs) is replaced by its centred
form, Bs = (I-N)Ds(I-N), and Trace Σ_{s=1}^{k} (Bs - XWsX')² is minimised. The numerical
methods for minimisation are outside the present scope but are discussed, with
examples, by Carroll, this volume. The basic things to note are that the
calculations can be done, that X (here termed the Group Average) gives an
average ordination and that for the sth ordination the axes of X are weighted
by the values estimated in Ws. Thus in the sth ordination one axis may get
unusually high weight compared to the weighting of the same axis in other
ordinations. It is interesting to plot the values of Ws (s = 1, 2, ..., k) as a set of k
points in r dimensions (usually r = 2) as a way of comparing the ordinations.
The axes given by X are uniquely defined and may not be rotated.
The ALSCAL program (Takane et al. 1977) minimises a version of the
Sstress criterion

Σ_{i,j,k} (d²ijk - δ²ij)²

where the δij are the distances generated by the group average X. Here dijk may,
if required, be transformed (perhaps monotonically) as described in Section 5.
One advantage of this criterion over that of Strain is that it easily
accommodates missing values.
A criterion of the Stress family is used in SMACOF-I (Heiser and de
Leeuw 1979). Now it is

Σ_{i,j,k} wijk (dijk - δij)²

that is minimised, where the wijk are optionally specified weights, not to be
confused with those of the Individual Differences Scaling model. The usual
partition of a sum-of-squares gives:

Σ_{i,j,k} wijk (dijk - δij)² = Σ_{i,j,k} wijk (dijk - d̄ij)² + m Σ_{i,j} w̄ij (d̄ij - δij)²

where w̄ij = (1/m) Σ_{k=1}^{m} wijk and d̄ij = Σ_{k=1}^{m} wijk dijk / (m w̄ij).

The unknown quantities δij occur only in the term Σ_{i,j} w̄ij (d̄ij - δij)², so by
minimising this the left-hand-side is also minimised. Minimising the expression
in δij is precisely the Stress problem of Section 5 and therefore a
group-average X is easily found. These, and related problems, are discussed
by Gifi (1981), de Leeuw (1984) and by de Leeuw, this volume. A related
reference is Law et al. (1984), which discusses further models of this general
class.
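The partition is the standard weighted analysis-of-variance identity and can be checked numerically on arbitrary made-up values (numpy; m matrices indexed by k):

```python
import numpy as np

rng = np.random.default_rng(2)
m = 4                                      # number of distance matrices
d = rng.uniform(1, 2, size=(6, 6, m))      # d[i, j, k], k = 1, ..., m
w = rng.uniform(0.5, 1.5, size=(6, 6, m))  # weights w_ijk
delta = rng.uniform(1, 2, size=(6, 6))     # candidate group-average distances

wbar = w.mean(axis=2)                      # (1/m) * sum over k of w_ijk
dbar = (w * d).sum(axis=2) / (m * wbar)    # weighted mean distances d-bar_ij

# Left- and right-hand sides of the sum-of-squares partition
lhs = (w * (d - delta[:, :, None]) ** 2).sum()
rhs = ((w * (d - dbar[:, :, None]) ** 2).sum()
       + m * (wbar * (dbar - delta) ** 2).sum())
assert np.isclose(lhs, rhs)
```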
Enough has been said to indicate the kinds of techniques available and
the variety of solutions offered. Perhaps in this area more than in any other
the methodology tends to be embedded in computer programs, so that it is
specially difficult to disentangle the model, from its fitting criterion with its
algorithmic solution and finally its implementation on a computer.

10. CONCLUSION

The panoply of techniques discussed above may appear bewildering. Can
one impose some order? Well yes, one can. The key concept is that of
distance or, equivalently, (dis)similarity. One must decide whether one is
interested in distance between sites or distance between species, or perhaps
some other distance. The types of data will limit both the way of measuring

distance and the method of analysis. Thus Components Analysis should ideally
be used only with quantitative variables and when Pythagorean distance is
accepted; similarly Correspondence Analysis is concerned with categorical, or
categorised, variables and chi-square distance. We have seen that both
categorical and quantitative variables can generate other forms of distance
which can be used with other forms of metric or non-metric scaling of which
Components Analysis and Correspondence Analysis are special cases. The
choice among metric methods is governed not so much by scientific objective
as by how the chosen distance is to be approximated in few dimensions.
Non-metric methods rely only on ordinal information and therefore assume less
than is required for metric methods and hence would seem to be preferable.
In practice metric and non-metric ordinations of the same data often differ
very little, but this is not always so. Non-metric methods are computationally
much more expensive, and of the metric methods, Principal Coordinates Analysis
is the cheapest and hence is always worth trying and often gives all that is
required; it also shares with Correspondence Analysis the advantage over all
other methods, both metric and non-metric, of avoiding the possibility of
finding sub-optimal solutions. A full assessment of the relative merits of the
various forms of metric and non-metric ordination is much needed.
It has been shown that a two-way array may sometimes be interpreted as
a multivariate sample and sometimes as a table constructed from two categorical
variables (a two-way contingency table) or from two categorical variables and
one quantitative variable (a two-way table). In ecological contexts the
distinction between the three possibilities can sometimes be blurred but
consideration of the logical structure of the table can guide one to an
appropriate form of analysis or, at least, exclude certain methods as being
inappropriate. Computers have not helped here as all two-way arrays are the
same to a computer and users are easily tempted to use unsuitable methods for
the analysis of their data. Similarly all ordinations may look alike but their
proper geometrical interpretation depends on whether distances, inner-products
or skew-symmetry is being approximated.
Just as Components Analysis is the basic method for analysing a
multivariate sample of quantitative variables, so is Multiple Correspondence
Analysis the basic method for analysing a multivariate sample of qualitative
variables. If one feels that the relationship between the values of quantitative
variables and their ecological effects is non-linear then Multiple
Correspondence Analysis in the form of Homogeneity Analysis offers a way
forward by categorising the quantitative values into disjoint groups and

estimating distinct scores for each group. If the relationship, although
non-linear, is to be constrained in some way, such as to be monotonic or
positive, then non-metric variants of Homogeneity Analysis allow the possibility
to be explored.
Another consideration is whether to approximate the distances directly or
to work via approximations to the corresponding inner-products. The latter is
usually technically the more simple but it has been shown how in some cases
the two approaches are closely linked and in some important cases become
equivalent. When analysing sets of data-matrices or distance-matrices, all the
above considerations apply but there is the additional complication of how to
compare one configuration with another. Basically this may be done in three
ways, (i) via orthogonal transformations (i.e. generalized rotations), (ii) by
averaging distances and (iii) by averaging inner-products. Once again much
work needs to be done to evaluate the relative merits of the different
approaches.
Ordination is concerned only with one class of methods of analysing
multivariate data. The other important class of methods is that concerned with
clustering. There are very many clustering criteria and algorithms, some of
which are discussed elsewhere in this volume. Clustering methods aim at
recognising homogeneous sets of units, which when found can be marked on
ordination diagrams. When the clusters are organised hierarchically, the
hierarchical tree-structures can be marked on ordination diagrams to give
useful additional information. Other additional information, such as residuals
or ancillary material not used in the ordination, can and should be added both
to give supplementary new information and to draw attention to places in the
ordination where the original distances are grossly distorted.
The reader should realise by now that, like other statistical methods,
ordination techniques cannot be blindly used but require thought and
experience to get the best results.

REFERENCES

Aitchison, J. 1983. Principal component analysis of compositional data. Biometrika
70: 57-65.
Alvey, N. G., C. F. Banfield, R. I. Baxter, J. C. Gower, W. J. Krzanowski,
P. W. Lane, P. K. Leech, J. A. Nelder, R. W. Payne, K. M. Phelps,
C. E. Rogers, G. J. S. Ross, H. R. Simpson, A. D. Todd, G. Tunnicliffe-
Wilson, R. W. M. Wedderburn, R. P. White, and G. N. Wilkinson. 1983.
Genstat: a general statistical program. Numerical Algorithms Group.
Oxford.

Carroll, J. D., and J. J. Chang. 1970. Analysis of individual differences in
multidimensional scaling via an n-way generalization of 'Eckart-Young'
decomposition. Psychometrika 35: 283-319.
Chayes, F. 1971. Ratio correlation. A manual for students of petrology and
geochemistry. University Press. Chicago.
Constantine, A. G., and J. C. Gower. 1978. Graphical representation of
asymmetry. Applied Statistics 27: 297-304.
---------1982. Models for the analysis of inter-regional migration. Environment
and Planning A 14: 477-497.
Digby, P. G. N., and J. C. Gower. 1981. Ordination between- and within-groups
applied to soil classification. p. 63-75. In: D. F. Merriam [ed.] Down to
earth statistics: Solutions looking for geological problems. Syracuse
University Geological Contributions.
Digby, P. G. N., and R. Kempton. 1986. Multivariate analysis of ecological
communities. Chapman and Hall. London (in press).
Eckart, C., and G. Young. 1936. The approximation of one matrix by another of
lower rank. Psychometrika 1: 211-218.
Gabriel, K. R. 1971. The biplot graphic display of matrices with application to
principal component analysis. Biometrika 58: 453-467.
Gifi, A. 1981. Nonlinear multivariate analysis. Department of Data Theory,
Faculty of Social Sciences, University of Leiden, Middelstegracht 14,
2312 TW Leiden, The Netherlands.
Gittins, R. 1984. Canonical analysis. Springer Verlag. Berlin.
Gower, J. C. 1971a. A general coefficient of similarity and some of its
properties. Biometrics 27: 857-871.
---------1971b. Statistical methods of comparing different multivariate analyses
of the same data. p. 138-149. In: J. R. Hodson, D. G. Kendall and P. Tautu
[eds.] Mathematics in the archaeological and historical sciences. University
Press. Edinburgh.
---------1975. Generalised Procrustes analysis. Psychometrika 40: 33-51.
---------1977. The analysis of asymmetry and orthogonality. p. 109-123. In:
J. Barra et al. [eds.] Recent developments in statistics. North Holland.
Amsterdam.
---------1980. Problems in interpreting asymmetrical chemical relationships.
p. 399-409. In F. Bisby, J. C. Vaughan and C. A. Wright [eds.]
Chemosystematics: principles and practice. Academic Press. New York.
---------1984a. Multivariate analysis: ordination, multidimensional scaling and
allied topics. p. 727-781. In E. H. Lloyd [ed.] Handbook of applicable
mathematics Vol. VI: Statistics. J. Wiley and Sons. Chichester.
---------1984b. Multidimensional scaling displays. p. 592-601. In H. G. Law, C.
W. Snyder, J. Hattie and R. P. McDonald [eds.] Research methods for
multi-mode data analysis. Praeger. New York.
Gower, J.C., and P. Legendre. 1986. Metric and Euclidean properties of
dissimilarity coefficients. J. of Classification 3: 5-48.
Green, P.J. 1981. Peeling bivariate data. p. 3-19. In V. Barnett [ed.]
Interpreting multivariate data. J. Wiley and Sons. Chichester.
Greenacre, M. J. 1984. Theory and applications of correspondence analysis.
Academic Press. London.
Guttman, L. A. 1968. A general non-metric technique for finding the smallest
coordinate space for a configuration of points. Psychometrika 33: 469-506.
Healy, M. J. R., and H. Goldstein. 1976. An approach to the scaling of
categorised attributes. Biometrika 63: 219-229.
Heiser, W., and J. de Leeuw. 1979. How to use SMACOF-1. A program for metric
multidimensional scaling. p. 1-63. Department of Datatheory. Faculty of
Social Sciences, University of Leiden, Middelstegracht 14, 2312 TW
Leiden, The Netherlands.

Hill, M. O., and H. G. Gauch. 1980. Detrended correspondence analysis, an
improved ordination technique. Vegetatio 42: 47-58.
Kruskal, J. B., and M. Wish. 1978. Multidimensional scaling. Sage University
papers on quantitative applications in the social sciences,
Series No. 07-011. Sage Publications. Beverly Hills and London.
Law, H. G., C. W. Snyder, Jr., J. A. Hattie, and R. P. McDonald. 1984. Research
methods for multimode data analysis. Praeger. New York.
Lebart, L., A. Morineau, and K. M. Warwick. 1984. Multivariate descriptive
statistical analysis. J. Wiley & Sons. New York.
de Leeuw, J. 1984. The Gifi system of non-linear multivariate analysis. In
E. Diday et al. [ed.] Data analysis and informatics IV, North Holland.
Amsterdam.
Legendre, L., and P. Legendre. 1983. Numerical ecology. Elsevier Scientific
Publishing Co. Amsterdam.
Lowe, H. J. B. 1984. The assessment of populations of the aphid Sitobion
avenae in field trials. Journal of Agricultural Science 102: 487-497.
Sammon, J. W. 1969. A non-linear mapping for data structure analysis. IEEE
Transactions on Computers C-18: 401-409.
Shepard, R.N., and J. D. Carroll. 1966. Parametric representation of non-linear
data structures. p. 561-592. In P. R. Krishnaiah [ed.]. Multivariate
analysis. Academic Press. New York.
Schonemann, P. H., and R. M. Carroll. 1970. Fitting one matrix to another
under choices of a central dilation and a rigid motion. Psychometrika
35: 245-255.
Sibson, R. 1979. Studies in the robustness of multidimensional scaling:
perturbational analysis of classical scaling. J. Roy. Statist. Soc. B
41: 217-229.
Silverman, B. W., and D. M. Titterington. 1980. Minimum covering ellipses.
SIAM J. on Scientific and Statistical Computing 1: 401-409.
Takane, Y., F. Young, and J. de Leeuw. 1977. Nonmetric individual differences
scaling: an alternating least squares method with optimal scaling features.
Psychometrika 42: 7-68.
Williamson, M. H. 1978. The ordination of incidence data. Journal of Ecology 66:
911-920.
Yarranton, G. A. 1966. A plotless method of sampling vegetation. Journal of
Ecology 54: 229-237.
SOME MULTIDIMENSIONAL SCALING AND RELATED PROCEDURES
DEVISED AT BELL LABORATORIES, WITH ECOLOGICAL APPLICATIONS

J. Douglas Carroll
AT&T Bell Laboratories
Murray Hill, New Jersey 07974, USA

Abstract - A large number of multidimensional scaling (MDS) and related models,
methods, and computer programs (for all of which we use the generic term "MDS
focuses on probably the most widely known and used subset of Bell Labs MDS
procedures involving spatial (as opposed to tree structure, overlapping or non-overlapping
clustering, or other "discrete and hybrid") models. These are: the MDSCAL and KYST
family, for two-way (metric or nonmetric) MDS of proximities (e.g., similarities or
dissimilarities); INDSCAL, SINDSCAL and IDIOSCAL, for three-way MDS,
primarily of proximities (but also applicable to more general multiway data, in a manner
to be described); MDPREF, for "internal analysis" of preference (or other
"dominance") data for different individual "subjects" (or other data sources) in terms of
a vector model; and the PREFMAP family for "external analysis" of such data (where
the "stimulus" or other "object" dimensions are externally provided by prior analysis or
theory, only "subject" vectors, ideal points and/or other parameters being determined
from preference/dominance data). A number of these Bell Labs MDS procedures are
applied to some ecological data on seaworm species due to E. Fresi and collaborators.

INTRODUCTION

In this paper are presented descriptions of some of the major models, methods, and
computer algorithms for multidimensional scaling (MDS) and related techniques
developed at Bell Laboratories. Most of the computer programs implementing the
procedures described in this paper are available on one of two tapes available at a
nominal cost from the AT&T Bell Labs Computer Information Library. These two
tapes are referred to as the MDS-1 and MDS-2 tapes. These programs are all written in
FORTRAN. Most of those on the MDS-1 tape are written for IBM equipment, while
those on the MDS-2 tape should be machine independent. (It should be emphasized
that no guarantee is implied that any of these programs will continue to be distributed
on this basis by AT&T Bell Laboratories.) All of the programs discussed here, except
SINDSCAL and PREFMAP-3, are on the MDS-1 tape (which has already been very
widely distributed). SINDSCAL is on the MDS-2 tape. It is hoped that PREFMAP-3
will soon be available.

NATO ASI Series, Vol. G14
Developments in Numerical Ecology
Edited by P. and L. Legendre
© Springer-Verlag Berlin Heidelberg 1987

While this paper is explicitly limited to procedures for which programs are (or are
hoped soon to be) available through the Bell Labs computer information library, a large
number of other MDS and related procedures have been developed at Bell Labs which
are not so available (and thus are not described here). A supplementary bibliography
citing papers relevant to such other procedures developed (totally or in part) at Bell
Labs is available by request from the author. Space limitations also require omission of
many of the programs included in the Bell Labs package of MDS programs. These
include a procedure for maximum likelihood nonmetric 2-way MDS appropriate for
proximity data collected by a certain ranking process, called MAXSCAL4.1 (Takane
and Carroll 1980, 1981); SIMULES (SIMultaneous Linear Equation Scaling) (Carroll
and Chang 1972b, Chang and Carroll 1972c); MONANOVA (MONotonic ANalysis Of
VAriance, which implements a procedure for fitting an additive conjoint measurement
model to data from a factorial design) (Kruskal 1965, Kruskal and Carmone 1968);
Categorical Conjoint Measurement (CCM) (Carroll 1969, Chang 1971); CANCOR
(Generalized CANonical CORrelation Analysis) (Carroll 1968, Chang 1971); PROFIT
(PROperty FITting) (Carroll and Chang 1964, Chang and Carroll 1968); PARAMAP
(PARametric MAPping of nonlinear data structures) (Carroll 1965, Shepard and
Carroll 1966, Chang 1968); POLYFAC (POLYnomial FACtor Analysis) (Carroll 1969);
HICLUS (HIerarchical CLUStering via ultrametric tree models) (Johnson 1967);
MAPCLUS (A MAthematical Programming method for fitting the ADCLUS
overlapping CLUStering model) (Arabie and Carroll 1980a,b); INDCLUS (INdividual
Differences CLUStering) (Carroll and Arabie 1982, 1983); and others. (Most of the
programs on the MDS-1 tape, including all of those just named with the exceptions of
MAXSCAL4.1, MAPCLUS and INDCLUS, which are on the MDS-2 tape, and
MONANOVA, are synopsized and described briefly in Chang, 1971. This paper also
includes brief synopses of early versions of MDSCAL, as well as INDSCAL,
INDSCALS, NINDSCAL, MDPREF and PREFMAP, all of which are discussed in the
body of this paper.) We focus here on two-way and three-way (or Individual
Differences) MDS methods for proximity data, and on methods for individual differences
preference (or other dominance) data. (For a general discussion of MDS, including
many of those models and methods not discussed in detail in the present paper, see
Carroll and Arabie 1980.)

The procedures to be discussed here are organized under three general headings. These
are: I. Two-Way (Nonmetric or Metric) Multidimensional Scaling (MDS) procedures;
II. Three-Way Multidimensional Scaling (MDS) procedures; III. MDS Analysis of
Preference (or other Dominance) Data.

A complete outline of the text of this paper follows, including names of programs and
their authors.

I. Two-Way MDS of Proximity Data (Theoretical references: Shepard 1962a,b;
Kruskal 1964a,b).

I.A. MDSCAL-5 (Kruskal and Carmone 1969) and KYST, KYST2 and
KYST-2A (Kruskal, Young and Seery 1973).

I.B. Some ecological data on 88 species of seaworms analyzed by KYST-2A.

II. Three-Way MDS of Proximity (and other) Data.

II.A. INDSCAL (Carroll and Chang 1969, 1970; Chang and Carroll 1969) and
SINDSCAL (Pruzansky 1975).

II.B. IDIOSCAL (Carroll and Chang 1972a; Chang and Carroll 1972a; Carroll
and Wish 1974a).

II.C. An application of three-way MDS to the ecological data on seaworms due
to Fresi et al.

III. MDS and Multidimensional Analysis of Preference (or other Dominance) Data.

III.A. MDPREF (Carroll and Chang 1964; Chang and Carroll 1968).

III.B. PREFMAP and PREFMAP-2 (Carroll and Chang 1967; Chang and
Carroll 1972b) and PREFMAP-3 (Meulman, Heiser and Carroll 1986).

III.C. MDPREF analysis of the Fresi et al. seaworm data, and relation to
previous analyses via KYST-2A and SINDSCAL.

I. TWO-WAY MDS OF PROXIMITY DATA

I.A MDSCAL5, KYST, KYST2 and KYST-2A


The multidimensional scaling (MDS) programs known as MDSCAL5, KYST,
KYST2 and KYST-2A, most closely associated with J. B. Kruskal, are highly versatile
in the sense that they can be used for a large variety of scaling problems. These
programs have gone through several versions so far, and the discussion below relates to
the fifth version of MDSCAL, and to all three versions of KYST. The original
theoretical discussion of the procedure can be found in Kruskal (1964a,b). Detailed
documentation of these programs can be found in Kruskal and Carmone (1969) or in
Kruskal, Young and Seery (1973). A complete discussion of the computational method
used is given in Kruskal (1977).
Discussion of Computational Procedure
The problem attacked by Kruskal (1964a,b), following Shepard's (1962a,b)
pioneering work in this area, and known generically as nonmetric (or, with use of certain
options, metric) two-way MDS, is that of deriving a configuration of objects in a
prespecified number of dimensions, given a set of proximity data among pairs of objects.
Let $\delta_{ij}$ represent the original measure of dissimilarity between pairs of objects $i$ and $j$.
Assume for the moment that the dissimilarities can be strictly rank ordered. (The way
that ties are handled by the program will be discussed later.) The objective is to
represent the $n$ objects by $n$ points in an $r$-dimensional space, such that the rank order of
distances between pairs of points best reproduces the rank order of the $\delta$'s. Let the
coordinates of the $i$-th point in that space be defined by a vector $x_i = (x_{i1}, \ldots, x_{ir})$.

Let $d_{ij}$ denote the distance from $x_i$ to $x_j$. Let $X$ be the matrix whose $i$-th row is $x_i$;

thus, $X = (x_{it})$, for $i = 1, 2, \ldots, n$ (objects) and $t = 1, 2, \ldots, r$ (dimensions).

The criterion is that of minimizing the function called "Stress," given by one of two
alternate formulas. The one now known as Formula 2 is:

$$\text{STRESS FORMULA 2} = \sqrt{\sum_{ij} (d_{ij} - \hat d_{ij})^2 \Big/ \sum_{ij} (d_{ij} - \bar d)^2}\,, \qquad (I.A.1)$$

where $\bar d$ is the average of all the $d_{ij}$'s. The one known as Formula 1 is:

$$\text{STRESS FORMULA 1} = \sqrt{\sum_{ij} (d_{ij} - \hat d_{ij})^2 \Big/ \sum_{ij} d_{ij}^2}\,. \qquad (I.A.2)$$

The problem, then, can be expressed as that of finding the matrix $X$ such that
Euclidean distances, defined as $d_{ij} = \sqrt{\sum_t (x_{it} - x_{jt})^2}$ computed from that matrix,
best match the $\delta_{ij}$'s. The $\hat d_{ij}$'s are a set of numerical values chosen to be as close to
their $d_{ij}$ counterparts as possible, subject to being monotone with the original $\delta_{ij}$'s. The
$\hat d_{ij}$'s are simply fitted values in the monotone regression procedure.

The two formulas above will be abbreviated here as $S2$ and $S1$, respectively. $S2$ is,
in MDSCAL5, the "normal" or default option. In the various versions of KYST, $S1$ is
the default option. It should be mentioned that the two Stress formulas differ only in
the normalizing factor in the denominator. In all cases the $\sum_{ij}$ implies summation over
all values of $i$ and $j$ for which there are data. For example, if a half-matrix option with
diagonal absent is used, the sum would be only over that off-diagonal half-matrix, while
if, say, the whole matrix option with diagonal present is used, summation is over all $n^2$
values. If there are missing cells the summation skips these cells. Furthermore, in the
case of $S2$, $\bar d$ is the average over these same values of $i$ and $j$.
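As a concrete illustration, both Stress formulas can be computed directly from paired lists of distances $d_{ij}$ and fitted values $\hat d_{ij}$. The following is a minimal Python sketch; the function name and array-based interface are illustrative, not part of the original FORTRAN programs:

```python
import numpy as np

def stress(d, d_hat, formula=1):
    """Kruskal's Stress over the pairs (i, j) for which there are data.

    d       : flat array of configuration distances d_ij
    d_hat   : flat array of fitted (disparity) values d^_ij
    formula : 1 -> normalize by sum of d_ij^2 (I.A.2)
              2 -> normalize by sum of (d_ij - mean d)^2 (I.A.1)
    """
    d = np.asarray(d, dtype=float)
    d_hat = np.asarray(d_hat, dtype=float)
    num = np.sum((d - d_hat) ** 2)
    if formula == 1:
        den = np.sum(d ** 2)
    else:
        den = np.sum((d - d.mean()) ** 2)
    return np.sqrt(num / den)
```

A perfect monotone fit ($\hat d_{ij} = d_{ij}$ everywhere) gives Stress 0 under either formula; the two formulas differ only in the denominator, as noted above.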

The procedure used for obtaining the $x$'s is the method of steepest descent. Briefly
stated, the method involves improving the starting configuration a bit by moving it
slightly in the direction of the negative gradient, or direction of steepest descent. The
direction of steepest descent is the direction in the configuration space (the space defined
by all $n \cdot r$ parameters of the $X$ matrix) along which Stress is decreasing most rapidly.
This direction corresponds to the (negative) gradient, which is defined by evaluating the
partial derivatives of the function $S$ ($S1$ or $S2$, depending on which option is used).

Letting $S$ stand for either $S1$ or $S2$, the gradient will be a vector of $n \cdot r$
components whose general entry is

$$g_{it} = -\frac{\partial S}{\partial x_{it}} \qquad (i = 1, 2, \ldots, n;\ t = 1, 2, \ldots, r).$$

The $n \cdot r$ components of this vector can be "packed" into a matrix $G$ of the same row
and column order as the $X$ matrix; thus $G = \left[-\frac{\partial S}{\partial x_{it}}\right]$. On each iteration a step size $\alpha$
is defined in a way described in Kruskal's original paper (1964b), and $\alpha$ times $G$ is added
to $X$ to get an "improved" estimate of $X$. Using a subscript $l$ for the $l$-th iteration, the
iterative process can be described as follows:

Given $X_l = (x_{it}^{(l)})$, the $l$-th estimate of $X$:

1. Compute $G_l = \left[-\frac{\partial S}{\partial x_{it}}\right]$ (evaluated at $X = X_l$).

2. Compute $\alpha_l$ (as described in the above-cited Kruskal paper), and then set
$X_{l+1} = X_l + \alpha_l G_l$.

$X_{l+1}$ is, then, the improved estimate of the $X$ matrix corresponding to iteration $l+1$.
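One such update step can be sketched numerically. The Python fragment below is an illustrative toy version only: it is restricted to the metric special case in which the fitted values $\hat d_{ij}$ are taken to be the dissimilarities $\delta_{ij}$ themselves, and it obtains the gradient by numerical differentiation rather than by the analytic formulas used in MDSCAL/KYST (the descent direction is the same):

```python
import numpy as np

def stress1(X, delta):
    """Stress formula 1 in the metric special case d^_ij = delta_ij.
    X is an n x r configuration; delta is an n x n dissimilarity matrix."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    iu = np.triu_indices(len(X), 1)          # off-diagonal upper half-matrix
    d, dh = D[iu], delta[iu]
    return np.sqrt(np.sum((d - dh) ** 2) / np.sum(d ** 2))

def steepest_descent_step(X, delta, alpha=0.1, eps=1e-6):
    """One iteration X_{l+1} = X_l + alpha * G_l, where G_l is the negative
    gradient of Stress, estimated here by central finite differences."""
    x = X.ravel().astype(float)
    g = np.empty_like(x)
    for i in range(x.size):                  # partial dS/dx_it, one at a time
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (stress1((x + e).reshape(X.shape), delta)
                - stress1((x - e).reshape(X.shape), delta)) / (2 * eps)
    return X + alpha * (-g).reshape(X.shape)  # move along the negative gradient
```

Starting from a perturbed configuration, a sufficiently small step along the negative gradient lowers the Stress, which is exactly what each iteration of the program is designed to do.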


This iterative process continues until convergence occurs, as determined by convergence
criteria specified in detail in Kruskal and Carmone (1969) or Kruskal, Young and Seery
(1973). The $X_0$ matrix defines the "initial configuration" (corresponding to the 0-th
iteration), which may be defined in a number of different ways. One option is to
generate a starting configuration by a procedure that puts all points along the coordinate
axes in a systematic way (but one that results in an "essentially random" placement of
the points along these axes). This is the one that is sometimes referred to as the "L-
shaped" starting configuration because, in two dimensions, the configuration does,
indeed, resemble an "L."

A second option involves a more fully random configuration ("filling" the space more
completely) which can be used by providing a "seed" number for a random number
generator. The configuration, in this case, is generated by choosing points randomly
from a spherical multivariate normal distribution. By choosing different seeds for the
random number generator, of course, different random starts can be used.

A third option is for the user to provide a starting configuration. This may be a
"rational" start provided by using some other procedure, an a priori configuration of
some kind, or one provided by a previous run of the same program which requires
additional iterations.

As a fourth option, if one is securing solutions in several dimensionalities in one run,
the first $r$ dimensions of the $(r+1)$-dimensional solution can be used to define the
starting configuration for the $r$-dimensional solution.

All of the options listed above are available in both MDSCAL-5 and in the various
versions of KYST. KYST, KYST2, and KYST-2A have an additional option for the
starting configuration, which is probably the most important algorithmic distinction
between the KYST and the earlier MDSCAL family of MDS programs. This option
entails using an adaptation of the classical metric MDS technique associated with
Torgerson (1958) or Gower (1966) to derive a starting configuration. This starting
configuration is similar to, but not quite identical with, that in programs by F. W.
Young called (generically) TORSCA (Young and Torgerson 1967; Young 1968). In
the variant of this "TORSCA" starting configuration used in KYST, KYST2 and
KYST-2A, a linear transformation of the data is implemented to assure that the data values
are all positive and that the ratio between the smallest and largest values has a
reasonable value. (This provides a practical solution to what is sometimes called the
"additive constant" problem in metric MDS methods.)

Special Features of the MDSCAL5 and KYST Programs

The MDSCAL5 and KYST programs can cope with a variety of problems arising in
the original dissimilarities data. We shall discuss them in this section.

Missing Data - the program can be set to identify missing observations by reading in a
cut-off value below which data will be treated as missing. The Stress function is
modified by simply omitting, both in the numerator and denominator, the terms which
correspond to the missing cells.

Nonsymmetry - either because of inherent nonsymmetry of measurement procedures or
errors in measurement, the values of $\delta_{ij}$ and $\delta_{ji}$ may not be equal in some cases. In such
a situation the Stress function is computed over all cells (i.e., both $i, j$ and $j, i$) and
minimized in the algorithm.

Ties - two approaches are possible for resolving ties between dissimilarities (a tie arises
wherever $\delta_{ij} = \delta_{kl}$). These are called the primary and secondary approaches.

In the primary approach, when $\delta_{ij} = \delta_{kl}$ no restriction is placed on the corresponding
$\hat d$'s. Thus, if $\delta_{ij} = \delta_{kl}$, $d_{ij}$ may be greater than, less than, or equal to $d_{kl}$, without a
necessary penalty in the Stress function (since $\hat d_{ij}$ may be greater than, less than, or
equal to $\hat d_{kl}$).

The secondary approach is appropriate when $\delta_{ij} = \delta_{kl}$ is taken to mean that
$\hat d_{ij} = \hat d_{kl}$. Then if $d_{ij} \neq d_{kl}$, the terms $(d_{ij} - \hat d_{ij})^2$ and $(d_{kl} - \hat d_{kl})^2$ cannot both be
zero, so that Stress might be lowered by making a correction to the configuration, tending
to bring $d_{ij}$ and $d_{kl}$ more nearly into agreement (at least those two components of
Stress would be lowered, although, of course, other components may be increased at the
same time).

Non-Euclidean Distances - the user of the MDSCAL5 or KYST programs can choose
any Minkowski-$p$ metric, by specifying the value of $p$ ($\geq 1.0$), thus causing the program
to use the following formula for computing $d_{ij}$:

$$d_{ij} = \left[\sum_{t=1}^{r} |x_{it} - x_{jt}|^p\right]^{1/p} \qquad (I.A.3)$$

This option enables one to use this specific class of non-Euclidean distances. The
Stress and gradient formulas are changed accordingly. (While $p$ is usually restricted to
be $\geq 1.0$, values between 0 and 1 can in fact be used, and may be meaningful in some
circumstances. If the "$1/p$" power is omitted, this formula does, in fact, yield a metric.
For discussion of this, see Carroll and Wish 1974a.)
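As an illustration, equation (I.A.3) is straightforward to compute. The short sketch below (the function name is illustrative) reduces to the Euclidean distance at $p = 2$, the city-block metric at $p = 1$, and approaches the dominance (maximum) metric as $p$ grows large:

```python
import numpy as np

def minkowski_d(xi, xj, p=2.0):
    """Minkowski-p distance of equation (I.A.3):
    d_ij = [ sum_t |x_it - x_jt|**p ] ** (1/p)."""
    diff = np.abs(np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float))
    return float(np.sum(diff ** p) ** (1.0 / p))
```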

Definition of Gradient

It is possible to write a general equation for the (negative) gradient of $S1$ or $S2$ for
any of the Minkowski-$p$ metrics (recalling that $p = 2$ corresponds to the Euclidean
case). Letting $S_a$ ($a = 1$ or 2) stand for either $S1$ or $S2$, this equation is:

$$g_{it} = -\frac{\partial S_a}{\partial x_{it}} = \frac{S_a}{\sum_{jk} (d_{jk} - \bar d_a)^2} \sum_k \left[\frac{d_{ik} - \hat d_{ik}}{S_a^2} - (d_{ik} - \bar d_a)\right] d_{ik}^{\,1-p}\, |x_{kt} - x_{it}|^{p-2}\, (x_{kt} - x_{it}), \qquad (I.A.4)$$

where $p$ is the parameter of the Minkowski-$p$ metric, and where

$$\bar d_1 = 0, \qquad \bar d_2 = \bar d, \qquad (I.A.5)$$

while $\hat d_{ik}$ is the current "estimate" of $d_{ik}$ derived from a least squares monotone
regression (or from some other least squares regression procedure, options for which have
already been described).

Both the definition of Stress and of the gradient are necessarily different for the
various "split data" options, which will be described in the section on splitting data
below.

Options for Regression

Four basic options exist for performing the regression of $d_{ij}$ on $\delta_{ij}$. These are:

1. Regression-Ascending - for performing monotone regression when the original data
are dissimilarities.

2. Regression-Descending - for performing monotone regression when the original
data are similarities.

3. Regression-Polynomial - specified integer (degree of polynomial) for performing
polynomial regression. If the degree of polynomial is equal to one, it becomes
linear regression. An integer from 1 through 4 can be used. In the linear case one
has the option of including or excluding a constant term (i.e., the linear function
may be non-homogeneous or homogeneous).

4. Regression-Multivariate - integer (number of variates) for performing a
prespecified regression by supplying a separate FORTRAN subroutine for same.
This option, in principle, allows essentially any linear regression function of the
form $\hat d = \sum_{c=1}^{C} a_c g_c(\delta)$ ($C \leq 5$) to be used, so long as an algorithm is available in
the form of the above-mentioned FORTRAN subroutine for computing the
functions $g_c(\delta)$.
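The monotone regression underlying options 1 and 2 is classically computed with the pool-adjacent-violators algorithm. The following Python sketch is an illustration of the ascending case only, not the FORTRAN code of the actual programs: it fits values $\hat d$ that are as close to the $d$'s as possible subject to being monotone non-decreasing in $\delta$:

```python
import numpy as np

def monotone_regression_ascending(delta, d):
    """Least-squares monotone (isotonic) regression of the distances d on the
    dissimilarities delta, via the pool-adjacent-violators algorithm."""
    order = np.argsort(delta)           # process pairs by increasing delta
    y = np.asarray(d, dtype=float)[order]
    blocks = []                         # each block holds [mean, weight]
    for v in y:
        blocks.append([v, 1.0])
        # pool adjacent blocks while they violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2 = blocks.pop()
            m1, w1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2])
    fit_sorted = np.concatenate([[m] * int(w) for m, w in blocks])
    d_hat = np.empty_like(fit_sorted)
    d_hat[order] = fit_sorted           # back to the original pair order
    return d_hat
```

If the $d$'s are already monotone in $\delta$, the fitted $\hat d$'s equal the $d$'s and contribute nothing to Stress; each violation is averaged out over the offending adjacent pairs.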

Options for Data Input

The input matrices can be in one or more of these forms:

1. Full matrix, diagonal present

2. Lower half matrix, diagonal present

3. Lower half matrix, diagonal absent

4. Upper half matrix, diagonal present



5. Upper half matrix, diagonal absent

6. Lower corner matrix*

7. Upper corner matrix*

* A corner matrix is a rectangular (M × N) matrix which is treated as an off-diagonal
submatrix of a larger (M + N) × (M + N) full (square) matrix, with the remaining
entries handled as missing data.
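The corner-matrix convention can be illustrated in a few lines of Python. This is a sketch of the embedding only; NaN is used here as the missing-data code, whereas the actual programs flag missing entries with a cut-off value:

```python
import numpy as np

def corner_to_full(R):
    """Treat a rectangular M x N 'corner' matrix R as an off-diagonal submatrix
    of a larger (M+N) x (M+N) full square matrix, with all remaining entries
    handled as missing data (coded here as NaN)."""
    R = np.asarray(R, dtype=float)
    M, N = R.shape
    full = np.full((M + N, M + N), np.nan)  # everything missing by default
    full[:M, M:] = R                        # place R as the upper corner block
    return full
```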

Initial Configuration - the user may supply a starting configuration for scaling the
objects. If not, two varieties of a random start can be used, as discussed above. Also, as
discussed earlier, other options exist if solutions in more than one dimensionality are
obtained. Finally, as discussed earlier, in the KYST programs, the "TORSCA"-like
start is another option.

Splitting Data - four options exist for using parts of the data as separate sublists and
then performing separate regressions for each of these sublists. They are:

1. Split by rows

2. Split by groups

3. Split by decks

4. Split no more (a control phrase used to indicate that no more "split" options are to
be specified).
The first three options make each row of every data deck, each group of rows (see
Kruskal and Carmone 1969, for explanation of this) or each data deck a separate sublist,
respectively. The "split no more" option is relevant only when several data decks are
used. It causes all subsequent data decks to be joined into a single sublist until further
indication.

In case any of the "split data" options are used, it is necessary to redefine Stress as
follows:

$$S_a^* = \sqrt{\frac{1}{N_B} \sum_b S_{ab}^2}\,, \qquad (I.A.6)$$

where $b$ stands for a data "block" (which may be a row, group, or deck, depending on
options used), $N_B$ is the number of such blocks, while $S_{ab}$ is $S1$ or $S2$ (for $a = 1$ or 2,
respectively) defined on block $b$. $S_a^*$ is then the overall Stress (of type $a$), defined
simply as the root mean square of the individual Stresses.

The gradient can be defined easily. Dropping the "$a$" from $S$ and $S^*$, the overall
gradient is simply:

$$G^* = \frac{1}{N_B\, S^*} \sum_b S_b\, G_b\,. \qquad (I.A.7)$$
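The overall Stress of equation (I.A.6), the root mean square of the within-block Stresses, can be sketched as follows (illustrative Python, not part of the distributed programs):

```python
import numpy as np

def overall_stress(block_stresses):
    """Overall Stress S* for the 'split data' options: the root mean square of
    the Stress values S_b computed separately within each sublist (block)."""
    s = np.asarray(block_stresses, dtype=float)
    return float(np.sqrt(np.mean(s ** 2)))
```

With a single block this reduces to the ordinary Stress of that block, as it should.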

Data Saving - it is possible to use the same data for performing different methods of
scaling by using the option called "Save Data."

Weighting of Data - the MDSCAL5 and KYST programs allow for differential
weighting of the original data values. This can be done either by supplying a matrix of
weights in the same way as the data are laid out or by using a FORTRAN subroutine
for generating weights internally. The standard weights are taken as 1.0 for each
observation. Further details on this and other aspects of these programs can be found in
Kruskal and Carmone (1969) or Kruskal, Young and Seery (1973). More information
and a general introductory overview of "two-way" multidimensional scaling generally (as
well as a brief summary of "three-way" MDS) can be found in Kruskal and Wish
(1978).

I.B Some Ecological Data on 88 Species of Seaworms Analyzed by KYST-2A

Some ecological data collected over a period of two years at 5 sites in the harbor of
Ischia in the Bay of Naples are described in detail in a later section (II.C) of this paper.
Also described in that section is the computation of a number of different proximity
(derived dissimilarity) matrices (one for each of the 5 sites, a number for various time
periods, and an "overall" dissimilarity matrix).

While leaving details of this measure for later, we will describe briefly here the
results of applying KYST-2A to the "overall" dissimilarity measure calculated for the
Fresi et al. data. Before KYST-2A could be applied to these data, a subset of the
seaworm species had to be eliminated. The reason for this is that our version of
KYST-2A would handle only 60 objects (in the present case, the species of seaworms).
Inspection of the original data in the Fresi et al. (1983) paper indicated that 33 species
were observed only twice in the entire study (i.e., at any one of the sites in any one of
four time periods). Thus these 33 were eliminated, leaving a total of 55 species to be
analyzed by KYST-2A.

Table 1. Biological names of 88 seaworm species in data from Fresi et al. (1983). Those marked with asterisks
were the 55 most frequent species in that data, which were analyzed via KYST-2A.

* 1 Lepidonotus clava (Montagu)
2 Pholoe synophthalmica (Claparède)
3 Paleanotus debilis (Grube)
4 Eteone sp.
5 Phyllodoce (cfr.) vittata Ehlers
6 Eulalia sanguinea Oersted
* 7 Eulalia viridis Linneo
8 Hesionidae gen. sp.
* 9 Syllis (cfr.) vivipara Krohn
* 10 Syllis gracilis Grube
* 11 Syllis hyalina Grube
* 12 Syllis armillaris (Müller)
* 13 Syllis prolifera Krohn
* 14 Syllis spongicola Grube
15 Syllis cirropunctata Michel
16 Syllis amica Quatrefages
17 Syllis cornuta Rathke
18 Syllis kronii Ehlers
* 19 Trypanosyllis zebra (Grube)
20 Odontosyllis ctenostoma Claparède
21 Odontosyllis fulgurans Claparède
22 Pionosyllis sp.
23 Eurysyllis tuberculata Ehlers
* 24 Brania clavata Claparède
* 25 Brania pusilla Dujardin
* 26 Exogone verugera (Claparède)
* 27 Exogone gemmifera Pagenstecher
* 28 Exogone sp.
* 29 Sphaerosyllis hystrix Claparède
* 30 Sphaerosyllis claparedi Ehlers
* 31 Autolytus aurantiacus Claparède
32 Autolytus prolifer (Müller)
* 33 Autolytus sp.
* 34 Syllidae gen. sp.
* 35 Platynereis dumerilii Audouin et Milne-Edwards
* 36 Platynereis coccinea Delle Chiaje
* 37 Nereis zonata Malmgren
* 38 Nereis persica Fauvel
* 39 Nereis sp.
* 40 Ceratonereis costae (Grube)
* 41 Perinereis macropus (Claparède)
* 42 Perinereis cultrifera Grube
* 43 Nereidae gen. sp. 1
* 44 Nereidae gen. sp. 2
* 45 Nereidae gen. sp. 3
* 46 Lysidice ninetta Audouin et Milne-Edwards
* 47 Lumbrinereis coccinea (Renieri)
* 48 Lumbrinereis funchalensis (Kinberg)
* 49 Lumbrinereis inflata (Moore)
50 Lumbrinereis sp.
* 51 Arabella geniculata (Claparède)
* 52 Arabella iricolor (Montagu)
53 Dorvillea rudolphii (Delle Chiaje)
* 54 Polydora ciliata (Johnston)
* 55 Polydora caeca (Oersted)
* 56 Polydora sp.
57 Spio filicornis (Müller)
* 58 Dodecaceria concharum Oersted
* 59 Caulleriella bioculatus (Keferstein)
60 Cirratulus cirratus (Müller)
* 61 Cirratulus chrysoderma Claparède
* 62 Cirriformia filigera (Delle Chiaje)
63 Ctenodrilus serratus (O. Schmidt)
* 64 Cirratulidae gen. sp. 1
* 65 Cirratulidae gen. sp. 2
66 Theostoma oerstedi (Claparède)
67 Capitellidae gen. sp.
* 68 Streblosoma hesslei Day
69 Thelepus cincinnatus (Fabricius)
70 Nicolea venustula (Montagu)
* 71 Amphiglena mediterranea (Leydig)
72 Potamilla torelli Malmgren
73 Myxicola aesthetica (Claparède)
* 74 Fabricia sabella (Ehrenberg)
* 75 Oriopsis (cfr.) eimeri (Langerhans)
76 Sabellidae gen. sp.
* 77 Pileolaria sp.
78 Pomatoceros triqueter (Linneo)
* 79 Hydroides pseudouncinata Zibrowius
* 80 Hydroides elegans (Haswell)
81 Hydroides dianthus (Verrill)
82 Serpula concharum Langerhans
* 83 Vermiliopsis striaticeps (Grube)
84 Vermiliopsis sp.
* 85 Filograna implexa Berkeley
* 86 Spirobranchus polytrema (Philippi)
87 Protula sp.
88 Serpulidae gen. sp.

Table 1 indicates the names of all 88 seaworm species analyzed in this paper. The
sequential numerical code on the left is the one actually used in the various plots in this
paper. Asterisks indicate the 55 most frequent species analyzed by KYST-2A. The
ascending regression option was used, with the primary option for ties, and STRESS
formula 1. Analyses were done in 6 down to 1 dimension(s).

In MDSCAL or KYST analyses, a plot of STRESS vs. dimensionality is often used
as an aid in deciding on the most appropriate dimensionality.
Figure 1. Plot of STRESS (formula 1) vs. dimensionality for KYST-2A analysis of the
Fresi et al. data.

Figure 1 shows this plot. One often looks for a clear "elbow" in such a plot; that is, a
dimensionality after which STRESS falls off only minimally (and more or less linearly)
with dimensionality. While inspection of Figure 1 does not yield an absolutely clear
Figure 2. The dimension one-three plane of the four-dimensional KYST-2A solution
for the 55 most frequent of the 88 sea worm species from the Fresi et al. data.
"Overall" derived dissimilarity matrix used as input.

"elbow," it was decided that the most appropriate dimensionality was four.

For reasons to be discussed later, the four dimensions were plotted in two planes, the
plane defined by dimensions one and three (in Figure 2) and that defined by dimensions
two and four (in Figure 3).

Figure 3. The dimension two-four plane of the four-dimensional KYST-2A solution
for the 55 most frequent of the 88 sea worm species from the Fresi et al. data.

The 55 seaworm species included in this analysis are shown in these figures, using the
sequential coding indicated in Table 1. Since the present author is not a biologist, and
has no knowledge whatever about these particular species of seaworms, we leave
substantive interpretations of these (and other dimension plots to be seen later) to
subject matter experts.

II. THREE-WAY MDS OF PROXIMITY (OR OTHER) DATA

Before a detailed discussion of three-way (and possibly even higher-way) MDS or
other data analysis models and methods, some terminology is needed. Because of our
psychological roots we often speak of "stimuli" and of individual "subjects". A more
neutral pair of terms, however, is "objects" (which can be entities - of any type
whatsoever - one is interested in studying; e.g., species of seaworms, variables, sites,
times, countries in Western Europe, epistemological theories, numerical ecologists,
monads, Hilbert spaces, or brands of soap) and "data sources" (which, as the phrase
suggests, comprise any source of data about these objects; e.g., individual numerical
ecologists, who may make judgments of similarity among various species of Polychaetes,
or different times - say in a longitudinal study - at which measures of correlation over
species are computed to provide measures of proximity among various variables - with
species and variables comprising "objects" in these two examples, respectively). Clearly,
what may be a "stimulus" or other "object" in one context may be an "individual" or
other "data source" in another!

We also often speak of the number of ways of a data array, as when we refer to
two-way, three-way, or higher-way models and methods, for two-way, three-way or
higher-way data. The simplest "way" (pardon the ambiguity!) to think of this use of the
term "way" is that it is the number of indices, or subscripts, necessary to index the data.
The Fresi et al. data to be described in detail shortly can be viewed as four-way data
(species x sites x months x years) since we would need four indices to keep track of these
four different modalities. If, however, one were to argue (as one well may) that months
and years should readily be thought of as a single mode, and thus a single way of the
data array (indexed by only one subscript, ranging systematically - say sequentially in
time - over all month-year combinations), then it might as easily be formalized as a
three-way data array.

Our point here is that the number (and nature) of "ways" in a data array is largely
"in the mind of the beholder" (or, more to the point, is dependent on the aims of and/or
conceptual structure imposed by the data analyst/researcher trying to understand a
particular batch of data).

Another term often used in reference to data arrays, already alluded to tangentially
above, is "modes". A data mode is a type or category of entity (e.g., the "species"
mode, "time" mode, "site" mode, or "variable" mode) which may or may not correspond
to the "ways" of the data array. In general, the number of "ways" will be at least as
great as the number of "modes", but may be greater (because two or more different
"ways" of the data may correspond to the same "mode"). The best example of the
latter phenomenon is the case (already considered in Section I) of a two-way, but one-
mode n x n (usually, but not necessarily symmetric) matrix of proximities (similarity or
other proximity measures among sea worm species, for example, or correlation coefficients
among variables).

Another case in which the number of "ways" exceeds that of "modes" - which we
shall soon encounter - will be a data set that is two-mode (seaworm species - the
"objects" - by "data sources" derived, as will be described, from data corresponding to
various combinations of sites, months and years), but three-way (species x species x data
source). As will be seen in detail shortly, we shall begin with a data array that is either
four-mode and four-way, or three-mode and three-way (depending on whether one feels
"month" and "year" should be treated as separate modes/ways or as a single
mode/way), and derive from this another data array that can be conceived as being
two-mode, three-way data of proximities among the 88 sea worm species ("objects") for
14 different "data sources".

II.A INDSCAL

The INDSCAL approach, standing for INdividual Differences SCALing of proximity
(or other) data by means that retain information on individual differences, was
developed by Carroll and Chang (1970). Two basic options exist in the INDSCAL
program: a) INDSCAL analysis per se (called INDIFF in the program) - for scaling
stimuli (or other objects) for which symmetric matrices of proximity measures are
available for a number of individuals or other data sources, in terms of a weighted
Euclidean model often called the INDSCAL model; and b) CANDECOMP analysis -
for scaling stimuli (or other objects) for which (for example) measurements are available
on a number of variables (i.e., the input matrices are, in general, rectangular and non-
symmetric) in a number of different conditions (e.g., observational contexts,
experimental variations, times, sites, or other "modes" or scenarios distinguishing the
various object x variable matrices). The CANDECOMP part of the algorithm uses
Carroll and Chang's method of canonical decomposition of N-way tables. INDSCAL
analysis (option a) actually corresponds to using symmetric CANDECOMP with pre-
and post-processing, to be described below.

The INDSCAL Model

The INDSCAL model of individual differences is based on two major assumptions
which are stated below. While (since this model was originally devised in a
psychological context) we will refer to "stimuli" and "individuals," and to individual
differences in perception as reflected in similarity judgments, this model can be applied
to individual differences, among any type of data sources, in similarity or dissimilarity
measures defined on all pairs of objects (e.g., species), from domains other than that of
psychological "stimuli."

Assumption 1 - a set of r dimensions or "factors" underlie the n stimuli. These
dimensions are assumed to be common for all m individuals making similarity
judgments, i.e., they are sufficient to account for (except for "noise" or error) the
similarity judgments (or other proximity values) associated with all m subjects, or other
data sources. Let X ≡ (x_jt) represent the matrix of stimulus coordinates in the
common, or group space; x_jt is the coordinate value for the j-th stimulus on the t-th
dimension; j = 1, 2, ..., n and t = 1, 2, ..., r.

Assumption 2 - the similarity judgments for each individual are related in a simple way
to a "modified" Euclidean distance in the group stimulus space. In particular, the
relationship is assumed to be linear (in the metric version) or monotone (in a quasi-
nonmetric version). We shall describe the metric version which is the one used
predominantly. (The quasi-nonmetric version is implemented in a program called
NINDSCAL, available on the MDS-1 tape, but this will not be discussed further here.)
We assume that the dissimilarity measure, δ_jk^(i), provided by the i-th individual for the
pair of stimuli j and k, is related to a modified or weighted Euclidean distance, d_jk^(i), by:

    L^(i)[δ_jk^(i)] ≅ d_jk^(i) ,     (II.A.1)

where L^(i) is a linear function with positive slope. The subscripts j and k (for stimuli or
other objects) range over 1, 2, ..., n and the superscript i (for individuals or other
data sources) ranges over 1, 2, ..., m.

The "modified" Euclidean distance for the i-th subject is given by the formula:

    d_jk^(i) = [ Σ_{t=1}^{r} w_it (x_jt − x_kt)² ]^(1/2)     (II.A.2)

This formula differs from the usual Euclidean distance formula only in the presence of
the weights w_it, which represent the saliences or "perceptual importances" for the i-th
individual of the t-th dimension of the group perceptual space, represented by the matrix
X. Another way to express the d_jk^(i)'s is as ordinary Euclidean distances computed in a
"private" space for individual i whose coordinates are:

    y_jt^(i) = w_it^(1/2) x_jt .     (II.A.3)

This is a space that is like the X-space except that the configuration has been expanded
or contracted (differentially) in directions corresponding to the coordinate axes. This
can be seen to be a linear transformation with the transformation matrix restricted to be
diagonal (the diagonals being square roots of the w's). This class of transformations is
sometimes referred to as a "strain."

The above model is sufficiently general to accommodate individuals with widely
divergent perceptions of a set of stimuli in terms of a common perceptual space. For
example, consider a two-dimensional perceptual space of a set of automobile brands
whose axes are identified as luxuriousness and sportiness. Let us now imagine two
individuals, P and Q, who view the brands entirely differently, individual P considering
the brands only on one dimension (luxuriousness) while individual Q views the brands
only on the other (sportiness). The INDSCAL model has the capability of
accommodating the judgments of both persons P and Q, by allowing dimension weights
of (1,0) for P and (0,1) for Q.
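The weighted Euclidean model of (II.A.2), its "private space" form (II.A.3), and the example of individuals P and Q can be illustrated with a short numerical sketch (the coordinates are hypothetical; this is not part of the INDSCAL program):

```python
import numpy as np

# Hypothetical group stimulus space X: four automobile brands in two
# dimensions (dimension 1 = luxuriousness, dimension 2 = sportiness).
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [0.0, 0.0]])

def weighted_dist(X, w):
    """Modified Euclidean distances d_jk = sqrt(sum_t w_t (x_jt - x_kt)^2)."""
    diff = X[:, None, :] - X[None, :, :]      # all pairwise differences
    return np.sqrt((w * diff ** 2).sum(axis=2))

w_P = np.array([1.0, 0.0])   # individual P attends only to luxuriousness
w_Q = np.array([0.0, 1.0])   # individual Q attends only to sportiness
d_P = weighted_dist(X, w_P)
d_Q = weighted_dist(X, w_Q)

# The same distances arise as ordinary Euclidean distances in each
# individual's "private" space with coordinates y_jt = w_it^(1/2) x_jt:
Y_P = np.sqrt(w_P) * X
assert np.allclose(weighted_dist(Y_P, np.ones(2)), d_P)
```

Brands 2 and 4 (rows 1 and 3 of X) are indistinguishable for P but maximally distinct for Q, showing how one common space accommodates both patterns of judgment.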

The same basic model, but without a method for fitting the model to data, was
proposed independently by Horan (1969). Alternative methods of fitting the INDSCAL
model (sometimes called simply the "weighted Euclidean model") to data have been
proposed by Bloxom (1968), Takane et al. (1977) and Ramsay (1977).

Estimation of Parameters

We now briefly discuss the procedures by which the parameters of the model,
namely, the n x r elements of the X-matrix and the m x r elements of the matrix
W ≡ (w_it), are estimated from dissimilarity judgments on all possible n(n − 1)/2 distinct
pairs of stimuli by m individuals.

The first step in the method of estimation is to convert the dissimilarities into
distance estimates. In view of the linearity assumptions made above, this is done using
the standard procedure described in Torgerson (1958). This method entails estimation
of an additive constant which converts the comparative distances (i.e., the original
dissimilarity judgments) into absolute distances between pairs of stimuli. The method
estimates the smallest value of the constant which guarantees satisfaction of the triangle
inequality for all triples of points. This can easily be shown to be
c~ln = rr;:lx [oW - oW - oM)). This constant guarantees that the triangle inequality

will be satisfied for all triples of points, with the inequality being a precise equality for at
least one triple (the one for which the expression above attains its maximum). It is as
though these three points lie precisely on a straight line in the multidimensional space.
This is why this scheme is sometimes called the "one-dimensional subspace" method of
estimating the additive constant. Any constant larger than c~ln would certainly suffice
also, but c~ln is, as its name implies, the smallest constant guaranteeing this. While
there are a number of other schemes of estimating the so-called additive constant (see
Torgerson 1958), this one is one of the simplest (both conceptually and numerically) and
most assumption-free. Having estimated c (j) in this way, distance estimates, 'd~~, are
calculated as 'dW = oW + c (0 .
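A minimal sketch of this additive-constant estimate (illustrative code written for this discussion, not the actual INDSCAL preprocessing routine):

```python
import numpy as np
from itertools import permutations

def additive_constant(delta):
    """Smallest c such that delta_jk + c satisfies the triangle inequality
    for all triples: c_min = max over (j, l, k) of d_jk - d_jl - d_lk."""
    n = delta.shape[0]
    return max(delta[j, k] - delta[j, l] - delta[l, k]
               for j, l, k in permutations(range(n), 3))

# Hypothetical comparative dissimilarities that violate the triangle
# inequality (delta_02 > delta_01 + delta_12):
delta = np.array([[0.0, 1.0, 5.0],
                  [1.0, 0.0, 1.0],
                  [5.0, 1.0, 0.0]])

c_min = additive_constant(delta)      # here 5 - 1 - 1 = 3
d_hat = delta + c_min                 # absolute distance estimates
np.fill_diagonal(d_hat, 0.0)          # the constant applies to pairs only
```

With this c_min the inequality becomes a precise equality for the triple that attains the maximum, the "one-dimensional subspace" situation described above.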
The distance estimates are then converted for each subject to scalar products between
the points represented as vectors issuing from an origin at the centroid of all the points.
This is done by double centering the matrix whose entries are −(1/2)[d̂_jk^(i)]². The
resulting numbers, b̂_jk^(i), can be regarded as the estimated scalar products between the
vectors y_j^(i) ≡ (y_j1^(i), y_j2^(i), ..., y_jr^(i)) and y_k^(i). This step is the same as in the "metric"
phase of the TORSCA (Young, 1968) algorithm, and in generating the "TORSCA"
starting configuration in KYST, KYST2 and KYST-2A (Kruskal, Young and Seery,
1973).

The derivation below shows that these numbers are, in fact, estimated scalar
products. (Readers not interested in this derivation are advised to skip to the section
entitled "Scalar Product Form of INDSCAL Model.")

Derivation of Scalar Products from Distances

Given exact squared Euclidean distances

    d_jk² = Σ_{t=1}^{r} (x_jt − x_kt)² ,     (II.A.4)

assume:

    Σ_{j=1}^{n} x_jt = 0  for all t = 1, 2, ..., r .     (II.A.5)

(We may do this without loss of generality, since the origin of the x space is arbitrary,
and this just fixes it at the centroid of all n points.) Expanding (II.A.4),

    d_jk² = Σ_t x_jt² − 2 Σ_t x_jt x_kt + Σ_t x_kt²
          = e_j² − 2b_jk + e_k² ,     (II.A.6)

where

    e_j² = Σ_t x_jt²     (II.A.7)

and

    b_jk = Σ_t x_jt x_kt  (the scalar product).     (II.A.8)

Because of (II.A.5), and writing a dot for an average over the corresponding subscript,

    b_·k = b_j· = b_·· = 0 .     (II.A.9)

From (II.A.6) and (II.A.9) we have

    d_·k² = e_·² + e_k²     (II.A.10)

    d_j·² = e_j² + e_·²     (II.A.11)

    d_··² = 2e_·² ,     (II.A.12)

where

    e_·² = (1/n) Σ_j e_j² .     (II.A.13)

Then (II.A.6), (II.A.10), (II.A.11) and (II.A.12) together imply that

    d_jk² − d_·k² − d_j·² + d_··² = −2b_jk .     (II.A.14)

Multiplying both sides by −1/2 gives the desired result.

Note that we didn't have to know anything about geometry to derive this result. The
law of cosines, for example, was never mentioned.

Note also that this is an exact result for deriving exact scalar products (about an
origin at the centroid) from exact Euclidean distances. In practice, of course, we derive
estimated scalar products (b̂'s) from estimated distances (d̂'s).
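The double-centering result (II.A.14) is easy to verify numerically; the following sketch (illustrative code, not taken from the original programs) recovers exact scalar products from exact squared distances:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 3))
X -= X.mean(axis=0)                     # centroid at origin (II.A.5)

B = X @ X.T                             # exact scalar products b_jk
# Squared distances via (II.A.6): d_jk^2 = e_j^2 - 2 b_jk + e_k^2
D2 = np.add.outer(np.diag(B), np.diag(B)) - 2 * B

# Double centering of -1/2 d_jk^2 recovers b_jk (II.A.14):
J = np.eye(6) - np.ones((6, 6)) / 6     # centering matrix
B_rec = -0.5 * J @ D2 @ J

assert np.allclose(B_rec, B)
```

The centering matrix J subtracts row and column means, which is precisely the d_·k², d_j·², and d_··² correction terms of (II.A.14) applied all at once.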

Scalar Product Form of INDSCAL Model

A scalar product form of the INDSCAL model can be devised by substituting
vectors y_j^(i) and y_k^(i) in the "private space" for individual i into the definition of the
scalar product, as follows:

    b_jk^(i) = Σ_{t=1}^{r} y_jt^(i) y_kt^(i) = Σ_{t=1}^{r} w_it x_jt x_kt .     (II.A.15)

Thus, the three-way matrix of individuals by stimulus pairs, whose general entries are
the values of b̂_jk^(i) derived from the dissimilarity data, can, if the INDSCAL model holds,
be decomposed into the trilinear form in equation (II.A.15). The problem now is one of
estimating values of the X-matrix and the W-matrix whose elements enter into the
right-hand side of equation (II.A.15). This estimation (in a least squares sense) can be
achieved by a procedure called "canonical decomposition of N-way tables" (now usually
abbreviated CANDECOMP). In this particular case, N = 3, since there are three ways,
two for stimuli and one for individuals. The CANDECOMP procedure, for the general
N-way case (N ≥ 3) is described in detail in Carroll and Chang (1970).

CANDECOMP is actually designed to analyze data in terms of a more general
trilinear model (or multilinear, in the N-way case for N > 3) which (in the 3-way case)
is of the form:

    z_ijk ≅ Σ_{t=1}^{r} a_it b_jt c_kt ,     (II.A.16)

where z_ijk represents data, the a's, b's and c's are parameters to be estimated and "≅"
here implies least squares estimation. The CANDECOMP procedure provides least
squares estimates of these parameters (the a's, b's and c's) via what is now called an
Alternating Least Squares procedure but was originally called a NILES (Nonlinear
Iterative LEast Squares) or NIPALS (Nonlinear Iterative PArtial Least Squares)
procedure (see Carroll and Chang, 1970).

The INDSCAL special case is obtained by making the following identifications:

    z_ijk = b_jk^(i)
    a_it = w_it     (II.A.17)
    b_jt ≡ c_jt = x_jt

However, when CANDECOMP is applied to a 3-way table of scalar products (which
are symmetric in the j,k indices), no special constraint is imposed to make "matrix 2"
[B = (b_jt)] equal to "matrix 3" [C = (c_kt)]. This will be true (up to the class of
admissible transformations, namely that of a "strain," or linear transformation given by
a diagonal matrix) when the iterative process has converged. That it be exactly true is
guaranteed by setting them equal in a final stage of the program. For a general
discussion of CANDECOMP, including theory and applications of its higher (than
three) way generalizations, and an extension called CANDELINC enabling linear
constraints on parameters, see Carroll and Pruzansky (1984). Carroll, Pruzansky and
Kruskal (1980) provide a general discussion of CANDELINC.
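The alternating least squares idea behind CANDECOMP can be sketched for the 3-way trilinear model z_ijk ≅ Σ_t a_it b_jt c_kt. This is illustrative code written for this discussion (the function names are ours), not the Carroll-Chang program:

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product: row (j, k) of the result is u_jt * v_kt."""
    r = U.shape[1]
    return (U[:, None, :] * V[None, :, :]).reshape(-1, r)

def candecomp_als(Z, r, n_iter=500, seed=0):
    """Least squares fit of z_ijk ~ sum_t a_it b_jt c_kt by alternating
    least squares: each matrix is re-estimated in turn with the other
    two held fixed, each step being an ordinary linear least squares problem."""
    m, n, p = Z.shape
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((n, r))
    C = rng.standard_normal((p, r))
    Z1 = Z.reshape(m, n * p)                      # unfolding over way 1
    Z2 = Z.transpose(1, 0, 2).reshape(n, m * p)   # unfolding over way 2
    Z3 = Z.transpose(2, 0, 1).reshape(p, m * n)   # unfolding over way 3
    for _ in range(n_iter):
        A = Z1 @ np.linalg.pinv(khatri_rao(B, C)).T
        B = Z2 @ np.linalg.pinv(khatri_rao(A, C)).T
        C = Z3 @ np.linalg.pinv(khatri_rao(A, B)).T
    return A, B, C
```

For the symmetric (INDSCAL) case described above, ways 2 and 3 carry the same objects, so B and C should agree up to a "strain" at convergence and are set equal at the end.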

Normalization of Data and Solution

In the algorithm for the INDIFF part of the program (the part that does the actual
INDSCAL analysis using CANDECOMP as a subroutine), the original data and final
solutions are normalized. In the case of the original data, the scalar product matrices
are normalized such that the sum of squares of the scalar product matrix is set equal to
unity for each subject (or "data source"). In the case of INDSCAL analysis, the final
stimulus space is normalized such that the variance of projections of stimuli on the
coordinate axes is equal to unity and the centroid is at the origin. The appropriate
companion normalization is applied to the subject-matrix.

The combination of these two different procedures has one interesting outcome: the
square of the Euclidean distance of a subject's point from the origin can be
(approximately) interpreted as the (proportion of) total variance accounted for in the
scalar products data for that subject. If the dimensions of the stimulus space are
orthogonal, then the square of the Euclidean distance of the subject's point will exactly
equal the proportion of variance accounted for. No normalization of the data is done for
the CANDECOMP option; there is, however, a normalization of the solution.
Specifically, all matrices but the first are normalized to have unit sums of squares for
each dimension. All the differences in sums of squares are then absorbed in the final
matrix. When using CANDECOMP the origins of the various spaces are not
constrained at all.

Input Parameters

The various input parameters of the INDSCAL program are enumerated below:

Data Input Options - these are controlled by a parameter called IRDATA. Eight
alternatives are provided in the program, corresponding to integer values of 0 to 7 for
IRDATA.

IRDATA  Input Option

0 Rectangular matrices (this is the CANDECOMP option)


1 Lower half of similarities matrix without diagonal
2 Lower half of dissimilarities matrix without diagonal
3 Lower half of Euclidean distance matrix without diagonal
4 Lower half of correlation matrix without diagonal
5 Lower half of covariance matrix with diagonal
6 Full symmetric matrix of similarities
7 Full symmetric matrix of dissimilarities

In cases 1-5 the matrix can also be read in as an ordered vector.


The user can obtain either a simultaneous or a successive r-dimensional solution. In
the simultaneous case all dimensions in the matrices are computed at one time, whereas
in the successive case, as the name indicates, only one dimension is estimated at a time.
In general, unless there is good reason to do otherwise, a "simultaneous" solution should
be obtained. The user can control stringency of convergence of the iterative process by
two parameters, namely, maximum number of iterations and another specifying a
convergence criterion based on changes in the fit measure from iteration to iteration.

An option exists for not setting matrix 2 equal to matrix 3. In the general
CANDECOMP analysis this option must always be chosen since, in general, the input
matrices are different. In the case of INDSCAL analysis, however, matrix 2 is set equal
to matrix 3 since, by symmetry, these input matrices should be equal. When done in the
latter fashion, we refer to the CANDECOMP analysis (say of the derived scalar
products) as symmetric CANDECOMP.

The INDSCAL program can also be used in solving for the weights assigned by
subjects to a prespecified configuration. The program also has the ability to use a
prespecified configuration as a rational start even in the case in which all matrices are to
be solved for.

More complete details of how to use the INDSCAL program can be found in Chang
and Carroll (1969).

SINDSCAL

SINDSCAL (Pruzansky, 1975) is another computer program that implements the
procedure of Carroll and Chang (1970) for fitting the INDSCAL individual differences
model for multidimensional scaling. It is a modification of the more general INDSCAL
program of Chang and Carroll (1969) described above. (See also INDSCALS described
in Chang (1971).) INDSCAL was written to allow as input either rectangular or
symmetric matrices of proximities. Since almost all of the applications of INDSCAL to
date have used similarities or dissimilarities, Euclidean distances, correlations or
covariances, SINDSCAL was written to handle only these symmetric data. It is also
limited to the three-way case.
The method of analysis used in SINDSCAL is essentially the same as the method of
Carroll and Chang (1970) used in INDSCALS (Chang 1971). Therefore, the final
stimulus and weights configurations should be identical (except for possible differences
due to different convergence criteria, starting configurations, or other numerical details).
The principal differences between SINDSCAL, INDSCALS or INDSCAL used with
three-way "INDIFF" options lie in the computational procedure and user options.

Modifications in the internal program structure have yielded:

(1) a considerable reduction in memory requirements achieved by storing the input
data in symmetric storage mode. (An n x n stimulus matrix in symmetric storage
mode is reduced to a vector of length n(n + 1)/2.)

(2) considerable simplification of the computational algorithm in the main
computation subroutine, called CANDE.

(3) some reduction in computation at various stages in the procedure.
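The symmetric storage scheme in point (1) can be sketched as follows (hypothetical helper functions for illustration, not SINDSCAL's actual Fortran routines):

```python
import numpy as np

def pack(S):
    """Store a symmetric n x n matrix as its lower triangle (with
    diagonal), a vector of length n(n + 1)/2."""
    n = S.shape[0]
    return S[np.tril_indices(n)]

def unpack(v, n):
    """Rebuild the full symmetric matrix from the packed vector."""
    S = np.zeros((n, n))
    S[np.tril_indices(n)] = v
    return S + np.tril(S, -1).T   # mirror the strict lower triangle

S = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 5.0],
              [3.0, 5.0, 6.0]])
v = pack(S)                       # length 3 * 4 / 2 = 6
assert v.shape == (6,)
assert np.allclose(unpack(v, 3), S)
```

For a full 88 x 88 proximity matrix this cuts storage from 7744 to 3916 values per data source, which is the kind of saving the modification was designed to deliver.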

These changes along with the use of the global optimization feature of the Fortran-
IV compiler result in significant savings in computer charges. Additional savings may
result because SINDSCAL uses dynamic storage allocation. Small data sets may be run
with proportionately smaller computer memory and, therefore, some savings in cost.

Some user-oriented changes include:

(1) a reduction in the number of input parameters required,

(2) additional plotting and printing options,

(3) provision for a user-supplied subroutine to preprocess the input data,

(4) sufficient printout throughout the computation so that most of the information
from a run can be recovered if the program gets cut before completion,

(5) no limitation on the number or size of the input matrices due to the use of
dynamic storage allocation,

(6) access to the program in both batch and time-sharing modes.


Since most features of SINDSCAL have already been described in the discussion of
INDSCAL above, we highlight only those features in which it is most distinct from that
earlier program/procedure.

Input Data Type

The input to SINDSCAL consists of many different matrices, corresponding to
different individuals or other "data sources", but all pertaining to the same stimuli or
other "objects". Thus SINDSCAL deals only with two-mode but three-way data! Since
each matrix is assumed, in SINDSCAL, to be symmetric, only a half matrix need be
provided (and stored) in each case. These matrices may be:

(1) similarities, dissimilarities, Euclidean distances or correlations, represented as
lower-half matrices without diagonals;

(2) covariances or scalar product matrices in the form of lower-half matrices with
diagonals;

(3) full symmetric matrices of similarities or dissimilarities. The program ignores the
values on the diagonal. In this case, although the upper half of each matrix is
(redundantly) provided as input, only a half matrix is stored, thus allowing the
greater efficiency in memory storage and computation which is the principal
hallmark of SINDSCAL.

Maximum Number of Iterations

SINDSCAL uses the same basic iterative procedure as is used in INDSCAL to
estimate parameters. The program ends when convergence is achieved, or the maximum
number of iterations has been reached; the reason for ending the analysis is printed on
the standard output. The convergence criterion is based on the difference between the fit
on the current iteration and the previous iteration. When this difference is less than a
certain value, the process is considered to have converged. An important advantage of
the INDSCAL model, as already discussed, is that the orientation of coordinate axes is
uniquely determined. However, the solution must have reached a global minimum for
the axes to be in the correct orientation. Since it is relatively inexpensive to run
SINDSCAL (as compared to running the INDSCAL program using the "INDIFF"
options), it is recommended that, if possible, a very large number, such as 200, be used
for this option in order to prevent the program from stopping before convergence has
been reached.

Plot Options

The program generates plots of all possible planes (defined by pairs of SINDSCAL
coordinates) of the final group stimulus space and weights space. The points may be
numbered or the user may supply either the stimulus or subject labels or both sets of
labels. It is also possible to suppress all plotting.

Relaxation Factor

A "relaxation" factor was introduced in the parameter estimation procedure
(subroutine CANDE). This technique was originally described by Harshman (1970).
Its effect is to move the parameters being estimated in a direction beyond the value
which is optimum for the current iteration and, hopefully, towards the final overall
optimum value. In practice, the number of iterations generally is reduced by at least
one-half, and the final solutions are identical to solutions obtained without the relaxation
factor.

For a description of preprocessing and normalization options available for certain
data types, output options, and other details of SINDSCAL, see Pruzansky (1975), or
Arabie, Carroll and DeSarbo (in press).

II.B IDIOSCAL

IDIOSCAL (Individual Differences In Orientation SCALing) is a generalization of
INDSCAL allowing IDIOsyncratic reference systems as well as an analytic
approximation to INDSCAL.
approximation to INDSCAL. Equations originally formulated in the 1970 Carroll-
Chang Psychometrika paper describing INDSCAL and CANDECOMP (the method of
canonical decomposition of N-way tables on which INDSCAL is based) have been
implemented in a computer program called IDIOSCAL. This amounts to a
generalization of INDSCAL in which each individual (or "data source") is allowed an
idiosyncratic orthogonal rotation of the coordinate system prior to differential weighting
of this (rotated) reference system. A classical example of (conceptual) rotation of such
a coordinate system is provided by the debate in the early part of this century among
educational psychologists about the factors or dimensions underlying intelligence. (To
simplify matters, let us suppose for now that all agreed that there were exactly two
dimensions of intelligence.) One school proposed a first (primary) dimension (often
called "G") corresponding to "General Intelligence", with a second (and secondary)
dimension contrasting verbal with quantitative ability. A second school countered that -
quite to the contrary (they felt) - there were two independent, sovereign and equally
theoretically valid dimensions - one a dimension of verbal and a second of quantitative
intelligence! From the perspective of our modern sophisticated multivariate point of
view, replete with manifold degrees of rotational freedom, we see quite clearly that these
two schools were arguing, quite vociferously as it happens, about nothing more than
different rotations of coordinate systems describing the same space of intellectual
"objects" (e.g., specific "abilities" measured by equally specific tests; or, in a dual
manner, specific individuals exemplifying different degrees of these abilities, as measured
by their respective "factor scores"). To derive the IDIOSCAL model as a description of
the perceptual structure of intelligence for these different educational psychologists, we
need only add the assumption that, within each of these "schools" different adherents
attached different saliences, or "perceptual importances", to the two dimensions
characterizing the particular "school" to which that particular scholar subscribed. In
practice, the IDIOSCAL model means that each individual is allowed a generalized
Euclidean metric defined by a positive definite quadratic form. Another (seemingly
different, but mathematically equivalent) interpretation of this quadratic form is possible
in terms of different "subjective intercorrelations" of the same set of coordinate axes.
This latter interpretation is favored by Tucker, Harshman and others.
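The generalized Euclidean metric just described, with squared distance given by a positive definite quadratic form R_i = T_i T_i', can be checked numerically. This is illustrative code (the variable names are ours, not IDIOSCAL's):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2))          # common object coordinates

# One data source's idiosyncratic rotation and dimension weights:
theta = np.pi / 6
V = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # orthogonal rotation
w = np.array([2.0, 0.5])                 # weights applied after rotation

T = V @ np.diag(np.sqrt(w))              # T_i, so R_i = T_i T_i'
R = T @ T.T                              # positive definite quadratic form

def d2(xj, xk, R):
    """Squared generalized Euclidean distance (x_j - x_k) R (x_j - x_k)'."""
    diff = xj - xk
    return diff @ R @ diff

# Equivalently: rotate into the idiosyncratic system, then weight.
Y = X @ T                                # "private" coordinates y_j = x_j T
j, k = 0, 3
assert np.isclose(d2(X[j], X[k], R), ((Y[j] - Y[k]) ** 2).sum())
```

Setting V to the identity recovers the plain INDSCAL weighted metric, which is why INDSCAL appears as a special case of this hierarchy.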

The model includes as special cases Tucker's (1972) Three-Mode Scaling, based on
three-mode factor analysis, the PARAFAC-2 model and method of R. Harshman
(1972), and a generalization of INDSCAL proposed by Sands and Young (1980). The
method of solution is closely related to one proposed by P. Schönemann (1972), based on
earlier work of Meredith's (1964).

Inspired by Schönemann's (1972) "Analytic Solution" for the INDSCAL model
(which provides an exact solution in the errorless case, but has uncertain properties with
errorful data), we have incorporated a second phase that allows an analytic
approximation to INDSCAL based on a modification of Schönemann's procedure.
Basically, it differs in that, rather than choosing some arbitrary particular subject to
define a rotation of axes, we define a kind of composite (different from the arithmetic
average) of the actual subjects, which is used to determine a more nearly optimal
orientation. This seems to work well in cases of both real and artificial (errorful) data.

A third "phase" of the IDIOSCAL program assumes no individual differences
whatever, forcing all individuals to have the same axis orientation and weights (except
for a possible overall scale factor). This is tantamount to a scaling of the averaged data
(but averaged in the more appropriate way outlined by Horan, 1969).

Thus the three phases of IDIOSCAL are very closely analogous to the first three
phases of PREFMAP (for PREFerence MApping of stimulus spaces) which will be
discussed at a later point in this paper. To carry this analogy further, approximate F
tests have been incorporated, as in PREFMAP, which may be useful for distinguishing
between models, and may even help in judging dimensionality.

We now describe, in fairly cursory mathematical notation, the hierarchy of models
(and related "phases" of analysis) involved in IDIOSCAL. To those familiar with
PREFMAP (see III.B for a description) this hierarchy can be seen to be closely
analogous to the hierarchy of decreasingly general "unfolding" models in that approach
to external unfolding analysis.

In each case we begin description of the relevant model, assuming we have already
(via assumption and/or appropriate preprocessing) obtained data values we believe to be

r,
approximate squared Euclidean distances between stimuli (objects), j and k, for each
subject (data source) i, which we shall call [15 W and state the model assumption for

these values.

"Phase I" of IDIOSCAL - The general model.

$$[\delta_{jk}^{(i)}]^2 \cong [d_{jk}^{(i)}]^2 = (y_j^{(i)} - y_k^{(i)})(y_j^{(i)} - y_k^{(i)})' , \qquad \text{(II.B.1)}$$

where

$$y_j^{(i)} = x_j T_i , \qquad \text{(II.B.2)}$$

so that

$$[d_{jk}^{(i)}]^2 = (x_j - x_k)\, R_i\, (x_j - x_k)' , \qquad \text{(II.B.3)}$$

where

$$R_i = T_i T_i' . \qquad \text{(II.B.4)}$$

Defining $[\bar{\delta}_{jk}]^2 = \frac{1}{m} \sum_{i=1}^{m} [\delta_{jk}^{(i)}]^2$, we have

$$[\bar{\delta}_{jk}]^2 \cong [\bar{d}_{jk}]^2 \equiv \frac{1}{m} \sum_{i=1}^{m} [d_{jk}^{(i)}]^2 = (x_j - x_k)\, \bar{R}\, (x_j - x_k)' , \qquad \text{(II.B.5)}$$

with

$$\bar{R} = \frac{1}{m} \sum_{i=1}^{m} R_i . \qquad \text{(II.B.6)}$$

Without loss of generality, we may assume

$$\bar{R} = I , \qquad \text{(II.B.7)}$$

so that

$$[\bar{d}_{jk}]^2 = (x_j - x_k)(x_j - x_k)' . \qquad \text{(II.B.8)}$$

That is, $[\bar{d}_{jk}]^2$ is approximately an ordinary squared Euclidean distance defined in terms of coordinates $x$. This fact allows us to obtain an approximate solution for $X \equiv (x_{jt})$, the matrix of $x$ coordinates, using the "classical" metric MDS approach (see II.A for details). Writing Eq. (II.B.3) in summational notation, we have

$$[\delta_{jk}^{(i)}]^2 \cong [d_{jk}^{(i)}]^2 = \sum_{t} \sum_{t'} (x_{jt} - x_{kt})\, r_{tt'}^{(i)}\, (x_{jt'} - x_{kt'}) = \sum_{(tt')}^{r(r+1)/2} r_{(tt')}^{*(i)}\, \Delta_{(jk)(tt')} , \qquad \text{(II.B.9)}$$

where

$$r_{(tt')}^{*(i)} = (2 - \delta_{tt'})\, r_{tt'}^{(i)} \qquad \text{(II.B.10)}$$

and

$$\Delta_{(jk)(tt')} = (x_{jt} - x_{kt})(x_{jt'} - x_{kt'}) \qquad \text{(II.B.11)}$$

(while $\delta_{tt'}$ is the "Kronecker delta": $\delta_{tt'} = 1$ if $t = t'$, 0 otherwise). Let
$$r_i^* \equiv \left( r_{(tt')}^{*(i)} \right) . \qquad \text{(II.B.12)}$$

So $r_i^*$ is a row vector of $\binom{r+1}{2}$ components, and

$$\Delta \equiv \left( \Delta_{(jk)(tt')} \right) , \qquad \text{(II.B.13)}$$

an $\binom{n}{2} \times \binom{r+1}{2}$ matrix, while

$$d_i^{[2]} \equiv \left[ [d_{12}^{(i)}]^2,\ [d_{13}^{(i)}]^2,\ [d_{23}^{(i)}]^2,\ \ldots,\ [d_{(n-1)n}^{(i)}]^2 \right]' . \qquad \text{(II.B.14)}$$

So $d_i^{[2]}$ is a column vector of $\binom{n}{2}$ components. Eq. (II.B.9) can be written in matrix form as:

$$\delta_i^{[2]} \cong d_i^{[2]} = \Delta\, r_i^{*\prime} , \qquad \text{(II.B.15)}$$

where $\delta_i^{[2]}$ and $\Delta$ are known, and $r_i^*$ is to be solved for. The least squares solution is

$$\hat{r}_i^* = \delta_i^{[2]\prime}\, \Delta\, (\Delta' \Delta)^{-1} . \qquad \text{(II.B.16)}$$

Having solved for $r_i^*$, the entries can be "unpacked" in the appropriate way into $R_i$ (a square symmetric matrix). $R_i$ can then be factored into $T_i T_i'$. One way is to factor

$$R_i = U_i \Lambda_i U_i' , \qquad \text{(II.B.17)}$$

with $U_i$ orthogonal and $\Lambda_i$ diagonal, and define

$$\hat{T}_i = U_i \Lambda_i^{1/2} . \qquad \text{(II.B.18)}$$

$U_i$ can be interpreted as an orthogonal rotation to a new coordinate system, and $\Lambda_i^{1/2}$ as weights applied to that new coordinate system.

[Tucker (1972), Harshman (1972) and others prefer to factor $R_i$ into

$$R_i = W_i^{1/2} C_i W_i^{1/2} , \qquad \text{(II.B.19)}$$

with $W_i^{1/2}$ diagonal and $C_i$ defined to have unit diagonals. $C_i$ is then interpreted as a matrix of cosines of angles between dimensions, and $W_i^{1/2}$ as weights applied to these oblique dimensions. Harshman's PARAFAC-2 assumes $C$ is constant over subjects, but $W_i$ differs, thereby guaranteeing a unique orientation of coordinate axes. Tucker's Three-Mode Scaling, however, has no such uniqueness of orientation property.]
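The Phase I estimation just described can be sketched numerically. The following is a minimal illustration, not the IDIOSCAL program itself; the function and variable names are ours. It assumes a known coordinate matrix X and one subject's squared distances, ordered by pairs (j, k) with j < k:

```python
import numpy as np

def idioscal_phase1(X, d2):
    """Estimate R_i by linear regression (Eqs. II.B.9-II.B.16), then
    factor R_i = U_i Lambda_i U_i' and take T_i = U_i Lambda_i^(1/2)
    (Eqs. II.B.17-II.B.18)."""
    n, r = X.shape
    pairs = [(j, k) for j in range(n) for k in range(j + 1, n)]
    dim_pairs = [(t, u) for t in range(r) for u in range(t, r)]
    # Delta matrix (II.B.11): products of coordinate differences,
    # one column per dimension pair (t <= t').
    Delta = np.array([[(X[j, t] - X[k, t]) * (X[j, u] - X[k, u])
                       for t, u in dim_pairs] for j, k in pairs])
    # Least-squares estimate of the packed vector r*_i (II.B.16).
    r_star, *_ = np.linalg.lstsq(Delta, d2, rcond=None)
    # "Unpack" r*_i into the symmetric matrix R_i; off-diagonal entries
    # are halved because r*_i carries the factor (2 - delta_tt').
    R = np.zeros((r, r))
    for val, (t, u) in zip(r_star, dim_pairs):
        R[t, u] = R[u, t] = val if t == u else val / 2.0
    # Eigendecomposition R_i = U_i Lambda_i U_i'; T_i = U_i Lambda_i^(1/2).
    lam, U = np.linalg.eigh(R)
    T = U * np.sqrt(np.clip(lam, 0.0, None))
    return R, T
```

In the errorless case the regression recovers R_i exactly, provided the number of object pairs is at least the number of free parameters, r(r+1)/2.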

Phase II of IDIOSCAL: Modification of Schönemann's "Analytic Solution" to Determine Analytic Approximation to INDSCAL. If the INDSCAL model holds (exactly) then

$$R_i = T W_i T' , \qquad \text{(II.B.20)}$$

for some $T$ (in general, nonorthogonal), and with $W_i$ diagonal. The essence of Schönemann's (1972) analytic solution seems to be that, if Eq. (II.B.20) holds for any two $i$ (say $i = 1$ and 2) with $W_i$ nondegenerate (that is, all diagonals nonzero) for both, we can solve exactly for $T$ (that is guaranteed at least to "fit" those two). This is because two square symmetric matrices are always simultaneously diagonalizable by a matrix $T$, which is not, however, orthogonal (in general). Since clearly $T$ is only defined up to post-multiplication by a diagonal, we may, without loss of generality, assume $T$ to be so defined that

$$W_1 = I . \qquad \text{(II.B.21)}$$

Thus

$$R_1 = T T' . \qquad \text{(II.B.22)}$$

$T$ can be decomposed as

$$T = U P V , \qquad \text{(II.B.23)}$$

with $U$, $V$ orthogonal and $P$ diagonal, so that

$$R_1 = U P^2 U' . \qquad \text{(II.B.24)}$$

Thus, factoring $R_1$ yields $U$ and $P^2$ (and thus $P$). We may then define

$$R_2^* = P^{-1} U' R_2 U P^{-1} = P^{-1} U' T W_2 T' U P^{-1} = P^{-1} U' U P V W_2 V' P U' U P^{-1} = V W_2 V' \qquad \text{(II.B.25)}$$

(since $U'U$, and thus $P^{-1} U' U P$, $= I$). Thus, factoring $R_2^*$ yields $V$ (and, incidentally, $W_2$, although that is of no real interest). Having thus obtained $U$, $P$, and $V$, they may be put together, according to Eq. (II.B.23), to define $T$ (which may be further post-multiplied by a diagonal matrix, if desired, for normalization purposes). Schönemann chooses the two subjects, in effect, to be the "average subject" whose $R$ matrix is the average of those for the real subjects, i.e.,
$$\bar{R} = \frac{1}{m} \sum_{i=1}^{m} R_i , \qquad \text{(II.B.26)}$$

plus one of the "real" subjects (apparently arbitrarily chosen). Using the average subject is sensible, from a statistical point of view, and is also correct mathematically since it is easy to show that, if Eq. (II.B.20) holds for all $i$, then

$$\bar{R} = T \bar{W} T' , \qquad \text{(II.B.27)}$$

showing that Eq. (II.B.20) also holds for this average subject, with $\bar{W}$ replacing $W_i$. The weakness of Schönemann's (1972) solution from a statistical point of view is the choice of the second subject as some arbitrary real subject. Bad choice of this second subject could result in a very bad solution. Our modification of Schönemann's procedure rests essentially on a more representative choice of the two subjects (or pseudosubjects, since both are composites of the "real" ones).
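The core of this analytic step is easy to sketch in code. The following illustration (our naming, not Schönemann's nor the IDIOSCAL program's) recovers a T that simultaneously diagonalizes two symmetric positive definite matrices R1 and R2, following Eqs. (II.B.23)-(II.B.25):

```python
import numpy as np

def analytic_T(R1, R2):
    """Factor R1 = U P^2 U', transform R2 to R2* = P^-1 U' R2 U P^-1,
    factor R2* = V W2 V', and assemble T = U P V (so R1 = T T' and
    R2 = T W2 T' with W2 diagonal)."""
    lam, U = np.linalg.eigh(R1)        # R1 = U diag(lam) U'
    P = np.sqrt(lam)                   # P^2 = diag(lam)
    UPinv = U / P                      # columns scaled: U P^-1
    R2_star = UPinv.T @ R2 @ UPinv     # = P^-1 U' R2 U P^-1 = V W2 V'
    w2, V = np.linalg.eigh(R2_star)
    return (U * P) @ V                 # T = U P V
```

With errorless data the returned T exactly diagonalizes both matrices; with noisy data it is only an approximation, which is the motivation for the more representative choice of pseudosubjects described above.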

We have pursued two approaches to this. The first is to use a kind of crude
clustering procedure to group the subjects into two groups of about equal size so that the
profiles in the two groups are maximally different. "Average subjects" are then defined
for each of the two groups, and those two group averages are used as the basis for
finding the appropriate T.

In the second approach the first subject is, as in Schönemann's approach, the "average subject," as defined in (II.B.26). Using the $U$ and $P$ found for that subject, we define the matrices $R_i^*$ as in (II.B.25). Since $R_i^* = V W_i V'$ (with $V$ orthogonal and $W_i$ diagonal), it follows that

$$(R_i^*)^2 \equiv R_i^* R_i^{*\prime} = V W_i^2 V' , \qquad \text{(II.B.28)}$$

so that

$$Q \equiv \sum_{i=1}^{m} k_i (R_i^*)^2 = V \bar{W}^{[2]} V' , \qquad \text{(II.B.29)}$$

where

$$\bar{W}^{[2]} \equiv \sum_{i=1}^{m} k_i W_i^2 \qquad \text{(II.B.30)}$$

(the $k_i$'s being weights for the different matrices).



$Q$, then, defines the second "pseudosubject". Note that $Q$ is of the same general form as $R_i^*$, so that factoring it should yield $V$ exactly (in the exact case) or approximately (in the more usual case of noisy data). $Q$, however, provides a composite of all the subjects, but a different one than that provided by $\bar{R}$.

We have tried two different ways of defining $Q$, differing in definition of the weights. One is essentially the unweighted case, in which $k_i = 1/m$ for all $i$. In the other case, $k_i$ was defined as:

$$k_i = \frac{r_i^2}{\mathrm{tr}\,(R_i^*)^2} , \qquad \text{(II.B.31)}$$

where $r_i$ is the correlation between $d^2$'s and predicted $d^2$'s in "Phase I" (in which the general IDIOSCAL model of Eq. (II.B.3) is fit).

Finding the weights for the INDSCAL model. Once the $T$ yielding the correct orientation of axes is found (or a hopefully reasonable approximation thereto, as described above), we may find the INDSCAL weights as outlined below. (The $x$'s below have presumably been defined by use of this $T$ and so correspond to the "correct" dimensions.) Recalling the INDSCAL model:

$$[\delta_{jk}^{(i)}]^2 \cong [d_{jk}^{(i)}]^2 = \sum_{t=1}^{r} w_{it} (x_{jt} - x_{kt})^2 , \qquad \text{(II.B.32)}$$

in matrix form, this can be written as:

$$\delta_i^{[2]} \cong d_i^{[2]} = \Delta^* w_i' , \qquad \text{(II.B.33)}$$

where $d_i^{[2]}$ is defined as before (in Eq. II.B.14), $\delta_i^{[2]}$ is the analogous column vector with $[\delta_{jk}^{(i)}]^2$ replacing $[d_{jk}^{(i)}]^2$, while $w_i$ is the row vector (of $r$ components) with general entry $w_{it}$, and $\Delta^*$ is the $\binom{n}{2}$ by $r$ matrix

$$\Delta^* \equiv \left( \Delta_{(jk)t}^* \right) , \qquad \text{(II.B.34)}$$

with

$$\Delta_{(jk)t}^* = (x_{jt} - x_{kt})^2 . \qquad \text{(II.B.35)}$$

Much as before, the least squares solution for $w_i$ is:

$$\hat{w}_i = \delta_i^{[2]\prime}\, \Delta^* (\Delta^{*\prime} \Delta^*)^{-1} , \qquad \text{(II.B.36)}$$

which immediately yields estimates of the weights.
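As a numerical sketch (with illustrative names of our own, not the SINDSCAL code), this regression amounts to:

```python
import numpy as np

def indscal_weights(X, d2):
    """Regress one subject's squared distances (ordered by pairs j < k)
    on squared coordinate differences (Eqs. II.B.32-II.B.36) to
    estimate the weight vector w_i."""
    n, _ = X.shape
    pairs = [(j, k) for j in range(n) for k in range(j + 1, n)]
    Delta_star = np.array([(X[j] - X[k]) ** 2 for j, k in pairs])
    w, *_ = np.linalg.lstsq(Delta_star, d2, rcond=None)
    return w
```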



Estimation with and without constant term. The estimation schemes above have involved no additive constant terms for the $d^2$'s. It is conceivable, however, that better fits could be obtained by adding such constant terms. This means that Eq. (II.B.3) is modified to become:

$$[\delta_{jk}^{(i)}]^2 \cong c_i + (x_j - x_k)\, R_i\, (x_j - x_k)' , \qquad \text{(II.B.37)}$$

while Eq. (II.B.32) becomes:

$$[\delta_{jk}^{(i)}]^2 \cong c_i + \sum_{t=1}^{r} w_{it} (x_{jt} - x_{kt})^2 . \qquad \text{(II.B.38)}$$

It is straightforward to alter the regression schemes for estimating the $R_i$'s or the $w_{it}$'s, as the case may be, to incorporate such a constant. This is done by simply adding an extra independent pseudovariable (whose values are all 1) to the regression scheme. This will, of course, change the estimates of the $R$'s and $w$'s to some degree. Inclusion of this constant has advantages for interpretation of the F ratios to be described later. As will be seen subsequently, it also seems to improve the fit in Phase II (corresponding to the INDSCAL approximation).
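In regression terms the change is just one extra column of ones. For the INDSCAL case, for example, the augmented regression can be sketched as follows (names are illustrative, not from the original programs):

```python
import numpy as np

def indscal_weights_constant(X, d2):
    """Estimate INDSCAL weights plus an additive constant by appending
    a pseudovariable of ones to the regression (cf. Eq. II.B.38)."""
    n, _ = X.shape
    pairs = [(j, k) for j in range(n) for k in range(j + 1, n)]
    Delta_star = np.array([(X[j] - X[k]) ** 2 for j, k in pairs])
    A = np.hstack([Delta_star, np.ones((len(pairs), 1))])
    coef, *_ = np.linalg.lstsq(A, d2, rcond=None)
    return coef[:-1], coef[-1]  # (weights w_it, additive constant)
```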

Phase III of IDIOSCAL. Phase III corresponds to a model assuming essentially no individual differences, so that all subjects are assumed to be equivalent to the average subject. An overall scale factor is allowed for each subject, however, and (possibly) an additive constant.

Thus the models are either

$$[\delta_{jk}^{(i)}]^2 \cong a_i (x_j - x_k)(x_j - x_k)' , \qquad \text{(II.B.39)}$$

or

$$[\delta_{jk}^{(i)}]^2 \cong a_i (x_j - x_k)(x_j - x_k)' + c_i , \qquad \text{(II.B.40)}$$

where the $X$ matrix is the one derived for the average subject.

Approximate F tests for comparing the three phases. Since the models in the three phases are fitted by using least-squares linear regressions (with appropriately defined pseudo-variables), it is possible to define approximate F tests for comparing models in the three phases (as well as for assessing goodness of fit in each independently). This is very closely analogous to similar approximate F tests in the PREFMAP, PREFMAP-2 and PREFMAP-3 procedures, for those familiar with these (Sec. III.B). This is most appropriate when the constant term has been included, since otherwise the residual mean square is not an unbiased estimate of error variance. The approximate Fs and their degrees of freedom are defined below. (See Table 2.)

The "Fs" must, of course, be taken with a large grain of salt since, first of all, the required normality assumptions cannot be taken seriously, and, secondly, the
configurations (which define the "independent" pseudovariables) have been fitted to the
data. Since, however, these Fs are computed for each subject separately, and since each
subject plays only a small part in determining the configuration in each case, this second
objection can presumably be ignored as the number of subjects grows "large". Possibly
some adjustment of degrees of freedom would correct for it when the number of subjects
is small. Presumably a "jackknife" procedure could be used, even for small numbers of
subjects, but this would be expensive computationally. Analogous approximate F ratios
(called Pseudo-Fs) could be calculated to test "significance" of added dimensions in
INDSCAL or IDIOSCAL. This could conceivably lead to a way of objectively assessing
dimensionality in individual differences scaling. (A somewhat related approach based on
a "leave one out" procedure has recently been investigated by Weinberg and Carroll
1986.)

II.C Application of Three-Way MDS to Some Ecological Data on Seaworms

We now consider applications of Three-Way MDS methods to ecological data, illustrating this with a specific application to the data in the article by Fresi et al. (1983). (Some multivariate analyses are reported by Fresi et al. in that paper.) The Fresi et al. data involve frequencies of observation of 88 varieties of sea worms in samples taken from five sites in the harbor of Ischia in the Bay of Naples, over four time periods (February 1975, July 1975, February 1976 and July 1976). There are many ways in which these three-way (or even "higher way") methods could be applied to these data. The data array, which is presented in the Fresi et al. paper as a rectangular data table, is more appropriately formulated as a general three-way array (sea worms x sites x time periods), or even possibly as a four-way array (sea worms x sites x months x years). A direct analysis, for example by the general three-way CANDECOMP procedure (see equation II.A.16 and related discussion), or the closely related PARAFAC procedure proposed by Harshman (1970) would be

Table 2. Pseudo-F's for assessing and comparing models fitted in the IDIOSCAL procedure.

Effect               Pseudo-F                                                   df1               df2
Phase I              $[df_2/df_1]\; r_I^2/(1 - r_I^2)$                          $r(r+1)/2$        $\binom{n}{2} - r(r+1)/2 - 1$
Phase II             $[df_2/df_1]\; r_{II}^2/(1 - r_{II}^2)$                    $r$               $\binom{n}{2} - r - 1$
Phase III            $[df_2/df_1]\; r_{III}^2/(1 - r_{III}^2)$                  $1$               $\binom{n}{2} - 2$
Phase I-Phase II     $[df_2/df_1]\; (r_I^2 - r_{II}^2)/(1 - r_I^2)$             $r(r-1)/2$        $\binom{n}{2} - r(r+1)/2 - 1$
Phase II-Phase III   $[df_2/df_1]\; (r_{II}^2 - r_{III}^2)/(1 - r_{II}^2)$      $r - 1$           $\binom{n}{2} - r - 1$
Phase I-Phase III    $[df_2/df_1]\; (r_I^2 - r_{III}^2)/(1 - r_I^2)$            $r(r+1)/2 - 1$    $\binom{n}{2} - r(r+1)/2 - 1$

NOTE: $r_I$, $r_{II}$ and $r_{III}$ represent correlations (between $d^2$ and $\hat{d}^2$) calculated by IDIOSCAL for a particular individual in phases I, II and III, respectively.

possible. Since CANDECOMP generalizes straightforwardly to the 4 or higher way case, the 4-way table mentioned could be subjected to 4-way CANDECOMP analysis (see Carroll and Pruzansky 1984). A number of questions arise as to how best to normalize the data in this case, however; furthermore it is not at all clear that the rather strong and specific model(s) assumed in this CANDECOMP/PARAFAC type of analysis is (are) appropriate to these data. On the other hand, the general IDIOSCAL model/method seemed too general. For these and other reasons it was decided to pursue an exploratory INDSCAL analysis, using the Pruzansky (1975) SINDSCAL program, in a way to be described below. (This was based to some degree on the general approach used by Wish in his analysis of a large battery of data on perception and subjective ratings of nations, reported in Wish and Carroll 1974.) The Fresi et al. data were frequencies, as indicated earlier. The marginal frequencies of the 88 species of sea worms over the 20 sites x time periods ranged from 1 (for over a dozen of the species) to well over a thousand. Because of this,
our first impulse was to normalize these data by converting to relative frequencies so that
the normalized data for all species would sum to one. While this seemed to us like a
wise first step in normalizing these data, it did not lead to readily interpretable results in
any of the further analyses we attempted. We therefore abandoned this normalization of
the data, and attempted instead an alternate transformation of these data, suggested by
Pierre Legendre, which should have the effect of somewhat more nearly equalizing the
total weight of resulting data values for the various species, as well as reducing the
skewness of these distributions. This transformation was of the form:

$$z_{jlmp} = \log(f_{jlmp} + 1) , \qquad \text{(II.C.1)}$$

where $f_{jlmp}$ is the frequency of seaworm species $j$ at site $l$ for month $m$ in year $p$, and $z_{jlmp}$ is the corresponding transformed value. Data transformations are discussed in some detail in section 2 of Gower's paper in this volume. After this initial transformation, we then further normalized the data to have zero mean and unit variance within each site x month x year, so that the final "normalized" data were of the form

$$y_{jlmp} = \frac{z_{jlmp} - \bar{z}_{lmp}}{s_{lmp}} , \qquad \text{(II.C.2)}$$

where $\bar{z}_{lmp}$ is the mean and $s_{lmp}$ is the standard deviation of the $z$'s over all 88 species for site $l$ in month $m$ and year $p$.
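These two preprocessing steps can be sketched as follows (the array layout and names are our assumptions, not from the original analysis):

```python
import numpy as np

def normalize_counts(F):
    """F: frequency array indexed (species j, site l, month m, year p).
    Apply z = log(f + 1) (Eq. II.C.1), then standardize over species
    within each site x month x year cell (Eq. II.C.2)."""
    Z = np.log(F + 1.0)
    z_mean = Z.mean(axis=0, keepdims=True)
    z_std = Z.std(axis=0, keepdims=True)
    return (Z - z_mean) / z_std
```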

Given this normalized (four-way) data array $Y \equiv (y_{jlmp})$, we then proceeded to compute a number of derived dissimilarity matrices as follows.

Number of matrices:

1 (Overall) $D$: $\quad d_{jk} = \sqrt{\sum_{l} \sum_{m} \sum_{p} (y_{jlmp} - y_{klmp})^2}$

5 (Site $l'$) $D_{Sl'}$: $\quad d_{jk}^{(l')} = \sqrt{\sum_{m} \sum_{p} (y_{jl'mp} - y_{kl'mp})^2} \quad (l' = 1, 2, \ldots, 5)$

2 (Month $m'$) $D_{m'}$: $\quad d_{jk}^{(m')} = \sqrt{\sum_{l} \sum_{p} (y_{jlm'p} - y_{klm'p})^2} \quad (m' = F, J)$

2 (Year $p'$) $D_{p'}$: $\quad d_{jk}^{(p')} = \sqrt{\sum_{l} \sum_{m} (y_{jlmp'} - y_{klmp'})^2} \quad (p' = 75, 76)$

4 (Time $t'$) $D_{t'}$: $\quad d_{jk}^{(t')} = \sqrt{\sum_{l} (y_{jlm_{t'}p_{t'}} - y_{klm_{t'}p_{t'}})^2} \quad (t' = F75, J75, F76, J76)$

(where $m_{t'}$ and $p_{t'}$ are the month and year, respectively, associated with time period $t'$).
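In code, the 14 derived matrices can be sketched like this (our implementation, assuming Y is laid out species x site x month x year):

```python
import numpy as np

def derived_distance_matrices(Y):
    """Return the 14 Euclidean distance matrices among species:
    1 overall + 5 sites + 2 months + 2 years + 4 month-year times."""
    n = Y.shape[0]

    def dist(sub):
        # Euclidean distances between species profiles, via the
        # Gram-matrix identity d_jk^2 = |v_j|^2 + |v_k|^2 - 2 v_j.v_k.
        V = sub.reshape(n, -1)
        sq = (V ** 2).sum(axis=1)
        g = sq[:, None] + sq[None, :] - 2.0 * V @ V.T
        return np.sqrt(np.maximum(g, 0.0))

    mats = [dist(Y)]                                          # overall
    mats += [dist(Y[:, l]) for l in range(Y.shape[1])]        # 5 sites
    mats += [dist(Y[:, :, m]) for m in range(Y.shape[2])]     # 2 months
    mats += [dist(Y[:, :, :, p]) for p in range(Y.shape[3])]  # 2 years
    mats += [dist(Y[:, :, m, p])                              # 4 times
             for p in range(Y.shape[3]) for m in range(Y.shape[2])]
    return mats
```

The squared overall distances equal the sum of the squared site distances (and of the squared "time" distances), which makes a convenient consistency check on the construction.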

In the above, the sites are encoded simply as S1 through S5, the months as F (for February) and J (for July), the years as 75 (for 1975) and 76 (for 1976), and the 4 "times" as corresponding combinations of the month and year codes. Obviously, these various matrices are far from independent of one another. (In fact, just to take one example, the square of the overall dissimilarity for j and k is just the sum of the squares of the five site dissimilarities, or of the four "time" dissimilarities.) However, as a "first start" on an exploratory data analysis for these data, we used the resulting 14 matrices [1 overall + 5 sites + 2 months + 2 years + 4 times (months x years)] as input to an INDSCAL analysis, using the SINDSCAL program. In this case the sea worm species comprised the "stimuli," and the 14 derived dissimilarity measures defined the data sources. Since each of these dissimilarity matrices was, in fact, defined as a Euclidean distance matrix, we used the option in SINDSCAL specifying that the data were Euclidean distances. Thus the total input to SINDSCAL comprised 14 matrices, each a symmetric half matrix of Euclidean distances among the 88 worms (so each of the 14 matrices had 88·87/2 = 3828 entries, for a total of 14·3828 = 53592 data values). Needless to say, this was a rather large data array, at least for so computationally intensive a procedure as SINDSCAL! Analyses were done in 1 through 6 dimensions.
The fit measures (VAF in the derived scalar products) are given in Table 3.

Table 3. Fit measures (variance accounted for in derived scalar products) for SINDSCAL solutions in 6 down to 1 dimension(s) for 14 dissimilarity matrices derived from Fresi et al. data.

Dimensionality    Total Variance
1                 .594
2                 .778
3                 .833
4                 .872
5                 .900
6                 .922

Table 4. Variance accounted for in dimensions from KYST2-A and MDPREF analyses when mapped into four-dimensional SINDSCAL solution.

Solution/Dimension Code    R² (VAF) by four SINDSCAL dimensions
k4-1    .988
k4-2    .990
k4-3    .809
k4-4    .603
k3-1    .985
k3-2    .985
k3-3    .718
k2-1    .976
k2-2    .972
k1-1    .840
md1     1.000
md2     .999
md3     .941
md4     .880

Based on

the pattern of these fit measures, and on inspection of the results, it was decided to
report the 4-dimensional INDSCAL solution (although it was somewhat debatable
whether the 4 or the 5-dimensional solution should be chosen).

Interpreting these results, since the present author knows little of the biology of the
88 species of seaworms, we focused on the pattern of the 14 different data sources in the
"subject (or source) space." The weights for the 14 data sources on the four dimensions
are displayed graphically in Figures 4 and 5. (As can be seen in the Figures, some of
the source weights are slightly negative, a condition which should not occur in
INDSCAL, since all subject or source weights should be zero or positive. These are only
very slightly negative, however, and so can probably be plausibly interpreted as
essentially zero weights, which have become slightly negative due to error in the data.
We will henceforth interpret these, in fact, as though they are zero.)

Figure 4 shows the dimension one vs. two plane of the source space. Dimension one
has very high weight for sites 4 and 5, and for the 1975 time periods. Since only sites 4
and 5 have very large weights on this dimension and the 1975 time periods have large
weights on it, we may label dimension one (sites 4 and 5; 1975). The corresponding
Figure 4. Dimension one-two plane of source space for SINDSCAL solution for Fresi et al. data.

variable seems to be one that was particularly prevalent in sites 4 and 5, somewhat in site 3, and not at all in sites 1 and 2, and quite salient in 1975, but very weakly present in 1976. Dimension two, on the other hand, seems to be very strongly weighted in site 3, very slightly in sites 4 and 5, but not at all in sites 1 or 2.

Whatever dimension two taps was especially prevalent in 1976. Thus dimension two
will be labeled (site 3; 1976). One interesting point in these results is that sites 4 and 5
seem to occupy essentially the same location in all four dimensions. Thus these two sites
were, insofar as these analyses are concerned, virtually indistinguishable.

We now look at the plane defined by the remaining 2 dimensions, dimensions 3 and
4, in Figure 5. What "jumps out" at us in this plane is that site 2 has high weight on
dimension 3 and virtually zero weight on dimension 4, while site 1 reverses this pattern,
having almost identically zero weight on dimension three but a very large weight on
dimension four. Sites 3, 4 and 5 have essentially zero weights on both these dimensions,
while the matrices relating to the various time periods (as well as the "all" matrix
corresponding to overall dissimilarities over all sites x time periods) generally have
moderate weights on both. Thus dimension three seems to correspond to whatever
Figure 5. Dimension three-four plane of source space for SINDSCAL solution for Fresi et al. data.

distinguishes site 2 from the others, while dimension four corresponds to the variable
most prevalent in site 1.

While they must be taken with a fairly large "grain of salt," distances among the
source points have a certain interpretation in these INDSCAL subject (source) space
plots. Without actually doing the computation we can see from inspection of these two
planes that sites 4 and 5 are exceedingly close, and in turn are relatively closer to site 3
than to either sites 1 or 2. Conversely, sites 1 and 2 are by far closer to one another
than to any of the other three sites.

The variables corresponding to these dimensions are defined in a sense by the


coordinates of the points corresponding to the species of seaworms on these dimensions.
These can be seen graphically by inspection of Figures 6 and 7, which show these 88
points plotted also in the plane of dimension one versus two and that of three versus
four. It should be kept in mind that, despite the "z = log(f + 1)" transformation that was used, the relative sizes of the frequency distributions for these species were quite different, and so frequency effects are clearly affecting these results. Thus, the seaworm species "farthest out" on these dimensions tend to be those whose overall frequencies were greatest. Table 2 gives the actual species of seaworm corresponding to the 88

Figure 6. Dimension one-two plane of stimulus (species) space for SINDSCAL solution for Fresi et al. data. Vectors are from mapping of KYST and MDPREF dimensions into SINDSCAL space, using PREFMAP.

points shown in these figures, for the benefit of those knowledgeable about these species.
(It should be commented that we have reflected some of these dimensions so that the
positive values always tend to imply greater frequency.)

In Figures 6 and 7 vectors are shown indicating the direction best corresponding to the dimensions derived from the one through four dimensional KYST2-A solutions shown in Figures 2 and 3. Since there were a total of ten such dimensions (4 + 3 + 2 + 1 for the four through one dimensional solutions, respectively) there are a total of ten vectors indicated corresponding to these. These are encoded "kr-t" where "kr" stands for the KYST r-dimensional solution, while t indicates the t-th dimension in that solution. Since, in the case of KYST, the solutions for different dimensionalities do not have any necessary correspondence, these ten dimensions are all distinct, although it will be noted that the t-th dimensions in solutions for different dimensionalities do tend to correspond fairly closely, though certainly not perfectly. In addition to these ten dimensions from the various KYST solutions, four other vectors, labelled md1 through md4, are also shown. These are four dimensions from another MDS analysis, called MDPREF, which will be described in section III.
Figure 7. Dimension three-four plane of species space for SINDSCAL solution for Fresi et al. data. Vectors are as in Figure 6, except projected into three-four plane of SINDSCAL space.

Table 4 gives figures which indicate how well these fourteen dimensions (ten from the one through four dimensional KYST solutions plus four from the four dimensional MDPREF analysis) from the other analyses "fit" into the four dimensional SINDSCAL space. The values in Table 4 are squared multiple correlations (R²'s), which can be interpreted as proportions of variance accounted for in these fourteen dimensions from KYST and MDPREF via the four SINDSCAL dimensions. (Since the KYST analyses had to be done on only a subset of 55 of the seaworm species, these R²'s were necessarily based only on this subset of 55 of the total 88 species, however.) In fact, the procedure used for determining these best fitting directions (or vectors) was the PREFMAP-3 procedure described also in section III. The particular analysis done in these cases involved fitting the vector model, with linear regression options. In the case of this particular set of options, PREFMAP is equivalent to the use of multiple linear regression. Thus we may view these vector directions as being defined by the regression coefficients from a multiple linear regression. In fact, the projections of these vectors onto the SINDSCAL coordinate axes are, in the present case, proportional to the beta coefficients for these regressions.

The point to be drawn from these PREFMAP/multiple regression analyses is that the
four SINDSCAL dimensions capture quite well essentially all the dimensions emerging
from the other MDS analyses. The fact that the vector directions best representing
these other dimensions do not coincide directly with the coordinate axes indicates,
however, that the SINDSCAL dimensions do not correspond in a simple one-one fashion
with these KYST and MDPREF dimensions. Rather, each of these alternative
dimensions corresponds to a different linear combination of the SINDSCAL dimensions.
Since the SINDSCAL dimensions have a unique orientation, while those in the other
solutions are defined only up to arbitrary rotation (or linear transformation), we feel it
appropriate to treat the SINDSCAL solution as defining the "reference space" in terms
of which the others are defined. As already seen, the SINDSCAL dimensions do have a
particularly simple association with the various derived dissimilarity matrices -
particularly with those defined for the five different sites. This suggests that these
dimensions may correspond to variables characterizing the species having especially
meaningful relations to the geographic variables distinguishing sites (as well as,
secondarily, to variables related to the four different time periods).

To correct for the frequency effect we present another pair of plots in which the following transformations have been effected.

(1) The origin of the space was first transformed so all the coordinate values were non-negative (by subtracting the smallest algebraic coordinate value on each dimension from all the coordinates).

(2) After this translation to a "more or less" rational origin (such that essentially all the very low frequency species are at or very close to that origin) we now multiply all coordinates of each species point by the reciprocal of its marginal value in the "z = log(f + 1)" scale. This tends to convert all values to something approximating a relative frequency scale. The coordinate value on each dimension after this transformation can be interpreted as the relative value of the species on that dimension (relative to its overall frequency in the samples taken from all 20 sites x time periods comprising these data). While these plots, shown in Figures 8 and 9, are no more interpretable to us than were the earlier figures (6 and 7), we hope they may help ecologists or other biologists in interpreting these dimensions.
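The two-step correction can be sketched as follows (names are ours; the marginal values come from the z = log(f + 1) scale described earlier):

```python
import numpy as np

def frequency_corrected(coords, z_marginals):
    """(1) Translate so all coordinate values are non-negative, then
    (2) divide each species' coordinates by its marginal z value,
    approximating a relative-frequency scale."""
    shifted = coords - coords.min(axis=0, keepdims=True)
    return shifted / z_marginals[:, None]
```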

For general discussions of two- and three-way MDS and related models and methods for proximity data, we refer the reader to Carroll and Wish (1974a,b), Wish and Carroll (1974), Kruskal and Wish (1978), Carroll and Arabie (1980), Shepard (1980), Carroll and Pruzansky (1980, 1986) and Arabie, Carroll and DeSarbo (in press).
Figure 8. Transformed dimension one-two coordinates of SINDSCAL species space for Fresi et al. data.

Figure 9. Transformed dimension three-four coordinates of SINDSCAL species space for Fresi et al. data.

III. MDS (AND OTHER MULTIDIMENSIONAL ANALYSES) OF PREFERENCE (OR OTHER DOMINANCE) DATA

III.A MDPREF

MDPREF (Carroll and Chang 1964b), standing for MultiDimensional PREFerences scaling, is a model and method implemented in a computer program by Chang and Carroll (1968) to perform an "internal" analysis of m subjects' preference (or any type of dominance) data. The program utilizes a vector preference model and develops simultaneously the vector directions for the subjects (or the site x time "variables" in our "sea worms" example) and the configuration of stimuli (or objects, "sea worms" in the present case) in a common space. A theoretical discussion of this and other methods of internal analysis of preference (or proximity) data is provided by Carroll (1972, 1980) or Heiser (1981). By an "internal" analysis, we mean that both stimulus (object) points and subject (variable) parameters (vectors in this case) are determined entirely from the preference (dominance) data.

Theoretical Discussion

Since MDPREF was originally developed for individual differences preference analysis, we will often refer to it as though it is an analysis of preference judgments by different subjects. However, it can be applied to any kind of "dominance" data, in which each of m "variables" measures the relative dominance of each of n "objects" in some respect. In MDPREF the dominance judgments can be either paired comparisons, rankings, or ratings of stimuli or other objects. The following discussion is for the latter case; the development is quite similar for the former two, except that in the case of rankings the ranks are substituted for preference (dominance) scale values, while if the data are paired comparisons, preference (dominance) scale values are derived by methods described by Chang and Carroll (1968). See also Carroll (1972, 1980).

The model assumes that stimulus (object) points are projected onto subject (variable) vectors, with preference (degree of dominance) being determined by the relative size of these projected values (the larger value being preferred). Let $x_j = (x_{j1}, \ldots, x_{jr})$ represent an r-dimensional stimulus point for the j-th stimulus and $y_i = (y_{i1}, \ldots, y_{ir})$ represent the vector for subject i in the same r-dimensional space. (For simplicity, we now speak simply of preference of subjects for stimuli; the reader can make the necessary substitution of terminology if desired.) Then $\hat{s}_{ij}$, the estimated preference scale value of stimulus j for subject i, is defined by the scalar product:

$$\hat{s}_{ij} = \sum_{t=1}^{r} y_{it} x_{jt} = y_i x_j' \qquad \text{(III.A.1)}$$

(the expression on the right being the scalar product in matrix notation). This can be written, more generally, in matrix notation as follows:
written, more generally, in matrix notation as follows:
Let $X \equiv (x_{jt})$ be the n x r matrix of stimulus coordinate values and $Y \equiv (y_{it})$ be the m x r matrix of the termini of subject vectors; then

$$\hat{S} \equiv (\hat{s}_{ij}) = Y X' . \qquad \text{(III.A.2)}$$

The problem is to determine the matrices Y and X' from the set of paired comparison
"
judgments such that S accounts for the paired comparisons data as well as possible in
some statistically well-defined sense (realized by minimizing an "objective function"
embodying the statistical criterion to be optimized). Carroll and Chang (I 964b)
describe procedures - one iterative and one utilizing an Eckart-Young (I936)
decomposition - that accomplish this task. [In more modern terminology, the "Eckart-
Young decomposition" is frequently called, or closely related to, the "singular value
decomposition" (svd).1 It is the latter that is implemented by MDPREF, and that is
described below.

If the input data are already scale values of preference (this matrix S is called the
"first score matrix") the program proceeds to decompose S by the Eckart-Young
procedure, which involves computing eigenvalues and eigenvectors of the matrix S'S or
SS' (whichever is smaller). If the input data are paired comparisons, they are first
converted to a "first score matrix" of scale values by summing over rows and/or columns
of each paired comparisons matrix. Monte Carlo analyses by Carroll and Chang have
indicated that the simpler, Eckart-Young, procedure works as well with errorful data as
the iterative one. This is the reason MDPREF utilizes only the Eckart-Young
procedure. This overall procedure can be shown to have certain least squares properties.
Among other properties, in the case in which the original data are paired comparisons, it
provides a least squares fit in a certain sense to the original paired comparisons data,
schematized as a matrix of plus and minus ones (and possibly some zeros). See Carroll
(1972, 1980) for details.
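As a rough illustration of the Eckart-Young (svd) step, the following sketch decomposes a row-centered first score matrix into subject vectors and stimulus coordinates; the function name and the toy data are ours, not part of the MDPREF program itself.

```python
import numpy as np

def eckart_young(S, r):
    """Rank-r decomposition of an m x n first score matrix S into
    subject vectors Y (m x r) and stimulus coordinates X (n x r),
    so that S is approximated by the second score matrix Y @ X.T."""
    U, d, Vt = np.linalg.svd(S, full_matrices=False)
    Y = U[:, :r] * d[:r]            # subject vector termini absorb the VAF
    X = Vt[:r].T                    # stimulus coordinates
    vaf = d**2 / np.sum(d**2)       # variance accounted for, per dimension
    return X, Y, vaf

# toy first score matrix: 4 subjects' scale values for 5 stimuli
rng = np.random.default_rng(0)
S = rng.standard_normal((4, 5))
S = S - S.mean(axis=1, keepdims=True)   # normalization option a): remove row means
X, Y, vaf = eckart_young(S, r=2)
S_hat = Y @ X.T                         # estimate of the "second score" matrix
```

Computing eigenvectors of S'S or SS' (whichever is smaller), as the program does, yields the same decomposition; the svd call above is simply the modern equivalent. The "nesting" property mentioned later also follows directly: the rank r-1 solution is the rank r solution with its last column dropped.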

Input Options

As noted earlier, MDPREF has two input options, namely, paired comparisons and
direct judgments of preference scale values. In the case of paired comparisons, options
exist for reading in weight matrices specific to each subject and for handling missing
data. In the case of direct preference judgments (e.g., rankings) two options exist for
normalization - either: a) subtracting row means or b) subtracting row means and then
dividing entries by the standard deviation of values for that row.
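The two normalization options amount to row-centering, or row-centering plus row-standardizing, the score matrix; a minimal sketch (the function name is ours):

```python
import numpy as np

def normalize_scores(S, option="a"):
    """Option a): subtract row means.  Option b): additionally divide
    each row's entries by that row's standard deviation."""
    Z = S - S.mean(axis=1, keepdims=True)
    if option == "b":
        Z = Z / Z.std(axis=1, keepdims=True)
    return Z
```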

Output Details

The following are the major output categories entailed in a typical run of MDPREF:

1. First score matrix normalized according to alternative chosen from above options.

2. Cross-products matrix of subjects.

3. Cross-products matrix of stimuli.

4. Eigenvalues of the first score matrix.

5. Estimates of the first score matrix after factorization. (This is sometimes called
the "second score" matrix.)

6. Coordinates of stimuli and vector directions for subjects in the user-specified
dimensionality.

7. Plots of some or all pairs of dimensions, including both stimuli and subjects.
(Many different versions of MDPREF exist, with different details regarding this
and other options). See Chang and Carroll (1968) for further details on the
specific version of MDPREF available on the MDS-1 tape.

III.B PREFMAP, PREFMAP-2 and PREFMAP-3

PREFMAP (PREFerence MAPping) is a procedure that analyzes preference (or


other dominance) data in terms of a set of multidimensional preference models,
developed by Carroll and Chang (1968), which include and generalize the "vector
model" first proposed by Tucker (1960) and the basic Coombsian unfolding model of
preference (Coombs, 1964). Collectively, these are called the linear-quadratic hierarchy
114

of models. PREFMAP utilizes a known configuration of stimuli and attempts to portray


an individual's preference data via this hierarchy of models. PREFMAP is called an
external analysis, since the stimulus (object> space is (externally) given, and only the
subject (variable) parameters (e.g., ideal points or vectors) are to be determined.
Specifically, PREFMAP consists of four phases, corresponding to analysis in terms of
four models. The phases are referred to as Phases I, II, III and IV. As one goes from
Phase I to Phase IV, the underlying assumptions become stronger and model complexity
is therefore considerably reduced.

Theoretical Discussion

PREFMAP starts out with the following assumptions:

1. A group of individuals share the same perceptual configuration of r dimensions


for a set of n stimuli. Let X = (xjt), j = 1, 2, ..., n; t = 1, 2, ..., r, represent the
common perceptual space. Generally X will be externally defined (i.e., given a priori as
input to the PREFMAP procedure).

2. Further, the preference value for the jth stimulus of any individual, say the ith,
is (at least) monotonically related to the squared "distance" between the individual's
ideal point and the location of the stimulus in space. Let the matrix S = (sij),
i = 1, 2, ..., m; j = 1, 2, ..., n, represent the scale values of m individuals'
preferences for the n stimuli. Each row of the S-matrix represents the scale values for
individual i's preferences for the n stimuli. (For convenience, we assume that smaller
values represent higher preferences.) In general, PREFMAP assumes Fi(sij) ≅ d²ij.
The models differ in the definition of d²ij, and in that of Fi.

Two versions of PREFMAP models may be distinguished - metric and nonmetric.


In the metric version the function Fi is assumed to be linear, while a general monotonic
function, not specified a priori, is permitted in the nonmetric case. Thus, the preference
scale values are assumed to be defined on at least an interval scale in the metric version
while only their ordinal relationships are utilized in the nonmetric version. We discuss
the metric version of the PREFMAP algorithm first and then describe the nonmetric
case.

Metric Version of the PREFMAP Algorithm

In the metric version of the PREFMAP algorithm, it is assumed that the scale values
of preference are linearly related to squared distance, that is, that Fi is linear.
Assuming Fi has nonzero slope, we may invert it and write:

    sij ≅ a d²ij + b                                               (III.B.1)

where a and b are constants (a > 0) and ≅ denotes approximate equality (except for
error terms not expressed).

Let xj = (xj1, ..., xjr) represent the row vector of coordinates of the jth stimulus
(j = 1, 2, ..., n) and yi = (yi1, ..., yir) represent the vector of coordinates of the
ideal point for the ith individual (i = 1, 2, ..., m). Given the above relationship and
input data for xj and sij, the PREFMAP method solves, for each individual, for
estimates of the coordinate values of the vector yi, and, depending on the model, possibly
for additional parameters associated with individuals.

In model IV the squared distances are defined in a special way which corresponds to
the special case when the ideal point is infinitely distant from the stimuli, so that only its
direction matters. In this special case, the squared distance is actually defined by a
linear equation, and can also be viewed as equivalent to projection on a vector in the
appropriate direction; thus the name "vector model". This equivalence of the linear, or
vector, model to the unfolding model with ideal points at infinity is demonstrated in
Carroll (1972, 1980).

Four alternative models for relating preference data to a given stimulus space, called
models I, II, III and IV, are included in the hierarchy proposed by Carroll and Chang.
The four models correspond, in the obvious fashion, to the four phases of PREFMAP, in
a decreasing order of complexity. Phase I fits a highly generalized unfolding model of
preference (model I); Phase II utilizes a more restrictive model assuming weighted
Euclidean distances analogous to those assumed in the INDSCAL model discussed
earlier; Phase III is the "simple" or Coombsian unfolding model in which ordinary
(unweighted) Euclidean distances are assumed; and Phase IV is the linear, or "vector",
model. Phases I, II and III differ in the way the term d²ij is formulated, i.e., in the
definition of the metric, while Phase IV can be viewed as putting certain restrictions on
ideal point locations, as discussed earlier.

All four phases utilize regression procedures (quadratic or linear) to estimate


coefficients which are then reparametrized to provide estimates of parameters associated
with the corresponding model. This is described in detail in Carroll (1972, 1980).

Phase I

One way to describe the model assumed in Phase I is to assume that both xj and yi
are operated on by an orthogonal transformation matrix Ti - which is idiosyncratic for
each subject - and weighted squared distances are then computed from the transformed
values. Thus, one defines:

    x*j = xj Ti                                                    (III.B.2)

and

    y*i = yi Ti                                                    (III.B.3)

and then computes the (weighted) Euclidean squared distances d²ij by the formula:

    d²ij = Σ(t=1..r) wit (x*jt - y*it)² .                          (III.B.4)

Geometrically, this corresponds to an orthogonal, or rigid, rotation of the coordinate


system, followed by differential stretching of the new (rotated) coordinate system.
Different rotations and different patterns of weights are allowed for each individual.

Phase II

Phase II differs from Phase I in that it does not assume a different orthogonal
transformation for each individual, although it allows differential weighting of
dimensions, so that squared distances are computed simply by

    d²ij = Σ(t=1..r) wit (xjt - yit)² .                            (III.B.5)
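Equation (III.B.5) vectorizes directly; a small numeric sketch (names and values are illustrative, not from the Fresi et al. data):

```python
import numpy as np

def weighted_sq_dist(X, y_i, w_i):
    """d2[j] = sum over t of w_i[t] * (X[j, t] - y_i[t])**2, i.e. the
    Phase II weighted squared distance from each stimulus (row of the
    n x r matrix X) to subject i's ideal point y_i, with this subject's
    dimension weights w_i."""
    return ((X - y_i) ** 2) @ w_i

X = np.array([[0., 0.], [1., 0.], [0., 2.]])   # three stimuli in r = 2
y_i = np.array([1., 1.])                       # subject i's ideal point
w_i = np.array([2., 1.])                       # dimension 1 weighted twice as heavily
d2 = weighted_sq_dist(X, y_i, w_i)             # -> [3., 1., 3.]
```

Setting all weights to 1 gives the Phase III (simple unfolding) distances as a special case.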

Phase III

Phase III is the "simple" unfolding model, but it allows the possibility that some or
all of the dimensions have negative weight, making Phase III equivalent to Phase II,
with weights wit = ±1 for each individual. To be precise, the weights wit = ait, where
each ait = ±1.

Phase IV

Phase IV utilizes the vector model in which preference values are related to
coordinates of the stimulus space by an equation (excluding the error term) of the form:

    sij = Σ(t=1..r) bit xjt + ci .                                 (III.B.6)

This equation contains only linear terms, so least squares estimates of the bit's can be
derived immediately by multiple linear regression procedures. Having estimated the
coefficients bi1, bi2, ..., bir, the direction cosines of the vector for the ith individual are
obtained by normalizing the vector of estimated coefficients bi = (bit) to unit length,
by dividing each bit by (Σt bit²)^1/2. Parameters of the other models are also fit by
regression procedures, although these are more complex. The reader is referred to
Carroll (1972, 1980) for a more detailed exposition of this.

In Phase II, much as in INDSCAL, the orientation of coordinate axes is critical.


Since the axis orientation of the a priori space may be essentially arbitrary, an
approximate solution is provided for the appropriate orientation. This will automatically
be provided in either PREFMAP or PREFMAP-2 if Phase I precedes Phase II.
Otherwise, Phase II can be entered directly, but with an initial solution for what is
called the "canonical rotation". In Phase III the problem is a little more involved still,
since a general linear transformation may be required. This can be viewed as entailing
an orthogonal transformation followed by a differential weighting of dimensions. This,
called the "canonical rotation and canonical weights", can also be solved for. In
PREFMAP-3 it is optional whether the "canonical rotation" and/or "canonical weights"
will be solved for. In some cases the orientation may be assumed to be correct as given
and only the canonical weights asked for. PREFMAP, PREFMAP-2 and PREFMAP-3
all differ in how the canonical orientation and/or canonical weights are solved for. In
fact, in PREFMAP-3 it is possible to solve for "canonical weights" without necessarily
solving for the "canonical rotation." See Chang and Carroll (1972) or Meulman, Heiser
and Carroll (1986) for details.

Nonmetric Version of the PREFMAP Algorithm

It may be recalled that the nonmetric version of PREFMAP fits monotonic functions
relating the preference scale values and the squared Euclidean distances between a
subject's ideal point and the stimulus points. This is accomplished by the procedure
described below.

1. Solve for the parameters of the appropriate regression equation (quadratic or


linear) to predict the sij's. This step is essentially the metric version of PREFMAP.
The "predicted" values (from the model) will be called ŝij(1), i = 1, 2, ..., m;
j = 1, 2, ..., n.

2. Estimate the monotone function Mi(1) for subject i that best predicts the
estimates (the ŝij(1)'s) from the original sij's, using the procedure described by Kruskal
(1964b) for least squares monotone regression. Define s*ij(1) = Mi(1)(sij).

3. Replace sij with s*ij(1) to compute a new set of predicted values, ŝij(2).

4. Using the new set of ŝij's, compute a new monotone function Mi(2) and a new set
of s*ij's, namely s*ij(2).

5. Continue this iterative procedure until the process converges (i.e., until no more
changes occur in the monotone function or regression coefficients). Specifically, the
process is terminated by reference to a parameter called CRIT. If the sum of squares of
differences in the predicted sij's for the lth and (l-1)st iterations is less than CRIT, the
process stops at the lth iteration.
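Step 2's least squares monotone regression can be computed with the pool-adjacent-violators idea underlying Kruskal's procedure; a minimal sketch (function name ours, assuming a nondecreasing fit and no ties in s):

```python
import numpy as np

def monotone_regression(s, target):
    """Least squares monotone regression: fitted values, nondecreasing
    in the order of s, that best approximate `target` (pool adjacent
    violators)."""
    order = np.argsort(s)
    y = target[order].astype(float)
    blocks = []                      # each block: [mean, count]
    for v in y:
        blocks.append([v, 1.0])
        # merge backwards while the monotonicity constraint is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, n2 = blocks.pop()
            m1, n1 = blocks.pop()
            blocks.append([(m1 * n1 + m2 * n2) / (n1 + n2), n1 + n2])
    fitted = np.concatenate([[m] * int(n) for m, n in blocks])
    out = np.empty_like(fitted)
    out[order] = fitted              # undo the sort
    return out
```

Each pass of the iterative procedure alternates this step with re-fitting the regression of step 1, until the CRIT criterion is met.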

Input Parameters

In all the PREFMAP programs, the preference data can be expressed in one of two
ways: a) smaller values indicating higher preferences or b) larger values indicating
higher preferences. The programs can start with any prespecified phase and can work
their way down to any model of lower complexity. PREFMAP-3 actually allows
different models to be fit for different subjects in the same analysis.

Other options include: a) normalization of original scale values versus leaving them
as initially defined and b) computing each subject's scale values for each new phase or,
alternatively, using the estimates of the previous phase as the original values for the
following phase. There are also various options concerning whether or not the canonical
rotation and/or weights are computed prior to entering a particular phase.

Output Details

A typical run of PREFMAP produces some or all of the following output:

1. Listing of all input parameters selected and the original configuration of stimuli.

2. For each subject the printout of the original scale values, regression coefficients
and estimates of d²ij (or ŝij, where ŝij = ai d²ij + bi, or equals the projection of
stimulus j on the vector for subject i in the case of the "vector model") for each phase
and for each iteration in the case of the monotone (or nonmetric) version.

3. For Phase I (only) the direction cosines of each subject's idiosyncratic rotation.

4. Coordinates (or direction cosines for Phase IV) of ideal point and weights of the
dimensions specific to each subject. In Phase I, the orthogonal rotation matrix may also
be printed for each subject. Depending on options selected, the canonical rotation matrix
and/or canonical weights may also be provided as output.

5. Plot showing the relationship between the monotone transform of the scale values
and original scale values (optional).

6. Plot showing the positions for ideal points or vector directions of all subjects as
well as stimulus positions.

7. A summary table showing the correlation coefficients for each subject by each
phase and corresponding F-ratios, including F-ratios for testing the statistical
significance of the improvement in fit associated with moving from a simple to a more
complex method. Such an F is associated with every pair of models (IV versus III, II or
I; III versus II and I; and II versus I). In each case, it can be taken as assessing
whether the more complex model (with a lower Roman numeral) fits the data
significantly better than the simpler (higher numbered) model. These tests are possible
because of the hierarchical embeddedness (or nested structure) of these models; that is,
the fact that each "simpler" model is a special case of each more complex one. In terms
of the algebraic structure of the models, each more complex model includes all the
parameters of any simpler model, plus additional parameters. The situation is formally
equivalent to testing significance of additional terms in a stepwise regression scheme.
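The F-ratio for each model comparison has the standard nested-regression form; a sketch with illustrative numbers (the df arguments are residual degrees of freedom for each model):

```python
def nested_f(rss_simple, df_simple, rss_complex, df_complex):
    """F-ratio testing whether the more complex model's extra
    parameters significantly improve fit over the nested simpler
    model: mean square of the improvement over the complex model's
    residual mean square."""
    extra_params = df_simple - df_complex
    return ((rss_simple - rss_complex) / extra_params) / (rss_complex / df_complex)

# e.g. a simpler model (model IV) against a more complex one (model III)
# for one subject, with made-up residual sums of squares:
F = nested_f(rss_simple=10.0, df_simple=10, rss_complex=5.0, df_complex=8)
# F is referred to an F distribution with (2, 8) degrees of freedom here
```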

PREFMAP-2 has the additional feature of allowing definition of a so-called


"internal" stimulus configuration directly from the preference data itself. For further
details on PREFMAP and PREFMAP-2 see Chang and Carroll (1972). PREFMAP-3
does not allow generation of such an "internal" stimulus configuration, but does have
many other options. PREFMAP-3 is much more flexible in the mix of models fit to
different subjects. In a single analysis different subjects may be fit by different models
in the hierarchy of models described here. These models are simply called, in
PREFMAP-3, G (for General Unfolding), W (for Weighted Unfolding), U (for simple
Unfolding) or V (for Vector model). Greater flexibility also exists in PREFMAP-3 in
"metric" vs. "nonmetric" fitting for different subjects. See Meulman, Heiser and Carroll
(1986) for details on PREFMAP-3.

It would seem in principle to be very interesting to apply the entire family of models
in the PREFMAP hierarchy to the Fresi et al. data. For example, it would seem quite
appropriate to fit model II (the simple unfolding, or "ideal point" model), using each of
the site x time period variables as a pseudo-subject, seeking an ideal point in the four
dimensional space of seaworm species determined by INDSCAL/SINDSCAL such that
the frequency of species for that site x time period is inversely related to distance from
that ideal point. One could think of this "ideal point" as the species of sea worm most
ideally suited to that particular site/time period combination. Time constraints did not
allow for a thorough analysis of these data via the PREFMAP hierarchy of models,
however. We therefore opted for an internal analysis of the site x time period variables,
using the MDPREF vector model approach. MDPREF, as discussed earlier,
simultaneously determines a space for the "stimuli" (species in this case) and the
"subjects" (sites x time periods) in terms of a vector model. A vector model can
actually be thought of as an unfolding or "ideal point" model with the ideal points all
infinitely distant (or, in practice, very far) from the stimuli (species), so that the vector
direction simply corresponds to the direction of the ideal point from the centroid of the
stimuli (species). It is of interest both to see how well MDPREF accounts for these
data, and also how the structure of the species space relates to that determined by the
three-way INDSCAL/SINDSCAL analysis.

III.C MDPREF Analysis of the Fresi et al. Seaworm Data

We attempted an analysis of the Fresi et al. data on seaworm species using


MDPREF. As indicated earlier, dominance relationships can be attributed to variables
much more general than preference judgments (narrowly construed). More generally,
dominance data are any data indicating the tendency of objects to dominate one another
in some respect or context. Thus the relative frequency of the various sea worms at the 5
sites and the 4 time periods can be taken as dominance data for these species at these
sites x time periods. (In fact, dominance data, broadly defined, can be viewed as
encompassing essentially any variety of multivariate data.)

We thus applied MDPREF to these data, treating the seaworm species as "stimuli"
and the 20 sites x months x years as "subjects." The "total and marginal" variance
accounted for (VAF) for dimensionalities from 1 through 20 are displayed in Table 5.

Table 5. Variance accounted for (VAF) and cumulative VAF for MDPREF
solutions in dimensionalities 1 through 20, for Fresi et al. data.

Dimensions    VAF      Cumulative VAF
1 0.490 0.490
2 0.201 0.691
3 0.053 0.745
4 0.048 0.794
5 0.036 0.830
6 0.028 0.858
7 0.025 0.884
8 0.020 0.904
9 0.019 0.924
10 0.014 0.938
11 0.013 0.952
12 0.010 0.962
13 0.008 0.970
14 0.008 0.978
15 0.004 0.983
16 0.004 0.988
17 0.003 0.992
18 0.003 0.996
19 0.002 0.998
20 0.001 1.000

(In MDPREF, as mentioned earlier, the unrotated r-1 dimensional principal axis
solution is simply the r dimensional one with the least important - in VAF terms -
dimension dropped. Because of this "nesting" property, this calculation is
straightforward.) Based on the VAF figures, and on interpretability criteria, once again
it was decided to report the four dimensional solution.

Since we are focusing, in our attempt to interpret these solutions, on the structure of
the variables (sites x time periods), we present the positions of the vectors for these 20
variables separately from the species points in Figures 8 and 10. In these Figures we use
the same coding for these variables as in the Fresi et al. paper; a three symbol (number,
letter, number) code. The first number (1-5) denotes the site, the letter denotes the
month (F = February, L = July), while the third number denotes the year (5 = 1975,
6 = 1976). (We used an "L" rather than a "J" here to encode "July" to maintain
consistency with the coding used by Fresi et al.). MDPREF, unlike
INDSCAL/SINDSCAL, does not produce unique dimensions, so that rotation of coordinate axes
is usually necessary to attain an optimally interpretable set of dimensions. In the present
case, however, perhaps fortuitously, the orientation of axes originally obtained appears to
lead to a quite interpretable structure (without rotation) for these 20 variables. (This is
not entirely a happenstance, no doubt; the principal axis orientation in which MDPREF
dimensions emerge is certainly more likely than a purely random orientation to yield
interpretable structure.)

In the interest of grouping the dimensions in a fashion enhancing interpretability, we


did permute their order. Thus Figure 10 shows the plane defined by dimensions one and
three. Dimension one can be seen, from the fact that all variables have positive
projections on that dimension, to be a "consensus" dimension, reflecting whatever factor
is most nearly shared in common by all sites x time periods. Figure 11 shows the 88
sea worm species in the same plane. The projections of the seaworms onto the dimension
one axis would probably correspond very closely to the mean value of the twenty
variables (i.e., with the mean of the log of the frequencies + 1). This dimension could be
interpreted, then, as overall "abundance" of the species, and the loading of a variable on
that dimension simply indicates the extent to which that variable reflects this overall
"abundance." (As in factor analysis, the size of that loading can be viewed as a direct
measure of the correlation of that variable with this first dimension.) Except that sites 1
and 2 seem to have very slightly lower weights on this dimension than do sites 3, 4 and
5, however, there seems to be nothing "interpretable" about this dimension vis a vis these


Figure 10. Termini of vectors projected into one-three plane for 20
site x time period variables for unrotated MDPREF solution for
Fresi et al. data.


Figure 11. One-three plane of unrotated MDPREF stimulus
(species) space for Fresi et al. data. Four vectors show result from
mapping of dimensions from four dimensional KYST-2A solution
into unrotated MDPREF space.

site x time variables. Dimension three is more interesting, however. Note that almost
all the variables involving the year 1975 (those whose code ends with "5") weight
positively on that dimension, while those involving 1976 tend to exhibit negative weights.

In fact almost all the variables with a final "5" are in the upper right quadrant, and
almost all those with a final "6" in the lower right quadrant. The most glaring exception
is "1L5" (site 1, in July 1975) which appears just below "1L6" in the lower right hand
quadrant. We have no definite explanation for this anomaly, although a partial
explanation may be that there is something special about site 1 as a whole on this
dimension. We note that, in general, the variables involving site 1 for a given time
period seem to have systematically lower values on this dimension than do those for the
other four sites. For example, 1F5 has a much lower value than do 2F5, 3F5, 4F5 and
5F5, all of which are at the extreme positive end of dimension 3, while 1F5 is almost at
the zero point. Whatever dimension three corresponds to in its effect on the 88 species
of sea worms, it is a factor that was positive (tended to increase the abundance of those
species at the positive pole of that dimension) in 1975, and negative in 1976. A more
explicitly descriptive way of stating the same thing is that those species at the positive
end tended to be relatively more abundant in 1975, those at the negative end to be
relatively more so in 1976.


Figure 12. Termini of 20 variable vectors projected into two-four
plane of unrotated MDPREF solution for Fresi et al. data.

We now shift to the remaining plane of this four-dimensional MDPREF solution,


shown, for the sites x times, in Figure 12. This is the plane defined by dimensions two
and four. This plane distinguishes among the five sites to a remarkable degree. (It is
dubious that a technique such as discriminant analysis, specifically geared to doing this,
could do a significantly better job of separating these five groups.) As it is, we see that
dimension two makes the most clearcut separation; that between sites 1 and 2 at the left
(negative) end and sites 3, 4 and 5 at the right (positive) end. Then dimension four
separates site 1 from 2 on the one hand, and site 3 from an amalgam of sites 4 and 5 on
the other, so that sites 1, 2, 3 and (4,5) wind up following neatly in a clockwise fashion
(more or less) in the lower left, upper left, upper right and lower right quadrants,
respectively. A map of the harbor of Ischia is given in the Fresi et al. paper. One can
see from inspection of this map the reason why sites 1 and 2, located in open sea and
separated by the harbor entrance from the other three sites, might be so clearly
distinguished from those other sites, both in this representation and in the
INDSCAL/SINDSCAL (source space) representation. This map also suggests some
hypotheses as to why sites 4 and 5 may be so nearly indistinguishable. Site 3 is closer to
the strait providing the harbor entrance, and separating sites 3, 4 and 5 from sites 1 and
2, so that it may be more affected by water flowing through that strait, while its ecology

Figure 13. Two-four plane of unrotated MDPREF species space for
Fresi et al. data. Vectors are as in Figure 11, except projected into
two-four plane of unrotated MDPREF space.

may also more closely resemble that of 1 and 2 than does that of sites 4 and 5, which lie
more distinctly in the harbor area. Figure 13 shows the dimension two-four plane of the
stimulus (species) space, indicating how the seaworm species array themselves on these
dimensions separating the various sites. (Again, it should be noted that overall
frequency of the species has not been normalized here.)

It might be noted, by comparing Figures 1 and 2 to Figures 11 and 13, that the
dimensions emerging from the KYST-2A analysis of the "Overall" dissimilarity matrix
are essentially the same as those (for the seaworm species) in the unrotated MDPREF
analyses. This is true despite the fact that the KYST-2A analysis omitted 33 of the 88
species, and also despite the marked difference in types of analysis. KYST-2A is a
nonmetric technique aimed at accounting for rank orders of these derived dissimilarities,
while MDPREF is a metric technique aimed at accounting for the values of the 88
species on the 20 site x time variables. This congruence of the dimensions in these two
analyses is shown directly by using PREFMAP-3, in a manner essentially identical to
that described in section II.C, to "map" the dimension from the four dimensional
KYST-2A solution into this MDPREF species space. The four vectors representing
these four KYST dimensions (k4-1, k4-2, k4-3 and k4-4), respectively correspond very
closely, as can be seen, to the corresponding dimensions (one through four, respectively)
of the MDPREF solutions. The VAF's (or squared multiple correlations) were: .989,
.991, .806 and .854 respectively. It is not unusual, however, for these two quite different
analyses to produce highly comparable results. The reasons for this probably are
twofold:

(1) The theoretically nonmetric KYST analysis is, in fact, essentially equivalent to a
metric one, since the function relating input dissimilarities (distances) to recovered
distances is almost perfectly linear and, in fact, goes very nearly through the origin,
indicating that the input distances are very nearly ratio scale estimates of the derived
distances. It should be emphasized, as spelled out in more detail below, that this might
not have happened!

(2) The KYST-2A solution is rotated to principal components orientation, while the
MDPREF solution is essentially a principal components solution.

The only seemingly important difference between these two solutions vis a vis the
"worm" stimuli is in the scaling of these dimensions. Even this is not of any real
significance, however. It merely reflects the fact that, in MDPREF, the stimulus
(seaworm species) space is arbitrarily scaled to unit variance on all dimensions (and
zero covariance - i.e., a "spherical" distribution), while the differential VAF (variance
accounted for) is absorbed in the vectors, whereas in KYST the differential VAF is
reflected in the scaling of the stimulus (worm) dimensions. Thus, in this case at least,
the simple metric MDPREF analysis has recovered essentially the same structure for the
sea worm species as did the more complex and sophisticated KYST-2A procedure, while
MDPREF has also extracted information about the "subjects" (sites x times) in the
form of the 20 vector locations, such that projection of stimulus points onto subject
vectors yields approximations to the original dominance data.

It should be stressed, however, that this simple relationship between these two types
of analysis will not always be exhibited. Particularly in the case of strong nonlinearities
in the data, KYST-2A can yield a lower dimensional, more parsimonious representation
of the stimuli (or other objects) than MDPREF (or other principal components/factor
analytic type models and methods).

Rotation of the MDPREF Solution to Congruence with SINDSCAL

As mentioned, MDPREF does not yield unique dimensions, but rather is subject to
rotational indeterminacies. In fact, more generally, a linear transformation of the
stimulus space can be effected, as long as the appropriate companion transformation,
given by the "inverse adjoint" transformation, is applied to the subject vectors.
However, we shall restrict ourselves in the present case to orthogonal transformations,
with possible overall dilations, or scale transformations. Since the inverse adjoint of an
orthogonal transformation is the same orthogonal transformation, this leads to a
particularly simple form (which has other advantages as well). Since the stimulus
spaces in both MDPREF and SINDSCAL are scaled to have equal variances of
projections of stimuli (species) on coordinate axes, restricting the class of
transformations to be orthogonal seems appropriate in this case.
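The rotation-to-target step used below can be sketched with the standard orthogonal Procrustes solution (an svd of A'B); here the target B plays the role of the SINDSCAL configuration and A the MDPREF species space. The toy target is an exactly rotated copy of A, so recovery is exact; with real data the congruence is only approximate.

```python
import numpy as np

def procrustes_rotation(A, B):
    """Orthogonal matrix T minimizing ||A @ T - B|| in the least
    squares (Frobenius) sense: T = U @ Vt from the svd of A'B."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 2))              # "MDPREF" configuration
theta = 0.7                                   # an arbitrary rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
B = A @ R                                     # "SINDSCAL" target
T = procrustes_rotation(A, B)                 # recovers R
```

Allowing an overall dilation as well only requires rescaling A @ T by a least squares scalar; the arrows in Figures 14 and 15 are then the residual vectors A @ T - B.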

Figure 14 shows the dimension one versus two plane of the transformed species space
superimposed on the same planes of the SINDSCAL space. In this representation the


Figure 14. One-two plane of the MDPREF species space rotated to
optimal congruence with one-two plane of SINDSCAL species space
(two solutions superimposed, with "arrows" connecting corresponding
points).

Figure 15. Same as Figure 14, but three-four plane of rotated
MDPREF species space superimposed on same plane of SINDSCAL
solution.

two points representing the same species are connected with one arrow. The terminus
(arrowhead) of the arrow shows the position of the species point in the SINDSCAL
representation, while the origin (shown by an asterisk) shows the point in the MDPREF
representation after rotation to optimal congruence with the SINDSCAL representation.
In this case, the SINDSCAL configuration provides the "target," and the MDPREF

solution is rotated to best congruence in a certain least squares sense (specifically, so that
the sum of squares of the arrow length is minimized). The specific procedure used was a
variant of one originally proposed by Cliff (1966), which is closely related to the
"orthogonal procrustes" approach described by Gower in section 9.1 of his paper in this
volume. Figure 15 shows a similar plot for the dimension three-four plane. (It should
be kept in mind that the dimensions referred to here are those from the SINDSCAL
solution, so the one-two plane should be taken as corresponding to those dimensions from
the SINDSCAL analysis, not from the MDPREF solution first described. Since the
MDPREF coordinate system has been completely transformed in this process, there is no
necessary one-one correspondence with those dimensions.) Figures 16 and 17 show the
rotated MDPREF solution in those same two planar projections, but this time with the
Figure 16. One-two plane of MDPREF space rotated to optimal
congruence with SINDSCAL solution. Site x time period vectors as
well as 88 seaworm species points are shown, in joint space
representation.

(rotated) vectors shown simultaneously (and, in fact, with lines connecting them to the
origin to make their vectorial nature more evident). In Figure 16, showing the
dimension one-two plane of this rotated MDPREF joint representation, we see that all
the vectors for sites 3, 4 and 5 when projected into that plane have substantial lengths,
while those for sites 1 and 2 have lengths, when projected into this plane, that are very
near zero. Thus these two dimensions are accounting for virtually all the reliable
variance for sites 3, 4 and 5, and essentially none for sites 1 and 2. This is consistent
with the fact that, in the SINDSCAL representation, the derived dissimilarity matrices
Figure 17. Same as Figure 16, except that three-four plane of joint
MDPREF representation, after rotation to optimal congruence with
SINDSCAL solution, is plotted showing both species points and
variable vectors.

for sites 3, 4 and 5 had high, clearly non-zero, weights on the corresponding INDSCAL
dimensions, while those derived for sites 1 and 2 had near zero weights. The opposite
pattern shows up in the plane for dimension three and four of this rotated MDPREF
representation shown in Figure 17; the lengths of the vectors for sites 1 and 2 projected
in this plane are substantial, while those for sites 3, 4 and 5 are near zero. Again, this is
consistent with the INDSCAL results. We can also see in this three-four plane a clear
separation between sites 1 and 2, with site 1 having higher weights on dimension four
than on three, and site 2 the opposite pattern. In the one-two plane we can see some, but
not as clear, differentiation of site 3 from sites 4 and 5. These three sites are much more
"mixed up" in this representation than in others we have seen. There is some hint of the
differentiation based on year (1975 versus 1976) in the vectors for sites 3, 4 and 5 in this
plane, however.
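The least-squares rotation step used above can be sketched numerically. The following is a minimal illustration only, not the program actually used in the analysis: it builds two hypothetical 88 x 4 configurations standing in for the SINDSCAL target and the unrotated MDPREF solution, and recovers the orthogonal transformation T minimizing the sum of squared "arrow lengths" via a singular value decomposition, in the spirit of Cliff's orthogonal rotation to congruence.

```python
import numpy as np

# Hypothetical stand-ins for the two 88 x 4 species configurations:
# B plays the role of the SINDSCAL "target", A of the unrotated MDPREF solution.
rng = np.random.default_rng(0)
B = rng.normal(size=(88, 4))
R, _ = np.linalg.qr(rng.normal(size=(4, 4)))     # an arbitrary orthogonal rotation
A = B @ R.T + 0.01 * rng.normal(size=(88, 4))    # B rotated away, plus noise

# Orthogonal Procrustes step: the orthogonal T minimising ||A T - B||^2
# (the sum of squared arrow lengths) is U V' from the SVD of A'B.
U, s, Vt = np.linalg.svd(A.T @ B)
T = U @ Vt
A_rot = A @ T            # A rotated to best congruence with the target B
```

Because T is constrained to be orthogonal, the rotated configuration preserves all inter-point distances of A, which is why this class of transformations is appropriate for configurations scaled to equal variances on the coordinate axes.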

In a sense, the representation provided by the MDPREF solution after rotation to
optimal congruence with SINDSCAL is the most cogent and succinct
representation of all for these data. What it shows is an overall four-dimensional
representation, but one neatly partitioned into two two-dimensional subspaces. The
one-two subspace seems to account for most of the variance in the variables related to sites 3,
4 and 5 while the three-four subspace accounts for most of that in variables related to
sites 1 and 2. A further "nice" aspect of this representation is that almost all the vectors

lie in the positive quadrants of these two planes, so the weights of these four dimensions
are almost all positive or zero. This suggests that the use we have made of SINDSCAL
in this case may provide a very effective basis for rotation of an MDPREF type
representation to a special kind of generalized "simple structure."

It now only remains for ecologists and biologists to "interpret" the dimensions in
terms of their effects on the seaworm species. We happily defer that privilege to these
experts. To aid such experts in this creative endeavor, however, we provide a final table,
Table 6, in which the coordinates for the 88 seaworm species on the dimensions of the
four different configurations discussed in this paper are presented.

Acknowledgments. Invaluable help in conducting the data analyses reported and other
technical help in preparing this paper were provided by Rhoda T. Iosso and Barbara B.
Hollister. Thanks are also due to Martina Bose and to Karen Golday for word
processing and other technical assistance. Finally, I am greatly indebted to
Pierre Legendre and Joseph B. Kruskal, plus two anonymous reviewers, for careful
readings of the paper at various stages of its development, leading to enormous
improvements in its content.

REFERENCES

Arabie, P., & J. D. Carroll. 1980a. MAPCLUS: A mathematical programming
approach to fitting the ADCLUS model. Psychometrika 45: 211-235.
Arabie, P., & J. D. Carroll. 1980b. How to use MAPCLUS: A computer program for
fitting the ADCLUS model. Unpublished manuscript, Bell Laboratories, Murray
Hill, NJ.
Arabie, P., J. D. Carroll & W. S. DeSarbo. 1987. Three-way multidimensional scaling
and related techniques. Sage University Paper. In press.
Bloxom, B. 1968. Individual differences in multidimensional scaling. Research Bulletin.
68-45. Educational Testing Service, Princeton, NJ.
Carroll, J. D. 1965. Parametric mapping of similarity data as a scaling method.
Unpublished manuscript, Bell Laboratories, Murray Hill, NJ.
Carroll, J. D. 1968. Generalization of canonical correlation analysis to three or more
sets of variables. Proceedings of the 76th Annual Convention of the American
Psychological Association 3: 227-228.
Carroll, J. D. 1969. Polynomial factor analysis. Proceedings of the 77th Annual
Convention of the American Psychological Association 4: 103-104.
Table 6. Coordinates of points for 88 species of seaworms on dimensions of four different multidimensional representations described in text.

Numerical  Unrotated MDPREF coordinates   Rotated MDPREF coordinates   SINDSCAL coordinates     Transformed SINDSCAL coordinates   Abundance measure
code       MP1    MP2    MP3    MP4       RM1    RM2    RM3    RM4     SI1    SI2    SI3   SI4   TS1    TS2    TS3    TS4          ABUND = log(f+1)

1   -.055 .014 -.022 -.011   -.028 -.016 -.043 .009   -.046 -.041 -.056 .033   .010 .014 .014 .027   2.07
2   -.054 .023 .004 .022   -.028 -.016 -.043 -.009   -.040 -.044 -.052 .048   .020 .019 .025 .031   1.38
3   -.061 .020 -.018 .016   -.029 -.016 -.019 .040   -.053 -.042 -.052 .053   .021 .041 .050 .054   .69
4   -.057 .014 -.002 .071   -.040 -.010 -.034 .030   -.059 -.031 -.032 .064   .008 .036 .050 .024   1.09
5   -.059 .019 -.003 .061   -.057 .003 .001 .058   -.057 -.035 -.039 .065   .015 .051 .069 .037   .69
6   -.060 .016 -.027 .001   -.052 -.000 -.006 .056   -.053 -.043 -.053 .045   .013 .025 .030 .041   1.09
7   -.029 -.070 .0 -.047   -.037 -.011 -.042 .017   -.050 -.049 .020 -.048   .002 .002 .014 .018   7.49
8   -.060 .016 -.011 .034   -.025 -.047 .000 -.058   -.054 -.040 -.053 .048   .020 .044 .049 .062   .69
9   -.030 .039 -.024 .122   -.045 -.007 -.021 .038   -.031 .009 -.029 .074   .013 .028 .020 .006   2.77
10   .335 .111 -.053 -.021   -.060 .053 .012 .085   .340 .346 .151 -.061   .007 .007 .004 .002   52.78
11   .113 -.064 -.037 -.203   .233 .219 .028 -.033   .104 .045 .077 -.167   .006 .004 .005 .009   28.65
12   .003 -.019 -.256 .002   .109 -.012 -.034 -.183   -.043 .084 -.052 -.066   .002 .015 .003 .015   9.97
13   .102 -.344 -.200 -.088   -.097 .146 -.108 -.094   -.055 .074 .194 -.396   .000 .004 .008 .015   31.82
14   -.036 .001 -.031 .019   -.116 .065 .069 -.340   -.035 -.023 -.048 .003   .006 .009 .007 .017   5.12
15   -.057 .009 .004 .022   -.038 .007 -.020 .009   -.053 -.046 -.029 .055   .010 .018 .041 .025   1.38
16   -.061 .009 .005 .034   -.036 -.021 -.014 .032   -.056 -.044 -.045 .048   .017 .038 .060 .062   .69
17   -.059 .002 -.008 .008   -.043 -.019 -.009 .040   -.055 -.045 -.047 .034   .009 .018 .028 .041   1.38
18   -.061 .009 .005 .034   -.039 -.021 -.024 .018   -.056 -.044 -.045 .048   .017 .038 .060 .062   .69
19   -.057 .006 -.034 -.006   -.043 -.019 -.009 .040   -.052 -.042 -.054 .025   .008 .016 .018 .036   1.79
20   -.059 .011 -.032 -.008   -.038 -.010 -.043 .003   -.052 -.044 -.054 .039   .011 .019 .023 .037   1.38
21   -.061 .009 .005 .034   -.035 -.012 -.048 .008   -.056 -.044 -.045 .048   .017 .038 .060 .062   .69
22   -.061 .020 -.018 .016   -.043 -.019 -.009 .040   -.053 -.042 -.052 .053   .021 .041 .050 .054   .69
23   -.060 .019 .005 .046   -.040 -.010 -.034 .030   -.054 -.042 -.038 .066   .020 .041 .070 .036   .69
24   .559 .021 -.352 .193   -.043 -.011 -.009 .052   .409 .666 .371 -.252   .005 .008 .005 .004   84.99
25   .211 .146 -.143 -.055   .117 .567 .121 -.143   .229 .256 .008 -.044   .008 .008 .002 .003   37.03
26   -.061 .009 .005 .034   .150 .197 -.081 -.041   -.056 -.044 -.045 .048   .017 .038 .060 .062   .69
27   .322 -.023 .117 .045   -.043 -.019 -.009 .040   .300 .235 .270 -.166   .006 .005 .006 .004   55.29
28   -.058 .011 -.037 -.017   .194 .109 .207 -.033   -.052 -.044 -.054 .035   .009 .016 .020 .034   1.60
29   .185 -.390 .183 -.162   -.034 -.012 -.053 .000   .073 -.009 .376 -.413   .003 .001 .009 .010   46.32
30   -.057 .008 -.014 .046   .078 -.151 .270 -.301   -.058 -.033 -.041 .045   .007 .027 .033 .033   1.38
31   -.050 .0 .010 .027   -.053 -.001 -.012 .038   -.050 -.046 -.014 .048   .006 .009 .026 .015   2.77
32   -.058 .015 .010 .048   -.036 -.021 -.001 .029   -.053 -.043 -.031 .065   .013 .025 .050 .023   1.09
33   -.054 .002 .003 .035   -.043 -.013 -.002 .051   -.053 -.042 -.028 .044   .007 .013 .028 .022   2.07
34   -.060 .019 .005 .046   -.042 -.015 -.004 .034   -.054 -.042 -.038 .066   .020 .041 .070 .036   .69
35   .121 -.375 .133 -.083   -.043 -.011 -.009 .052   -.012 -.016 .353 -.327   .001 .001 .012 .011   36.08
36   -.055 -.033 .029 .014   -.002 -.121 .250 -.252   -.060 -.050 -.031 .002   .003 .008 .023 .037   2.39
37   -.016 -.107 -.036 -.050   -.040 -.046 .014 .011   -.059 -.031 .050 -.082   .000 .003 .013 .016   10.44
38   -.027 -.069 -.142 -.225   -.044 -.028 .005 -.097   -.038 -.046 -.057 -.133   .003 .002 .003 .022   9.83
39   -.057 .006 -.049 -.039   .000 -.033 -.143 -.195   -.051 -.045 -.055 .023   .007 .011 .014 .030   2.19
40   -.004 .024 .120 -.047   -.030 -.014 -.065 -.017   .039 -.060 .010 .039   .011 .001 .010 .005   9.21
41   -.050 .014 .002 .046   .069 -.085 .021 .030   -.051 -.029 -.016 .070   .009 .023 .039 .011   1.79
42   -.011 -.155 -.112 -.206   -.041 -.005 -.004 .045   -.053 -.034 -.037 -.239   .001 .002 .003 .025   13.19
43   -.010 -.125 -.066 -.166   -.022 -.052 -.073 -.231   -.047 -.050 .041 -.133   .001 .001 .010 .017   12.73
44   -.049 -.021 -.087 -.098   -.010 -.056 -.048 -.178   -.050 -.044 -.061 -.034   .004 .006 .005 .028   4.38
45   -.052 .006 -.051 -.036   -.025 -.017 -.092 -.078   -.052 -.044 -.041 .014   .005 .008 .014 .024   3.09
46   -.052 .009 -.044 -.025   -.034 -.012 -.057 -.025   -.046 -.041 -.052 .025   .007 .010 .012 .023   2.77
47   -.032 -.074 -.146 -.185   -.030 -.009 -.056 -.008   -.048 -.041 -.069 -.142   .002 .003 .002 .027   8.54
48   -.025 -.101 -.002 -.024   -.022 -.018 -.126 -.178   -.051 -.038 -.015 -.116   .002 .004 .009 .026   7.87
49   .226 .290 .296 -.084   -.046 -.041 .026 -.067   .388 .139 -.004 .091   .012 .005 .002 .000   36.62
50   .061 .016 -.015 .028   .385 -.030 .049 .168   -.055 -.040 -.054 .047   .018 .044 .047 .063   .69
51   -.057 .008 -.044 -.029   -.045 -.008 -.025 .034   -.051 -.045 -.055 .028   .008 .013 .016 .032   1.94
52   -.053 .019 -.010 .042   -.031 -.013 -.060 -.009   -.047 -.033 -.050 .047   .011 .021 .020 .024   1.79
53   -.058 .032 -.015 .046   -.043 -.000 -.015 .043   -.052 -.027 -.053 .068   .023 .063 .049 .033   .69
54   .136 .162 -.179 -.263   -.044 .003 -.024 .052   .187 .165 -.087 -.037   .009 .008 .000 .004   27.90
55   -.041 .040 .077 .021   .189 .102 -.225 -.130   -.003 -.058 -.038 .069   .021 .004 .016 .007   2.99
56   -.042 .052 -.010 .011   .011 -.049 .010 .070   -.027 -.013 -.058 .067   .012 .018 .009 .007   3.17
57   -.061 .020 -.018 .016   -.010 -.000 -.040 .044   -.053 -.042 -.053 .053   .021 .041 .049 .054   .69
58   .175 .112 .069 .067   -.040 -.010 -.034 .030   .213 .152 .076 .014   .009 .007 .005 .002   30.31
59   .095 .131 .230 -.043   .140 .097 .075 .079   .172 .042 .030 .075   .010 .004 .005 .000   23.04
60   -.052 .040 .027 .019   .206 -.075 .062 .106   -.026 -.049 -.050 .068   .030 .015 .026 .016   1.38
61   .060 .064 .215 -.037   -.011 -.027 -.017 .056   .127 -.022 .044 .039   .010 .002 .007 .002   18.71
62   .106 .173 .064 -.204   .151 -.094 .076 .073   .216 .044 -.046 .038   .013 .005 .001 .002   21.60
63   -.059 .015 .018 .048   .238 -.028 -.102 -.007   -.054 -.044 -.028 .069   .020 .038 .085 .031   .69
64   -.043 .055 .035 -.040   -.040 -.019 -.000 .054   -.007 -.043 -.059 .066   .014 .006 .006 .005   4.18
65   -.046 .050 .028 -.024   .028 -.047 -.043 .034   -.016 -.043 -.057 .066   .015 .008 .008 .007   3.40
66   -.061 .020 -.018 .016   .014 -.040 -.039 .037   -.053 -.042 -.053 .053   .021 .041 .049 .054   .69
67   -.059 .015 .018 .048   -.040 -.010 -.034 .030   -.054 -.044 -.028 .069   .020 .038 .085 .031   .69
68   -.044 -.046 -.075 -.051   -.040 -.019 -.000 .054   -.058 -.031 -.065 -.075   .002 .008 .004 .033   4.90
69   -.059 .015 .018 .048   -.048 -.007 -.052 -.066   -.054 -.044 -.028 .069   .020 .038 .085 .031   .69
70   -.051 -.020 -.050 -.018   -.040 -.019 -.000 .054   -.057 -.035 -.060 -.025   .003 .011 .008 .036   3.17
71   .143 -.369 .274 .324   -.047 -.008 -.041 -.025   -.029 .042 .473 -.199   .001 .003 .015 .008   35.64
72   -.034 .053 -.034 .048   -.108 -.031 .500 -.000   -.028 .007 -.053 .069   .011 .022 .009 .006   3.46
73   .010 .118 .087 -.130   -.030 .031 -.032 .054   .082 -.008 -.067 .067   .013 .005 .001 .002   10.89
74   .010 -.151 -.035 .105   .140 -.072 -.068 .029   -.068 .009 .095 -.114   -.000 .005 .013 .014   13.68
75   .026 -.166 .008 -.020   -.112 .036 .105 -.049   -.030 -.021 .081 -.185   .002 .002 .009 .016   17.23
76   -.054 .037 .020 .024   -.041 -.032 .084 -.113   -.032 -.047 -.050 .068   .032 .021 .033 .020   1.09
77   .024 .049 .177 -.016   -.018 -.023 -.017 .055   .080 -.043 .038 .065   .011 .002 .009 .001   13.03
78   -.057 .014 -.002 .071   .103 -.086 .061 .072   -.059 -.031 -.032 .064   .008 .036 .050 .024   1.09
79   -.047 .035 .025 -.020   -.057 .003 .001 .058   -.021 -.049 -.052 .053   .018 .008 .014 .015   2.48
80   .073 -.024 .395 .065   .004 -.040 -.031 .030   .117 -.071 .213 .063   .009 .000 .015 .001   19.26
81   -.053 .037 .030 .017   .143 -.169 .253 .128   -.030 -.047 -.050 .067   .027 .017 .026 .017   1.38
82   -.058 .032 -.015 .046   -.011 -.030 -.015 .054   -.052 -.027 -.053 .068   .023 .063 .049 .033   .69
83   .119 .213 .019 -.175   -.044 .003 -.024 .052   .205 .127 -.070 .067   .011 .008 .000 .001   23.58
84   -.059 .015 .018 .048   .234 .022 -.124 .013   -.054 -.044 -.028 .069   .020 .038 .085 .031   .69
85   .099 .064 -.228 .558   -.040 -.019 -.000 .054   -.017 .306 .059 .076   .002 .018 .007 .000   20.00
86   -.034 .008 .053 .057   -.233 .412 .145 .224   -.033 -.031 .010 .065   .007 .008 .019 .005   4.85
87   -.061 .020 -.018 .016   -.021 -.024 .033 .060   -.053 -.042 -.053 .053   .021 .041 .049 .054   .69
88   -.052 .036 -.016 .085   -.040 -.010 -.034 .030   -.051 -.014 -.049 .070   .012 .041 .027 .015   1.38

Carroll, J. D. 1972. Individual differences and multidimensional scaling, p. 105-155. In
R. N. Shepard, A. K. Romney & S. Nerlove (Eds.) Multidimensional
scaling: Theory and applications in the behavioral sciences (Vol. 1). Seminar
Press, New York and London.
Carroll, J. D. 1980. Models and methods for multidimensional analysis of preferential
choice (or other dominance) data, p. 234-289. In E. D. Lantermann & H. Feger
(Eds.) Similarity and choice. Hans Huber Publishers, Bern, Stuttgart and
Vienna.
Carroll, J. D., & P. Arabie. 1980. Multidimensional Scaling. Annual Review of
Psychology 31: 607-49.
Carroll, J. D., & P. Arabie. 1982. How to use INDCLUS, A computer program for
fitting the individual differences generalization of the ADCLUS model.
Unpublished manuscript, Bell Laboratories, Murray Hill, NJ.
Carroll, J. D., & P. Arabie. 1983. INDCLUS: An individual differences generalization
of the ADCLUS model and the MAPCLUS algorithm. Psychometrika 48: 157-
169.
Carroll, J. D., & J. J. Chang. 1964a. A general index of nonlinear correlation and its
application to the problem of relating physical and psychological dimensions.
Unpublished manuscript, Bell Laboratories, Murray Hill, NJ.
Carroll, J. D., & J. J. Chang. 1964b. Nonparametric multidimensional analysis of
paired-comparisons data. Paper presented at the joint meeting of the Psychometric
and Psychonomic Societies, Niagara Falls, October 1964. Unpublished
manuscript, Bell Laboratories, Murray Hill, NJ.
Carroll, J. D., & J. J. Chang. 1967. Relating preference data to multidimensional
scaling solutions via a generalization of Coombs' unfolding model. Unpublished
manuscript, Bell Laboratories, Murray Hill, NJ.
Carroll, J. D., & J. J. Chang. 1969. A new method for dealing with individual
differences in multidimensional scaling. Unpublished manuscript, Bell
Laboratories, Murray Hill, NJ.
Carroll, J. D., & J. J. Chang. 1970. Analysis of individual differences in
multidimensional scaling via an N-way generalization of "Eckart-Young"
decomposition. Psychometrika 35: 283-319.
Carroll, J. D., & J. J. Chang. 1972. IDIOSCAL (Individual Differences In Orientation
SCALing): A generalization of INDSCAL allowing IDIOsyncratic reference
systems as well as an analytic approximation to INDSCAL. Unpublished
manuscript, Bell Laboratories, Murray Hill, NJ.
Carroll, J. D., & J. J. Chang. 1972. SIMULES: SIMUltaneous Linear Equation
Scaling. Proceedings of the 80th Annual Convention of the American
Psychological Association 7: 11-12.
Carroll, J. D., & S. Pruzansky. 1980. Discrete and hybrid scaling models, p. 108-139.
In E. D. Lantermann & H. Feger (Eds.) Similarity and Choice. Hans Huber
Publishers, Bern, Stuttgart, Vienna.

Carroll, J. D., & S. Pruzansky. 1984. The CANDECOMP-CANDELINC family of
models and methods for multidimensional data analysis, p. 372-402. In H. G.
Law, W. Snyder, J. Hattie, & R. P. McDonald (Eds.) Research methods for
multimode data analysis. Praeger, New York.
Carroll, J. D., & S. Pruzansky. 1986. Discrete and hybrid models for proximity data,
p.47-59. In W. Gaul and M. Schader (Eds.) Proceedings of the German
Classification Society Annual Meeting, (Classification as a tool of research),
Elsevier Science Publishers B.V. (North Holland), Amsterdam.
Carroll, J. D., S. Pruzansky & J. B. Kruskal. 1980. CANDELINC: A general approach
to multidimensional analysis of many-way arrays with linear constraints on
parameters. Psychometrika 45: 3-24.
Carroll, J. D., & M. Wish. 1974a. Models and methods for three-way multidimensional
scaling, p. 57-105. In D. H. Krantz, R. C. Atkinson, R. D. Luce, & P. Suppes
(Eds.) Contemporary developments in mathematical psychology (Vol. II). W. H.
Freeman, San Francisco.
Carroll, J. D., & M. Wish. 1974b. Multidimensional perceptual models and
measurement methods, p. 391-447. In E. C. Carterette & M. P. Friedman (Eds.),
Handbook of perception (Vol. II). Academic Press, New York.
Chang, J. J. 1968. How to use PARAMAP: A computer program which performs
parametric mapping. Unpublished manuscript, Bell Laboratories, Murray Hill,
NJ.
Chang, J. J. 1971. Multidimensional scaling program library. Unpublished manuscript,
Bell Laboratories, Murray Hill, NJ.
Chang, J. J., & J. D. Carroll. 1968a. How to use MDPREF: A computer program for
multidimensional analysis of preference data. Unpublished manuscript, Bell
Laboratories, Murray Hill, NJ.
Chang, J. J., & J. D. Carroll. 1968b. How to use PROFIT: A computer program for
property fitting by optimizing nonlinear or linear correlation. Unpublished
manuscript, Bell Laboratories, Murray Hill, NJ.
Chang, J. J., & J. D. Carroll. 1969. How to use INDSCAL: A computer program for
canonical decomposition of N-way tables and individual differences in
multidimensional scaling. Unpublished manuscript, Bell Laboratories, Murray
Hill, NJ.
Chang, J. J., & J. D. Carroll. 1972a. How to use IDIOSCAL: A computer program for
individual differences in orientation scaling. Unpublished manuscript, Bell
Laboratories, Murray Hill, NJ.
Chang, J. J., & J. D. Carroll. 1972b. How to use PREFMAP and PREFMAP
2: Programs which relate preference data to multidimensional scaling solution.
Unpublished manuscript, Bell Laboratories, Murray Hill, NJ.
Chang, J. J., & J. D. Carroll. 1972c. How to use SIMULES: A computer program
which does simultaneous linear equation scaling. Unpublished manuscript, Bell
Laboratories, Murray Hill, NJ.

Cliff, N. 1966. Orthogonal rotation to congruence. Psychometrika 31: 33-42.
Coombs, C. H. 1964. A Theory of Data. Wiley, New York.
Eckart, C., & G. Young. 1936. The approximation of one matrix by another of lower
rank. Psychometrika 1: 211-218.
Fresi, E., R. Colognola, M. C. Gambi, A. Giangrande & M. Scardi. 1983. Ricerche sui
popolamenti bentonici di substrato duro del porto di Ischia. Infralitorale
fotofilo: Policheti. Cahiers de Biologie Marine XXIV: 1-19.
Gower, J. C. 1966. Some distance properties of latent root and vector methods used in
multivariate analysis. Biometrika 53: 325-38.
Harshman, R. A. 1970. Foundations of the PARAFAC procedure: Models and
conditions for an "explanatory" multimodal factor analysis. UCLA Work Pap.
Phonetics 16, 84pp.
Harshman, R. A. 1972. Determination and proof of minimum uniqueness conditions for
PARAFAC II. UCLA: Work. Pap. Phonetics 22: 111-17.
Heiser, W. J. 1981. Unfolding analysis of proximity data. Doctoral Dissertation,
University of Leiden, Leiden, The Netherlands.
Horan, C. B. 1969. Multidimensional scaling: Combining observations when individuals
have different perceptual structures. Psychometrika 34: 139-65.
Johnson, S. C. 1967. Hierarchical clustering schemes. Psychometrika 32: 241-54.
Kruskal, J. B. 1964a. Multidimensional scaling by optimizing goodness of fit to a
nonmetric hypothesis. Psychometrika 29: 1-27.
Kruskal, J. B. 1964b. Nonmetric multidimensional scaling: A numerical method.
Psychometrika 29: 115-29.
Kruskal, J. B. 1965. Analysis of factorial experiments by estimating monotone
transformations of the data. Journal of the Royal Statistical Society Series B
27: 251-263.
Kruskal, J. B. 1977. Multidimensional scaling and other methods for discovering
structure, p. 296-339. In K. Enslein, Ralston, & Wilf (Eds.) Statistical methods
for digital computers (Vol. III of Mathematical Methods for Digital Computers).
Wiley, New York.
Kruskal, J. B., & F. Carmone. 1969. How to use M-D-SCAL (version 5M) and other
useful information. Unpublished manuscript, Bell Laboratories, Murray Hill, NJ.
Kruskal, J. B., & F. Carmone. 1968. Use and theory of MONANOVA: A program to
analyze factorial experiments by estimating monotone transformation of the data.
Unpublished manuscript, Bell Laboratories, Murray Hill, NJ.
Kruskal, J. B., & M. Wish. 1978. Multidimensional Scaling. Sage, Beverly Hills.
Kruskal, J. B., F. W. Young, & J. B. Seery. 1973. How to use KYST-2A: A very
flexible program to do multidimensional scaling and unfolding. Unpublished
manuscript, Bell Laboratories, Murray Hill, NJ.
Meulman, J., W. J. Heiser, & J. D. Carroll. 1986. PREFMAP-3 User's Guide.
Unpublished manuscript, AT&T Bell Laboratories, Murray Hill, NJ.

Meredith, W. 1964. Rotation to achieve factorial invariance. Psychometrika 29: 187-
206.
Pruzansky, S. 1975. How to use SINDSCAL: A computer program for individual
differences in multidimensional scaling. Unpublished manuscript, Bell
Laboratories, Murray Hill, NJ.
Ramsay, J. O. 1977. Maximum likelihood estimation in multidimensional scaling.
Psychometrika 42: 241-66.
Sands, R., & F. W. Young. 1980. Component models for three-way data: an alternating
least squares algorithm with optimal scaling features. Psychometrika 45: 39-67.
Schonemann, P. H. 1972. An algebraic solution for a class of subjective metrics models.
Psychometrika 37: 441-51.
Shepard, R. N. 1962. The analysis of proximities: Multidimensional scaling with an
unknown distance function. I. Psychometrika 27: 125-140.
Shepard, R. N. 1962. The analysis of proximities: Multidimensional scaling with an
unknown distance function. II. Psychometrika 27: 219-246.
Shepard, R. N. 1980. Multidimensional scaling, tree-fitting and clustering. Science
(210): 390-98.
Shepard, R. N., & J. D. Carroll. 1966. Parametric representation of nonlinear data
structures, p. 561-592. In P. R. Krishnaiah (Ed.) Multivariate Analysis.
Academic Press, New York.
Takane, Y., & J. D. Carroll. 1980. How to use MAXSCAL-4.1: A program to fit the
simple (unweighted) Euclidean distance model to various types of nonmetric
dissimilarity data by maximum likelihood. Unpublished manuscript, Bell
Laboratories, Murray Hill, NJ.
Takane, Y., & J. D. Carroll. 1981. Nonmetric maximum likelihood multidimensional
scaling from directional rankings of similarities. Psychometrika 46: 389-405.
Takane, Y., F. W. Young, & J. de Leeuw. 1977. Nonmetric individual differences
multidimensional scaling: An alternating least squares method with optimal
scaling features. Psychometrika 42: 7-67.
Torgerson, W. S. 1958. Theory and Methods of Scaling. Wiley, New York.
Tucker, L. R. 1960. Intra-individual and inter-individual multidimensionality. In
H. Gulliksen & S. Messick (Eds.) Psychological Scaling: Theory and Applications.
Wiley, New York.
Tucker, L. R. 1972. Relations between multidimensional scaling and three-mode factor
analysis. Psychometrika 37: 3-27.
Weinberg, S. L., & J. D. Carroll. 1986. Choosing the dimensionality of an INDSCAL-
derived space by using a method of resampling. Paper presented at the Annual
Meeting of the American Educational Research Association, San Francisco, April
16-20. Unpublished manuscript, AT&T Bell Laboratories, Murray Hill, NJ.
Wish, M., & J. D. Carroll. 1974. Applications of individual differences scaling to studies
of human perception and judgment, p.449-491. In E. C. Carterette & M. P.
Friedman (Eds.) Handbook of perception, Chapter 13 (Vol. II). Academic Press,
New York.
Young, F. W. 1968. TORSCA-9: A FORTRAN IV program for nonmetric
multidimensional scaling. Behavioral Science 13: 343-344.
Young, F. W., & W. S. Torgerson. 1967. TORSCA, a FORTRAN IV program for
Shepard-Kruskal multidimensional scaling analysis. Behavioral Science 12: 498.
THE DUALITY DIAGRAM:

A MEANS FOR BETTER PRACTICAL APPLICATIONS

Y. Escoufier
Unite de Biometrie
ENSA-INRA-USTL
9, place Pierre Viala
F-34060 Montpellier Cedex, France

Abstract - Producing the Principal Components Analysis of a data
table requires choices which need to be explained in order to
acquire complete understanding of the results. This explicitness
opens the road to other possible choices, leading to theoretical
research and many practical applications. Changes of scale, changes
of variables, weighting of statistical units, decentering of the
representations, and elimination of dependence between individuals
are dealt with. After reviewing the usual methods from this
perspective, it can be seen that it is possible to transform them in
order to better adapt mathematical abstractions to concrete
situations.

I - A REVIEW OF PRINCIPAL COMPONENTS ANALYSIS (PCA)


Let us put ourselves in the place of a scientist who
runs a data table through a PCA Variance Matrix program. The
program gives a representation of the variables and of the objects
as well as the eigenvalues, which will allow him to estimate the
overall validity of the graphs obtained. In addition, especially if
the program is French, it will give quantities called absolute and
relative contributions which will evaluate the role played by each
of the variables and each of the objects, also called units or
statistical units in this paper.

Let us try to specify what the program has done in
order to go from the data table to the results. For this purpose,
let X be an $n \times p$ matrix. The i-th row of X is denoted by $X_i$ and
contains the p measures made on the i-th statistical unit. The
j-th column, denoted by $x^j$, contains the n values taken by the
j-th variable.

NATO ASI Series, Vol. G14


Developments in Numerical Ecology
Edited by P. and L. Legendre
© Springer-Verlag Berlin Heidelberg 1987

Representation of variables and statistical units. Looking closely
at the program, we will see that the arithmetic mean values of
each of the variables are calculated first:

$$\forall j = 1,\dots,p \qquad \bar{x}^j = \frac{1}{n} \sum_{i=1}^{n} X_i^j$$

Then X is replaced by the centered array $\hat{X}$ for which $\hat{X}_i^j = X_i^j - \bar{x}^j$.

Let $I_{n\times n}$ be the $R^n$ identity matrix, and $1_n$ the column
vector of $R^n$ in which all the components are 1.

We can write

$$\hat{X} = \left(I_{n\times n} - \frac{1_n 1_n'}{n}\right) X \qquad (1.1)$$

Secondly, the variance matrix associated with the array
$\hat{X}$ is calculated, i.e.:

$$V = \frac{\hat{X}'\hat{X}}{n} \qquad (1.2)$$

The third step consists in calculating the eigenvectors
and eigenvalues of V, that is the p vectors $\phi_a$ of $R^p$ and the
p numbers $\lambda_a$ satisfying

$$\forall a = 1,\dots,p \qquad V \phi_a = \lambda_a \phi_a \quad \text{with} \quad \phi_a' \phi_a = 1 \qquad (1.3)$$
$$\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p$$

Standard mathematical expansions show that

$$\forall a = 1,\dots,p \qquad \lambda_a \ge 0$$
$$\forall a, a' = 1,\dots,p \qquad \lambda_a \ne \lambda_{a'} \Rightarrow \phi_a' \phi_{a'} = 0$$

If $\lambda_a = \lambda_{a'}$, $\phi_a$ and $\phi_{a'}$ can be chosen in such a way that $\phi_a' \phi_{a'} = 0$.

Then: $$V = \sum_{a=1}^{p} \lambda_a \phi_a \phi_a' \qquad (1.4)$$

Finally consider, for $q < p$, $V_q = \sum_{a=1}^{q} \lambda_a \phi_a \phi_a'$. It can be
shown (see note after expression 1.14) that for every pxp matrix $A_q$
of rank $q < p$

$$Tr((V - A_q)^2) \ge Tr((V - V_q)^2) = \sum_{a=q+1}^{p} \lambda_a^2 \qquad (1.5)$$

where Tr represents the trace of the matrices, i.e. the sum of their
diagonal elements.

This property justifies the interpretation that is made
from the variables display. Suppose that to every variable k we
associate the point whose coordinates are $(\sqrt{\lambda_1}\,\phi_{1k},\ \sqrt{\lambda_2}\,\phi_{2k})$.

The scalar products that can be read from that representation
are the elements of the matrix $V_2$, the best possible
approximation of V by a matrix of rank 2.

If the sum $\sum_{a=q+1}^{p} \lambda_a^2$ (here with q = 2) is sufficiently small, the
covariances and variances of the p variables can be visually
appreciated.

The fourth step is to calculate the coordinates of the
units by the formula:

$$\forall a = 1,\dots,p \qquad \psi_a^* = \hat{X} \phi_a \qquad (1.6)$$

It can easily be checked that

$$\forall a = 1,\dots,p \qquad \frac{\hat{X}\hat{X}'}{n}\,\psi_a^* = \lambda_a \psi_a^* \quad \text{with} \quad \frac{\psi_a^{*\prime} \psi_a^*}{n} = \lambda_a \qquad (1.7)$$

This leads us to investigate the matrix $\frac{\hat{X}\hat{X}'}{n} = \frac{W}{n}$ that
plays the same role for the units as V does for the variables.

We then set $\psi_a = \psi_a^* / \sqrt{\lambda_a}$, and we have

$$\frac{W}{n} = \sum_{a=1}^{p} \lambda_a \frac{\psi_a \psi_a'}{n} \qquad (1.8)$$

Let $$\frac{W_q}{n} = \sum_{a=1}^{q} \lambda_a \frac{\psi_a \psi_a'}{n} \qquad (1.9)$$

It has been shown (note after expression 1.14) that for every nxn
matrix $A_q$ of rank $q < p$

$$Tr\!\left(\left(\frac{W}{n} - A_q\right)^2\right) \ge Tr\!\left(\left(\frac{W}{n} - \frac{W_q}{n}\right)^2\right) = \sum_{a=q+1}^{p} \lambda_a^2 \qquad (1.10)$$

This property justifies the interpretation that will be
made from the representation of the objects. $W_2$ is the best rank 2
approximation of W, which means that the graphical representation
obtained by giving the coordinates $(\psi_{1i}^* = \sqrt{\lambda_1}\,\psi_{1i};\ \psi_{2i}^* = \sqrt{\lambda_2}\,\psi_{2i})$
to the i-th statistical unit allows to visualise the scalar
products among the units, and thus their distances.
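The steps above can be checked numerically. The sketch below is a minimal illustration under assumed conditions (a random data table; all variable names are hypothetical), using numpy's symmetric eigensolver in place of whatever routine an actual PCA program would call; it verifies the duality expressed in 1.7, namely that the unit coordinates are eigenvectors of W/n with the same eigenvalues as V:

```python
import numpy as np

# Hypothetical data table: n statistical units, p variables.
rng = np.random.default_rng(1)
n, p = 30, 4
X = rng.normal(size=(n, p))

# (1.1) centering
Xc = X - X.mean(axis=0)

# (1.2) variance matrix V = Xc'Xc / n
V = Xc.T @ Xc / n

# (1.3) eigenvalues lambda_a (sorted decreasing) and eigenvectors phi_a of V
lam, phi = np.linalg.eigh(V)
order = np.argsort(lam)[::-1]
lam, phi = lam[order], phi[:, order]

# (1.6) coordinates of the statistical units: psi*_a = Xc phi_a
psi_star = Xc @ phi

# (1.7) duality check: (W/n) psi*_a = lambda_a psi*_a with W = Xc Xc',
# and psi*_a' psi*_a / n = lambda_a
W = Xc @ Xc.T
print(np.allclose((W / n) @ psi_star, psi_star * lam))           # True
print(np.allclose(psi_star.T @ psi_star / n, np.diag(lam)))      # True
```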

Absolute and relative contributions. We have seen in passing that
$\psi_a^{*\prime} \psi_a^* / n = \lambda_a$. The idea naturally arose to consider the
quantities (writing $\phi_{ak}^* = \sqrt{\lambda_a}\,\phi_{ak}$ for the coordinate of variable k on axis a)

$$\frac{(\phi_{ak}^*)^2}{\lambda_a} \quad \text{as the participation of the variable k in the definition of } \lambda_a \qquad (1.11)$$

$$\frac{(\psi_{ai}^*)^2}{n \lambda_a} \quad \text{as the participation of the statistical unit i in the definition of } \lambda_a \qquad (1.11')$$

These quantities are given the name of absolute
contributions. They allow to estimate the part played by a variable
or a statistical unit in the construction of the representations.

From $V = \sum_{a=1}^{p} \lambda_a \phi_a \phi_a'$ and $\frac{W}{n} = \sum_{a=1}^{p} \lambda_a \frac{\psi_a \psi_a'}{n}$, we can also
conclude that

$\frac{(\phi_{ak}^*)^2}{V_{kk}}$ is the participation of the a-th axis in the reconstruction of the
variable k, in actual fact the reconstruction of the variance $V_{kk}$, (1.12)

while $\frac{(\psi_{ai}^*)^2}{W_{ii}}$ is the participation of the a-th axis in the reconstruction of
$\frac{W_{ii}}{n} = \frac{\hat{X}_i \hat{X}_i'}{n}$, that is, the inertia of the i-th statistical unit
with respect to the mean point. (1.12')
These quantities are given the name of relative
contributions. They are criteria for the quality of the representation
specifically concerning each variable and each statistical
unit.
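Under the reading above, with $\phi_{ak}^* = \sqrt{\lambda_a}\,\phi_{ak}$ as the coordinate of variable k on axis a, the contributions 1.11 to 1.12' can be sketched on a toy example (assumed random data, hypothetical names); each absolute-contribution column and each relative-contribution row should sum to 1:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 4
Xc = rng.normal(size=(n, p))
Xc -= Xc.mean(axis=0)                        # centered data table (hypothetical)
V = Xc.T @ Xc / n
lam, phi = np.linalg.eigh(V)
order = np.argsort(lam)[::-1]
lam, phi = lam[order], phi[:, order]
psi_star = Xc @ phi                          # unit coordinates psi*_ai
phi_star = phi * np.sqrt(lam)                # variable coordinates sqrt(lambda_a) phi_ak

# (1.11)  absolute contribution of variable k to axis a
abs_var = phi_star**2 / lam                  # each column sums to 1
# (1.11') absolute contribution of unit i to axis a
abs_unit = psi_star**2 / (n * lam)           # each column sums to 1

# (1.12)  relative contribution of axis a to variable k (reconstruction of V_kk)
rel_var = phi_star**2 / np.diag(V)[:, None]  # each row sums to 1
# (1.12') relative contribution of axis a to unit i (reconstruction of W_ii)
rel_unit = psi_star**2 / np.sum(Xc**2, axis=1)[:, None]  # each row sums to 1
```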
Reconstitution of data. Noting that $I_{p\times p} = \sum_{a=1}^{p} \phi_a \phi_a'$,
then

$$\hat{X} = \hat{X} I_{p\times p} = \sum_{a=1}^{p} \hat{X} \phi_a \phi_a' = \sum_{a=1}^{p} \sqrt{\lambda_a}\, \psi_a \phi_a' \qquad (1.13)$$

Then let $\hat{X}_q = \sum_{a=1}^{q} \sqrt{\lambda_a}\, \psi_a \phi_a'$. It has been shown
(following note) that for each nxp matrix $A_q$ of rank $q < p$

$$Tr\!\left(\frac{(\hat{X}-A_q)(\hat{X}-A_q)'}{n}\right) \ge Tr\!\left(\frac{(\hat{X}-\hat{X}_q)(\hat{X}-\hat{X}_q)'}{n}\right) = \sum_{a=q+1}^{p} \lambda_a \qquad (1.14)$$

Note: Expressions I.5, I.10 and I.14 come from the well-known
result by Eckart and Young (1936). It is of importance to remark
that $V_q$, $W_q$ and $\hat{X}_q$ are not only optimal for the least squares
criterion given here by Tr(.) but also for an infinity of other
criteria (Rao 1980, Sabatier et al. 1984).
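The Eckart and Young property in I.13–I.14 can be checked numerically. The following is a minimal sketch, assuming the usual choices $D = \frac{1}{n} I$ and $Q = I$ (the data matrix and its dimensions are arbitrary illustrations, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))                     # an arbitrary data matrix
n = X.shape[0]
Xc = X - X.mean(axis=0)                          # centering, D = (1/n) I

# SVD of Xc/sqrt(n): squared singular values are the eigenvalues of V = Xc'Xc/n
U, s, Vt = np.linalg.svd(Xc / np.sqrt(n), full_matrices=False)
lam = s ** 2

q = 2
Xq = np.sqrt(n) * (U[:, :q] * s[:q]) @ Vt[:q]    # rank-q reconstitution, as in I.13

# I.14: the residual inertia equals the sum of the discarded eigenvalues
residual = np.trace((Xc - Xq) @ (Xc - Xq).T) / n
```

Any other rank-2 matrix substituted for `Xq` gives a residual at least as large, which is the content of the Eckart–Young result.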

II - CHANGES IN THE INITIAL CHOICES

II.1. Weighting of the statistical units. The preceding section
weighted the units by $\frac{1}{n}$: first in the calculation of the mean
values and second, for the calculation of the variance matrix.

If we denote $D = \frac{1}{n} I_{n \times n}$ as the diagonal matrix of
elements $\frac{1}{n}$, then

$$\hat{X} = (I_{n \times n} - \mathbf{1}_n \mathbf{1}'_n D)\, X \qquad (II.1)$$

$$V = \hat{X}' D \hat{X} \qquad (II.2)$$

Re-reading section I with formulas II.1 and II.2 then
shows that the fact that all the diagonal elements of D are equal
to $\frac{1}{n}$ is never explicitly used in the proofs.

Re-working the formulas in which n appears explicitly, we get:

$$\forall \alpha = 1, \dots, p \qquad \hat{X} \hat{X}' D\, \psi^*_\alpha = \lambda_\alpha \psi^*_\alpha \quad \text{with} \quad \psi^{*\prime}_\alpha D\, \psi^*_\alpha = \lambda_\alpha \qquad (II.7)$$

$$W D = \sum_{\alpha=1}^{p} \lambda_\alpha\, \psi_\alpha \psi'_\alpha D \qquad (II.8)$$

$$W_q D = \sum_{\alpha=1}^{q} \lambda_\alpha\, \psi_\alpha \psi'_\alpha D \qquad (II.9)$$

$$\mathrm{Tr}\big((W - W_q) D (W - W_q) D\big) = \sum_{\alpha=q+1}^{p} \lambda_\alpha^2 \qquad (II.10)$$

$$\frac{(\psi^*_{\alpha i})^2\, D_{ii}}{\lambda_\alpha} \quad \text{as the participation of the statistical unit } i \qquad (II.11')$$

$$\mathrm{Tr}\big((\hat{X} - \hat{X}_q)(\hat{X} - \hat{X}_q)' D\big) = \sum_{\alpha=q+1}^{p} \lambda_\alpha \qquad (II.14)$$

It is possible to consider situations in which D is more
general. Actually the mathematical results are stronger than those
used by the usual computer programs. This leads to new applications
which will be discussed in section IV.
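Formulas II.1–II.2 with a general diagonal D can be sketched directly. A minimal illustration (the data and the unequal weights are arbitrary; the only requirement is that the weights are positive and sum to 1):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 6, 3
X = rng.normal(size=(n, p))
w = rng.uniform(0.5, 1.5, size=n)
w = w / w.sum()                                  # diagonal of D, summing to 1
D = np.diag(w)
one = np.ones(n)

Xc = (np.eye(n) - np.outer(one, one) @ D) @ X    # II.1: (I - 1 1'D) X
V = Xc.T @ D @ Xc                                # II.2: V = X'D X, weighted variance
```

Each row of `Xc` is the deviation from the D-weighted mean, so $\mathbf{1}'_n D \hat{X} = 0$ whatever the (possibly unequal) weights, exactly as in the equal-weight case.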

II.2. Invertible linear transformations of variables. Consider now
the case of an invertible linear transformation M, applied to the
data matrix X, and write the elements of the PCA of XM.

Centering: $\widehat{XM} = (I_{n \times n} - \mathbf{1}_n \mathbf{1}'_n D)\, XM = \hat{X} M \qquad (III.1)$

Variance matrix: $V_{[M]} = M' \hat{X}' D \hat{X} M = M' V M \qquad (III.2)$

Eigenvectors and eigenvalues of $V_{[M]}$:

$$\forall \alpha = 1, \dots, p \qquad V_{[M]}\, u_\alpha = \lambda_\alpha u_\alpha \quad \text{with} \quad u'_\alpha u_{\alpha'} = \delta_{\alpha\alpha'},$$

i.e. $M' V M\, u_\alpha = \lambda_\alpha u_\alpha$ with $u'_\alpha u_{\alpha'} = \delta_{\alpha\alpha'}$.

For each $\alpha = 1, \dots, p$, consider $\varphi_\alpha$ defined by $u_\alpha = M' \varphi_\alpha$, and set
$M M' = Q$.

We have $M' V Q\, \varphi_\alpha = \lambda_\alpha M' \varphi_\alpha$ with $\varphi'_\alpha M M' \varphi_{\alpha'} = \delta_{\alpha\alpha'}$,

i.e. $V Q\, \varphi_\alpha = \lambda_\alpha \varphi_\alpha$ with $\varphi'_\alpha Q\, \varphi_{\alpha'} = \delta_{\alpha\alpha'} \qquad (III.3)$

From $V_{[M]} = \sum_{\alpha=1}^{p} \lambda_\alpha u_\alpha u'_\alpha$, we have $M' V M = \sum_{\alpha=1}^{p} \lambda_\alpha M' \varphi_\alpha \varphi'_\alpha M$, hence

$$V Q = \sum_{\alpha=1}^{p} \lambda_\alpha\, \varphi_\alpha \varphi'_\alpha Q \qquad (III.4)$$

Moreover we have

$$\sum_{\alpha=q+1}^{p} \lambda_\alpha = \mathrm{Tr}\Big(M' V M - \sum_{\alpha=1}^{q} \lambda_\alpha M' \varphi_\alpha \varphi'_\alpha M\Big) = \mathrm{Tr}\Big(V Q - \sum_{\alpha=1}^{q} \lambda_\alpha \varphi_\alpha \varphi'_\alpha Q\Big) \qquad (III.5)$$

The coordinates of the statistical units.

Let $\psi^*_\alpha = \hat{X} M u_\alpha = \hat{X} M M' \varphi_\alpha = \hat{X} Q\, \varphi_\alpha \qquad (III.6)$

It can be verified that

$$\forall \alpha = 1, \dots, p \qquad \hat{X} M M' \hat{X}' D\, \psi^*_\alpha = \hat{X} Q \hat{X}' D\, \psi^*_\alpha = \lambda_\alpha \psi^*_\alpha$$

with $\psi^{*\prime}_\alpha D\, \psi^*_\alpha = \varphi'_\alpha Q \hat{X}' D \hat{X} Q\, \varphi_\alpha = \lambda_\alpha \varphi'_\alpha Q\, \varphi_\alpha = \lambda_\alpha \qquad (III.7)$

and the properties II.8, II.9, II.10, which do not explicitly

involve Q, remain valid. The matrix $\hat{X} Q \hat{X}'$, which could be noted
$W_{[M]}$, is the matrix of the scalar products between statistical
units when the space $R^p$ is given the positive bilinear form
defined by $Q = M M'$.

Thus, studying linear data transformations is similar
to choosing a means of calculating distances between statistical
units. Most of the current programs choose $Q = I_{p \times p}$, avoiding the
problem of choice by previous processing (the variables are
standardized in order to use the correlation matrix).
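The equivalence between transforming the data by an invertible M and choosing the metric $Q = MM'$ can be illustrated numerically. A sketch with arbitrary data (not from the paper): the variance matrix of XM under the identity metric and the scalar-product matrix $\hat{X} Q \hat{X}' D$ share their non-zero eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 12, 4
X = rng.normal(size=(n, p))
Xc = X - X.mean(axis=0)                  # centered data, D = (1/n) I
M = rng.normal(size=(p, p))              # an (almost surely) invertible transformation
Q = M @ M.T                              # the induced metric on R^p

# eigenvalues of the variance matrix of XM, identity metric ...
ev_vm = np.linalg.eigvalsh((Xc @ M).T @ (Xc @ M) / n)
# ... match the non-zero eigenvalues of W D = Xc Q Xc' (1/n)
ev_wd = np.linalg.eigvalsh(Xc @ Q @ Xc.T / n)
```

The remaining $n - p$ eigenvalues of the $n \times n$ matrix are zero, so nothing is lost by working on either side of the duality.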
146

We can see that making this choice explicit in the mathematical
equations does not present great difficulties, but it appears
important not to hide it.

Absolute and relative contributions. For absolute contributions,
the starting point is $\varphi^{*\prime}_\alpha Q\, \varphi^*_\alpha = \lambda_\alpha$, so that absolute contributions
are only obtained when Q is diagonal, like D. Thus we have

$$\frac{(\varphi^*_{\alpha k})^2\, Q_{kk}}{\lambda_\alpha}$$

Formula III.4 implies $V = \sum_{\alpha=1}^{p} \lambda_\alpha\, \varphi_\alpha \varphi'_\alpha$, so that relative
contributions can always be calculated.
Reconstitution of data.

We have $\hat{X} M = \sum_{\alpha=1}^{p} \sqrt{\lambda_\alpha}\, \psi_\alpha u'_\alpha = \sum_{\alpha=1}^{p} \sqrt{\lambda_\alpha}\, \psi_\alpha \varphi'_\alpha M \qquad (III.13)$

Supposing that $(\hat{X} M)_q = \sum_{\alpha=1}^{q} \sqrt{\lambda_\alpha}\, \psi_\alpha \varphi'_\alpha M$, we obtain

$$\sum_{\alpha=q+1}^{p} \lambda_\alpha = \mathrm{Tr}\big((\hat{X} M - (\hat{X} M)_q)(\hat{X} M - (\hat{X} M)_q)'\, D\big)$$

$$= \mathrm{Tr}\Big(\big(\hat{X} - \sum_{\alpha=1}^{q} \sqrt{\lambda_\alpha}\, \psi_\alpha \varphi'_\alpha\big)\, Q\, \big(\hat{X} - \sum_{\alpha=1}^{q} \sqrt{\lambda_\alpha}\, \psi_\alpha \varphi'_\alpha\big)'\, D\Big) \qquad (III.14)$$

III - THE DUALITY DIAGRAM


The above section presents the idea of a PCA which is a
function of the triplet (X, Q, D) instead of the usual presentation
of the PCA of the X array based on some implicit choices:
$Q = I_{p \times p}$, $D = \frac{1}{n} I_{n \times n}$. Cailliez and Pages (1976) popularized this
point of view in France by giving it a rigorous mathematical
formalization that we are going to review. We will keep in mind
that our first objective is to bring out the choices to be made in
order to carry out a study: the data (X), the weighting of
statistical units necessary for the calculation of relationships
between the variables (D), and the way of quantifying the
resemblances between the statistical units (Q).

Our second objective is to define the mathematical


nature of the objects dealt with in order to make the best use of
their properties.
The first step consists of considering the i-th unit
as a vector of $E = R^p$. It will be written as

$$\sum_{j=1}^{p} x_i^j\, \mathbf{e}_j,$$

where $(\mathbf{e}_1, \dots, \mathbf{e}_p)$ is a system of p linearly independent vectors
of E, i.e. a basis of E.

Symmetrically the j-th variable is considered as a
vector of $F = R^n$. It will be written as

$$\sum_{i=1}^{n} x_i^j\, \mathbf{f}_i,$$

where $(\mathbf{f}_1, \dots, \mathbf{f}_n)$ is the basis of F.
i=l

The second step consists in associating a linear
mapping $\mathbf{e}^*_j$ with the j-th variable, which makes the i-th statistical
unit correspond to the value $x_i^j$ that it has taken for that
variable:

$$\mathbf{e}^*_j \Big( \sum_{k=1}^{p} x_i^k\, \mathbf{e}_k \Big) = \sum_{k=1}^{p} x_i^k\, \mathbf{e}^*_j(\mathbf{e}_k) = x_i^j$$

Thus variables also have a representation in $E^*$, the
dual space of E. In fact $(\mathbf{e}^*_1, \dots, \mathbf{e}^*_p)$ is the basis of $E^*$, the dual
basis of $(\mathbf{e}_1, \dots, \mathbf{e}_p)$ which is the basis of E. In a similar way
$(\mathbf{f}^*_1, \dots, \mathbf{f}^*_n)$, the basis of $F^*$, the dual of $(\mathbf{f}_1, \dots, \mathbf{f}_n)$, can
be defined. $\mathbf{f}^*_i$ is the representation of the i-th statistical unit.

This construction gives two representations for each
unit: one in E, the second one in $F^*$. Consider then the linear
mapping defined by

$$\forall i = 1, \dots, n \qquad \mathbf{f}^*_i \longmapsto \sum_{j=1}^{p} x_i^j\, \mathbf{e}_j$$

Its associated matrix is X'.

In the same way, X is associated with the linear mapping

$$\mathbf{e}^*_j \longmapsto \sum_{i=1}^{n} x_i^j\, \mathbf{f}_i$$

The calculation of distances between objects, considered
as points of E, entails the choice of a positive definite bilinear
form Q which is considered to be a mapping from E into $E^*$. Similarly,
the calculation of covariances between the variables in F
depends on a quadratic form D that maps F into $F^*$.

This can be summarized by the following diagram which
illustrates the choices to be made for a study.

                      X'
        E = R^p <------------- F*
          |                    ^
        Q |                    | D
          v                    |
        E* -------------> F = R^n
                      X

The calculation of scalar products between two variables
$X^k$ and $X^\ell$ in $E^*$ must give the same result as the calculation made
between the same two variables $X^k$ and $X^\ell$ in F for the positive
definite bilinear form D. This leads to the fact that $E^*$ must be
provided with the metric $V = \hat{X}' D \hat{X}$. For symmetrical reasons, $F^*$
has the metric $W = \hat{X} Q \hat{X}'$. The diagram can then be completed as
follows:

                      X'
        E = R^p <------------- F*
         |  ^                |  ^
       Q |  | V            W |  | D
         v  |                v  |
        E* -------------> F = R^n
                      X

Expressions III.4 and III.7 show that the solutions of
the PCA are given by the eigenvalues and eigenvectors of VQ and
WD, which appear on the diagram.
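The whole construction can be condensed into a small routine computing the PCA of a triplet (X, Q, D), with D diagonal and Q positive definite. This is a sketch under our own naming conventions (the function and variable names are not from the paper); it uses a Cholesky factor $Q = KK'$ so that the non-symmetric eigenproblem $VQ\varphi = \lambda\varphi$ is solved through the symmetric matrix $K'VK$:

```python
import numpy as np

def pca_triplet(X, Q, D):
    """PCA of the triplet (X, Q, D).

    D : diagonal matrix of unit weights (summing to 1);
    Q : positive definite metric on R^p.
    Returns the eigenvalues of VQ and the coordinates psi* of the units,
    which satisfy psi*' D psi* = lambda and 1'D psi* = 0.
    """
    n = X.shape[0]
    one = np.ones(n)
    Xc = (np.eye(n) - np.outer(one, one) @ D) @ X   # (I - 1 1'D) X
    V = Xc.T @ D @ Xc                               # variance matrix
    K = np.linalg.cholesky(Q)                       # Q = K K'
    lam, W = np.linalg.eigh(K.T @ V @ K)            # same spectrum as V Q
    order = np.argsort(lam)[::-1]
    lam, W = lam[order], W[:, order]
    psi_star = Xc @ K @ W                           # coordinates of the units
    return lam, psi_star
```

With $Q = I_{p \times p}$ and $D = \frac{1}{n} I_{n \times n}$ this reduces to the usual PCA of the covariance matrix, which makes the implicit choices of standard programs explicit.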

IV - ON APPLICATIONS CONCERNING D

IV.1. Special centering

Since the duality diagram just described coincides
exactly with sections I and II using the matrix $\hat{X}$, the weights D
can be included as follows:

                 X'(I_{n×n} - D 1_n 1'_n)
        E = R^p <------------------------- F*
         |  ^                            |  ^
       Q |  | V                        W |  | D
         v  |                            v  |
        E* -------------------------> F = R^n
                 (I_{n×n} - 1_n 1'_n D) X

IV.1.1. It is possible that one of the units has a very unusual
behaviour. The representation of the units will tend to show on
the first axis that this individual is in opposition to the others.
While this unit can be eliminated and the PCA repeated, the duality
diagram allows for another possibility. Let $\Delta$ be a diagonal matrix
whose diagonal elements are all zero except for that corresponding
to the unusual object, which is set to 1. In the following diagram,

                 X'(I_{n×n} - Δ 1_n 1'_n)
        E = R^p <------------------------- F*
         |  ^
       Q |  | V
         v  |
        E* -------------------------> F = R^n
                 (I_{n×n} - 1_n 1'_n Δ) X

the principal components will be the eigenvectors of

$$W D = (I_{n \times n} - \mathbf{1}_n \mathbf{1}'_n \Delta)\, X Q X' (I_{n \times n} - \Delta \mathbf{1}_n \mathbf{1}'_n)\, D.$$

They are clearly a) centered for $\Delta$ (because $\mathbf{1}'_n \Delta\, W D = 0$),
and b) orthogonal for D.

In practice, this means that the unusual object is
located at the origin, and the representation of the other points
is studied in relation to it. The matrix $(I_{n \times n} - \mathbf{1}_n \mathbf{1}'_n \Delta)\, X$
expresses the deviations from that statistical object. Note that
the weighting assigned to that object in D is unimportant.
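A sketch of this special centering (the data are an arbitrary illustration; $\Delta$ selects unit 0 as the "unusual" object):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 8, 3
X = rng.normal(size=(n, p))
one = np.ones(n)

Delta = np.zeros((n, n))
Delta[0, 0] = 1.0                    # all diagonal elements zero except the unusual object

Xdev = (np.eye(n) - np.outer(one, one) @ Delta) @ X   # (I - 1 1' Delta) X
# each row of Xdev is now the deviation from the unusual object,
# which itself is placed at the origin
```

The other units are then represented in relation to that object, as described above.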

IV.1.2. This procedure can be further modified by a matrix $\Delta$ having
more than one diagonal element different from zero. Thereby,
representations of the objects, centered for $\Delta$ and orthogonal for
D, are obtained. This means giving more importance to the
representations of some objects. Here, the relative weighting of
these objects in D cannot be ignored.

IV.2. Analysis of partial covariances (Lebart et al. 1979, p. 300)

Equation $\mathbf{1}_n \mathbf{1}'_n D = \mathbf{1}_n (\mathbf{1}'_n D \mathbf{1}_n)^{-1} \mathbf{1}'_n D$ is the basis for
the interpretation of centering in terms of projecting on the line
of constants (Cailliez and Pages 1976, p. 146).

Let us consider an $n \times q$ matrix of data $X_2$, dealing with
the same objects as X. We define $X_3$ as the matrix obtained by the
juxtaposition of $\mathbf{1}_n$ and $X_2$:

$$X_3 = (\mathbf{1}_n \mid X_2)$$

Let $P_3 = X_3 (X'_3 D X_3)^{-1} X'_3 D$. Based on the orthogonality of $\mathbf{1}_n$ and
the columns of $\hat{X}_2 = (I_{n \times n} - \mathbf{1}_n \mathbf{1}'_n D)\, X_2$,

$$(I_{n \times n} - P_3) = (I_{n \times n} - \hat{X}_2 (\hat{X}'_2 D \hat{X}_2)^{-1} \hat{X}'_2 D)(I_{n \times n} - \mathbf{1}_n \mathbf{1}'_n D)$$

In the next duality diagram,

                 X'(I_{n×n} - P'_3)
        E = R^p <--------------------- F*
         |  ^                        |  ^
       Q |  | V                    W |  | D
         v  |                        v  |
        E* ---------------------> F = R^n
                 (I_{n×n} - P_3) X

i) We do the PCA of the residuals of X in the regression
on $X_2$, i.e., V is the residual variance matrix.

ii) $W D = (I_{n \times n} - P_3)\, X Q X' (I_{n \times n} - P'_3)\, D$

Because $P_3$ is idempotent, $P_3\, W D = 0$ and the
principal components satisfy $P_3\, \psi_\alpha = 0$, hence $X'_3 D\, \psi_\alpha = 0$,

i.e. $\mathbf{1}'_n D\, \psi_\alpha = 0$: the principal components are centered for D, and

$\hat{X}'_2 D\, \psi_\alpha = 0$: the principal components are orthogonal to the
sub-space of F generated by the columns of $\hat{X}_2$.


iii) Finally, the principal components are orthogonal for D.
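The factorization of $I - P_3$ used in IV.2 is easy to verify numerically. A sketch with $D = \frac{1}{n}I$ and an arbitrary $X_2$:

```python
import numpy as np

rng = np.random.default_rng(5)
n, q = 10, 2
X2 = rng.normal(size=(n, q))
D = np.eye(n) / n
one = np.ones((n, 1))

X3 = np.hstack([one, X2])                                  # X3 = (1 | X2)
P3 = X3 @ np.linalg.inv(X3.T @ D @ X3) @ X3.T @ D          # D-orthogonal projector on [1, X2]

C1 = np.eye(n) - one @ one.T @ D                           # centering operator (I - 1 1'D)
X2c = C1 @ X2                                              # centered X2
P2 = X2c @ np.linalg.inv(X2c.T @ D @ X2c) @ X2c.T @ D      # projector on the centered X2

# (I - P3) = (I - P2)(I - 1 1'D): first project out the constants, then X2-hat
lhs = np.eye(n) - P3
rhs = (np.eye(n) - P2) @ C1
```

Applying `lhs` (or equivalently `rhs`) to a data matrix X yields the residuals of X in the regression on $X_2$, whose PCA is the analysis of partial covariances.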

IV.3. Correlated objects


One of the consequences made apparent by the duality
diagram is that any change in the weights D and in $\hat{X}$ modifies V.
Thus to modify V, the changes in D and $\hat{X}$ which will produce the
appropriate V are needed.

This problem arises, for example, if the observations
$\hat{X}_i$ are linked to the observations $\hat{X}_{i-1}$ by

$$\forall i = 2, \dots, n \qquad \hat{X}_i = \rho\, \hat{X}_{i-1} + e_i \quad \text{with} \quad |\rho| < 1$$

It is clear that here, V mixes the correlations of
objects with the correlations of the variables, and that it is
desirable to eliminate the effect of the correlations between
objects.

In order to do this, Aragon and Caussinus (1980)
suggest studying the following diagram

                      X̂'
        E = R^p <------------- F*
         |  ^                |  ^
       Q |  | V            W |  | C⁻¹
         v  |                v  |
        E* -------------> F = R^n
                      X̂

where C is the matrix of auto-correlations

$$C = \begin{pmatrix}
1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\
\rho & 1 & \rho & & \vdots \\
\rho^2 & \rho & 1 & & \\
\vdots & & & \ddots & \rho \\
\rho^{n-1} & \cdots & & \rho & 1
\end{pmatrix}$$

with inverse

$$C^{-1} = \frac{1}{1-\rho^2} \begin{pmatrix}
1 & -\rho & 0 & \cdots & 0 \\
-\rho & 1+\rho^2 & -\rho & & \vdots \\
0 & -\rho & 1+\rho^2 & & \\
\vdots & & & \ddots & -\rho \\
0 & \cdots & 0 & -\rho & 1
\end{pmatrix}$$

If A is the matrix with $\rho$ on the sub-diagonal and zeroes elsewhere, and

$$\Delta = \mathrm{Diag}\Big(1,\ \frac{1}{1-\rho^2},\ \dots,\ \frac{1}{1-\rho^2}\Big),$$

then $C^{-1} = (I - A')\, \Delta\, (I - A)$

Thus the analysis is equivalent to the following.
The first object is associated with $\hat{X}_1$, given the
weight 1, and those that follow are associated with $\hat{X}_i - \rho \hat{X}_{i-1}$,
given the weight $1/(1-\rho^2) > 1$. The sum of the diagonal terms of $\Delta$
can be made equal to unity by multiplying $\Delta$ by the necessary
constant. The principal components, i.e. the eigenvectors of
$(I-A)\, \hat{X} Q \hat{X}' (I-A')\, \Delta$, are orthogonal for $\Delta$. If there is a matrix D
such that $\mathbf{1}'_n D (I-A) \hat{X} = 0$, the principal components are also
centered for D (this would be true if $\hat{X}$ was centered with respect
to a matrix D giving a weight of 0 to the first object).
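The factorization $C^{-1} = (I - A')\,\Delta\,(I - A)$ can be checked numerically; the values of n and $\rho$ below are an arbitrary illustration:

```python
import numpy as np

n, rho = 5, 0.6
idx = np.arange(n)
C = rho ** np.abs(np.subtract.outer(idx, idx))   # auto-correlation matrix C

A = np.zeros((n, n))
A[idx[1:], idx[:-1]] = rho                       # rho on the sub-diagonal

# weight 1 for the first object, 1/(1 - rho^2) for the others
Delta = np.diag([1.0] + [1.0 / (1.0 - rho**2)] * (n - 1))

Cinv = (np.eye(n) - A.T) @ Delta @ (np.eye(n) - A)   # C^{-1} = (I - A') Delta (I - A)
```

Since $(I - A)$ maps $\hat{X}_i$ to $\hat{X}_i - \rho \hat{X}_{i-1}$ (leaving the first row unchanged), the factorization expresses exactly the re-weighted differencing described in the text.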

V - PRACTICAL CONSEQUENCES OF THE USE OF Q

V.1. In the first place, the explicit use of the metric Q allows
an explicit discussion of the choice of the scale of measurement,
and, in particular, the replacement of initial data by standardized
data. It will be noted, however, that there is a slight difference
between the PCA on the correlation matrix, as in conventional
software, and that here. The first is the PCA of the triplet
$(X\, [\mathrm{Diag}(V)]^{-1/2},\ I_{p \times p},\ \frac{1}{n} I_{n \times n})$. The second is the PCA of the
triplet $(X,\ [\mathrm{Diag}(V)]^{-1},\ \frac{1}{n} I_{n \times n})$. They both yield the same WD,
and therefore the same representation of the units. However, the
variables are represented differently. The first leads to the
diagonalization of

$$[\mathrm{Diag}(V)]^{-1/2}\, X' D X\, [\mathrm{Diag}(V)]^{-1/2},$$

the second to $X' D X\, [\mathrm{Diag}(V)]^{-1}$. Obviously the two solutions
are related.
Recent works on the choice of a metric in special cases
include that of Kazmierczak (1985), which considers the choice
of distances between profiles, and that of Besse and Ramsay
(1986) on the distances between curves.
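The V.1 remark can be verified directly: the two triplets yield the same scalar-product matrix WD, hence the same representation of the units. A sketch with arbitrary data of deliberately unequal scales:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 15
X = rng.normal(size=(n, 4)) * np.array([1.0, 5.0, 0.2, 2.0])  # unequal scales
Xc = X - X.mean(axis=0)
s2 = np.diag(Xc.T @ Xc / n)                       # Diag(V), the variances

# triplet (X [Diag(V)]^{-1/2}, I, (1/n) I): standardize, then identity metric
Z = Xc / np.sqrt(s2)
WD1 = Z @ Z.T / n
# triplet (X, [Diag(V)]^{-1}, (1/n) I): raw data with a diagonal metric
WD2 = Xc @ np.diag(1.0 / s2) @ Xc.T / n
```

The difference between the two analyses therefore appears only in the representation of the variables, not of the units.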

V.2. Correspondence Analysis of an $n \times p$ contingency table, P, has
been shown (Escoufier 1982) to be the PCA of the triplet

$$\big(D_I^{-1} (P - D_I \mathbf{1}_n \mathbf{1}'_p D_J)\, D_J^{-1},\ D_J,\ D_I\big)$$

It is easy to see that the product of the sum of the
eigenvalues by the total number of statistical units under study
is simply the $\chi^2$ statistic describing the contingency between the
qualitative variable defining the rows of P and the qualitative
variable defining the columns. Correspondence analysis can be
considered as a means of bringing out the modalities of the
variables which differ the most from the model of independence.
Lauro and D'Ambra (1983) have shown how $\chi^2$ could be replaced by
the asymmetric criterion of Goodman and Kruskal (1954). Here again
the use of a special PCA is justified because of the natural
asymmetry between the two qualitative variables being studied.
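The $\chi^2$ property stated above can be checked on a small contingency table (the table below is an arbitrary illustration):

```python
import numpy as np

table = np.array([[20.0,  5.0, 10.0],
                  [10.0, 25.0,  5.0],
                  [ 5.0, 10.0, 30.0]])   # an arbitrary contingency table
N = table.sum()
P = table / N                            # table of proportions
r, c = P.sum(axis=1), P.sum(axis=0)      # row and column margins
DI, DJ = np.diag(r), np.diag(c)

# data matrix of the triplet (D_I^{-1}(P - D_I 1 1' D_J) D_J^{-1}, D_J, D_I)
Xca = np.linalg.inv(DI) @ (P - np.outer(r, c)) @ np.linalg.inv(DJ)
V = Xca.T @ DI @ Xca                     # variance matrix of the triplet
sum_eig = np.trace(V @ DJ)               # sum of the eigenvalues of V Q

chi2 = N * ((P - np.outer(r, c)) ** 2 / np.outer(r, c)).sum()
```

Multiplying the sum of the eigenvalues by N recovers the $\chi^2$ statistic of the table, as stated.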

The problem is no longer that of the deviation from the
independence model, but that of the difference between the conditional
distributions of a variable and its marginal distribution.

These approaches suggest that the comparison of an
experimental variance matrix $V = \hat{X}' D \hat{X}$ with a theoretical variance
matrix $\Sigma$ can be developed by the PCA of the triplet $(X, \Sigma^{-1}, D)$.
The eigenvalues of $V \Sigma^{-1}$ will be computed. They can be used for
testing the hypothesis that the variance matrix is equal to $\Sigma$
(Anderson 1958, p. 265). The PCA will indicate those objects that
contribute most to the different eigenvalues, i.e., those that are mainly
responsible for the difference between V and $\Sigma$. Since $\Sigma^{-1}$ in
general is not diagonal, it is no longer possible to consider the
absolute contribution of the variables. However, the variables
having large relative contributions are considered to be
responsible for any difference between V and $\Sigma$.

Similarly, Discriminant Analysis can also be considered
as a PCA of the triplet $(M, T^{-1}, D_q)$ in which M is the $q \times p$ matrix
of the means of p variables in each of q classes, $D_q$ is the $q \times q$
diagonal matrix of the weights of the classes, and T is the
variance matrix calculated over the set of units. Let $B = M' D_q M$ be
the between-class variance matrix. The sum of the eigenvalues is
$\mathrm{Tr}(B\, T^{-1})$, the criterion which is referred to by Morrison (1967,
p. 198) to test the equality of the means among the different
groups. Evaluating the contributions of objects (mean points per
class) will reveal which groups contribute most towards rejecting
the hypothesis of equality.
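A sketch of this discriminant-analysis reading of the triplet $(M, T^{-1}, D_q)$, on simulated class-structured data (the names and the simulation are ours, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(7)
q, p, per = 3, 2, 10
labels = np.repeat(np.arange(q), per)              # 3 classes of 10 units each
X = rng.normal(size=(q * per, p)) + labels[:, None]  # class-shifted data
Xc = X - X.mean(axis=0)
n = len(X)

w = np.array([(labels == g).mean() for g in range(q)])           # class weights D_q
Dq = np.diag(w)
M = np.vstack([Xc[labels == g].mean(axis=0) for g in range(q)])  # q x p class means
B = M.T @ Dq @ M                                   # between-class variance matrix
T = Xc.T @ Xc / n                                  # total variance matrix

sum_eig = np.trace(B @ np.linalg.inv(T))           # sum of eigenvalues, Tr(B T^{-1})
```

A useful internal check is the classical decomposition: the within-class variance plus B equals T exactly, so $\mathrm{Tr}(B T^{-1})$ indeed measures the share of total dispersion carried by the class means.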

V.3. Now let us look at a situation in which two sets of
quantitative variables have been observed for the same objects.

The first set leads to a completely determined PCA,
that of the triplet (X, Q, D). For the second PCA we use the data
Y, and we agree to give the same weight D to all statistical units.
What metric R should be chosen so that the PCA of (Y, R, D)
"resembles the closest" the PCA of (X, Q, D)? In order to answer
that question, it is necessary to give a precise meaning to
"resembles the closest".

Choosing the resemblance of representations of the objects, it is
natural to quantify the distance between the two PCAs by:

$$\mathrm{Tr}\big((\hat{X} Q \hat{X}' D - \hat{Y} R \hat{Y}' D)^2\big)$$

Bonifas et al. (1984) show that the best choice is

$$R = (\hat{Y}' D\, \hat{Y})^{-1}\, \hat{Y}' D\, \hat{X} Q \hat{X}' D\, \hat{Y}\, (\hat{Y}' D\, \hat{Y})^{-1}$$

which goes back, from the point of view of the statistical units
under consideration, to the representation given by the PCA of

$$\big(Y (\hat{Y}' D\, \hat{Y})^{-1}\, \hat{Y}' D\, \hat{X},\ Q,\ D\big)$$

Note that the sum of the eigenvalues equals
$\mathrm{Tr}\big((\hat{Y}' D\, \hat{Y})^{-1}\, \hat{Y}' D\, \hat{X} Q \hat{X}' D\, \hat{Y}\big)$, and that $\hat{Y}' D\, X = \hat{Y}' D\, \hat{X}$.

Consider the case where X is an $n \times q$ response pattern
array associated with a qualitative variable with q categories.
We know that $X' D\, \hat{Y} = D_q M$ where M is the $q \times p$ matrix of q mean
vectors calculated for each category and $D_q$ is the weight matrix
of each. The choice $Q = D_q^{-1}$ leads to

$$\hat{Y}' D\, X Q X' D\, \hat{Y} = M' D_q M = B,$$

so that setting $(\hat{Y}' D\, \hat{Y}) = T$ we get $\mathrm{Tr}(VQ) = \mathrm{Tr}(T^{-1} B)$. In other
words, discriminant analysis measures the distance between the i-th
and i'-th units by the quantity $(X_i - X_{i'})\, D_q^{-1} (X_i - X_{i'})'$. It is
possible to question this choice of $D_q^{-1}$, and to consider other
possibilities.

VI - CONCLUSION
A deeper mathematical understanding of the steps taken
in a normal PCA program based upon the variance matrix opens up
numerous paths for theoretical and practical work.
This does not challenge the usual methods of data
analysis, which are still a reasonable compromise between current
knowledge and what the user is willing to do in terms of cost,
whether it be the cost of the mathematical training necessary for
understanding, or for computations.

This formalization allows anyone who is willing to
make the effort to acquire the necessary knowledge (and ultimately
to pay for the expense of special programs) to choose
the mathematical abstractions best adapted to the concrete problem
under study.

REFERENCES
ANDERSON, T.W. 1958. An introduction to multivariate statistical
analysis. John Wiley & Sons, New York, NY.
ARAGON, Y., and H. CAUSSINUS. 1980. Une analyse en composantes
principales pour des unites statistiques correlees,
p. 121-131. In E. Diday et al. [ed.] Data analysis and
informatics. North-Holland Publ. Co., New York, NY.
BESSE, Ph., and J.O. RAMSAY. 1986. Principal components analysis of
sampled functions. Psychometrika (in press).
BONIFAS, L., Y. ESCOUFIER, P.L. GONZALEZ, and R. SABATIER. 1984.
Choix de variables en analyses en composantes principales.
Revue de Statistique Appliquee, Vol. XXXII, no. 2: 5-15.
CAILLIEZ, F., and J.P. PAGES. 1976. Introduction a l'analyse des
donnees. SMASH, 9, rue Duban, Paris 75010.
ECKART, C., and G. YOUNG. 1936. The approximation of one matrix by
another of lower rank. Psychometrika, Vol. 1, no. 3: 211-218.
ESCOUFIER, Y. 1982. L'analyse des tableaux de contingence simples
et multiples. Metron, Vol. XL, no. 1-2: 53-77.
ESCOUFIER, Y. 1985. L'analyse des correspondances: ses proprietes
et ses extensions. Institut International de Statistique,
Amsterdam: 28.2.1-28.2.16.
ESCOUFIER, Y., and P. ROBERT. 1979. Choosing variables and metrics
by optimizing the RV-coefficient, p. 205-219. In J.S. Rustagi
[ed.] Optimizing methods in statistics. Academic Press Inc.
GOODMAN, L.A., and W.H. KRUSKAL. 1954. Measures of association for
cross-classifications. J. Amer. Stat. Ass., Vol. 49: 732-764.
KAZMIERCZAK, J.B. 1985. Une application du principe du Yule:
l'analyse logarithmique. Quatriemes Journees Internationales
Analyse des donnees et informatique, Versailles, France.
(Document provisoire: 393-403).
LAURO, N., and L. D'AMBRA. 1983. L'analyse non symetrique des
correspondances, p. 433-446. In E. Diday et al. [ed.] Data
analysis and informatics III. Elsevier Science Publ. BV,
North Holland.
LEBART, L., A. MORINEAU, and J.P. FENELON. 1979. Traitement des
donnees statistiques. Dunod.
MORRISON, D.F. 1967. Multivariate statistical methods. McGraw-Hill
Book Co.
PAGES, J.P., F. CAILLIEZ, and Y. ESCOUFIER. 1979. Analyse factorielle:
un peu d'histoire et de geometrie. Revue de Statistique
Appliquee, Vol. XXVII, no. 1: 6-28.
RAO, C.R. 1980. Matrix approximations and reduction of dimensionality
in multivariate statistical analysis, p. 3-22. In P.R. Krishnaiah
[ed.] Multivariate analysis V. North-Holland Publ. Co.
SABATIER, R., Y. JAN, and Y. ESCOUFIER. 1984. Approximations
d'applications lineaires et analyse en composantes principales,
p. 569-580. In E. Diday et al. [ed.] Data analysis and
informatics III. Elsevier Science Publ. BV, North Holland.
NONLINEAR MULTIVARIATE ANALYSIS WITH OPTIMAL SCALING

Jan de Leeuw
Department of Data Theory FSW
University of Leiden
Middelstegracht 4
2312 TW Leiden, The Netherlands

Abstract - In this paper we discuss the most important multivariate analysis


methods, as they relate to numerical ecology. We introduce appropriate notation
and terminology, and we generalize the usual linear techniques by allowing optimal
nonlinear transformations of variables. This defines a very general class of
nonlinear multivariate techniques, which is between the purely nonlinear techniques
of contingency table analysis and the classical linear techniques based on the
multivariate normal distribution.

INTRODUCTION

It has already been pointed out by many authors that multivariate analysis is the
natural tool to analyze ecological data structures. Gauch summarizes the reasons for
this choice in a clear and concise way. "Community ecology concerns assemblages
of plants and animals living together and the environmental and historical factors
with which they interact. ... Community data are multivariate because each sample
site is described by the abundances of a number of species, because numerous
environmental factors affect communities, and so on. ... The application of
multivariate analysis to community ecology is natural, routine, and fruitful."
(Gauch 1982, p. 1). Legendre and Legendre discuss the ecological hyperspace
implicit in Hutchinson's concept of a fundamental niche. "Ecological data sets are
for the most part multidimensional: the ecologist samples along a number of axes
which, depending on the case, are more or less independent, with the purpose of
finding a structure and interpreting it." (Legendre and Legendre 1983, p. 3).
A number of possible ecological applications of multivariate techniques are
mentioned in the following quotation from the recent book by Gittins (1985).
"Ecology deals with relationships between plants and animals and between them and
the places where they live. Consequently, many questions of interest to ecologists
call for the investigation of relationships between variables of two distinct but
associated kinds. Such relationships may involve those, for example, between the
plant and animal constituents of a biotic community. They might also involve, as in

NATO ASI Series, Vol. G14


Developments in Numerical Ecology
Edited by P. and L. Legendre
© Springer-Verlag Berlin Heidelberg 1987

plant ecology, connections between plant communities and their component species,
on the one hand, and characteristics of their physical environment on the other. As
another example, comparative relationships among a number of affiliated species or
populations with respect to a particular treatment regime in a designed experiment
might be studied. In more general terms, the question which arises calls for the
exploration of relationships between any two or more sets of variables of ecological
interest." (l.c., page 1).
It is of some importance to observe that Gittins gives a somewhat limited
description of the possibilities of multivariate analysis here. The reason being, of
course, that his book is about canonical analysis, a rather specific class of
multivariate techniques. We can study relationships between sets of variables, as in
the various form of canonical analysis, but also relationships within a single set of
variables, as in the various forms of clustering and component analysis. In
classification and ordination, for example, we usually deal with a single set of
variables. Each species in the study defines a variable, assigning abundance numbers
to a collection of sites. It may seem natural to relate sets of variables if we want to
study abundance or behaviour of species in relation to the environment, but it would
be more appropriate to analyze the within-structure of a single set if we describe the
structure of a single community or location. And if we want to study the interaction
between members of a community, under various circumstances, it may be even
more appropriate to use techniques derived from multidimensional scaling, for
which the basic data are square interaction or association matrices and the basic
units are pairs of individuals.

FORMS AND PROBLEMS OF MULTIVARIATE ANALYSIS

As indicated in the introduction, multivariate analysis studies the relationships


between a number of variables which are defined for each of a number of objects.
We shall formalize this below, but the intuitive meaning is probably clear. The
objects can be samples or sites, and the variables can be species with varying degree
of abundance in each of the sites or they can be physical characteristics of the sites.
Or the objects can be pairs of individuals of a certain species, and the variables can
be measures of interaction between the pairs. In this section we argue that
multivariate analysis consists of a very large variety of models and techniques, in
fact a far greater variety than one could ever hope to discuss in a single paper or
book. Nevertheless some classes of techniques can be distinguished, and we shall
briefly discuss them in order to delineate the class we shall be talking about in the
sequel. A more extensive treatment of the same classificatory problem is in Gifi

(1981), and in Gnanadesikan and Kettenring (1984).


In mathematical statistics the notion of a model plays a very prominent part. In
fact the model is usually the starting point of a statistical analysis. The assumption is
that the data are realizations of random variables, whose distribution, except
possibly for some unknown constants, is described by the model. In multivariate
analysis by far the most prominent model is the multivariate normal distribution
(Anderson 1984, Muirhead 1982). The multivariate measurements are assumed to
be realizations of independent random vectors, each with the same multivariate
normal distribution. Statistical techniques estimate unknown parameters and test
hypotheses on the basis of this multinormal model, usually employing the likelihood
function. The multivariate normal distribution has numerous technical and
interpretational advantages, which are mostly due to its intimate connections with
Euclidean geometry.
In recent years another model has gained some prominence, mainly in discrete
multivariate analysis. This is the multinomial model, usually presented in the form
of loglinear analysis (Bishop, Fienberg, and Holland 1975, Haberman 1979, also
compare Legendre and Legendre 1983, chapter 4). Again the basic assumption is
that we are dealing with realizations of independent and identically distributed
random vectors, but in multinomial analysis no additional parametric assumptions
are made. Because the data vectors are discrete, and each variable assumes only a
finite number of values, it is possible to use such a nonparametric approach. The
main difference between the multinormal and the multinomial model is that in the
multinormal case we only have to model the first order interactions between the
variables. Because the means and covariances are a complete set of sufficient
statistics, they contain all information in the data, and we can ignore all higher order
moments. In the multinomial model all higher order interactions have to be taken
into account. This often leads to serious interpretational problems, and it makes
analysis with a moderate number of variables already quite impractical. It is
consequently not surprising that much effort in the recent statistical literature is
expended on the development of models which combine features of multinomial and
multinormal modelling (Agresti 1983). In a sense the techniques we shall present
below can also be interpreted as such combinations.
In another sense, however, there are important differences between the classical
statistical modelling techniques and our multivariate data analysis methods. As we
have seen above, the notion of a probabilistic model is basic in classical statistics.
From the model we derive the technique, and the results then tell us if the model is
appropriate or not. In multivariate data analysis we work differently. We do not
make explicit assumptions about the process that has generated the data, because
very often it is not at all clear how realistic such assumptions are, and in many cases

it is even clear that the usual assumptions are not satisfied at all. Multivariate
normality and complete independence are quite rare in practice. Thus instead of
starting with a model and trying to fit in the data, we start with the data and we try to
find a structure or model that can describe or summarize the data. These two
approaches correspond, of course, with the age-old distinction between induction
and deduction, between empiricism and rationalism. In recent discussions the
concepts of exploration and confirmation, and of description and inference, are
often contrasted. Data analysts generally feel that the models of classical statistics
are much too strong and too unrealistic to give good descriptions of the data. And,
of course, mathematical statisticians feel that the techniques of data analysis very
often lead to unstable results, that are difficult to integrate with existing prior
knowledge. It will not come as a surprise, that we think that both approaches have
their value. If there is strong and reliable prior knowledge, then it must be
incorporated in the data analysis, because it will make the results more stable and
easier to interpret. But if this prior knowledge is lacking, it must not be invented
just for the purpose of being able to use standard statistical methodology. And,
certainly, we must not make assumptions which we know to be not even
approximately true. Finally there are many situations in which good statistical
procedures can in principle be applied, on the basis of firm prior knowledge, but in
which there simply are not enough data to make practical application possible. In
such situations a data analytical compromise is needed too.
There are some interesting problems in the application of various multivariate
analysis techniques to ecology. They have been admirably reviewed by Noy-Meir
and Whittaker (1978). We mention them briefly here, but we shall also encounter
them again in our more formal development below. The distinction between Rand
Q techniques has been discussed extensively by psychometricians such as Cattell and
Stephenson. It is based on the fact that we can think of the species as ordering the
samples, but also of the samples as ordering the species. In a given data structure we
have to decide what the variables are, and what the units are on which the variables
are defined. Sometimes the choice is clear and unambiguous, sometimes the
situation is more complicated. As a second problem Noy-Meir and Whittaker
mention data transformation and the choice of similarity measures. We could
generalize this somewhat to the problem of data definition and expression. This has
as special cases the choice of centering and standardization, but also taking logarithms
or using any of the other reexpression techniques discussed by Legendre and
Legendre (1983, p. 11-18). The nonlinear multivariate techniques explained in our
paper take a radical point of view, by assuming that the expression of the variable in
the data matrix is essentially conventional, merely a coding. Thus the reexpression
problem does not have to be solved before the technique is applied, but it is an

important part of our multivariate techniques to find appropriate reexpressions.


The third problem is the distinction between the discrete and the continuous, or
between ordination and classification. This has also been discussed extensively in
the psychometric multidimensional scaling literature. Compare Carroll and Arabie
(1980), De Leeuw and Heiser (1982). In this paper we take the point of view that
continuous representation, if applied carefully, will often show discontinuities in the
data. Assuming discontinuity right away, and applying a classification or cluster
method, in many cases imposes too much a priori structure. A final problem
mentioned by Noy-Meir and Whittaker is that of non-linearity and axes
interpretation. This is perhaps especially relevant in connection with the component
analysis or correspondence analysis of abundance matrices, in which we invariably
find the horseshoe or Guttman effect (Heiser 1986). Again the nonlinear
multivariate analysis techniques discussed below take a radical stand in this
problem. Nonlinearities due to the coding of the variables are avoided by finding
optimal transformations, and nonlinearities that occur in the representation can be
eliminated by imposing restrictions on the representation, somewhat as in detrended
correspondence analysis (Hill and Gauch 1980).
Noy-Meir and Whittaker come to the following conclusion in their useful
review paper. "After twenty-five years of development of continuous multivariate
techniques in ecology, some of the early optimistic promises, as well as some of the
skeptical criticisms, seem to have been overstated" (Noy-Meir and Whittaker 1978,
p. 329). The nonlinear multivariate data analysis techniques developed in this paper
may contribute additional useful procedures and possibilities. But they must be seen
in the proper perspective. If there is strong prior knowledge, either of a structural
or of a probabilistic nature, then it must be incorporated in the analysis. Sometimes
our techniques have options which make it possible to build in suitable restrictions,
but if the information is very specific, then one must switch to a specific technique.
If it is known that species distributions are Gaussian, then one should use Gaussian
ordination, and not correspondence analysis. Our techniques are most useful in the
areas in which there is not much prior knowledge, or in which the ratio of amount
of data to amount of theory is large.

MULTIVARIABLES

We start our formal developments in this paper by providing some definitions.


In multivariate analysis we always study a number of variables, defined on a set of
objects. More precisely, a variable is a function. Legendre and Legendre use a
slightly different terminology. "Any ecological study, classical as well as numerical,

is based on descriptors. In this text the term descriptor will be used for the
attributes, variables, or characters (also called items in the social sciences) that
describe or compare the objects of the study. The objects that the ecologists
compare are the samples, locations, quadrats, observations, sampling units or
subjects which are defined a priori by the sampling design, before making the
observations." (Legendre and Legendre 1983, p. 8). For variables we use the
familiar notation φ : Ω → Γ. Here Ω is the domain of the variable, consisting of the
objects, and Γ is its target, containing the possible values of the variable. Elements
of the target are also called the categories of a variable. A variable φ associates with
each ω ∈ Ω a category φ(ω) ∈ Γ. In practical applications and in actual data analysis
the domain Ω will be a finite set {ω1,...,ωn}. For theoretical purposes the domain
can be infinite. If Ω is a probability space, for instance, and φ is measurable, then
our variable is a random variable. Targets can be finite or infinite as well. In many
cases the target is the reals or the integers, i.e. Γ = ℝ = ]-∞,+∞[, or Γ = ℕ =
{0,1,2,...}. But it is also possible that Γ = {short grass, short grass with thicket, tall
grass with thicket} or Γ = {close, moderate, distant}.
Table 1.5 from Legendre and Legendre (1983, p. 9), which we copy here, shows
the types of targets we can expect to encounter. Most of the terminology will
probably be clear, but we refer to Legendre and Legendre (1983, p. 10-11) for
further explanation.

Descriptor type                                          Examples

Binary (two states, presence-absence)                    species present or absent
Multi-state (many states)
  nonordered (qualitative, nominal, attributes)          geological group
  ordered
    semi-quantitative (rank-ordered, ordinal)            importance or abundance scores
    quantitative (measurement)
      discontinuous (meristic, discrete)                 equidistant abundance classes
      continuous                                         temperature, length

Most of the techniques of multivariate analysis have been developed for


continuous variables such as temperature and length. As shown by Gittins (1985),
for example, nonnumerical multi-state variables can be incorporated in some
techniques. In analysis of variance, for example, the design matrices consist of
dummies, which are codings of nonordered multi-state variables. In discriminant
analysis a similar dummy is used to code class membership. It remains true,
however, that the models of classical continuous multivariate analysis are entirely in

terms of multinormal variables. Dummies are used only as coding devices, to


indicate that objects are sampled from different populations. In nonlinear
multivariate analysis as discussed in this paper we use dummies and coding in a
much more constructive way. A good starting point is the following quotation.
"Coding is a technique by which raw data can be transformed into other values that
can then be used in the analysis. All types of descriptors can be recoded but
non-numerical descriptors must be coded before they can be analyzed numerically."
(Legendre and Legendre 1983, p. 10). The coding problem is thus related to the
reexpression problem discussed above. If variables are numerical we often use
transformation, if they are non-numerical we use quantification, but in all cases the
coding we use is a real-valued function on the target set of the variable. Real-valued
codings of non-numerical variables are often called scalings. Coding in many cases
is dictated by conventional considerations. Thus {close,moderate,distant} is often
coded as {1,2,3}, but in nonlinear multivariate analysis we look specifically for
codings (or transformations, or quantifications, or scalings) which are optimal in a
well-defined sense.
In multivariate analysis we analyze several variables at the same time. This
requires some additional terminology. A multivariable is a set of variables with a
common domain. We use the notation Φ = {φt | t ∈ T}, where φt : Ω → Γt, and where
T is the index set of the multivariable. Thus the variables in Φ have the common
domain Ω, but they have possibly different targets Γt. Multivariate analysis studies
the structure of multivariables.
A simple example is perhaps useful here. We have taken some classical data of
Mayr (1932). The domain of the five variables in this example consists of twelve
races of the bird Pachycephala pectoralis. Specifically Ω = {dahli, chlorura,
vitiensis, bougainvillei, torquata, melanota, melanoptera, sanfordi, ornata, bella,
optata, graeffii}. Variable 1 is called THROAT and maps Ω into Γ1 =
{yellow, white}. Variable 2 is called BREAST BAND and maps Ω into Γ2 =
{present, absent}. COLOR OF BACK maps Ω into Γ3 = {olive, black}, FOREHEAD
maps into Γ4 = {yellow, black}. Variable 5, finally, is called WING, and maps the
races into Γ5 = {colored, black}. All variables are binary. The multivariable is
defined by the following table, in which we have numbered races and variables, and
in which we have used simple abbreviations for the values of the variables. Thus,
for instance, φ3(bella) = olive and φ2(sanfordi) = absent.

       1  2  3  4  5

  01   W  P  O  B  C
  02   W  P  O  B  C
  03   W  P  O  B  C
  04   Y  P  O  B  C
  05   Y  P  O  B  C
  06   Y  P  B  B  B
  07   Y  P  O  B  B
  08   Y  A  O  B  C
  09   W  P  B  B  C
  10   Y  P  O  Y  C
  11   Y  P  O  Y  C
  12   Y  A  O  Y  C

Table 1. Bird data from Mayr.

The example can also be used to illustrate interactive coding of variables.


COLOR OF BACK and FOREHEAD are binary variables with targets,
respectively, {olive,black} and {yellow,black}. Using them we can create the
interactive variable COLOR OF BACK x FOREHEAD with target {(olive,yellow),
(olive,black), (black,yellow), (black,black)}. In general if we have m variables with
k1,...,km categories, i.e. a total of k1 + ... + km categories, then we can create an
interactive variable with k1 x ... x km categories. We can also make interactive
codings for all pairs of variables; this gives us a total of C(k1,2) + ... + C(km,2)
categories. Here C(k,r) is used for binomial coefficients. Thus there are many
possibilities of coding a given set of variables.
The example above is quite straightforward, but it is not representative for a
typical ecological data set. More representative examples are given, for example, in
appendix A2 of Gittins (1985). The limestone grassland community example,
discussed by Gittins in his chapter 7, defines eight estimates of species abundance
and six soil variables on a random sample of 45 stands, each of 10 x 10 m. Each
stand was divided into 5000 units of 10 x 20 cm, and species abundance is defined as
the percentage of these units in which the species occurred. It is clear that the most
natural object in this experiment is the 10 x 20 cm unit, i.e. there are 45 x 5000 =
225000 such units. The eight species define binary variables on these units, with
target {present,absent}. There is a variable called STAND, which takes 45 different
values, and there are six soil variables, which have the property that units within the
same stand get the same soil value on all six of them. We can also follow Gittins and
use the stand as the fundamental unit. This process is called aggregation, because it
involves aggregating the 5000 original units in a single stand. This aggregation
process makes it possible to treat the abundances as numerical variables, taking
values between 0% and 100%. The example shows that the choice of unit is

sometimes debatable.
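The aggregation step itself is a one-line computation per stand. A minimal sketch, with hypothetical presence counts (the real design has 45 stands of 5000 units each):

```python
# Aggregation: collapse the binary presence/absence units within a stand
# into one percentage abundance per stand. The counts are hypothetical.

units_per_stand = 5000
presence_counts = {"stand_1": 1250, "stand_2": 4000, "stand_3": 0}

abundance = {stand: 100.0 * count / units_per_stand
             for stand, count in presence_counts.items()}
print(abundance)   # {'stand_1': 25.0, 'stand_2': 80.0, 'stand_3': 0.0}
```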
The next example is also representative, but a bit more problematical. It is taken
from Legendre and Legendre (1983, p. 191). Five ponds are characterized by the
abundances of different species of zooplankton, given on a scale of relative
abundance varying from 0 to 5. It is clear that this matrix is also based on
aggregation, of the same sort as in the Gittins example. But we can also use it to
illustrate transposition, or the choice between Q and R. In this example we can take
the species as units, and the ponds as variables. Each pond maps the eight species into
the target {0,1,2,3,4,5}. It is also possible to interpret the ponds as units and the
species as variables, again with the same target {0,1,2,3,4,5}. We can also treat the
example as bivariate. The grand-total of the data matrix is 52. These 52 'abundance
credits' are used as the units, and the two variables are SPECIES and PONDS. Thus
there are three credits with species-value 1 and pond-value 212, and four credits
with species-value 5 and pond-value 214, and so on. The data matrix is, in this
interpretation, the cross table of the two variables. And finally we can use the 40
ponds and species combinations as units, and interpret our results as measurements
on a variable that maps these 40 combinations into {0,1,2,3,4,5}. Two other
variables can be defined on these units. The first one is POND, with five values in
its target, and the second one is SPECIES, with eight values. In this last
interpretation there are consequently 40 units, and three variables. There are no
clear a priori reasons for preferring one interpretation over the other. The choice
must be made by the investigator, in combination with the choice of the data analysis
technique.

                     Ponds
Species     212   214   233   431   432

   1          3     3     0     0     0
   2          0     0     2     2     0
   3          0     2     3     0     2
   4          0     0     4     3     3
   5          4     4     0     0     0
   6          0     2     0     3     3
   7          0     0     0     1     2
   8          3     3     0     0     0

Table 2. Zooplankton data of Legendre.
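Two of the reinterpretations described above can be sketched directly from Table 2: the 40 species-pond combinations as units, and the 52 abundance credits as units of a bivariate distribution.

```python
# Reinterpreting Table 2: (a) 40 units with variables SPECIES, POND and
# ABUNDANCE; (b) 52 'abundance credits' as units of the bivariate
# distribution of SPECIES and POND, of which Table 2 is the cross-table.

table = [[3, 3, 0, 0, 0], [0, 0, 2, 2, 0], [0, 2, 3, 0, 2], [0, 0, 4, 3, 3],
         [4, 4, 0, 0, 0], [0, 2, 0, 3, 3], [0, 0, 0, 1, 2], [3, 3, 0, 0, 0]]
ponds = [212, 214, 233, 431, 432]

# (a) one unit per species-pond combination, three variables
units = [(s + 1, ponds[p], table[s][p]) for s in range(8) for p in range(5)]

# (b) one unit per abundance credit
credits = [(s + 1, ponds[p]) for s in range(8) for p in range(5)
           for _ in range(table[s][p])]

print(len(units), len(credits))   # 40 52
print(credits.count((1, 212)))    # 3 credits with species 1 and pond 212
print(credits.count((5, 214)))    # 4 credits with species 5 and pond 214
```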

FUNCTIONS OF CORRELATION MATRICES

In this paper we shall mainly discuss multivariate techniques which compute


statistics depending on the second order moments and product moments of the
variables, more specifically on their correlation coefficients. This implies,
obviously, that the higher order moments of the distributions of the variables are
irrelevant for the techniques we discuss. Thus the loglinear methods for frequency
tables, for example, are not covered by the developments in this paper. On the other
hand our techniques also do not depend on first order moments, i.e. on the means of
the variables. This means that we can suppose, without loss of generality, that all
variables we deal with are in deviations from the mean. We are not interested in the
structure of the means, although our development of discriminant analysis and
analysis of variance will show that in some cases means can be reintroduced by the
use of dummy variables. Because our methods depend only on the correlation
coefficients, this moreover means that they are scale-free. The unit of the variables
and consequently their variances are irrelevant. All variables can be assumed to be
standardized to unit variance. It is one of the purposes of this paper to show that this
somewhat limited class of multivariate techniques still has many interesting special
cases.
Now this description of the class of techniques we are interested in is somewhat
problematical. We can compute correlations only between variables which are
numerical, so either we must limit our attention to measured variables, or we must
compute correlations between non-numerical variables which are coded
numerically. And if we use coding of non-numerical variables, and then compute
correlations, then it is clear that the correlations will depend on the particular
coding or scaling that we have chosen. And, in fact, something similar is also true
for measured variables. Instead of using abundance or yield, for instance, we could
also use log-abundance or log-yield, which would give different correlations. We
introduce some notation to describe this scaling or transformation of the variables.
Remember that we started with a multivariable Φ = {φt | t ∈ T}, where φt : Ω → Γt.
A scaling (or quantification, or transformation) of the targets of this multivariable
is a system Ψ = {ψt | t ∈ T}, where ψt : Γt → ℝ. The values of a scaling are often
called the category quantifications of a variable (or the transformed values). A
scaling of the targets induces a quantification Λ of the multivariable by the simple
rule Λ = {λt | t ∈ T}, where λt is the composite ψt ∘ φt : Ω → ℝ. This is illustrated in
Figure 1. Write R(Λ) for the correlation matrix induced by the scaling of the
variables.

[Figure 1. Quantification diagram: the variable φ maps the domain Ω to the target
Γ, the scaling ψ maps Γ to the reals, and the quantified variable λ = ψ ∘ φ maps Ω
directly to the reals.]
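The diagram can be sketched in a few lines, using hypothetical object names and the conventional {close, moderate, distant} coding mentioned earlier:

```python
# The quantification diagram as dictionaries: phi maps objects to categories,
# psi maps categories to reals, and lambda = psi o phi maps objects to reals.
# Object names are hypothetical; 1, 2, 3 is the conventional coding.

phi = {"site_a": "close", "site_b": "distant", "site_c": "moderate"}
psi = {"close": 1.0, "moderate": 2.0, "distant": 3.0}

lam = {obj: psi[cat] for obj, cat in phi.items()}   # the composite psi o phi
print(lam)   # {'site_a': 1.0, 'site_b': 3.0, 'site_c': 2.0}
```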

Time to switch to an example. In the first three columns of Table 3 the


zooplankton data of Legendre and Legendre are coded as 40 observations on the
three variables SPECIES, POND, and ABUNDANCE. We use integer coding, or
category numbers. Observe that SPECIES and POND are uncorrelated, because the
design is balanced. Only the correlations of SPECIES and POND with
ABUNDANCE depend on the scaling of the variables we have chosen. With integer
coding the correlation between SPECIES and ABUNDANCE is -.01, and the
correlation between POND and ABUNDANCE is -.06. Now suppose that we use a
form of scaling which is sometimes called criterion scaling. This means that we use
integer coding for ABUNDANCE, but both for SPECIES and for POND we choose
the average ABUNDANCE values of a species or pond as the quantifications. The
SPECIES - ABUNDANCE correlation increases to .29, and the POND -
ABUNDANCE correlation to .16. The proportion of variance of ABUNDANCE
'explained' by SPECIES and POND is .1082.
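The comparison can be reproduced with a short computation on Table 2; the following plain-Python sketch recovers the numbers quoted above up to rounding:

```python
# Integer-coding versus criterion-scaling correlations for the zooplankton
# data of Table 2.
from math import sqrt

table = [[3, 3, 0, 0, 0], [0, 0, 2, 2, 0], [0, 2, 3, 0, 2], [0, 0, 4, 3, 3],
         [4, 4, 0, 0, 0], [0, 2, 0, 3, 3], [0, 0, 0, 1, 2], [3, 3, 0, 0, 0]]

# 40 units: integer-coded SPECIES (1..8), POND (1..5), ABUNDANCE (0..5)
S = [s + 1 for s in range(8) for p in range(5)]
P = [p + 1 for s in range(8) for p in range(5)]
A = [table[s][p] for s in range(8) for p in range(5)]

def corr(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

# integer coding: both correlations are essentially zero
print(round(corr(S, A), 2), round(corr(P, A), 2))   # -0.01 -0.06

# criterion scaling: quantify each category by its mean ABUNDANCE
def criterion(codes, target):
    means = {c: sum(t for cc, t in zip(codes, target) if cc == c)
                / codes.count(c) for c in set(codes)}
    return [means[c] for c in codes]

print(round(corr(criterion(S, A), A), 2))   # 0.29
print(round(corr(criterion(P, A), A), 2))   # 0.16
```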
We shall discuss other criteria and other solutions below, but first we have to
develop some notation and terminology which make it possible to discuss the
optimal scaling problem in general. The general approach and the notational system
are due, in some specific cases, to Fisher (1941) and to Guttman (1941). A more
comprehensive approach to nonlinear multivariate analysis along these lines
originated with Guttman (1959) and De Leeuw (1973). The specific notational
system and terminology we use in this paper are due to Gifi (1981), also compare
De Leeuw (1984a).

Table 3. Category numbers and indicators for Legendre zooplankton data.


--------------------------------------------
VARS INDICATOR CODINGS

S P A   SPECIES    POND   ABUNDANCE

1 1 3 10000000 10000 000100


1 2 3 10000000 01000 000100
1 3 0 10000000 00100 100000
1 4 0 10000000 00010 100000
1 5 0 10000000 00001 100000
2 1 0 01000000 10000 100000
2 2 0 01000000 01000 100000
2 3 2 01000000 00100 001000
2 4 2 01000000 00010 001000
2 5 0 01000000 00001 100000
3 1 0 00100000 10000 100000
3 2 2 00100000 01000 001000
3 3 3 00100000 00100 000100
3 4 0 00100000 00010 100000
3 5 2 00100000 00001 001000
4 1 0 00010000 10000 100000
4 2 0 00010000 01000 100000
4 3 4 00010000 00100 000010
4 4 3 00010000 00010 000100
4 5 3 00010000 00001 000100
5 1 4 00001000 10000 000010
5 2 4 00001000 01000 000010
5 3 0 00001000 00100 100000
5 4 0 00001000 00010 100000
5 5 0 00001000 00001 100000
6 1 0 00000100 10000 100000
6 2 2 00000100 01000 001000
6 3 0 00000100 00100 100000
6 4 3 00000100 00010 000100
6 5 3 00000100 00001 000100
7 1 0 00000010 10000 100000
7 2 0 00000010 01000 100000
7 3 0 00000010 00100 100000
7 4 1 00000010 00010 010000
7 5 2 00000010 00001 001000
8 1 3 00000001 10000 000100
8 2 3 00000001 01000 000100
8 3 0 00000001 00100 100000
8 4 0 00000001 00010 100000
8 5 0 00000001 00001 100000

INDICATOR MATRICES AND QUANTIFICATION

Let us look at the second part of Table 3. This contains the same information as
the first three columns, but coded differently. In the terminology of De Leeuw
(1973) we call the codings of the variables indicator matrices, but in other contexts
they are also called dummies. One interpretation is that SPECIES, for instance, is
now coded as a set of eight different binary variables. The total number of
variables, in this interpretation, is now equal to 19, which is the total number of
categories of SPECIES, POND, and ABUNDANCE. The important property of
indicator matrices, for our purposes, is that each possible quantification of the
variables is a linear combination of the columns of the indicator matrix of that
variable. Or, if there are n objects, we can say that the columns of the indicator
matrix form a basis for the subspace of ℝⁿ defined by the quantifications of the
variable. The columns span the space of possible quantifications.
Suppose Gt is the indicator matrix of variable t. Assume that there are n objects
and that variable t has kt categories. Then Gt has n rows and kt columns. The matrix
Dt = Gt'Gt is diagonal, i.e. the columns of Gt are orthogonal (the categories of a
variable are exclusive). And the rows of Gt sum to unity (the categories are
exhaustive). A quantification ψt of the categories maps the kt-element set Γt into the
reals, and is thus a kt-element vector. Write it as yt. Then λt, the quantified variable,
is given by the product qt = Gtyt. Given vectors yt of category quantifications we
can construct quantified variables, and given quantified variables we can construct
the correlation matrix R(Λ). We limit our attention to normalized quantifications. If
u is used for a vector with all elements equal to +1, the number of elements of u
depending on the context, then we want u'qt = u'Gtyt = u'Dtyt = 0 and qt'qt =
yt'Dtyt = n. If s and t are two variables, with corresponding indicators and
normalized quantifications, then the correlation between the quantified variables is
given by rst = n⁻¹ys'Cstyt, where Cst =df Gs'Gt is the cross-table of variables s and
t. Observe that Dt = Ctt. Our formulation of the quantification problem in terms of
vectors and matrices shows that the correlations rst are functions of the bivariate
frequencies, collected in the cross-tables Cst, and the category quantifications yt.
For a given problem, i.e. a given coding of a fixed data set, the Cst are constant and
known, but varying the yt will give varying correlation coefficients. The
comparison of integer scaling and criterion scaling in the previous section was a
first example of this.
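A minimal sketch of this machinery, on six illustrative observations (not data from the text), checks the identity rst = n⁻¹ys'Cstyt against the correlation of the quantified variables directly:

```python
# Indicator matrices, the cross-table Cst = Gs'Gt, and the identity
# rst = (1/n) ys' Cst yt for normalized quantifications. Illustrative data.
from math import sqrt

def indicator(codes, k):
    """n x k indicator matrix: one 1 per row, in the column of the category."""
    return [[1 if c == j else 0 for j in range(k)] for c in codes]

s_codes = [0, 0, 1, 1, 2, 2]           # variable s, three categories
t_codes = [0, 0, 0, 1, 1, 1]           # variable t, two categories
n = len(s_codes)
Gs, Gt = indicator(s_codes, 3), indicator(t_codes, 2)

# cross-table Cst = Gs'Gt
Cst = [[sum(Gs[i][a] * Gt[i][b] for i in range(n)) for b in range(2)]
       for a in range(3)]

# normalized quantifications: u'Dy = 0 and y'Dy = n for both variables
c = sqrt(1.5)
ys = [-c, 0.0, c]
yt = [-1.0, 1.0]

qs = [ys[v] for v in s_codes]          # quantified variable qs = Gs ys
qt = [yt[v] for v in t_codes]
r_direct = sum(a * b for a, b in zip(qs, qt)) / n
r_cross = sum(ys[a] * Cst[a][b] * yt[b]
              for a in range(3) for b in range(2)) / n
print(abs(r_direct - r_cross) < 1e-12)   # True: both equal sqrt(2/3)
```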

SOME COMMON CRITERIA FOR OPTIMAL SCALING



We now take a further step. The correlations vary with the choice of the
quantifications, and consequently all statistics depending on the correlations will
also vary. Suppose κ(R(Λ)) is such a (real-valued) statistic, interpreted as a function
of the scalings. We are interested in the variation of this statistic, and in many cases
in the largest and/or smallest possible value, under choice of quantifications. It is
possible, for instance, to look for the quantifications of the variables which
maximize or minimize a specific correlation. Or, if we have a number of predictors
and a single variable which must be predicted, we can choose scalings for optimal
prediction, i.e. with maximum multiple correlation coefficient. If the purpose of the
multivariate technique is ordination or some other form of dimension reduction,
then we can choose quantifications in such a way that a maximum amount of
dimension reduction is possible. In a principal components context this could mean
that we maximize the largest eigenvalue, or the sum of the p largest eigenvalues, of
the correlation matrix R(A). In fact we can look through the books on linear
multivariate analysis and find many other criteria that are used to evaluate results of
multivariate techniques. There are canonical correlations, likelihood ratio criteria
in terms of determinants, largest root criteria, variance ratios, and so on. For each
of these criteria we can study their variation under choice of quantifications, and we
can look for the quantifications that make them as large (or as small) as possible.
Before we give some examples, we briefly discuss the mathematical structure of
such optimal scaling problems. If we restrict ourselves to the case of n units of
observation, coded with indicator matrices, then the stationary equations for an
extreme value of criterion κ over normalized quantifications are

    Ds ys ∝ Σt≠s πst Cst yt        (s = 1,...,m),

where πst = ∂κ/∂rst. This assumes, obviously, that the partial derivatives exist.
Consequently we restrict our attention to criteria that are differentiable functions of
the correlation coefficients. The stationary equations suggest the algorithm

For s = 1 to m:
    A1: compute qs = Σt≠s πst Gt yt,
    A2: compute ỹs = Ds⁻¹Gs'qs,
    A3: compute ys by normalizing ỹs,
next s.

Observe that the algorithm can be used for any criterion κ. The criterion influences
the algorithm only through the form of the partial derivatives πst. It is not
guaranteed that it works, i.e. converges, for all criteria. A detailed mathematical

analysis is given by De Leeuw (1986), who shows that the algorithm does indeed
work for some of the more usual criteria used in nonlinear multivariate analysis,
such as the ones we have mentioned above.
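For the simplest criterion, κ = rst with only two variables (so the single partial derivative πst equals 1), the algorithm reduces to reciprocal averaging of the cross-table. The sketch below runs it on the 52 abundance credits of Table 2; up to rounding and sign it should agree with the column E analysis discussed further on, for which the text quotes an optimal correlation of .89:

```python
# Alternating algorithm for kappa = r_st with two variables: steps A1-A2
# replace each quantification by the category means of the other quantified
# variable, step A3 renormalizes. Applied to the 52 'abundance credits'.
from math import sqrt

table = [[3, 3, 0, 0, 0], [0, 0, 2, 2, 0], [0, 2, 3, 0, 2], [0, 0, 4, 3, 3],
         [4, 4, 0, 0, 0], [0, 2, 0, 3, 3], [0, 0, 0, 1, 2], [3, 3, 0, 0, 0]]
units = [(s, p) for s in range(8) for p in range(5)
         for _ in range(table[s][p])]
n = len(units)                                     # 52 credits

fs = [sum(1 for s, _ in units if s == j) for j in range(8)]
fp = [sum(1 for _, p in units if p == j) for j in range(5)]

def normalize(y, freq):
    """Center and scale so that u'Dy = 0 and y'Dy = n."""
    m = sum(f * v for f, v in zip(freq, y)) / n
    y = [v - m for v in y]
    ssq = sum(f * v * v for f, v in zip(freq, y))
    return [v * sqrt(n / ssq) for v in y]

ys = normalize(list(range(8)), fs)                 # integer coding as start
yp = normalize(list(range(5)), fp)

history = []
for _ in range(100):
    ys = normalize([sum(yp[p] for s, p in units if s == j) / fs[j]
                    for j in range(8)], fs)        # A1 + A2 for SPECIES
    yp = normalize([sum(ys[s] for s, p in units if p == j) / fp[j]
                    for j in range(5)], fp)        # A1 + A2 for POND
    history.append(sum(ys[s] * yp[p] for s, p in units) / n)

print(round(history[-1], 2))   # compare the .89 quoted for column E
```

Each sweep is a pair of conditional maximizations, so the correlation in `history` increases monotonically to the optimum.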
Let us now look at an example. If we want to apply optimal scaling to the
example of Mayr, in Table 1, then we get into trouble. Because all variables are
binary, the possible scalings are completely determined by the normalization
conditions. For binary variables, there is only one possible scaling, and in that sense
they are the same as numerical variables. We could create variables with more
than two categories by using interactive coding, but the example is so small and
delicate that this would probably not be worthwhile.
We thus apply the algorithm, with various different criteria, to the zooplankton
example. The results are collected in Table 4. Column A contains the criterion
scaling technique mentioned in the previous section: we use integer scaling for
ABUNDANCE, and scale POND and SPECIES by maximizing the sum of the
correlations between ABUNDANCE and POND and SPECIES.

Table 4. Various optimal scalings for the zooplankton data.

A: criterion scaling: A integer, maximize r(S,A) + r(P,A).
B: maximize r(S,A).
C: maximize r(P,A).
D: maximize r(S,A) + r(P,A).
E: abundance credits solution.

ANALYSIS              A      B      C      D      E

SPECIES     1      -0.24  -0.20         +0.13  +1.24
            2      -1.18  -1.00         -0.71  -1.06
            3      +0.24  -0.90         -0.72  -0.45
            4      +1.65  +1.12         +1.01  -1.02
            5      +0.71  +1.84         +1.91  +1.24
            6      +0.71  -0.30         -0.29  -0.52
            7      -1.65  -1.16         -1.46  -0.98
            8      +0.24  +0.20         +0.14  +1.24

POND      212      -0.22         +0.70  +1.22  +1.38
          214      +1.94         +0.54  +0.03  +0.83
          233      -0.75         +0.74  +0.75  -0.94
          431      -0.75         -1.91  -1.54  -0.96
          432      -0.21         -0.07  -0.70  -0.84

ABUNDANCE   0      -0.88  +0.01  +0.12  +0.09
            1      -0.20  -1.97  -5.37  -3.34
            2      +0.48  -1.49  -0.11  -1.16
            3      +1.16  +0.29  -0.23  +0.07
            4      +1.84  +2.71  +1.85  +2.62

The quantifications are given in Table 4; for the correlations we find
r(S,A) = .29 and r(P,A) = .16. In column B we maximize the correlation r(S,A) by
scaling both SPECIES and ABUNDANCE. Of course this gives no quantification
for POND. The optimal correlation is r(S,A) = .59. In column C the same is done
for r(P,A), which can be increased to .36. Column D is more interesting. It
optimizes r(S,A) + r(P,A) over all three quantifications. This gives r(S,A) = .58 and
r(P,A) = .33. In this solution 44% of the variance in (scaled) ABUNDANCE is
'explained' by (scaled) SPECIES and POND.
We shall make no attempt to give an ecological interpretation of the scalings
found by the techniques. The example is meant only for illustrative purposes. It
seems, by comparing columns B, C, and D, that the optimal transformations are not
very stable over choice of criterion, which is perhaps not surprising in such a small
example. The optimal correlations are much more stable. So is the fact that the
categories of ABUNDANCE are scaled in the correct order, except for the zero
category which moves to the middle of the abundance scale.
Column E in Table 4 is quite different from the others. This is because it
interprets the data as a single bivariate distribution, with 52 'abundance credits' as
the units. If we now scale SPECIES and POND optimally, maximizing the
correlation in the bivariate distribution, then we find the quantifications in column
E, and the optimal correlation equal to .89. Again we give no interpretation, but we
point out that the solution in column E can be used to reorder the rows and columns
of Table 2 by using the order of the optimal quantifications. In this reordered
version of the table the elements are nicely grouped along the diagonal. For more
information about such optimal ordering aspects of nonlinear multivariate analysis
techniques we refer to Heiser (1986).
In the book by Gifi (1981) special attention is paid to a particular class of
criteria, that could be called generalized canonical analysis criteria. Also compare
Van der Burg, De Leeuw, and Verdegaal (1984, 1986) for an extensive analysis of
these criteria, plus a description of alternating least squares methods for optimizing
them. In generalized canonical analysis the variables are partitioned into sets of
variables. In ordinary canonical correlation analysis (Gittins 1985) there are only
two sets. In some of the special cases of ordinary canonical analysis, such as multiple
regression analysis and discriminant analysis, the second set contains only a single
variable. In principal component analysis the number of sets is equal to the number
of variables, i.e. each set contains a single variable. The partitioning of the variables
into sets induces a partitioning of the dispersion matrix of the variables into
dispersion matrices within sets and dispersion matrices between sets. Suppose S is
the dispersion matrix of all variables, and T is the direct sum of the within-set
dispersions, i.e. T is a block-matrix with on the diagonal the within-set dispersions,

and outside the diagonal blocks of zeroes. In ordinary canonical correlation analysis
T consists of two blocks along the diagonal that are nonzero, and two zero blocks
outside the diagonal. In principal component analysis T is the diagonal matrix of the
variances of the variables. Van der Burg et al. (1984, 1986) define the generalized
canonical correlations as the eigenvalues of m⁻¹T⁻¹S, where m is the number of
sets. In principal component analysis the generalized canonical correlations are the
eigenvalues of the correlation matrix, in ordinary canonical analysis they are
linearly related to the usual canonical correlation coefficients. Gifi (1981)
concentrates on techniques that maximize the sum of the p largest generalized
canonical correlation coefficients. These are, of course, functions of the correlation
coefficients between the variables. This means that we are dealing with a special case
of the previous set-up. But this special case is exceedingly important, because the
usual linear multivariate analysis techniques are all forms of generalized canonical
analysis.
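For the two-set case with one standardized variable per set, the definition can be checked in a few lines. The data below are a toy example; T is then the identity, m = 2, and the eigenvalues of m⁻¹T⁻¹S are (1 ± r)/2, linear in the ordinary correlation r as stated above:

```python
# Generalized canonical correlations for two sets of one standardized
# variable each: S = [[1, r], [r, 1]], T = I, m = 2, so the eigenvalues of
# (1/m) T^{-1} S are (1 + r)/2 and (1 - r)/2. Toy data, not from the text.
from math import sqrt

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]

def standardize(v):
    n = len(v)
    m = sum(v) / n
    s = sqrt(sum((a - m) ** 2 for a in v) / n)
    return [(a - m) / s for a in v]

x, y = standardize(x), standardize(y)
r = sum(a * b for a, b in zip(x, y)) / len(x)

# closed-form eigenvalues of the symmetric 2 x 2 matrix (1/2)[[1, r], [r, 1]]
gcc = sorted([(1 + r) / 2, (1 - r) / 2], reverse=True)
print(round(r, 2), [round(g, 2) for g in gcc])   # 0.8 [0.9, 0.1]
```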

MEASUREMENT LEVEL

In the examples we have discussed so far only two possible scalings of the
variables were mentioned. Either the quantification of the categories is known,
which is the case for measured or numerical variables, or the quantification is
completely unknown, and must be found by optimizing the value of the criterion.
Binary variables are special, because the quantification is unknown, but irrelevant.
The two cases 'completely known' and 'completely unknown' are too extreme in
many applications. We may be reasonably sure, for example, that the
transformation we are looking for is monotonic with the original ordering of the
target, which must be an ordered set in this case. Or we may decide that we are not
really interested in nonmonotonic transformations, because they would involve a
shift of meaning in the interpretation of the variable. If we predict optimally
transformed yield, for instance, and the optimal transformation has a parabolic
form, then we could say that we do not predict 'yield' but 'departure from average
yield'. In such cases it may make sense to restrict the transformation to be
increasing. The zooplankton example has shown that often monotonicities in the
data appear even when we do not explicitly impose monotonicity restrictions.
It is one of the major advantages of our algorithm that it generalizes very easily
to optimal scaling with ordinal or monotonic restrictions. It suffices to insert a
monotone regression operator MR(.) in step A2. Thus

For s = 1 to m:
    A1: compute qs = Σt≠s πst Gt yt,
    A2: compute ỹs = MR(Ds⁻¹Gs'qs),
    A3: compute ys by normalizing ỹs,
next s.

We do not explain monotone regression here, but we refer to Kruskal (1964) or


Gifi (1981) for details. The basic property we need is that monotone regression does
indeed give monotone quantifications, and that it gives the optimum from the set of
all such quantifications in each stage.
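One standard way to compute the MR(.) operator is the pool-adjacent-violators algorithm; the sketch below is a minimal unweighted-by-default version:

```python
# Monotone (isotonic) regression by pool-adjacent-violators: repeatedly
# merge adjacent blocks whose means violate the increasing order, then
# assign each original position its block mean.

def monotone_regression(y, w=None):
    """Least-squares increasing fit to y, with optional positive weights w."""
    w = w or [1.0] * len(y)
    blocks = [[v, wt, 1] for v, wt in zip(y, w)]   # mean, weight, size
    i = 0
    while i < len(blocks) - 1:
        if blocks[i][0] > blocks[i + 1][0] + 1e-12:          # violation: pool
            m0, w0, n0 = blocks[i]
            m1, w1, n1 = blocks[i + 1]
            blocks[i:i + 2] = [[(m0 * w0 + m1 * w1) / (w0 + w1),
                                w0 + w1, n0 + n1]]
            i = max(i - 1, 0)                                # re-check backwards
        else:
            i += 1
    # expand block means back to the original length
    return [m for m, _, size in blocks for _ in range(size)]

print(monotone_regression([1.0, 3.0, 2.0, 4.0]))   # [1.0, 2.5, 2.5, 4.0]
print(monotone_regression([5.0, 4.0, 3.0]))        # [4.0, 4.0, 4.0]
```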
By this modification of the algorithm we can now analyze at least three types of
variables. If we use the MR(.)-operator in A2 we impose monotonicity restrictions,
and consequently analyze ordinal variables. If we use the LR(.)-operator, which
performs a linear regression of the original values, then we analyze numerical
variables. And if we use IR(.), the identity operator, then we analyze nominal
variables. In the Legendre and Legendre scheme, discussed earlier, this
corresponds with (multi-state) ordered and nonordered variables, while the
numerical variables are called quantitative. It is now relatively easy to think of other
operators which can be used in A2. A very familiar one is PR(.), or polynomial
regression, which fits the optimal polynomial of specified degree. Another one,
which is somewhat less familiar, but definitely more useful is SR(.), spline
regression. Splines will be discussed briefly below. As a final example we mention
SM(.), the linear smoother used by Breiman and Friedman (1985) in their
ACE-method. The ACE-methods are nonlinear multivariate analysis methods
which show great promise, but we do not have enough experience with them to
discuss them in any detail. We can also combine monotonicity with the spline or
polynomial constraints, and look for the optimal monotone spline or polynomial.
In order to illustrate these new concepts it is, perhaps, time to analyze a
somewhat larger example. We have chosen the nitrogen nutrition example from
Gittins (1985, chapter 11). Eight species of grass were given nitrogen treatments of
1, 9, 27, 81, and 243 ppm N by varying the amounts of NaN03 in a culture solution.
Individuals of each species were grown separately in pots under sand culture in an
unheated greenhouse using a split-plot experimental design. There were 5 blocks of
replications of the complete experiment, and consequently 5 x 5 x 8 = 200 individual
pots, which are the natural units in this case. The logarithm of the dry weight yield
after a growth period of two months is the outcome variable for this experiment.
We do not repeat the data here, but we refer the interested reader to Gittins (1985,
appendix A2).
From the point of view of data analysis the most interesting problem seems to be
to predict the yield from the knowledge of the species and the nitrogen treatment.

The situation is in some respects quite similar to the zooplankton example, because
there we also had two orthogonal variables SPECIES and POND that were used to
predict ABUNDANCE. The nature of the variables is quite different, however, in
this larger example. SPECIES is a nominal (or multi-state unordered) variable,
and NITRO, the amount of nitrogen, is a numerical (or measured) variable. But
NITRO takes on only the five discrete values 1, 9, 27, 81, and 243, and in this
respect it differs from the numerical variable YIELD, which can in principle take
on a continuum of possible values. In the Legendre and Legendre classification
NITRO is discontinuous quantitative, while YIELD is continuous quantitative. This
implies that the indicator matrix for YIELD is not very useful. Because of the
continuity of the variable each value will occur only once, and the indicator matrix
will be a permutation matrix, with the number of categories equal to the number of
observations. This will make it possible to predict any quantification of YIELD
exactly and trivially, and thus the result of our optimal scaling will be arbitrary and
not informative. If we want to apply indicator matrices to continuous variables, then
we have to group their values into intervals, that is we have to discretize them.
Discretizing can be done in many different ways, and consequently has some
degree of arbitrariness associated with it. Moreover if we plot the original variable
against the optimal quantified variable, then we always find a step function, because
by definition data values in the same interval of the discretization get the same
quantified value. Step functions are not very nice representations of continuous
functions. It is very difficult to recognize the shape of a function from its step
function approximation. On the other hand polynomials are far too rigid for
satisfactory approximation. This is the main reason for using splines in nonlinear
multivariate analysis. In order to define a spline we must first choose a number of
knots on the real line, which have a similar function as the discretization points for
step functions. We then fix the degree p of the spline. Given the knots and the degree
a spline is any function which is a polynomial of degree p between knots, and which
has continuous derivatives of degree p - 1 at the knots. Thus a spline can be a
different polynomial in each interval, but not arbitrarily different because of the
smoothness constraints at the knots, i.e. the endpoints of the intervals. For p = 0 this
means that the splines are identical with the step functions, that have steps at each of
the knots. For p = 1 splines are piecewise linear, and the pieces are joined
continuously at the knots. For p = 2 splines are piecewise quadratic, and
continuously differentiable at the knots, and so on. Thus step functions are special
splines. If we choose the knots in such a way that all data values are in one interval,
then we see that polynomials are also special cases. Thus SR(.) has step functions
and polynomials as special cases, and MSR(.), which is monotone spline
regression, includes ordinary monotone regression and monotone polynomials.
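As an illustration of what an SR(.) operator might look like for degree p = 1, a piecewise-linear spline regression can be written with a truncated-power (hinge) basis. The sketch below assumes NumPy; the basis parametrization and knot placement are illustrative choices, not those of any particular program.

```python
import numpy as np

def linear_spline_basis(x, knots):
    """Design matrix of a degree-1 spline: an intercept, a linear term, and
    one hinge function (x - k)_+ per knot. The fitted function is linear
    between knots and continuous (p - 1 = 0 derivatives) at the knots."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x] + [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(cols)

def spline_regression(x, y, knots):
    """Least-squares projection of y on the spline space: one version of SR(.)."""
    B = linear_spline_basis(x, knots)
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    return B @ coef
```

With an empty knot list the basis reduces to ordinary linear regression, and degree zero would recover step functions, in line with the special cases noted above.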

We now apply spline regression to the nitrogen example. The transformation


for YIELD is restricted to be a piecewise linear spline, with knots at 0, .25, ... ,
2.25. Transformations for SPECIES and NITRO are not restricted. If we use
integer coding for SPECIES, the values 1, 9, 27, 81, 243 for NITRO, and the
original data values for YIELD, we find r(S,Y) = -.47 and r(N,Y) = .42. The
squared multiple correlation (SMC) is .3960. With optimal transformation, as
specified above, we find an SMC of .7816. The optimal transformation of
SPECIES is
(-1.55 -1.31 -0.82 +0.91 +0.78 +0.71 +1.10 +0.18),
and that of NITRO is
(+1.88 -0.06 -0.14 -0.77 -0.92).
Observe that the NITRO scaling is monotonic, but not at all linear. The
transformation for YIELD is plotted in Figure 2a. We see that it is roughly
monotonic, except for eight pots with small values of yield (less than .50). In fact it
is close to linear: the correlation between original and transformed values is -.9694.
An inspection of the data, and of the analysis of Gittins in his chapter 11, shows that
it is perhaps not entirely reasonable to use the same NITRO transformation for each
species. Species 1, 2, and 3 have very similar behaviour, and average YIELD values
are nicely monotonic with NITRO, but the other species react much less clearly to
the nitrogen treatments. For this reason we have repeated the analysis with two
variables. The first one is an interactive combination of SPECIES and NITRO, with
40 categories, and the second one is YIELD. Quantifications of SPECIES x NITRO
are derived from the indicator matrix, with 40 columns, and quantifications of
YIELD by using the same piecewise linear splines as before. The transformed
YIELD is in Figure 2b. It is still almost monotonic, but less linear than the previous
transformation. The correlation between observed and transformed values is down
to -.9094, the SMC is up to .9339. Figure 3 shows the quantification of SPECIES x
NITRO, plotted as eight separate transformations, one for each species. We clearly
see the difference between the first three species and the other ones, presumably a
difference in sensitivity to the nitrogen content. A clustering of species that suggests
itself is [{1,2,3},{4,5,6},{7,8}].

THE USE OF COPIES

By combining the various criteria with the various options for measurement
levels we get a very large number of multivariate analysis techniques. Nevertheless
there are some very common techniques, which are still not covered by our
developments. The major example is multiple correspondence analysis (also known

[Figure not reproduced: transformed values plotted against the original data.]
Figure 2a. Yield transform, additive model.

[Figure not reproduced: transformed values plotted against the original data.]
Figure 2b. Yield transform, interactive model.


[Figure not reproduced: eight curves plotted against category numbers.]
Figure 3. Nitrogen data: optimal NITRO transformations for eight species.

as homogeneity analysis, or Guttman's principal components of scale analysis). For


the details and history of this technique we refer to Nishisato (1980, chapter 5), Gifi
(1981, chapter 3), Lebart, Morineau, and Warwick (1984, chapter 4), and
Greenacre (1984, chapter 5). In ecology multiple correspondence analysis was
already discussed by Hill (1973, 1974), and it is closely related to the popular
ordination method called reciprocal averaging. We derive the technique here as a
form of generalized canonical analysis.
First suppose that we want to find quantifications or transformations of the
variables in such a way that the largest eigenvalue of the correlation matrix (i.e. the
percentage of variance 'explained' by the first dimension) is maximized. We
illustrate this with the zooplankton example, using the ponds as variables ordering
the eight species. As indicated by Hill (1974) this amounts to solving the eigenvalue
problem

Cx = mμDx.

Here C is the supermatrix containing all cross tables Cst. This optimal scaling
problem was originally formulated and solved by Guttman (1941). Matrix C is
called the Burt table in the French correspondence analysis literature. Matrix D is
the diagonal of C, and m is the number of variables. The category quantifications Yt
are found by normalizing the m subvectors of the eigenvector x corresponding with
the dominant nontrivial eigenvalue. In the zooplankton example C is of order 25,
because there are five variables with five categories each. The largest eigenvalue,
which was 3.41 with integer scaling, goes up to 3.70 with optimal scaling. The
percentage variance 'explained' goes from 68% to 74%. Table 5a gives the optimal
quantifications for the five variables. They are quite regular and close to
monotonic, but distinctly nonlinear.
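The generalized eigenvalue problem for the Burt table described above can be sketched in a few lines. The code below is an illustrative reconstruction (NumPy assumed, integer-coded toy data, hypothetical function names), not the program actually used for the zooplankton analysis; the problem is symmetrized with D^{-1/2} so that a standard symmetric eigensolver applies.

```python
import numpy as np

def indicator(v, k):
    """n x k indicator matrix G of an integer-coded variable (categories 0..k-1)."""
    G = np.zeros((len(v), k))
    G[np.arange(len(v)), v] = 1.0
    return G

def burt_eigen(variables, n_categories):
    """Solve Cx = m*mu*Dx, with C the Burt table of all cross tables,
    D its diagonal (the category marginals), and m the number of variables.
    Returns the eigenvalues in descending order and the eigenvectors x,
    whose m subvectors give the category quantifications after normalizing."""
    G = np.hstack([indicator(v, k) for v, k in zip(variables, n_categories)])
    C = G.T @ G                                  # Burt table
    d = np.diag(C).astype(float)                 # diagonal of C
    m = len(variables)
    S = C / (m * np.sqrt(np.outer(d, d)))        # D^{-1/2} C D^{-1/2} / m
    mu, Z = np.linalg.eigh(S)
    order = np.argsort(mu)[::-1]
    return mu[order], (Z / np.sqrt(d)[:, None])[:, order]
```

The largest eigenvalue μ = 1 is the trivial solution with constant quantifications; the dominant nontrivial eigenvector gives the optimal scaling. Unobserved categories (zero marginals) would need to be dropped first.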
There are now at least three ways in which the problem can be made
multidimensional. In the first place we can compute the induced correlation matrix
R, and find its subsequent eigenvalues and eigenvectors as in ordinary metric
component analysis. This is straightforward. In the second place we can change the
criterion to a multidimensional one. Thus we can maximize the sum of the first two,
or the sum of the first three eigenvalues of the correlation matrix. In general this
will give different correlation matrices, and different eigenvalue distributions. We
illustrate this for the sum of the first two eigenvalues in the zooplankton example. In
the previous solution, which optimized the largest eigenvalue, the first two
eigenvalues 'explained' 74% and 14%. If we optimize the sum of the two largest
eigenvalues we find 'explained' variances of 56% and 44%. The optimal
quantifications in Table 5b make the transformed data matrix exactly of rank two.

In order to obtain this perfect fit, the technique transforms variables 3 and 4 in a
somewhat peculiar way.
The third way of finding a multidimensional solution is quite different. It simply
computes additional eigenvalues and eigenvectors of the pair (C,mD). This defines
multiple correspondence analysis. The technique was introduced in psychometrics
by Guttman and Burt (Guttman 1941, 1950, 1953, Burt 1950, 1953). Each
eigenvector now defines a vector of category quantifications, which induces a
correlation matrix. In Table 5c, for example, we give the quantifications
corresponding with the second eigenvalue of (C,mD), which is 2.55. The
correlation matrix that goes with these quantifications has a dominant eigenvalue
'explaining' 51% of the variance, and a subdominant one 'explaining' 35%. The
quantifications in Table 5c look peculiar. We could go on, of course, by using
additional eigenvalues of (C,mD).
If one thinks about this a little bit, then it is somewhat disconcerting. The
multiple

Table 5. Nonlinear Principal Component Analysis.


5a. Quantifications maximizing the largest eigenvalue.
5b. Quantifications maximizing the sum of the two
largest eigenvalues.
5c. Second dimension multiple correspondence analysis.

category 1 2 3 4 5

variable 1 .77 .00 .00 -1.29 -1.29


variable 2 .85 .00 .65 -1.29 -1.29
variable 3 -.69 .00 .67 .58 2.20
variable 4 -.96 .69 .51 1.33 .00
variable 5 -.96 .00 .57 1.35 .00

variable 1 -.77 .00 .00 1.29 1.29


variable 2 -.77 .00 -.77 1.29 1.29
variable 3 .38 .00 -2.64 .38 .38
variable 4 -.38 -.38 2.64 -.38 .00
variable 5 1.00 .00 -1.00 -1.00 .00

variable 1 .77 .00 .00 -1.29 -1.29


variable 2 1.07 .00 -1.51 -.07 -.07
variable 3 -.21 .00 2.41 -1.40 .03
variable 4 -.42 -.10 2.63 -.42 .00
variable 5 .98 .00 -1.29 -.60 .00

correspondence problem in general has Σ (kt - 1) nontrivial eigensolutions, which



give an equal number of induced correlation matrices. Applying ordinary metric


principal component analysis to each of these correlation matrices gives m times Σ
(kt - 1) dimensions. In the zooplankton example there are thus 5 x (4 + 4 + 4 + 4 + 4)
= 100 dimensions. This is a bit much. Gifi (1981) calls this data production, to
contrast it with the more common and more desirable concept data reduction.
Careful mathematical analysis (Gifi 1981, chapter 11, De Leeuw 1982, Bekker
1986) shows that in many cases there are mathematical relationships between the
different dimensions, so that they are not independent. This is probably familiar to
most ecologists as the horseshoe or Guttman effect, which makes the second
ordination dimension a curved function of the first one. Remember that Noy-Meir
and Whittaker (1978) already mentioned the curving of the dimensions as an
important problem for multivariate ordination, and that Hill and Gauch (1980)
consider this curvature problem the main shortcoming of correspondence analysis
as an ordination technique.
From the principal component point of view multiple correspondence analysis
does not solve an optimal scaling problem in the same sense as the other techniques
we have discussed. The eigen-equations for (C,mD) are the stationary equations for
finding the quantifications optimizing the largest eigenvalue, but additional
solutions of these stationary equations only define suboptimal stationary values for
this problem. The natural multidimensional generalization of nonlinear principal
component analysis is finding a single set of quantifications that maximizes the sum
of the first p eigenvalues, and for this problem there are no horseshoe-like
complications. On the other hand it is possible to interpret multiple correspondence
analysis as a form of generalized canonical analysis. If we think of each category as
a binary variable, while the original variables define sets of these binary variables,
then a generalized canonical analysis of these m sets is identical to multiple
correspondence analysis. With binary variables there is nothing to transform or
quantify, and thus we have an essentially linear technique applied to indicator
matrices.
A somewhat more satisfactory description is possible by introducing the notion
of copies (De Leeuw 1984a). This also means that we define sets of variables using
the original m variables, but now a variable is not split up into categories. If we are
interested in a two-dimensional solution, for instance, we take two copies of each
variable in each of the m sets. We then optimize the sum of the first two generalized
canonical correlations over quantifications. Thus a set consists of two identical
variables, identical in the sense that the functions φ1 and φ2, mapping n into Γ1 =
Γ2, are the same. Of course the quantifications ψ1 and ψ2 can be different, and
because the variables are in the same set they will generally be different at the
optimum of the criterion. In fact the two quantifications can without loss of

generality be chosen to be orthogonal, i.e. we can require y1'Dy2 = 0. Using p


copies of a variable to define m sets of p variables in this way defines multiple
correspondence analysis as a special case of generalized canonical correlation
analysis.
But this way of looking at things immediately suggests several useful
generalizations. In the first place we can use a different number of copies for
different variables. It is reasonable, in many cases, to use copies for unordered
multi-state nominal variables only, and to use a single copy for ordinal variables. In
the second place the notion of copies can be combined with the various measurement
levels we have discussed above. Thus we can require copies to be monotonic (in that
case they cannot also be required to be orthogonal), or we can require that some
copies are monotonic, while others are free. If there are two copies of a variable in a
set, we can require the first one to be linear, and the second one to be free. And so
on. This is again a decision about the coding of a variable. For each variable we have
to decide what measurement level we impose, and we also have to decide how many
copies of the variable we use. We do not illustrate the use of copies with our
zooplankton example, because the solution using the first multiple correspondence
analysis dimension (which optimizes the largest eigenvalue of the correlation
matrix) is already monotonic, and quite satisfactory. Using rather complicated
procedures on such a small example is bound to produce trivial and uninteresting
solutions, as the technique that maximizes the sum of the two largest eigenvalues
already shows.
The notion of copies is not limited to principal component analysis, i.e. to a
generalized canonical correlation problem with only one variable in each set. In
other forms of canonical analysis we can use copies as well. In fact we can even
decide to include copies of a variable in different sets. If we include a copy in each
set, then the largest generalized canonical correlation will be unity, and it will be
defined completely by this (quantified) variable. The remaining canonical variables
will be orthogonal to the first, i.e. to this quantified variable. Thus using a copy of a
variable in each set amounts to performing a partial canonical correlation analysis,
with the variables of which copies are used in the sets partialed out. Combining
partitioning into sets with the various measurement levels, and with the notion of
copies, gives an even richer class of techniques (De Leeuw 1984b).

SOME COMPUTER PROGRAMS

It is nice to have a number of principles and technical tools that can be used to
create very general nonlinear multivariate analysis techniques. But it is perhaps

even nicer to know that some of the possible options have already been combined
into various series of computer programs, and that these programs are readily
available. The ALSOS series of programs comprises programs for analysis of
variance, multiple regression, principal component analysis, factor analysis, and
multidimensional scaling. An overview is given by Young (1981). The GIFI series
has programs for correspondence analysis, multiple correspondence analysis,
principal component analysis, canonical correlation analysis, path analysis, and
multiple-set canonical analysis. Gifi (1981) has the necessary references. A relative
newcomer is the ACE series, discussed in Breiman and Friedman (1985). There are
programs for multiple regression, discriminant analysis, time series analysis, and
principal component analysis.
The three series of nonlinear multivariate analysis programs differ in many
respects, even if they really implement the same technique. The various possibilities
of choosing the regression operators differ, the algorithms differ, and the input and
output can also be quite different. But it is of course much more important to
emphasize what they have in common. All three series generalize existing linear
multivariate analysis techniques by combining them with the notion of optimal
scaling or transformation. Thus they make them more nonparametric and less
model-based, more exploratory and less confirmatory, more data analytic and less
inferential.

DISCUSSION AND CONCLUSION

We have introduced our nonlinear multivariate analysis techniques without


referring to any statistical model. As we briefly indicated in an earlier section our
derivations and ideas also apply directly to correlations defined in the population,
i.e. to the transformation or quantification of random variables. In the book by Gifi
(1981) many population models are discussed, and the behaviour of our techniques
when they are applied to random samples from such models is also analyzed. For the
population models we also refer to Breiman and Friedman (1985) and their
discussants, to De Leeuw (1982), and to Schriever (1985). The statistical stability of
our techniques can be studied by using asymptotic techniques such as the delta
method, and the modern resampling techniques such as the jackknife and bootstrap.
Gifi (1981) gives examples. Also compare De Leeuw (1984c). Observe that stability
is an important consideration here, because we fit many parameters. We must guard
against chance capitalization, i.e. against the possibility that our results and our
interpretations are based on haphazard properties of the sample. Techniques of
testing the stability (or significance) of generalized canonical correlations have been

discussed by De Leeuw and Van der Burg (1986). Although these techniques for
analyzing stability are often expensive computationally, we think that in almost all
cases the extra computations are quite worthwhile. A confidence band around a
nonlinear transformation, or a confidence ellipsoid around a plane projection give
useful additional information, even if the random sampling assumptions do not seem
to apply.
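As a sketch of the percentile bootstrap idea behind such a stability assessment (plain Python; the helper names are ours, not from any of the cited programs), resampling pairs of original and transformed values gives an interval for their correlation:

```python
import random

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0.0 or sy == 0.0:
        return 0.0           # degenerate resample: correlation undefined
    return sxy / (sx * sy)

def bootstrap_interval(x, y, stat=corr, n_boot=1000, alpha=0.05, seed=1):
    """Percentile bootstrap: resample pairs with replacement, recompute the
    statistic, and take the alpha/2 and 1 - alpha/2 empirical quantiles."""
    rng = random.Random(seed)
    n = len(x)
    reps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        reps.append(stat([x[i] for i in idx], [y[i] for i in idx]))
    reps.sort()
    return reps[int(alpha / 2 * n_boot)], reps[int((1 - alpha / 2) * n_boot) - 1]
```

The same resampling loop can wrap an entire optimal scaling analysis, at the cost of rerunning it for each bootstrap sample, which is the computational expense referred to above.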
Books such as Legendre and Legendre (1983), Gauch (1982), and Gittins (1985)
have already shown ecologists that linear multivariate analysis techniques, if
applied carefully, and by somebody having expert knowledge of the subject area in
question, can be extremely helpful and powerful tools. It seems to us that combining
multivariate exploration with automatic reexpression of variables is an even more
powerful tool, which has already produced interesting results in many different
scientific disciplines. We think that they show great promise for ecology too, but we
must emphasize that perhaps even more care, and an even more expert knowledge
of the ecological problems, is required. Attacking very simple problems with very
powerful tools is usually unwise and sometimes dangerous. One does not rent a
truck to move a box of matches, and one does not use a chain saw to sharpen a
pencil. The techniques we have discussed in this paper are most useful in dealing
with large, relatively unstructured, data sets, in which there is not too much prior
information about physical or causal mechanisms. In other cases, often better
techniques are available. But these other cases occur far less frequently than the
standard mathematical statistics or multivariate analysis texts suggest.

REFERENCES

AGRESTI, A. 1983. Analysis of Ordinal Categorical Data. John Wiley & Sons,
Inc., New York, NY.
ANDERSON, T.W. 1984. An Introduction to Multivariate Statistical Analysis.
(second edition). John Wiley & Sons, Inc., New York, NY.
BEKKER, P. 1986. A Comparison of Various Techniques for Nonlinear Principal
Component Analysis. DSWO-Press, Leiden, The Netherlands.
BENZECRI, J. P. ET AL. 1973. L'Analyse des Donnees. (2 vols). Dunod, Paris,
France.
BENZECRI, J.P. ET AL. 1980. Pratique de l'Analyse des Donnees. (3 vols).
Dunod, Paris, France.
BISHOP, Y.M.M., S.E. FIENBERG, AND P.W. HOLLAND. 1975. Discrete
Multivariate Analysis: Theory and Practice. MIT Press, Cambridge, MA.
BREIMAN, L., AND J.H. FRIEDMAN. 1985. Estimating Optimal Transformations
for Multiple Regression and Correlation. J. Am. Statist. Assoc. 80: 580-619.
BURT, C. 1950. The Factorial Analysis of Qualitative Data. British J. Psychol.
(Statist. Section) 3: 166-185.


BURT, C. 1953. Scale Analysis and Factor Analysis. Comments on Dr. Guttman's
Paper. British J. Statist. Psychol. 6: 5-23.
CAILLIEZ, F., AND J.-P. PAGES. 1976. Introduction a l'Analyse des Donnees.
SMASH, Paris, France.
CARROLL, J.D., AND P. ARABIE. 1980. Multidimensional Scaling. Ann. Rev.
Psychol. 31: 607-649.
DE LEEUW, J. 1973. Canonical Analysis of Categorical Data. Unpublished
dissertation. Reissued DSWO-Press, Leiden, The Netherlands, 1984.
DE LEEUW, J. 1982. Nonlinear Principal Component Analysis. In H. Caussinus et
al. [eds.] COMPSTAT 82. Physika Verlag, Wien, Austria.
DE LEEUW, J. 1984a. The Gifi System of Nonlinear Multivariate Analysis. In E.
Diday et al. [eds.] Data Analysis and Informatics IV. North Holland Publishing
Co, Amsterdam, The Netherlands.
DE LEEUW, J. 1984b. Beyond Homogeneity Analysis. Report RR-84-08,
Department of Data Theory, University of Leiden, The Netherlands.
DE LEEUW, J. 1984c. Statistical Properties of Multiple Correspondence Analysis.
Report RR-84-07, Department of Data Theory, University of Leiden, The
Netherlands.
DE LEEUW, J. 1986. Multivariate Analysis with Optimal Scaling. Report
RR-86-01, Department of Data Theory, University of Leiden, The Netherlands.
DE LEEUW, J., AND W.J. HEISER. 1982. Theory of Multidimensional Scaling. In
P.R.Krishnaiah and L. Kanal [eds.] Handbook of Statistics II. North Holland
Publishing Co, Amsterdam, The Netherlands.
DE LEEUW, J., AND J. MEULMAN. 1986. Principal Component Analysis and
Restricted Multidimensional Scaling. In W. Gaul, and M. Schader [eds.]
Classification as a Tool of Research. North Holland Publishing Co.,
Amsterdam, The Netherlands.
DE LEEUW, J., AND E. VAN DER BURG. 1986. The Permutation Distribution of
Generalized Canonical Correlations. In E. Diday et al. [eds.] Data Analysis and
Informatics V. North Holland Publishing Co., Amsterdam, The Netherlands.
DE LEEUW, J., P. VAN DER HEIJDEN, AND I. KREFT. 1984. Homogeneity
Analysis of Event-history data. Methods of Operations Research 50: 299-316.
DE LEEUW, J., AND J. VAN RIJCKEVORSEL. 1980. HOMALS and PRINCALS:
some Generalizations of Principal Component Analysis. In E. Diday et al. [eds.]
Data Analysis and Informatics. North Holland Publishing Co., Amsterdam, The
Netherlands.
DE LEEUW, J., J. VAN RIJCKEVORSEL, AND H. VAN DER WOUDEN. 1981.
Nonlinear Principal Component Analysis using B-splines. Methods of
Operations Research 23: 211-234.
DE LEEUW, J., F.W. YOUNG, AND Y. TAKANE. 1976. Additive Structure in
Qualitative Data. Psychometrika 41: 471-503.
FISHER, R.A. 1941. Statistical Methods for Research Workers. (8th edition).
Oliver and Boyd, Edinburgh, Scotland.
GAUCH, H.G. 1982. Multivariate Analysis in Community Ecology. Cambridge
University Press, Cambridge, G.B.

GIFI, A. 1981. Nonlinear Multivariate Analysis. Department of Data Theory FSW,


University of Leiden, The Netherlands. To be reissued by DSWO-Press, 1987.
GITTINS, R. 1985. Canonical Analysis. A Review with Applications in Ecology.
Springer, Berlin, BRD.
GNANADESIKAN, R., AND J.R. KETTENRING 1984. A Pragmatic Review of
Multivariate Methods in Applications. In H.A. David and H.T. David [eds.]
Statistics: an Appraisal. Iowa State University Press, Ames, Iowa.
GREENACRE, M.J. 1984. Theory and Applications of Correspondence Analysis.
Academic Press, New York, NY.
GUTTMAN, L. 1941. The Quantification of a Class of Attributes: a Theory and
Method of Scale Construction. In P. Horst [ed.] The Prediction of Personal
Adjustment. Social Science Research Council, New York, NY.
GUTTMAN, L. 1950. The Principal Components of Scale Analysis. In S.A.
Stouffer et al. [eds.] Measurement and Prediction. Princeton University Press,
Princeton, NJ.
GUTTMAN, L. 1953. A Note on Sir Cyril Burt's "Factorial Analysis of Qualitative
Data". British J. Statist. Psychol. 6: 1-4.
GUTTMAN, L. 1959. Introduction to Facet Design and Analysis. Proc. 15th Int.
Congress Psychol. North Holland Publishing Co, Amsterdam, The Netherlands.
HABERMAN, S.J. 1979. Analysis of Qualitative Data. (2 vols.). Academic Press,
New York, NY.
HEISER, W.J. 1981. Unfolding Analysis of Proximity Data. Department of Data
Theory, University of Leiden, The Netherlands.
HEISER, W.J. 1986. Shifted Single-peakedness, Unfolding, Correspondence
Analysis, and Horseshoes. This volume.
HILL, M.O. 1973. Reciprocal Averaging: an Eigenvector Method of Ordination. J.
Ecology 61: 237-251.
HILL, M.O. 1974. Correspondence Analysis: a Neglected Multivariate Method.
Appl. Statist. 23: 340-354.
HILL, M.O., AND H.G. GAUCH. 1980. Detrended Correspondence Analysis, an
Improved Ordination Technique. Vegetatio 42: 47-58.
HOFFMAN, D.L., AND F.W. YOUNG. 1983. Quantitative Analysis of Qualitative
Data: Applications in Food Research. In H. Martens and H. Russwurm Jr. [eds]
Food Research and Data Analysis. Applied Science Publishers, London, GB.
KOYAK, R. 1985. Nonlinear Dimensionality Reduction. Unpublished Ph.D.
Thesis. Department of Statistics, University of California, Berkeley, CA.
KRUSKAL, J.B. 1964. Multidimensional Scaling by Optimizing Goodness-of-Fit to
a Nonmetric Hypothesis. Psychometrika 29: 1-28.
LEBART, L., A. MORINEAU, AND K.M. WARWICK. 1984. Multivariate
Descriptive Statistical Analysis. John Wiley and Sons, Inc., New York, NY.
LEGENDRE, L., AND P. LEGENDRE. 1983. Numerical Ecology. Elsevier
Scientific Publishing Co, Amsterdam, The Netherlands.
MAYR, E. 1932. Birds Collected during the Whitney South Sea Expedition. Amer.
Museum Novitates 20: 1-22.
MEULMAN, J. 1982. Homogeneity Analysis of Incomplete Data. DSWO-Press,
Leiden, The Netherlands.

MUIRHEAD, R.M. 1983. Aspects of Multivariate Statistical Theory. John Wiley


and Sons, Inc., New York, NY.
NISHISATO, S. 1980. The Analysis of Categorical Data. Dual Scaling and its
Application. University of Toronto Press, Toronto, Can.
NOY-MEIR, I., AND R.H. WHITTAKER. 1978. Recent Developments in
Continuous Multivariate Techniques. In R.H. Whittaker [ed.] Ordination of
Plant Communities. Dr. W. Junk BV, The Hague, The Netherlands.
PERREAULT JR., W.D., AND F.W. YOUNG. 1980. Alternating Least Squares
Optimal Scaling: Analysis of Nonmetric Data in Marketing Research. J.
Marketing Research 17: 1-13.
SCHRIEVER, B.F. 1985. Order Dependence. Mathematical Centre, Amsterdam,
The Netherlands.
TAKANE, Y., F.W. YOUNG, AND J. DE LEEUW. 1979. Nonmetric Common
Factor Analysis. Behaviormetrika 6: 45-56.
TAKANE, Y., F.W. YOUNG, AND J. DE LEEUW. 1980. An Individual
Differences Additive Model. Psychometrika 45: 183-209.
VAN DER BURG, E. 1984. Homals Classification of Whales, Porpoises and
Dolphins. In J. Janssen et al. [eds.] New Trends in Data Analysis and
Applications. North Holland Publishing Co., Amsterdam, The Netherlands.
VAN DER BURG, E., AND J. DE LEEUW. 1983. Nonlinear Canonical
Correlation. British Journal of Mathematical and Statistical Psychology 36:
54-80.
VAN DER BURG, E., J. DE LEEUW, AND R. VERDEGAAL. 1984. Non-linear
Canonical Correlation with M Sets of Variables. Report RR-84-12, Department
of Data Theory, University of Leiden, The Netherlands.
VAN DER BURG, E., J. DE LEEUW, AND R. VERDEGAAL. 1986.
Homogeneity Analysis with k Sets of Variables. Accepted for Publication.
VAN RIJCKEVORSEL, J. 1982. Canonical Analysis with B-splines. In H.
Caussinus et al. [eds.] COMPSTAT 82. Physika Verlag, Wien, Austria.
VAN RIJCKEVORSEL, J., AND G. VAN KOOTEN. 1985. Smooth PCA of
Economic Data. Computational Statistics Quarterly 2: 143-172.
VAN RIJCKEVORSEL, J., AND J. WALTER. 1983. An Application of two
Generalizations of Nonlinear Principal Components Analysis. In J. Janssen et al.
[eds.] New Trends in Data Analysis and Applications. North Holland Publishing
Co., Amsterdam, The Netherlands.
YOUNG, F.W. 1981. Quantitative Analysis of Qualitative Data. Psychometrika 46:
347-388.
YOUNG, F.W., J. DE LEEUW, AND Y. TAKANE. 1976. Regression with
Qualitative and Quantitative Variables. Psychometrika 41: 505-529.
YOUNG, F.W., J. DE LEEUW, AND Y. TAKANE. 1980. Quantifying Qualitative
Data. In E.D. Lantermann and H. Feger [eds.] Similarity and Choice. Huber
Verlag, Bern, Schweiz.
YOUNG, F.W., Y. TAKANE, AND J. DE LEEUW. 1978. The Principal
Components of Mixed Measurement Level Multivariate Data. Psychometrika
43: 279-281.
JOINT ORDINATION OF SPECIES AND SITES:
THE UNFOLDING TECHNIQUE

Willem J. Heiser
Department of Data Theory
University of Leiden

Middelstegracht 4, 2312 TW Leiden


The Netherlands

Abstract - Several different methods of gradient analysis, including correspondence analysis and
Gaussian ordination, can be characterized as unfolding methods. These techniques are applicable
whenever single-peaked response functions are at issue, either with respect to known environmental
characteristics or else with respect to data-driven reorderings of the sites. Unfolding gives a joint
representation of the site/species relationships in terms of the distance between two types of
points, the location of which can be constrained in various ways. A classification based on loss
functions is given, as well as a convergent algorithm for the weighted least squares case.

1. INTRODUCTION

Ordination and clustering methods all rely on the concept of distance and some kind of
reduction principle in order to facilitate the analysis of structures in data. Usually, this requires the
choice of some measure of ecological resemblance as a first step, either between objects (individ-
uals, samples), or between attributes (species, descriptors). Then in ordination the aim is finding a
reduced space that preserves distance, i.e. reduction of dimensionality, and in cluster analysis the
aim is allocating the units of analysis to a reduced number of (possibly hierarchically organised)
classes, i.e. reduction of within-group distance with respect to between-group distance.
This paper will be centered at a third type of method, also based on distance and reduction,
but not relying on derived associations or derived dependencies. It is particularly suited for the
analysis of species x samples presence-absence or abundance data; or, perhaps somewhat more
generally, for any ecological data matrix that is dimensionally homogeneous (Legendre and
Legendre 1983), and non-negative. In psychology, where its early developments took place in the
context of the analysis of individual choice behavior and differential preference strength, the group
of methods is called unfolding (Coombs 1950, 1964). Since the word "unfolding" aptly describes
the major aim of the technique, it will be used as a generic name throughout this paper.
In order to outline the objectives of unfolding in ecological terms, the first thing to notice is
that the basic notion of ecological resemblance need not be confined to distance defined on pairs of
units from a single set. If it is assumed that for each species there is a unique combination of the
levels or states of the environmental variables that optimizes its possibilities to survive, perhaps to
be called its ideal niche, and that the sampling sites approximate these ideal circumstances to

NATO ASI Series, Vol. G14
Developments in Numerical Ecology
Edited by P. and L. Legendre
© Springer-Verlag Berlin Heidelberg 1987

different degrees, then species abundance might be supposed to level off monotonically with the
distance of a sampling site from the ideal niche. Here distance could be understood as concrete,
geographical distance, or as distance in some abstract space. In the latter case the samples are to be
arranged in an orderly fashion, along a gradient, reflecting the gradual changes in environmental or
community characteristics. Now the unfolding technique seeks to find precisely those gradients that
yield single-peaked response functions, i.e. it seeks a reduction to (low-dimensional) unimodality.
Psychologists study objects called stimuli, want to arrange them along stimulus scales, and one of
the major response classes available to them is preference. In these terms, the unfolding technique
aims at finding those stimulus scales that yield single-peaked preference functions.
Coombs developed his form of unfolding in an attempt to resolve a notorious problem in
psychology, i.e. the problem of defining a psychological unit of measurement (Coombs 1950).
How can we quantify human judgement without recourse to an arbitrary grade-point system? The
ecological equivalent of this issue would be: how can we quantify the differential reactions of
species to the environment without capitalizing on the pseudo-exact numerical aspects of
abundance? The answer unfolding has to offer is through the study of consistency (or scalability)
of the behavioral reactions under the condition of single-peakedness.
The first goal of this paper is to convince the reader that the unfolding technique is the natural
general-purpose first candidate for gradient analysis. However, there exists plenty of scope for
making more specific assumptions than has been done so far, and hence several rather different
methods are to be considered as members of the family. Therefore, a second goal is to try to
organize the field a little by comparing the various loss functions on which these methods are
based, and by showing the interrelations between various special cases. The third goal is to present
explicit computational formulas for a convergent unfolding algorithm, and to sketch a few open
problems and lines of development.

2. NON-LINEARITIES: A MIXED BLESSING

2.1. Indications for unimodality in ecology and elsewhere

The importance of single-peaked, or unimodal, response curves and surfaces stems from a
diversity of scientific areas, ecology being one of the richest sources. Frequently a linear analysis
of contingencies showed unexpected nonlinearities, or sometimes regression plots of abundance or
cover against carefully chosen a priori gradients were unmistakably bell-shaped. Ihm and van
Groenewoud (1984) summarize the early evidence from vegetation studies as follows: "Already
Goodall (1954) in one of the first applications of PCA to the analysis of vegetation data noted the
problem caused by the nonlinearity of quantitative species relationships in the interpretation of the
principal components. Knowledge about the non-linearity of gradient response was, however, not
new. Braun-Blanquet and Jenny (1926) investigated the pH-value of soils in which several species,
e.g. Carex curvula (L) and others, were growing in the Swiss Alps and England. They found
normal frequency curves for these pH-values. Making the assumption of a uniform distribution of
the pH-values - at least in the range of growth of the species studied - one could conclude that also
the gradient response was Gaussian. It appears the bell-shaped gradient response curves were first
suggested by Igoshina (1927). Gause (1930) studied the abundance of certain species as related to
ecological conditions and found that they followed the law of Gauss. The ordination work by
Curtis and McIntosh (1951), Bray and Curtis (1957), Cottam and Curtis (1956), Whittaker (1948)
and many others all showed the non-linearity of species-site factor relationships. Especially the
published examples of gradient responses clearly show the unimodal type of the response curves."
(l.c., p. 13). For many additional references, see Gauch (1982) and Whittaker (1978).
The first articulated unimodal response model in psychology was proposed by Thurstone
(1927), building upon nineteenth century work on sensory discrimination. He claimed wider
applicability, e.g. as a model for attitude and opinion, but later on abandoned the subject. Hovland,
Harvey and Sherif (1957) undertook additional experimental work, and provided convincing
evidence for single-peakedness in human evaluative responses. In factor analyses of personality
tests one frequently found nonlinearities called - by lack of a full understanding - 'difficulty
factors'. Coombs and Smith (1973) and Davison et al. (1980) studied unimodal developmental
processes, and a classic example of single-peaked behavior is preference for family compositions in
terms of number of children and bias towards boys or girls (e.g., Coxon 1974). Yet the phenom-
enon is not very actively studied anymore in psychology, not nearly as much as its special case:
monotonicity.
At this point, it might be useful to emphasize that it is not unimodality alone, but the fact that
the peaks of the curves are shifted with respect to each other which makes the situation special. For
imagine a number of unimodal curves precisely on top of each other, then any transformation of the
gradient would provide the same information; thus one could make the curves more skewed,
double-peaked, monotonically increasing, or indeed of any conceivable shape by suitable re-
expressions of the values against which they are plotted. When the curves are shifted along the
gradient, this freedom of simultaneous change of shape is reduced enormously.
The early contributions to ordination by the famous archaeologist Flinders Petrie, source of
inspiration for Kendall (1963) and much subsequent work in archaeological seriation (cf. Hodson
et al. 1971), were typically not tailored to the precise shape of the artifact distributions, but
primarily to the fact that they should form an overlapping sequence of 'present' counts if the sites
were properly ordered (presumably in time). Roberts (1976, section 3.4) has given an interesting
graph-theoretical characterization of this ordering problem.

Summarizing, we might say that unimodality is a firmly established empirical phenomenon,
that it is only visible when the gradients are carefully chosen, and finally that linear methods like
principal components analysis (PCA) will distort expected gradients in a nonlinear fashion (Swan
1970; Noy-Meir and Austin 1970). Because these distortions can have widely different forms -
depending on such things as the dimensionality of the gradient, the homogeneity of the species and
sample variances, and the variability of maximum abundance - it is hazardous to rely on the
standard PCA approach, and there is clearly a need for specialized nonlinear methods.

2.2. Nonlinear data transformations

If a bivariate distribution of data points is curved, we can straighten it out by transforming
one or both of the variables. For instance, if the cloud "accelerates" from left to right, a log
transformation of the vertical axis will remove, or mitigate, the acceleration. This is called
linearizing the regression. If all bivariate distributions among m variables are considered
simultaneously, it will generally be necessary to use different transformations to linearize the
regression as much as possible on the average. This is one of the major objectives in the Gifi
system of nonlinear multivariate analysis; for a full explanation see de Leeuw (1987a).
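As a toy numerical illustration of linearizing the regression (the numbers are invented and not from the Gifi literature), a log transformation of the vertical axis turns an exponentially accelerating cloud into an exactly linear one:

```python
import math

# A cloud that "accelerates" from left to right: y grows exponentially with x.
# Taking logs of the vertical axis linearizes the regression perfectly here.
x = [float(k) for k in range(11)]
y = [math.exp(0.5 * xk) for xk in x]

def pearson(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    su = math.sqrt(sum((ui - mu) ** 2 for ui in u))
    sv = math.sqrt(sum((vi - mv) ** 2 for vi in v))
    return cov / (su * sv)

r_raw = pearson(x, y)                             # curved relationship: r < 1
r_log = pearson(x, [math.log(yk) for yk in y])    # log(y) = 0.5*x: exactly linear
```

With the exponential cloud the raw correlation falls clearly short of 1, while after the transformation it is 1 up to rounding, which is what "linearizing the regression" buys.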
Under the assumption of shifted single-peaked response curves and surfaces, we don't
expect to find linear bivariate relationships (cf. Greig-Smith 1983, who has clearly summarized the
peculiar shapes one can obtain). Perhaps not too surprisingly, then, the approach using nonlinear
data transformations towards linearity turns out to be a move in the wrong direction in this case
(Heiser 1985a), giving more extreme curvature and convolutions than a linear PCA. The previous
statement deserves a qualification, because it is only true when the class of transformations, or
admissible reexpressions, of the variables is defined in the standard way. As we shall see later on
(section 4.1), there are alternative ways of coding, based on the assumption of shifted single-
peakedness, which do give satisfying results.

2.3. The general polynomial model

Instead of bringing in nonlinearity at the data side, it can be introduced in the functional
structure of the model. McDonald (1962, 1967) and Carroll (1969, 1972) have advocated this
general approach. Deviation from linearity - although a heterogeneous phenomenon by its very
nature - can always be modelled by a sufficiently rich family of polynomials. Carroll's polynomial
factor analysis model has the following form:

fij ≈ Σr air zrj ,    (1)

with
zrj = φr(yj1, ..., yjp) .    (2)

Here, as in the sequel, fij denotes the abundance of species i in sample j, or, in the more general
193

terminology of Legendre and Legendre (1983), the value of descriptor i for object j. The symbol ≈
is used for approximation in the least squares sense, and the indices run as i = 1,...,n , j = 1,...,m ,
and r = 1,...,q. So in its full generality, there are p sample gradients, or a p-dimensional space of
sample points, with coordinates yjs. Then there are q elementary polynomial functions φr that have
to be specified on an a priori basis. Thus to obtain a quadratic response surface, for example, one
would have to specify:

φ1(.): z1j = 1 ,
φ2(.): z2j = yj1 ,
φ3(.): z3j = yj2 ,
φ4(.): z4j = y²j1 ,
φ5(.): z5j = y²j2 ,
φ6(.): z6j = yj1 yj2 .

It is easily verified that if only the first three of these are chosen, (1) and (2) reduce to the familiar
bilinear form of the PCA model.
Carroll used a steepest descent method for finding optimal values for the parameter sets {air}
and {Yjs}. There is little experience with the procedure, however. It is quite heavily loaded with
parameters, and does not give a particularly simple parametrization of the species. It has a great
many special cases. Perhaps it should better be called a program for research, rather than a model.
When the {Yjs} are fixed to known values, e.g. environmental measurements such as soil
pH, soil moisture, elevation and so on, the set-up (1) and (2) becomes formally equivalent to a
multiple regression analysis problem (Draper and Smith 1966; Gittins 1985). Note that although
nonlinear predictors are used, the model is now linear in the parameters, and can be fitted by
standard methods. Also note that in fact we have n independent regression problems, one for each
species or row of the data matrix. The last two remarks remain true if the definition of <l>r is
extended to include logarithmic, exponential or other simple functions. Carroll (1972) has given
explicit reparametrizations, constituting the so-called PREFMAP hierarchy of models, to obtain a
description of the species response curves or surfaces in terms of the location of the peak, the
importance of the relative contributions of the gradient factors, and possibly their interaction.
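Since the model is linear in the parameters once the sample coordinates are fixed, each species can be fitted separately by ordinary least squares on the polynomial terms. A minimal Python sketch (the gradient positions and abundances below are invented for illustration, and the tiny equation solver is generic):

```python
# Direct gradient analysis for one species as an ordinary regression problem:
# with the site scores y fixed, the quadratic response f = b0 + b1*y + b2*y^2
# is linear in (b0, b1, b2). All numbers below are made up.

def solve(A, b):
    """Gaussian elimination with partial pivoting (adequate for tiny systems)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            fac = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= fac * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x

y = [0.0, 1.0, 2.0, 3.0, 4.0]           # fixed gradient positions of the sites
f = [1.0, 4.0, 5.0, 4.0, 1.0]           # abundances of one species at those sites
Z = [[1.0, yj, yj * yj] for yj in y]    # design matrix: phi_1=1, phi_2=y, phi_3=y^2

# Normal equations Z'Z b = Z'f, solved per species
ZtZ = [[sum(Z[j][r] * Z[j][s] for j in range(len(y))) for s in range(3)]
       for r in range(3)]
Ztf = [sum(Z[j][r] * f[j] for j in range(len(y))) for r in range(3)]
b = solve(ZtZ, Ztf)
peak = -b[1] / (2.0 * b[2])             # location of the single peak
```

The peak location -b1/(2b2) is the single number of ecological interest here, the kind of reparametrization in terms of the peak that Carroll's hierarchy provides.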
Fixing the space of sample points or objects and then studying the regression is only one way
to simplify the general polynomial model, and is called direct gradient analysis (Whittaker 1967), or
external analysis of preferences (Carroll 1972). These terms are used in contrast to indirect gradient
analysis or internal analysis of preferences, in which some optimal quantification of the gradient
has to be found as well. As we shall see shortly, there is also the possibility of an analysis between
these two extremes, whenever there is partial knowledge on the gradient (e.g., a ranking of the sites
with respect to moisture status, instead of exact numerical measurements). But first a few additional
remarks are in order, regarding the reasons for concentrating on unimodal models.

2.4. Strategic reasons for giving priority to shifted single-peakedness

It was remarked earlier: linearity has the virtue of being uniquely defined, but deviation from
linearity can have many appearances. From a statistical point of view, it seems wise to progress
slowly from very simple to increasingly complex models, and to examine the deviations from the
model along the way. In fact, the bilinear model of PCA is already a second type of approximation,
the first one being the hypothesis that all abundances are equal, up to row- and/or column effects.
However ignorant or even indecent this may sound in a field that studies diversity, we may
occasionally need to have statistical assurance that we deal with genuine interaction between species
and sites. If the abundance data are considered to be a contingency table, for instance, the chi-
squared test value under the hypothesis of independence should be very large.
The shifted single-peaked model is a further approximation of the second type, and it has the
virtue of having one defining characteristic as well. It is more complex in form than the bilinear
model, but not necessarily in terms of number of parameters. The situation is depicted in Figure 1.
Fig. 1. Interaction models can best be partially ordered.

When moving to the right the number of parameters is increased, so a better fit will always be ob-
tained, but one set of curves might be enough where multiple components would be needed. Of
course, other nonlinear models might turn out to be even more appropriate, but in general there is
little hope in trying an exhaustive search.
It is difficult to accept that, when two models describe the same data about equally well, one
of them is "true" and the other one is "false". Let us consider Figure 2 in the light of this remark.
The Figure gives an idealized example of one of those notorious curved chains of sample points
from a PCA of abundance data. In addition, however, it gives two directions representing species A
and B, selected arbitrarily from the whole range of possible species directions. The advantage of
making this so-called joint plot or biplot (Gabriel 1971) is that it enables the demonstration of a
very elementary fact, which is often - if not always - overlooked in the literature. The PCA model
implies that, in order to reconstruct the abundances for species A, the sample points should be
orthogonally projected onto direction A. If this is actually done, and for direction B likewise, and
if the curved chain is straightened out, or "unfolded" into a straight line, locally preserving the
distances among the sample points, the projections plotted against the "unfolded" chain get the
appearance of Figure 3: shifted single-peaked curves!

Fig. 2. Joint plot of two species (A and B) and a number of sites exhibiting the horseshoe effect.

Fig. 3. Abundance as a function of position along the horseshoe (peak A corresponds with direction A of Figure 2, and peak B with direction B).

Any direction in between A and B in Figure 2 would yield a curve with its peak in between the
peaks of A and B in Figure 3, and more extreme
directions (to the left of B, and to the right of A) would get curves with more extremely shifted
peaks. This shows that there is no real contradiction between the two ways of representing the data,
provided they are interpreted with an open mind. For single-peaked surfaces the PCA representation
will be a curved manifold in three dimensions, much less easily recognizable. Under single-peaked-
ness the data themselves already form a curved manifold in m dimensions, which has to be
"unfolded" to display its simplicity. Of course, these observations are not sufficient for getting a
practical method. The occurrence of deviations from the model, including random errors, as well as
the possible need to work in high dimensionality, urges us to use and further develop specialized
unfolding methods.

3. A FAMILY OF LOSS FUNCTIONS FOR UNFOLDING

A curve or surface of any shape could in principle be modelled by means of the general
polynomial model. This relatively blind approach implies that many parameters have to be estimated
(often repeatedly under different specifications of the model), many of which are unlikely to be
readily interpretable. Under shifted single-peakedness the parametrization can be solely in terms of
the location of the peaks, and possibly also with respect to remaining aspects of shape: tolerance or
species dispersion (range of the responses along the gradient), correlated density in the more-
dimensional case, and (lack of) symmetry. Any unfolding method is based on the assumption that
abundance is inversely related to the distance of a sample point from the estimated peak location of
the species response function, frequently called the ideal point. The name "unfolding" refers to the
following metaphor: suppose the model is known, and imagine the sample points painted on a
handkerchief. Pick the handkerchief up at the ideal point of species i and fold it, for instance by
pulling it through a ring. Then observe that the sample points will appear in the order of the
magnitude of the abundances as given in the i-th row of the data matrix (or of the raw observations
if these are recorded, for each species i, as a list of samples from most abundant down to least
abundant, or absent). Because the analysis technique must construct the model starting from the
data, this process must be reversed; hence the name.
Two major approaches to unfolding can be discerned: one based on dissimilarity approxi-
mation, the other on distance or squared distance minimization. As shall become evident shortly,
there is an important sense in which the latter - formally equivalent to correspondence analysis - is a
special case of the former. The discussion starts with the problem of external unfolding, where the
location of the sample points is fixed in advance, and the ideal points must be determined.

3.1. Locating one set of points with respect to a given set

Suppose the coordinates of m points in p-dimensional space are available in the m×p matrix
Y, the j-th row of which is denoted with yj. Now consider n unknown additional points, indexed
by i, with coordinates xi collected in the rows of the n×p matrix X. The Euclidean distance d(xi, yj)
is defined by writing its square as:

d²(xi, yj) = Σs (xis - yjs)² = (xi - yj)'(xi - yj) .    (3)

In order to construct a loss function that measures the departure of the model distances from the
data, some definition of dissimilarity - the empirical counterpart of distance - has to be agreed upon.
Just to make a start, suppose this is done in the following way. Since the total number of
occurrences of a species is often of little interest, at least not in the study of species x environment
interaction, it is advisable to work with the species-specific proportions

pij = fij / Σk fik ,    (4)

or some other standardization factor, such as maximal species abundance, to make the distributions
row-wise comparable. Now the species-sample dissimilarity δij and the associated weights wij may
be defined as:

δij = -log pij and wij = 1 if pij > 0 ,    (5a)

δij = 1 and wij = 0 if pij = 0 .    (5b)

Other choices will be encountered later. In (5a) and (5b) the weights are merely used to indicate
presence or absence; non-occurrence gets an arbitrary unit dissimilarity, and will not cause any
increase in loss (because wij = 0). Note that, indeed, dissimilarity is a decreasing function of
relative abundance; if pij approaches zero, then δij approaches infinity, and if pij = 1 then δij = 0.
The interpretation of the latter case depends on the data standardization; under (4) it implies that δij
only becomes zero if a species occurs in only one sample (in any frequency).
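The coding (4)-(5b) takes only a few lines. A sketch in Python, with a made-up 2 × 3 abundance matrix:

```python
import math

# Hypothetical abundance matrix f (2 species x 3 sites), for illustration only.
f = [[4.0, 1.0, 0.0],
     [2.0, 2.0, 2.0]]

# Species-specific proportions (4): each row divided by its own total
p = [[fij / sum(row) for fij in row] for row in f]

# Dissimilarities and weights (5a)-(5b): minus log proportions where the
# species occurs; absences get an arbitrary unit dissimilarity and zero weight
delta = [[-math.log(pij) if pij > 0 else 1.0 for pij in row] for row in p]
w = [[1.0 if pij > 0 else 0.0 for pij in row] for row in p]
```

The zero-weight cells drop out of the loss entirely, so the arbitrary unit value for absences never influences the solution.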
The basic unfolding loss function is now defined as the weighted least squares criterion

σ²R = Σi Σj wij {δij - d(xi, yj)}² ,    (6)

the "rectangular" or "off-diagonal" version of Kruskal's so-called raw STRESS (Kruskal was the
first who explicitly proposed to use least squares distance modelling, in his (1964a, 1964b)
papers). Depending on the alterations in the definition of wij and δij, as well as on the choice of
domain Ω over which σR is to be minimized, we get different unfolding methods.
For the problem of this section Ω is the set of all n×p matrices, but in addition a provision
has to be made for ensuring that Band d match in scale (assuming that the coordinates of the given
set of points are on an arbitrary scale). Because the distance function is homogeneous, i.e.
α d(xi, yj) = d(αxi, αyj) for any nonnegative α, adjusting the scale of the coordinates and adjusting
the scale of the distances amounts to the same thing. However, we can also adjust the scale of the
dissimilarities by just extending their definition so as to include an unknown scaling constant:

δij(α) = α (-log pij) ,    (7)

where the notation δij(α) is used to make the dependence on α fully explicit. Whatever choice is
made, the scale adjustment would leave σR dependent on the arbitrary scale of the given set of
points; this is undesirable, so σR has to be normalized. As shown by Kruskal and Carroll (1969),
various ways of normalization only affect the scale of the loss function, not the argument for which
a minimum is attained. De Leeuw and Heiser (1977) have argued that normalization on the
distances makes the computational problem considerably more complicated in a number of
important special cases. Therefore the external unfolding problem - as defined here - becomes:

min over x1, ..., xn and α of σ²N(x1, ..., xn; α) ,    (8a)

with

σ²N(x1, ..., xn; α) = Σi Σj wij {δij(α) - d(xi, yj)}² / Σi Σj wij δ²ij(α) .    (8b)

This optimization problem (and the one that will follow shortly) has no closed-form solution; it is
not related to any eigenvalues and eigenvectors, nor to projection from some high-dimensional
space to a p-dimensional one; it has to be solved iteratively. A convergent algorithm for finding at
least a local minimum shall be discussed in some detail now, because it offers the opportunity to
illustrate a number of interesting features of this type of algorithm. It is based on the general
algorithm model proposed by De Leeuw and Heiser (1977, 1980), called SMACOF (an acronym
highlighting its prime technical characteristic: scaling by MAximizing a COnvex Function, or, as is
preferred nowadays, Scaling by MAjorizing a COmplicated Function).
The minimization of σN can be done by repeatedly solving two subproblems. There is a
normalized regression problem, in this case finding the optimal value of a for fixed distances, and
a relocation problem, i.e. finding new locations X+ starting from some initial guess X̃ and keeping
the rescaled dissimilarities constant at their current values. As to the former, it can be shown that,
writing dij for the fixed distances, the optimal choice of α is

α = Σi Σj wij d²ij / Σi Σj wij δij dij .    (9a)

The quantities

d̂ij = δij(α) = α δij ,    (9b)

sometimes called the pseudo-distances, or dhats, or disparities, all names referring to the
characteristic of distance approximation by a function of the data, can be substituted in (8b),
thereby reducing it to the basic form (6) with uniformly rescaled weights, due to the normalization
factor. This settles the regression part for now.
The relocation part is more difficult. One of the objections to a relatively straightforward
steepest descent method, such as the one used by Kruskal (1964b), is that the partial derivatives of
σR do not exist at points where d(xi, yj) becomes zero. In this context it is of some interest to note
that the very same problem emerges in the classic Fermat or generalized Weber problem (Kuhn
1967), also called the location problem, which is to locate a point Xi among m known points in
such a way that

min over xi of Σj wij d(xi, yj) .    (10)
The SMACOF approach turns out to be closely related to Kuhn's algorithm. It is based on the
'subgradient', rather than the ordinary 'gradient' (De Leeuw 1977).
To elaborate somewhat on the location problem: if wij is binary and Ji is the index set of the
nonzero elements in row i, and if in addition distance is one-dimensional (the ecological gradient is
one variable), then (10) reduces to

min over xi of Σj∈Ji |xi - yj| ,    (11)
the solution of which is well-known: the median of the sample values for which the species is
present. This shows that in the case of binary weights, the solution of (10) is a proper general-
ization of the median concept to higher dimensions (cf. Austin 1959). It is also a generalization to
the case of differential weights. So it certainly is one sensible way to estimate the peak of a surface.
But in addition it becomes clear that, while (10) could be called a distance minimization approach,
the external unfolding problem is different in the sense that (6) aims at approximation of
dissimilarities. How can this be done?
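The median property behind (11) is easy to verify by brute force. In this Python sketch the positions of the occupied sites are invented:

```python
# Brute-force check that the median solves the one-dimensional location
# problem (11): minimize the sum of absolute deviations from the occupied sites.
y = [1.0, 2.0, 4.0, 7.0, 9.0]          # made-up positions of sites where species i occurs

def loss(x):
    """Criterion (11): sum of absolute deviations of x from the site positions."""
    return sum(abs(x - yj) for yj in y)

median = sorted(y)[len(y) // 2]        # middle order statistic of the occupied sites
grid_best = min(loss(k / 10.0) for k in range(0, 101))   # grid search on [0, 10]
```

No grid point beats the median, in line with the classical result that the median minimizes the sum of absolute deviations.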
The interested reader is referred to De Leeuw (1977) and De Leeuw and Heiser (1980) for a
general explanation of the SMACOF algorithm model and its rationale. For the unfolding case also
see Heiser (1981). The specific computational steps are as follows. Suppose dij = d(x̃i, yj) is the
distance between the fixed point yj and some initial estimate x̃i of the i-th point. Then define the
matrix A with elements

aij = wij d̂ij / dij if dij > 0 ,    (12a)

aij = 0 if dij = 0 .    (12b)

Furthermore, the weights are collected in W = {wij}, and the diagonal matrices P and R are
defined as:

P = diag(A em) ,    (13a)

R = diag(W em) ,    (13b)

where em denotes an m-vector of ones. The SMACOF algorithm for external unfolding uses the
following two operations:

X~ = P X̃ - A Y ,    (14a)

X+ = R⁻¹(X~ + W Y) .    (14b)

Here X~ is a preliminary, unconstrained update, and X+ is the successor configuration suitable for
the present case of fixed column points. Note that in the equally weighted case the last operation
(14b) amounts to a uniform rescaling and an adjustment of the centroid. The first operation (14a)
carries the burden of the iterative relocation of the species points, because A and P contain infor-
mation on the size of the current distances dij, on what they should be (d̂ij), and on how strongly
an improvement is desired (wij).

Fig. 4. Coordinate-free construction of new species points (dissimilarities used: δ11 = 1, δ12 = 6, δ21 = 5, δ22 = 4, δ31 = 4, δ32 = 2).

Let us have a closer look by writing (14a) row-wise as a single weighted summation:
x~i = Σk∈K wik d̂ik (x̃i - yk) / dik ,    (15)

where K is the subset of the first m integers for which (12a) holds. Thus the preliminary updates
are a weighted sum, with weights wik d̂ik, of unit-length difference vectors pointing from the fixed
column points towards the current location of i. If the current location of i coincides with a column
point, then (12b) comes into effect; the zero difference vector cannot be normalized and is omitted
from the summation. Sample sites where species i is absent - or at least where wij = 0, perhaps due
to another reason - do not contribute either.
The relocation step is illustrated in Figure 4, starting from an arbitrary configuration of three
x̃-points and two y-points, with unit weights and the dissimilarities as given in the Figure caption.
Thus there are 6 difference vectors, and the concentric circles around the origin represent the size of
the dissimilarities. The circles are used for adjusting the length of the difference vectors, and are
expanded or contracted during the iterations (this is a uniform expansion or contraction for the
present case of linear regression without an intercept, 9a and 9b; it would become a more involved
stretching and shrinking when other forms of regression are introduced). The x~i are now simply
obtained by vector addition. Next their length has to be divided by 2, the number of y-points, and
their origin must be shifted towards y0, the centroid of y1 and y2, thus accomplishing (14b). For
x+1 the latter step is explicitly shown, while the other auxiliary lines are omitted for clarity. By
visual inspection alone it can be verified that the new distances are closer to the dissimilarities than
the old ones. Finally note the fact that each point is relocated independently from the others, in
much the same way as there were n independent regression problems under the general polynomial
model.
A summary of all steps is given in the following skeleton algorithm for external unfolding:

X̃ ← 'good guess'
σOLD ← 'large'
for iter = 1, ..., maxiter do:
    (i) determine X+ from (14a) and (14b);
    (ii) calculate d(x+i, yj) using (3);
    (iii) find d̂ij from the regression of d on δ;
    (iv) calculate σNEW using (8b);
    (v) if (σOLD - σNEW) is not 'small' then
        * set X̃ ← X+ and σOLD ← σNEW
        * go to (i)
        else
        * STOP
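The skeleton can be turned into a bare-bones Python sketch. All numbers below (site coordinates, initial guess, dissimilarities, weights) are made up; the regression step uses the scale factor α that minimizes the normalized stress for the current distances, and a fixed number of iterations replaces the convergence test:

```python
import math

# External unfolding: fixed site points Y, movable species points X,
# dissimilarities delta with weights W; steps (i)-(v) of the skeleton.
delta = [[1.0, 6.0], [5.0, 4.0], [4.0, 2.0]]   # species x sites dissimilarities
W = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]       # unit weights (all species present)
Y = [[0.0, 0.0], [4.0, 0.0]]                   # fixed site coordinates (invented)
X = [[1.0, 2.0], [3.0, 1.0], [2.0, -1.0]]      # initial guess for species points
n, m = len(X), len(Y)

def dist(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def best_alpha(X):
    # scale factor for the dissimilarities, chosen to minimize the
    # normalized stress given the current distances
    num = sum(W[i][j] * dist(X[i], Y[j]) ** 2 for i in range(n) for j in range(m))
    den = sum(W[i][j] * delta[i][j] * dist(X[i], Y[j])
              for i in range(n) for j in range(m))
    return num / den

def norm_stress(X, alpha):
    # weighted squared residuals, normalized on the rescaled dissimilarities
    num = sum(W[i][j] * (alpha * delta[i][j] - dist(X[i], Y[j])) ** 2
              for i in range(n) for j in range(m))
    den = sum(W[i][j] * (alpha * delta[i][j]) ** 2
              for i in range(n) for j in range(m))
    return num / den

def relocate(X, alpha):
    # one application of (14a)-(14b) for fixed column points Y (p = 2 here)
    Xnew = []
    for i in range(n):
        d = [dist(X[i], Y[j]) for j in range(m)]
        a = [W[i][j] * alpha * delta[i][j] / d[j] if d[j] > 0 else 0.0
             for j in range(m)]                     # (12a)-(12b)
        p, r = sum(a), sum(W[i])                    # diagonals of P and R
        prelim = [p * X[i][s] - sum(a[j] * Y[j][s] for j in range(m))
                  for s in range(2)]                # (14a)
        Xnew.append([(prelim[s] + sum(W[i][j] * Y[j][s] for j in range(m))) / r
                     for s in range(2)])            # (14b)
    return Xnew

alpha = best_alpha(X)
history = [norm_stress(X, alpha)]
for _ in range(50):                                 # fixed maxiter, no tolerance test
    X = relocate(X, alpha)
    alpha = best_alpha(X)
    history.append(norm_stress(X, alpha))
```

Because each relocation never increases the raw stress for fixed pseudo-distances, and re-estimating α never increases the normalized loss, the recorded stress values form a non-increasing sequence, which is the convergence property claimed for the algorithm.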
As a first extension to this scheme we shall now consider the situation in which the sample points
are not a priori given, but have to be located as well.

3.2. Reciprocal relocation: internal unfolding

In internal unfolding analysis two sets of points have to be located with respect to each other;
hence the term 'reciprocal relocation'. As a consequence, the relocations are not independent
anymore. It does eliminate the need to rescale the data: the rescaling factor can be absorbed in the
unknown coordinates. Therefore, the normalized loss function σ²N becomes functionally equivalent
to the unnormalized one σ²R, i.e. the same up to a constant, and the problem becomes:

min over X and Y of σ²R(X, Y) .    (16)

The skeleton algorithm of the previous section need not be changed very much. We can skip step
(iii) (not for long; it will be reintroduced soon). Only step (i), calculation of the new locations, must
really be adjusted. Two additional matrices are required:

Q = diag (enA) , (17a)


C = diag (enW) , (17b)

where en denotes an n-vector of ones. Then, analogous to (14a), a preliminary update for the
sample points is found from

(18)

The companion operation (14b) is no longer correct. Instead, the successor configurations X + and
Y+ must be computed from the system of linear equations:

RX+ - WY+ =X~, (19a)


CY+ - W'X+ = Y~. (19b)

The interested reader may consult section 3.6 at this point to find out how these equations come
about. How to solve the system most efficiently depends on the size of n and m. Suppose n > m
(the other case runs analogously). Then we should first solve

(C - W'R-1W) Y+ = Y~ + W'R-1X~ , (20a)

which determines Y+ up to a shift of origin, because the matrix C - W'R-1W is generally of rank
m-1 (its null space is spanned by the vector em, due to the definition of W, C, and R). Next, any solution of
(20a) can be used to determine X+ from

X+ = R-1(X~ + WY+) . (20b)

Finally, although this is not really necessary, X+ and Y+ can be simultaneously centered so that
their joint centroid is in the origin. This settles the relocation part for internal unfolding.

Now consider a slight generalization in the regression part. Some species might cover a wider
range of sites than others, independent of the location of their peaks. If the frequencies are
normalized on the sum, this will tend to make the minus log proportions uniformly larger, which
might be considered undesirable. This effect can be removed by introducing a scaling parameter for
each species as a generalization of (7):

(21)

Note that all that would have to be done to include (21) in the external unfolding algorithm
is to execute it for each species separately, because that would make (9) effectively row-specific,
and the row-point movements were done independently anyhow. For internal unfolding,
however, the loss function has to be adjusted explicitly:

(22)

where the subscript C in σC is used to indicate the conditionality of the regression and normalization
on the rows (the loss function is "split by rows", cf. Kruskal and Carroll 1969). Yet the algorithm
does not become very much more complicated. Keeping the distances fixed, the normalized
regression (9) must simply be done on each row separately, giving a+i. Next, new weights can be
defined as

(23)

which shows that minimizing (22) becomes equivalent to the basic unconditional problem (16),
with row-wise rescaled data and row-wise rescaled weights.
Summarizing the steps again in a skeleton algorithm for row-conditional internal unfolding
we get:

X ← 'good guess'
Y ← 'good guess'
σOLD ← 'large'
for iter = 1, ..., maxiter do:
   (ia)  determine X~ from (14a) and Y~ from (18);
   (ib)  determine X+ and Y+ from (20a) and (20b);
   (ii)  calculate d(x+i, y+j) using (3);
   (iii) for i = 1, ..., n do:
         * find d+ij from the regression of the i'th row of {dij}
           on the i'th row of {δij};
   (iv)  calculate σNEW using (22);
   (v)   if (σOLD − σNEW) is not 'small' then
         * set X ← X+, Y ← Y+ and σOLD ← σNEW
         * calculate new weights using (23)
         * go to (ia)
         else
         * STOP
The algorithm is now illustrated for a classical set of single-peaked ecological data.

Example: Internal unfolding of upland conifer-hardwood forests of northern
Wisconsin.

The original data (from Brown and Curtis 1952) are the "importance values" of seventeen tree
species in 55 woodland stands. Importance value is a compound measure of species abundance:
the sum of the relative frequency, relative density, and relative dominance of a species in a
given stand. The data were standardized species-wise as indicated in (4), with a factor of 105% of

Table 1. Climax adaptation numbers used in the analysis of
conifer-hardwood data (source: Brown and Curtis 1952).

Tree species             Climax       Tree species           Climax
                         adaptation                          adaptation
                         number                              number

Pinus banksiana               1       Quercus rubra               6
Quercus ellipsoidalis         2       Abies balsamea              7
Populus tremuloides           2       Betula lutea                8
Populus grandidentata         2       Tsuga canadensis            8
Pinus resinosa                3       Ulmus americana             8
Quercus alba                  4       Tilia americana             8
Pinus strobus                 5       Ostrya virginiana           9
Betula papyrifera             5       Acer saccharum             10
Acer rubrum                   6

the maximum importance values, and coded as (5a) and (5b). This way one obtains small, but non-
zero dissimilarity in the maximum abundance cells. To keep the analysis simple, species-specific
free scaling parameters were omitted. The discussion in Kershaw and Looney (1985) has served as
background; they explain how Brown and Curtis obtained single-peaked importance curves for the
species, the way in which a climax adaptation number was assigned to each species, and give other
details on the original analysis. The species involved here, and their climax adaptation numbers, are
given in Table 1. The climax concept implies that the vegetation has developed to a state of
equilibrium with the environment, but its intricacies are definitely beyond the scope of the present
paper. The adaptation numbers are simply used to label the results of the unfolding analysis (see
Figure 5).

Again for reasons of simplicity, the algorithm was executed in two dimensions. Apparently
the horizontal axis, ranging from Pinus banksiana to Acer saccharum, closely resembles the climax
number arrangement (product-moment correlation: 0.97). This is a first, rather strong indication for
the validity of the model. But there is plenty of variation to account for in addition to that. For
instance, Pinus resinosa and Quercus ellipsoidalis almost never occur together in the same stand,
even though they differ by only one unit in climax number. The two-dimensional unfolding

Fig. 5. Internal unfolding of conifer-hardwood data (trees labelled with
a '*' and their climax adaptation number, sites with a 'o' and the impor-
tance values of Pinus strobus).

analysis shows this by giving them a large separation in the vertical direction, as is also the case for
Betula lutea and Ulmus americana, and, although less strongly, for other pairs.
The model fits the data reasonably (σR = .2254, which is not entirely satisfactory according
to current standards, indicating that a three-dimensional model could be called for or, alter-
natively, optimal rescaling of the species profiles). In order to present more concrete evidence
for the quality of fit, the sites in Figure 5 are labeled with the original importance values of Pinus
strobus, which shows the approximate single-peakedness clearly (Pinus strobus is absent in the
unlabelled sites). Reconstructions of similar quality can be obtained for the other tree species.

Fig. 6. Alternate labelling of the sites: calcium values (10's lb. per acre).

Since we now have an ordination of the stands along with the optimal tree locations, various
stand characteristics can be examined to gain further understanding of the species-environment
interaction. In Figure 6 the stands are labelled with their calcium values. These tend to increase
when we move from the lower left to the upper right corner. It is especially the area around Ulmus
americana and Ostrya virginiana that has characteristically high calcium values. A numerical
assessment of the strength of relationships like this could be obtained by multiple regression
analysis with the point coordinates serving as predictor variables.

3.3. Squared distance minimization: correspondence analysis

Now that the two basic ways of unfolding via dissimilarity approximation have been
discussed, external when one of the two sets of points is fixed in advance, and internal when both
sets are free to vary, it will be instructive to reconsider the specification of dissimilarities and
weights. Suppose that, instead of (5a) and (5b), it is specified that:

δij = 0 and wij = fij if fij > 0 , (24a)

δij = 1 and wij = 0 if fij = 0 , (24b)

where the second one is not really a change, but the first one says that a species point should
coincide with any site where it occurs, with frequency of occurrence used as weight. When these
specifications are substituted in the basic unfolding loss function (6) one obtains:

σ2CA = Σi Σj fij d2(xi,yj) , (25)
because the weighted sum of squared dissimilarities and the weighted sum of squared cross
products vanish, due to the special structure in (24a) and (24b). The remaining part of the loss
function, (25), closely resembles the location problem as defined in (10), but aims at squared
distance minimization. Squared distance minimization is interesting for a number of reasons.
First, note that the SMACOF algorithm breaks down immediately under this specification,
because the matrices A (cf. (12a) and (12b)), and thus P (13a) and Q (17a) all vanish. So the
specification is at least incomplete; it has to be supplemented by a strong form of normalization or
a radical type of restriction. A good example of the latter is of course the external approach, which
now has an easy solution. To see this, it is convenient to write loss function (25) in matrix
notation, using the same symbols R and C as before (cf. (13b) and (17c)) for the marginal totals of
the matrix F = {fij}, and writing "tr" for the trace of a matrix:

σ2CA = tr X'RX + tr Y'CY - 2 tr X'FY . (26)

For fixed Y the stationary equations for a minimum of σ2CA over X are (setting the partial
derivatives with respect to X equal to zero):

X+ = R-1FY , (27a)

and, analogously, for fixed X we obtain

Y+ = C-1F'X . (27b)

Comparing (27a) with the external unfolding result (14a), it turns out that the solution to squared
distance minimization merely involves taking a weighted average of the fixed points, not a
transform of some previous estimate such as X~. The best location of a species ideal point now is
the centre-of-gravity of the sites it occurs in. When the species points are fixed, the best location of
a site is the centre-of-gravity of the species it is covered with.
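The centre-of-gravity property of (27a) can be sketched in a few lines; the function name is ours, and the abundance matrix F is assumed to be stored species-by-sites.

```python
def centroid_update(F, Y):
    """Squared-distance minimization with fixed site points Y (sketch of
    X+ = R^-1 F Y): the best location for species i is the weighted
    centre-of-gravity of the sites it occurs in, with weights f_ij and
    R the diagonal matrix of row totals of F."""
    X = []
    for fi in F:
        total = sum(fi)  # r_i, the i'th row total
        X.append([sum(f * y[a] for f, y in zip(fi, Y)) / total
                  for a in range(len(Y[0]))])
    return X
```

For example, a species occurring equally in two sites lands exactly halfway between them, while a species confined to a single site coincides with that site.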
The internal approach is conceptually somewhat problematical from the present point of view.
First, we have to keep away from the trivial solution X = Y = 0, which certainly would minimize
(26). In a one-dimensional analysis, this is usually done by requiring that one of the sets of scores
is standardized in the metric of the marginal totals, e.g. en'Rx = 0 and x'Rx = n (where the
notation x and y is used for the vectors of one-dimensional species- and site scores, whereas Xi
and Yj denote the p-dimensional species- and site points). The first requirement can be formulated
as JRx = x, and can be inserted in the loss function; here JR is the projection operator

JR = I - en(en'Ren)-1en'R , (28)

that centers all n-vectors, with weights R. The second one can be handled by introducing a
Lagrange multiplier λ, so that the adjusted minimization problem for the simultaneous estimation
of x and y becomes

min min { n + y'Cy - 2 x'JR'Fy + λ x'JR'RJRx } , (29)
 x   y

from which it follows in the usual way that x* and y* are a solution whenever they satisfy (using
the relationships JR'RJR = RJR and R-1JR' = JRR-1):

x* = JRR-1Fy* λ-1 , (30a)
y* = C-1F'JRx* . (30b)

These are the well-known reciprocal averaging, dual scaling, or transition formulas of
correspondence analysis (e.g., Nishisato 1980). So under the specifications (24a) and (24b),
which amount to minimizing the distance between a species and a site to the degree of their abundance,
correspondence analysis is a special way of performing internal unfolding.
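The transition formulas can be run as a simple power iteration. The sketch below (our naming, shown on toy data only) alternates (30a) and (30b), applying the centering and the normalization x'Rx = n to the species scores each round so that the trivial solution x = y = 0 is avoided.

```python
def reciprocal_averaging(F, iters=200):
    """First nontrivial axis of correspondence analysis by reciprocal
    averaging: alternate x <- centered R^-1 F y and y <- C^-1 F' x,
    renormalizing x'Rx = n each round. Sketch only; F is species-by-sites."""
    n, m = len(F), len(F[0])
    r = [sum(row) for row in F]                              # row totals
    c = [sum(F[i][j] for i in range(n)) for j in range(m)]   # column totals
    total = sum(r)
    y = [float(j) for j in range(m)]   # arbitrary non-constant start scores
    x = [0.0] * n
    for _ in range(iters):
        x = [sum(F[i][j] * y[j] for j in range(m)) / r[i] for i in range(n)]
        mean = sum(r[i] * x[i] for i in range(n)) / total    # R-weighted mean
        x = [xi - mean for xi in x]                          # en'Rx = 0
        norm = (sum(r[i] * x[i] ** 2 for i in range(n)) / n) ** 0.5
        x = [xi / norm for xi in x]                          # x'Rx = n
        y = [sum(F[i][j] * x[i] for i in range(n)) / c[j] for j in range(m)]
    return x, y
```

On a small table with a banded, single-peaked structure the first axis recovers the row ordering, in line with the rearrangement property discussed in section 4.2.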
In order to obtain a solution of dimensionality greater than one, a third normalization
condition must be imposed to avoid repetition of the first solution in the columns of X and Y
(because that would actually give the smallest value of the loss function). How to do this is not free
from arbitrariness under the present rationale of the method. Usually one requires in addition that
the coordinates of the higher dimensions are R- or C-orthogonal with respect to the earlier ones.
This gives the stationary equations of a higher-dimensional correspondence analysis. The formulas
are omitted here (but see section 3.6). Healy and Goldstein (1976) have argued that the "usual"
normalization conditions are in fact restrictions, and they presented an alternative solution based on
linear restrictions that can be freshly chosen in any particular application. Whether the freedom
gained should be considered an asset or a liability is difficult to say.
Even within the confines of the usual normalization conditions there remains an awkward
arbitrariness with regard to the species-site distances in a joint plot. We can just as well normalize y
and leave x free, thereby obtaining the same value of the loss function. There is also the possibility
to "distribute λ" among x and y. Although in all cases the weighted mean squared distance (25)
remains equal, the actual Euclidean distances between species points and site points may change
considerably, especially when λ is small. This was one of the reasons for Legendre and Legendre
(1983, p. 278) to warn against making biplots; for who can refrain from considering distances
while looking at a configuration of points! Also note that the "folding" interpretation of picking the
representation up at a species point i in order to obtain an approximate reconstruction of the i'th
row of the data matrix will give different results under different normalizations.
Finally, we may substitute (30a) in (30b), or vice versa, from which an eigenvalue-
eigenvector problem in only one of the sets remains. So in contrast to the general unfolding

problem, correspondence analysis "has no memory" for the previous locations of the same set
when solved iteratively by alternating between (30a) and (30b); in fact one of the sets of points is
superfluous for solving the problem! Therefore the recognition that it is formally a special case of
unfolding has limited value. It is often preferable to view correspondence analysis - or, for that
matter, principal components analysis - as a way to perform two related, "dual" multidimensional
scaling problems, in which one tries to fit the so-called chi-squared distances among the rows or
columns of the data matrix. This specific viewpoint is more fully explained in Heiser and Meulman
(1983a) and Fichet (1986). An up-to-date, comprehensive account of the method was provided by
Greenacre (1984), who was also the first who seriously compared correspondence analysis with
unfolding in his 1978 dissertation. The use of (24a) and (24b) in connection with the standard
unfolding loss function was suggested by De Leeuw (personal communication) and more fully
worked out in Heiser (1981). Hayashi (1952, 1954, 1956, 1974) based his "theory of
quantification" almost entirely on (25), and dealt with many of the possible appearances the matrix
F can have.

3.4. Approximation with squared distances: Gaussian ordination

In one of his early papers on multidimensional scaling, Shepard (1958) adduced evidence for
an exponential decay function relating frequency of substitution behaviour to psychological
distance. Transferring this idea, we could model expected frequency E(fij) as:

E(fij) = βi e-d(xi,yj)/αi , (31)

with βi a positive number representing the maximum of the function (attained when the species
point xi coincides with the site point yj), and αi a positive number representing the dispersion or
tolerance of the species distribution. From (31) it follows that log expected frequency is linear in
the distances:

log E(fij) = log βi - d(xi,yj)/αi . (32)

Under this model, then, we could still use the SMACOF algorithm by generalizing the definition of
δij again a little, writing

δij = μi - αi log fij , (33)

where μi = αi log βi. In fact, this model inspired the earlier definition of δij, (5a), where μi could
be omitted by fixing βi equal to one ("to make the curves comparable"). Using (33) instead implies
that we no longer have to use a standardization factor like fi+ (4) prior to the analysis, but can try to
find values that optimize the fit to the data. For the skeleton algorithm it would entail step (iii) to be
a linear regression including an intercept term. The price is n degrees of freedom and, as experience

seems to attest, a less well-behaved algorithm.
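For a single species, the linear regression with intercept that step (iii) would become can be sketched as an ordinary least-squares fit of minus log frequency on the distances. The function name and the direct recovery of αi and βi are our illustration, not the text's algorithm.

```python
import math

def fit_exponential_decay(dist, freq):
    """Sketch of the linear relation (32): under f = beta * exp(-d / alpha),
    -log f = d/alpha - log beta, so ordinary regression of -log f on d
    recovers alpha (1/slope) and beta (exp(-intercept))."""
    y = [-math.log(f) for f in freq]
    n = len(dist)
    mx = sum(dist) / n
    my = sum(y) / n
    sxy = sum((d - mx) * (v - my) for d, v in zip(dist, y))
    sxx = sum((d - mx) ** 2 for d in dist)
    slope = sxy / sxx
    intercept = my - slope * mx
    return 1.0 / slope, math.exp(-intercept)   # alpha, beta
```

On noise-free data generated from the model the two parameters are recovered exactly; with real abundances the fit is, of course, only approximate.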


Closely related to the exponential decay function is the Gaussian form

E(fij) = βi e-d2(xi,yj)/αi , (34)

which was studied in ecology by Ihm and van Groenewoud (1975), Austin (1976), Kooijman
(1977), Gauch and Chase (1974), Gauch et al. (1974), and others. Also see Schonemann and
Wang (1972). Under the Gaussian decay function it is again the species-site distance that plays the
central part. But now log expected frequency is linear in the squared distances, and this suggests
that we can use (33) in combination with the alternate loss function

σ2SD = Σi Σj {δij - d2(xi,yj)}2 , (35)

which is called SSTRESS by Takane et al. (1977), who proposed it as a general MDS loss function,
and which was studied in detail for the unfolding case by Greenacre (1978) and Browne and
Greenacre (1986). Here, as in the SMACOF algorithm, δij may be a fixed set of dissimilarities, or
some function of the original frequencies like (33). The regression principle remains the same.
Minimizing (35) would form a feasible and efficient alternative for the maximum likelihood
methods of Johnson and Goodall (1980) or Ihm and van Groenewoud (1984), or the least squares
method of Gauch et al. (1974). In the latter methods it is not the data that are transformed, but the
distances. The STRESS and SSTRESS methods are based on optimal rescaling to achieve reduction
of structural complexity, the same data analytic principle on which the nonlinear multivariate
analysis and path analysis methods are based that are discussed by De Leeuw (1987a, 1987b) in
this volume.
It is possible to relate SSTRESS and STRESS in the following way (Heiser and De Leeuw
1979):

σ2SD = Σi Σj {√δij + d(xi,yj)}2 {√δij - d(xi,yj)}2

≈ 4 Σi Σj δij {√δij - d(xi,yj)}2 , (36)

the approximation being better if the dissimilarities and distances match well. So we can simulate
SSTRESS solutions with the SMACOF algorithm by using an additional square root transformation
and choosing the dissimilarities as weights. This form of weighting will tend to give less emphasis
to local relationships, in favour of getting the large distances right.
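The relation (36) is easy to check numerically. The following self-contained snippet (random illustrative values, our naming) verifies the exact identity with Σ(δ − d²)² and the quality of the 4δ-weighted approximation when the distances are close to √δ.

```python
import math
import random

# Numeric check of (36): {sqrt(delta)+d}^2 {sqrt(delta)-d}^2 equals
# (delta - d^2)^2 exactly, and is approximated by the dissimilarity-weighted
# STRESS terms 4*delta*{sqrt(delta)-d}^2 when distances match well.
random.seed(1)
delta = [random.uniform(0.5, 2.0) for _ in range(20)]
# distances within 5% of the target values sqrt(delta):
d = [math.sqrt(v) * random.uniform(0.95, 1.05) for v in delta]

sstress = sum((math.sqrt(v) + di) ** 2 * (math.sqrt(v) - di) ** 2
              for v, di in zip(delta, d))
squared = sum((v - di ** 2) ** 2 for v, di in zip(delta, d))
weighted = sum(4 * v * (math.sqrt(v) - di) ** 2 for v, di in zip(delta, d))
```

Here `sstress` and `squared` agree to machine precision, while `weighted` deviates by at most a few percent, exactly as the text asserts.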
Ihm and van Groenewoud (1984), Ter Braak (1985), and Ter Braak and Barendregt (1986)
recently compared maximum likelihood estimation under the Gaussian response model with
correspondence analysis, as we have seen a technique also based on the squared Euclidean distance
function. The results are encouraging for correspondence analysis, especially if the species
dispersions are homogeneous.

3.5. Further special cases and extensions

Kershaw (1968) used a square root transformation of the abundances to make them less
heterogeneous. It is, of course, one of the usual statistical ways to stabilize the variance. Now
suppose we take the inverse square root as an alternative definition of dissimilarity, and the
frequencies themselves as weights:

δij = 1/√fij and wij = fij if fij > 0 , (37a)

δij = 1 and wij = 0 if fij = 0 . (37b)

Then the basic loss function σ2R transforms into (P denotes all pairs present, (37a)):

σ2R = Σ(i,j)∈P fij {1/√fij - d(xi,yj)}2

= Σ(i,j)∈P {1 - √fij d(xi,yj)}2 = Σ(i,j)∈P {1 - d(xi,yj)/δij}2 . (38)

Thus loss is measured in terms of the ratio of distance and dissimilarity (for a defense of using
these relative deviations, see Greenacre and Underhill 1982), and we now obviously give more
weight to the small dissimilarities. It is interesting to compare this weighting structure with yet
another loss function, proposed by Ramsay (1977). He similarly argued that dissimilarity
measurements in psychology are frequently lognormally distributed. The lognormal arises from the
product of many independent and (nearly) identically distributed random variables. It has been
frequently applied as a model for the variation of nonnegative quantities (Aitchison and Brown
1957; Derman et al. 1973), indeed also for abundances (Grundy 1951). If dissimilarity is assumed
to be lognormally distributed we should work with

σ2ML = Σi Σj {log δij - log d(xi,yj)}2 , (39)

which forms the basis of Ramsay's MULTISCALE algorithm. In order to relate it to the standard
loss, we can use the first order approximation

log δij - log d(xi,yj) ≈ {δij - d(xi,yj)} / δij , (40)

from which it follows that (De Leeuw and Heiser 1982):

σ2ML ≈ Σi Σj δij-2 {δij - d(xi,yj)}2 . (41)

So Ramsay's loss function can be approximated by using the inverse squared dissimilarities as
weights in the standard loss function. The same reasoning is present in (37a), which led to (38).
The choice between so many possible types of transformation of the raw data can be
circumvented by defining a radically extended class of transformations as

(42)

So dissimilarity should increase whenever abundance decreases, for each species separately. This
specification would form the basis of a row-conditional, nonmetric unfolding algorithm. The idea
to pose merely monotonicity (42) as the basis of the technique is due to Coombs (1950). He did not
provide a working algorithm, however; it was not until the sixties that Shepard, Kruskal, Guttman
and others developed general nonmetric MDS algorithms (Kruskal 1977; De Leeuw and Heiser
1982). Technically, our skeleton algorithm only needs alteration in step (iii), where the type of
regression performed should be of the monotonic, or isotonic, variety (Kruskal 1964a, 1964b;
Barlow et al. 1972). Yet the nonmetric unfolding case always remained something of a problem,
due to a phenomenon called degeneration: a tendency to collapse many points, or, anyhow, to make
all distances equal (cf. section 4.3). These problems, and proposals to resolve them (although not
fully satisfactorily), are explained in Kruskal and Carroll (1969) and in Heiser (1981), who argued
that it is necessary to put bounds on the regression. Subsequently Heiser (1985, 1986) proposed a
smoothed form of monotonic regression in order to obtain a better behaving algorithm, and this
refinement might make standard application of nonmetric unfolding feasible.
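A standard pool-adjacent-violators routine, such as the following sketch (unweighted, nondecreasing; our implementation, not Kruskal's code), is all that step (iii) needs for the monotonic variety.

```python
def monotone_regression(values):
    """Pool-adjacent-violators sketch of the isotonic step (iii): given
    distances listed in the rank order of the dissimilarities, return the
    best-fitting nondecreasing sequence in the least-squares sense."""
    means, sizes = [], []
    for v in values:
        means.append(float(v))
        sizes.append(1)
        # pool adjacent blocks while they violate monotonicity
        while len(means) > 1 and means[-2] > means[-1]:
            m2, s2 = means.pop(), sizes.pop()
            m1, s1 = means.pop(), sizes.pop()
            means.append((m1 * s1 + m2 * s2) / (s1 + s2))
            sizes.append(s1 + s2)
    fitted = []
    for m, s in zip(means, sizes):
        fitted.extend([m] * s)
    return fitted
```

A sequence that is already monotone is returned unchanged; a violating stretch is replaced by its block average, which is where the degenerate tendency to make all fitted values equal can be seen at work.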
The one-dimensional case of any STRESS minimizing algorithm deserves special care.
Guttman (1968) already pointed out its special status, and De Leeuw and Heiser (1977), also see
Heiser (1981), showed that the SMACOF algorithm does not really resolve the combinatorial
complications that arise in this case. Quite independently, Wilkinson (1971) made some insightful
observations on a form of one-dimensional unfolding, and showed the connection with the so-
called travelling salesman problem. Poole (1984) analysed the situation along the lines of the
graphical version of the algorithm in Figure 4, and proposed an improvement for the one-
dimensional case. Fortunately we now also have Hubert and Arabie (1986), who provided a
globally convergent, dynamic programming algorithm for one-dimensional MDS, extending the
work of Defays (1978). Little is known about its performance in the unfolding situation, but it
surely marks an exciting step forward.

3.6. Restrictions on the locations

In this section the major tools are described for restricting the locations of either the species
points, or the site points, or both. This is done first for the SMACOF algorithm, next for correspon-
dence analysis. Remember the SMACOF algorithm always starts with the preliminary updates X~

and Y~, as defined in (14a) and (18). These provide the basic corrections necessary to obtain a
better fit to the dissimilarities. From the general results of De Leeuw and Heiser (1980) it then
follows that the remaining task is to find

min tr { X'RX + Y'CY - 2 X'WY - 2 X'X~ - 2 Y'Y~ } , (43)
(X,Y) ∈ Ω

where Ω is the domain of minimization, or feasible region. When X and Y are completely free, Ω
is the set of all (combined) n×p and m×p matrices, and from equating the partial derivatives to zero
one obtains the system of linear equations (19a) and (19b) for the unrestricted internal unfolding
problem. In De Leeuw and Heiser (1980) it is also shown that it is not at all necessary to solve
problem (43) completely; it suffices to move from a feasible point into the right direction for
minimizing it. The algorithm will still converge to at least a local minimum. This important fact
opens the possibility to use alternating least squares, i.e. to split the parameter set into subsets, and
to alternate among the subset minimizations. The obvious candidate for a first split in the unfolding
situation is into X and Y, and accordingly (43) can be split into two subproblems (holding the other
set fixed at its current value, and after some rearrangement of terms):

min tr {X - R-1(X~ + WY)}'R {X - R-1(X~ + WY)} + constant , (44a)
X ∈ ΩX

min tr {Y - C-1(Y~ + W'X)}'C {Y - C-1(Y~ + W'X)} + constant . (44b)
Y ∈ ΩY
These are two projection problems, one in the metric R and the other in the metric C. The former
immediately gives (14b), the solution to the external unfolding problem when Y is fixed. It will be
evident that there is a variety of possibilities now in between the internal and the external approach
(in between indirect and direct gradient analysis).
In Heiser (1981, chapter 8) two examples of restricted unfolding were studied in detail. For
preference data with respect to family compositions, i.e. combinations of number of sons and
number of daughters, equality constraints were used in such a way that the family points would
always form a rectangular grid in two dimensions. So personal preference was supposed to be
single-peaked with respect to the grid, of which the spacings were left free to vary. The resulting
value of STRESS turned out to be only slightly higher than in the unrestricted case, thus confirming
the validity of supposing lack of interaction. The second example concerned preferences of 137
Members of the Dutch Parliament for nine political parties, and it used their stands on seven
controversial issues as inequality constraints. Note that (44a) and (44b) can be split further down
into dimension-wise components, and this way each axis was associated with a single issue; the
subproblems become weighted monotonic regression problems. For further examples and
refinements, as well as references to other work on restricted MDS, see De Leeuw and Heiser
(1980), Heiser and Meulman (1983a, 1983b), and Meulman and Heiser (1984). Heiser (1981,
chapter 6) also discusses the possibility to impose centroid constraints, implying that each species
should be located in the centre-of-gravity of the sites in which it is dominant.
This brings us back to correspondence analysis. Recall the basic averaging formulas (27a)
and (27b). In order to incorporate restrictions on X and Y, these weighted averages must now be
regarded as the preliminary updates. Suppose we normalize Y and keep it fixed, and want to
restrict X ∈ ΩX. If we write

X = X̄ + (X - X̄) with X̄ = R-1FY , (45)

then it may be verified that the correspondence analysis loss function transforms into

σ2CA = tr (X - X̄)'R(X - X̄) + tr Y'(C - F'R-1F)Y . (46)

The second term on the right-hand side of (46) is constant, so we again end up with a projection
problem in the metric R, in which X̄ rather than R-1(X~ + WY) must be projected onto the
feasible region. All the possibilities of restrictions mentioned for the SMACOF algorithm are now
open to us for correspondence analysis. Historically, it is not quite fair to say this, because a lot of
them were used earlier in the developing Gifi system (cf. Gifi 1981). Still, the formulation
presented here is new, and especially putting together (44a) and (46) clarifies the similarities and
differences between unfolding and correspondence analysis a great deal. Ter Braak: (1986a, 1986b)
has further developed the case in which the site locations are linear combinations of environmental
variables, under the name "canonical correspondence analysis".
A special example of restrictions in correspondence analysis is Hill and Gauch's (1980)
method of detrended correspondence analysis. They don't compute all dimensions simultaneously,
but work successively. Their aim is to remove the horseshoe effect, and other nonlinearities in
higher dimensions. To bring it in the present formulation, suppose x1 is the first set of scores,
satisfying - as explained in section 3.3 - JRx1 = x1 and x1'Rx1 = n. Then, instead of requiring
R-orthogonality of x2, i.e. x2'Rx1 = 0, the idea is to have x2 locally centered. To do this, an n×kG
matrix G can be formed on the basis of x1, indicating a partitioning into kG blocks of species that
are close together on x1. Thus G is binary and G'G is diagonal. The projection matrix

JG = I - G(G'G)-1G' (47)

is the required block-wise centering operator, and the new requirement becomes JGx2 = x2. This
can be inserted in (46), which shows that we have to solve

(48)

The weak point in this method is that it does not provide a unique, convincing definition of G, as a
result of which it may sometimes detrend too much, sometimes too little. This objection is
comparable to the earlier remark on the specificity of Healy and Goldstein's (1976) restrictions.
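The block-wise centering itself can be sketched as follows, assuming the simplest possible definition of G (equal-sized blocks by rank on the first axis), which is precisely the kind of arbitrary choice the objection above is about; the function name is ours.

```python
def detrend_blockwise(x1, x2, k):
    """Sketch of the local centering step of detrended correspondence
    analysis: partition the points into k blocks of (nearly) equal size by
    their rank on the first axis x1, and subtract the block mean from the
    second-axis scores x2 (the operator JG of (47), with G the block
    indicator matrix derived from x1)."""
    n = len(x1)
    order = sorted(range(n), key=lambda i: x1[i])
    out = [0.0] * n
    for b in range(k):
        block = order[b * n // k:(b + 1) * n // k]
        mean = sum(x2[i] for i in block) / len(block)
        for i in block:
            out[i] = x2[i] - mean
    return out
```

A second-axis trend that follows the first axis is flattened block by block; changing k changes how much is removed, illustrating why the amount of detrending is not uniquely determined.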

4. MISCELLANEOUS ISSUES AND DISCUSSION

4.1. Homogeneity analysis

Homogeneity analysis is the key method of the Gifi system of nonlinear multivariate analysis
(De Leeuw 1984; 1987a). It employs indicator matrices as a basis for all nonlinear transformations

of a given set of variables, and selects precisely those transformations that are as homogeneous as
possible. If the data matrix F in correspondence analysis is chosen as the set of concatenated
indicator matrices, we obtain solutions that are essentially equivalent to those of homogeneity
analysis. An extended discussion on the details of this connection can be found in Heiser (1981,
chapter 4). There, as well as in Heiser (1985), it was argued that in the case of shifted single-
peaked variables the homogeneity approach should not be followed without restraint. If we think it
is characteristic for species to have distributions that are shifted with respect to each other, we
should not center them (which is part of making them as homogeneous as possible). If, moreover,
the variables are thought to give an asymmetrical type of information, i.e. high abundance indicates
similarity of sites and low abundance dissimilarity, then we should not try to give equally dissimilar
sites as much as possible the same quantification.
Homogeneity analysis in a generalized sense can still be used, provided the right kind of
change of variables, or variable coding, is chosen. One possibility is to use conjoint coding
(Heiser 1981 p.123), which associates a nested sequence of sites to each species. The rationale of
conjoint coding is to assume that we deal with only one multinomial variable, species composition,
with the n species as its categories, and separately established for each site. Reliance on the exact
numerical values of abundance can be avoided by considering K level sets, from "exceptionally
abundant" via "moderately abundant" to "not absent" (note that the level sets are cwnulative). In
conjoint coding K binary m x n matrices are defined, the k'th of which indicates the presence, in
site j, of species i at level of abundance at least k. These are not ordinary indicator matrices, as they
do not have mutually exclusive columns, nor row sums equal to one, but they can be submitted to a
correspondence analysis just as well. All sites corresponding to the 'ones' in any column should be
as closely as possible together, and the weighted mean scores of the columns should be as far as
possible apart. The description here deviates from Heiser (l.c.), but only to the effect that a
different order of columns is used. This method was proposed earlier by Wilkinson (1971) and,
independently, by Hill et al. (1975), who called it the "method of pseudo-species" (see also Hill
1977).
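As a present-day illustration of conjoint coding (the abundance table, cut levels, and variable names below are hypothetical, not taken from the text), the K cumulative binary matrices can be built and concatenated in a few lines:

```python
import numpy as np

# hypothetical site-by-species abundance table: m = 3 sites, n = 3 species
A = np.array([[9, 2, 0],
              [4, 7, 1],
              [0, 3, 8]])

# K = 3 cut levels defining the cumulative level sets, from "present"
# (abundance >= 1) up to "exceptionally abundant" (abundance >= 8)
cuts = [1, 4, 8]

# the k'th binary m x n matrix flags, for each site, whether a species
# occurs at level of abundance at least cuts[k]; columns are not mutually
# exclusive and row sums need not be one, but the concatenation G can be
# submitted to a correspondence analysis just as well
levels = [(A >= c).astype(int) for c in cuts]
G = np.hstack(levels)

print(G.shape)   # (3, 9): m sites by K*n pseudo-species columns
```

Because the level sets are nested, each higher-level matrix is contained elementwise in the lower ones, which is the cumulative property noted above.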
A second possibility is to use convex coding (Heiser, 1981, section 5.3), which is especially
tailored to the situation where there are more species or individuals than sites, because it uses the
geometrical property that the site space can be partitioned into so-called isotonic regions. Convex
coding does work with ordinary indicator matrices. Since these alternative ways of coding have not
yet been used a great deal, their data analytic value is uncertain.

4.2. Optimal rearrangement

It is well-known that both correspondence analysis and homogeneity analysis have a
remarkable rearrangement property: if the rows of the table can be reordered in such a way that all
columns become single-peaked, or have the so-called consecutive ones property, then both
techniques will find the correct ordering as their first dimension (see Guttman 1950, and Hill 1974,
for somewhat less general statements; Heiser (1981, section 3.2) proved the proposition in the
form stated here; see Schriever 1985, for a comprehensive discussion of such ordering properties).
One would of course like to be able to say that each unfolding method shares this property,
but it is an open question under what conditions anyone unfolding technique can be said to achieve
an optimal rearrangement in the above sense. Perhaps it is necessary to assume symmetry of the
single-peaked functions. A second open question is how to devise an efficient method that directly
optimizes the single-peakedness condition. Wilkinson (1971) proposed a combinatorial method to
find a permutation yielding consecutive ones, but little is known about its effectiveness.
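The rearrangement property itself is easy to check numerically. The sketch below (hypothetical data; a plain correspondence analysis coded from its standard definition, not an algorithm from the text) shuffles the rows of a banded incidence matrix whose columns have the consecutive ones property, and verifies that sorting the rows by their first-axis scores restores a banded form:

```python
import numpy as np

def ca_row_scores(F):
    """First-axis principal row coordinates of a simple correspondence analysis."""
    P = F / F.sum()
    r = P.sum(axis=1)                                    # row masses
    c = P.sum(axis=0)                                    # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # standardized residuals
    U, sv, Vt = np.linalg.svd(S)
    return U[:, 0] * sv[0] / np.sqrt(r)                  # principal coordinates

def consecutive_ones(M):
    """True if the ones in every column of M form one contiguous block."""
    return all(np.all(np.diff(np.flatnonzero(col)) == 1) for col in M.T)

# banded incidence matrix: every column already has consecutive ones
F = np.array([[1, 0, 0, 0, 0],
              [1, 1, 0, 0, 0],
              [1, 1, 1, 0, 0],
              [0, 1, 1, 1, 0],
              [0, 0, 1, 1, 1],
              [0, 0, 0, 1, 1],
              [0, 0, 0, 0, 1]], dtype=float)

perm = np.random.default_rng(0).permutation(len(F))
shuffled = F[perm]
recovered = shuffled[np.argsort(ca_row_scores(shuffled))]

print(consecutive_ones(recovered))   # True: the first axis recovers the ordering
```

The recovered ordering is unique only up to reversal, but a reversed band is still a band, so the consecutive-ones check is insensitive to the sign of the axis.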

4.3. Horseshoes

It is important to discern at least four different situations in which a curved configuration of
points can arise from a p-dimensional analysis (p ≥ 2). All of them have occasionally been indicated
with the term "horseshoe".
In the first place there is the polynomial curvature emerging in correspondence analysis and
homogeneity analysis when the first dimension is strongly dominant. This could best be called the
Guttman effect, as is usually done in France, because it gives the right credit to Guttman (1950).
The background of this phenomenon was discussed recently in greater detail by De Leeuw (1982)
and by Van Rijckevorsel (1986).
In the second place there is the more strongly curved, sometimes even convoluted case
obtained when the principal components of single-peaked data are studied directly (i.e., without the
normalizations, centering, and weighting involved in correspondence analysis). Here the points are
frequently distributed along the greater part of circles, ellipses and ellipsoids; it is much more
difficult to recognize such regularities in practice. Therefore, in evaluation studies of ordination
techniques, such as Whittaker and Gauch (1978), correspondence analysis is usually considered to
be the more satisfactory technique.
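This second, more strongly curved case is easy to reproduce. In the sketch below (an artificial coenocline, not data from the text), sites are spaced along a gradient and twelve species respond with Gaussian curves; plain principal components of this single-peaked table bend the site scores into an arch, with the two ends of the gradient on the same side of the second axis and the middle on the other:

```python
import numpy as np

# artificial gradient: 30 sites, 12 species with Gaussian (single-peaked)
# responses whose optima are spread evenly over the gradient
x = np.linspace(0.0, 10.0, 30)
optima = np.linspace(0.0, 10.0, 12)
Y = np.exp(-0.5 * (x[:, None] - optima[None, :]) ** 2)

# plain principal components: centre the columns and take the SVD directly,
# without the weighting and normalization of correspondence analysis
Yc = Y - Y.mean(axis=0)
U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
scores = U[:, :2] * s[:2]        # site scores on the first two axes

# the arch: the gradient's two ends agree in sign on axis 2,
# while the middle of the gradient takes the opposite sign
ends_agree = scores[0, 1] * scores[-1, 1] > 0
middle_opposes = scores[0, 1] * scores[len(x) // 2, 1] < 0
print(ends_agree, middle_opposes)
```

With a longer gradient or narrower response curves the site scores wrap further around, toward the circles and ellipses mentioned above.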
Inasmuch as the data are reasonably single-peaked, and provided the tails of the species
distributions are down-weighted (as they usually are in correspondence analysis), STRESS
minimizing unfolding techniques will not produce any curvature at all. Yet in some circumstances a
horseshoe effect can be encountered as well. Again if the data are close to being one-dimensional,
this time in terms of the distances, both MDS and unfolding tend to produce C- or S-shaped
configurations. Shepard (1974, p.386) characterized the situation as follows: "Evidently, by
bending away from a one-dimensional straight line, the configuration is able to take advantage of
the extra degrees of freedom provided by additional dimensions to achieve a better fit to the random
fluctuations in the similarity data. In some published applications, moreover, the possibility of the
more desirable one-dimensional result was mistakenly dismissed because the undetected occurrence
of merely a local minimum (which is especially likely in one-dimension) made the one-dimensional
solution appear to yield an unacceptably poor monotone fit and/or substantive interpretation."
Meanwhile there has been considerable technical progress for the one-dimensional case (cf. section
3.5). Also, it seems likely that the MDS-horseshoe frequently arises from the occurrence of large
tie-blocks of large dissimilarities, for instance when they are derived from presence-absence data.
In such cases it is advisable to down-weight the large distances, which also forms the basis of the
so-called parametric mapping technique (Shepard and Carroll 1966). In many of the specifications
in the previous sections down-weighting was used as well.
Finally, there is a typical horseshoe effect for unfolding, due to regression to the mean. If the
regression part in the unfolding algorithm is not selected carefully (if, for instance, a straightforward
monotonic regression is inserted), then the technique capitalizes on a general property of many kinds
of regression to yield regressed values that are more homogeneous than the regressants. The
unfolding technique is attracted to the extreme case of (nearly) equal pseudo-distances, because it
can so easily find a configuration with equal distances: all points of one set collapsed at a single
location, all points of the other set distributed on part of a circle or sphere around it. Linear or
polynomial regression without an intercept, and restricted forms of monotonic regression seem to
provide the best safeguards against this type of degeneration (cf. section 3.5).
In conclusion, the horseshoe effect is something to be avoided in most cases, and it can be
avoided by an adequate choice of dimensionality, by using the right kind of nonlinear model,
and/or by well-considered transformations of the observations.

Acknowledgements

I would like to acknowledge gratefully the suggestions of the reviewers, F. James Rohlf and
Robert Gittins, and the comments of Daniel Wartenberg and Cajo J.F. ter Braak on an earlier draft.

REFERENCES

AITCHISON, J. AND J.A.C. BROWN. 1957. The Lognormal Distribution. Cambridge University
Press, New York, NY.
AUSTIN, M.P. 1976. On non-linear species response models in ordination. Vegetatio 33: 33-41.
AUSTIN, T.L.jr. 1959. An approximation to the point of minimum aggregate distance. Metron 19:
10-21.
BARLOW, R.E., D.J. BARTHOLOMEW, J.M. BREMNER, AND H.D. BRUNK. 1972. Statistical
Inference under Order Restrictions. Wiley, New York, NY.
BRAUN-BLANQUET, J. AND H. JENNY. 1926. Vegetationsentwicklung und Bodenbildung in der
alpinen Stufe der Zentralalpen. Neue Denkschr. Schweiz. Naturforsch. Ges. 63: 175-349.
BRAY, R.J. AND J.T. CURTIS. 1957. An ordination of the upland forest communities of Southern
Wisconsin. Ecol. Monogr. 27: 325-349.
BROWN, R.T. AND J.T. CURTIS. 1952. The upland conifer-hardwood forests of northern
Wisconsin. Ecol. Monogr. 22: 217-234.
BROWNE, M.W. AND M.J. GREENACRE. 1986. An efficient alternating least squares algorithm to
perform multidimensional unfolding. Psychometrika 51: in press.
CARROLL, J.D. 1969. Polynomial factor analysis. Proc. 77'th Annual Convention of the APA. 4:
103-104.
CARROLL, J.D. 1972. Individual differences and multidimensional scaling, p. 105-155. In R.N.
Shepard et al. [ed.] Multidimensional Scaling, Vol I: Theory. Seminar Press, New York, NY.
COOMBS, C.H. 1950. Psychological Scaling without a unit of measurement. Psych. Rev. 57: 148-
158.
COOMBS, C.H. 1964. A Theory of Data. Wiley, New York, NY.
COOMBS, C.H. AND J.E.K. SMITH. 1973. On the detection of structure in attitudes and develop-
mental processes. Psych. Rev. 80: 337-351.
COTTAM, G. AND J.T. CURTIS. 1956. The use of distance measures in phytosociological
sampling. Ecology 37: 451-460.
COXON, A.P.M. 1974. The mapping of family-composition preferences: A scaling analysis. Social
Science Research 3: 191-210.
CURTIS, J.T. AND R.P. MCINTOSH. 1951. An upland continuum in the prairie-forest border
region of Wisconsin. Ecology 32: 476-496.
DAVISON, M.L., P.M. KING, K.S. KITCHENER, AND C.A. PARKER. 1980. The stage sequence
concept in cognitive and social development. Developm. Psych. 16: 121-131.
DE LEEUW, J. 1977. Applications of convex analysis to multidimensional scaling, p. 133-145. In
J.R. Barra et al. [ed.] Recent Developments in Statistics. North-Holland, Amsterdam.
DE LEEUW, J. 1982. Nonlinear principal component analysis, p. 77-89. In H. Caussinus et al.
[ed.] COMPSTAT 1982. Physica Verlag, Vienna.
DE LEEUW, J. 1984. The Gifi system of nonlinear multivariate analysis, p. 415-424. In E. Diday et
al. [ed.] Data Analysis and Informatics, III. North-Holland, Amsterdam.
DE LEEUW, J. 1987a. Nonlinear multivariate analysis with optimal scaling. In this volume.
DE LEEUW, J. 1987b. Nonlinear path analysis with optimal scaling. In this volume.
DE LEEUW, J. AND W.J. HEISER. 1977. Convergence of correction matrix algorithms for multi-
dimensional scaling, p. 735-752. In J. Lingoes [ed.] Geometric representations of relational
data. Mathesis Press, Ann Arbor, Mich.
DE LEEUW, J. AND W.J. HEISER. 1980. Multidimensional scaling with restrictions on the con-
figuration, p. 501-522. In P.R. Krishnaiah [ed.] Multivariate Analysis, Vol V. North-Holland,
Amsterdam.
DE LEEUW, J. AND W.J. HEISER. 1982. Theory of multidimensional scaling, p. 285-316. In P.R.
Krishnaiah and L.N. Kanal [ed.] Handbook of Statistics, Vol 2. North-Holland, Amsterdam.
DEFAYS, D. 1978. A short note on a method of seriation. Brit. J. Math. Stat. Psych. 31: 49-53.
DERMAN, C., L.J. GLESER, AND I. OLKIN. 1973. A Guide to Probability Theory and Application.
Holt, Rinehart and Winston, New York, NY.
DRAPER, N.R. AND H. SMITH. 1966. Applied Regression Analysis. Wiley, New York, NY.
FICHET, B. 1986. Distances and Euclidean distances for presence-absence characters and their
application to factor analysis. In J. de Leeuw et al. [ed.] Multidimensional Data Analysis.
DSWO Press, Leiden, in press.
GABRIEL, K.R. 1971. The biplot graphic display of matrices with application to principal
component analysis. Biometrika 58: 453-467.
GAUCH, H.G. 1982. Multivariate analysis in community ecology. Cambridge University Press,
Cambridge.
GAUCH, H.G. AND G.B. CHASE. 1974. Fitting the Gaussian curve to ecological data. Ecology 55:
1377-1381.
GAUCH, H.G., G.B. CHASE, AND R.H. WHITTAKER. 1974. Ordination of vegetation samples by
Gaussian species distributions. Ecology 55: 1382-1390.
GAUSE, G.F. 1930. Studies of the ecology of the Orthoptera. Ecology 11: 307-325.
GIFI, A. 1981. Nonlinear Multivariate Analysis. Department of Data Theory, University of Leiden,
Leiden.
GITTINS, R. 1985. Canonical Analysis: A Review with Applications in Ecology. Physica Verlag,
Berlin.
GOODALL, D.W. 1954. Objective methods for the classification of vegetation, III. An essay in the
use of factor analysis. Aust. J. Bot. 2: 304-324.
GREENACRE, M.J. 1978. Some objective methods of graphical display of a data matrix. Special
Report, Dept. of Statistics and Operations Research, University of South-Africa, Pretoria.
GREENACRE, M.J. 1984. Theory and Applications of Correspondence Analysis. Academic Press,
London.
GREENACRE, M.J. AND L.G. UNDERHILL. 1982. Scaling a data matrix in a low-dimensional
Euclidean space, p. 183-268. In D.M. Hawkins [ed.] Topics in Applied Multivariate Analysis,
Cambridge University Press, Cambridge.
GREIG-SMITH, P. 1983. Quantitative Plant Ecology, 3'rd Ed. Blackwell Scient. Publ., London.
GRUNDY, P.M. 1951. The expected frequencies in a sample of an animal population in which the
abundances of species are lognormally distributed, I. Biometrika 38: 427-434.
GUTTMAN, L. 1950. The principal components of scale analysis. In S.A. Stouffer et al. [ed.]
Measurement and Prediction. Princeton University Press, Princeton, NJ.
GUTTMAN, L. 1968. A general nonmetric technique for finding the smallest coordinate space for a
configuration of points. Psychometrika 33: 469-506.
HAYASHI, C. 1952. On the prediction of phenomena from qualitative data and the quantification of
qualitative data from the mathematico-statistical point of view. Ann. Inst. Statist. Math. 2: 93-
96.
HAYASHI, C. 1954. Multidimensional quantification - with applications to analysis of social
phenomena. Ann. Inst. Stat. Math. 5: 121-143.
HAYASHI, C. 1956. Theory and example of quantification, II. Proc. Inst. Stat. Math. 4: 19-30.
HAYASHI, C. 1974. Minimum dimension analysis MDA. Behaviormetrika 1: 1-24.
HEALY, M.J.R. AND H. GOLDSTEIN. 1976. An approach to the scaling of categorised attributes.
Biometrika 63: 219-229.
HEISER, W.J. 1981. Unfolding Analysis of Proximity Data. Ph.D.Thesis, University of Leiden,
Leiden, The Netherlands.
HEISER, W.J. 1985a. Undesired nonlinearities in nonlinear multivariate analysis. In E. Diday et al.
[ed.] Data Analysis and Informatics, IV. North-Holland, Amsterdam, in press.
HEISER, W.J. 1985b. Multidimensional scaling by optimizing goodness-of-fit to a smooth
hypothesis. Internal Report RR-85-07, Dept. of Data Theory, University of Leiden.
HEISER, W.J. 1986. Order invariant unfolding analysis under smoothness restrictions. Internal
Report RR-86-07, Dept. of Data Theory, University of Leiden.
HEISER, W.J. AND J. DE LEEUW. 1979. How to use SMACOF-I (2nd edition). Internal Report,
Dept. of Data Theory, University of Leiden.
HEISER, W.J. AND J. MEULMAN. 1983a. Analyzing rectangular tables by joint and constrained
multidimensional scaling. J. Econometrics 22: 139-167.
HEISER, W.J. AND J. MEULMAN. 1983b. Constrained multidimensional scaling, including confir-
mation. Applied Psych. Meas. 7: 381-404.
HILL, M.O. 1974. Correspondence analysis: a neglected multivariate method. Applied Statistics 23:
340-354.
HILL, M.O. 1977. Use of simple discriminant functions to classify quantitative phytosociological
data, p. 181-199. In E. Diday et al. [ed.] Data Analysis and Informatics, I. INRIA, Le Chesnay,
France.
HILL, M.O., RG.H. BUNCE, AND M.W. SHAW. 1975. Indicator species analysis, a divisive
polythetic method of classification, and its application to a survey of native pinewoods in
Scotland. J. Ecol. 63: 597-613.
HILL, M.O. AND H.G. GAUCH. 1980. Detrended correspondence analysis: an improved ordination
technique. Vegetatio 42: 47-58.
HODSON, F.R. et al. [ed.] 1971. Mathematics in the Archaeological and Historical Sciences.
Edinburgh University Press, Edinburgh.
HOVLAND, C.I., O.J. HARVEY, AND M. SHERIF. 1957. Assimilation and contrast effects in re-
actions to communication and attitude change. J. Abnorm. Soc. Psych. 55: 244-252.
HUBERT, L. AND Ph. ARABIE. 1986. Unidimensional scaling and combinatorial optimization. In J.
de Leeuw et al. [ed.] Multidimensional Data Analysis. DSWO Press, Leiden (in press).
IGOSHINA, K.N. 1927. Die Pflanzengesellschaften der Alluvionen der Flüsse Kama und
Tschussowaja (in Russian with German summary). Trav. de l'Inst. Biol. à l'Univ. de Perm 1:
1-117.
IHM, P. AND H. VAN GROENEWOUD. 1975. A multivariate ordering of vegetation data based on
Gaussian type gradient response curves. J. Ecol. 63: 767-777.
IHM, P. AND H. VAN GROENEWOUD. 1984. Correspondence analysis and Gaussian ordination.
COMPSTAT Lectures 3. Physica Verlag, Vienna, 5-60.
JOHNSON, R.W. AND D.W. GOODALL. 1980. A maximum likelihood approach to non-linear
ordination. Vegetatio 41: 133-142.
KENDALL, D.G. 1963. A statistical approach to Flinders Petrie's sequence dating. Bull. Int.
Statist. Inst. 40: 657-680.
KERSHAW, K.A. 1968. Classification and ordination of Nigerian savanna vegetation. J. Ecol. 56:
467-482.
KERSHAW, K.A. AND J.H.H. LOONEY. 1985. Quantitative and Dynamic Plant Ecology, 3rd Ed.
Edward Arnold Publ., London.
KOOIJMAN, S.A.L.M. 1977. Species abundance with optimum relations to environmental factors.
Ann. Systems Res. 6: 123-138.
KRUSKAL, J.B. 1964a. Multidimensional scaling by optimizing goodness of fit to a nonmetric
hypothesis. Psychometrika 29: 1-28.
KRUSKAL, J.B. 1964b. Nonmetric multidimensional scaling: a numerical method. Psychometrika
29: 115-129.
KRUSKAL, J.B. 1977. Multidimensional scaling and other methods for discovering structure, p.
296-339. In K. Enslein, A. Ralston and H.S. Wilf [ed.] Statistical Methods for Digital
Computers, Vol. III. Wiley, New York, NY.
KRUSKAL, J.B. AND J.D. CARROLL. 1969. Geometrical models and badness-of-fit functions,
p.639-671. In P.R. Krishnaiah [ed.] Multivariate Analysis II. Academic Press, New York,
NY.
KUHN, H.W. 1967. On a pair of dual nonlinear programs, p. 38-54. In J. Abadie [ed.] Methods
of nonlinear programming. North-Holland, Amsterdam.
LEGENDRE, L. AND P. LEGENDRE. 1983. Numerical Ecology. Elsevier Scient. Publ., Amsterdam.
McDONALD, R.P. 1962. A general approach to nonlinear factor analysis. Psychometrika 27: 397-
415.
McDONALD, R.P. 1967. Nonlinear factor analysis. Psychometric Monograph 15.
MEULMAN, J. AND W.J. HEISER. 1984. Constrained multidimensional scaling: more directions
than dimensions, p. 137-142. In T. Havranek et al. [ed.] COMPSTAT 1984, Proceedings in
Computational Statistics. Physica Verlag, Vienna.
NISHISATO, S. 1980. Analysis of categorical data: dual scaling and its applications. University of
Toronto Press, Toronto.
NOY-MEIR, I. AND M.P. AUSTIN. 1970. Principal component ordination and simulated vegeta-
tional data. Ecology 51: 551-552.
POOLE, K.T. 1984. Least squares metric, unidimensional unfolding. Psychometrika 49: 311-323.
RAMSAY, J.O. 1977. Maximum likelihood estimation in multidimensional scaling. Psychometrika
42: 241-266.
ROBERTS, F.S. 1976. Discrete mathematical models. Prentice Hall, Englewood Cliffs, NJ.
SCHRIEVER, B.F. 1985. Order Dependence. Ph.D. Thesis, Amsterdam: Mathematical Centre.
SHEPARD, R.N. 1958. Stimulus and response generalization: deduction of the generalization
gradient from a trace model. Psych. Rev. 65: 242-256.
SHEPARD, R.N. 1974. Representation of structure in similarity data: problems and prospects.
Psychometrika 39: 373-421.
SHEPARD, R.N. AND J.D. CARROLL. 1966. Parametric representation of nonlinear data structures,
p. 561-592. In P.R. Krishnaiah [ed.] Multivariate Analysis, Vol. I. Academic Press, New
York, NY.
SWAN, J.M.A. 1970. An examination of some ordination problems by use of simulated vegetational
data. Ecology 51: 89-102.
TAKANE, Y., F.W. YOUNG, AND J. DE LEEUW. 1977. Nonmetric individual differences multi-
dimensional scaling: an alternating least squares method with optimal scaling features. Psycho-
metrika 42: 7-67.
TER BRAAK, C.J.F. 1985. Correspondence analysis of incidence and abundance data: properties in
terms of a unimodal response model. Biometrics 41: 859-873.
TER BRAAK, C.J.F. 1986a. Canonical correspondence analysis: a new eigenvector technique for
multivariate direct gradient analysis. Ecology 67: in press.
TER BRAAK, C.J.F. 1986b. The analysis of vegetation-environment relationships by canonical
correspondence analysis. Vegetatio 65: in press.
TER BRAAK, C.J.F. AND L.G. BARENDREGT. 1986. Weighted averaging of species indicator
values: its efficiency in environmental calibration. Math. Biosciences 78: 57-72.
THURSTONE, L.L. 1927. A law of comparative judgment. Psych. Rev. 34: 278-286.
VAN RIJCKEVORSEL, J.L.A. 1986. About horseshoes in multiple correspondence analysis, p. 377-
388. In W. Gaul and M. Schader [ed.] Classification as a tool of research. North-Holland,
Amsterdam.
WHITTAKER, R.H. 1948. A vegetation analysis of the Great Smoky Mountains. Ph.D. Thesis,
University of Illinois, Urbana.
WHITTAKER, R.H. 1967. Gradient analysis of vegetation. Biol. Rev. 42: 207-264.
WHITTAKER, R.H. 1978. Ordination of Plant Communities. Dr. W. Junk Publ., The Hague.
WHITTAKER, R.H. AND H.G. GAUCH. 1978. Evaluation of ordination techniques, p. 277-336. In
R.H. Whittaker [ed.] Ordination of Plant Communities. Dr. W. Junk Publ., The Hague.
WILKINSON, E.M. 1971. Archaeological seriation and the travelling salesman problem, p. 276-
283. In F.R. Hodson et al. [ed.] Mathematics in the Archaeological and Historical Sciences.
Edinburgh University Press, Edinburgh.
Clustering under a priori models
SOME NON-STANDARD CLUSTERING ALGORITHMS

James C. Bezdek
Computer Science Department
University of South Carolina
Columbia, South Carolina 29208 USA

Abstract - This paper is a (non-exhaustive) survey of the theory of fuzzy
relations and partitions as it has been applied to various clustering algorithms.
More specifically, the structural models discussed will be object and relational
criterion functions, convex decompositions, numerical transitive closures, and
generalized k-nearest neighbor rules. We first discuss the role clustering plays in
the development of pattern recognition systems, which generally involve feature
analysis, clustering, and classifier design. Then selected clustering algorithms
based on each of the above methodologies will be reviewed. Recent applications
from various fields which use these algorithms are documented in the references.

1. INTRODUCTION
It has been twenty-one years since Zadeh (1965) introduced fuzzy sets
theory as a vehicle for the representation and manipulation of non-
statistical uncertainty. Since that time the theory of fuzzy sets and their applica-
tions in various disciplines have often been controversial, usually colorful, and
always interesting (cf. Arbib 1977, Tribus 1979, Lindley 1982). At this writing
there are perhaps 10000 researchers (worldwide) actively pursuing some facet of
the theory or an application; there is an international fuzzy systems society
(IFSA); many national societies (e.g., NAFIPS, IFSA-Japan, IFSA-China, etc.);
and at least (5) journals (Int. Jo. Fuzzy Sets and Systems, Int. Jo. Man-Machine
Studies, Fuzzy Mathematics (in Chinese), BUSEFAL, and the newly announced
Int. Jo. of Approximate Reasoning) devoted in large part to communications on
fuzzy methodologies. A survey of even one aspect of this immense body of
work is probably already beyond our grasp. The purpose herein is to briefly
characterize the development of fuzzy techniques in cluster analysis, one of the
earliest application areas for fuzzy sets. In view of my previous remarks, it is
clear that many papers which might be important landmarks will be overlooked;
for these oversights (which are, of course, unintentional, and due to my own
limited perspective of the field) I apologize a priori.

NATO ASI Series, Vol. G14. Developments in Numerical Ecology.
Edited by P. and L. Legendre. © Springer-Verlag Berlin Heidelberg 1987.
Section 2 presents a brief description of pattern recognition systems. Sec-
tion 3 contains an overview of the two axiomatic structures that support most of
the fuzzy clustering methodologies that seem to persist - viz., the fuzzy partition
of a finite data set; and the fuzzy similarity relation between two finite sets of
objects. These two structures are isomorphic in the crisp (i.e., non-fuzzy) case,
but do not readily lend themselves to direct connections in the more general set-
ting. Section 4 is devoted to clustering algorithms designed to produce fuzzy
partitions. Algorithms are grouped into five categories: relational criteria, object
criteria, decompositions, numerical transitive closures, and generalized k-nearest
neighbor rules.

2. PATTERN RECOGNITION SYSTEMS


This section is not about fuzzy sets. In fact, it is a bit of a digression from
the main topic of the paper. However, I felt it imperative to include an over-
view of pattern recognition because numerical ecologists need to be aware that
the algorithms they usually consider for ecological data processing are only a
very small fraction of the techniques available. As an example, I heard many
attendees at the workshop discuss "ordination," which was taken to mean
feature extraction via multidimensional scaling (MDS). I had the impression that
many ecologists believe that MDS and its offshoots are the only methods avail-
able for reducing multidimensional data to several "key" dimensions. The pur-
pose of Section 2 is to help readers overcome the insularity in both methodology
and terminology that sometimes develops quite inadvertently within an area of
technical expertise. Hopefully, Section 2 will expand horizons for numerical
ecologists by pointing to literature areas which deal with data processing prob-
lems of interest to them. Section 2A records our notation and definitions; Sec-
tions 2B, 2C, and 2D, respectively, describe the three main activities that
comprise the design and implementation of a PRS - viz., feature analysis, clus-
tering, and classifier design.

2.A. Numerical Pattern Recognition Systems


First let us differentiate between numerical and syntactic PRS's. By numer-
ical I mean here a system whose inputs are vectors of real numbers (object data);
or numerical estimates of pairwise relations between objects (relational data).
Either kind of data may be measured directly, derived from direct measurements,
or provided by subjective analysis of a physical process. Numerical PRS's
include those based on statistical, deterministic, fuzzy, and heuristic models.
Syntactic PRS's (also variously called structural, semantic, grammatical,
linguistic), on the other hand, are predicated on the theory of formal languages,
and include consideration of ideas such as syntax, trees, and stochastic gram-
mars. Although the objectives pursued in syntactic PRS's are often the same as
those discussed below, their characterization and methodology follows quite a
different path than the course we intend to pursue. Moreover, very little work
has been done by the fuzzy sets community towards generalizing or improving
syntactic methods using Zadeh's idea. Followers of this branch of PR who wish
to pursue those few papers devoted to fuzzy syntactic methods should probably
start with the works of the late K. S. Fu (and his students), who was an ardent
champion of both syntactic PR and fuzzy sets (Fu 1974, Fu 1982). Interested
readers will find a very readable introduction to a broad spectrum of syntactic
approaches in Thomason and Gonzalez (1981); this concludes our discussion of
syntactic methods.
The (arguably) most widely accepted branch of numerical PR appears to be
statistical PR, which is built upon the notions of Bayesian decision theory. The
idea that numerical measurements drawn from a population could be separated
into subclasses or components of a mixture density dates back to at least 1898,
when Pearson (1898) discussed the use of the method of sample moments to
separate a mixture of two univariate normal densities into its components. Duda
and Hart (1973) credit Tryon (1939) with the first exposition of non-statistical
clustering strategies based on hierarchical (i.e., relational) methods. These
methods have evolved into an entire body of literature, due mainly to the
influence of numerical taxonomy, which is of course elegantly represented by
Sneath and Sokal (1973).
More recently, fuzzy sets have been used as a basis for many PR problems.
The earliest paper espousing this viewpoint was the work of Bellman, Kalaba,
and Zadeh (1966). It is our intent, of course, to register the high points in the
evolution of this branch of numerical PR. Towards this end, we next describe
what constitutes a "typical" PRS.
To begin, let me record my definition of the term "pattern recognition;" I
believe a defensible case can be made for a simple statement: pattern recogni-
tion is [any method for] the search for structure in data. This is quite a general
definition - one that inevitably invites arguments. Some years ago, Verhagen
(1975) attempted to correlate definitions of PR with activities in "other"
disciplines. For our purposes, however, it suffices to note that almost all
scientific endeavors involve (one or more of) the elements in the definition above
in some form or another. Figure 1 depicts the four major elements in a (numeri-
cal) PRS: data, feature analysis, clustering, and classification. Note especially
that all four components are "interactive"; each affects (and is affected by)
choices in one or more of the other factors.

[Figure 1 is a flow diagram of four interacting modules: Feature Nomination leading
to X = Numerical Object Data (split into Design Data and Test Data); Feature
Analysis (Extraction, Selection, Display); Classifier Design (Error Rates,
Prediction, Control); and R = Relational Data feeding Cluster Analysis
(Exploration, Validity, Display).]

Figure 1. A Typical Numerical Pattern Recognition System.

First and foremost in our illustration are the data, which we usually assume
to be represented by points in a numerical vector space. To be concrete, let X =
{x_1, x_2, ..., x_n} denote a set of n feature vectors (or objects) x_k in feature space
R^s. Thus, x_kj ∈ R is the k-th (measured) observation of feature
j, 1 ≤ j ≤ s, 1 ≤ k ≤ n. We assume that x_k denotes an (s×1) column vector, while its
transpose (x_k^T) is a (1×s) row vector. X is often presented in the form of an
(s×n) data array, whose columns are the n x_k's; and whose rows are (s) n-
vectors in "item space" R^n. The object data in Figure 1 are divided into two
sets; training or design data; and test data. Test data are presumably needed for
error rate prediction when a classifier has been designed; design or training data
are used to parametrize the classifier - i.e., find decision functions that subse-
quently label each point x ∈ R^s. The other key components in our PRS are:
Feature Analysis (which includes nomination, extraction, selection, and display);
Cluster Analysis (which includes cluster validity); and Classifier Design (which
includes performance evaluation and error estimates). There are many other
activities that might be variously connected with one or more components of
Figure 1. In the main, however, the modules in Figure 1 accurately represent
the major constituents of a typical PRS. There is one additional component in
Figure 1 that should be mentioned here - viz., the "relational data" module
shown as a satellite to cluster analysis. It may happen that, instead of object
data (X ⊂ R^s), one collects relational data, in the form of an (n×n) numerical
relation matrix R. Data in this form are common, e.g., in numerical taxonomy,
where the item of interest may be relationships between pairs of (implicitly)
defined objects. Thus, r_jk, the jk-th element of R, is taken to be the extent to
which implied objects (x_j, x_k) in X×X enjoy some relationship, such as similar-
ity, dissimilarity, etc. If we have object data set X, R is often constructed by
computing {r_jk = ρ(x_j, x_k)}; e.g., if ρ = d is a metric on R^s, then R is a dissimi-
larity relational data matrix. When X is given, all of the elements of Figure 1
apply. When R is given and X is only implicit, however, a much narrower
range of problems are presented. Specifically, clusters of the object set X can be
sought, but feature analysis and classifier design are much vaguer propositions.
On the other hand, the objects that are responsible for R may be anything
(species, models, types, categories), and need not have numerical object
representations as vectors x_k ∈ R^s. From this point of view clustering in R
becomes a very general problem!
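As a small illustration of the last construction (the data below are hypothetical), a dissimilarity relational matrix R can be obtained from object data X by taking ρ = d, the Euclidean metric on R^s:

```python
import numpy as np

# hypothetical object data X: n = 4 objects measured on s = 2 features
X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [3.0, 4.0]])

# r_jk = d(x_j, x_k): all pairwise Euclidean distances via broadcasting
diff = X[:, None, :] - X[None, :, :]
R = np.sqrt((diff ** 2).sum(axis=-1))

print(R[0, 3])   # 5.0, since d((0,0),(3,4)) = sqrt(9 + 16)
```

R is symmetric with a zero diagonal, as a dissimilarity relation should be; any other choice of ρ (a similarity coefficient, say) fits the same scheme.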

2.B. Feature Analysis for Numerical Object Data


Physical processes are studied, understood, predicted, and ultimately con-
trolled through the interactions of their variables. The data in a PRS are cons-
ciously chosen to reflect (i) our ability to measure specific quantities, and (ii) our
belief that the features measured will contain "information" that can be
exploited for the purposes listed above. I like to call this phase of design
feature nomination; one proposes the (s) original features based on what can be
measured, and what seems to be important to the problem at hand. The number
(n) of samples in X (some refer to X as one sample of (n) observations - I
prefer to regard X as (n) samples of the process) is largely determined by non-
scientific considerations, e.g., time, money, etc.


Once X has been proposed and constructed, we may ask how adequate it is
for clustering and classifier design. Thus we are led to feature analysis. First,
the numbers (s) and (n) are important because they impose implicit (and some-
times explicit) constraints on the type of processing that converts data into infor-
mation. If (s) is large, algorithms that deal with X may be slow, use too much
memory, and be too complex. Conversely, if (n) is small, the effects of (statisti-
cally) small sample size become pervasive. Another consideration concerns the
nominated features: are they (more or less) "independent"? It is clearly advan-
tageous to have each feature contribute something new to the information con-
tent of the data. Consequently, we may seek to modify the originally nominated
features by transformation, or selection, or even addition of more features.
Adding features may be necessary, but leads us in the "wrong direction" with
respect to complexity, storage and time.
Putting aside the possibility that one may wish to augment the nominated
features with more measurements, we ask how to extract a "minimal number of
independent" ones? Mathematically, all methods for feature extraction can be
represented symbolically as images of a function f: X → Y, where X ⊂ R^s is the
nominated data, and Y = f[X] ⊂ R^p, p < s, is the image of X. When f is a
function, it may be linear or non-linear. In other instances f is an algorithm or
process that converts xk ∈ X to yk ∈ Y, usually in a non-analytic, non-linear
way. Linear feature extraction includes orthogonal projection of X onto coordi-
nate subspaces of R^s (feature selection); orthogonal projection of X onto arbi-
trary linear subspaces (e.g., principal components or ordination); and oblique
projection onto non-orthogonally spanned linear subspaces (e.g., factor analysis
(Johnson 1982)). Other linear extraction mappings include algorithms such as
those devised by Foley and Sammon (1975), Fukunaga and Koontz (1970), and
Friedman and Tukey (1974). It should be noted that different extraction criteria
(which may or may not coincide with the property measured by f) may be
selected depending upon the ultimate use to be made of X (and hence Y). Thus,
f may be chosen to preserve (local or global) algebraic, statistical or geometric
properties in X; or it may be chosen to optimize a downstream classifier error
rate; or to improve visual display of 2-D images of X that are used, e.g., during
exploratory data analysis (Friedman 1974). Popular non-linear methods fall into
various categories; of these we mention Sammon's algorithm (Sammon 1969),
triangulation (Lee, Slagle, and Blum 1977), Sammon-Triangulation (Biswas,
Dubes, and Jain 1981), and multi-dimensional scaling (Coxon 1982). Addition-
ally, there are a number of selection algorithms that do not have an analytic
representation for f, but are driven by optimization of an extraction criterion;
Narendra and Fukunaga (1977) is a good place to start for readers interested in
techniques of this kind.
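One of the linear extraction mappings mentioned above, orthogonal projection of X onto the subspace spanned by its leading principal components, can be sketched in a few lines. The code below is illustrative only; the data and the function name pca_extract are invented:

```python
import numpy as np

# Sketch of a linear feature extraction mapping f: X -> Y = f[X],
# here an orthogonal projection of X in R^s onto its p leading
# principal components (p < s).  Names and data are invented.
def pca_extract(X, p):
    """Project n samples in R^s onto the p leading principal axes."""
    Xc = X - X.mean(axis=0)                 # center the data
    cov = np.cov(Xc, rowvar=False)          # s x s sample covariance
    evals, evecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    W = evecs[:, ::-1][:, :p]               # p leading eigenvectors
    return Xc @ W                           # Y = f[X], in R^p

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))                # n = 50 samples, s = 4 features
Y = pca_extract(X, p=2)                     # p = 2, e.g., for scatterplots
print(Y.shape)                              # -> (50, 2)
```

Choosing p = 2 gives exactly the kind of planar display of Y discussed at the end of this section.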
We briefly touched on the use of feature analysis for display in connection
with Tukey's book (Tukey 1977). The use of color displays and interactive
graphics systems for exploratory data analysis of X in R S by examining visual
images of Y =/ [X] in R2 has become an increasingly important technique in
recent years. Graphical feature analysis is not new: the faces of Chernoff (1973)
are an excellent illustration of this idea. Of course, any of the previous algo-
rithms can be used for at least scatterplot displays of Y by choosing p =2.

2.C. Cluster Analysis


Given a data set X of unlabelled objects, we may presume that X contains
representatives of one or more subclasses which together comprise a mixture
population from which X was drawn (although some of these phrases have
well-defined statistical connotations, none are intended here). Let (c) denote the
number of subclasses represented in both the population and the data. Cluster
analysis in X refers to the problem of partitioning X into (c) subsets whose ele-
ments bear the same relationship to each other as the objects they represent do in
the physical process from whence X was drawn. The type of subset determines
the type of partition: fuzzy subsets lead to fuzzy c-partitions of X; whereas crisp
subsets are used for conventional crisp c-partitions of X. Ruspini (1969) first
characterized fuzzy c-partitions of X; we shall develop this clustering model in
further detail in Section 3. If the data are relational, c-partitions of the object set
can still be obtained, even if the objects are not represented numerically as
X ⊂ R^s. Zadeh (1971) first discussed relational clustering with fuzzy similarity
relations. As we shall see, the natural isomorphism between crisp c-partitions of
X and equivalence relations in X×X is lost when generalizing these structures to
fuzzy partitions and relations. Nonetheless, there has been extensive develop-
ment of both types of algorithms, some of which will be covered in Section 4.
A related problem is the matter of (c) itself. In the discussion above it was
tacitly assumed that the number of subclasses in X was known. In exploratory
data analysis, however, one assumes nothing about (c), and its determination
becomes a major part of the problem to be solved. I refer to this as the cluster
validity question - for which (c's) do algorithmically determined substructures
in X provide plausible interpretations of the process being studied? Thus, Figure
1 depicts validity as a part of clustering; and when we must find (c), the
analysis is substantially more difficult. Note, as shown in Figure 1, e.g., that
clustering can be used to do feature extraction; and conversely, if "good"
characteristics are chosen during feature analysis, one may expect excellent
results from almost any clustering algorithm. This depends on how well-
separated clusters in the data are. When the data become murky (overlapped,
noisy, etc.) we expect to have concomitantly greater trouble finding good
features, good clusters, and (c).
Cluster analysis is well represented in the literature. The books by Ander-
berg (1983), Everitt (1980), Lorr (1983), and Hartigan (1975), taken together,
present quite a large number of conventional approaches to clustering. Perhaps
the most extensive compendium of fuzzy clustering algorithms is collected in
Bezdek (1981). There have also been some very interesting graphics algorithms
developed for representing clusters in R^s as, e.g., "icicles" (Kruskal and
Landwehr 1983), and "trees and castles" (Kleiner and Hartigan 1981).

2.D. Classifier Design for Numerical Object Data


The last element shown in our diagram is the classifier. I want to
emphasize the distinction I make between clustering and classification. The
result of clustering is a c-partition of X; the result of classifier design is a c-
partition of R^s, the feature space from which X is drawn. Strictly speaking, a
classifier may not actually partition R^s, because the decision regions character-
ized by the classifier may not comprise all of R^s due to regions of indecision
(ties, which can be eliminated, and "no-decision" regions, which cannot). All
classifiers can be represented by a set of (c) scalar fields, say dj: R^s → R, 1 ≤ j ≤ c.
The job of the dj is to assign a label vector, say l(x) ∈ R^c, to any unlabelled
x ∈ R^s, where lj(x) is some function of dj(x). For example, one may assign
x to class (j) if dj(x) > di(x) for all i ≠ j, in which case l(x) = (0,0,...,1,0,...,0)^T with a
(1) in the j-th place. This is an example of a crisp classifier; a fuzzy classifier
produces fuzzy label vectors (defined below); and to make matters worse, there
are many crisp classifiers designed with fuzzy algorithms and vice-versa! Just as
clustering can be used to perform feature analysis, it is often the case that clus-
tering in X can lead to a classifier for R^s. A classifier can, of course, always be
used to produce clusters in X, by simply submitting each xk ∈ X to {dj}
sequentially. The resultant set of label vectors usually (not always; no constraint
binds successive columns generated by this procedure, so U may be in Lfcn)
constitutes a hard or fuzzy c-partition of X. On the other hand, cluster analysis,
which always results in labels for the points in X, may not offer a way to label a
single unknown observation - this is the distinction between clustering and
classification (the confusion arises because clustering does label or "classify"
the points in X - but only in X!).
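The crisp classifier just described reduces to an argmax over the scalar fields dj. A minimal sketch, assuming two hypothetical distance-based discriminants (the prototypes at (0,0) and (3,3) are invented, not from the text):

```python
import numpy as np

# Sketch of a crisp classifier built from c scalar fields d_j: R^s -> R.
# The two discriminants below are hypothetical; any d_j would serve.
d = [lambda x: -np.linalg.norm(x - np.array([0.0, 0.0])),   # class 1 prototype
     lambda x: -np.linalg.norm(x - np.array([3.0, 3.0]))]   # class 2 prototype

def crisp_label(x):
    """Return the crisp label vector l(x) = e_j, j = argmax_j d_j(x)."""
    scores = [dj(x) for dj in d]
    j = int(np.argmax(scores))
    l = np.zeros(len(d))
    l[j] = 1.0                      # a (1) in the j-th place
    return l

print(crisp_label(np.array([0.5, 0.2])))   # -> [1. 0.]
print(crisp_label(np.array([2.9, 3.4])))   # -> [0. 1.]
```

Submitting each xk in X to crisp_label in turn produces a set of label vectors, i.e., clusters in X, exactly as described above.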


Another facet of classifier design is performance evaluation - how well will
the dj 's do on a set of labelled data? The benchmark most often used for
classifier performance is the empirical error rate: submit a set of crisply labelled
data to {dj }, and compute the (number of mistakes)/ (number of tries). The
result is an estimate of the probability of misclassification of {dj }. Returning
again to Figure 1, note that clustering and feature analysis are connected to
classifier design. This points up the fact that there is a beautiful but confound-
ing interaction between the elements of a PRS. Evidently the overall perfor-
mance of the system depends on interactively tuning the various components.
This fact is underappreciated by many casual users of a particular algorithm.
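The empirical error rate described above amounts to a single division. A small sketch, with an invented threshold classifier and invented labelled data:

```python
# Empirical error rate: submit crisply labelled data to {d_j} and
# compute (number of mistakes) / (number of tries).  The classifier
# here is a stand-in: it predicts class 0 below a threshold of 1.0.
def classify(x):
    return 0 if x < 1.0 else 1

labelled = [(0.2, 0), (0.7, 0), (1.5, 1), (0.9, 1), (2.2, 1)]  # (x, true class)
mistakes = sum(1 for x, y in labelled if classify(x) != y)
error_rate = mistakes / len(labelled)       # estimates P(misclassification)
print(error_rate)                           # -> 0.2 (one mistake in five tries)
```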
The first classification strategy based on fuzzy sets appeared in Zadeh
(1965), wherein Zadeh discussed the notion of separating hyperplanes for fuzzy
convex sets. Further developments in this area followed almost immediately,
first in Bellman, Kalaba and Zadeh (1966), and then in Wee (1967). But it was
Ruspini's paper (1969) that really provided impetus to applications of fuzzy
models in both clustering and classification.
Classifier design is well represented in Duda (1973) and Bezdek (1981).
Additionally, the books by Devijver and Kittler (1982), Tou and Gonzalez
(1974), and Fukunaga (1972) contain excellent treatments of statistical pattern
recognition. Having briefly surveyed the terrain surrounding Figure 1, we turn
to the main objective of the paper: it begins with the structure of fuzzy partitions
and relations.
3. FUZZY PARTITIONS AND RELATIONS
Section (3A) defines the fuzzy set, and attempts to dispel the main points of
philosophical controversy about it which still crop up from time to time. In (3B)
we describe fuzzy partitions of X; and in (3C) their companion structure, fuzzy
relations in X × Y.

3.A. Fuzzy Sets


Following Zadeh (1965), we define a fuzzy subset of any set X as a func-
tion u: X → [0,1]; for each x ∈ X the value u(x) is the degree of membership of
x in the fuzzy subset u. Of course u is a function - there is no set-theoretic
characterization of fuzzy "sets." The terminology follows from the isomorphism
between crisp sets and their characteristic functions. That is, a crisp subset A of
X can be represented by its membership function uA: X → [0,1], where uA(x) = 1
for x ∈ A and uA(x) = 0 otherwise. Note that, by definition, every hard subset of
X is fuzzy, but not conversely.
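As an illustration of such a membership function (the "tall" breakpoints 160 cm and 190 cm below are arbitrary modelling assumptions, not part of the definition):

```python
# A fuzzy subset u: X -> [0,1] of X = heights in cm, representing "tall".
# The 160/190 breakpoints are arbitrary modelling choices.
def u_tall(height_cm):
    if height_cm <= 160.0:
        return 0.0
    if height_cm >= 190.0:
        return 1.0
    return (height_cm - 160.0) / 30.0      # linear ramp in between

print(u_tall(150))   # -> 0.0   (definitely not tall)
print(u_tall(175))   # -> 0.5   (tall to degree 0.5)
print(u_tall(195))   # -> 1.0   (definitely tall)
```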


There are several questions about u that often arise. More able spokesper-
sons than I have spent long years arguing over these points (cf. Blockley, Pils-
worth and Baldwin (1983) or Lindley (1982)), so my objective here will be to
simply identify the questions, and (ostensibly) supply very concise answers.
(Q1). What is the (philosophical) interpretation of u (x)?
Most investigators within fuzzy sets have laid (Ql) to rest. It was Zadeh's
intention to quantify non-statistical uncertainty by representing, e.g., the "set" of
tall men; or "important" documents; or "very fast" particles. Uncertainty in
each of these cases is due to the semantic imprecision of natural language. It
seems natural enough for us to use phrases such as these; Zadeh provided a
means for representation and manipulation of the information contained in them.
Thus, u(x) is usually taken to represent the degree or extent to which x matches
the semantic description of u. Of course, different observers will render
different interpretations of the same phrase for exactly this reason - natural
language is imprecise! This inevitably leads to:
(Q2). Where does the function u come from?
This problem is certainly a real one, and deserves a more scholarly answer
than I can make here. Well, where did the normal distribution come from?
Legendre and Gauss invented it! Let me observe, then, that the situation in
fuzzy sets is not unlike that facing an investigator who, say, has decided that a
physical process needs a stochastic model because it contains a clear element of
chance. Thus, given a (fair) die, what is the probability pi of observing the
i-th face? The modeler has two choices: the pi's can be gotten subjectively
(guessed by the modeler); or objectively (determined experimentally by gathering
data and making inferences from it). In either case one eventually arrives at
pi = 1/6; and after a while, we forget where these pi's came from. Well, fuzzy
sets come to us in much the same way; we invent them subjectively (hopefully,
with a view towards their plausibility and relevance to the process being
modeled); or objectively, through some kind of data processing. Both
approaches are represented in the methods discussed below.
(Q3). Isn't u really probability in disguise?
No. First, the structures usually proposed (as, e.g., in (3B) below) lack even
finite additivity; the union of a fuzzy subset of X and its complement may not
recover all of X. This has led to many papers on the validity (or invalidity) of
DeMorgan Triples for X (Lowen 1982). Pros and cons aside, it is impossible to
construct a σ-algebra with the usual Boolean operations, so fuzzy sets do not
rest on the same axiomatic premises as probability theory; Goodman (1982) and
others have tried to show that fuzzy sets and random sets (Matheron 1975)
amount to the same thing - so far, this work has been pretty esoteric and incon-
clusive.
Second, let us think of an experiment. Let x be, say, an (unobserved) can
of motor oil, let A be the set of potable liquids, and suppose you have available
two pieces of information: uA(x), the membership of x in A (i.e., a number in
[0,1] which represents the extent to which x is a potable liquid), and pA(x), the
probability that x ∈ A. If you could have either uA(x) or pA(x) - but not both -
and needed to decide whether or not to drink x, which number would you prefer
to have? Now uA(x) and pA(x) might both be, say, 0.35 before observation of
x. But upon discovering that x is indeed motor oil, pA(x) = 0, whereas uA(x)
remains fixed. The point is that uA and pA convey different types and amounts
of information about x. Based on the above arguments, it seems incontrovertible
that fuzzy sets are not somehow masquerading as probability theory.
(Q4). Can't one use a probabilistic model wherever a fuzzy model seems to
apply?
The answer to (Q4) is almost certainly yes! The point is, however, that
mathematical models are devised to portray some physical process, and are
chosen, at least partially, for their natural ability to "represent the action". It is
as unnatural to imagine a fuzzy model of, say, the binomial experiment, as it is
to propose that the extent to which "x is potable" is a matter of chance. Both
rationales have their place; we should use any model that improves our ability to
represent, analyze, predict, and control a process. Indeed, many processes are
well-modeled by a combination of structures. Thus, it seems better to ask "what
is useful?" rather than to ask "what is right?" There is a beautiful diagram of
this situation which was proposed by Blockley et al. (1983) which is repro-
duced as Figure 2 below. The "conjecture" represented by Figure 2 is best
described in their original words:

Many of the difficulties of probabilistic inference derive from the need


to specify conditional probabilities when the factors which produce the
dependencies are unknown. If, however, the problem being attempted is
one with low system uncertainty but with high random parameter uncer-
tainty, probabilistic inference will lead to good results. In this case the
basic relationships between the parameters are known when the
parameters are accurately specified. It is in these problems where the


system uncertainty is large, where all the possible factors affecting
dependencies between parameters are unknown that the max-min opera-
tions become powerful modes of inference.

These thoughts are presented as a conjecture in Fig. 8 [Figure 2]. This


figure is intended to represent the set of possible problems classified by
two rather vague and imprecisely understood, but very useful concepts,
randomness and complexity. The area in the bottom left corner near the
origin represents organized simplicity. The problems here are those
dealt with by the simple theories (although many undergraduates may
not agree!) of engineering science, the analytical techniques taught to all
engineering students. The region of unorganized complexity is that
class of problems which are dealt with by statistics. This is a
mathematics of aggregates and whilst it is useful for making inferences
about trends it is not useful for making inferences about problems where
underlying causal models or empirical relationships between parameters
are known for particular cases.

The large region in between, that of organized complexity represents


that class of problems which contains most of the problems that civil
engineers face. It is the class of problems that systems theory
addresses. The conjecture advanced here is that the problems in the
region to the left of Fig 8, high randomness, low complexity might best
be tackled using probabilistic inference. The problems of high random-
ness, high complexity might best be tackled using fuzzy inference and
that there may be a region in between these two where a combination of
fuzziness and probability may be worthwhile.
Whatever method of inference is used, the interpretation of the meaning
of any measure is crucial. It is meaningless to talk of the probability of
the truth of any generalized theory or hypothesis. The notion of the
inductive reliability of a hypothesis should be replaced by the notion of
the responsibility of a decision to act on a hypothesis. Responsibility is
not that one has earned the right to be right, or even nearly right, but
that one has taken what precautions one can reasonably be expected to
take against being wrong. The responsible engineer is not expected to
be right every time but he is definitely expected never to make childish
or lay mistakes.

There are literally dozens of papers that address questions such as (Q1)-
(Q4). Blockley et al.'s paper will lead one to this literature; since this has been a
bit of a digression from our main task, we pass now to fuzzy partition spaces.

Figure 2. The Relationship between Random and Fuzzy Models (Fig. 8 in Blockley et al. (1983)). [The figure plots randomness (vertical axis) against complexity (horizontal axis); "unorganized complexity (aggregates)" labels the top, "organized complexity (systems)" spans the middle, and the regions marked are analytical, probabilistic inference, a transition zone, and fuzzy inference.]

3.B. Fuzzy Partitions


Let c be an integer, 1 ≤ c ≤ n, and X = {x1, x2, ..., xn} ⊂ R^s. We say that (c)
fuzzy subsets {ui} of X are a non-degenerate fuzzy c-partition of X in case the
(cn) values {ui(xk) = uik ; 1 ≤ i ≤ c ; 1 ≤ k ≤ n} satisfy three conditions:

uik ∈ [0,1]  ∀ i,k ;  (1a)

Σi uik = 1  ∀ k ;  (1b)

0 < Σk uik < n  ∀ i .  (1c)

It is both natural and convenient to array the values {uik} as a (c×n) matrix
U = [uik] in the vector space Vcn of all (c×n) real matrices. Upon doing so, we
are able to make these definitions:

Mfcn = {U ∈ Vcn | uik satisfies (1) ∀ i,k} ;  (2a)

Mcn = {U ∈ Mfcn | uik ∈ {0,1} ∀ i,k} .  (2b)

Mfcn (Mcn) is called non-degenerate fuzzy (crisp) c-partition space for X, even
though strictly speaking these sets are not subspaces of Vcn. Although not
couched in exactly these terms, this is the basic structure proposed by Ruspini
(1969). From this seed perhaps (250) papers on several dozen clustering and
classification algorithms have evolved. Let us explore the structure of Mfcn
more carefully.
First note that Mcn ⊂ Mfcn, so Mfcn imbeds the solution space of all crisp
partitioning algorithms. Consequently, if X demands a crisp solution, Mfcn con-
tains it. In other words, Mfcn enriches (not replaces!) conventional models.
Second, note that the rows of U are just (values of) the fuzzy subsets {ui},
whereas the columns of U are label vectors of each xk ∈ X. It is convenient to let
ej = (0,0,...,1,0,...,0)^T, 1 in the j-th place, 1 ≤ j ≤ c, denote the usual unit vec-
tor in coordinate direction (j), and to put

Nc = {e1, e2, ..., ec} ;  (3a)

Nfc = conv(Nc) .  (3b)

Nc is the usual orthonormal basis of R^c; Nfc is its convex hull. Nc (Nfc) are
the crisp (fuzzy) label vectors that comprise each column of U ∈ Mcn (Mfcn).
Figure 3 illustrates these sets graphically, and shows their relationship to [0,1]^c,
the c-fold Cartesian product of [0,1] with itself.
Now imagine the n-fold Cartesian products of the three sets shown in Figure
3. Obviously we have

[Nc]^n ⊂ [Nfc]^n = [conv(Nc)]^n ⊂ [0,1]^cn .  (4)

Figure 3. Crisp and Fuzzy Label Vectors. [For c = 3, the figure shows the vertices e1 = (1,0,0)^T, e2, e3 (the crisp label vectors N3), their convex hull Nf3 = conv(N3) (the fuzzy label vectors), and the enclosing cube [0,1]^3.]

Mcn is not quite [Nc]^n, nor is Mfcn as large as [Nfc]^n, because constraint (1c)
binds the columns of U. We want the lower (upper) constraint in (1c) because it
insures that each ui is non-empty (no ui is exhaustive), but this desire forces us
to add degenerate c-partitions of X to Mfcn to get the convex hull in (4). Thus,
we relax the constraints at (1c) by putting

Lfcn = {U ∈ Vcn | 0 ≤ Σk uik ≤ n ∀ i ; (1a),(1b)} ;  (5a)

Lcn = {U ∈ Lfcn | uik ∈ {0,1} ∀ i,k} .  (5b)

Each column of U ∈ Lfcn is a label vector from Nfc, and the columns are
independent. Consequently, it is appropriate to call Lcn (Lfcn) the crisp (fuzzy)
label matrices of X. And because

Lcn = [Nc]^n  and  (6a)

Lfcn = [Nfc]^n ,  (6b)

Lcn (Lfcn) have often been called the crisp (fuzzy) degenerate c-partitions of X.
The effect of this uncoupling is that Mfcn can now be written as a convex hull.
Indeed, it is easy to show that

Mfcn = conv(Lcn) ,  (7)

so each fuzzy c-partition of X has at least one convex decomposition, say
U = Σj aj Uj , Σj aj = 1 , 0 ≤ aj ≤ 1 ∀ j, with each Uj ∈ Lcn, by a finite number of (possibly)
degenerate crisp c-partitions of X. Moreover, since each column of U has (c−1)
independent entries, and there are (n) columns, it is not surprising to find (Bezdek
and Harris 1979) that the dimension of Mfcn is n(c−1):

dim(Mfcn) = n(c−1) .  (8)

Thus, we need at most n(c−1) vertices of Mfcn (U's in Lcn) for convex
decompositions of U. These results, together with the fact that Mfcn is a con-
vex polytope whose centroid is the unique "fuzziest" or most uncertain state
(namely, U = [1/c]), determine the geometry of Mfcn.
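Conditions (1a)-(1c) can be checked mechanically. A minimal sketch, with an invented 2×3 membership matrix (the function name is, of course, not from the text):

```python
import numpy as np

def is_fuzzy_c_partition(U, tol=1e-9):
    """Check conditions (1a)-(1c) for a (c x n) membership matrix U."""
    c, n = U.shape
    in_unit = np.all((U >= -tol) & (U <= 1 + tol))               # (1a)
    cols_sum_1 = np.allclose(U.sum(axis=0), 1.0)                 # (1b)
    rows_ok = np.all((U.sum(axis=1) > 0) & (U.sum(axis=1) < n))  # (1c)
    return bool(in_unit and cols_sum_1 and rows_ok)

U = np.array([[0.9, 0.5, 0.1],      # u1, a fuzzy subset of n = 3 objects
              [0.1, 0.5, 0.9]])     # u2; the columns are fuzzy label vectors
print(is_fuzzy_c_partition(U))                        # -> True
print(is_fuzzy_c_partition(np.eye(2)[:, [0, 0, 0]]))  # row 2 empty -> False
```

The second matrix violates (1c) in both directions (one ui is empty, the other exhaustive), so it lies in Lfcn but not in Mfcn.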

3.e. Fuzzy Relations


In general, a numerical fuzzy relation between two sets X and Y is just a
fuzzy subset of X×Y, i.e., a function r: X×Y → [0,1]. The membership r(x,y)
of (x,y) in the fuzzy set r is interpreted as the degree of relationship between x and
y. When |X| = n, |Y| = m, we array the (nm) values r(xi,yj) = rij as a fuzzy
relation matrix R = [rij], and when no confusion seems likely, call R the fuzzy
relation. Most applications of relational algorithms involve an even more special
case, wherein |X| = n and X = Y, in which case R is the adjacency matrix of a
weighted digraph for (nodes) X. In what follows we assume that R ∈ Vnn,
denote the identity matrix as In, and use (≤) as pointwise ordering for A, B
∈ Vnn. Following Zadeh (1971), we call R a numerical fuzzy similarity relation
(NSR) in case

0 ≤ rij ≤ 1  ∀ i,j ;  (9a)

In ≤ R  (reflexive) ;  (9b)

R = R^T  (symmetric) ;  (9c)

R ≥ R(∨*)R  ((∨*)-transitive) .  (9d)

In (9d) we have generalized matrix multiplication: if C = A(∨*)B, then
cij = ∨k (aik * bkj); the usual exterior operations are summation (Σ) and maximum (∨). The choice
of (*) has undergone extensive investigation by many writers; there are at least
six infinite families of operators that can be used in (9d) that all provide
mathematically well defined extensions of transitivity (Bonnisone and Decker
1985). However, the usual choices for (*) are delta (Δ), product (·), and
minimum (∧), which correspond, respectively, to the so-called T1, T2, and T3
norms. Thus, for a,b ∈ [0,1],

T1(a,b) = a Δ b = ∨(0, a+b−1) ;  (10a)

T2(a,b) = a · b ;  (10b)

T3(a,b) = a ∧ b .  (10c)
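The three T-norms in (10a)-(10c) translate directly into code; a quick sketch:

```python
# The T-norms of (10a)-(10c): candidates for (*) in (V*)-transitivity.
def t1(a, b):               # delta: max(0, a + b - 1)
    return max(0.0, a + b - 1.0)

def t2(a, b):               # product
    return a * b

def t3(a, b):               # minimum
    return min(a, b)

a, b = 0.8, 0.7
print(round(t1(a, b), 2), round(t2(a, b), 2), round(t3(a, b), 2))
# -> 0.5 0.56 0.7; note t1(a,b) <= t2(a,b) <= t3(a,b), the ordering in (13)
```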

When the elements of R are all crisp and * = ∧, properties (9b)-(9d) yield an
equivalence relation (ER) on X. We denote the crisp ERs on X as

En = {R ∈ Vnn | (9b),(9c),(9d) with * = ∧ ; rij ∈ {0,1} ∀ i,j} .  (11)

Mcn and En are isomorphic: crisp clusters or subsets in X define unique (up to
arrangements) equivalence classes in X and vice-versa. More generally,

En* = {R ∈ Vnn | (9) holds}  (12)

are the numerical (∨*)-transitive fuzzy similarity relations on X. Now
(a Δ b) ≤ (a·b) ≤ (a ∧ b), so we have, e.g., that

E∨∧ ⊂ E∨· ⊂ E∨Δ .  (13)

There have been many extensions of this notion. Of these, we mention that
(∨Δ) transitivity is equivalent to the property of pseudo-metricity; and (∨∧) to
ultra-metricity. Let d: X×X → [0,1] be given by dij = 1 − rij. Then

R ∈ E∨Δ ⟺ d is a pseudo-metric ; and  (14a)

R ∈ E∨∧ ⟺ d is an ultra-metric .  (14b)

(14a) shows that (∨Δ) transitivity is essentially equivalent to the triangle ine-
quality. Proofs of (14a) and (14b) are presented in Bezdek and Harris (1978)
and Zadeh (1971), respectively. Another fact derived in Bezdek and Harris
(1978) is that conv(En) and E∨Δ are identical at n = 3:

conv(E3) = E∨Δ .  (15)

And for n > 3 we find that

E∨∧ ⊂ conv(En) ⊂ E∨Δ .  (16)
The relationship between partition and relation spaces is summarized in Figure 4.


In Section 4 we discuss relational clustering algorithms based on convex decom-
positions and (∨*) transitive closures.

Figure 4. Connections: Partitions and Relations. [The diagram relates the partition spaces Mcn, Lcn, and conv(Lcn) = Mfcn to the relation spaces En, E∨∧, conv(En), and E∨Δ, with Mcn ↔ En and conv(Lcn) = Mfcn ↔ conv(En).]

Figure 5 depicts graphically the difference between (∨∧) and (∨Δ) transi-
tivity for the relation matrix

        | 1    0.8  0.7 |
R(β) =  | 0.8  1    β   | .
        | 0.7  β    1   |

One may check that R(β) ∈ E∨∧ ⟺ β = 0.7; whereas R(β) ∈ E∨Δ for all β ∈
[0.5, 0.9]. Thus, (∨∧) transitivity occurs only if β = 0.7, and hence needs maxi-
mal mutual bonding of object 3 to object 2, requiring all "0.7 relatives" to be
shared with intermediate object 1. On the other hand, (∨Δ) transitivity is
achieved by any β in the range [0.5, 0.9], so allows for the least possible bond-
ing. In other words, E∨∧ contains "pessimistic" chains, whereas E∨Δ allows
more optimistic alignments.

Figure 5. (∨∧) and (∨Δ) Transitivity.
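The claims made for R(β) can be verified mechanically. The sketch below implements the (∨*) composition of (9d) for the min and Δ norms; the routine names are invented:

```python
import numpy as np

def compose(R, tnorm):
    """C = R (V*) R, with c_ij = max_k tnorm(r_ik, r_kj), as in (9d)."""
    n = R.shape[0]
    return np.array([[max(tnorm(R[i, k], R[k, j]) for k in range(n))
                      for j in range(n)] for i in range(n)])

def is_transitive(R, tnorm, tol=1e-12):
    return np.all(R >= compose(R, tnorm) - tol)

t_min = lambda a, b: min(a, b)                  # T3
t_delta = lambda a, b: max(0.0, a + b - 1.0)    # T1

def R_beta(beta):
    return np.array([[1.0, 0.8, 0.7],
                     [0.8, 1.0, beta],
                     [0.7, beta, 1.0]])

print(is_transitive(R_beta(0.7), t_min))    # -> True  (only beta = 0.7 works)
print(is_transitive(R_beta(0.8), t_min))    # -> False
print(is_transitive(R_beta(0.5), t_delta))  # -> True  (any beta in [0.5, 0.9])
print(is_transitive(R_beta(0.9), t_delta))  # -> True
```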

A more practical question: how does one choose a T-norm for (*)? This
seems to depend on the application at hand. There are several studies that
describe various problems that may arise as a result of, e.g., the discontinuity of
(∧) and (Δ); see Bandler and Kohout (1984) for a nice discussion of the theory.

4. CLUSTERING ALGORITHMS

In this section we assume as input data an object data set X ⊂ R^s or a
numerical relational data matrix R ∈ Vnn as discussed in Section 3. Clustering
algorithms discussed below are grouped into five categories: Section (4A) is con-
cerned with relation-criterion function methods that produce fuzzy c-partitions
U ∈ Mfcn from R; (4B) considers object-criterion methods that produce
U ∈ Mfcn from X; (4C) contains convex decomposition algorithms for produc-
ing crisp clusters from fuzzy partitions and relations; in (4D) we discuss crisp
clustering based on numerical transitive closures of R; and in (4E) methods based
on generalized nearest neighbor rules are briefly reviewed. We shall indicate which
form the data are in for a particular method by exhibiting them as arguments of
the clustering criterion wherever possible.

4.A. Relation Criterion Functions


Clustering driven by optimization of a criterion function which assesses par-
titions according to some global property of the grouped data is well represented
in the literature of fuzzy sets. Ruspini presented the first approach of this kind

in Ruspini (1969), by defining an objective function JR that seeks U ∈ Mfcn -
fuzzy c-partitions on n objects - given a relational data matrix R. Let

JR(U;R) = Σj Σk [ Σi (uij − uik)^2 / α − rjk ]^2 .  (17)

Ruspini assumed that α was a real constant and that R was a dissimilarity meas-
ure; thus, rjk measured the extent to which the pair of (possibly) implicit objects
(j,k) were in some sense unalike. Consequently, we call JR a relational cri-
terion (a function of object-pair relationships in R), as opposed to an object cri-
terion (a function of object vectors in X).
Optimal partitions were taken as local minima of JR over Mfcn. Iterative
optimization was used to estimate local solutions; Ruspini (1969) contains
several examples of this technique. Minimizing JR was cumbersome, slow, and
solutions were hard to interpret because JR does not measure an obvious pro-
perty of "good" clusters in X. These objections aside, the method was impor-
tant because it was the first fuzzy objective function method, and it paved the
way for further research.
Surprisingly enough, there have been very few fuzzy relational criterion
algorithms since Ruspini published his seminal work. It is not clear whether this
is because most researchers usually acquire object data (X ⊂ R^s) rather than
relational data (R ∈ Vnn); or, what seems more likely, that it is very difficult to
provide a plausible interpretation for the property possessed by the (unknown
object) clusters (U ∈ Mfcn) which are gotten by optimizing a (possibly obscure)
property of pairwise relations (that may be either calculated from X or provided
by direct collection) between the objects. In any case, we mention three subse-
quent algorithms that fall into this category (interestingly, two of these are quite
recent).
A scheme based on minimizing the function

JRB(U;R) = Σi Σk Σj (uik uij)^2 rkj  (18)

was presented by Roubens (1978), where U ∈ Mfcn and R is a dissimilarity
measure. This is a simpler relational criterion than JR, but the original scheme
proposed to optimize it was complicated and sometimes unstable. A
modification due to Libert and Roubens (1982) ameliorated this problem, but
appears to be useful only when the substructure of R is quite distinct.
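Criteria of this type are inexpensive to evaluate for a candidate partition. A sketch of computing JRB on invented data (the helper name j_rb is, of course, not from Roubens):

```python
import numpy as np

def j_rb(U, R):
    """J_RB(U;R) = sum_i sum_k sum_j (u_ik * u_ij)^2 * r_kj, as in (18)."""
    # For each cluster i, weight R by the squared outer product u_i u_i^T.
    return float(sum(((np.outer(U[i], U[i]) ** 2) * R).sum()
                     for i in range(U.shape[0])))

R = np.array([[0.0, 1.0, 4.0],       # an invented 3x3 dissimilarity matrix
              [1.0, 0.0, 5.0],
              [4.0, 5.0, 0.0]])
U_good = np.array([[1.0, 1.0, 0.0],  # objects 1,2 together; 3 alone
                   [0.0, 0.0, 1.0]])
U_bad = np.array([[1.0, 0.0, 1.0],   # objects 1,3 together; 2 alone
                  [0.0, 1.0, 0.0]])
print(j_rb(U_good, R) < j_rb(U_bad, R))   # -> True: lower J_RB, better clusters
```

Grouping similar objects (small rkj) in the same cluster lowers the criterion, which is the sense in which minimizing JRB recovers substructure in R.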
Another relational criterion recently discussed by Windham (1985) is based
on iteratively minimizing the functional

JAP(U,W;R) = Σi Σk Σj (uik wij)^2 rkj ,  (19)

where U is in Mfcn, and W is a c×n matrix with entries wij ∈ [0,1] and
rowsums Σj wij = 1. We call M'fcn the set of such matrices. In the
Assignment-Prototype (AP) algorithm U is the desired partition on n possibly
implicit objects (X); W is a set of "prototype weights"; and rjk is again a dis-
similarity measure. The interpretation provided for JAP follows from a crisp
special case; viz., when U and W are hard. In this case, U has c hard clus-
ters {ui}, and one imagines that each ui contains a "prototype" (albeit implicit),
say x_li, which is pointed to by W: wij = 1 iff xj = x_li, and is zero otherwise.
Letting r_k,li have its obvious meaning, we can rewrite JAP for this special case as

JAP(U,W;R) = Σi Σ_{xk ∈ ui} r_k,li ,

which sums all dissimilarities of points within ui to its most prototypical object.
Good partitions are taken as local minima of (19), the optimization extending
over Mfcn × M'fcn. Windham presents necessary conditions, discusses conver-
gence, convergence rates, storage, and initialization; and illustrates the algorithm
with the IRIS data and an artificial data set devised to illustrate the shortcomings
246

of crisp relation - criterion algorithms. Windham's data are reproduced in Table


1, with an optimal (AP) solution for c=2 obtained in 5 iterations. Note that the
block symmetry of R is reflected by the block symmetry of U , and that object
F has equal memberships in both classes. Thus, a fuzzy relation - criterion pro-
duces U E Mj2 '11 that seems quite plausible in view of the structure of R in
Table 1. As noted in Windham (1985), this type of solution is unavailable if U
is constrained to be in M 2,11'

Table 1. Artificial Dissimilarity Data (after Windham (1985)).

      A    B    C    D    E    F    G    H    I    J    K
A     0    6    3    6   11   25   44   72   69   72  100
B          0    3   11    6   14   28   56   47   44   72
C               0    3    3   11   25   47   44   47   69
D                    0    6   14   28   44   47   56   72
E                         0    3   11   28   25   28   44
F                              0    3   14   11   14   25
G                                   0    6    3    6   11
H                                        0    3   11    6
I                                             0    3    3
J                                                  0    6
K                                                       0

Memberships Due to the AP Algorithm (after Windham (1985)).

A B C D E F G H I J K

u_1   .92  .90  .95  .90  .86  .50  .14  .10  .05  .10  .08

u_2   .08  .10  .05  .10  .14  .50  .86  .90  .95  .90  .92

Another recent algorithm is the so-called relational c-means (RCM) algorithm
discussed by Bezdek and Hathaway (1986). The criterion used in the
RCM method is

J_RCM(U; R) = Σ_i [ Σ_k Σ_j (u_ik)^m (u_ij)^m r_kj / ( 2 Σ_j (u_ij)^m ) ]        (20)

where U ∈ M_fcn and R ∈ V_nn is unconstrained. Were it not for the denominator
in (20), J_RCM would, except for the exponent on (u_ik u_ij), be J_RB. Unlike
Windham's (AP) method, no direct theoretical conditions are known yet for local
optima of J_RCM. However, (20) can be iteratively minimized using a variation
of the method of coordinate descent described at length in Bezdek, Hathaway,
Howard, Wilson and Windham (1986). A glance at (20) hardly suggests a ready
interpretation of the property "good" U's have when derived as local minima of
J_RCM. There is a nice interpretation of this algorithm, but it depends on
understanding a related object-criterion algorithm called fuzzy c-means (FCM). We
shall return to (20) after describing (FCM) in some detail below.

4.B. Object Criterion Functions


Clustering directly on an object set X in R^s has received far more attention
than its relation-theoretic counterpart. Dunn (1974) discussed the first fuzzy
generalization of the conventional c-means or crisp least-squared errors algo-
rithm. This work led to a long line of full and partial generalizations which
(perhaps temporarily!) culminated in the fuzzy c-varieties algorithms which were
introduced by Bezdek, Coray, Gunderson and Watson (1981a, 1981b). Because
this method contains many useful and well-known algorithmic families we give
below a brief overview of its essential characteristics.
Here and below let ||·||_A denote any weighted inner product norm on R^s,
i.e.,

||x||²_A = x^T A x ,  x ∈ R^s        (21)

where A ∈ V_ss is positive-definite. The norm metric induced on R^s x R^s by
(21) for any vectors x_k, v_i ∈ R^s is given by

d²_ikA = ||x_k - v_i||²_A = (x_k - v_i)^T A (x_k - v_i) .        (22)
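Definitions (21)-(22) amount to a single quadratic form. A minimal numerical sketch (the function name `d2_A` and the array shapes are our own conventions, not from the text):

```python
import numpy as np

def d2_A(x, v, A):
    """Squared A-norm distance (22): (x - v)^T A (x - v),
    with A a symmetric positive-definite (s x s) matrix."""
    diff = np.asarray(x, float) - np.asarray(v, float)
    return float(diff @ A @ diff)
```

With A equal to the identity this is the ordinary squared Euclidean distance; a diagonal A simply reweights the coordinate axes.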

Next, let V_ri(v_i; b_i1, b_i2, ..., b_ir) be the linear variety in R^s of dimension r,
1 ≤ r ≤ s, through the point v_i ∈ R^s and spanned by the independent vectors
{b_ij}:

V_ri(v_i; b_i1, ..., b_ir) = { y ∈ R^s | y = v_i + Σ_j α_j b_ij , α_j ∈ R } .        (23)

As r runs from 0 to s, V_ri describes linear manifolds (subspaces if v_i = θ) as
follows:

V_0 : points in R^s (FCM)
V_1 : lines in R^s (FCL)
V_2 : planes in R^s (FCP)
V_{s-1} : hyperplanes in R^s (FCHP)
V_s = R^s

The acronyms stand for fuzzy c-means (FCM), fuzzy c-lines (FCL), etc. When
the vectors {b_ij} are orthonormal, the projection theorem enables us to calculate
the squared (A-orthogonal) distance from a point x_k ∈ R^s to
V_ri, 1 ≤ i ≤ c, as

D²_ikA = d²_ikA - Σ_j ( <x_k - v_i , b_ij>_A )² .        (24)

Note that (24) reduces to (22) for r = 0. With (24) we define the fuzzy c-varieties
(FCV) object-criterion function:

J_FCVm(U, V_r; X) = Σ_k Σ_i (u_ik)^m D²_ikA , where        (25a)

U ∈ M_fcn is a fuzzy c-partition of X ;        (25b)

V_r = (V_r1, V_r2, ..., V_rc) is a set of (c)        (25c)

r-dimensional linear varieties in R^s, 0 ≤ r ≤ s-1 ; and

1 ≤ m < ∞.

J_FCV is a very general functional; it attempts to assess the error incurred by
(partially) representing the (n) points in X by the (c) linear varieties V_r. The
idea underlying J_FCV is simple: try to match the "shape" of clusters in R^s to
linear manifolds. Obviously this is a good idea only when the data in R^s have
"flat" substructure - and one cannot generally know this. Moreover, since X is
finite and each V_ri is infinite unless r = 0, we expect J_FCV to have many local
extrema that do not suggest "good" clusters in X even when they are flat! It is
easy to construct examples that show this. Figure 6 illustrates that one line can
provide a very good "fit" for two linear, coaxial clusters which are well
separated along the line in R^s. Before addressing this difficulty we exhibit the
first-order necessary conditions that local extrema of J_FCV must satisfy. The
variables (U, V_r) may be a local minimum of J_FCV only if:

v_i = Σ_k (u_ik)^m x_k / Σ_k (u_ik)^m ;        (26a)

S_i = A^(1/2) ( Σ_k (u_ik)^m (x_k - v_i)(x_k - v_i)^T ) A^(1/2) ;        (26b)

b_ij = A^(-1/2) p_ij , where {p_ij} are the first (r)        (26c)
principal eigenvectors of S_i ;

u_ik = ( Σ_j (D_ikA / D_jkA)^(2/(m-1)) )^(-1) .        (26d)

In (26) we assume: m > 1, 1 ≤ i ≤ c, 1 ≤ j ≤ r, 1 ≤ k ≤ n, and that
x_k ∉ V_ri for all i, k. If this last condition fails, an arbitrary tie-breaking
method will continue the FCV algorithms, which are Picard iteration through
(26) to iteratively optimize J_FCV. Informally, one guesses a U(0) ∈ M_fcn,
and computes successively the points v_i (on V_ri) with (26a); the fuzzy scatter
matrices S_i at (26b); the orthonormal bases {b_ij | 1 ≤ j ≤ r} of the V_ri's
with (26c); and finally, an updated U(1) ∈ M_fcn with (26d). Termination is
declared when U(k+1) and U(k) are "close" in some sense. We call S_i a generalized
scatter matrix because for A = I_s and U ∈ M_cn equation (26b) yields

the familiar scatter matrices often encountered in multivariate statistical analysis.
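The core computations in (26) are easy to sketch for the simplest case A = I_s. The following is a minimal illustration, not the authors' code; the function names and the restriction to A = I are our own assumptions. It builds the fuzzy scatter matrix of (26b), extracts the top r eigenvectors per (26c), and evaluates the variety distance (24).

```python
import numpy as np

def variety_distance_sq(x, v, B):
    """Squared distance (24) from x to the variety through v spanned by the
    orthonormal rows of B, with A = I: ||x - v||^2 - sum_j <x - v, b_j>^2."""
    diff = np.asarray(x, float) - np.asarray(v, float)
    return float(diff @ diff - sum((diff @ b) ** 2 for b in B))

def principal_directions(X, u, m, r):
    """Fuzzy scatter matrix (26b) with A = I, and its first r principal
    eigenvectors (26c). X is (n, s); u is the length-n membership column."""
    w = u ** m
    v = (w @ X) / w.sum()                 # weighted mean, eq (26a)
    D = X - v
    S = (w[:, None] * D).T @ D            # fuzzy scatter matrix, eq (26b)
    vals, vecs = np.linalg.eigh(S)        # eigenvalues in ascending order
    return v, vecs[:, ::-1][:, :r].T      # top-r eigenvectors as rows
```

For points lying on a line, the r = 1 variety through their weighted mean fits exactly, and (24) returns the squared perpendicular distance for off-line points.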

Well separated, "collinear" clusters may adhere to the same
linear variety because it has infinite extent in feature space.

Figure 6. Linear Varieties may be too big!

Most of the early work on FCV is summarized in Bezdek (1981), which
contains proofs of the main results and several numerical examples on artificial
data. However, the problem of dimensionality alluded to above, and the fact that
X almost certainly does not contain (c) "flat" clusters in R^s, have resulted in
concentration on the algorithms derived by minimizing convex combinations of
FCV functionals. Although Bezdek et al. (1981b) contains these results for arbitrary
finite convex sums, we mention here the only case that has received attention
through applications, viz., the convex combination of (FCM) and (FCL).
Define, for μ ∈ [0,1]

J_FCEm(U, V; X) = μ J_FCLm(U, V_1; X) + (1 - μ) J_FCMm(U, V_0; X) .        (27)

In (27) the arguments of the three functions, from the left, are (U, V; X),
(U, V_1; X) and (U, V_0; X) respectively, where V_1 = (V_11, V_12, ..., V_1c) are
(c) lines in R^s; and V_0 = (v_01, v_02, ..., v_0c) are (c) points in R^s. The symbol V
represents (c) sets in R^s which are neither lines nor planes, but curved surfaces
that were called "elliptotypes" in Bezdek et al. (1981b). The parameter μ in
some sense controls the "degree of curvature" of the fitting surfaces. It is a
remarkable fact that v_0i is exactly the point which translates V_1i away from the
origin in R^s. That is, for any functional like (27), one need only span the
variety of highest dimension, say r_i, with the (r_i) vectors from (26c); the lower
dimensional varieties will always be spanned by subsets of this set. The only
change in (26) needed for (27) is that D²_ikA must be replaced by the "generalized
distance"

δ²_ikA = (1 - μ) d²_ikA + μ D²_ikA .        (28)

Figure 7 depicts the geometry of the ik-th term of J_FCEm, which can be written
in terms of the slant distance z in that illustration as

(u_ik)^m z² = (u_ik)^m [ (1 - μ) d²_ikA + μ D²_ikA ] .        (29)

If μ = 0, (FCE) = (FCM) and hence assesses central tendencies, i.e., the propensity
for structure in X to cluster in hyperellipsoids (shaped by A) about the
points {v_0i}. When μ = 1, (FCE) = (FCL), so linearity of substructure
drives the criterion. And as the "mixing coefficient" μ ranges from 0 to 1,
level sets of J_FCEm are deformed hyperellipsoids. The i-th level set begins at
μ = 0 with principal axes determined by the eigenstructure of A and center v_0i.
As μ increases, this shape is stretched in direction b_i (which is not an
hyperellipsoidal axis) until, at μ = 1, it has degenerated into the straight line
V_1i. In this way J_FCEm attempts to adapt each cluster shape to a different
hyperellipsoidal geometry.

z² = (1 - μ)(d_ikA)² + μ(D_ikA)² , where z is the slant distance and μ the mixing parameter.
Linear data imply small D_ikA; spherical data imply small d_ikA.

Figure 7. The Geometric Nature of Criterion J_FCEm.

Applications of (FCE) clustering have been reported in the areas of two-dimensional
contour analysis (Anderson, Bezdek and Dave (1982), Bezdek and
Anderson (1984), and Bezdek and Anderson (1985)), chemometric analysis
(Jacobsen and Gunderson (1983)) and geological exploration (Granath (1984)). In
the first area, (FCE) is used to generate (c) line segments that "best fit" sets of
2-D boundary coordinates with piecewise linear arcs. In the last area, Granath
(1984) discussed the use of (FCE) for geochemical prospecting based on mineralogical
data. Granath's work includes a novel use of fuzzy covariance matrices
(which are just scalar multiples of S_i at (26b)) that apparently improves upon his
earlier attempts at data interpretation using maximum likelihood and hard c-means.

By far the best known and well studied special case of the (FCV) families is
(FCM), obtained by setting r = 0 in (25) or μ = 0 in (27). In this case the fitting
varieties become points {v_0i} = {v_i} ⊂ R^s, which are typically thought of as
(c) "prototypes" of the (n) x_k's in X. Equations (25) and (26) take the simpler
forms, with v = (v_1, v_2, ..., v_c):

J_m(U, v; X) = Σ_k Σ_i (u_ik)^m d²_ikA ;        (30)

v_i = Σ_k (u_ik)^m x_k / Σ_k (u_ik)^m ;        (31a)

u_ik = ( Σ_j (d_ikA / d_jkA)^(2/(m-1)) )^(-1) .        (31b)

These equations hold when m > 1 and no d_ikA = 0. At m = 1 (or if U ∈ M_cn) it
turns out that U must be crisp (in M_cn). In this case (31a) gives the centroid
of each hard cluster at every pass; whereas (31b) is replaced by the nearest
prototype assignment rule

u_ik = 1 if d_ikA < d_jkA for all j ≠ i ; u_ik = 0 otherwise.        (31b')

Finally, when A = I_s, the identity on R^s, and m = 1, (30) and (31) become,
respectively, the familiar, conventional least squared errors or minimum variance
object criterion with Euclidean distance, and the Basic ISODATA or hard c-means
procedure, which has been extensively studied and used by virtually hundreds
of investigators.
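The Picard iteration through (31a)-(31b) is compact enough to sketch in full for A = I. This is a minimal sketch, not the program used for the computations reported below; the function name `fcm`, the random initialization, and the small-distance guard are our own choices.

```python
import numpy as np

def fcm(X, c, m=2.0, tol=1e-2, max_iter=100, seed=0):
    """Fuzzy c-means: alternate (31a) and (31b) with A = I until the maximum
    absolute change in U falls below tol. Returns (U, V) with U of shape
    (c, n) and V the (c, s) matrix of cluster centers."""
    X = np.asarray(X, float)
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0)                                  # columns sum to 1
    for _ in range(max_iter):
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)      # centers, eq (31a)
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(-1)
        d2 = np.maximum(d2, 1e-12)                      # avoid d_ikA = 0
        U_new = d2 ** (-1.0 / (m - 1.0))
        U_new /= U_new.sum(axis=0)                      # memberships, eq (31b)
        done = np.abs(U_new - U).max() <= tol
        U = U_new
        if done:
            break
    return U, V
```

On two well-separated blobs the memberships concentrate almost crisply, mirroring the very distinct 2-partition obtained for the Polychaete data below.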
There are at present perhaps seventy papers that concern themselves with
some aspect of the theory or an application of (FCM). It would be impossible to
review all of these here. Rather, interested readers are directed towards some of
these (FCM) papers by categories as follows:

Theoretical Aspects

Bezdek (1976b)
Bezdek (1980)
Gunderson (1983)
Selim and Ismail (1984)
Ismail and Selim (1986)
Cannon, Dave and Bezdek (1986)

Hathaway and Bezdek (1986)


Sabin (1986)
Windham (1982)
Windham (1986)

Medical Data

Bezdek (1976a)
Bezdek and Fordon (1978)

Geological Data

Full, Ehrlich and Bezdek (1982)


Granath (1984)

Nutritional Data

Windham, Windham, Wyse and Hansen (1985)

Engineering Systems

Bezdek and Solomon (1981)


Boissonade, Dong, Liu and Shah (1984)
Dong, Boissonade, Shah and Wong (1985)
Bezdek, Grimball, Carson and Ross (1986)

Image Processing

Huntsberger and Descalzi (1985)


Huntsberger, Jacobs and Cannon (1985)
Cannon, Dave, Bezdek and Trivedi (1986)
Trivedi and Bezdek (1986)

Classifier Design

Bezdek and Dunn (1975)


Bezdek and Castelaz (1977)
Bezdek, Hathaway and Huggins (1985)

Bezdek, Chuah and Leep (1986)

Miscellaneous

Bezdek (1974) : Numerical Taxonomy

McBratney and Moore (1985) : Weather
Devi (1986) : Hierarchical Schemes

In most of the application papers (FCM) outputs have been compared to one or
more clustering algorithms which are based on statistical, deterministic or heuristic
techniques. We discuss but one example in somewhat greater detail (the one
I found closest to numerical ecology!). McBratney and Moore (1985) report on
the usage of FCM (with A = I_s, m = 2) to cluster two sets of climatic data from
Australia and China; and compare their results to more classical approaches
taken by previous meteorologists. They argue that the continuity of climate
demands the flexibility of continuous assignments to various classes, so fuzzy c-partitions
of climatic data are a very plausible model. Their paper concludes by
itemizing three advantages fuzzy classifications appear to have over their crisp
predecessors:
1. Fuzzy partitions are (physically) more realistic;
2. Fuzzy partitions are more flexible; and
3. Fuzzy partitions provide better information transfer.
There is also a very interesting side issue discussed by McBratney and Moore,
namely, how one chooses values for (c) and (m) in (30). The authors discuss
and illustrate a method for choosing optimal combinations of (c, m) jointly by
inspecting plots of (m) versus (-dJ_m/dm). This cluster validity functional
recognizes explicitly the joint dependency of "good" solutions on (c) and
(m) - a new and apparently useful idea.
To illustrate the use of the FCM clustering algorithm in the context of
numerical ecology, we processed a set of data provided by Prof. Pierre Legendre
which consists of population counts of 88 species of Polychaetes (marine
worms) which were gathered at 5 different stations at 4 different times. These
data have been previously analyzed by Fresi, Colognola, Gambi, Giangrande,
and Scardi (1983), and will be further analyzed by other authors in this volume.
This writer makes no pretense at understanding the biological intricacies of the
data, so our remarks below are offered in the true spirit of exploratory data
analysis.

First, we array the data as an 88 x 20 matrix, say X = [x_ij], with each row
x_i = (x_i1, x_i2, ..., x_i,20) ∈ R^20 being a mixed time/space vector of observations
on species i, 1 ≤ i ≤ 88. More specifically, each x_i has coordinates
arrayed sequentially as follows:

x_i,1 to x_i,4 : species i; station 1; times 1,2,3,4
x_i,5 to x_i,8 : species i; station 2; times 1,2,3,4
x_i,9 to x_i,12 : species i; station 3; times 1,2,3,4
x_i,13 to x_i,16 : species i; station 4; times 1,2,3,4
x_i,17 to x_i,20 : species i; station 5; times 1,2,3,4

The species and data are arrayed this way in Fresi et al. (1983). We "completed"
the array by filling with zeroes. It seems reasonable to cluster subsets
of this data in various ways; for example, across stations at each fixed time, or
across times at each fixed station, etc. However, an extensive analysis of the
data is left to a future investigation. In order to save space, we present below
only the results of clustering the 88 species simultaneously over all 20 variables.
The expected result of processing the data this way is an overall "coarse" clustering
of species - if one exists - over all times and stations. Subsequent processing
of time-constrained and/or space-constrained subsets of X would then
yield a more detailed breakdown of possible substructures across space and time.
Computing protocols for the outputs in Tables 2-5 are as follows (refer to
equations (30) and (31)): m = 2.00; A = I, the identity matrix for R^20; looping
through equations (31) was terminated when the maximum absolute
difference between the p-th and (p+1)-st estimate of U was less than 0.01
(i.e., max { |u(p+1)_ik - u(p)_ik| } ≤ 0.01 ). The number of clusters was
considered unknown. Part of the results of clustering these data with FCM as
described appear in Table 2 for c = 2 and c = 5. Specifically, Table 2 contains
the fuzzy membership matrices found by FCM when the exit condition was
satisfied. Space prohibits the exhibition of U_fcm for c = 3 and 4; however, we
discuss the outputs for each of these cases below. Our discussion makes use of
the idea of α-cuts of U, which are covered more fully in Section 4.C. Briefly,
U_α is a hard partial labeling of X for 0 ≤ α ≤ 1 derived from any U ∈ M_fcn
whenever we replace each column of U by the vertex e_i in N_c such that
u_ik ≥ α. Note that some columns of U may not have a row with u_ik ≥ α, in
which case the k-th column of U_α is a column of zeroes. Thus, U_α is not
necessarily in L_cn, much less M_cn. Below, we fix α = 0.85, which, practically
speaking, is a very strong membership threshold.
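The α-cut operation just described is straightforward to code. A sketch (our own helper, which assumes α > 0.5 so that at most one membership per column can reach the threshold, since columns sum to 1):

```python
import numpy as np

def alpha_cut(U, alpha):
    """Hard partial labeling U_alpha: column k gets the vertex e_i when
    u_ik >= alpha; columns with no membership >= alpha become all zeros.
    Assumes alpha > 0.5, so the qualifying row (if any) is the column max."""
    U = np.asarray(U, float)
    hard = np.zeros_like(U)
    rows = U.argmax(axis=0)            # candidate row per column
    cols = np.arange(U.shape[1])
    keep = U[rows, cols] >= alpha      # which columns pass the threshold
    hard[rows[keep], cols[keep]] = 1.0
    return hard
```

Applied with α = 0.85 to the columns of Table 2, this reproduces the "main cluster" counts discussed in the text.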

Aside: Although we convert the fuzzy labels in Table 2 to hard ones via U_α for
this discussion, this is somewhat contrary to the whole point of using fuzzy
memberships - if possible, one prefers to leave the results in the form shown in
Table 2. Our discussion begins with the case c = 2.
Table 2. FCM/Cluster Memberships of each Species for c=2, c=5.

Species c=2 c=5

1 0.00 1.00 0.00 0.00 1.00 0.00 0.00


2 0.00 1.00 0.00 0.00 1.00 0.00 0.00
3 0.00 1.00 0.00 0.00 1.00 0.00 0.00
4 0.00 1.00 0.00 0.00 1.00 0.00 0.00
5 0.00 1.00 0.00 0.00 1.00 0.00 0.00
6 0.00 1.00 0.00 0.00 1.00 0.00 0.00
7 0.00 1.00 0.00 0.01 0.99 0.00 0.00
8 0.00 1.00 0.00 0.00 1.00 0.00 0.00
9 0.00 1.00 0.00 0.01 0.99 0.00 0.00
10 0.05 0.95 0.01 0.33 0.18 0.03 0.45
11 0.01 0.99 0.00 0.32 0.35 0.03 0.30

12 0.00 1.00 0.00 0.01 0.99 0.00 0.00


13 0.10 0.90 0.02 0.23 0.22 0.28 0.25
14 0.00 1.00 0.00 0.03 0.95 0.00 0.02
15 0.00 1.00 0.00 0.00 1.00 0.00 0.00
16 0.00 1.00 0.00 0.00 1.00 0.00 0.00
17 0.00 1.00 0.00 0.00 1.00 0.00 0.00
18 0.00 1.00 0.00 0.00 1.00 0.00 0.00
19 0.00 1.00 0.00 0.00 1.00 0.00 0.00
20 0.00 1.00 0.00 0.00 1.00 0.00 0.00
21 0.00 1.00 0.00 0.00 1.00 0.00 0.00
22 0.00 1.00 0.00 0.00 1.00 0.00 0.00
23 0.00 1.00 0.00 0.00 1.00 0.00 0.00


24 0.84 0.16 0.26 0.18 0.16 0.21 0.19
25 0.01 0.99 0.00 0.48 0.20 0.01 0.31
26 0.00 1.00 0.00 0.00 1.00 0.00 0.00
27 0.09 0.91 0.01 0.27 0.19 0.07 0.46
28 0.00 1.00 0.00 0.00 1.00 0.00 0.00
29 0.33 0.67 0.01 0.03 0.03 0.90 0.03
30 0.00 1.00 0.00 0.00 1.00 0.00 0.00
31 0.00 1.00 0.00 0.01 0.99 0.00 0.00
32 0.00 1.00 0.00 0.00 1.00 0.00 0.00
33 0.00 1.00 0.00 0.00 1.00 0.00 0.00

34 0.00 1.00 0.00 0.00 1.00 0.00 0.00


35 0.26 0.74 0.04 0.17 0.16 0.44 0.19
36 0.00 1.00 0.00 0.02 0.97 0.00 0.01
37 0.00 1.00 0.00 0.09 0.86 0.00 0.05
38 0.06 0.94 0.02 0.29 0.29 0.11 0.29
39 0.00 1.00 0.00 0.01 0.99 0.00 0.00
40 0.00 1.00 0.00 0.01 0.99 0.00 0.00
41 0.00 1.00 0.00 0.00 1.00 0.00 0.00
42 0.00 1.00 0.00 0.19 0.70 0.00 0.11
43 0.02 0.98 0.00 0.32 0.38 0.03 0.27
44 0.00 1.00 0.00 0.05 0.92 0.00 0.03
45 0.00 1.00 0.00 0.01 0.98 0.00 0.01


46 0.00 1.00 0.00 0.01 0.99 0.00 0.00
47 0.01 0.99 0.00 0.28 0.51 0.02 0.19
48 0.00 1.00 0.00 0.03 0.95 0.00 0.02
49 0.12 0.88 0.01 0.35 0.22 0.08 0.34
50 0.00 1.00 0.00 0.00 1.00 0.00 0.00
51 0.00 1.00 0.00 0.00 1.00 0.00 0.00
52 0.00 1.00 0.00 0.00 1.00 0.00 0.00
53 0.00 1.00 0.00 0.00 1.00 0.00 0.00
54 0.04 0.96 0.00 0.47 0.21 0.03 0.29
55 0.00 1.00 0.00 0.06 0.90 0.00 0.04

56 0.00 1.00 0.00 0.00 1.00 0.00 0.00


57 0.00 1.00 0.00 0.00 1.00 0.00 0.00
58 0.00 1.00 0.00 0.31 0.44 0.01 0.24
59 0.02 0.98 0.00 0.59 0.16 0.01 0.24
60 0.00 1.00 0.00 0.00 1.00 0.00 0.00
61 0.01 0.99 0.00 0.63 0.17 0.01 0.19
62 0.01 0.99 0.00 0.33 0.42 0.03 0.22
63 0.00 1.00 0.00 0.00 1.00 0.00 0.00
64 0.01 0.99 0.00 0.55 0.30 0.01 0.14
65 0.00 1.00 0.00 0.21 0.72 0.00 0.07
66 0.00 1.00 0.00 0.00 1.00 0.00 0.00

67 0.00 1.00 0.00 0.00 1.00 0.00 0.00


68 0.00 1.00 0.00 0.20 0.68 0.01 0.11
69 0.00 1.00 0.00 0.00 1.00 0.00 0.00
70 0.00 1.00 0.00 0.02 0.97 0.00 0.01
71 0.86 0.14 0.99 0.00 0.00 0.01 0.00
72 0.00 1.00 0.00 0.00 1.00 0.00 0.00
73 0.01 0.99 0.00 0.60 0.24 0.00 0.16
74 0.02 0.98 0.01 0.30 0.41 0.03 0.25
75 0.02 0.98 0.01 0.30 0.39 0.04 0.26
76 0.00 1.00 0.00 0.00 1.00 0.00 0.00
77 0.00 1.00 0.00 0.08 0.88 0.00 0.04
78 0.00 1.00 0.00 0.00 1.00 0.00 0.00


79 0.00 1.00 0.00 0.00 1.00 0.00 0.00
80 0.02 0.98 0.00 0.31 0.42 0.03 0.24
81 0.00 1.00 0.00 0.00 1.00 0.00 0.00
82 0.00 1.00 0.00 0.00 1.00 0.00 0.00
83 0.01 0.99 0.00 0.41 0.35 0.02 0.22
84 0.00 1.00 0.00 0.00 1.00 0.00 0.00
85 0.08 0.92 0.01 0.29 0.25 0.06 0.39
86 0.00 1.00 0.00 0.00 1.00 0.00 0.00
87 0.00 1.00 0.00 0.00 1.00 0.00 0.00
88 0.00 1.00 0.00 0.00 1.00 0.00 0.00

c = 2  Apparently there is a very strong cluster at c = 2, having 84 of the
88 points in U_0.85. The other four points, Nos. 24, 29, 35, and 71, do not belong
to this main cluster; and of these, only #71 is in U_0.85 (#24 is close, with 0.84).
The two fuzziest members of the data are, by this output, species 29
(Sphaerosyllis hystrix Claparede), with label vector (0.33, 0.67); and species 35
(Platynereis dumerilii Audouin et Milne Edwards), with label vector (0.26, 0.74).
The maximum membership of these latter two species is in the main cluster.
Thus it seems that species 24 (Brania clavata Claparede) and 71 (Amphiglena
mediterranea Leydig) are quite distinct from the other 86 species. We draw
attention to the way in which fuzzy memberships lead one to infer much more
about substructure than binary (i.e. hard) partitions can. Nonetheless, 69 of
the 84 species have memberships in the main cluster of 1.00 (to two significant
digits). This is a remarkably distinct 2-partition for m = 2. It suggests a
very strong core of species that adhere tightly to their prototypical cluster center.
However, we approach this hypothesis carefully, because the (88 x 20) data
matrix is very sparse, and there is a strong possibility that the data themselves
are aggregated in a misleading fashion. On the other hand, this is exactly the
position an exploratory data analyst is in, so we set c = 3 to see if this
presumption has any merit.

c = 3  At c = 3, 22 of the 84 points in the main cluster drop out of U_0.85.
The 62 species remaining still belong to U_0.85, but of the 22 departing species,
only one (#68, Streblosoma hesslei Day) is strong enough to belong to U_α at
this level of membership! The other 21 species have memberships that fragment
across the three clusters at various (lower) levels of distribution. Indeed, the
highest membership in the new cluster over all 88 species is #27 (Exogone
gemmifera Pagenstecher), at 0.61. This suggests that the new cluster is much less
distinct and perhaps less well justified. The third cluster has one species in
U_0.85 : #71 at 0.88.

c = 4  Something very interesting happens: only two additional species drop
out of U_0.85! Apparently the main cluster is quite stable for the 60 species now
identified as belonging together at c = 4, α = 0.85. Of the remaining 28
species, only #71 still remains, with a membership of 0.97, in (essentially) its
own cluster.

c = 5  The last five columns of Table 2 contain the membership matrix for
X at c = 5. There is essentially no change in the main cluster! All 60 species
which appeared in U_0.85 at c = 4 remain there at c = 5. This is quite remarkable,
suggesting an extremely stable core of species in the main cluster (column
6 in Table 2). Moreover, 51 of these 60 species still have membership ≥ 0.99
in this cluster. Note that species 71 now has membership of 0.99 (in column 4);
and that species 29 has established a cluster via the membership 0.90 in column
7. The maximum membership in column 5 is 0.63 (species 61), and in column
8, it is 0.46 (species 27). Thus, a total of 62 of the 88 species are in U_0.85 at
c = 5; and the remaining 26 species have memberships that are - in the main -
distributed across two fuzzy clusters (columns 5 and 8) that are relatively
inseparable.
It is interesting to track the memberships of species 24, 29, 35 and 71, the 4
species not in the main group at c = 2, as c increases from 2 to 5. Table 3
exhibits these memberships. The boldface numbers in Table 3 are the maximum
memberships at each c.

Table 3. Memberships for selected species as a function of number of clusters.

Species c=2 c=3 c=4 c=5

24   .84 .16   .60 .18 .22   .34 .21 .19 .26   .26 .18 .16 .21 .19

29   .33 .67   .14 .35 .51   .03 .10 .10 .78   .01 .03 .03 .90 .03

35   .26 .74   .10 .35 .55   .03 .13 .12 .72   .04 .17 .16 .44 .19

71   .86 .14   .87 .06 .07   .97 .01 .01 .01   .99 .00 .00 .01 .00

Note that species 24 begins strongly distinct from the main cluster with maximum
membership 0.84 at c = 2; and then its maximum membership decreases
monotonically with c. At c = 5 this species has nearly uniform memberships
(1/c = 0.20), which is the fuzziest possible state at c = 5. At the other
extreme, the maximum membership of species 71 shows a steady upwards
progression from 0.86 (c = 2) to 0.99 (c = 5); this suggests that species 71
"wants" its own cluster - it has very distinct membership at c = 5, as is
evident in column 4 of Table 2. Note also that species 29 works its way
non-monotonically up to 0.90 at c = 5, thereby demanding a cluster, while species
35 seems unsure of itself, much as species 24. The behavior of memberships as
(c) varies is one of the keys to cluster validity; these numbers can be used - in a
very qualitative way - to evaluate the relative attractiveness of various numbers
of clusters.
Finally, Table 4 exhibits the (truncated) cluster centers {v_i} associated
with the matrices U in Table 2 at c = 2 and c = 5. To interpret the values
contextually, we must truncate (or round) the v_ij's so that they are integers;
subsequently, each v_ij may be taken as a non-statistical estimate of the population
count to be expected at each time and station. For example, Table 5 lists the
(truncated) cluster center for v_1, which is essentially composed of "99 percent
of" species 71, contaminated, if we may, by "26 percent of" species 24 (cf.
Table 2), and very little else. And next to v_1 is the data for species 71 (row 71
of X):

Table 4. Cluster Centers {Vi} for the Membership Matrices in Table 2.

c=2 c=5

Coord.   v_1   v_2      v_1   v_2   v_3   v_4   v_5

1 2 0 2 0 0 5 0
2 43 10 2 16 2 233 29
3 53 5 28 9 1 184 15
4 34 4 13 7 1 99 12

5 47 2 60 4 0 70 6
6 517 5 697 12 1 159 25
7 78 2 125 4 0 47 8
8 399 3 508 7 0 122 19

9 10 0 5 2 0 2 6
10 138 6 27 24 1 19 54
11 105 2 19 9 0 9 21
12 14 0 2 0 0 1 2

13 21 2 3 7 0 2 11
14 73 9 11 53 1 10 43
15 521 1 8 7 0 4 11
16 161 2 25 14 0 15 27

17 27 4 4 12 0 20 28
18 7 3 1 14 0 3 12
19 43 2 6 8 0 4 18
20 0 0 0 0 0 0 0

Table 5. Cluster Center and Data for Species 71

Truncated v_1   Species 71 Data   Time   Station

2 9 1 1
2 0 2 1
28 26 3 1
13 11 4 1

60 63 1 2
697 718 2 2
125 132 3 2
508 520 4 2

5 4 1 3
? 6 2 3
19 3 3 3
2 0 4 3

3 0 1 4
11 0 2 4
8 0 3 4
? 0 4 4

4 0 1 5
1 0 2 5
6 0 3 5
0 0 4 5
It is clear from Table 5 that v_1 is dominated by the occurrence of species 71 at
station 2. Further, this ostensibly explains the reason for species 71 wanting
"its own" cluster. Note how nicely the memberships in U (Table 2) mirror this
fact. It is also clear from the listing in Table 5 that v_1 is not a particularly
effective predictor of population count (nor do we expect it to be) - nonetheless,
one hopefully gains an understanding of the role played by the cluster centers in
FCM by studying this example. Note from Table 4 that the cluster center of the
main cluster (labelled v_2 at c = 2, v_3 at c = 5) is very close to the origin (of
R^20). Apparently this cluster characterizes those species that are found only
rarely in space and time.
So, what has been learned about X? I would guess that the 60 species
clustered together in Table 2 have some physical, chemical or biological relationship
that separates them from the other 26 (as previously suggested, perhaps
their main similarity is rarity); moreover, that species 29 and 71 are somehow
quite distinct, both from the aforementioned group of 60, as well as from the
remaining 24 marine worms. I expect to hear from marine ecologists about my
conjectures, right or wrong! In any case, I hope this example illustrates the main
strengths (and weaknesses) of clustering with FCM.
Beyond the direct use of the (FCM) clustering algorithms as presented
above, there have been many variations and extensions which are designed to
accommodate some feature of the data being studied. For example, Gustafson
and Kessel (1978) also recognized the need to allow each cluster in X to seek
different hyperellipsoidal shapes, and suggested the functional

J_GKm(U, A, v; X) = Σ_k Σ_i (u_ik)^m d²_ikA_i        (32)

as a means for accommodating this problem. In (32) the variable
A = (A_1, A_2, ..., A_c) is a set of (c) (s x s) positive definite matrices; distances to
points in each cluster are measured with different norms. Since the eigenstructure
of A_i determines the hyperellipsoidal shape of clusters that match the i-th
term of J_GK well, local minima of J_GK possess the desired property: each cluster
may have a different (hyperellipsoidal) shape. Necessary conditions (31a) and
(31b) are augmented with

A_i = (ρ_i det(S_i))^(1/s) (S_i)^(-1) , where        (33a)

S_i is the matrix at (26b), and        (33b)

det(A_i) = ρ_i for all i.        (33c)

Equation (33c) constrains the volumes determined by each A_i to be equal, so
(31) and (33) do not provide fully necessary conditions for (32). However, one
may use this sequence to search for triples (U, A, v) that satisfy the requirements
imposed on the model, viz., finding local shape norms that adjust themselves to
local substructures. There is probably a combination of algorithmic parameters
(μ, m, A) for (FCE) that provides (roughly) the same solutions as (m, A) do for
many data sets. The point to be made is that both J_FCEm and J_GKm seek
"locally" hyperellipsoidal substructure by varying the norm-induced topology
from cluster to cluster; the former fixes one A and varies each shape by stretching
in direction (b_i) with "strength" μ; while the latter alters all (s) directions
in each cluster via different A_i's, but with fixed volume. Thus, local shapes
with J_GK can be much more diverse (from each other) than with J_FCE, but must
all occupy the same volume. Figure 8 illustrates these differences graphically.
The difficulty with all this is, of course, that one cannot know, for s > 3, whether
X contains this sort of structure.
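Update (33a) is simple to implement once S_i is in hand, and its defining property (33c) is easy to check numerically. A sketch (the helper name and the per-cluster ρ_i default are our own):

```python
import numpy as np

def gk_norm_matrix(S_i, rho_i=1.0):
    """Gustafson-Kessel norm inducer (33a): A_i = (rho_i det(S_i))^(1/s) S_i^(-1),
    so that det(A_i) = rho_i as required by (33c). S_i is the (s x s) fuzzy
    scatter matrix of eq (26b), assumed nonsingular."""
    s = S_i.shape[0]
    return (rho_i * np.linalg.det(S_i)) ** (1.0 / s) * np.linalg.inv(S_i)
```

For a diagonal scatter matrix diag(4, 1) this yields diag(0.5, 2): the norm shrinks along the high-variance axis and stretches along the low-variance one, while the determinant (the "volume" of the induced unit ball) stays fixed at ρ_i.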

FCE : Each cluster may have a different volume and
orientation, but can have only one "linear" axis.

GK : Each cluster may have a different shape and
orientation, but all must have equal volumes.

Figure 8. J_FCE and J_GK Optimize for Different Geometries.

Another modification of the basic (FCV) strategy involves utilization of
labelled data. Typical of this type is the algorithm of Pedrycz (1985), which
begins with the assumption that X can be subdivided into (q) labelled data, say
X_1, and (t) unlabelled data X_2, so that X = X_1 ∪ X_2, X_1 ∩ X_2 = ∅, and
(q + t) = n. Pedrycz assumes that the labels for points in X_1 may be fuzzy, so
that, for each x_k ∈ X_1, we have l(x_k) ∈ N_fc. A matrix (W) is constructed as
follows: for x_k ∈ X_1, put w_ik = l_i(x_k), 1 ≤ i ≤ c; and otherwise, put w_ik = 0 for
1 ≤ i ≤ c. Thus, W is a (c x n) matrix with (q) columns from N_fc (fuzzy label
vectors), and (n - q) zero columns. Then Pedrycz adds a penalty term to the
special fuzzy c-means functional obtained by putting m = 2 and A = I_s in (30), to
yield

J_PZ(U, v; X, W) = Σ_k Σ_i (u_ik d_ik)² + Σ_k Σ_i ((u_ik - w_ik) d_ik)² .        (34)
In (34) X and Ware data; (U ,v) are the variables. The first term on the right
hand side of (34) is just J 2(U ,v ;X). The second term penalizes U's that do not
agree with the given labels, because the second term contains q zero terms when
U and W agree on X l' This modification results in the necessary condition

uik = «2 - L Wjk)/(2L (dik ldjk )2» + (wikI2) (35)


j j

in place of (31b) - recall that A = I_s and m = 2. Pedrycz also presents two gen-
eralizations of (34), one involving weights for the two terms of, respectively,
(1/t) and (1/q); and secondly, localized Mahalanobis-like distances induced by
replacing A = I_s with (c) matrices (C_i)⁻¹ which combine the information in W
with the S_i's in (26b). The methodology is illustrated with two sets of data:
Gustafson and Kessel's cross (Gustafson and Kessel 1978); and a set of EKG
data. This is a very interesting extension of FCM because it integrates local
shape modifications (like J_GK) with previous information (W). This area will
experience further growth.
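As a concrete sketch (all names and data are mine, not Pedrycz's), the update (35) can be coded directly from squared distances d_ik and the label matrix W; it assumes all distances are positive:

```python
# Sketch of the membership update (35) for J_PZ: d is a (c x n) array of
# squared distances d_ik; w is the (c x n) label matrix W whose non-zero
# columns hold the given fuzzy labels. Illustrative names throughout.

def pedrycz_memberships(d, w):
    c, n = len(d), len(d[0])
    u = [[0.0] * n for _ in range(c)]
    for k in range(n):
        # sum of the given labels for x_k; zero for unlabelled points
        sw = sum(w[j][k] for j in range(c))
        for i in range(c):
            # assumes d[j][k] > 0 for every j
            s = sum((d[i][k] / d[j][k]) ** 2 for j in range(c))
            u[i][k] = (2.0 - sw) / (2.0 * s) + w[i][k] / 2.0
    return u
```

A useful sanity check is that each column of the result sums to one, whether or not x_k carries a label; for an unlabelled column the formula reduces to the ordinary FCM membership with m = 2.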
Yet another avenue of variation from the basic (FCV) methodology is
represented by the (RCM) algorithm described briefly in Section 4A. Indeed,
substitution of (26a) into (22) with A = I_s reduces (30) to (20) when r_kj is the
squared distance between x_k and x_j in X. Consequently, the relational criterion
J_RCM is, for a special choice of R, equivalent to J_FCM. In this case one can in
principle obtain the same U ∈ M_fcn by minimizing either J_FCM(U, v; X) or
J_RCM(U; R) as long as R = [r_jk] = [|x_j − x_k|²]. The point of (20) is, of course,
that J_RCM is well-defined and can be used for any R, not just [|x_k − x_j|²]; in
this more general situation, U's gotten by (RCM) may be interpreted as parti-
tions that might have been found by (FCM) if X had been available and con-
verted to R as above.
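The conversion assumed in this equivalence is simple to state in code; a minimal sketch (the function name is illustrative):

```python
# Build the relational matrix R = [|x_j - x_k|^2] from object data X
# (n points in R^s), the special choice of R that makes J_RCM
# equivalent to J_FCM.

def squared_distance_relation(X):
    n = len(X)
    return [[sum((a - b) ** 2 for a, b in zip(X[j], X[k]))
             for k in range(n)]
            for j in range(n)]
```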
There are literally dozens of other papers that deal with object-criterion
fuzzy partitioning algorithms. I will conclude this section by pointing interested
readers towards (some) of these: Backer (1978), Bock (1984), Diday and Simon
(1976), Roubens (1982), and Kent and Mardia (1986). The last reference per-
tains to an interesting connection between statistical and fuzzy methodologies.
There are, of course, entire families of algorithms which generate matrices
U ∈ M_fcn that are not interpreted as fuzzy c-partitions of X. Specifically,
parametric estimation techniques such as the method of maximum likelihood to
decompose mixtures of probability density functions generate a matrix
P = [p_ik] ∈ M_fcn, where p_ik is the posterior probability that x_k came from
class i given x_k. Columns of P are label vectors in N_fc which advocates of sta-
tistical decision theory would call "probabilistic" labels for X. This obviously
yields the same sort of outputs for X that fuzzy clustering does; the difference
lies in one's belief about the data: are they really drawn from a statistical mix-
ture? There are hundreds of papers about this technique; interested readers will
get an excellent start in this direction with Everitt and Hand (1981), or the recent
survey by Redner and Walker (1984). Another school of thought not
represented here that is very active in generating "probabilistic" P's ∈ M_fcn is
the methodology of relaxation labelling, which has been vigorously pursued by
Rosenfeld and his students. For an introduction to this area see Peleg (1981).

4.C. Convex Decompositions

In this section we shift our focus to several convex decomposition algo-
rithms. In general, one often needs to (ultimately) produce hard or crisp clusters
(U ∈ M_cn ≅ R ∈ E_n) on n objects. Once a fuzzy partition is obtained, say, by
one of the above methods, there are several obvious ways to convert
(U ∈ M_fcn) into crisp partitions (U′ ∈ M_cn); or to convert fuzzy (∨ ∧) transi-
tive similarity relations (R ∈ E_∨∧) into crisp equivalence relations (R′ ∈ E_n).
Specifically, one may always extract the crisp maximal membership (maximal
relation) from U (R) by replacing each column in U (all elements in R) by the
appropriate crisp entities. This, in turn, is a special case of the more general
procedure of extracting crisp sets (in this case U′ or R′) from fuzzy sets by
making a "β-cut": e.g. u′_ik = 1 iff u_ik ≥ β, 0 ≤ β ≤ 1; and u′_ik = 0 otherwise.
These thresholding methods are particularly unsatisfying because we end up
"throwing away" some of the very information that fuzzy models presume to
capture. In this section we discuss convex decompositions of U and R. A fun-
damental distinction to be made at the outset is that (7) and (8) make this possi-
ble for U ∈ M_fcn produced by any method whatsoever; whereas no algorithms
exist that yield R ∈ conv(E_n). We begin with convex decompositions of U.
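The two hardening conversions just described can be sketched as follows (function names are mine; ties in the maximum membership rule are broken here by the lowest row index):

```python
# Maximum membership: replace each column of U by the nearest vertex
# of N_fc, i.e., put a 1 in the row of the largest membership.
def max_membership(U):
    c, n = len(U), len(U[0])
    hard = [[0] * n for _ in range(c)]
    for k in range(n):
        i_star = max(range(c), key=lambda i: U[i][k])  # first max wins ties
        hard[i_star][k] = 1
    return hard

# beta-cut: u'_ik = 1 iff u_ik >= beta. May yield zero columns
# (a degenerate result) when no membership in a column reaches beta.
def beta_cut(U, beta):
    return [[1 if U[i][k] >= beta else 0 for k in range(len(U[0]))]
            for i in range(len(U))]
```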
Equations (7) and (8) guarantee that each U ∈ M_fcn has at least one convex
decomposition into (n(c − 1) + 1) (possibly degenerate) U's in L_cn. To see that
we must use L_cn instead of M_cn, note that

U = [  a     a     a  ]
    [ 1−a   1−a   1−a ] ,   0 ≤ a ≤ 1   (36)

cannot be written as a convex combination of U's in M_cn. In fact, U at (36)
admits only the decomposition

U = a [ 1 1 1 ] + (1 − a) [ 0 0 0 ] .   (37)
      [ 0 0 0 ]           [ 1 1 1 ]

The interpretation of a decomposition which involves degenerate terms may or
may not be sensible. For example, zero rows in the degenerate terms might be
viewed as "coarse structure" indicants. In any case, we know we can write
U ∈ M_fcn as follows:
U = Σ a_i U_i ;   (38a)

Σ a_i = 1 ;   (38b)

0 ≤ a_i ≤ 1 and U_i ∈ L_cn ∀ i.   (38c)

In the present context the a_i's may be interpreted as "degrees of confidence" in
their corresponding crisp U_i's; or as relative strengths of bonding between
chains in the equivalence classes induced by each U_i (if in M_cn). Three algo-
rithms for accomplishing (38) are given in Bezdek and Harris (1979): the
minimax (MM), forcing (F), and reclassification (R) decompositions. Interested
readers can pursue the details of these algorithms in the original paper. It
suffices here to illustrate their use by decomposing the matrix
U = [ .90  .80  .30  .40  .05 ]   (39)
    [ .10  .20  .70  .60  .95 ] ,

with the (MM), (R) and (F) algorithms; these decompositions of U are listed in
Table 6 below.
First, note that all three decompositions allocate the largest "percentage"
(highest convex weight) of U to the hard object clusters {1,2} ∪ {3,4,5}. This
would also result from conversion of U ∈ M_fcn by several thresholding
methods. In particular, if U is thresholded using either the maximum member-
ship cut of U, whereby each column in U (point in N_fc) is simply replaced by
the vertex of N_fc (point in N_c) closest to it; or by a β-cut on U, wherein
column k of U is replaced by vertex i of N_fc if u_ik ≥ β, the crisp matrices
which result, say U_mm and U_β, coincide with U_1 in Table 6 (for U_β, β ≥ 0.60).
Note that U_β's produced by the β-cut strategy will have zero columns for β <
0.60, whereas U_mm is always in M_cn (or at least L_cn, depending on the tie-
breaking rule used). In the present example, the dominant term (one with the
largest convex weight) is

U_1 = [ 1 1 0 0 0 ]
      [ 0 0 1 1 1 ] = U_mm = U_β   (β ≥ 0.60).

It turns out that U_mm is always the dominant term for both the (MM) and (R)
algorithms. The interesting aspect of decomposition of U as opposed to thres-
holding on U lies with the "remainders." Simply discarding the information in
U not preserved by thresholding seems to offset the advantages of using M_fcn in
the first place. Decompositions, on the other hand, may provide added insights
about object substructure that are otherwise lost. In Table 6, for example, the
(MM), (R) and (F) decompositions all have for their second term the crisp parti-
tion

U_2 = [ 1 1 1 1 0 ]
      [ 0 0 0 0 1 ] , i.e.,

{1,2,3,4} ∪ {5}. Thus, (U_1, U_2) account for 80 (70) percent of the member-
ship in U via MM (R or F); this affords investigators with a very different sub-
structural interpretation than that provided by thresholding on U ∈ M_fcn. Note
that both the number of and specific U_i's in the terms after U_2 in Table 6 are
quite different for the three decompositions; and that (MM) and (R) have only
U_i's in M_cn, whereas U_6 for the (F) decomposition is in L_cn (degenerate). The
last term in (F) suggests that there is some slight (.05) possibility that all (5)
Table 6. (MM), (R), and (F) Convex Decompositions of U at (39).

                        Convex Coefficients
Factor U_i              MM      R       F

[ 1 1 0 0 0 ]
[ 0 0 1 1 1 ]           .60     .60     .40

[ 1 1 1 1 0 ]
[ 0 0 0 0 1 ]           .20     .10     .30

[ 0 0 0 1 0 ]
[ 1 1 1 0 1 ]           .10     **      **

[ 1 0 1 1 1 ]
[ 0 1 0 0 0 ]           .05     .10     **

[ 1 0 1 1 0 ]
[ 0 1 0 0 1 ]           .05     .05     **

[ 1 1 0 1 0 ]
[ 0 0 1 0 1 ]           **      .10     .10

[ 0 0 1 1 1 ]
[ 1 1 0 0 0 ]           **      .05     **

[ 1 0 0 0 0 ]
[ 0 1 1 1 1 ]           **      **      .10

[ 0 0 0 0 1 ]
[ 1 1 1 1 0 ]           **      **      .05

[ 0 0 0 0 0 ]
[ 1 1 1 1 1 ]           **      **      .05
objects be grouped together (c = 1), whereas (MM) and (R) yield successively
less attractive possibilities at (c = 2). It is shown in Bezdek and Harris (1979)
that the coefficient vector (a_1, ..., a_q) for (MM) is lexicographically larger than
any other convex decomposition, i.e., (MM) coefficients will always account for
the largest percentage of U in the same number of terms. Conjectured there is
that (MM) decomposition also is the shortest (minimal length) decomposition,
and is always non-degenerate (U_i ∈ M_cn ∀ i). An example in Bezdek and
Harris shows that the maximum membership matrix (U_mm), which here appears
as the dominant term in all three decompositions of U exhibited in Table 6, may
in fact not even appear in Σ a_i U_i. However, no decomposition produces a
larger (a_1) than (MM). Thus, the crisp equivalence relation R in E_n isomorphic
to U_mm presumably implicates a_1 as the maximal bonding strength enjoyed by
objects that are partitioned by U.
Questions about the method of convex decomposition in M_fcn abound:
uniqueness, minimality, relation to E_n and conv(E_n), physical interpretation of
the {a_i}; all are good research topics. Furthermore, there are many algorithms
that convert object data (X) or relational data (R) into U ∈ M_fcn, so this method,
which provides a very different means of interpreting U than thresholding,
deserves further study.
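The minimax coefficients in Table 6 can be checked numerically; the following sketch (not one of the three decomposition algorithms themselves) reconstructs U at (39) from the five (MM) terms:

```python
# Check that the (MM) column of Table 6 reconstructs U at (39):
# U = .60 U1 + .20 U2 + .10 U3 + .05 U4 + .05 U5.

U = [[.90, .80, .30, .40, .05],
     [.10, .20, .70, .60, .95]]

terms = [
    (.60, [[1, 1, 0, 0, 0], [0, 0, 1, 1, 1]]),
    (.20, [[1, 1, 1, 1, 0], [0, 0, 0, 0, 1]]),
    (.10, [[0, 0, 0, 1, 0], [1, 1, 1, 0, 1]]),
    (.05, [[1, 0, 1, 1, 1], [0, 1, 0, 0, 0]]),
    (.05, [[1, 0, 1, 1, 0], [0, 1, 0, 0, 1]]),
]

recon = [[sum(a * Ui[i][k] for a, Ui in terms) for k in range(5)]
         for i in range(2)]
assert all(abs(recon[i][k] - U[i][k]) < 1e-9
           for i in range(2) for k in range(5))
```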
The intent of Figure 4 was to illustrate that M_fcn cannot be easily (if at all!)
identified with any of the imbeddings of E_n shown in the chart. Given a fuzzy
similarity relation R in the hierarchy E_∨∧ ⊂ conv(E_n) ⊂ E_∨Δ, e.g., how shall
we proceed to interpret R in terms of crisp clusters on n objects? When R ∈
conv(E_n), one may proceed as above, to seek convex decompositions of R of the
form

R = Σ a_i R_i ,   (40a)

with Σ a_i = 1 ;   (40b)

0 ≤ a_i ≤ 1 and R_i ∈ E_n ∀ i.   (40c)


To illustrate, consider the matrix R = Rᵀ, where

R = [ 1  .3  .6  0 ]
    [     1  .7  0 ]   (41)
    [         1  0 ]
    [            1 ]
Table 7 below exhibits the (unique) convex decomposition of R derived in
Bezdek (1978). If the convex weights are interpreted as relative "degrees of
attractiveness" of the crisp partition induced by the R_i's, one infers that R is
"best explained" at c = 3 by {1} ∪ {2,3} ∪ {4}; and that equal but lesser
credence should be placed on the two partitions {1,2,3} ∪ {4} and {1,3} ∪
{2} ∪ {4}. Emphasized here for later reference: these three clusterings are not
nested hierarchically, nor is (c = 4) admitted as a possibility.

Table 7. Convex Decomposition of R at (41).

Coeff.    R_i ∈ E_n       <------->   U_i ∈ M_cn      c

0.40      [ 1 0 0 0 ]                 [ 1 0 0 0 ]     3
          [ 0 1 1 0 ]                 [ 0 1 1 0 ]
          [ 0 1 1 0 ]                 [ 0 0 0 1 ]
          [ 0 0 0 1 ]

0.30      [ 1 0 1 0 ]                 [ 1 0 1 0 ]     3
          [ 0 1 0 0 ]                 [ 0 1 0 0 ]
          [ 1 0 1 0 ]                 [ 0 0 0 1 ]
          [ 0 0 0 1 ]

0.30      [ 1 1 1 0 ]                 [ 1 1 1 0 ]     2
          [ 1 1 1 0 ]                 [ 0 0 0 1 ]
          [ 1 1 1 0 ]
          [ 0 0 0 1 ]

R = [ 1   .3  .6  0 ]
    [ .3  1   .7  0 ]
    [ .6  .7  1   0 ]
    [ 0   0   0   1 ]
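As with Table 6, the decomposition in Table 7 can be verified numerically; in this sketch the R_i are written out as full symmetric matrices:

```python
# Check Table 7: R at (41) is reconstructed exactly by
# 0.40 R1 + 0.30 R2 + 0.30 R3.

R = [[1.0, 0.3, 0.6, 0.0],
     [0.3, 1.0, 0.7, 0.0],
     [0.6, 0.7, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]

R1 = [[1, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 1]]  # {1} u {2,3} u {4}
R2 = [[1, 0, 1, 0], [0, 1, 0, 0], [1, 0, 1, 0], [0, 0, 0, 1]]  # {1,3} u {2} u {4}
R3 = [[1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 1, 0], [0, 0, 0, 1]]  # {1,2,3} u {4}

for j in range(4):
    for k in range(4):
        mix = 0.40 * R1[j][k] + 0.30 * R2[j][k] + 0.30 * R3[j][k]
        assert abs(mix - R[j][k]) < 1e-9
```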

There are at least three important differences between the convex decompo-
sitions of U ∈ M_fcn and R ∈ conv(E_n) shown above:
(i) U can be produced by many algorithms;
(ii) U can always be decomposed because of (7); and
(iii) There are at least three known algorithms for (ii).

Although (∨ Δ) transitivity is necessary for R ∈ conv(E_n), it is not sufficient.
Thus, the convex decomposition strategy for fuzzy similarity relations is much
less well developed than its counterpart for U ∈ M_fcn.
As a footnote to Section (4C), we remark that M_fcn can be carried to E_∨Δ
(but not conversely) by the operation R = T(U) = Uᵀ(Σ ∧)U, so every fuzzy c-
partition induces a pseudo-metric (d_ij = 1 − r_ij) on object-pairs (Bezdek and
Harris 1978). This method converts fuzzy object partitions into fuzzy (∨ Δ)-
transitive similarity relations, which, if known to be in conv(E_n), could then be
decomposed as above. Alternatively, T(U) = R can always be decomposed by
the method of transitive closures to be described next. As an example, if

U = [ .30  .90  .85  .10  .11 ]
    [ .50  .05  .00  .25  .78 ] ,     then for R = Uᵀ(Σ ∧)U,
    [ .20  .05  .15  .65  .11 ]

D = (1 − R) = [ 0  .60  .55  .45  .18 ]
              [     0   .10  .80  .79 ]
              [          0   .75  .78 ]
              [               0   .56 ]
              [                    0  ]

is a pseudo-metric on the five objects partitioned by U.

4.D. Numerical Transitive Closures


In this section we discuss methods based on extracting crisp equivalence
relations (R_β ∈ E_n) from fuzzy (∨ *)-transitive similarity relations
(C_* ∈ E_∨*). This approach is quite familiar to numerical taxonomists, as it is
closely related to hierarchical methods based on graph-theoretic models. Indeed,
Dunn (1974) wrote a virtually unnoticed but extremely important note that con-
nects most of Section (4D) to a very well known conventional method; we
return to this point below. The input data here may be X ⊂ ℝ^s or R ∈ V_nn. In
the former case, we apply a measure of pairwise dissimilarity to pairs in X × X to
produce R ∈ V_nn that is reflexive (9b) and symmetric (9c); it may be necessary
to normalize the r_ij's to satisfy (9a). If R is given we assume that it does
satisfy (9a) - (9c). For such an R, the (∨ *) transitive closure C_*(R) is cal-
culated as follows:

C_*(R) = R ∨ R² ∨ ... ∨ R^(n−1), where   (42a)

R² = R (∨ *) R as in (9d).   (42b)

In (42), (*) may be any T-norm; in what follows we discuss only the
T1, T2, and T3 norms exhibited at (10). Zadeh (1971) proved that for T2 and T3,
(42) indeed terminated in at most (n − 1) steps; it is easy to see that the same is
true for any (T = *) that is bounded above by T3 (in particular, the (∨ Δ) transi-
tive closure C_Δ(R) of R can be constructed with T1 this way).
The construction of C_*(R) via (42) is not very efficient; using matrix mul-
tiplication as in (42b) is O(n⁴). Kandel and Yelowitz (1974) presented an O(n³)
generalization of Warshall's algorithm for C_*(R). Both algorithms were dis-
cussed for T3 (* = ∧); the complexity is unchanged for T2 and T1. Equations
(42) continue to appear in reported applications, probably because users do not
have data for which n is large, and also because computer speed seems to
increase faster than our ability to utilize it. Dunn's (1974) paper gives us two
things: an even more economical (O(n²)) method for constructing C_*(R); and
much more importantly, a proof that the method to be described below is none
other than the well-known single linkage algorithm when (* = ∧). In order to
appreciate this, we next describe the method itself.
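For small n, the closure (42) with (* = ∧) is easy to compute by repeated max-min composition; a sketch (O(n⁴) like the naive scheme, with early termination; the function name is mine):

```python
# (v-^) transitive closure of a reflexive, symmetric fuzzy relation R,
# per (42): compose with max-min, OR in the result, stop when stable
# (at most n-1 compositions).

def vee_wedge_closure(R):
    n = len(R)
    C = [row[:] for row in R]
    for _ in range(n - 1):
        # max-min composition C o C
        C2 = [[max(min(C[j][m], C[m][k]) for m in range(n))
               for k in range(n)] for j in range(n)]
        # the "R v R^2" step: elementwise max with the previous iterate
        nxt = [[max(C[j][k], C2[j][k]) for k in range(n)] for j in range(n)]
        if nxt == C:
            break
        C = nxt
    return C
```

Applied to R at (41), this reproduces the ∧-column of Table 8: the (1,2) entry rises from .30 to .60 through the chain 1-3-2.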
Zadeh (1971) established that every β-cut of C_∧(R) yields a hard
equivalence relation, say R_∧β ∈ E_n. Because E_n ≅ M_cn, R_∧β induces a unique
crisp partition U_∧β on the n objects represented by R. Consequently, one may
generate a nested sequence of crisp object clusters by taking β-cuts of C_∧ at
β's separating each pair of distinct elements. Specifically,

β₁ > β₂ ⇒ R_∧β₁ ⊆ R_∧β₂ ,   (43)

which leads to the sequence {R_∧β} ⟺ {U_∧β}, and hence to nested crisp clus-
ters. We shall discover below that when (* ≠ ∧) the same clusters are gen-
erated, but not always sequentially.
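Extracting the crisp partition from a β-cut then amounts to finding the connected components of the cut relation, which is why the procedure reproduces single linkage (Dunn 1974). A sketch, with illustrative names:

```python
# Crisp partition induced by the beta-cut of a (v-^) transitive
# closure C: group objects j, k whenever C[j][k] >= beta, by
# repeatedly merging label classes over all pairs.

def beta_cut_partition(C, beta):
    n = len(C)
    label = list(range(n))
    for j in range(n):
        for k in range(n):
            if C[j][k] >= beta:
                a, b = label[j], label[k]
                for m in range(n):          # relabel b's whole class as a
                    if label[m] == b:
                        label[m] = a
    groups = {}
    for m, a in enumerate(label):
        groups.setdefault(a, set()).add(m)
    return sorted(map(sorted, groups.values()))
```

With C_∧(R) for R at (41) (objects indexed from 0), β = .29 gives {1,2,3} ∪ {4}, β = .61 gives {1} ∪ {2,3} ∪ {4}, and β = .71 gives singletons, matching the ∧-column of Table 8.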
Table 8. Clusters of R at (41) using (∨ *) Transitive Closures.

            * = ∧ (T3)        * = • (T2)        * = Δ (T1)

C_*(R)      1 .60 .60 0       1 .42 .60 0       1 .30 .60 0
              1  .70 0          1  .70 0          1  .70 0
                  1  0              1  0              1  0
                     1                 1                 1

β = .29     1 1 1 0           1 1 1 0           1 1 1 0
              1 1 0             1 1 0             1 1 0
                1 0               1 0               1 0
                  1                 1                 1

β = .31     1 1 1 0           1 1 1 0           1 0 1 0
              1 1 0             1 1 0             1 1 0
                1 0               1 0               1 0
                  1                 1                 1

β = .43     1 1 1 0           1 0 1 0           1 0 1 0
              1 1 0             1 1 0             1 1 0
                1 0               1 0               1 0
                  1                 1                 1

β = .61     1 0 0 0           1 0 0 0           1 0 0 0
              1 1 0             1 1 0             1 1 0
                1 0               1 0               1 0
                  1                 1                 1

β = .71     1 0 0 0           1 0 0 0           1 0 0 0
              1 0 0             1 0 0             1 0 0
                1 0               1 0               1 0
                  1                 1                 1

(Upper triangles shown; all matrices are symmetric.)
As an example, Table 8 shows the (∨ *) transitive closures of the matrix R
at (41) for (* = T1, T2, T3). First note that the only element in C_* which varies
as a result of changing (*) is the (1,2) entry. As the definition of transitivity
changes, the (mathematically imposed) "bonding" between objects 1 and 2
varies. As the size of E_∨* increases (E_∨∧ ⊂ E_∨• ⊂ E_∨Δ), the strength
needed to bind objects 1 and 2 decreases
(C_∧12 = .60 > C_•12 = .42 > C_Δ12 = .30). Looking at the matrices arrayed against
the coefficient β = 0.29, we find that all three operators suggest the clusters
{1, 2, 3} and {4} for β < 0.30. For 0.30 ≤ β < 0.60 things get more interest-
ing.
It is a widely held misconception that all β-cuts of C_* yield crisp
equivalence relations for arbitrary operators (*). Indeed, only column two of
Table 8, which corresponds to (* = ∧), contains a completely nested sequence
of such relations. The crisp relation matrices R_*β extracted from C_• and C_Δ
using the β-cut idea are not, for (* = T2, T1) and 0.30 ≤ β ≤ 0.60, transitive in
the crisp sense. What may be learned from this? At β > 0.60 all three R_*β's
separate {1, 2, 3} into {1} and {2, 3}, because the link between (1) and (2,3) is
broken. For 0.30 ≤ β ≤ 0.60, the 0's that appear in the (1,2) (and (2,1)) entries
of R_•β and R_Δβ might be interpreted as precursors of an impending rupture
(which already exists between objects 1 and 2) that will ultimately see objects
1 and 2 in different clusters - the undecided issue in this range of β is whether 3
will be joined to 1 or 2 after the break. Well, this is pretty far-fetched, but was
the best I could do for a quick explanation of a heretofore unnoticed fact:
amongst all of the (∨ *) transitive closures of R, the nested sequence of hard
equivalence relations obtained through this procedure is unique: just use
(∨ ∧)! This is not to say that C_*(R) for other operators is not useful in other
contexts (cf. Bezdek (1986), where chaining in expert systems is done using
C_*(R) without recourse to crisp conversions like R_*β); rather, it appears that
the method of clustering via (∨ *) transitivity as proposed in Zadeh (1971) is
confined to the (∨ ∧) case. The fact contained in Dunn's paper (1974), that
this apparently novel method was nothing more than the single linkage algorithm
used by so many advocates of agglomerative hierarchical clustering, takes some of
the luster away from this technique. As mentioned above, the matrices C_*(R)
have found other uses; but it seems fair to say that they offer little to serious
users of clustering algorithms. Whether my informal interpretation of the events
shown in Table 8 can be formalized and exploited to any advantage remains to
be seen. There have been a number of recent fuzzy sets papers that report good
results using the method of (∨ ∧) transitive closure. Since these amount to suc-
cessful applications of single linkage (a "standard" clustering algorithm in my
view), there is no need for me to review them here.
4.E. Generalized Nearest Neighbor Rules


In the context established by Figure 1, k-nearest neighbor rules (k-NNR) are
not clustering algorithms; rather, I regard them as classifiers on ℝ^s. However,
k-NNR's are sometimes used to cluster an unlabelled data set X through the
expedient of submitting each x ∈ X to the rule; and then aggregating the results.
To make this notion more concise, we let X_d be a set of labelled design data,
|X_d| = n_d. The labels provided with X_d are a partition of X_d, say U_d ∈ M_fcn.
Our assumption is that U_d has either been provided by the modeller; or perhaps,
obtained through the use of a clustering algorithm (crisp or fuzzy) as described
above. Each column of U_d lies in N_fc - a fuzzy label vector for the datum in
X_d associated to it. Generalized k-NNR's operate quite as one would expect: (i)
choose k, the number of NN's in X_d to look for; (ii) choose a way to compute
"nearest", e.g., any metric (B) on ℝ^s; and finally (iii), specify a NN decision
rule (NNDR) for assigning a label in N_fc to x based on the labels of its k NN's.
Suppose X to be an unlabelled data set as above. There are many ways to use
the labels in U_d to label x ∈ X. For example, one may simply average the k-
NN label vectors as follows: compute

N_i = the k points in X_d which are B-nearest to x_i ;   (44a)

I_i = indices of points in N_i ;   (44b)

C_i = columns of U_d for indices in I_i = [c_i1, c_i2, ..., c_ik] ; and   (44c)

l_i = Σ_j (c_ij / k).   (44d)

Then vector l_i ∈ N_fc is a fuzzy label vector for x_i ∈ X. If one repeats (44a) -
(44d) for i = 1 to n and arrays the n vectors {l_i} from (44d) as a (c × n) matrix
U, the result is that U ∈ L_fcn. Note especially that even though the n_d points in
X_d have at least one partial representative in each class (because U_d ∈ M_fcn),
there is no guarantee that the arrayed l_i's will, so U may be degenerate (have a
zero row). Ordinarily, however, U is in M_fcn, and in this case the labels gen-
erated by (44) do provide a fuzzy c-partition of X. Once U ∈ M_fcn is in hand,
the remaining steps for interpreting it in the context of a particular problem are
exactly as above. One may threshold to find U_mm or U_β; make a convex
decomposition, U = Σ a_i U_i; or convert it to T(U) = R ∈ V_nn and proceed
accordingly.
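The averaging rule (44) is only a few lines of code; this sketch uses Euclidean distance for the metric and illustrative names throughout:

```python
import math

# Generalized k-NN label per (44): average the fuzzy label columns of
# the k design points nearest to x. Xd is a list of points, Ud the
# (c x n_d) fuzzy label matrix whose column j labels Xd[j].

def fuzzy_knn_label(x, Xd, Ud, k):
    c = len(Ud)
    order = sorted(range(len(Xd)),
                   key=lambda j: math.dist(x, Xd[j]))   # (44a)
    nearest = order[:k]                                 # indices I_i, (44b)
    return [sum(Ud[i][j] for j in nearest) / k          # (44c)-(44d)
            for i in range(c)]
```

By construction the returned vector sums to one whenever the columns of Ud do, so arraying these labels over X yields a matrix in L_fcn as described above.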
Conventional k-NNR's often replace each l_i by the vertex of N_fc closest to
it (majority voting when U_d ∈ M_cn). More to the point, if U_d ∈ M_cn, the vec-
tor l_i is, under suitable assumptions about the distribution of X from statistical
decision theory (Devijver and Kittler 1982), an approximation to the posterior
probability vector p_i obtained through Bayes rule. This interpretation for l_i
demands a lot of statistical apparatus which is not implicit in (44). If informa-
tion about substructure in X resides primarily in the labels supplied with X_d, one
needs lots of them. At the other extreme, when X_d has no labels, the mixture
assumptions needed for SDT make as much sense as anything else. Devijver
and Kittler (1982) give a very lucid discussion of two classes of k-NNR's that
accommodate weighted voting and non-integer thresholds. However, the three
conventional classes of NNR's (k, (k,t), (k,{t_i})) all assume that (i) U_d ∈ M_cn is
crisp; and that NNR's should result in crisp labels, i.e., U ∈ L_cn. Thus, the
scheme at (44) is more general in terms of both the input and output labels.
The extant literature on fuzzy k-NNR's is pretty sparse, I suspect because it
is hard to see where useful fuzzy input labels will come from (if not gotten by
processing X_d). Of course the algorithm of Pedrycz reviewed above accommo-
dates information of this type through the objective function J_PZ, which would
take U_d as the non-zero part of the matrix W in equation (34). Various authors
have discussed different ways to obtain U_d. For example, Bezdek et al. (1986)
suggest that under some circumstances it may be profitable to actually ignore
crisp labels for X_d, apply FCM to it, and use the columns of U_d obtained
thereby in (44). In Bezdek et al. (1986) the authors compared four classifiers:
the crisp k-NNR, the FCM/k-NNR, the FCM/1-NPR, and Jozwik's fuzzy k-NNR
on several artificially generated mixtures of multivariate normals as well as the
IRIS data. Columns of U were converted to crisp labels in N_c using simple
maximum membership conversion so that classifier errors could be tallied. The
results of their computational experiments implied that the FCM nearest proto-
type (closest v_i to each x_k ∈ X, v_i from (31a)) rule enjoyed a slight advantage in
terms of efficiency; and that the crisp k-NNR was consistently poorest in terms
of error rate, while the FCM/k-NNR was best. One should view general conclu-
sions that these remarks may invite very carefully; the data were finite, well-
structured, and limited. Perhaps the best thing to say about generalized k-NNR's
at this writing is that they seem interesting enough to deserve further study.
Readers interested in further discussion along these lines may begin with Keller
and Givens (1985), Keller, Gray and Givens (1985), Jozwik (1983), and Duin
(1982).
5. CONCLUSIONS
There are, of course, many fuzzy clustering algorithms that have not been
reviewed above. Some are ostensibly quite interesting and useful -- others seem
preposterous! On the other hand, any scheme that really solves a problem or
provides useful insights to data deserves a place in the literature. I hope that the
above review constitutes at least a glimpse of the major structures and clustering
models now being pursued by the "Fuzzy sets" community.
Perhaps the best single piece of advice that can be given to potential users
of (any) clustering algorithm is this: try two or three different algorithms on your
data. If the results are stable, interpretation of the data using these results gains
credibility; but widely disparate results suggest one of two other possibilities:
either the data has no cluster substructure, or the algorithms tried so far are not
well matched to existent but as yet undetected substructure. The algorithms
described above have enjoyed varying degrees of success with a wide cross sec-
tion of data types. There is every reason to expect that in some cases clusters
obtained using, e.g., FCM, with ecological data will provide very serviceable
interpretations of the ecosystem under study. I encourage readers in the applica-
tions community to try one or more of the fuzzy algorithms discussed above -
the results might be very surprising! On this note my survey concludes.

REFERENCES
ANDERBERG, M. R. 1983. Cluster analysis for researchers, Academic Press,
New York.
ANDERSON, I., BEZDEK, J., AND DAVE, R. 1982. Polygonal shape descriptions
of plane boundaries, in Systems science and science, vol. 1, pp. 295-301,
SGSR Press, Louisville.
ARBIB, M. 1977. Book reviews, Bull. AMS, vol. 83, no. 5, pp. 946-951.
(Arbib provides scathing reviews of three fuzzy sets books).
BACKER, E. 1978. Cluster analysis by optimal decomposition of induced fuzzy
sets, Delft Univ. Press, Delft.
BANDLER, W., AND KOHOUT, L. 1984. The four modes of inference in fuzzy
expert systems, Cyber. and Sys. Res. , vol. 2, pp. 581-586.
BELLMAN, R., KALABA, R., AND ZADEH, L. A. 1966. Abstraction and pattern
classification, Jo. Math. Anal. and Appl., vol. 13, pp. 1-7.
BEZDEK, J. C. 1974. Numerical taxonomy with fuzzy sets, Jo. Math. Bio, vol.
1, no. 1, pp. 57-71.
BEZDEK, J. C., AND DUNN, J. C. 1975. Optimal fuzzy partitions: a heuristic for
estimating the parameters in a mixture of normal distributions, IEEE Tran-
sactions on Computers, vol. 24, no. 8, pp. 835-838.
BEZDEK, J. C. 1976a. Feature selection for binary data: medical diagnosis with
fuzzy sets, Proc. 1976 NCC, AFIPS (45), pp. 1057-1068, AFIPS Press,
Montvale.
BEZDEK, J. C. 1976b. A physical interpretation of fuzzy ISODATA, IEEE
Trans. SMC, vol. 6, no. 5, pp. 387-389.
BEZDEK, J. C., AND CASTELAZ, P. 1977. Prototype classification and feature
selection with fuzzy sets, IEEE Trans. SMC, vol. 7, no. 2, pp. 87-92.
BEZDEK, J. C., AND HARRIS, J. D. 1978. Fuzzy relations and partitions: an
axiomatic basis for clustering, Fuzzy Sets and Systems, vol. 1, pp. 111-127.
BEZDEK, J. C., AND FORDON, W. 1978. Analysis of hypertensive patients by
the use of the fuzzy ISODATA clustering algorithms, Proc. 1978 Joint
Automatic Control Conference, pp. 349-355, ISA Press, Pittsburgh.
BEZDEK, J. C. 1978. Fuzzy algorithms for particulate morphology, in Proc.
1978 int'l powder and bulk solids conf., pp. 143-150, ISCM Press, Chicago.
BEZDEK, J. C., AND HARRIS, J. D. 1979. Convex decompositions of fuzzy par-
titions, Jo. Math. Anal. and Appl., vol. 67, no. 2, pp. 490-512.
BEZDEK, J. C., AND FORDON, W. A. 1979. The application of fuzzy set theory
to medical diagnosis, in Advances in fuzzy set theory and applications, pp.
445-461, North Holland, Amsterdam.
BEZDEK, J. C. 1980. A convergence theorem for the fuzzy ISODATA cluster-
ing algorithms, IEEE Transactions on Pattern Analysis and Machine Intelli-
gence, vol. PAMI-2, no. 1, pp. 1-8.
BEZDEK, J. C. 1981a. Pattern recognition with fuzzy objective function algo-
rithms, Plenum Press, New York.
BEZDEK, J. C., CORAY, C., GUNDERSON, R., AND WATSON, J. 1981b. Detec-
tion and characterization of cluster substructure: I. linear structure: fuzzy c-
lines, SIAM Jo. Appl. Math, vol. 40, no. 2, pp. 339-357.
BEZDEK, J. C., CORAY, C., GUNDERSON, R., AND WATSON, J. 1981. Detec-
tion and characterization of cluster substructure: II. fuzzy c-varieties and
convex combinations thereof, SIAM Jo. Appl. Math, vol. 40, no. 2, pp.
358-372.
BEZDEK, J. C., AND SOLOMON, K. 1981. Simulation of implicit numerical
characteristics using small samples, in Proc. ICASRC, ed. G. E. Lasker, vol.
VI, pp. 2773-2784, Pergamon, New York.
BEZDEK, J. C., AND ANDERSON, I. 1984. Curvature and tangential deflection of
discrete arcs, IEEE Trans. PAMI, vol. 6, no. 1, pp. 27-40.
BEZDEK, J. C., AND ANDERSON, I. 1985. An application of the c-varieties clus-
tering algorithms to polygonal curve fitting, IEEE Trans. SMC, vol. 15, no.
5, pp. 637-641.
BEZDEK, J. C., HATHAWAY, R. J., AND HUGGINS, V. J. 1985. Parametric esti-
mation for normal mixtures, Pattern Recognition Letters, vol. 3, pp. 79-84.
BEZDEK, J. C., GRIMBALL, N., CARSON, J., AND ROSS, T. 1986. Structural
failure determination with fuzzy sets, in press, Civil Engr. Sys ..
BEZDEK, J. C., BISWAS, G., AND HUANG, L. 1986. Transitive closures of fuzzy
thesauri for information retrieval systems, in press, IJMMS.
BEZDEK, J. C., CHUAH, S., AND LEEP, D. 1986. Generalized k-nearest neigh-
bor rules, Fuzzy Sets and Systems, vol. 18, pp. 237-256.
BEZDEK, J. C., AND HATHAWAY, R. J. 1986. Clustering with relational c-
means partitions from pairwise distance data, in press, Jo. Math Modeling.
BEZDEK, J. C., HATHAWAY, R. J., HOWARD, R. E., WILSON, C. E., AND
WINDHAM, M. P. 1986. Local convergence analysis of a grouped variable
version of coordinate descent, in press, Jo. Optimization Theory.
BISWAS, G., JAIN, A. K., AND DUBES, R. C. 1981. Evaluation of projection
algorithms, IEEE Trans. PAMI, vol. 3, no. 6, pp. 701-708.
BLOCKLEY, D. I., PILSWORTH, G. W., AND BALDWIN, J. F. 1983. Measures of
uncertainty, Civil Eng. Sys, vol. 1, pp. 3-9.
BOCK, H. H. 1984. Statistical testing and evaluation methods in cluster analysis,
Proc. lSI, pp. 116-146, Calcutta.
BOISSONADE, A., DONG, W., LIU, S., AND SHAH, H. C. 1984. Use of pattern
recognition and bayesian classification for earthquake intensity and damage
estimation, Int. Jo. Soil Dynamics & Earth. Engr., vol. 3, no. 3, pp. 145-149.
BONNISONE, P., AND DECKER, K. 1985. Selecting uncertainty calculi and
granularity: an experiment in trading-off precision and complexity, GE
TR85.5C38, Schenectady.
CANNON, R., DAVE, J., AND BEZDEK, J. C. 1986. Efficient implementation of


the fuzzy c-means clustering algorithms, IEEE Trans. PAMI, vol. 8, no. 2,
pp. 248-255.
CANNON, R., DAVE, J., BEZDEK, J. C., AND TRIVEDI, M. 1986. Segmentation
of a thematic mapper image using the fuzzy c-means clustering algorithm,
IEEE Trans. Geo. & Remote Sensing, vol. 24, no. 3, pp. 400-408.
CHERNOFF, H. 1973. The use of faces to represent points in K-dimensional
space graphically, JASA, vol. 68, pp. 361-368.
COXON, A. P. M. 1982. The user's guide to multidimensional scaling,
Heinemann, London.
DEVI, B. B. 1986. Compact clustering using fuzzy ISODATA, Proc. NAFIPS,
pp. 31-37, NAFIPS Press, Columbia.
DEVIJVER, P. A., AND KITTLER, J. 1982. Pattern recognition: a statistical
approach, Prentice-Hall, Englewood Cliffs.
DIDAY, E., AND SIMON, J. C. 1976. Clustering analysis, in Digital pattern
recognition, pp. 47-94, Springer-Verlag, New York.
DONG, W., BOISSONADE, A., SHAH, H. C., AND WONG, F. 1985. Fuzzy
classification of seismic intensity, Proc. ISFMER, pp. 129-148, Seismologi-
cal Press, Beijing.
DUDA, R. 0., AND HART, P. E. 1973. Pattern classification and scene analysis,
p. 249, Wiley-Interscience, New York.
DUIN, R. P. W. 1982. The use of continuous variables for labelling objects,
Patt. Recog. Letters, vol. 1, pp. 15-20.
DUNN, J. C. 1974a. A fuzzy relative of the ISODATA process and its use in
detecting compact well-separated clusters, Jo. Cyber, vol. 3, pp. 32-57.
DUNN, J. C. 1974b. A graph theoretic analysis of pattern classification via
Tamura's fuzzy relation, IEEE Trans. SMC, pp. 310-313.
EVERITT, B. S. 1980. Cluster analysis (second edition), Heinemann, London.
EVERITT, B. S., AND HAND, D. J. 1981. Finite mixture distributions, Chapman
& Hall, New York.
FOLEY, D. H., AND SAMMON, J. W. 1975. An optimal set of discriminant vec-
tors, IEEE Trans. Comp, vol. C24, no. 3, pp. 281-289.
FRESI, E., COLOGNOLA, R., GAMBI, M. C., GIANGRANDE, A., AND SCARDI, M.
1983. Richerche sui popolamenti bentonici di substrato duro del porto di
Ischia. Infralitorale fotofilo : Policheti, Cahiers de biologie marine, vol. 24,
pp. 1-19.
FRIEDMAN, J. H., AND TUKEY, J. W. 1974. A projection pursuit algorithm for


exploratory data analysis, IEEE Trans. Comp., vol. C23, no. 9, pp. 881-890.
FU, K. S. 1974. Syntactic approaches to pattern recognition, Academic Press,
New York.
FU, K. S. 1982. Syntactic pattern recognition and applications, Prentice Hall,
Englewood Cliffs.
FUKUNAGA, K., AND KOONTZ, W. 1970. Application of the Karhunen-Loeve
expansion to feature selection and ordering, IEEE Trans. Comp., vol. C19,
pp. 311-318.
FUKUNAGA, K. 1972. Introduction to statistical pattern recognition, Academic
Press, New York.
FULL, W., EHRLICH, R., AND BEZDEK, J. C. 1982. A new approach for linear
unmixing, Jo. Math. Geo., vol. 14, no. 3, pp. 259-270.
GOODMAN, 1. R. 1982. Some fuzzy set operations which induce homomorphic
random set operations, in Proc. 1982 SGSR, SGSR Press, Washington.
GRANATH, G. 1984. Application of fuzzy clustering and fuzzy classification to
evaluate provenance of glacial till, Jo. Math Geo." vol. 16, no. 3, pp. 283-
301.
GUNDERSON, R. W. 1983. An adaptive FCV clustering algorithm, IJMMS, vol.
19, no. 1, pp. 97-104.
GUSTAFSON, D., AND KESSEL, W. 1978. Fuzzy clustering with a fuzzy covari-
ance matrix, Proc. IEEE CDC, pp. 761-766, San Diego.
HARTIGAN, J. A. 1975. Clustering algorithms, Wiley, New York.
HATHAWAY, R., AND BEZDEK, J. C. 1986. On the asymptotic properties of
fuzzy c-means cluster prototypes as estimators of mixture subpopulations,
Comm. Stat., vol. 15, no. 2, pp. 505-513.
HUNTSBERGER, T., JACOBS, C. L., AND CANNON, R. L. 1985. Iterative fuzzy
scene segmentation, Patt. Recog., vol. 18, pp. 131-138.
HUNTSBERGER, T., AND DESCALZI, M. 1985. Color edge detection, Patt.
Recog. Letters, vol. 3, pp. 205-209.
ISMAIL, M. A., AND SELIM, S. A. 1986. On the local optimality of the fuzzy
ISODATA clustering algorithm, IEEE Trans. PAMI, vol. 8, no. 2, pp. 284-
288.
JACOBSEN, T., AND GUNDERSON, R. 1983. Trace element distribution in yeast
and wort samples: an application of the FCV clustering algorithms, IJMMS,
vol. 19, no. 1, pp. 105-116.
285

JOHNSON, R. A, AND WICHERN, D. W. 1982. Applied multivariate statistical


analysis, Prentice-Hall, Englewood Cliffs.
JOZWIK, A. 1983. A learning scheme for a fuzzy k-NN rule, Patt. Recog.
Letters, vol. 1, pp. 287-289.
KANDEL, A, AND YELOWITZ, L. 1974. Fuzzy chains, IEEE Trans. SMC., pp.
472-475.
KELLER, J. M., AND GIVENS, J. A. 1985. Membership function issues in fuzzy
pattern recognition, Proc. IEEE SMC, Tucson.
KELLER, J. M., GRAY, M. R., AND GIVENS, J. A 1985. A fuzzy k-nearest
neighbor algorithm, IEEE Trans. SMC, vol. 15, no. 4, pp. 580-585.
KENT, 1. T., AND MARDIA, K. V. 1986. Spatial classification using fuzzy
membership models, in review, IEEE Trans. PAM!.
KLEINER, B., AND HARTIGAN, J. A. 1981. Representing points in many dimen-
sions by trees and castles, JASA, vol. 76, pp. 260-276.
KRUSKAL, J. B., AND LANDWEHR, J. M. 1983. Icicle plots: better displays for
hierarchical clustering, Amer. Stat., vol. 37, pp. 162-168.
LEE, R. C. T., SLAGLE, J. R., AND BLUM, H. 1977. A triangulation method for
the sequential mapping of points from N-space to 2-space, IEEE Trans.
Comp., vol. C27, pp. 288-292.
LIBERT, G., AND ROUBENS, M. 1982. Non-metric fuzzy clustering algorithms
and their cluster validity, in Fuzzy information and decision processes, ed.
M. Gupta and E. Sanchez, pp. 417-425, Elsevier, New York.
LINDLEY, D. V. 1982. Scoring rules and the inevitability of probability, Int.
Stat. Review, vol. 50, pp. 1-26.
LORR, M. 1983. Cluster analysis for the social sciences, Jossey-Bass, San Fran-
cisco.
MATHERON, G. 1975. Random sets and integral geometry, Wiley, New York.
LOWEN, R. 1982. On fuzzy complements, Inf. Sci., vol. 14, pp. 107-113.
MCBRATNEY, A. B., AND MOORE, A W. 1985. Application of fuzzy sets to
climatic classification, Ag. & Forest Meteor, vol. 35, pp. 165-185.
NARENDRA, P. M., AND FUKUNAGA, K. 1977. A branch and bound algorithm
for feature subset selection, IEEE Trans. Comp, vol. C26, pp. 917-922.
PEARSON, K. 1898. Contributions to the mathematical theory of evolution, Phil.
Trans. of the Royal Soc. of London, vol. 185, pp. 71-110.
286

PEDRYCZ, W. 1985. Algorithms of fuzzy clustering with partial supervision,


Patt. Recog. Letters, vol. 3, pp. 13-20.
PELEG, S., AND ROSENFELD, A 1981. A note on the evaluation of probabilistic
labelings, IEEE Trans. SMC., vol. 11, no. 2, pp. 176-179.
REDNER, R. A, AND WALKER, H. F. 1984. Mixture densities, maximum likeli-
hood, and the EM algorithm, SIAM Review, vol. 26, no. 2, pp. 195-240.
ROUBENS, M. 1978. Pattern classification problems with fuzzy sets, Fuzzy Sets
and Systems, vol. 1, pp. 239-253.
ROUBENS, M. 1982. Fuzzy clustering algorithms and their cluster validity, Eur.
Jo. Op. Res., vol. 10, pp. 294-301.
RUSPINI, E. 1969. A new approach to clustering, Inf. and Control, vol. 15, pp.
22-32.
SABIN, M. J. 1986. Convergence and consistency of fuzzy c-Means/ISODATA
algorithms, in review, IEEE Trans. PAM!.
SAMMON, J. W. 1969. A non-linear mapping for data structure analysis, IEEE
Trans. Comp., vol. C18, pp. 401-409.
SELIM, S. A, AND ISMAIL, M. A 1984. K-means type algorithms: a general-
ized convergence theorem and characterization of local optimality, IEEE
Trans. PAMI, vol. 6, no. 1, pp. 81-87.
SNEATH, P. H. A, AND SOKAL, R. R. 1973. Numerical taxonomy, Freeman,
San Francisco.
THOMASON, M., AND GONZALEZ, R. 1981. Syntactic pattern recognition: an
introduction, Addison-Wesley, Reading.
TOU, J. T., AND GONZALEZ, R. C. 1974. Pattern recognition principles,
Addison-Wesley, Reading.
TRIBUS, M. 1979. Comments on fuzzy sets, fuzzy algebra, and fuzzy statistics,
Proc. IEEE, vol. 67, pp. 1168-1169.
TRIVEDI, M., AND BEZDEK, J. C. 1986. Low level segmentation of aerial
images with fuzzy clustering, IEEE Trans. SMC, vol. SMC-16, no. 4, pp.
589-597.
TRYON, R. C. 1939. Cluster analysis, Edwards Bros., Ann Arbor.
TuKEY, J. W. 1977. Exploratory data analysis, Addison-Wesley, Reading.
VERHAGEN, C. 1975. Some general remarks about pattern recognition; its
definition; its relation with other disciplines; a literature survey, Patt. Recog.,
vol. 8, no. 3, pp. 109-116.
287

WEE, W. G. 1967. On generalizations of adaptive algorithms and applications


of the fuzzy sets concept to pattern classification, Purdue Univ. PhD Thesis,
Lafayette
WINDHAM, c., WINDHAM, M. P., WYSE, B., AND HANSEN, G. 1985. Cluster
analysis to improve food classification within commodity groups, Jo. Amer.
Diet. Assoc., vol. 85, no. 10, pp. 1306-1314.
WINDHAM, M. P. 1982. Cluster validity for the fuzzy c-means algorithm, IEEE
Trans. PAM!, vol. 4, no. 4, pp. 357-363.
WINDHAM, M. P. 1985. Numerical classification of proximity data with assign-
ment measures, Jo. Class, vol. 2, pp. 157-172.
WINDHAM, M. P. 1986. A unification of optimization-based numerical
classification algorithms, in Classification as a tool for research, ed. W. Gaul
& M. Schader, pp. 447-451, North Holland, Amsterdam.
ZADEH, L. A. 1965. Fuzzy sets, Inf. and Control, vol. 8, pp. 338-353.
ZADEH, L. A. 1971. Similarity relations and fuzzy orderings, Inf. Sci., pp. 177-
200.
CONSTRAINED CLUSTERING

Pierre Legendre
Departement de Sciences biologiques
Universite de Montreal
C.P. 6128, Succursale A
Montreal, Quebec H3C 3J7, Canada

Abstract - Results of cluster analysis usually depend to a large extent on the choice of a clustering
method. Clustering with constraint (time or space) is a way of restricting the set of possible
solutions to those that make sense in terms of these constraints. Time and space contiguity are so
important in ecological theory that their imposition as an a priori model during clustering is
reasonable. This paper reviews various methods that have been proposed for clustering with
constraint, first in one dimension (space or time), then in two or more dimensions (space). It is
shown, using autocorrelated simulated data series, that if patches do exist, constrained clustering
always recovers a larger fraction of the information than the unconstrained equivalent. The
comparison of autocorrelated to uncorrelated data series also shows that one can tell, from the
results of agglomerative constrained clustering, whether the patches delineated by constrained
clustering are real. Finally, it is shown how constrained clustering can be extended to domains
other than space or time.

INTRODUCTION

Constrained clustering is part of a family of methods whose purpose is to delimit
homogeneous regions on a univariate or multivariate surface, by forming blocks of pieces that are
also adjacent in space or in time. As an alternative to clustering, this same problem of "regional
analysis" can be addressed by ordination methods, as is the case with most other problems of
descriptive data analysis. Various methods of "regional analysis" have been reviewed by
Wartenberg (manuscript) who divided them into three basic classes: (1) a posteriori testing of
nongeographic solutions; (2) clustering or ordering with absolute contiguity constraint; and, (3)
geographic scaling of phenetic information.

Clustering with constraint is one way of imposing a model onto the data analysis process,
whose end result otherwise would depend greatly on the clustering algorithm used. The model
consists of a set of relationships that we wish the clustering results to preserve, in addition to the
information contained in the resemblance matrix (or, for some clustering methods, in the raw data:
Lefkovitch 1987). These relationships may consist of geographic information, placement along a
time series, or may be of other types, as we will see. In any case, imposing a constraint or a set
of constraints onto a data-analytic method is a way of restricting the set of possible solutions to
those that are meaningful in terms of this additional information.

NATO ASI Series, Vol. G14
Developments in Numerical Ecology
Edited by P. and L. Legendre
© Springer-Verlag Berlin Heidelberg 1987

In this paper, we will first describe various forms of constrained clustering. Then we will
examine the questions of whether constrained clustering is necessary to get meaningful results,
and how to determine if the patches found by constrained clustering are real. Finally, we will
suggest that the concept of constrained clustering can be extended to models other than space or
time.

Ecologists are primarily interested in two types of natural constraints: space and time.
Ecological sampling programs are usually designed along these physical axes, so that information
about the position of ecological samples in space and in time is almost always known.
Furthermore, various parts of ecological theory tell us that elements of an ecosystem that are
closer in space or in time are more likely to be under the influence of the same generating process
(competition theory, predator-prey interactions, succession theory), while other parts of
ecological theory tell us that the discontinuities between such patches in space or in time are
important for the structure (succession, species-environment relations) or for the dynamics of
ecosystems (ergoclines).

These reasons are so compelling as to legitimize a clustering approach where the clusters
will be considered valid only if they are made of contiguous elements. From this point of view,
clusters of noncontiguous elements, such as can be obtained from the usual unconstrained
clustering algorithms, are seen as an artifact resulting from the artificial aggregation of effects from
different but converging generating processes. We will come back to this point later on.

ONE-DIMENSIONAL CONSTRAINT

In many ecological problems, the a priori information to be taken into account is
one-dimensional. This is the case when the sampling takes place through time or along a transect,
or else when studying sediment cores (that may represent either space or time series). The
methods for dividing such data series into segments, using a constrained approach, go back to W.
D. Fisher (1958), an economist, who suggested an algorithm for univariate data based on
minimizing the weighted sum of within-group sums of squared distances to the group centroids.
The user must also decide how many groups he/she wishes to obtain. Fisher's method was valid
in both the constrained and the unconstrained situation. It was later generalized to multivariate
data by Ward (1963), who considered only the unconstrained case, and proposed the well-known
minimum-variance hierarchical clustering method.

Several other proposals have been reviewed by Wartenberg (manuscript). Among these, let
us mention the method of Webster (1973), a soil scientist who needed to partition multivariate
sequences corresponding to a space transect or to a core. Moving a window along the series,
Webster compared the two halves of the segment covered by the window, either with Student's t
or Mahalanobis' D², and he placed boundaries at points of maximum value of the statistic. While
the results obtained depend in part on the window length, Webster's method is interesting in that it
looks for points of maximal changes between regions.
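Webster's moving split-window is easy to sketch for a univariate series. The fragment below is a simplified illustration, not Webster's exact procedure: the function name is my own, and a t-like contrast between half-windows stands in for the formal Student's t or Mahalanobis test.

```python
import numpy as np

def split_window_scores(series, width):
    """Slide a window of even width along a univariate series and score
    each position by a t-like contrast between its two half-windows;
    boundaries are placed at local maxima of the score."""
    half = width // 2
    scores = np.full(len(series), np.nan)
    for centre in range(half, len(series) - half + 1):
        left = series[centre - half:centre]
        right = series[centre:centre + half]
        pooled = np.sqrt((left.var(ddof=1) + right.var(ddof=1)) / 2.0)
        scores[centre] = abs(left.mean() - right.mean()) / (pooled + 1e-12)
    return scores

# A series with a true break at position 20: the score peaks near it.
rng = np.random.default_rng(0)
series = np.concatenate([rng.normal(0, 1, 20), rng.normal(5, 1, 20)])
boundary = int(np.nanargmax(split_window_scores(series, width=10)))
```

As Webster noted, the position and sharpness of the detected boundaries depend in part on the window width chosen.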

The dual approach to this problem is to look for maximal homogeneity within segments.
This was the point of view adopted by Hawkins and Merriam who proposed a method for
segmenting a univariate (1973) or a multivariate (1974) data series into homogeneous units, using
a dynamic programming algorithm. This method was advocated by Ibanez (1984) for the study of
successional steps in ecosystems.
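The dynamic-programming idea can be illustrated for a univariate series with a least-squares criterion. This is only a sketch in the spirit of the Hawkins-Merriam method, under stated assumptions: the function name is mine and the published algorithm differs in detail.

```python
import numpy as np

def segment_series(x, k):
    """Partition a univariate series into k contiguous segments that
    minimize the total within-segment sum of squares, by dynamic
    programming over the position of each segment's start."""
    n = len(x)
    s = np.concatenate([[0.0], np.cumsum(x)])
    s2 = np.concatenate([[0.0], np.cumsum(np.square(x))])
    def cost(i, j):
        """Within-segment sum of squares of x[i..j] (inclusive)."""
        m = j - i + 1
        return (s2[j + 1] - s2[i]) - (s[j + 1] - s[i]) ** 2 / m
    best = np.full((k + 1, n), np.inf)
    cut = np.zeros((k + 1, n), dtype=int)
    for j in range(n):
        best[1, j] = cost(0, j)
    for g in range(2, k + 1):
        for j in range(g - 1, n):
            for i in range(g - 1, j + 1):      # i: start of the last segment
                c = best[g - 1, i - 1] + cost(i, j)
                if c < best[g, j]:
                    best[g, j] = c
                    cut[g, j] = i
    bounds, j = [], n - 1                      # backtrack the segment starts
    for g in range(k, 1, -1):
        i = cut[g, j]
        bounds.append(int(i))
        j = i - 1
    return sorted(bounds), best[k, n - 1]

x = np.array([0.0, 0.2, 0.1, 5.0, 5.1, 4.9, 10.0, 10.2])
bounds, ss = segment_series(x, 3)
# bounds == [3, 6]: segments x[0:3], x[3:6], x[6:8]
```

Note that the user still supplies k, the number of segments, which is precisely the difficulty discussed below.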

Although it represents a methodological improvement over previous ways of studying
succession, this method is still problematic. First, the user must determine the number of
segments she/he wishes to obtain, using as an indicator the increase in explained variation relative
to the increase in the number of segments. A second problem with ecological data is that strings of
multiple zeroes, which are very often found in species abundance data series, are likely to cause
the formation of segments based on species absences. Actually, the method assumes each group to
be drawn from a multivariate normal distribution and it is sensitive to departures from this
condition, which is rarely met by ecological data. Finally, as the user increases the number of
groups, group breaks that appear at one grouping level may change position at the next level
(Legendre et al. 1985: 274).

Using the hierarchical clustering approach, Gordon and Birks (1972, 1974) and Gordon
(1973) included the time constraint in a variety of algorithms to study pollen stratigraphy. They
used constrained single linkage, constrained average linkage, and a constrained binary division
algorithm. Their purpose was to define zones of pollen and spores that are homogeneous within
zones and different between zones. They compared their various techniques, which led by and
large to the same result. As we will see below, this was probably due to the predominant
influence of the constraint on the results.

Legendre et al. (1985) used a very similar approach to study ecological successions
through time. The basis of their method, called "chronological clustering", is proportional-link
linkage hierarchical clustering with a constraint of time contiguity. This means that only
time-adjacent groups are considered contiguous and are assessed for clustering. There is one
important addition to the ideas of Gordon and his co-workers, however: this algorithm is
supplemented with a statistical test of cluster fusion whose hypotheses correspond to the
ecological model of a succession evolving by steps.

Prior to this analysis, a distance matrix among samples has been computed, using a
dissimilarity function appropriate to the problem at hand (ecological succession, or other).
Considering two groups (1) that are contiguous and (2) that are proposed for fusion by the
clustering algorithm, a one-tailed test is made of the null hypothesis that the "large distances" in
the submatrix are distributed at random within and among these two groups. The test is
performed by randomization; this test could actually be re-formulated as a special form of the
Mantel test (1967). The above-mentioned paper shows the true probability of a type I error to be
equal to the nominal significance level of the test. When the null hypothesis is accepted at the
given confidence level, the two groups are fused. The computer program also allows for the
elimination of aberrant samples that can form singletons and prevent the fusion of their
neighboring groups, and it offers complementary tests of the similarity of non-adjacent groups.
The end result is a nonhierarchical partition of the samples into a set of internally contiguous
groups, the number of which has not been fixed by the user.
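The logic of such a randomization test can be sketched as follows. This is a generic permutation test of the distinctness of two time-adjacent groups, using the mean between-group distance as statistic; it is not the exact "large distances" statistic of Legendre et al. (1985), and the function name is hypothetical.

```python
import numpy as np

def fusion_permutation_test(d, n1, n2, n_perm=999, rng=None):
    """One-tailed permutation test of the distinctness of two adjacent
    groups, taken as the first n1 and the next n2 objects of distance
    matrix d: the observed mean between-group distance is compared with
    its distribution under random relabelling of the n1 + n2 objects."""
    rng = np.random.default_rng(rng)
    idx = np.arange(n1 + n2)
    def between_mean(order):
        return d[np.ix_(order[:n1], order[n1:])].mean()
    observed = between_mean(idx)
    count = sum(between_mean(rng.permutation(idx)) >= observed
                for _ in range(n_perm))
    return (count + 1) / (n_perm + 1)

# Two well-separated groups of 6 samples each along a transect:
values = np.concatenate([np.zeros(6), np.full(6, 10.0)])
d = np.abs(values[:, None] - values[None, :])
p = fusion_permutation_test(d, 6, 6, rng=1)   # small p: do not fuse
```

A large p-value would indicate that the two groups are indistinguishable, so that fusion is acceptable.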

Fig. 1. Schematic representation of the chronological clustering of 78 samples of
Mediterranean chaetognaths. Cluster numbers are circled. Between-group pairwise relationships
are represented by vertical lines. The boxed sample is a singleton. Connectedness = 25%, α =
0.25. From Legendre et al. (1985), Figure 4.

I shall illustrate time-constrained clustering with this method. The example consists of a
series of 78 samples of Mediterranean zooplankton (chaetognaths) obtained from 1966 to 1968
and analyzed by Legendre et al. (1985). In Figure 1, the series is folded to allow representation of
the relationships among clusters; these relationships have been computed by a posteriori testing,
using the test of cluster fusion described above. The ecological significance of the group breaks is
discussed in the above-mentioned paper.

This data set was also subjected to chronological clustering using several values of
connectedness during the proportional-link linkage agglomeration. Without the constraint, low
values of connectedness have a space-contracting effect while high values cause an effect
equivalent to an expansion of the reference space (Lance and Williams 1967). As shown in Figure
2, the results are quite stable through a range of connectedness values. This illustrates the
predominant effect of the constraint during the clustering process, as previously noted by Gordon
and Birks (op. cit.). Clustering the same data set by unconstrained proportional-link linkage
produced scrambled, uninterpretable results (Legendre et al. 1985).

Fig. 2. Comparison of four connectedness levels (Co), keeping α fixed at 0.25. Same data as
in Figure 1. Full horizontal lines: clusters of contiguous samples, with blanks representing
significant breaks in the series. Stars: singletons. From Legendre et al. (1985), Figure 3.

Chronological clustering, which was developed with reference to the problem of species
succession in ecosystems, could be applied to other problems where one hypothesizes sharp
breaks within the data series. Besides the examples in Legendre et al. (1985), the method has
been applied to a variety of other problems, including the successional dynamics of bacteria
through time in sewage treatment lagoons (Legendre et al. 1984), the study of fish communities in
a coral reef transect (Galzin and Legendre 1987) and of a stratigraphic sequence of fossil fish (Bell
and Legendre 1987).

TWO-DIMENSIONAL AND HIGHER SPATIAL RELATIONSHIPS

Often, the spatially distributed data of interest to the ecologist are not sampled from a
transect, but are spread across a surface or, in some instances, a volume. If the spatial
relationships among samples are to be taken into account during the clustering process, it is
important to define clearly what is meant by "contiguous samples".

If the data represent sub-units of the area under study, with these smaller surfaces
touching one another, then a simple and natural way is to define as contiguous two surfaces that
share a common border.

On the contrary, if the data can be seen as attached to points in space that are distant from
one another, then there are various ways of defining the connection network among these points.

(a) The easiest way is to use the minimum spanning tree among points in geographic space. This
method is also the least efficient in that it uses only a small fraction of the geographic information.

(b) Among the various types of connection networks, one that is often used is the Gabriel graph
(Gabriel and Sokal 1969). In this graph, two points A and B are connected if no other point is
found inside the circle whose diameter is the line joining A and B; in other words, connect A and
B when D²AB < D²AC + D²BC for every triplet of points A, B, C under study.
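This definition translates directly into code. The brute-force sketch below (function name mine) is O(n³) and adequate only for small point sets:

```python
import numpy as np

def gabriel_graph(points):
    """Edges of the Gabriel graph: A and B are connected when no third
    point C lies inside the circle having AB as diameter, i.e. when
    D2(A,B) < D2(A,C) + D2(B,C) for every other point C."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    d2 = np.sum((pts[:, None, :] - pts[None, :, :]) ** 2, axis=-1)
    edges = []
    for a in range(n):
        for b in range(a + 1, n):
            if all(d2[a, b] < d2[a, c] + d2[b, c]
                   for c in range(n) if c != a and c != b):
                edges.append((a, b))
    return edges

# Three collinear points: the middle point blocks the long edge.
edges = gabriel_graph([(0, 0), (1, 0), (3, 0)])
# → [(0, 1), (1, 2)]
```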

(c) Another commonly used type of connection network is the Delaunay triangulation. This is a
way of dividing the whole plane into triangles without crossing edges. The algorithm proposed
by Green and Sibson (1978) also allows the user to remove those long edges that form along the
perimeter of the surface as "border effects". A Gabriel graph is a subset of a Delaunay
triangulation (Matula and Sokal 1980).

(d) When the points form a regular grid (or when the surface is divided into squares or
rectangles), it is a simple matter to connect them in 4 directions if they form a square lattice, or in
8 directions by adding diagonal edges. They could also be connected in 6 directions if they are
positioned in staggered rows.

These connecting schemes can be extended to three dimensions if the points come from a
volume of space, or if the volume is divided into regular or irregular blocks.
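For case (d), the connection list of a regular grid is easily built. A sketch (function name mine; the 6-direction staggered case and the three-dimensional extension follow the same pattern):

```python
def grid_links(n_rows, n_cols, diagonals=False):
    """Contiguity links of a regular grid of points: 4 neighbours per
    interior point (rook's move), or 8 when diagonal edges are added
    (king's move).  Each link is listed once, as a pair of (row, col)
    positions."""
    steps = [(0, 1), (1, 0)] + ([(1, 1), (1, -1)] if diagonals else [])
    links = []
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in steps:
                r2, c2 = r + dr, c + dc
                if 0 <= r2 < n_rows and 0 <= c2 < n_cols:
                    links.append(((r, c), (r2, c2)))
    return links

# A 3 x 3 grid has 12 rook links, and 20 once diagonals are added.
```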

Using one or another of these connecting schemes, authors have constrained many of the
usual clustering algorithms: linkage clustering, UPGMA, minimum-variance method, hierarchical
binary division, and so on. Others used the geographic information a posteriori, selecting among
the set of possible partitions those that are consistent with the spatial constraints. Wartenberg
(manuscript) has reviewed these developments, which go back to Ray and Berry (1966).

Tests of various kinds have been developed, either as a part of constrained clustering
algorithms, or to assess the interest of the results.

(a) Howe (1979) used a test of the difference between the means of adjacent groups, during
pairwise agglomeration. In the same line of thought, Gabriel and Sokal (1969) developed a
significance test of the homogeneity of a whole partition based on the sum of squares criterion.
Given what we know now about the influence of spatial autocorrelation on statistical tests, and
especially on analysis of variance (e.g., Cliff and Ord 1981, ch. 7), these tests should be used
with caution.

(b) Ray and Berry (1966) evaluated the various agglomeration levels by plotting the changes of the
within-group and the among-group variances as a function of the number of groups. Changes in
the slope of these curves indicate the best partition.

(c) Okabe (1981) developed an index for the difference between the constrained and the
unconstrained solution, which he tested for significance by randomization. His index is based on the
number of point displacements that are necessary to transform one solution into the other, but the
Jaccard or the Rand index (described below), or information measures such as Rajski's metric
(1961), could be used for the same purpose.
Fig. 3. One of the maps from the constrained clustering study of Legendre and Legendre
(1984). This map represents clustering level S = 0.70 of the proportional-link linkage
agglomeration, with connectedness of 50%. Each group of quadrats formed at this level is
represented by a different letter or number. Longitude (W) and latitude (N) are shown outside the
frame.

I will illustrate constrained clustering on a surface using results from our program
(BIOGEO), which is a constrained proportional-link linkage agglomerative algorithm that can
handle large data sets; this property comes from the fact that, in a constrained situation, the search
for the next pair to join is limited to adjacent groups only, as previously noted by Openshaw
(1974) and by Lebart (1978). The program can use either (a) points in a regular grid, or (b) a list
of connections obtained for instance from a Delaunay triangulation. It presents the advantage of
producing directly a series of maps, each corresponding to a clustering level, instead of the usual
dendrogram. These maps are drawn either for the regular grid, or using the X and Y coordinates
of the points. Figure 3 shows one such map, from a biogeographic study of freshwater fishes in
the Quebec peninsula (Legendre and Legendre 1984), based upon the presence/absence of 109
species in 289 units of territory. Figure 8 shows a pair of such maps for points positioned by
their X and Y coordinates.
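The core of such a constrained agglomeration fits in a few lines. In the sketch below, single linkage stands in for proportional-link linkage and the search over contiguous pairs is naive; the names are my own and BIOGEO itself differs in detail.

```python
import numpy as np

def constrained_agglomeration(d, links, n_groups):
    """Agglomerate objects under a contiguity constraint: at each step,
    fuse the pair of groups with the smallest single-linkage distance
    among *contiguous* groups only, until n_groups remain."""
    d = np.asarray(d, dtype=float)
    groups = [{i} for i in range(len(d))]
    def adjacent(g1, g2):
        return any((i, j) in links or (j, i) in links
                   for i in g1 for j in g2)
    while len(groups) > n_groups:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                if adjacent(groups[a], groups[b]):
                    dist = min(d[i, j] for i in groups[a] for j in groups[b])
                    if best is None or dist < best[0]:
                        best = (dist, a, b)
        if best is None:           # no contiguous pair left to fuse
            break
        _, a, b = best
        groups[a] |= groups.pop(b)
    return groups

# Transect of 6 sites, contiguity links forming a chain 0-1-2-3-4-5:
values = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
d = np.abs(values[:, None] - values[None, :])
groups = constrained_agglomeration(d, {(i, i + 1) for i in range(5)}, 2)
# → [{0, 1, 2}, {3, 4, 5}]
```

Because only contiguous groups are ever compared, the search at each step is far smaller than in the unconstrained case, which is what allows large data sets to be handled.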

When constrained clustering has been completed, distant groups could be tested a
posteriori to determine if recurrent group structures exist through space. See Cliff and Ord (1981)
for tests of the difference among means in the presence of spatial autocorrelation.

IS CONSTRAINED CLUSTERING NECESSARY?

The question has been raised, whether constrained clustering represents a methodological
advance. Could the same results be obtained without the constraint? A constraint is after all
difficult to imbed into computer programs. I would like to argue that if one assumes the existence
of an ecological process generating autocorrelation along the sampling axes (space or time), then
one is more likely to miss uncovering the corresponding ecological structure if the clustering is
carried out without constraint. This property of clustering algorithms will be demonstrated for
agglomerative methods; divisive or nonhierarchical methods would likely lead to the same result.

For the sake of clarity, let us limit this discussion to spatially autocorrelated phenomena,
although the results apply as well to autocorrelation along the time axis. In community ecology,
one can often hypothesize generating processes related either to the abiotic environment, or to
some form of contagious biological growth. If, for the scale of sampling under consideration, the
generating process has produced a gradient, the existence of such a gradient can be demonstrated
by spatial autocorrelation analysis (univariate autocorrelation analysis: Cliff and Ord 1981;
multivariate Mantel correlogram: Sokal et al. 1987), while the gradient itself can be described
adequately by ordination analysis (scaling). On the other hand, if the generating process has
produced locally homogeneous community structures within some larger area subjected to
sampling, then the description of these structures becomes a clustering problem. Since one is then
interested in forming connected clusters of objects, there is no question as to the appropriateness
of constrained clustering, since this is exactly what this family of methods does: it produces
clusters of spatially connected points. On the contrary, clustering without constraint would open
the door to clusters possibly formed by grouping objects whose apparent similarity is the result of
different mechanisms that converged to produce somewhat similar effects on the community
structure; these clusters would present a blurred picture, as noted by Monestiez (1978).
Wartenberg (manuscript) gives a similar example from the health sciences, where lung ailments
may be due to a variety of causes: occupational (i.e., from coal mining), ambient (such as near
industrial areas), or personal habits (tobacco consumption), all of which can lead to light or
severe lung conditions; unconstrained clustering would group the samples by severity of cases
while spatially constrained clustering is more likely to delineate areas with similar types of causes.

The same rule applies to community ecology, where it is better to form the regional clusters first,
and to find the relationship among clusters in a second step.

To demonstrate that constrained clustering is not only appropriate, but also necessary, we
will rely on Monte Carlo simulations. Analyzing known conditions will show that one is less
likely to get a meaningful answer after unconstrained clustering than if a constraint has been used,
in cases where a generating process has produced patchiness.

Five groups of equal size (30 objects each) have been generated by an autocorrelated
process with random components. To make them easier to picture, the groups are made to form
for the moment a one-dimensional array of 150 objects. Within each group, one of the objects is
selected at random to become the nucleus of the generating process giving rise to the group. A
value is given to each of these nuclei; this value is drawn from a normal random distribution with
mean 0 and variance VAR. The rest of each group is made to grow out of its nucleus by a
contagious process, that consists of giving to a point located at distance n from the nucleus, the
value of the point located at distance (n-1), plus a N(0,1) random normal deviate. Such
autocorrelated Monte Carlo series have been generated with group nuclei variances VAR = 1, 5,
10, 15, 20, 25 and 30, as well as for the intermediate integer values of VAR between 1 and 10;
the amount of variance added at random to the contagious within-group growth process is kept
constant. The data sets, 150 objects long, are univariate; this should not affect the generality of
the conclusions. Spatial autocorrelation analysis was performed on these series to verify that the
data are indeed autocorrelated; significant positive autocorrelation extended to about distance 20 in
each of these data series. Five of them are shown in Figure 4; the seed of the random number
generator was the same for all runs.
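The generating process just described can be reconstructed as follows. This is a sketch under stated assumptions: numpy's generator stands in for whatever random-number routine was actually used, and the function name is mine.

```python
import numpy as np

def autocorrelated_series(n_groups=5, group_size=30, var=10.0, seed=0):
    """Patchy, autocorrelated series: each group grows out of a randomly
    placed nucleus whose value is N(0, VAR); values spread outward from
    the nucleus, each step adding an independent N(0, 1) deviate."""
    rng = np.random.default_rng(seed)
    series = np.empty(n_groups * group_size)
    for g in range(n_groups):
        start = g * group_size
        nucleus = start + rng.integers(group_size)
        series[nucleus] = rng.normal(0.0, np.sqrt(var))
        for k in range(nucleus + 1, start + group_size):   # grow rightwards
            series[k] = series[k - 1] + rng.normal()
        for k in range(nucleus - 1, start - 1, -1):        # grow leftwards
            series[k] = series[k + 1] + rng.normal()
    return series

x = autocorrelated_series(var=10.0)   # 150 values, five patches of 30
```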

After computing a (150 x 150) Euclidean distance matrix among objects, agglomerative
clustering is performed using constrained as well as unconstrained clustering. Both of the
algorithms used are based upon proportional-link linkage clustering, and a connectedness value
of 50% was used throughout for the sake of uniformity.

Since the "truth" is known from the generating process (five equal groups of 30 objects
each), it can be used to assess the efficiency of each clustering model. To achieve this, a (150 x
150) half-matrix is first computed for any given partition level of the hierarchical classification,
containing a "1" to describe two objects that are members of the same group at the said level, and
"0" otherwise. Another such half-matrix is built for the reference classification of the objects into
five groups. Milligan (1983) recommends using both the Jaccard and the Rand index to compare
the two partitions:

[Figure 4: five panels showing the simulated series for VAR = 1, 5, 10, 20 and 30; abscissa: Sample No.]

Fig. 4. Five autocorrelated Monte Carlo series, generated with different values (VAR) of group
nuclei variance. Ordinate: the value attributed to each sample along the series. The seed of the
random number generator was the same for these five runs. Group breaks are materialized by
dashes.

Jaccard = a / (a + b + c)          Rand = (a + d) / (a + b + c + d)

where a, b, c, and d are the frequencies of the (2 x 2) contingency table comparing the two
half-matrices: a counts the pairs coded "1" in both matrices, b and c the pairs coded "1" in one
matrix but "0" in the other, and d the pairs coded "0" in both. Since we are dealing with
hierarchical clustering, there are many levels of partition; the partition was selected that
maximizes the relationship between the computed classification and the "truth", for
each of the indices (Jaccard and Rand).
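The frequencies a, b, c and d, and hence both indices, can be computed directly from two partition vectors by pair counting; a minimal sketch (the function names are ours):

```python
from itertools import combinations

def pair_counts(part1, part2):
    """Frequencies of the 2 x 2 table: a = pairs together in both
    partitions, b = together in the first only, c = together in the
    second only, d = apart in both."""
    a = b = c = d = 0
    for i, j in combinations(range(len(part1)), 2):
        same1 = part1[i] == part1[j]
        same2 = part2[i] == part2[j]
        if same1 and same2:
            a += 1
        elif same1:
            b += 1
        elif same2:
            c += 1
        else:
            d += 1
    return a, b, c, d

def jaccard(part1, part2):
    a, b, c, _ = pair_counts(part1, part2)
    return a / (a + b + c)

def rand(part1, part2):
    a, b, c, d = pair_counts(part1, part2)
    return (a + d) / (a + b + c + d)
```

For identical partitions both indices equal 1; the Rand index also rewards pairs kept apart in both partitions (the d term), which is the behaviour discussed below.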

The results obtained with the Jaccard index are clear (Fig. 5). For any amount of variance
among group nuclei, constrained clustering recovers more of the original classification's
information than does unconstrained clustering.

The results obtained with the Rand index are the same, although the Rand criterion, at low
VAR values (VAR ≤ 5) and only in the unconstrained case, regularly picked as optimal those
partition levels where very few points had been clustered, all the others being treated as singletons
(one-member clusters). The Rand index could pick out these partition levels because the quantity
to be maximized involves d, the number of pairs pertaining to unlike groups in both
classifications.

These simulations lead to the conclusion that one should always use constrained
clustering, when working under the assumption that the phenomenon under study is spatially (or
temporally) autocorrelated.

VERIFYING THE ASSUMPTION OF PATCHINESS

What if one uses constrained clustering while there is no spatial structure, despite the
assumptions to that effect? Of course, one could have ascertained first that there is a patchy
spatial structure, by spatial autocorrelation analysis (Sokal and Thomson 1987). Spatial
correlograms, however, can only recognize patchiness when it is somewhat regular; they
may fail to give a significant answer if the patches are greatly variable in size. So, constrained
clustering may be needed even if spatial autocorrelation analysis has not demonstrated the
existence of regular patches. Can we use the results of the clustering itself to tell us whether the
patches obtained with constraint are real entities?

[Figure 5 appears here: panel A (JACCARD INDEX, ordinate 0.00-1.00) and panel B (RAND INDEX, ordinate 0.80-1.00), both plotted against GROUP NUCLEI VARIANCE (VAR) from 5 to 30.]

Fig. 5. Fraction of the group structure information recovered using constrained (open circles) and
unconstrained (closed circles) clustering, according to (A) the Jaccard index, and (B) the Rand
index, for groups generated with various amounts of variance among group nuclei (abscissa).

This can indeed be done. Let us compare what should happen during constrained
clustering, in the absence or in the presence of patchiness. Consider first an example where
the values to be clustered are the result of a strictly random process. In that case, the probability
that two neighbors (groups, or single objects) will be the next most similar pair is equal among
pairs of neighbors, and its value is (1/number of possible pairs) in ideal cases. It varies with
group size in the case of space-contracting (like single linkage) or space-dilating (like complete

[Figure 6 appears here: two panels plotting No. of clusters (ordinate) against Clustering steps (abscissa); the zone of decrease spans 40 steps in the top panel and 15 steps in the bottom panel.]

Fig. 6. Spatially autocorrelated data, from Figure 4 (VAR = 10), produce a longer zone of
decrease (top) than 150 random points (bottom panel). The ordinate of each graph represents the
number of groups, other than single-object clusters, that are present at the corresponding level
(abscissa).

linkage) clustering methods (Lance and Williams 1967); this point deserves further investigation.
In any case, one expects the random agglomeration mechanism to produce at first a large number
of small patches, that grow according to some random model, while near the end of the clustering
process, we can expect the quick formation of very large patches (within a few clustering steps),
before the final formation of a single group. If there is a spatially autocorrelated structure, the
beginning of the agglomeration should follow essentially the same pattern, since the points that
cluster correspond at first to random within-group variations; near the end of the agglomerative
process, the differences among groups should translate into extra steps in the larger distance
classes, contrary to the no-structure case.
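Counting, at each agglomeration step, the clusters that contain more than one member (the ordinate of Figure 6) can be sketched as follows; average linkage is used here as a stand-in for proportional-link linkage, which standard libraries do not provide:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

def group_counts(series):
    """Number of multi-member clusters after each agglomeration step
    (singletons excluded), as plotted on the ordinate of Figure 6."""
    Z = linkage(np.asarray(series, dtype=float).reshape(-1, 1),
                method="average")
    n = len(series)
    size = {i: 1 for i in range(n)}      # cluster id -> number of members
    counts = []
    for step, row in enumerate(Z):
        i, j = int(row[0]), int(row[1])
        size[n + step] = size.pop(i) + size.pop(j)
        counts.append(sum(1 for s in size.values() if s > 1))
    return counts
```

Plotting these counts against the step number (or against the merging distance) reproduces the curves whose zone of decrease is compared above.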

Actual experiments show that this is indeed what happens (Fig. 6 and 7). When the data
series is one-dimensional (circles in Fig. 7), the difference in length of the zone of decline is very
large at all values of connectedness, from 1% to 100%, used in the proportional-link linkage
agglomeration. When the series are made to form a two-dimensional grid of 5 lines and 30

[Figure 7 appears here: two panels (A, top; B, bottom), ordinate 0.0-1.0, abscissa % CONNECTEDNESS (25-100); open circles: autocorrelated series, closed circles: random numbers, for the (1 x 150) and (5 x 30) layouts.]

Fig. 7. Length of the zone of decrease, as a function of the connectedness (Co) used during
linkage agglomeration, for autocorrelated series (150 points) and for random numbers (150
points). The zone of decrease is measured (A) as a proportion of the total number of steps, or (B)
as a fraction of the range of distances where the number of groups decreases, over the total range
of distances where agglomeration took place.

columns (chosen to agree with the autocorrelated group structure that we created), the difference is
not as large, but it is still significant (sign test). In the lower panel of Figure 7, the ordinate value
0.85 seems to form a line separating the two processes; further statistical investigation of this
property is obviously needed, either by Monte Carlo methods, or by studying the theoretical
distribution of these statistics for constrained group formation.

ECOLOGICAL CLUSTERING WITH CONSTRAINT


OTHER THAN SPACE OR TIME

One step further up the scale of abstraction consists of using constrained clustering to test
the hypothesis that a variable or a set of multivariate data forms clusters that are autocorrelated in
some other space than geography or time.

An example is given by a set of 99 forest sampling stations studied by P. Drapeau (CREM,


Université de Montréal) in the "Municipalité Régionale de Comté du Haut Saint-Laurent", a few
kilometers north of the Canada - U.S.A. border in western Quebec. A relationship is sought
between vegetation composition and edaphic conditions. The hypothesis to be tested is that
vegetation is similar under related edaphic conditions. (1) Since the edaphic variables are of
mixed types (geomorphology: qualitative; stoniness, soil texture, drainage, topography:
semi-quantitative; slope: quantitative; orientation: quantitative circular), a similarity matrix among
samples was first computed using the Estabrook and Rogers (1966) coefficient, that can combine
data of these various types in a single measure of resemblance. (2) Principal coordinates were
computed from this matrix and the first two principal coordinates were taken as an approximation
of the edaphic space. (3) From the coordinates of the samples in that space, sample points were
interconnected using a Delaunay triangulation (see above). The list of edges from this
triangulation, derived only from edaphic data, provided the constraints fed into the
space-constrained clustering program. (4) A Steinhaus (Motyka 1947; or Bray and Curtis 1957)
similarity matrix was computed from another set of data consisting of the abundance of 28 species
of trees in each quadrat; edaphic data were not used in these computations. (5) Proportional-link
linkage clustering was performed with 50% connectedness from the vegetation similarity matrix,
using as constraints the list of edges obtained above from the edaphic space. As a consequence,
only sampling stations that are related in edaphic space were allowed to cluster, if the vegetation
data allowed. Two of the clustering steps are represented in Figure 8, mapped onto the first two
coordinates of the edaphic space. Botanists could then go back to the raw data and determine
what group of species corresponds to each cluster in edaphic space.
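Step (3), extracting the constraint edges from a Delaunay triangulation of the first two principal coordinates, can be sketched with scipy; the random coordinates below merely stand in for the actual edaphic ordination of the 99 stations:

```python
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(2)
coords = rng.random((99, 2))      # stand-in for principal coordinates I and II

tri = Delaunay(coords)
edges = set()
for simplex in tri.simplices:     # each simplex is a triangle (3 vertex ids)
    for k in range(3):
        i, j = int(simplex[k]), int(simplex[(k + 1) % 3])
        edges.add((min(i, j), max(i, j)))
# 'edges' is the list of station pairs allowed to merge under the constraint
```

The resulting edge list is then fed to the constrained clustering program in place of geographic links.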

[Figure 8 appears here: two maps of the stations in edaphic space, the first at similarity level S = 0.32323 (8 groups), the second at a lower similarity level (4 groups).]

Fig. 8. Two of the steps during constrained agglomerative clustering of the forest vegetation
data. Each step is represented by a map whose abscissa is principal coordinate I and ordinate is
principal coordinate II of the edaphic space. The clustering similarity level is shown on each map.
Each group of sampling stations is represented by a different letter (without order).

One should wonder first if the relationship is real between community structure and the
edaphic space that we have constructed by principal coordinate analysis; studying the length of the
zone of decrease of the number of clusters shows that the decrease occupies 0.390 of the total
number of steps, and 0.442 along the distance scale; these figures fall in the "random numbers"
zone of Figure 7, for a connectedness of 50%. So, instead of pursuing the interpretation of these
results, one should conclude that the tree community structure data do not lead to significant
clusters in the edaphic space, given the way it was created with the data and by the method
described above.

CONCLUSION

Our experience with clustering methods that impose a constraint of contiguity through
space or time is that the results obtained through a wide range of clustering methods -- linkage
clustering, from single to complete linkage -- are much more similar to one another than without
the constraint. This is because constraining the clustering process also constrains the set of
solutions, eliminating a number of solutions that are compatible with the resemblance matrix, but
that do not make much sense in view of the spatial or temporal relationships existing among the
samples under study.

From the descriptive point of view, constrained clustering is one of the few ways available
for synthetically representing multivariate data onto a map. With many ecological problems, this
type of mapping is far more interesting than separate maps of the variables forming the
multivariate data set. On the other hand, theories about the importance of dispersal routes for
individual species or for whole biotic communities could be tested by comparing the unconstrained
to the space-constrained classification of sites; many other hypotheses of contagiousness of
ecological processes through space or time could be tested in the same way.

A number of constrained clustering programs have been written and are available to other
users. This is the case at least with the present author's programs used in the examples presented
above, as well as the program of Lebart (1978, for two-dimensional constraint), whose paper
includes the program listing. De Soete et al. (1987) present algorithms for deriving constrained
classifications in a more general context than that of the present paper, and they review the
psychometric literature on the subject.

Geographic information can also be used with unconstrained clustering programs. If A is


the ecological distance matrix among objects, build a matrix B ("penalty matrix") containing
spatial information (either geographic distances, or 0 = connected and 1 = unconnected objects).

Compute C = A + w B, where w is a scalar weight. Cluster C for different values of w and pick
the result with the smallest w where all clusters are internally contiguous. This method can also be
used to obtain constrained ordinations.
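A sketch of this weighting scheme (the function name is ours, and average linkage stands in for whichever clustering method is preferred):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def penalty_clustering(eco_dist, geo_dist, w, n_clusters):
    """Cluster C = A + w*B, where A holds ecological distances and B the
    spatial penalties, and return the group labels."""
    C = eco_dist + w * geo_dist
    Z = linkage(squareform(C, checks=False), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")

# Toy example: two tight ecological pairs, no spatial penalty (w = 0)
eco = np.array([[0.,  1., 10., 10.],
                [1.,  0., 10., 10.],
                [10., 10., 0.,  1.],
                [10., 10., 1.,  0.]])
labels = penalty_clustering(eco, np.zeros((4, 4)), w=0.0, n_clusters=2)
```

In practice one would rerun this for increasing values of w and keep the smallest w whose clusters are all internally contiguous, as prescribed above.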

In the future, constrained clustering programs, if they are agglomerative, should be made
to include some measure of the information content of the various clustering levels, and also
perhaps a measure of "patchiness" such as the one developed in one of the previous sections.
Since clustering with constraint includes, in the data analysis process, some a priori knowledge
that is pertinent to many of the theories the ecologists are dealing with, it may be viewed by these
same ecologists as an interesting method both for descriptive purposes and for hypothesis
testing.

REFERENCES

Bell, M. A., and P. Legendre. 1987. Multicharacter chronological clustering in a sequence of


fossil sticklebacks. Syst. Zool. 36: (in press).
Bray, J. R., and J. T. Curtis. 1957. An ordination of the upland forest communities of southern
Wisconsin. Ecol. Monogr. 27: 325-349.
Cliff, A. D., and J. K. Ord. 1981. Spatial processes: models and applications. Pion Limited,
London. 266 p.
De Soete, G., J.D. Carroll, and W.S. DeSarbo. 1987. Least squares algorithms for constructing
constrained ultrametric and additive tree representations of symmetric proximity data. J.
Class. (in press).
Estabrook, G. F., and D. J. Rogers. 1966. A general method of taxonomic description for a
computed similarity measure. BioScience 16: 789-793.
Fisher, W. D. 1958. On grouping for maximum homogeneity. J. Amer. Stat. Ass. 53: 789-798.
Gabriel, K. R., and R. R. Sokal. 1969. A new statistical approach to geographic variation
analysis. Syst. Zool. 18: 259-278.
Galzin, R., and P. Legendre. 1987. The fish communities of a coral reef transect. Pac. Sci. (in
press).
Gordon, A. D. 1973. Classification in the presence of constraints. Biometrics 29: 821-827.
Gordon, A. D., and H. J. B. Birks. 1972. Numerical methods in Quaternary palaeoecology. I.
Zonation of pollen diagrams. New Phytol. 71: 961-979.
Gordon, A. D., and H. J. B. Birks. 1974. Numerical methods in Quaternary palaeoecology. II.
Comparison of pollen diagrams. New Phytol. 73: 221-249.
Green, P. J., and R. Sibson. 1978. Computing Dirichlet tessellations in the plane. Computer J.
21: 168-173.
Hawkins, D. M., and D. F. Merriam. 1973. Optimal zonation of digitized sequential data. J. Int.
Assoc. Math. Geology 5: 389-395.
Hawkins, D. M., and D. F. Merriam. 1974. Zonation of multivariate sequences of digitized
geologic data. J. Int. Assoc. Math. Geology 6: 263-269.
Howe, S. E. 1979. Estimating regions and clustering spatial data: analysis and implementation of
methods using Voronoi diagrams. Ph. D. Thesis, Department of Mathematics, Brown
University.
Ibanez, F. 1984. Sur la segmentation des séries chronologiques planctoniques multivariables.
Océanol. Acta 7: 481-491.
Lance, G. N., and W. T. Williams. 1967. A general theory of classificatory sorting strategies. I.
Hierarchical systems. Computer J. 9: 373-380.

Lebart, L. 1978. Programme d'agrégation avec contraintes (C.A.H. contiguïté). Cah. Anal.
Donnees 3: 275-287.
Lefkovitch, L. P. 1987. Species associations and conditional clustering: clustering with or
without pairwise resemblances. This volume.
Legendre, P., B. Baleux, and M. Troussellier. 1984. Dynamics of pollution-indicator and
heterotrophic bacteria in sewage treatment lagoons. Appl. Environ. Microbiol. 48: 586-593.
Legendre, P., S. Dallot, and L. Legendre. 1985. Succession of species within a community:
chronological clustering, with applications to marine and freshwater zooplankton. Am. Nat.
125: 257-288.
Legendre, P., and V. Legendre. 1984. Postglacial dispersal of freshwater fishes in the Quebec
peninsula. Can. J. Fish. Aquat. Sci. 41: 1781-1802.
Mantel, N. 1967. The detection of disease clustering and a generalized regression approach.
Cancer Res. 27: 209-220.
Matula, D. W., and R. R. Sokal. 1980. Properties of Gabriel graphs relevant to geographic
variation research and the clustering of points in the plane. Geogr. Anal. 12: 205-222.
Milligan, G. W. 1983. Characteristics of four external criterion measures, p. 167-173. In J.
Felsenstein [ed.] Numerical taxonomy. NATO Advanced Study Institute Series G
(Ecological Sciences), No. 1. Springer-Verlag, Berlin.
Monestiez, P. 1978. Méthodes de classification automatique sous contraintes spatiales, p.
367-379. In J. M. Legay and R. Tomassone [ed.] Biométrie et écologie. Société française de
Biométrie, Paris.
Motyka, J. 1947. O zadaniach i metodach badan geobotanicznych. Sur les buts et les méthodes
des recherches géobotaniques. Ann. Univ. Mariae Curie-Sklodowska Sect. C, Suppl. I. viii
+ 168 p.
Okabe, A. 1981. Statistical analysis of the pattern similarity between 2 sets of regional clusters.
Environment and Planning A 13: 547-562.
Openshaw, S. 1974. A regionalisation program for large data sets. Computer Appl. 3-4:
136-160.
Rajski, C. 1961. Entropy and metric space, p. 44-45. In C. Cherry [ed.] Information theory.
Butterworths, London.
Ray, D. M., and B. J. L. Berry. 1966. Multivariate socioeconomic regionalization: a pilot study
in central Canada, p. 75-130. In S. Ostry and T. Rymes [ed.] Papers on regional statistical
studies. Univ. of Toronto Press, Toronto.
Sokal, R. R., N. L. Oden, and J. S. F. Barker. 1987. Spatial structure in Drosophila buzzatii
populations: simple and directional spatial autocorrelation. Am. Nat. 129: 122-142.
Sokal, R. R., and J. D. Thomson. 1987. Applications of spatial autocorrelation in ecology. This
volume.
Ward, J. H. Jr. 1963. Hierarchical grouping to optimize an objective function. J. Amer. Stat.
Assoc. 58: 236-244.
Wartenberg, D. E. Regional analysis: describing multivariate data distributions using geographic
information. Manuscript (cited with permission of the author).
Webster, R. 1973. Automatic soil-boundary location from transect data. J. Int. Assoc. Math.
Geology 5: 27-37.
SPECIES ASSOCIATIONS AND CONDITIONAL CLUSTERING:
CLUSTERING WITH OR WITHOUT PAIRWISE RESEMBLANCES

L.P. Lefkovitch
Engineering and Statistical Research Centre
Agriculture Canada
Ottawa, Ontario, Canada K1A 0C6

Abstract - Traditional procedures for clustering objects consist of two


steps: measuring pairwise resemblance based on the attributes, and a
clustering algorithm. The use of pairwise resemblances can be avoided; a
set of objects can be represented as a set of lists of attribute states; an
application of the Laplace indifference principle then allows an estimate to
be made of the probability of each list as representative of an association
of objects. By use of set-covering procedures, the object associations
having maximum joint probability are found. The procedure is generalized to
multistate unordered and ordered attributes, to frequencies, and to directly
obtained relational data.

I - INTRODUCTION

One of the principal objectives of clustering in biology is to reveal


(this is the operative verb; it is not to form) hitherto unknown natural
groups; this impossible task in practice is replaced by the alternative of
recognising those groups of objects which are more alike each other than
they are to the members of other groups, and then to use these groups as
hypotheses for further investigation. A common procedure is first to obtain
a measure of pairwise affinity (similarity, dissimilarity, etc.) and then to
use one or several favoured clustering algorithms to obtain a modest number
of groups. Most algorithms for this purpose are such that a partition into
at least two subgroups is necessarily formed, but permitting groups to
overlap may remedy some of the consequences of approximations arising from
the assumptions leading to the clustering procedure (Lefkovitch 1982).
These approximations arise from the fact that the local clustering criterion
and global objective (if there is one) are usually chosen for mathematical
convenience (e.g. minimum distance, maximum between group variance) rather
than being derived from biological considerations.
This paper, or at least that part of it not using similarity
coefficients, attempts to remain close to the biological problem; in
particular, the objective is to recognize which of a set of lists of species

NATO ASI Series, Vol. G14


Developments in Numerical Ecology
Edited by P. and L. Legendre
© Springer-Verlag Berlin Heidelberg 1987

found in various localities represent recurrent species associations. No


statistical distributions are assumed, and the only principle involved
(beyond that of the logic of implication) is that of maximum entropy, which
is applied to the incidence table derived from the lists of species. This
objective is shown to be equivalent to a special case of mathematical
programming. There are two main components in a program, namely, the
constraints and the objective function. This paper shows that both can be
obtained from the logical interrelationships among the species and lists of
the incidence table, and avoids one of several soft areas in clustering, in
particular, the definition and estimation of similarity. It also avoids
problems associated with the first phase of conditional clustering
(Lefkovitch 1982), which is the generation of the incidence table from
similarities. The second phase, which is the measurement of the relative
importance of each subset of the objects, and the third, obtaining the
solution by use of these values, together with the incidence table in a
least-cost set-covering mathematical program, are unchanged.
This paper is organized in seven sections. This section provides the
context; section II discusses the incidence table, including methods for
binary and non-binary data, section III considers the measurement of the
relative importance of each subset, section IV describes the objective
function and some solution procedures for least-cost set covering, section V
considers a possible way to formulate and test hypotheses for the loss of
information as a result of clustering, and for comparing different
solutions, section VI gives some numerical examples, while section VII
places the procedure in a more general setting. The material in sections
II (a), the part of II (b) dealing with the relative neighbourhood graph, V
and some of VII is new, while the remaining sections are drawn from
previously published material. Readers of this paper whose interests are
primarily in ecology may prefer to omit II(b), II(c), III(b), IV(c) and V on
a first reading. Upper case letters refer to matrices or sets, lower case
to column vectors or scalars; the context should avoid ambiguity.

II - THE CONSTRAINTS

Suppose a set of n objects is described by the states shown by m binary


(presence/absence) data; these data can be assembled into a n x m incidence
table in which each object is represented by a distinct row, each attribute

by a distinct column, presence by unity and absence by zero (Table 1a). At


the end of the clustering, the groups are represented either as a table
resembling the starting point, since the groupings of the objects also form a
binary incidence table, or as a dendrogram, from which such a table can be
derived. The new table, not surprisingly, can often be recognised as being a
subset of the original table, but the columns now represent clusters. In the
present ecological application, an attribute is equated with a site (location,
quadrat etc.), an object is one of the taxa under consideration, and the final
columns represent the conjectured species associations. For convenience, the
arguments will be presented in terms of the clustering of objects (i.e.
species), but they apply equally to obtaining associations among attributes
(i.e. sites).
II(a) Without similarity coefficients. If A = {aij}, i = 1…n, j = 1…m, is
the incidence matrix corresponding to attribute presence (see Table 1a), where
n is the number of objects and m is the number of attributes, and if xj = 1
implies the choice of column j of A as an object association, and xj = 0
that it is not chosen, then the fact that each object must be located in at
least one of the associations is the proof of the following lemma.

Lemma 1: To be a covering of the n objects, the vector x must satisfy the
constraint Ax ≥ 1, where 1 is a vector of n elements all equal to unity.

The constraints do not require the solution to be a partition, but neither do


they disqualify one.
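Lemma 1 translates directly into code; the sketch below checks the covering constraint Ax ≥ 1 and finds a smallest cover by exhaustive search (feasible only for small tables; the incidence matrix shown is invented for illustration, and the function names are ours):

```python
import numpy as np
from itertools import combinations

def is_cover(A, chosen):
    """Ax >= 1: every object (row) appears in at least one chosen column."""
    return bool(np.all(A[:, list(chosen)].sum(axis=1) >= 1))

def minimum_cover(A):
    """Smallest set of columns satisfying Ax >= 1, by exhaustive search."""
    for size in range(1, A.shape[1] + 1):
        for chosen in combinations(range(A.shape[1]), size):
            if is_cover(A, chosen):
                return chosen
    return None

# Invented incidence table: 4 objects (rows) x 3 candidate associations
A = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 0],
              [0, 0, 1]])
```

In the actual procedure the search is weighted by the subset probabilities of section III and solved as a least-cost set-covering program rather than by enumeration.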
Any clustering procedure depends on the choice of empirical data, and so
some preliminary thought and decisions are necessary prior to any attempt at
grouping. For example, a species which is found in every (almost every?) site
gives no information on the associations, and can therefore be eliminated from
further arithmetic (compare this with the circumstances in numerical taxonomy
of an invariant attribute). Similarly, a site which contains all species
gives no information on the species associations, and can therefore also be
eliminated (perhaps it is too heterogeneous a site, and should be
subdivided). Some consideration should be given to the information about
associations given by those species which occur very infrequently (e.g. one
individual in one site); there is a strong case for eliminating such species.
Similarly, those sites which contain just one species can be put on one side.
These decisions, however, should not be made for computational reasons but
based on the experience and intuition of the ecologist or taxonomist.

II(b) Non-binary data. In addition to presence/absence attributes


(dichotomies sensu Gower 1971; 1-state attributes sensu Lefkovitch 1976),
there are four other main classes which need to be considered in forming A.
One of several possible coding schemes is now described; others are considered
by de Leeuw (this volume).
1. Multistate unordered attributes (Gower 1971); unordered s-state
attributes (Lefkovitch 1976).
Let the number of distinct states for attribute j be sj; then sj columns
are introduced into A, one for each state, defined as

    a(i, r+k) = 1 if object i shows state k for attribute j, and 0 otherwise,

where r indicates the last column generated by attribute j - 1.
Clearly, for sj = 1, this is identical with presence/absence data; if
sj = 2, the attribute is called an alternative by Gower (1971).
2. Multistate ordered discrete attributes (Gower 1971); ordered s-state
attributes (Lefkovitch 1976), s < ∞.

Typically, let sj denote the number of distinct states shown by an
attribute and let them be numbered from 1…sj; these data are assumed not
to be a categorized grouping of a continuous measurement.
(a) If the direction of the ordering is clear, i.e. there is a well defined
lower bound, and if an individual that shows state k > 1 also shows
states 1…k-1 (e.g. if it has 10 hairs, it has 1, 2, … 9 hairs), then sj
columns are formed for which

    a(i, r+t) = 1 for t = 1…k, and a(i, r+t) = 0 for t = k+1…sj,

if object i shows state k for attribute j.
(b) If the direction of the ordering is not clear, i.e. it makes equal sense
to number the states in either direction, then 2sj - 1 columns are generated
in which the first sj columns are as in 2a, while the remaining sj are the
complements of the first sj.
If any pairs of columns are duplicates for all n objects for attribute j,
then only one should be retained (a duplicate column would arise, for example,
if one or more states were not represented in the system).
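The two coding schemes (unordered states in 1, and the cumulative coding of case 2a) can be sketched as follows; the function names are ours, and states are numbered 1…s as in the text:

```python
import numpy as np

def unordered_columns(states, s):
    """s columns for an unordered s-state attribute:
    a(i, r+k) = 1 iff object i shows state k."""
    A = np.zeros((len(states), s), dtype=int)
    for i, k in enumerate(states):
        A[i, k - 1] = 1
    return A

def ordered_columns(states, s):
    """Cumulative coding for an ordered attribute with a clear lower
    bound (case 2a): state k sets columns 1..k to 1."""
    A = np.zeros((len(states), s), dtype=int)
    for i, k in enumerate(states):
        A[i, :k] = 1
    return A
```

For case 2b one would append the complements of these columns and drop any duplicates, as described above.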
3. Continuous attributes (Gower 1971); ordered s-state attributes, s = ∞
(Lefkovitch 1976).
There does not seem to be a natural way to convert these into the binary form
required for A, and so the following is proposed:
for all n objects, obtain a quantile plot, and if there is evidence of a
step function, categorize the data for each inter-step class; if there is
no evidence of steps, i.e. the data seem not to exhibit polymodality,
there is good reason to exclude this attribute from consideration.
Assume that attribute j has been so categorized; then the procedures in 2a or
2b can be used, as appropriate.
4. Frequency data. In ecology, empirical data sometimes consist of the
proportion of either some fixed number of samples or of the total flora or
fauna for each of n species at each of m sites (Table 2a). One possibility
for such data is to choose some threshold value, e.g. 0.5, and define the A
matrix accordingly. This arbitrary choice can be avoided, however, by a simple
extension of the binary data model as follows. Define the matrix B to consist
of the probabilities of occurrence of each species in each site, and let these
be estimated by the proportions. It is clear that A can be regarded as a
special case of B in which the probabilities are either 0 or 1.
II(c) With similarity coefficients. Relational data are often obtained in
psychometric contexts, in antibody/antigen studies, in crossing experiments,
and are often estimated from attribute data by use of some measure of
similarity (see Gower and Legendre 1986, for a recent review). Without loss
of generality, it is assumed that the pairwise relationships have been
converted to dissimilarities (which need not be a metric). The objective of
this section is to summarize the procedure given in Lefkovitch (1982) to form
an A matrix, which is essentially the first phase of conditional clustering.
Its motivation is the question: if a particular subset of objects is
postulated, what other objects should be included? The requirement is that the
answer should satisfy two conditions. First, if the postulated objects
consist only of the pair with dissimilarity zero, then no others should be
included unless they also have dissimilarity zero with the pair and each
other; and, second, if the postulated objects include the pair of maximum
dissimilarity, then all objects should be included. If the maximum
dissimilarity in a subset is equated to the interval between the lower and
upper characteristic values of extreme value theory, the following procedure
generates a family of subsets of interest.
Let E be the adjacency matrix of the relative neighbourhood graph
(Toussaint 1980) of the objects based on the dissimilarities, D = {dij},
i.e.

    e(i, j) = 1 if dij ≤ max(dik, djk) for all k ≠ i, j (with i ≠ j), and 0 otherwise.
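The adjacency matrix E can be computed directly from this definition; a brute-force sketch (the function name is ours):

```python
import numpy as np

def rng_adjacency(D):
    """Adjacency matrix of the relative neighbourhood graph:
    e(i,j) = 1 iff d(i,j) <= max(d(i,k), d(j,k)) for every other k."""
    n = D.shape[0]
    E = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            if all(D[i, j] <= max(D[i, k], D[j, k])
                   for k in range(n) if k != i and k != j):
                E[i, j] = E[j, i] = 1
    return E

# Three collinear points: the relative neighbourhood graph is the path 0-1-2
pts = np.array([0.0, 1.0, 3.0])
D = np.abs(pts[:, None] - pts[None, :])
E = rng_adjacency(D)
```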

Since each of the subsets is generated in stages, let St be a subset of the
objects at stage t, δt the maximum dissimilarity among the members of St,
and d̄k the average dissimilarity between object k and the members of St.
The following four steps summarize the procedure:
Step 1 (initialization): A = ∅; arrange the edges of the relative neighbour-
hood graph in ascending order of length;
Step 2 (next edge): let i and j be the determining vertices of the next edge
in the relative neighbourhood graph; set t = 0, S0 = {i, j}, S1 = ∅, δ0 = dij;
Step 3 (extension): for k = 1…n, if d̄k ≤ δt then St+1 = St+1 ∪ {k};
Step 4 (testing): if St+1 ≠ St, then set t = t + 1, δt = max(dij | i, j ∈ St),
St+1 = ∅, and go to step 3;
otherwise, if St ∉ A, include St in A and
go to step 2.
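The four steps can be sketched as follows (a slight simplification: the growing loop tests only non-members, whereas step 3 re-tests everything; the function name and the example distances are invented):

```python
import numpy as np

def generate_subsets(D, edges):
    """Grow a candidate subset from each relative-neighbourhood-graph
    edge, repeatedly adding every object whose average dissimilarity to
    the current members does not exceed the current within-subset
    maximum, until the subset is stable."""
    order = sorted(edges, key=lambda e: D[e])            # step 1
    subsets = []
    for i, j in order:                                   # step 2
        S = {i, j}
        delta = D[i, j]
        while True:
            members = sorted(S)                          # step 3
            grown = S | {k for k in range(len(D))
                         if k not in S
                         and np.mean([D[k, m] for m in members]) <= delta}
            if grown == S:                               # step 4
                break
            S = grown
            idx = sorted(S)
            delta = max(D[a, b] for a in idx for b in idx if a < b)
        if S not in subsets:
            subsets.append(S)
    return subsets

# Two tight pairs far apart; the RNG edges here are (0,1), (2,3) and (1,2)
pts = np.array([0.0, 0.1, 10.0, 10.1])
D = np.abs(pts[:, None] - pts[None, :])
subsets = generate_subsets(D, [(0, 1), (2, 3), (1, 2)])
```

The short edges yield the two tight pairs as candidate subsets, while the long bridging edge grows into the full set, illustrating how a family of nested candidates is produced.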
The heuristics described in Lefkovitch (1982) to restrict the number of
initial subsets which need be considered without changing the optimal
covering solution can be shown to be unnecessary, since they are dominated by
the pairs adjacent on the relative neighbourhood graph. This graph has O(n)
edges (a sometimes achieved sharp lower bound is n - 1, since the minimum
spanning tree is a subgraph of E; an upper bound has empirically been found to
be less than 3.5n in random graphs (Lefkovitch 1984) and appreciably less in
those with obvious groupings). The generation of subsets from each of the
edges is O(n²), generating the graph requires arithmetic of O(n³), and so
the subset generation phase is O(n³).
The number of initial pairs may be further constrained if there are other
known relationships among the objects. For example, given the geographical
distribution of the objects, the initial pairs can be confined to those which
are adjacent on the (geographical) Gabriel graph or Dirichlet tessellation,
with the condition that the candidates for inclusion must form a connected
subgraph with the current members (Lefkovitch 1980), even though the primary
decisions are based on the dissimilarities. In the special case that the
objects form a linear sequence (e.g. a pollen core, a line transect), the
number of initial pairs is precisely n - 1. Some other classes of constraints
are considered by Legendre (this volume).
III - THE RELATIVE IMPORTANCE OF THE SUBSETS

III (a) The information available is contained within A; this will now be
exploited with as few assumptions as possible. A represents a set of
predicates, each of which is either true or false, about each object (e.g.
object i shows presence for attribute j), about each attribute (e.g. the
jth column shows presence for object i), and compound predicates about the
objects, the attributes and the object/attribute combinations. All these
predicates constitute the evidence, and it is propositions of the form "the
objects showing presence in attribute j represent an association of interest"
which are being considered to determine a minimal number of recurrent
associations; thus a measure of the extent by which the evidence supports
these assertions is sought. If the sole evidence were to be that there are m
attributes (i.e. without knowledge of the elements of A), this necessarily
leads to a statement that the evidence in support of the jth attribute being
in the optimal solution is equal to that of any other. After evaluation of
the predicates in A, the evidence may suggest otherwise; for example, if
column j' consists entirely of unities, then the evidence is overwhelming that
the whole set of objects can and do show the same attribute state, and that
perhaps the most reasonable course is to define just one association, so that
the evidence in favour of the remaining attributes drops to zero. This
extreme example shows that the evaluation of the evidence in A may lead to
unequal degrees of support for the attributes as potential candidates for
definition of an association. If the degree of support is assigned a
numerical non-negative value, for which zero indicates certainty that the
attribute does not participate in the optimal object solution, and if complete
support is assigned a value of unity, then these (posterior) degrees of
support have the basic formal properties of a finitely additive probability,
and are logical probabilities in the Carnap sense (Fine 1973). Informally,
therefore, such a probability represents the degree of support for a
hypothesis (e.g. the objects showing presence for attribute j are an
association in the optimal solution) given the evidence in the set of
predicates explicit and implicit in A.
With the interpretation of probability just given

Theorem 2: the m-element vector, p, of probabilities of participation in an
optimal solution is given by the principal column eigenvector of AᵀA*,
where A* = {a*_ij} = {1 - a_ij}.


The proof of this theorem, given informally by Lefkovitch (1982) and more
formally by Lefkovitch (1985), depends on two components: first, that the set
of all coverings of the objects is a sigma-algebra on the columns of A, and,
second, the equivalence in information of the complementary dual problem
(that of set representation in A*), namely to determine the relative
importance, q_i, of object i as an indicator of which subsets are in the
optimal covering. (In determining the probabilities, rather than forming
AᵀA*, a two-pass iteration using sparse matrix procedures is to be preferred,
especially since the elements of A and A*, being either 0 or 1, imply that
additions and subtractions can replace multiplications (see Appendix 1).)
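The two-pass iteration can be illustrated with a dense numpy sketch (the paper's Appendix 1 gives the sparse Fortran version; this fragment and its example matrix are mine, not the author's): each cycle computes q = A*p = 1 − Ap and then p ∝ Aᵀq, i.e. power iteration on AᵀA*.

```python
import numpy as np

def covering_probabilities(A, tol=1e-10):
    """Principal column eigenvector p of A^T A* (A* = 1 - A), plus the
    dual representation probabilities q, by the two-pass iteration."""
    n, m = A.shape
    p = np.full(m, 1.0 / m)           # principle of indifference
    while True:
        q = 1.0 - A @ p               # pass 1: q = A* p, since sum(p) = 1
        y = A.T @ q                   # pass 2: y = A^T q
        p_new = y / y.sum()
        if np.abs(p_new - p).sum() < tol:
            return p_new, q / q.sum()
        p = p_new
```

For A = [[1,1,0],[1,0,1],[0,1,1],[1,1,1]] the three attributes are symmetric, so p = (1/3, 1/3, 1/3), and the object present in every attribute receives representation probability zero.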
III (b) It is rare that the states of all attributes are known for all
objects, so that it is not always possible to specify that a_ij = 1 or that
a_ij = 0, because data are missing. In these circumstances, there are
potential difficulties in obtaining the probabilities and the constraints on
the covering solution. While it is possible to exclude objects or attributes
to obtain a partial solution, the following proposal omits reference to the
missing elements in obtaining both p and q. Let K be the number of elements
of A equal to unity or which are missing values, and let IL(L), L = 1...K be
their row indices and JL(L), L = 1...K their column indices, where JL(L) is
positive if a_ij is unity, and is negative if it is missing. In the Fortran
subroutine given in Appendix 1, it is apparent that the missing elements are
omitted in both passes of the iterative procedure, but that excluding
these elements from IL and JL would equate missing values with absence, which
is clearly incorrect.
For frequency data (see above), the arguments leading to finding p from B
are identical with those of A, and lead to the following extension of the
theorem in Lefkovitch (1985).

Theorem 3: the probabilities of participation in an optimal set covering are
given by the Perron-Frobenius column eigenvector of BᵀB*, where
B* = {1 - b_ij}.

The constraint matrix for the least-cost set covering is then given by

    a_ij = 1, b_ij > t
    a_ij = 0, b_ij ≤ t,

where t ≥ 0 is a threshold value.
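The thresholding step is a one-liner; this fragment (illustrative, with a made-up B) returns the 0/1 constraint array:

```python
import numpy as np

def threshold_incidence(B, t=0.0):
    """Constraint matrix for the least-cost covering: a_ij = 1 iff b_ij > t."""
    return (np.asarray(B) > t).astype(int)
```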


IV - THE OBJECTIVE FUNCTION AND SOLUTION

IV (a) Having obtained the probabilities, all that remains is to use them to
make a final choice of attributes, and to interpret the solution obtained.
Any subset of the columns of A can be indicated by the binary vector x, and
evaluated as a conjectured covering. If x fails to satisfy the constraint
Ax ≥ 1, it is immediately disqualified by lemma 1; if it satisfies the
constraint, the joint probability of the chosen subsets given the hypothesis
represented by x is clearly ∏_j p_j^{x_j}. Clearly, x chosen to satisfy the
constraints and to maximize the joint probability is an optimal solution.
Formally, this problem is equivalent to least-cost set-covering; if c_j is
the "cost" of including subset j, which here is defined as -log p_j, the
optimal choice is given by the vector x for which

    cᵀx, subject to Ax ≥ 1, x_j ∈ {0,1},

is a minimum. It is not difficult to show that the optimal solution is
irredundant; it is less obvious, however, that it is also a minimum covering.
The non-empty columns of A(diag x) give the associations.
Let x now represent the optimal covering solution; every element of the
n-element vector y = Ax is a strictly positive integer giving the number of
associations to which each species belongs. It follows that those objects for
which y_i = 1 belong to one and only one association, and can be regarded as
characterising it, while those objects for which y_i > 1 are more
ubiquitous. It does not necessarily follow that the characteristic objects
are always present together, that they never occur together with the
characteristic objects of another association, or that the objects in common
are always found.
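For very small m the optimal choice can be made by exhaustion; the following Python sketch, with an invented example, minimises Σ_j x_j(−log p_j) subject to Ax ≥ 1 and also returns y = Ax:

```python
import numpy as np
from itertools import product

def exact_cover(A, p):
    """Exhaustive least-cost set covering (feasible only for small m)."""
    n, m = A.shape
    c = -np.log(p)                        # cost of including subset j
    best_x, best_cost = None, np.inf
    for bits in product([0, 1], repeat=m):
        x = np.array(bits)
        if (A @ x >= 1).all():            # lemma 1: x must be a covering
            cost = c @ x
            if cost < best_cost:
                best_x, best_cost = x, cost
    y = A @ best_x                        # number of associations per object
    return best_x, y
```

Objects with y_i = 1 are then the characteristic ones in the sense above.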
IV(b) Linear least-cost set-covering problems have worst-case arithmetic of
O(exp(m)) for an exact solution. Fortunately, very small problems can be
solved by exhaustion, and others can often be simplified. One obvious
simplification arises from the fact that duplicate subsets in A will have
identical probabilities, and so only one of these need be considered. These
special circumstances are of lesser interest, however, than the use of a set
of rules which usually reduce the size of the problem and often provide the
complete solution. These reduction rules are based on some simple
propositions derived from A:
(1) Consider distinct objects, i and j; if object i belongs only in those
subsets to which object j belongs (object j may belong to others), any subset
which contains object i also contains object j. This allows row j to be
deleted from A.
(2) If a row of A contains precisely a single unity, its column is mandatory
in the covering. The corresponding element of x is set to unity, and its row
and column deleted from A.
(3) If a column of A is emptied as a result of these rules, it is deleted,
and the corresponding element of x set to zero.
(4) If column k of A is a subset of one or several others but has a greater
cost, x_k is set to zero and the column deleted. Any row emptied by this
rule, combined with the others, is deleted.
These four rules, which are repeated in any sequence (the application of
one rule may permit others to become available) until no further reductions
are possible, form a Church-Rosser reduction system. The optimal solution to
the reduced problem, coupled with the unit elements set during the reduction
process, is an optimal solution to the original problem (Garfinkel and
Nemhauser 1972). Table 1a, which has 10 species in 25 sites, is reduced by
rules 1-3 to 4 representative species in 7 sites, given in Table 1b, and with
rule 4 to the two subsets in Table 1c.
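The four rules can be sketched as follows (an illustration, not the author's code; when rule 2 fires, this version deletes every row the mandatory column covers, a standard strengthening that keeps the reduction sound):

```python
import numpy as np

def reduce_cover(A, cost):
    """Apply reduction rules 1-4 until no further change; returns the
    surviving row and column indices and the columns forced to unity."""
    rows, cols = set(range(A.shape[0])), set(range(A.shape[1]))
    forced, changed = set(), True
    while changed:
        changed = False
        # rule 1: if support(i) is a subset of support(j), delete row j
        for i in list(rows):
            for j in list(rows):
                if i != j and i in rows and j in rows and \
                        all(A[j, c] for c in cols if A[i, c]):
                    rows.discard(j); changed = True
        # rule 2: a row with a single live unity makes its column mandatory
        for i in list(rows):
            if i not in rows:
                continue
            live = [c for c in cols if A[i, c]]
            if len(live) == 1:
                k = live[0]
                forced.add(k)
                rows -= {r for r in rows if A[r, k]}
                cols.discard(k); changed = True
        # rule 3: delete emptied columns
        for c in list(cols):
            if not any(A[r, c] for r in rows):
                cols.discard(c); changed = True
        # rule 4: delete a column dominated by a cheaper or equal-cost one
        for k in list(cols):
            for l in list(cols):
                if k != l and cost[k] >= cost[l] and \
                        all(A[r, l] for r in rows if A[r, k]):
                    cols.discard(k); changed = True; break
    return rows, cols, forced
```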
IV(c) Algorithms. If A is emptied by the reductions, the optimal solution is
given by the elements of x which are unity. For those problems which remain
after reductions, there are three possibilities: to use an exact procedure
(for the present class of problems, a cutting plane algorithm, coupled with a
linear programming relaxation of the dual program, appears not to require
excessive amounts of computer time); to use a heuristic procedure (that
described by Chvátal (1979) often gives the optimal solution and requires
O(nm) arithmetic); or to use the simulated annealing algorithm (Lundy 1985;
Fortran code is given in Appendix 2), which obtains the optimal solution with
probability unity. It is doubtful if the quality of the empirical data is
such as to necessitate heroic efforts to find the optimal solution.
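The greedy heuristic is short; this sketch (mine, not from the paper) repeatedly takes the column with the smallest cost per newly covered row, in the sense of Chvátal (1979):

```python
import numpy as np

def greedy_cover(A, cost):
    """Greedy set covering: smallest cost per newly covered row first."""
    n, m = A.shape
    uncovered, chosen = set(range(n)), []
    while uncovered:
        best, best_ratio = None, np.inf
        for j in range(m):
            gain = sum(1 for i in uncovered if A[i, j])
            if gain and cost[j] / gain < best_ratio:
                best, best_ratio = j, cost[j] / gain
        chosen.append(best)
        uncovered -= {i for i in uncovered if A[i, best]}
    return chosen
```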
To obtain the optimal set covering if there are randomly missing data,
then with the probabilities obtained as described above, two least-cost set
covering problems can be solved: first, replace missing values by unity in the
constraints, and second, replace them by zero. Since the missing elements
have played no role in obtaining the probabilities, the two solutions will be
very similar, and have proved to be identical in all cases considered. A more
complicated procedure assumes that the indifference principle is true and
replaces the missing data by a value of 0.5; thus the array A is such that
a_ij ∈ {0, ½, 1}. This leads to an integer programming problem which can be
summarised as

    min( -Σ_j x_j log p_j | Ax > 0, x_j ∈ {0,1} )

and requires a different solution procedure. Bounds on the solution to this
problem are given by the solutions to the first two possibilities, and will
indicate whether an exact solution is worth seeking.

V - HYPOTHESIS TESTING

This section is tentative, and included primarily to show that there is a


basis for examination of hypotheses in the present context; a full theory is
yet to be completed. In the process of going from the original A to the
subsets in the optimal covering, it is obvious that information has been
discarded. The question of interest is to determine if the amount eliminated
is large; this leads to a null hypothesis that there has been no loss. A
second class of hypotheses considers different solutions to the problem e.g. a
comparison between the covering which maximizes the joint probability and that
which maximizes the information; which is to be preferred?
The joint probability, given that the solution is a partition, is the
familiar

    L_1 = n! ∏_i p_i^{n_i} / n_i!,

where n is the number of objects, n_i the number in each subset, and
Σ_i p_i = 1. If the subsets form a covering, it is clear that Σ_i n_i > n
and so L_1 is not applicable. Suppose (for the moment) that the intersection
of three (and more) distinct subsets is empty, let n_ij denote |I ∩ J|, where
I, J denote the objects in two distinct subsets, and let p_ij = p_i p_j. The
problem, therefore, is to adjust L_1 for these intersections. In particular,
the numerator should be reduced by the size of the weighted probability
against intersection, i.e. by (1 - p_ij)^{n_ij}, and the denominator reduced
by n_ij!.


This gives

    L_2 = n! ∏_i (p_i^{n_i} / n_i!) ∏_{i,j} n_ij! / (1 - p_ij)^{n_ij}.

The generalization to the intersection of 3 ... m subsets is immediate, and
with an obvious extension of the definitions of n_ij and p_ij, the joint
probability is

    L_m = n! ∏_i (p_i^{n_i} / n_i!) ∏ n_{ij...m}! / (1 - p_{ij...m})^{n_{ij...m}}.

For m of any reasonable size, this expression is rather formidable to
compute, but the following observations make it less difficult than it may
seem at first.

1. Many multiple intersections are empty, and so n_(t)! / (1 - p_(t))^{n_(t)} = 1.
2. If for t subscripts all intersections are empty, then so will be those for
t+1, t+2, ..., m.
3. A good approximation can be made to L_m by using L_2 (or perhaps L_3).
This follows from the following lemma.

Lemma: the correction terms for the highest-order intersections tend to unity.
Proof: (a) as t → m, then n_(t) → 0, i.e. n_(t)! → 1;
(b) as t → m, then p_(t) → 0, i.e. (1 - p_(t))^{n_(t)} → 1;
(c) combining (a) and (b) completes the proof.

Computationally, the Stirling approximation to the factorials simplifies the


calculations of Lt. The only serious computation, therefore, is generating
the intersections of all subsets while ensuring that there are no repeats.
The relative likelihood of two solutions can be obtained as the
difference in the natural logarithms of the joint probabilities, and in a
hierarchical context, may be used to examine the various hypotheses.
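As a worked illustration (mine, under the pairwise-overlap assumption above), log L_2 can be accumulated with log-factorials; subsets are sets of object labels and p their probabilities:

```python
from math import lgamma, log

def log_L2(subsets, p):
    """log of the covering joint probability L_2; lgamma(k + 1) = log k!."""
    n = len(set().union(*subsets))
    ll = lgamma(n + 1)                                 # log n!
    for S, pi in zip(subsets, p):
        ll += len(S) * log(pi) - lgamma(len(S) + 1)    # log p_i^{n_i}/n_i!
    for a in range(len(subsets)):
        for b in range(a + 1, len(subsets)):
            nij = len(subsets[a] & subsets[b])
            pij = p[a] * p[b]
            ll += lgamma(nij + 1) - nij * log(1 - pij) # overlap correction
    return ll
```

When the subsets form a partition every correction term vanishes and L_2 reduces to L_1; for {1,2} and {2,3} with p = (1/2, 1/2) the value is 3!·(1/8)(1/8)/(3/4) = 1/8.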

VI - NUMERICAL EXAMPLES

Using the first of two (artificial) examples given by Andre (1984), the
incidence matrix corresponding to his figure 1 is given in Table 1a together
with the computed probabilities of the lists; the optimal covering was
obtained by the reductions and is given in Table 1c. As noted above, the dual
problem can also be solved by the methods of this paper, namely, which sites
should be grouped. In the absence of spatial contiguity information, the
cost-free reductions yielded the unique solution to be sites {1-12} and {11,
13-25}, with indicator species a and h respectively. Only site 11 is in
Table 1. Numerical example (from Andre 1984, fig. 1).

(a) Original incidence matrix in transposed form (AT)

Species
Sites a b c d e f g h i j Probabilities
1 1 0.0119
2 1 0.0119
3 1 0.0119
4 1 1 0.0198
5 1 1 1 1 1 0.0382
6 1 1 1 1 1 0.0502
7 1 1 1 1 1 0.0382
8 1 1 1 1 1 0.0502
9 1 1 1 1 1 0.0382
10 1 1 1 1 1 1 0.0582
11 1 1 1 1 1 0.0458
12 1 1 1 1 1 0.0502
13 1 1 1 1 1 1 0.0428
14 1 1 1 1 1 1 1 1 0.0635
15 1 1 1 1 1 1 1 1 0.0635
16 1 1 1 1 1 1 1 1 0.0635
17 1 1 1 1 1 1 0.0592
18 1 1 1 1 1 0.0402
19 1 1 1 1 1 1 0.0521
20 1 1 1 1 1 1 0.0521
21 1 1 1 1 1 1 0.0521
22 1 1 1 0.0258
23 1 1 1 0.0258
24 1 1 0.0258
25 1 1 0.0170

(b) Fully reduced incidence matrix in transposed form (AT)

Species
Sites a c f j Probabilities
4 1 0.0198
6 1 1 1 0.0502
11 1 1 0.0458
12 1 0.0428
14 1 1 0.0635
17 1 1 0.0592
18 1 0.0402

(c) Best covering solution: samples {6, 14}, i.e. {a,b,c,d,e,j} and
{b,c,d,e,f,g,h,i}

Characteristic species                 Common species
Association 1: {a,j}                   Associations 1 and 2: {b,c,d,e}
Association 2: {f,g,h,i}

(d) Relational data transformation: Unique solution: {a,b,c,d,e}
                                                     {e,f,g,h,i}
                                                     {j}

Characteristic species                 Common species
Association 1: {a,b,c,d}               Associations 1 and 2: {e}
Association 2: {f,g,h,i}
Association 3: {j}
Table 2. Numerical example (from Dale, 1971): percent presence of 18 species in 25 sites.
(a) Species Site Nos.
1 2 17 22 24 3 18 4 20 9 13 23 11 12 14 10 7 8 15 16 5 21 6 25 19

Agrostis canina. 1 13 7 36
A. tenuis. 2 11 8 23 9 20 38 28
Anthoxanthum odoratum. 3 2 8
Blechnum spicant. 4 1
Calluna vulgaris. 5 2 3
Carex binervis. 6 64
Deschampsia flexuosa. 7 94 71 95 95 81 99 100 99 100 48 86 66 86 94 98 70 17 74 94 98 84 93 12 24 22
Festuca ovina. 8 95 92 59 66 53 47 18 23 54 30 26 51 12 57 26 30 67 24 100
Galium saxatile. 9 13 11 13 13 11 24 1 17 4 34 6 6
Holcus lanatus. 10 2
Juncus squarrosus. 11 2 3 1 1 4
Luzula campestris. 12 1
Nardus stricta. 13 5 1 8 31 33 27 36 13 3 16 59 80 81 59 53 46 4 2
Potentilla erecta. 14 2
Pteridium aquilinum. 15 2 17 17 7 1 2 2 13 1 14 17
Rumex acetosa. 16 28
Vaccinium myrtillus. 17 100 100 100 100 99 100 100 100 99 79 99 98 97 100 98 100 1 45 10 39 31 27 79
V. vitis-idaea. 18

(b) Site Percentages (B) Incidence (A) (c) Full data


1 0.045305 0.013280
2 0.042030 0.024640 Association Species
17 0.035498 0.006719 1 7,8,12,13,15,17
22 0.039485 0.013280 2 7,8,13,17,18
24 0.039247 0.018079 3 1,2,7,8,9,10,17
3 0.049168 0.049451 4 4,7,13,17
18 0.039671 0.038380 5 1,2,6,7,8,9,13,14,15,17
4 0.035869 0.049596 6 2,7,8,9,13,16
20 0.044891 0.064188 7 1,2,3,7,8,9,13,15,17
9 0.034586 0.073783 8 3,5,7,8,11,13,15,17
13 0.043759 0.024836
23 0.041188 0.013280 (d) Threshold data (25%)
11 0.034090 0.041345 1 2,8,16
12 0.022838 0.020476 2 6,13
14 0.026803 0.034131 3 1,7,8,9,13,17
10 0.026498 0.025276
7 0.055979 0.113792
8 0.092059 0.089616 (e) Relational data transformation
15 0.034728 0.032090 1 1,6,15
16 0.038020 0.033240 2 7,8,13,17
5 0.050708 0.080235 3 2,9,15
21 0.039598 0.030881 4 3,4,5,10,11,12,14,16,18
6 0.019060 0.036196
25 0.018900 0.013719
19 0.050020 0.059492
common.
Two further analyses to smooth the data were made using the Jaccard and
Russell/Rao similarity coefficients for estimating the similarity among the
species. Using phase 1 of conditional clustering (see section II(c)), eight
subsets were generated from each, of which the same three remained after the
cost-free reductions. All three (table 1d) were mandatory with respect to the
constraints, and so probabilities did not need to be estimated. It can be
seen that while the optimal solutions for the analyses have much in common,
the groupings obtained from the similarity coefficients suggest more species
as characteristic than in the direct analysis of the incidence table, and
also, somewhat surprisingly, place j in a group by itself. Since the
presence of species j implies that species e is present, it seems more
reasonable that these two should belong in the same group, as in table 1c.
The second example uses frequency data for 18 species in 25 sites given
by Dale (1971) and reproduced in table 2a. Estimates of the site
probabilities both from the matrix B and from A formed as a presence/absence
array, are given in table 2b. Both arrays suggested the same eight species
associations (table 2c), of which the first six are mandatory. There was
considerable overlap among the associations. The analyses were repeated using
a threshold to eliminate infrequent species; all values in table 2a exceeding
25% were retained, and others replaced by zero. As a result, only three
associations were obtained (table 2d).
The indirect method was also used, i.e. computing dissimilarities as the
sum of the absolute values of the difference in frequencies, and using the
subset generating procedures described in section II(c) to smooth the
relationships. 12 subsets were generated, none of which were mandatory.
There were four subsets in the optimal covering (table 2e). Comparing tables
2c, d, e suggests that while there are differences, there are also some
apparent recurrent species associations.
Remembering that the role of clustering is to provide candidate groupings
of objects for further evaluation, this diversity of result demonstrates the
need for further ecological investigations to determine which if any
association is more than random.

VII - DISCUSSION

Determining associations is a continuing problem, and so it is of


interest to decide if there is any merit in the present proposal compared with
others previously made. Traditional procedures seem to be as follows:
1. determine a relational measure among each pair of objects;
2. by some clustering method, determine subsets of the objects.
There are many different relational measures which have been proposed (Gower
and Legendre 1986). Each has its arbitrariness and hidden assumptions; there
does not seem to be any single relational measure which is superior in all
circumstances, or which by assuming a particular probability distribution for
the observed incidence matrix, does not impose more structure on the data than
they themselves have. Nevertheless, the conversion of attribute data into
similarities, and the (re-)generation of an incidence table by the algorithm
of section II (c), can be regarded as a smoothing process, which may lead to
simpler solutions. In the present proposal, although ATA* can be regarded as
a relational measure among the attributes, there is none among the objects
which replaces the data; these remain as lists of objects which exhibit the
same attribute state.
The two assumptions which are made are :
1. the principle of indifference, which is used to obtain the probabilities;
here, this is equivalent to the maximum entropy principle to obtain a
probability distribution just consistent with the structure of the data
without imposing further structure (such as that arising from the
assumption of a Poisson, binomial etc. distribution); and the principle
2. of maximum joint probability, which in the present formulation is
equivalent both to that of minimum cross-entropy and of maximum
likelihood (Lefkovitch 1985); this is used to select the attributes from
among those seen. It has been shown that any choice has to agree with
this principle (Shore and Johnson, 1980) if consistency is desired; this
contrasts sharply with traditional clustering procedures, whose
assumptions are rarely known (or are even knowable) in the context of
consistency.
An open question is whether the probabilities should be obtained from the
original A or from this array after duplicate attributes (i.e. identical
columns in A) have been eliminated. The numerical values, after allowing for
the different standardization, can be very different if the duplication is
considerable for some attributes. A decision to retain duplicate columns
clearly depends on the original sampling procedure for their choice; if it was
random (see also below), it seems preferable to use the original A. In any
case, it is not difficult to obtain the probabilities from both arrays, and to
come to some decision based on both solutions; cluster analysis, after all, is
a hypothesis generating procedure and not an evaluation.
The second component of traditional group-forming procedures is the use
of a clustering algorithm; this requires a choice from the plethora currently
available, since each has requirements about the metric, and makes somewhat
arbitrary even if plausible definitions of relationships among compound
subsets, as well as in the initial definition of dissimilarity. The end
result of many of these methods is usually a dendrogram, so that it is
necessary to make further assumptions to obtain the subsets of the objects.
In the present proposal, the dissimilarity, clustering and reconstruction
phases are avoided, since the incidence matrix itself gives the candidate
subsets; the only problem is to choose from among these. The choice is based
on the logic of implication, on the duality of the information in the rows of
a table with that in the columns (see the proof of the theorem in Lefkovitch
1985), and on the classical principle of maximizing the joint probability.
The only component which is somewhat unfamiliar is the meaning of this
probability, since it is not a frequency nor is it subjective, but has a
logical interpretation in the sense of Carnap.
Although linear least-cost set covering is NP-complete, its special
structure makes it one of the easiest of integer programs to solve, primarily
because of the reductions which are possible. In the present context, because
the costs are a function of the constraints, the problem is even further
simplified, and arguments based on worst-case performance can be neglected.
It is also conjectured that, because of the definition of the probabilities,

    (p_j > p_k) ⟺ (J ⊃ K),

which implies that reduction rules 1-3 can be performed on the transposed
complements without changing the optimal solution; if so, the probabilities
will be required iff the reduced array is not empty.
There are two other treatments of binary data which ought to be
compared with that of the present paper. As noted by Lefkovitch (1985), there
is a resemblance between the probabilities as obtained here and the values of
the object ordination given by correspondence analysis. One representation of
the latter procedure is essentially as follows: with A as in this paper
(without missing values and ignoring various normalizations), the reciprocal
averaging solution is to find v and w so that

    Av ∝ w and Aᵀw ∝ v.

If the interest is in v, the solution is given by the Perron-Frobenius
eigenvector of AᵀA, i.e. AᵀAv = γv, which clearly differs from

    AᵀA*p = (Aᵀ11ᵀ - AᵀA)p = λp

of the present paper.
In the present model, the rows (= objects), columns (= attributes) and
elements of A are not regarded as being random. There is a superficially
similar set of circumstances, arising from item analysis (see Rasch 1960;
Andersen 1980; Tjur 1982), which, by contrast, assumes that the a_ij are
independent Bernoulli random variables, with

    Pr(a_ij = 1) = α_i / (α_i + β_j),

where α_i is a row parameter which increases with the increasing 'ability' of
object i to show the suite of attributes under consideration, and β_j is a
column parameter which decreases with the increasing 'difficulty' of attribute
j to be shown by the objects under consideration. The objective of the
analysis, which is to estimate α_i, is different from that of the present
paper, which is to identify recurrent sets of individuals. The Rasch model
leads to determining the set representation probabilities of A, given that
they balance the covering probabilities of A*, and so α_i is equivalent to
q_i of the present paper. It should be emphasized, however, that in the set
covering model, there is no probabilistic interpretation of the elements of A,
and that p and q have meaning only with respect to providing evidence relevant
to propositions about the grouping of objects. Were either the rows, columns
or elements of A to be regarded as random samples from populations of rows or
columns, then it is apparent that the Rasch model would be of interest, and
advantage could be taken of any hypothesis tests which may be relevant.

ACKNOWLEDGEMENTS

I am grateful to my colleagues in Agriculture Canada, the University of


Ottawa and Carleton University for discussions. I must particularly
acknowledge the many questioners at talks given at several Numerical Taxonomy
meetings and elsewhere, including this workshop, who have obliged me to
rethink several of the concepts in this paper, and to the referees, whose
comments led to a number of improvements in the presentation. This paper is
contribution No. 1-692 from the Engineering and Statistical Research Centre.
REFERENCES

Andersen, E.B. 1980. Discrete statistical models with social science


applications. North-Holland, Amsterdam.
Andre, H.M. 1984. Overlapping recurrent groups: an extension of Fager's
concept and algorithm. Biometrie-Praximetrie 24:49-65.
Chvátal, V. 1979. A greedy heuristic for the set covering problem.
Mathematics of Operations Research 4:233-235.
Dale, M.B. 1971. Information analysis of quantitative data. Statistical
Ecology 3:133-148.
Fine, T.L. 1973. Theories of probability. Academic, New York.
Garfinkel, R., and G.L. Nemhauser, 1972. Integer programming, Wiley, New
York.
Gower, J.C. 1971. A general coefficient of similarity and some of its
properties. Biometrics 27:857-871.
Gower, J.C., and P. Legendre, 1986. Metric and Euclidean properties of
dissimilarity coefficients. Journal of Classification 3:5-48.
Lefkovitch, L.P. 1976. Hierarchical clustering from principal coordinates:
an efficient method for small to very large numbers of objects.
Mathematical Biosciences 31:157-174.
Lefkovitch, L.P. 1980. Conditional clustering. Biometrics 36:43-58.
Lefkovitch, L.P. 1982. Conditional clusters, musters and probability.
Mathematical Biosciences 60:207-234.
Lefkovitch, L.P. 1984. A nonparametric method for comparing dissimilarity
matrices, a general measure of biogeographical distance, and their
application. American Naturalist 123:484-499.
Lefkovitch, L.P. 1985. Entropy and set covering. Information Sciences
36:283-294.
Lundy, M. 1985. Applications of the annealing algorithm to combinational
problems in statistics. Biometrika 72:191-198.
Rasch, G. 1960. Probabilistic models for some intelligence and attainment
tests. Danmarks Paedagogiske Institut, Copenhagen.
Shore, J.E., and R.W. Johnson, 1980. Axiomatic derivation of the principle
of maximum entropy and the principle of minimum cross entropy. IEEE Trans.
Inform. Theory IT-26:26-37.
Tjur, T. 1982. A connection between Rasch's item analysis model and a
multiplicative Poisson model. Scand. J. Statist. 9:23-30.
Toussaint, G.T. 1980. The relative neighbourhood graph of a finite planar
set. Pattern Recognition 12:261-268.
Appendix 1 : Fortran subroutine to obtain the probabilities.

SUBROUTINE COVPRB(N,M,K,IL,JL,P,Q,Y,TOL)
C
C THIS SUBROUTINE OBTAINS BOTH THE SET COVERING AND
C SET REPRESENTATION (COMPLEMENTARY PROBLEM) PROBABILITIES
C
C N IS THE NUMBER OF ROWS
C M IS THE NUMBER OF COLUMNS
C K IS THE NUMBER OF ELEMENTS
C IL IS A VECTOR OF LENGTH K CONTAINING THE ROW
C INDICES OF THE ELEMENTS OF A
C JL CONTAINS THE CORRESPONDING COLUMN INDICES
C WHICH IF NEGATIVE INDICATE MISSING (NOT ABSENT) DATA
C P WILL CONTAIN THE COVERING PROBABILITIES
C Q WILL CONTAIN THE REPRESENTATION PROBABILITIES
C Y IS A WORK VECTOR OF LENGTH M
C TOL IS A CONVERGENCE CRITERION
C
C THIS SUBROUTINE IS NOT PROTECTED AGAINST N, M, K, IL OR
C JL BEING ZERO ETC. ON INPUT OR FOR Z = 0.0
C DURING THE CALCULATIONS
C
DIMENSION P(M), Y(M), Q(N)
C
C IT IS SUGGESTED THAT P,Q,Y,Z,V,TOL BE DOUBLE PRECISION
C REMOVE THE C IN COLUMN 1 FROM THE NEXT CARD,
C DOUBLE PRECISION P,Q,Y,Z,V,TOL
C AND REPLACE ABS BY DABS, 0.0 BY 0.D0, 1.0 BY 1.D0
C WHERE APPROPRIATE
C
INTEGER*2 IL(K),JL(K)
C
C INITIALIZE P
C
Z=1.0/FLOAT(M)
DO 1 J=1,M
1 P(J)=Z
C
C INITIALIZE A NEW ITERATION
C
100 DO 5 I=1,N
5 Q(I)=1.0
DO 10 J=1,M
10 Y(J)=0.0
C
C NOW DO AN ITERATION
C
DO 15 L=1,K
J=JL(L)
IF(J.LT.0) GO TO 15
I=IL(L)
Q(I)=Q(I)-P(J)
15 CONTINUE
Z=0.0
DO 20 L=1,K
J=JL(L)
IF(J.LT.0) GO TO 20
I=IL(L)
Y(J)=Y(J)+Q(I)
Z=Z+Q(I)
20 CONTINUE
C
C NEW VALUES OBTAINED; CHECK FOR CONVERGENCE
C
W=0.0
DO 25 J=1,M
V=Y(J)/Z
W=W+ABS(P(J)-V)
25 P(J)=V
IF(W.GT.TOL) GO TO 100
C
C STANDARDIZE Q; P IS ALREADY STANDARDISED
C
Z=0.0
DO 30 I=1,N
30 Z=Z+Q(I)
DO 35 I=1,N
35 Q(I)=Q(I)/Z
RETURN
END

Appendix 2: Fortran subroutine to obtain an optimal set covering using the


annealing algorithm.

SUBROUTINE SETCOV(MVRS,NCON,A,COEF,BETA,EPS,MITR,ISEED,PC,
1 ULMT,XBEST,FBEST,X,XX,T)
C
C INPUT
C ***************
C MVRS INTEGER NUMBER OF COLUMNS OF A
C NCON INTEGER NUMBER OF ROWS OF A
C A LOGICAL*1 BINARY CONSTRAINT MATRIX
C COEF REAL *4 THE (NON-NEGATIVE) FUNCTION COEFFICIENTS (CHANGED
C BY THE SUBROUTINE TO SUM TO UNITY)
C BETA REAL *4 POSITIVE: TO CONTROL CONVERGENCE (APPROX 5.0)
C EPS REAL *4 POSITIVE: TO TEST CONVERGENCE (E.G. 0.001)
C MITR INTEGER MAXIMUM NUMBER OF CANDIDATE SOLUTIONS
C (E.G. MVRS*NCON)
C ISEED INTEGER TO INITIALIZE THE RANDOM NUMBER GENERATOR
C PC REAL *4 0.5 <= PC < 1.0 TO DETERMINE NEIGHBOURING SOLUTIONS
C ULMT REAL *4 UPPER LIMIT ON COEF FOR INCLUSION
C
C OUTPUT
C ***************
C XBEST LOGICAL*1 THE SOLUTION ARRAY
C FBEST REAL *4 THE FUNCTION VALUE AT THE OPTIMUM
C (BASED ON THE MODIFIED COEF)


C X,XX,T LOGICAL*1 WORK ARRAYS
C
DIMENSION COEF(1)
LOGICAL*1 X(1),XX(1),T(1),XBEST(1),A(NCON,MVRS)
LOGICAL COVER, ACCEPT
C
C INITIALIZE AND EQUALIZE MAXIMUM FUNCTION VALUES TO UNITY
C
IF(PC.LT.0.5.OR.PC.GE.1.0) PC=0.75
Z=0.0
C=1.0
ULMT=0.0
DO 30 J=1,MVRS
XX(J)=XBEST(J)
IF(XBEST(J)) ULMT=ULMT+COEF(J)
Z=Z+COEF(J)
30 CONTINUE
DO 20 J=1,MVRS
20 COEF(J)=COEF(J)/Z
FLAST=ULMT/Z
FBEST=FLAST
C
C ITERATIONS BEGIN HERE
C
DO 100 ITER=1,MITR
IF(C.LT.EPS) GO TO 130
C
C OBTAIN A RANDOM COVERING IN THE NEIGHBOURHOOD OF XX
C IN THIS VERSION, IT IS A PRIME COVER
C
DO 25 I=1,NCON
25 T(I)=.FALSE.
DO 45 J=1,MVRS
45 X(J)=.FALSE.
FNOW=0.0
C
C FIND THE NEXT SUBSET
C
10 J=RAN(ISEED)*MVRS
12 J=J+1
IF(J.GT.MVRS) J=1
IF(X(J).OR.(COEF(J).GE.FBEST)) GO TO 12
X(J)=XX(J).XOR.(RAN(ISEED).GT.PC)
IF(.NOT.X(J)) GO TO 12
C
C IS IT NEEDED, AND DOES IT COMPLETE A COVER?
C
COVER=.TRUE.
ACCEPT=.FALSE.
DO 60 I=1,NCON
IF(.NOT.T(I).AND.A(I,J)) ACCEPT=.TRUE.
T(I)=T(I).OR.A(I,J)
60 COVER=COVER.AND.T(I)
IF(ACCEPT) GO TO 55
X(J)=.FALSE.
GO TO 12
55 FNOW=FNOW+COEF(J)
IF(.NOT.COVER) GO TO 10
C
C FEASIBLE COVERING FOUND. UPDATE C
C DETERMINE IMPROVEMENTS OVER THE BEST AND LAST
C
C=C/(1.0+BETA*C)
C
C IF BEST SO FAR, KEEP
C
IF(FNOW.GT.FBEST) GO TO 80
DO 5 J=1,MVRS
XX(J)=X(J)
5 XBEST(J)=X(J)
FLAST=FNOW
FBEST=FNOW
GO TO 100
C
C IF BETTER THAN THE LAST, REPLACE
C OR IF WORSE, THEN 'HEAT UP' RANDOMLY
C
80 Z=AMAX1((FNOW-FLAST)/C,-1000.)
C ACCEPT IF BETTER THAN THE LAST; IF WORSE, ACCEPT WITH PROBABILITY EXP(-Z)
IF(Z.LE.0.0) GO TO 82
IF(EXP(-Z).LE.RAN(ISEED)) GO TO 100
82 DO 85 J=1,MVRS
85 XX(J)=X(J)
FLAST=FNOW
100 CONTINUE
C
C END OF ITERATIONS
C
130 RETURN
END
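For experimentation outside a Fortran environment, the annealing scheme above can be sketched compactly. This is not a line-by-line translation: the neighbourhood move of SETCOV is replaced by a simpler randomized greedy cover, and all Python names (`setcov_anneal`, `a`, `coef`) are mine.

```python
import math
import random

def setcov_anneal(a, coef, beta=5.0, eps=1e-3, max_iter=500, seed=1):
    """Annealing sketch for minimum-cost set covering.
    a[i][j] is True when column j covers row i; coef[j] >= 0."""
    rng = random.Random(seed)
    ncon, mvrs = len(a), len(a[0])
    total = sum(coef)
    coef = [cj / total for cj in coef]      # costs rescaled to sum to 1
    xbest, fbest, flast = [True] * mvrs, 1.0, 1.0
    c = 1.0                                 # cooling parameter
    for _ in range(max_iter):
        if c < eps:
            break
        # build a random cover: add columns (in random order) that
        # cover at least one still-uncovered row
        x = [False] * mvrs
        covered = [False] * ncon
        fnow = 0.0
        cols = list(range(mvrs))
        rng.shuffle(cols)
        for j in cols:
            if any(a[i][j] and not covered[i] for i in range(ncon)):
                x[j] = True
                fnow += coef[j]
                for i in range(ncon):
                    covered[i] = covered[i] or a[i][j]
            if all(covered):
                break
        if not all(covered):                # instance not coverable
            break
        c = c / (1.0 + beta * c)            # cool down, as in SETCOV
        if fnow <= fbest:                   # best so far: keep
            fbest, flast, xbest = fnow, fnow, x[:]
        else:                               # Metropolis acceptance
            z = max((fnow - flast) / c, -700.0)
            if z <= 0.0 or math.exp(-z) > rng.random():
                flast = fnow
    return xbest, fbest
```

The cooling rule C = C/(1 + BETA*C) and the Metropolis test follow the subroutine; everything else is deliberately simplified.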
Fractal theory
APPLICATIONS OF FRACTAL THEORY TO ECOLOGY

Serge Frontier
Laboratoire d'Ecologie numerique
Universite des Sciences et Techniques de Lille Flandres Artois
F-59655 Villeneuve d'Ascq Cedex, France,
and Station marine de Wimereux
B.P. 68, F-62930 Wimereux, France

Abstract - Forms with fractal geometric properties are found in ecosystems. Fractal geometry
seems to be a basic space occupation property of biological systems. The surface area of the
contact zones between interacting parts of an ecosystem is considerably increased if it has a fractal
geometry, resulting in enhanced fluxes of energy, matter, and information. The interface structure
often develops into a particular type of ecosystem, becoming an "interpenetration volume" that
manages the fluxes and exchanges. The physical environment of ecosystems may also have a
fractal morphology. This is found for instance in the granulometry of soils and sediments, and in
the phenomenon of turbulence. On the other hand, organisms often display patchiness in space,
which may be a fractal if patches are hierarchically nested.
A statistical fractal geometry appears along trips and trajectories of mobile organisms.
This strategy diversifies the contact points between organisms and a heterogeneous environment,
or among individuals in predator-prey systems. Finally, fractals appear in abstract representational
spaces, such as the one in which strange attractors are drawn in population dynamics, or in the
case of species diversity. The "evenness" component of diversity seems to be a true fractal
dimension of community structure. Species distributions, at least at some scales of observation,
often fit a Mandelbrot model f(r) = f0 (r + β)^(−γ), where f(r) is the relative frequency of the species
of rank r, and 1/γ is the fractal dimension of the distribution of individuals among species.
Fractal theory is likely to become of fundamental interest for global analysis and
modelling of ecosystems, in the future.

INTRODUCTION

The importance of fractal geometry in the morphology of living beings has often been
stressed, for scales of observation ranging from intracellular organelles (mitochondria) to entire
organisms (trees) to vegetation physiognomies. Fractal geometry not only is an attempt to search
for an order in the inextricable morphology of living beings, but seems to point out some property
that is essential for the functioning of life. Indeed, life is made of ceaseless interactions, and
incessant fluxes of matter, energy and information through interfaces, which at first sight look like
surfaces. As a matter of fact, it is at the level of these interfaces that the geometry becomes
inextricable, suggesting an interpenetration volume instead of a smooth surface, between two
adjacent interacting elements. Actually, they are neither surfaces nor volumes, but fractals.

The challenge of living matter resides in managing a biomass, which is a volume, by
means of fluxes through surfaces, at numerous, nested scales of activity. As is well known,
such management does have dimensional constraints because if growth is homothetic (that is,
without any change of form), surface areas increase less rapidly than volumes. In order for the
surfaces to grow at the same rate as the volume, a particular highly folded morphology has to
develop, which strongly reminds one of fractal objects.

NATO ASI Series, Vol. G14
Developments in Numerical Ecology
Edited by P. and L. Legendre
© Springer-Verlag Berlin Heidelberg 1987

At scales larger than organs and organisms, first appears the population, then the
ecosystem which is an interacting system of various populations and the environment. Ecology is
the science of these interactions, which are produced by fluxes of energy and matter and by
information exchanges. Once more, a fractal organization is visible here. Since little has been
written about "fractal ecology" up to now, my purpose is to review what can be considered as
fractals at the scale of the ecosystem. Such an inventory has to include the following:
- Forms characterising the contact between organisms, between organisms and the
environment, between communities and the environment, and among ecosystems. In developing
these forms, fractal structures seem to be part of the biological strategy at all scales of observation.
- Size frequency distributions, which often have a fractal dimension (Mandelbrot 1977,
1982; Section 1 and Fig. 3 below).
- Spatial distributions of organisms (patchy distributions, and so on).
These first three items describe a strategy of space occupation. There are also strategies of
time-and-space occupation:
- Paths or trajectories make it possible for organisms to increase the number of their
contacts with a heterogenous environment, or among populations; all these increase the rates of
interaction.

Besides fractals in physical space (either systematic, or completed by a random
component, resulting in statistical fractals), this inventory also has to consider fractals in an
abstract representational space, sometimes called "abstract fractals" below, for convenience:
- Strange attractors, which are frequent in dynamic systems, have a fractal dimension.
- Entangled and nested cycles of matter, which make up the ecosystem, possibly have a
fractal organization just like economical networks do, as well as computers and automata
(following a demonstration by Mandelbrot).
- The distribution of total biomass among different species, with various degrees of
abundance and rarity, is also a fractal property of biomass. The distribution of biomass among
species, that causes species diversity, is not a random process, but it follows a particular type of
frequency distribution related to fractal theory, as will be discussed.

Generally speaking, the interaction network within an ecosystem is organized
hierarchically; it is a condition for its stability (Margalef 1980).

The uninitiated reader can refer to the Appendix, where definitions, elements of fractal
theory, and methods of computation are presented.

[Note: references to Mandelbrot without a date concern the 1977 or the 1982 editions of his book;
both contain an extensive bibliography].

1. FRACTAL FORMS IN ECOLOGY

The fractal shapes (morphologies) observed at scales greater than the individual are just
a continuation of those observed inside the cells and organisms. For instance, the fractal
dimension of the external surface of mitochondria is 2.09, that of the internal surface is 2.53, and
that of the inner membrane of the human lung is 2.17. Intuitively speaking, the fractal dimension indicates a
certain degree of occupation of the physical space by a contorted, ramified, or fragmented surface,
where some exchanges occur. The histological structure of any tissue appears as a kind of
sponge, whose fractal dimension is between 2 and 3. Moreover, tissues are connected with their
biological environment (inside and outside the organism) thanks to an organized circulation of
substances, which may have the form of an arborescence of canals (invaginations); or, the tissue
may have its external surface ramified (evagination). For example, the branching out of the
bronchioles inside the lung has a fractal dimension which is slightly less than 3. As a matter of
fact, the circulation of substances is one of the main factors coupling two living webs or
organisms, and it cannot be dissociated from energy flow; according to Morowitz's (1968)
principle, any flux of energy is associated with (at least) one cycle of matter in dissipative
systems. Figure 1 shows two isomers A and B in a state of energy equilibrium. An energy flow
crosses the system and is coupled with transport mechanisms between points of different energy
levels; there is either turbulent diffusion or organized channels (both being represented in Fig. 1),
resulting in a cycle of matter. A fractal geometry is a logical requirement for the wall
configuration and the transport system, in order to accelerate the energy flow through the system,
as well as the cycling of matter. That particular geometry can be directly observed in the
morphology of trees, for example, where the canopy allows sufficient contact between the
atmosphere and the chlorophyll web, and the beard of roots and rootlets allows an intimate contact
with the nutrients in the soil. Other examples of contact between organisms and the environment
are given by animal lungs and gills, filtering apparatuses, and so on. Figure 2 shows various
fractals evoking canopies, root beards or bronchioles, tremendously increasing the contact
between the organism and the medium, as the black and white parts do.

In other cases, the fractal geometry responsible for the efficiency of the system is more
subtle. Sometimes biomass uses the fractal geometry of its physical environment instead of
Fig. 1. Energy flow and matter cycling through a fractal wall geometry. Inside the system, A and
B are two isomers whose equilibrium depends on the energy level. They are transported from
one wall to the other either by diffusion, or by a spatially organized transport mechanism. The
broad arrows symbolize energy flows. The dashed lines represent matter cycling. Modified from
Morowitz (1968).

organizing itself in a fractal form. For example, in the aquatic environment, the enhancement of
contact surfaces is obtained by parcelling out the biomass into isolated cells; this is the strategy
followed by bacteria and phytoplankton cells, where the renewing of contact surfaces with water
is produced by turbulence: Mandelbrot demonstrated that the geometry of turbulence is fractal, for
it is composed of eddies, which dissipate into smaller and smaller ones -- a typical fractal process
-- up to the scale of viscosity. The fractal dimension of dissipation is approximately 2.6. The
fractal dimension of boundaries of wakes and clouds is 2.3.

The importance of turbulence for pelagic production, and of contacts and shears
between complementary water bodies and currents, is well known (Legendre 1981; Legendre and
Demers 1984). It is important both for primary production, and for the exploitation of this
primary production by consumers. Moreover, turbulence is sometimes induced by organisms,
when either they shake the surrounding water, or constitute a roughness that increases the velocity
of eddies within a previously regular current (Frechette 1984).

Fig. 2. Fractal models evoking ramifications, as are found in plants and in animals. From Mandelbrot (1982), with permission.

The environmental fractal geometry used by organisms is also seen in the soil and in
sediments, where organisms are moving and growing. Any sediment or soil is characterized by a
particular distribution of grain sizes, which is their granulometry. Smaller grains are lying
between the larger ones, resulting in a picture that can be schematized as an arrangement of
spheres (Fig. 3). This arrangement can be studied for its fractal geometry. Some general
properties of soils, related to percolation and water retention by surfaces, depend on this fractal
geometry. It would be interesting to see whether the size distribution of organisms, from the
tiniest ones (bacteria) to the biggest (vertebrates), also has a fractal-type regularity, and whether its
fractal dimension is linked to that of the medium, which can be made of solid particles or be
aquatic and turbulent.

At another scale of observation, relations between the morphology of ponds and lakes
and such biological properties as overall productivity have long been known to limnologists
(Hutchinson 1957; Ryder 1965; Wetzel 1975; Adams and Oliver
1977). Lake morphology, as well as the "morphoedaphic index", have always been expressed in
terms of a ratio between the length of the shoreline and the volume of water, but we know today
that the shoreline is a fractal and that its "length" is not uniquely defined, but depends on the stride
length (or "yard-stick" of Mandelbrot) that has been used to measure it. Figure 4 indicates the
fractal dimension of a lake shoreline, following Kent and Wong (1982). It follows that it is not
the length/volume ratio, but its fractal dimension, that ought to be correlated with ecosystem
properties. This has been pointed out by Kent and Wong, but without any deep investigation of
the relationship, which they only assumed to exist; the process can be seen in the fact that the
littoral zone of lakes (the extent of which depends on the fractal dimension of the shoreline) brings
together the primary producers and the decomposers, then accelerating the cycling of matter. It
seems necessary to persevere in this way, renewing entirely the notion of "morphoedaphic index"
in the light of fractal theory.
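The slope-based estimate just described is straightforward to compute. Below is a minimal sketch (the function name and the synthetic data are mine): since measured length behaves as L(s) ∝ s^(1−D) for stick length s, the slope of ln L against ln s is (1 − D).

```python
import math

def divider_dimension(sticks, lengths):
    """Estimate D from divider ("yard-stick") measurements by a
    least-squares fit of ln(length) on ln(stick): slope = 1 - D."""
    xs = [math.log(s) for s in sticks]
    ys = [math.log(L) for L in lengths]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return 1.0 - slope

# synthetic "shoreline" obeying L = 10 * s**(1 - 1.26) exactly
sticks = [1.0, 2.0, 4.0, 8.0]
lengths = [10.0 * s ** (1 - 1.26) for s in sticks]
print(round(divider_dimension(sticks, lengths), 2))   # → 1.26
```

Applied to real shoreline measurements such as those of Kent and Wong, the fitted D rather than a length/volume ratio would be the quantity to correlate with ecosystem properties.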

More precisely, a shoreline is a contact zone between two ecosystems, an aquatic and a
terrestrial one (including soil and vegetation). Often the limit between them cannot be stated
precisely, because a particular ecosystem (contact or interface ecosystem) develops in the vicinity
of the water-soil contact line: interpenetration area, reed-belts and their fauna, intensified
exchanges, etc. The shallow coastal stretch is very important in the economy of the whole lake,
and also of the surrounding terrestrial ecosystem (both ecosystems may "exploit" it). The surface
area of that contact ecosystem is then important, and it depends upon the "length" of the theoretical
boundary -- or, more precisely, on its fractal dimension. Figure 4c explains that "law", starting
from the assumption that the interface ecosystem can only develop within a distance L from the
geometrical boundary.

Fig. 3. Arrangement of circles in a plane, or of spheres in space. a, b: Isodiametrical circles. The ratio of empty:full areas is ((6/π) − 1), for all diameters. c: Fractal arrangement. Indefinitely smaller circles are located in the empty spaces left between the larger ones, resulting in (1) the complete filling of the plane or space (since the ratio empty:full becomes zero), and (2) a particular frequency distribution of diameters, corresponding to a fractal dimension. Several geometric solutions are available, depending on the starting point.

[Figure 4: panel b plots Ln Length of shoreline against Ln Length of "stick"; panel c shows the band of width L along shorelines of low and high fractal dimension d.]
Fig. 4. Shoreline of Gull Lake, Ontario, Canada. a: Map of the lake. b: The length of the
shoreline is a decreasing function of the length of the "yard stick" used for measuring it. The
fractal dimension can be inferred from the slope of the line (see text); it could provide a new type
of "morphological index" for lakes. Modified from Kent and Wong (1982). c: Two different
fractal dimensions d of the shoreline, resulting in two different areas of land/water
interpenetration surface; this surface is here defined as the set of points located within a maximum
distance L from any point of the fractal shoreline.

Similar indexes could certainly be described for the contact zones between other pairs
of ecosystems (Frontier 1978), such as a forest-savanna contact zone, a coral reef, and so on.
They should include both the structure of the multispecies living community and the fractal
morphology of the landscape, as represented for instance in Mandelbrot's recent landscape
models.

2. SOME LIMITATIONS OF THE FRACTAL MODEL

Let me now discuss the limitations of the fractal geometric model in biology and
ecology. Fractal theory is, to my knowledge, the first mathematical theory that explicitly uses the
notion of observation scale, for in building up a fractal object, it states that the same generative
process repeatedly acts from scale to scale, following a so-called "cascade". Nevertheless, the
reality of the scale in a mathematical fractal is, so to speak, immediately hidden by the generative
process itself because, when looking at a fractal picture, it is impossible to infer at what scale it is
actually being considered; all scales are equivalent and indiscernible from the form itself. For example,
a theoretical tree is branching out ad infinitum, any tree being a branch of a larger one, and so
on, following the rule of self-similarity. Consequently, the very question of the scale at which we
are looking at a particular branching has no mathematical meaning.

On the other hand, a real biological object, such as a living web, does not look the
same at different observation scales. For example, when looking at a histological preparation
under a microscope, with a little knowledge of histology one can infer the scale from the structure
seen, even without knowing what magnification is being used.

Generally speaking, a biological object in which a fractal geometry appears displays
that particular geometry only between two observation scales, sometimes close to one another. A
real tree stops its branching-out after, for example, eight binary steps. Beyond that, the tiniest
ramifications do not ramify any more, but they bear leaves, whose parenchymatous web realizes
another fractal structure. At the other end of the scale, an individual tree does not belong to a
larger one, but to a forest which is another fractal. Mandelbrot built a geometric fractal reproduced
as Figure 5a; the "forest" not only includes fractal trees, but also a distribution of tree sizes.

It has been shown that the fractal dimension of the shape of a coral reef changes for
different intervals of the observation scale (Bradbury et al. 1983, 1984; Mark 1984), being
approximately 1 if measured with 20 cm to 2 m steps, and a little more than 1.1 outside that
interval; transitions are sharp. We could say that a biological object, in which a fractal geometry
can be recognized, actually "mimics" a fractal over some range of scales. The lung branches out
Fig. 5. Two fractal figures from Mandelbrot (1982, with permission). a: Model evoking a spruce
tree forest; its fractal dimension is 1.88. b: Model evoking the Roscoff coastline (location of this
NATO Workshop); its fractal dimension is 1.61. c: The generator of figure (b).
23 times, a fish gill 4 times, and so on; beyond these limits, organs belong to other fractals. That
"fractality" of the living matter represents a developmental strategy by which living matter is able
to conduct the volume of exchanges that are necessary for the biomass to remain alive, and which
imply a sufficient surface/volume ratio. So the fractal view of the object is only a mathematical
model, pertinent at one observation scale or between two scales, that describes the developmental
strategy at those scales, in the same way as a mathematical smooth surface describes a leaf or a lung
surface at a particular observation scale. We do not have to expect any "real" (mathematical)
fractal to stand out in nature, no more than a "real" plane; this is also true for any artificial object,
for the smoothest technological object has a very rugged surface, when examined at high
magnification.

Rather than calculating only the fractal dimension within an interval of scales, it is
perhaps more interesting to look for those scales of observation where the fractal dimension is
changing, because at these critical scales, the constraints of the environment that act upon the
biomass are changing too.

Properties of non-living matter also depend on the observation scale. For instance, the
same fractal dimension can be observed over a very broad range of scales, as in "breaking
surfaces" (ten orders of magnitude: Mandelbrot, pers. comm.) or in turbulence. The breaking of
stony material is bounded between the planet scale and that of atoms, while turbulence is bounded
between the planet scale again and the scale of molecules, where it turns out to be viscosity. At
intermediate scales, we can recognize viscosity, lapping, waves, local currents, and geostrophic
currents. From the point of view of the living organisms or of the ecosystems, they are not the
same phenomenon at all, since organisms and ecosystems have to adapt themselves in different
ways according to the scale, resulting in different morphologies, behaviours or fractal
dimensions.

If a tree were growing indefinitely, a problem of sap supply to the leaves would arise.
Conversely, if it were branching out infinitely, the result would be a clogged felt-like mass, which
would hinder both air circulation along the tissues, and sap circulation inside them because of
viscosity. Hence branching out cannot be infinite either towards huge or towards small sizes.
For the contact between air and sap to be efficient, the foliage chlorophyllous tissues have to be
organized as a porous sponge -- another fractal structure. The choice of a limited number of
branching steps appears to be an optimizing choice for the transfer of matter and energy.

Another example, which clearly shows that a fractal geometry has to be truncated
instead of going on infinitely, is in the utilization of soil by organisms. Not only are the latter
moving and growing inside it, but a liquid charged with dissolved nutrients, organic molecules
and gas has to be able to circulate within the soil. Remember the fractal model of the set of
spheres with various diameters (Fig. 3), more numerous in proportion as the diameter decreases.
At each step along the observation scale, smaller spheres fill in the holes left by larger ones. If the
process were repeated indefinitely, the sediment would be completely compact. Even before the
sediment could be completely sealed, it would block the water because of viscosity and surface
tension. So, to maintain a sufficient level of porosity, the rate of grain fragmentation into smaller
and smaller ones has to decrease, at least at the level of the smallest grains; that is, the fractal is
necessarily truncated. Adsorbent surfaces are also very important in soil ecology, and a fractal
geometry enhances these surfaces. Since free volumes are also necessary, the soil quality
depends upon a balance between surfaces and volumes. Burrough (1981, 1983) has shown that
granulometry, as well as other properties of soils, exhibit variability in fractal dimension. On the
other hand, the percolation properties of porous materials are presently thoroughly investigated
by fuel engineers; this was revealed, together with the role of fractal "surfaces" in catalytic
reactions, through the papers presented during the colloquium "Journée application des fractales",
sponsored by the petroleum company Elf-Aquitaine (Paris, 21 November 1985). I suggest that
investigations should be carried out, relating the biological properties of soil with its fractal
structure, in the same way as benthologists are relating benthic communities to the roughness of
the substratum (E. Bourget, in prep.). To summarize, the "fractality" of a living object has to be
described by means of a succession of fractal models, or perhaps an infinity of models if the
fractal dimension changes progressively.

In any case, it is less interesting to calculate precisely a fractal dimension than to
determine at what scales it changes abruptly, or whether it is continuously changing. In the latter
case, it can be said that the concrete object is "tangent to a certain fractal" at each observation
scale. At scales showing a steep rate of change, the new physical properties the ecosystem is
facing are to be investigated, for they are of great interest to ecologists.

An additional reason for a fractal form or process to be truncated, when generated by
living matter, is that such a morphogenesis is expensive in energy and information or negentropy.
Let me paraphrase well-known facts involved in the partitioning of an industrial product into
smaller and smaller parts, from the producer to the wholesaler, then to the sub-wholesaler and
finally to the consumer. We know and suffer from the fact that at each step, the price of the
product increases (sometimes exponentially), for its distribution requires energy for
transportation, as well as information for organization, marketing and protection. In biology,
building a fractal form is likely to involve a cost in energy, physiological and genetic information,
etc., as does the process of maintaining it in spite of the biological turnover. Nevertheless, the
necessary global properties of the structure (for example, a sufficient contact surface) are often
fully obtained after a limited number of generative steps only. That has been clearly demonstrated
through simulations by Villermaux et al. (1986a, 1986b). They built a "Devil's comb" (Fig. 6)
made of a handle bearing a number of teeth, these teeth bearing smaller teeth, and so on. The
structure of the object, represented in black in the picture, is hollow, so that a substance can
diffuse inside its tubing. The authors modeled the diffusion of a gas up to the very end of the
teeth pattern. While they had thought initially that the molecules would take an infinite amount of
time to reach the ultimate teeth, since it is an infinite process, the result is actually the opposite: the
amount of time required converges to a finite value. Moreover (and what is still more important),
the time is almost the same to fill up the first 4 or 5 sets of teeth, or the entire structure. Finally,
assuming that the internal surface was covered with a catalyst, an efficiency close to maximum is
obtained as soon as 4 or 5 steps of the fractal structure are covered. This is of great importance in
the design of an industrial catalytic apparatus, for it shows that it is not necessary to build more
than 4 or 5 steps. Knowing that, the cost of such an object can be minimized, since the object
becomes more expensive as the amount of detail increases.
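The convergence argument can be caricatured with a geometric series (the factor q below is my toy assumption, not Villermaux's diffusion model): if filling generation k of the teeth takes on the order of q^k time units with q < 1, the total time is finite and the first few generations account for almost all of it.

```python
q = 0.25                          # each generation 4 times smaller
total = 1.0 / (1.0 - q)           # sum of the infinite geometric series
partial = sum(q ** k for k in range(5))
print(partial / total)            # ≈ 0.999: five generations do nearly all
```

Under this caricature, building more than a handful of generations buys essentially nothing, which is the economic point made above.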

Fig. 6. The Devil's comb of Villermaux et al. (1986a, 1986b). Every tooth bears
8 teeth that are 4 times smaller; the fractal dimension is then log 8 / log 4 = 1.5.
The generating process is repeated indefinitely.

In our field of interest, biology, this helps us to understand why trees, lungs, or
mitochondria have a fractal morphology with only a limited number of steps (4 to 23); it is
because the chief properties of such a morphology are obtained after a few steps, and it is not
useful for the living object to continue its fragmenting process beyond, at the cost of too delicate
and expensive a morphology. Possibly also, an organism cannot maintain a structure beyond a
certain degree of complexity and delicacy, which would be another reason for living fractals to be
truncated. Any real object with a fractal form is then trying to optimize a life condition in a given
range of scales, and not at others. Let me add that fractal geometry by no means provides an
explanation of forms, but only a description; our astonishment is not to be diminished when
observing living forms, since their morphogenesis has still to be explained.

3. SCATTERING LIVING MATTER THROUGH SPACE

A phenomenon closely related to the genesis of forms is the scattering of biomass
through the physical space. As already noted, biomass fragments itself either into isolated cells,
as for phytoplankton, or into tiny organisms. This is a morphological strategy that allows an
increase in the contact between living matter and the medium. Moreover, it is well known that the
distribution of plankton (or, for that matter, of living organisms) is by no means uniform, nor
distributed at random, but patchy. That patchiness, or aggregated distribution in space, is
hierarchical. Indeed, patches are themselves heterogeneous, for they can be divided into areas of
greater or lesser density; these areas are, in turn, heterogeneous, and so on. Conversely, patches
are assembled into packs, then in packs of packs, and so on. A fractal geometry can clearly be
recognized (Fig. 7). The size distribution of plankton patches has been studied by Platt and
Denman (1977, 1978). But, contrary to the structure of organs and individuals, the various


Fig. 7. Statistical fractal of dimension 1.2, modelling the indefinitely nested scattering of biological organisms in space.

levels are not limited by membranes or walls. Limits are fuzzy, hence the intervals between
patches created by the process are less evident. The analysis of this type of form requires another
method, which was developed by Fournier d'Albe (1907) and used by Mandelbrot for studying
the distribution of galaxies in the sky; galaxies are separated from living organisms only by "a
few" orders of magnitude, say 15 or 16.

Starting from a material point (either a galaxy, or a planktonic cell), neighbouring
particles are included in spheres of increasing radii (Fig. 8). At each step, the average density of
points per unit volume is calculated. Because of the patchy distribution of material points, the
spheres include larger and larger empty areas lying between the groups, so that the density of
points decreases; the rate of this decrease indicates the fractal dimension. Namely, at each step of
generation of the fractal structure, the "cascade" is such that n small clouds are included into a
cloud of clouds k times larger in linear size, so that the self-similarity dimension is d = log n / log k.
If the number of points inside a sphere of radius r is proportional to r^d, and the sphere
volume is proportional to r^3 (in a tridimensional Euclidian space), then the average density is
proportional to r^(d−3). So, in a log-log graph (Fig. 8b), one can fit a line of slope (d − 3), from
which the fractal dimension d can be inferred directly.
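As a check of this slope recipe, here is a short sketch with synthetic counts that follow n ∝ r^d exactly for d = 1.2 (the function name and data are mine):

```python
import math

def sphere_density_dimension(radii, counts):
    """Fit ln(n/V) against ln(r); the slope is (d - 3), so d = slope + 3."""
    xs = [math.log(r) for r in radii]
    ys = [math.log(n / ((4.0 / 3.0) * math.pi * r ** 3))
          for r, n in zip(radii, counts)]
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope + 3.0

radii = [1.0, 2.0, 4.0, 8.0, 16.0]
counts = [30.0 * r ** 1.2 for r in radii]       # n proportional to r**1.2
print(round(sphere_density_dimension(radii, counts), 3))   # → 1.2
```

With field data the counts would of course be noisy, and the fit would only be attempted over the range of radii where the log-log plot is linear.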

The method could be applied to plankton swarms, as an alternative to the method of
Platt and Denman (1977, 1978), which is a Fourier spectral analysis of continuous records such as
those of chlorophyll in the sea. It can be expected that, within a given interval of scales, the
fractal dimension of plankton patches will be that of turbulence; for a scale smaller than that,
where Platt and Denman found a spectral discontinuity, another dimension has to arise,
corresponding to the influence of living processes, which do not have an effect at larger scales.
To my knowledge, the fractal approach has not been used yet to study swarms of organisms,
although the description of spatial patterns has been done using the related method of periodogram
analysis (Ripley 1981). It would be exciting to try the fractal approach using a continuous
plankton record, for instance. Such a data sequence would describe the intersection of a fractal
object (the spatial distribution of plankton which extends in a tridimensional space) with a line,
namely the ship trajectory. Assuming (provisionally) that the spatial distribution is isotropic, we
can infer the fractal dimension of the swarm from that of the linear record, using the rule that the
intersection of two fractal objects is a fractal, the dimension of which is equal to the sum of the
dimensions of the two original objects, minus the dimension of the container space. The same
rule applies to the intersection of a fractal with a non-fractal object. Hence, if D and d are the
fractal dimensions of, respectively, the swarm and the continuous record, d = D + 1 - 3 = D - 2,
then D = d + 2. But it is probable that plankton patchiness is anisotropic, in which case we
should find different fractal dimensions in the different directions of space, as it has been
observed in meteorology.
Fig. 8. Determining the fractal dimension of a cloud of points. Clouds of points of increasing sizes (scattered in 3-dimensional space) are marked by dashed lines. a: Spheres of increasing radii r, centered on any one of the points, are intercepting a smaller and smaller density of points, for they include larger and larger empty areas. b: If d is the fractal dimension, the number of intercepted points is n ∝ r^d; since the volume of the sphere is V ∝ r^3, then the density of points inside a sphere is n/V ∝ r^(d-3). That results in a line of slope (d - 3) in a log-log graph, so that d can be directly estimated from the slope of that line.


4. FRACTAL DIMENSION OF THE MOVEMENTS OF ORGANISMS

The geometry of plankton patches, as described above, is strongly linked to


hydrodynamic turbulence, which contributes to the renewal of water close to the organisms.
Planktonic organisms are largely passive, and they are the prey of larger and more mobile predators. These predators have to travel, searching for prey swarms; they exploit them, sometimes exhausting them, before they go searching for other patches. In that process, they consume a part of the energy assimilated, and an optimization problem arises about these trips, which have to ensure the best probability of encountering prey with the least energy expenditure.

As a matter of fact, it can be observed that the behaviour of predators is complex and
stratified (hierarchical). As soon as a prey species is located, the very broad exploration of the
hunting area is replaced by a more specific behaviour within a smaller spatial range. That is a
response to the patchiness of the prey population because, by definition of an aggregated
distribution, the probability of a prey item existing at a point is enhanced by the presence of other
prey in the vicinity, and conversely. Hence the predator alternately displays a scanning behaviour, including straight-line travel from one patch to another, then a more Brownian motion inside a patch, probably within a hierarchical pattern due to the hierarchical distribution of patches. It can be conceived (but has yet to be proved) that such a "cascade" or intermittency of behaviours occurs in conformity with the fractal pattern of the prey distribution. The predator trajectory resembles a Brownian motion which would be divided hierarchically, following a cascade of levels. The movement cannot be perfectly Brownian in detail because, in animal trajectories, the direction at one instant is positively correlated with that of the previous moment, since sharp changes of direction are costly. Mandelbrot developed a "fractional Brownian"
model of a trip; Figure 9b presents an example of such a motion, with a fractal dimension of 1.11.
A true Brownian motion has a dimension of 2 (Fig. 9a), that is, each point of the plane is likely to
be occupied once by the travelling molecule, which is not the case for the hunting predator.
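Such trajectories are easy to simulate. The Python sketch below uses a correlated random walk, a simple stand-in for Mandelbrot's fractional Brownian trips (the turning-angle parameters are illustrative): persistence of heading lowers the fractal dimension of the trail from the Brownian value of 2 towards 1.

```python
import math
import random

def correlated_walk(n, turn_sd, seed=1):
    """Random walk whose heading changes by a small Gaussian increment
    at each step, so each direction is positively correlated with the
    previous one, as in animal trajectories."""
    rng = random.Random(seed)
    x = y = heading = 0.0
    path = [(0.0, 0.0)]
    for _ in range(n):
        heading += rng.gauss(0.0, turn_sd)
        x += math.cos(heading)
        y += math.sin(heading)
        path.append((x, y))
    return path

def msd_exponent(path, lags):
    """Slope H of log RMS displacement against log lag; the trail's
    fractal dimension is about 1/H (2 for Brownian, 1 for straight)."""
    pts = []
    for lag in lags:
        d2 = [(path[i + lag][0] - path[i][0]) ** 2 +
              (path[i + lag][1] - path[i][1]) ** 2
              for i in range(0, len(path) - lag, lag)]
        pts.append((math.log(lag), 0.5 * math.log(sum(d2) / len(d2))))
    n = len(pts)
    sx = sum(a for a, _ in pts); sy = sum(b for _, b in pts)
    sxx = sum(a * a for a, _ in pts); sxy = sum(a * b for a, b in pts)
    return (n * sxy - sx * sy) / (n * sxx - sx * sx)

lags = [16, 32, 64, 128]
H_b = msd_exponent(correlated_walk(100000, turn_sd=3.0), lags)   # ~ Brownian
H_p = msd_exponent(correlated_walk(100000, turn_sd=0.02), lags)  # persistent
print(f"trail dimension: {1/H_b:.2f} (Brownian-like), {1/H_p:.2f} (persistent)")
```

With a nearly uncorrelated heading the trail dimension comes out near 2, while strong persistence pushes it towards 1, the straight-travel limit.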

Another pattern investigated by Mandelbrot to describe the distribution of galaxies (if


not to explain it) characterizes the behaviour of an inhabitant of the sky. The model is called
"Rayleigh flight"; an angel is assumed to have traveled by steps, changing direction after each step following a uniform distribution of the directions, while the distribution of lengths of the straight segments between two stopping points is hyperbolic: Pr(U > u) = u^(-d). The pattern of points so
Fig. 9. a: Classical Brownian motion, d = 2. Dots are the successive positions of the particle. Segments are interpolated trajectories between two dots. b: Fractional Brownian motion of dimension d = 1.11. From Mandelbrot (1982), with permission.

obtained varies according to the value of d, which is a fractal dimension, and a whole range of
values of d give a plausible representation of the natural patterns (Fig. 10).
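A Python sketch of such a flight (with illustrative parameters), drawing step lengths by inverse-transform sampling so that Pr(U > u) = u^(-d) for u >= 1:

```python
import math
import random

def rayleigh_flight(n_steps, d, seed=0):
    """Flight with uniformly random directions and hyperbolic step
    lengths: Pr(U > u) = u**(-d) for u >= 1."""
    rng = random.Random(seed)
    x = y = 0.0
    stops = [(0.0, 0.0)]
    for _ in range(n_steps):
        theta = rng.uniform(0.0, 2.0 * math.pi)   # isotropic direction
        u = rng.random() ** (-1.0 / d)            # inverse-transform sampling
        x += u * math.cos(theta)
        y += u * math.sin(theta)
        stops.append((x, y))
    return stops

stops = rayleigh_flight(50000, d=1.5)
lengths = [math.dist(a, b) for a, b in zip(stops, stops[1:])]
tail = sum(L > 10.0 for L in lengths) / len(lengths)
print(f"empirical Pr(U > 10) = {tail:.4f}, theory = {10.0 ** -1.5:.4f}")
```

Plotting the stopping points for different values of d reproduces the contrast of Figure 10: rare, very long jumps separate dense clusters, and the clustering sharpens as d decreases.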

It would be interesting to analyze the trails followed by animals from that fractal point of view, as well as the conformity of that fractal line to the fractal pattern of distribution of the
prey in space; I believe it may reflect a fundamental process in the management of biomass and
energy in ecosystems. It is now admitted in ecology that displacements of matter have a
fundamental importance in ecosystems; considering aquatic ecosystems, water movements and the
passive or active movements of biomass have complementary effects. Primary productivity is
enhanced at the level of an interface, or "ergocline" (Legendre and Demers 1985), for turbulence

puts phytoplankton cells in contact with nutrients and with light. Furthermore, animals consume that primary production, and they are consumed in turn following the trophic chain, which is usually associated with a set of migration behaviours. The larger organisms eat the smaller ones while at the same time they undertake longer migrations, so that an overall migration of the biomass occurs, from the zones with high primary productivity to the zones of low productivity,

Fig. 10. Rayleigh flights of fractal dimension 1.0 and 1.5. From Mandelbrot (1977), with
permission.

[Figure 11: diagram. Recoverable labels: "turbulence: interpenetration of water masses"; "increase of phytoplankton biomass and production in the vicinity of the front".]

Fig. 11. Diagram of a hydrological front, associated primary production, and exploitation by trophic chains. Production is increased at the interfaces between warm and cold waters (ergoclines). Contact surfaces are enhanced by turbulence, which possesses a fractal geometry enhancing the complementarity of the water masses. Primary production is exploited by animal biomass through a fractal "cascade" of sizes of organisms and of trajectories. Full lines are interfaces between the two water masses. Dashed lines represent biomass transport; the trajectories are also fractal curves, as in Fig. 9b and Fig. 10.

as schematized in Figure 11. This is another aspect of the fractal organization of ecosystems: as
the size distribution of organisms is a fractal, the set of movements (at all scales of magnitude)
probably is another one, which is linked to the distribution of sizes. Both are aspects of a strategy
of space-and-time occupation. We can hypothesize that this strategy tends to optimize the flows
of matter and energy. Such an organization is no longer a physical fractal, for no physical form
(systematic or statistical) is measured here, but only the size of the spatio-temporal domain
involved in a trajectory. This is then an abstract case of fractal geometry. I will now look at even
more abstract fractal objects.

5. FRACTALS IN AN ABSTRACT REPRESENTATIONAL SPACE

These fractals aim at modelling processes; they can be graphed in a phase space, or they can be useful in the abstract description of networks of interacting elements. Let me give some ecological examples.

5.1 - Strange attractors

The dynamics of an ecological system often leads to a strange attractor, as shown by


May (1974, 1975, 1981), Goodman (1975), and Meyer (1980, 1981), among others. In a phase
space, representing for example the changes of state of a multispecific community, the trajectory,
corresponding to successive phases of evolution of the system, never passes exactly through the
same points. In many cases, the trajectory of community evolution passes through zones of high
curve density. In other words, certain regions of the phase space are much more frequently
visited than others; it is an intermediate state between an entirely random and an entirely
deterministic behaviour. Cutting the phase space by a plane, in order to "summarize" the
trajectory with only two degrees of freedom, a particular pattern of intersection points appears,
showing areas of high and low density. This set of points forms a "strange" object with fuzzy
limits, and shows a certain degree of symmetry and repeatability at various scales of observation.
An example (from Ekeland 1984) is given in Figure 12.

An interesting feature is that these strange attractors, that look like "forms in the fog",
are fractal objects. So, the mixture of order and chaos, or of determinism and indeterminism, that
characterizes this mixture of biological or ecological evolutionary solutions, has, as a matter of
fact, a fractal structure. That evidence will probably play an important role in future ecosystem
modelling, although the functional significance of the fractal structure of ecosystem evolution is
still unknown; perhaps it is again a fractal occupation of space-and-time.
Fig. 12. Strange attractor. From Ekeland
(1984), with permission.
a: Trajectories in the phase space. T is a
stationary attractor. T' is the location of a
strange attractor that produces denser
trajectories in certain regions of the phase
space.
b: When intersected by a plane, it results in a "strange" picture with crisp and fuzzy parts.
c: Magnifying a part of (b) results in another "strange" picture at another scale of observation. A mixture of fuzzy and crisp parts is observed again. A part of (c), magnified, reproduces the pattern seen in (b),
thus showing that the strange attractor has a
fractal geometry.

The theory of strange attractors has been developed for rather complex physical
systems, but these attractors may be obtained from a very simple set of equations. These
equations cannot be solved analytically, but only stepwise, using a computer. The example in
Figure 12 corresponds to a very simple system of equations, called "Hénon's formula" (following Ekeland 1984):
x_{n+1} = x_n · cos a - (y_n - x_n^2) · sin a
y_{n+1} = x_n · sin a + (y_n - x_n^2) · cos a

After a large number of iterations, made possible thanks to the power of present-day
computers, a set of drawings appears in any plane that may intersect the set of rings of the
multidimensional trajectory. The picture becomes more and more distinct as the iterations are
pursued. After a while, it can be observed that the cloud of points is a fractal: any part of the
cloud contains a miniature model of the whole. For the time being, its fractal dimension cannot be
computed analytically, but only observed. It accounts for a certain degree of occupation of the
phase space by the trajectories.
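A minimal Python sketch of this iteration (the rotation angle a and the starting point are illustrative choices, not taken from Ekeland): orbits started close enough to the origin remain bounded, and plotting the successive points would reproduce the kind of picture shown in Figure 12.

```python
import math

def henon_orbit(x0, y0, a, n):
    """Iterate the area-preserving quadratic map written above."""
    x, y = x0, y0
    pts = [(x, y)]
    for _ in range(n):
        x, y = (x * math.cos(a) - (y - x * x) * math.sin(a),
                x * math.sin(a) + (y - x * x) * math.cos(a))
        pts.append((x, y))
    return pts

# Illustrative parameters: a is the rotation angle of the linear part,
# and the starting point lies near the elliptic fixed point at the origin.
orbit = henon_orbit(0.1, 0.1, a=1.328, n=10000)
print(f"{len(orbit)} points, max radius = "
      f"{max(math.hypot(x, y) for x, y in orbit):.3f}")
```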

Ibanez and Etienne (submitted) applied a method due to Grassberger and Procaccia (1983) in order to assign a fractal dimension to a series of 1200 chlorophyll observations along a transect in the sea. For each observation x(t), they considered the points with coordinates x(t), x(t - τ), x(t - 2τ), ..., x(t - kτ), τ being equal to an integer multiple of the sampling step, and k varying from 2 to 9. So k is the Euclidean dimension of a phase space, in which the cloud of points can be described by a fractal dimension if the chlorophyll record shows any stochastic regularity. It was observed that the fractal dimension of the attractor increases up to 2 as k varies from 2 to 6, after which it remains constant. According to Ibanez and Etienne, this means that six degrees of freedom are sufficient to describe the fractal regularity (of fractal dimension 2) of this sequence of 1200 observations. It remains to investigate what these 6 degrees of freedom are, and to search for their ecological significance. It would be interesting to compare this result with that of a Fourier spectral analysis, or other data treatments.

5.2 - Species diversity as a fractal characteristic of a complex community

5.2.1 - The fractal dimension of an information system. This is not a problem of


geometry any more, not even in an abstract space, but a problem of graph theory, which is the
mathematical description of the links among elements.

Mandelbrot (cited in Landman and Russo, 1971) gave the following technical example. Consider a number C of internal elements or "components" of a computer unit, and a number T of connections with its environment or "terminals". Computer engineers have observed the existence of an allometric relationship of the form T ∝ C^(2/3), equivalent to T^(1/2) ∝ C^(1/3), as if T represented a "surface" of the computer, and C a "volume". So, the quantities T^(1/2) and C^(1/3) do represent some measure of the "linear size" of the computer. Landman and Russo (1971) further demonstrated that the computer efficiency actually varies as a function of a certain parameter d. Mandelbrot demonstrated that d is a fractal dimension, and is such that T ∝ C^((d-1)/d). This law seems to make explicit an optimal condition for the balance between the number of internal and of external connections. It is tempting to generalize it to other kinds of complex information systems, including ecosystems, which are managing a great quantity of information. Such a relationship could be related to other global properties of the system, such as permanence, elasticity, etc.
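As a numerical illustration (with hypothetical (C, T) pairs, not Landman and Russo's data), the exponent p = (d - 1)/d can be fitted by log-log regression and then inverted to give d = 1/(1 - p):

```python
import math

# Hypothetical (components, terminals) pairs following T = 2 * C**(2/3);
# the fitted exponent p = (d - 1)/d recovers the dimension d = 1/(1 - p).
data = [(c, 2.0 * c ** (2.0 / 3.0)) for c in (10, 100, 1000, 10000)]

xs = [math.log(c) for c, _ in data]
ys = [math.log(t) for _, t in data]
n = len(xs)
p = (n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)) / \
    (n * sum(x * x for x in xs) - sum(xs) ** 2)

d = 1.0 / (1.0 - p)
print(f"exponent p = {p:.3f}, fractal dimension d = {d:.1f}")
# prints: exponent p = 0.667, fractal dimension d = 3.0
```

The observed exponent 2/3 thus corresponds to d = 3, consistent with the surface-to-volume reading of T and C above.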

5.2.2 - Diversity and evenness. The number of species of a natural community is


neither an extensive nor an intensive variable. An extensive variable is, for example, the biomass,
which is thought to be proportional to the surface (or volume) sampled. An intensive variable is,
for example, the salinity of water or its temperature, which does not double when the sample size goes from 1 to 2 cubic decimetres. The number of species recorded increases when the volume
or the number of individuals sampled increases, but not proportionally; its increase may be
logarithmic, for example. Then the "average density of species per cubic or per square meter", or
the "average number of species per individual", which can be calculated but have no meaning by
themselves, decrease when the space or the total number of individuals increase, just like a fractal
quantity, and just like material points scattered through space.

The species diversity of a community (or of a sample) is usually characterized by


numerical indices. The one presently most commonly used is the Shannon-Weaver index:

H = - Σ_{i=1}^{S} f_i log2 f_i

where S is the number of species and f_i is the relative frequency of the i-th species. This expression represents the average amount of information per individual, knowing that each individual brings, when determined, a quantity of information that is larger when its frequency is smaller. It is easy to show that the maximum value of H is obtained when all species are equally frequent; when the distribution of individuals among species is uniform, H_max = log2 S. The ratio J = H/H_max is called the evenness. The diversity index can then be written as (H/H_max) · H_max = J · log2 S; in other words, diversity is the product of its two components, evenness and number of species (on a log scale). All that is very classical.

H is the mathematical expectation E(-log2 f_i), when calculated over the set of species considered. The sample is described by its species frequency distribution, for convenience. We can write H = E(-log2 f_i) = log2 A, or A = 2^H, where A is the fictitious number of species which would give the same diversity index, if these species were equifrequent. It follows that the evenness J has the form of a fractal dimension, that is, a ratio of two logarithms: J = log2 A / log2 S, or A = S^J. The latter equation expresses the rate of increase of the diversity-equivalent number A of equifrequent species when the real number of species S increases, the evenness remaining the same. These considerations will take their full meaning in the discussion of the theory of species distributions that follows.
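These classical quantities are straightforward to compute. A small Python sketch, with illustrative abundance counts:

```python
import math

def diversity(counts):
    """Shannon index H (bits per individual), evenness J = H / log2(S),
    and A = 2**H, the equivalent number of equifrequent species."""
    total = sum(counts)
    freqs = [c / total for c in counts if c > 0]
    H = -sum(f * math.log2(f) for f in freqs)
    S = len(freqs)
    J = H / math.log2(S)
    return H, J, 2.0 ** H

# Four species with uneven abundances (illustrative counts).
H, J, A = diversity([50, 30, 15, 5])
print(f"H = {H:.3f} bits, J = {J:.3f}, A = {A:.2f} equivalent species")
# The identity of the text: A = 2**H = 2**(J * log2 S) = S**J.
assert abs(A - 4 ** J) < 1e-9
```

A lies between 1 (one species completely dominant) and S (all species equifrequent), which is what makes it a convenient "equivalent number of species".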

5.2.3 - The distribution of individuals among species. One may wish to have species
diversity given by a synthetic description which would be a little more informative than a simple
one-number index. The distribution of individuals among species can be used as a synthetic
parameter; it can take the form of a histogram describing the proportion of frequent, less frequent,
and rare species, divided into a number of classes. Various well-known models have been
proposed in the literature to fit such distributions, as for example the log-normal distribution of
Preston, etc. (Pielou 1975; Legendre and Legendre 1983). Of course, the empirical distribution
can also be used, without being fitted to any model, as a mere synthetic description of species
distribution within a sample.

The distribution often cannot be represented by a histogram because the total number of
species is too small; then, the number of species belonging to each class of abundance is low, and
the histogram becomes very irregular and uninformative. The distribution can, in that case, be
represented as a function of ranks: it is the rank-frequency method (Frontier 1976, 1985).
Species are ordered in decreasing frequencies, as they appear in the community or in the sample.
Each species is represented by a point in a diagram, with rank on the abscissa, and frequency on
the ordinate. The scatter diagram, or "Rank-Frequency Diagram", is monotonically decreasing by
construction, but the shape of the decrease (either linear, or convex, or concave, with steps, etc.)
gives a lot of information about the distribution. This representation is exactly equivalent to a
retrocumulated frequency function, for it is equivalent to saying that the species with rank r has the frequency f_r, or that for a fraction r/S of the species, their frequency is larger than or equal to f_r.

To make the curve easier to read, a logarithmic transformation is applied either to f_r alone, or to both f_r and r. The same overall shapes and the same characteristic deformations of
the curve can be found in all types of ecosystems and communities, either terrestrial or aquatic;
either for insects, zooplankton, phytoplankton, fish, etc. I gave a number of examples in a recent
review (Frontier 1985). Some of them are reproduced here in Figure 13.
Fig. 13. Some examples of rank-frequency diagrams. The ranks of the species are on the abscissa while their relative frequencies in the sample are on the ordinate, both on a log scale. a: Marine benthos, along a pollution gradient (modified from Hily 1983). b: Lake phytoplankton along a seasonal ecological succession (modified from Devaux 1980). c: Euphausiids in an East-West transect along the Pacific Equatorial Current (modified from Frontier 1985).

Besides ecology, these diagrams have been used in the past in several other fields, also
dealing with complex interaction systems. Characteristic distributions have been described and
analyzed in socio-economics (Pareto 1896, 1965) and in linguistics (Zipf 1949-1965). Observed
frequency distributions have been fitted to a family of curves, given by the Zipf model, which was not very well-known to ecologists until recently:
f_r = f_1 · r^(-γ)
Later, the Mandelbrot model, which is a generalization of the Zipf model, was used for the same purpose:
f_r = f_0 (r + β)^(-γ)
where β and γ are parameters, and f_0 is chosen such that the sum of all f_r values predicted by the model is 1; the f_r values are relative frequencies. Convergence is possible only when γ > 1.
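A sketch of the model in Python (the parameter values are illustrative; with a finite number of species the normalization always exists, whereas the convergence condition γ > 1 matters for the unbounded series):

```python
def zipf_mandelbrot(S, beta, gamma):
    """Relative frequencies f_r = f0 * (r + beta)**(-gamma), with f0
    chosen so that the S predicted frequencies sum to 1."""
    raw = [(r + beta) ** -gamma for r in range(1, S + 1)]
    f0 = 1.0 / sum(raw)
    return [f0 * w for w in raw]

freqs = zipf_mandelbrot(S=30, beta=2.0, gamma=1.5)
print(f"f_1 = {freqs[0]:.3f}, sum of frequencies = {sum(freqs):.3f}")
assert all(a > b for a, b in zip(freqs, freqs[1:]))  # monotone decrease
```

Plotting log f_r against log r for several values of beta reproduces the family of curves of Figure 14a, all sharing the asymptotic slope -gamma.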


Fig. 14. Mandelbrot models adjusted to rank-frequency diagrams. a: Curves f_r = f_0 (r + β)^(-γ) with β positive or negative, all with the same asymptotic slope -γ. The fractal dimension is 1/γ. b: Behaviour of the model during ecological succession (Frontier 1985). (1) An opportunistic community; (2) evolution of this community until it reaches a high diversity level; (3) a mature community with a moderate and stable diversity. The slope -γ is assumed here to be constant through the succession, although it may vary in real cases.

Drawn on a log-log scale, the curve whose equation is
log f_r = log f_0 - γ · log(r + β)
is asymptotic to a line of slope -γ, and it departs from this asymptote in the left part of the diagram, downwards when β > 0 and upwards when -1 < β < 0; for analytical reasons, β cannot be less than -1. Mandelbrot did not consider negative values of β in his theory. When β = 0 (the Zipf model), the curve is identical to the asymptote (Fig. 14).

The first interpretation of these curves, made by Mandelbrot (1953), refers to the notion
of a "cost" of an element in an information system, in the framework of information theory. It
does not specify the nature of this cost, nor does it give it a precise value, for the rank-frequency
distribution is very robust in that respect. The distribution of the frequency of words in a
language may respond to a psycho-physiological cost, or perhaps to a sociological one linked with
the amount of time required to assimilate a new notion. Without specifying this cost, a good fit of
the model to data can be observed in the case of real languages, but not in artificial languages such
as Esperanto, nor in the language of young children. Analyzing the diversification of signals in a
code, Mandelbrot demonstrated that the above equation corresponds to an optimum in information
transfer, namely: the costlier signals also have to be the rarest (obviously without disappearing
completely), and the maximum efficiency occurs for a particular distribution of frequencies, which
has precisely the form given above, with parameters β and γ which, then, have a meaning.

In ecology, the "cost of a species" is linked with the amount of assimilated energy that it
requires; for example, it is more costly in terms of energy for an ecosystem to produce and
maintain a carnivore than a primary producer, because of the loss of energy at each trophic level.
The "cost of a species" can also be related to other kinds of expenditures, expressed in terms of
accumulated information. A specialized species, for instance, has to wait for some particular
conditions to be present, or for the state of the ecosystem that allows it to appear. This introduces
a historical aspect in ecosystem theory, and leads to thinking of this "cost" in terms of required
past history.

The rank-frequency diagrams and the Mandelbrot models associated with them do not
provide proofs for these philosophical considerations. It is nevertheless very exciting to explore
the properties of the model, and to investigate possible ways of generating such distributions. Let
us come back to fractals for a moment, since Mandelbrot has recently specified a way of
generating this kind of distribution. He expressed this in the context of the analysis of a
"lexicographic tree", so-called because once again it initially dealt with languages, but it can easily
be translated into ecological terms.

Fig. 15. A lexicographic tree, following Mandelbrot (1982). The a_i, b_j and c_k are previous conditions required by species S_1, S_2, S_3, ... to appear. See text.

Let us suppose that the occurrence of a species depends on the previous realization of a number of conditions in its physical, chemical and biotic environment. The nature of these conditions is not specified; one condition can even be the previous appearance of some other species in the community. Let a_i, b_j, c_k, ... designate these previous conditions that are required by species S_r. The probability of this species is:
Pr(S_r) = Pr(a_i) · Pr(b_j) · Pr(c_k) · ... · Pr(S_r | a_i, b_j, c_k, ...)
if these conditions are independent from one another. The sequence of events can be as follows (Fig. 15):
- A ubiquitous species S_1 appears as soon as a restricted number of conditions are realized; let us represent this first set by a single condition a_1, so that
Pr(S_1) ∝ Pr(a_1)
- If the second species requires conditions a_2 and b_1, then
Pr(S_2) ∝ Pr(a_2) · Pr(b_1) < Pr(a_1)
since all the probabilities are assumed to be small and of the same order of magnitude.
- For the third species to be allowed to occur, let us suppose that the conditions are a_2, b_2 and c_1; then
Pr(S_3) ∝ Pr(a_2) · Pr(b_2) · Pr(c_1) < Pr(a_2) · Pr(b_1)
and so on. The sequence is theoretically infinite: whatever the number of species having appeared

at any given time, it is always possible to expect one more to appear in the future. The only condition that was stated, in Mandelbrot's demonstration, is that the probabilities for the occurrence of the "previous conditions" be small, compared to the probability for the species to occur when the previous conditions are met. With these very broad conditions, the probability of a species S_r is a function of its rank r in the frequency distribution, of the form

Pr(S_r) = P_0 (r + β)^(-γ)

where P_0, β and γ are the same parameters as above. In the course of Mandelbrot's demonstration, it appears that the parameters β and γ have a functional importance. Directly transposing his words to ecology, β is linked with the diversity of the environment, that is, with the average number of modalities of type a_i, or b_j, or c_k, etc. at each level. On the other hand, 1/γ is linked with the predictability of the community, that is, the probability of a species to appear when the conditions that it requires have been met. This is of great interest, because environmental diversity and predictability of the organic assemblage are two important elements determining the composition of a community, as is well-known in ecosystem theory.
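The cascade of conditions can be mimicked numerically. The following Python sketch assumes, for illustration only, that every condition has the same probability q and that each level offers m modalities; ranking the resulting species probabilities then recovers, approximately, the power law with γ = log(1/q)/log m.

```python
import math

# Illustrative sketch of the lexicographic-tree argument: every extra
# condition has probability q and each level offers m modalities, so a
# species at depth k has probability q**k while about m**k species
# rank ahead of it.
m, q, K = 3, 0.2, 9

# Rank and probability of the last species of each depth.
ranks = [(m ** (k + 1) - 1) // (m - 1) for k in range(K + 1)]
probs = [q ** k for k in range(K + 1)]

# Least-squares slope of log Pr(S_r) against log r: close to -gamma,
# with gamma = log(1/q) / log(m); its inverse 1/gamma is < 1 here.
xs = [math.log(r) for r in ranks]
ys = [math.log(p) for p in probs]
n = len(xs)
slope = (n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)) / \
        (n * sum(x * x for x in xs) - sum(xs) ** 2)
print(f"fitted gamma = {-slope:.2f}, "
      f"predicted gamma = {math.log(1 / q) / math.log(m):.2f}")
```

In this toy version the environmental diversity is the branching number m and the predictability enters through q, in line with the interpretation of β and 1/γ given above.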

Finally, it can be stated that 1/γ is a fractal dimension (< 1): it is the dimension of a fractal representing the set of species abundances as forecasted by the model; in other words, it is the fractal dimension of the "species distribution", or distribution of the individuals among species, studied as to its diversity. Diversity is then a fractal property of the biomass. The demonstration that 1/γ is a fractal dimension rests on Cantor sets (Mandelbrot 1977, 1982). On the other hand, it has been shown (Frontier 1985) that 1/γ is strongly and almost linearly correlated with the evenness measure J = H/H_max; this supports the idea of using the latter as a fractal dimension. The equation A = S^J of section 5.2.2 then becomes A ≈ S^(1/γ).

The ecological interest of such a simplistic model is questionable, because any


ecosystem is a dynamic system that includes a lot of interactions and many feedback controls,
whereas a lexicographic tree seems to be a rather static structure. Better said, it describes a
language in a static fashion, founded upon the probability of a word appearing immediately after a
given series of other words. But a language is also a dynamic system, resulting in semantic
significance. In any case, an optimal frequency distribution of its elements does emerge, as is
clear also in other kinds of complex information systems such as socio-economic ones (Pareto
1896, 1965). Such a frequency distribution, in ecology, is generally summarized by a
one-number diversity index or, better, by a rank-frequency diagram. The evenness of the latter at
given observation scales indicates that the persistence of an ecosystem, which is a complex
feedback system, is only possible for a given statistical distribution of frequent and rare elements.
This distribution represents a realization of its diversity, which is stationary under the actual
constraints. Stating that the same shapes of frequency distributions are observed in many

different kinds of systems may signify that they are describing optimal conditions of general
information and dynamic systems, whatever the physical support of the information is.

In community samples observed in real ecosystems, a great variety of shapes have been
found in rank-frequency diagrams, according to the degree of complexity of the community, its
stage of evolution, its stress, the observation scale, etc. Few are found to conform exactly with a
Mandelbrot model, at least at the level of the single sample. The sampling process introduces a
statistical irregularity, of which we get an idea by superposing a number of curves describing
individual samples from the same community. The width of the bundle of curves so obtained
indicates something about the random variability. In Figure 16 for example, the population
consists of young fish of various species, coexisting in a littoral nursery sampled at various times
during a year. Superposing two sets of curves coming from two different years is an approximate
statistical test showing, in this case, that no significant difference exists between the two sets.

Fig. 16. Superposing various rank-frequency diagrams, representing samples taken from the same environment, gives an idea of the natural variability. Here are five samples of young fish in a multispecies nursery. From Safran (in press).

In this example, no Mandelbrot model can easily be fitted, due to the fact that the curves do not show much evidence of an asymptotic behaviour, so that the slope -γ cannot be estimated precisely. It seems justified to fit a Mandelbrot model only in cases where the existence of an asymptotic line is supported by the graph. In most cases, such a model will be found by cumulating a number of samples over an ecologically homogeneous area and/or time span. As a matter of fact, at too small a scale, the patchiness of the spatial distributions of the various species is biasing the overall species distribution, for in a very limited site, a small number of species are dominant, while at some other site, other species may dominate. It follows that at a given site, and consequently also in a sample, we often observe a concave or a convex rank-frequency diagram, the ordering of the species varying from sample to sample. Summing the number of individuals sampled, species by species, over a set of samples, results in a curve more extended towards the right; then a Mandelbrot-like distribution is found, as in curve b of Figure 17, which has γ = 3.54 and β ≈ 12. On the contrary, summing the numbers of individuals rank by rank,

Fig. 17. Two ways of summing frequencies to get a "mean" rank-frequency diagram. a: Summing by ranks, i.e., total of individuals of species of rank 1, whatever the species name is in the various samples; then, total of individuals of species of rank 2; etc. That produces an average of the individual sample curves, without increasing the number of species. b: Summing species by species; species are ranked after summing their abundances over the set of samples. This increases the total number of species, so that the shape of the rank-frequency diagram is different. From Safran (in press).

independently from the actual species names, provides an "average" curve (Fig. 17, curve a) that
passes through the center of the bundle of sample curves, and cannot be fitted to a Mandelbrot
model.

CONCLUSION

I have presented in this paper many more working hypotheses and questions than
results. Up to now, fractal geometry has been applied very little to ecological problems;
nevertheless, it seems to offer perspectives that are not trivial. Our short exploration through
forms, spatial distributions, movements of organisms, size distributions, strange attractors,
species diversity and species distributions indicates that fractal properties go far beyond
morphological analysis, which calls upon fractals only in physical space. We have to rephrase the
discussion in terms of the dynamics of the interactions of a system, made of a biomass divided
into various populations, size classes, trophic levels, and so on, with its physical environment.
These interactions imply a fractal geometry of surfaces and of sets of contact points. An
ecosystem could not exist if it were made only of lines (D=1), surfaces (D=2) and volumes (D=3),
as engines are (because we made them so), and as the Greek philosophers tried to describe the
world. Interactions imply a "fractal" kind of complexity in time and in space. In that sense,
fractal geometry provides a new tool, and a new paradigm, for analyzing that mixture of order and
chaos that classical science had up to now generally avoided, but that numerical ecology can now
grasp.

I am grateful to B. Mandelbrot for useful suggestions, and to S. Ferson and P.
Legendre for discussions and editorial work.

APPENDIX: ELEMENTS OF FRACTAL THEORY

Fractal theory was introduced by Mandelbrot, first in a book in French in 1975,
"Les objets fractals: forme, chance et dimension" (Flammarion, Paris), then in English, "Fractals.
Form, chance, and dimension" (Freeman and Co., San Francisco, 1977), with a second edition
in 1982 entitled "The fractal geometry of nature". The fundamentals of fractal theory are brought
together in these books, which summarize the papers of the author and of others on the subject.

What is a fractal? Initially, the term designates a geometrical object with a non-integer
dimension. Such an expression may be astonishing, for we usually describe real and conceptual
spaces in terms of points (dimension = 0), lines and curves (dimension = 1), surfaces (dimension
= 2) and volumes (dimension = 3). Furthermore, multivariate analysis and phase space analysis
have accustomed us to speak about Euclidean spaces with 4, 5, ... N dimensions, N being always
an integer.

Fractal geometry then allows one to describe conceptual or concrete objects that realize
"a certain degree" of occupation of a bi- or tri-dimensional Euclidean space, somewhere between a
curve and a surface, or between a surface and a volume. The "fractal dimension" has to be
considered as a measure of that degree of occupation, following a mathematical rule that identifies
the properties of the index with those of a "dimension" in the usual sense. An integer dimension
turns out to be a particular case of a generalized fractional dimension. This mathematical theory
had already been developed by previous mathematicians such as Hausdorff (1919) and
Besicovitch and Ursell (1937). Mandelbrot used and deepened these previous theories in order to
make them applicable to the description of the real world, and this attempt was extraordinarily
fruitful since it allowed one to describe the various states of fragmenting and branching out of
living and non-living matter.

1 - Fractals in geometric space. As a first example, let us construct a "fractal line". We
need two initial concepts: an initiating element, for instance a straight segment of length 1, called the
initiator; and a generator, which is a rule for progressively transforming the segment into the
final fractal pattern. The rule consists of a particular and simple transformation, indefinitely
repeated. Let me describe here, as an example, the construction of the so-called "Koch triadic curve"
(Fig. 18). The initial segment of length 1 is divided into three segments of equal lengths 1/3; the
middle one is removed and replaced by two segments of length 1/3, following the shape of an
equilateral triangle. Then, the same process is repeated separately on the four segments of length
1/3 previously obtained, resulting in twelve segments of length 1/9, with the shape of four
equilateral triangles. The twelve latter segments undergo the same transformation, and so on,
indefinitely repeated from scale to scale, finally resulting in an infinitely indented curve. That final
curve, of course, cannot be drawn; only the successive stages of its generation can. The total
length of the fractal line is infinite: indeed, at each step of the generating process, the previous
length is multiplied by 4/3. Despite that infinite length, the curve is clearly bounded inside a finite
part of the plane. Being of infinite length and of zero surface area, the "fractal line" lies
somewhere between a finite line and a finite area, since it realizes a certain degree of occupation of
a finite area by an infinitely contorted line. At first sight, the final line may look like it has a
"thickness" but, when blown up, that thickness resolves into a more detailed curve, and so on,
indefinitely.
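The generator just described is easy to reproduce numerically. The following minimal Python sketch (the function names are my own, not from the text) represents the curve as a list of points in the complex plane and applies the Koch rule repeatedly; each pass replaces every segment by four segments one third as long, so the total length is multiplied by 4/3.

```python
import cmath

def koch_step(points):
    """Apply the Koch generator once: replace each segment by four
    segments of one third its length, with an equilateral-triangle bump."""
    rot = cmath.exp(1j * cmath.pi / 3)  # rotation by 60 degrees
    out = []
    for a, b in zip(points, points[1:]):
        p = a + (b - a) / 3          # first third point
        q = a + 2 * (b - a) / 3      # second third point
        peak = p + (q - p) * rot     # apex of the equilateral triangle
        out.extend([a, p, peak, q])
    out.append(points[-1])
    return out

def total_length(points):
    return sum(abs(b - a) for a, b in zip(points, points[1:]))

curve = [0 + 0j, 1 + 0j]   # initiator: a straight segment of length 1
for _ in range(4):         # four applications of the generator
    curve = koch_step(curve)

print(len(curve) - 1)                 # 4**4 = 256 segments
print(round(total_length(curve), 4))  # (4/3)**4, about 3.1605
```

Iterating further makes the measured length grow without bound, while the curve stays inside a finite part of the plane.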

How can we talk about a "dimension"? Let us remember the usual meaning of a
dimension 1, 2 or 3 of a geometric object. Dividing a segment of length 1 metre into N equal
Fig. 18. The Koch triadic curve: initiator, generator, and successive stages. d = log 4 / log 3 = 1.2619.


segments of length (1/N) metres, we can construct on this segment either a square of 1 m²
containing N² squares of (1/N²) m² each (Fig. 19a), or a cube of 1 m³ containing N³ cubes of
(1/N³) m³ each (Fig. 19b). The respective dimensions of segment, square and cube are 1, 2 and
3. Each smaller element contained in the initial one is similar to the latter, and we call that
self-similarity. The fragmenting process can be indefinitely repeated, with the principle of
self-similarity respected between successive steps. Generally speaking, when one element is
partitioned into k self-similar ones, whose linear size is N times smaller, then the dimension d is
such that

k = N^d, or d = log k / log N
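This rule takes one line of code. The sketch below (the function name is illustrative, not from the text) evaluates d = log k / log N for the figures used in this appendix:

```python
from math import log

def similarity_dimension(k, n):
    """Dimension of an object partitioned into k self-similar copies,
    each n times smaller in linear size: d = log k / log n."""
    return log(k) / log(n)

print(round(similarity_dimension(4, 3), 4))  # Koch curve: 1.2619
print(round(similarity_dimension(3, 2), 4))  # fractal tree: 1.585
print(round(similarity_dimension(2, 3), 4))  # Cantor dust on a line: 0.6309
print(round(similarity_dimension(8, 2), 4))  # cube: 3.0
```

An integer result, as for the cube, is simply the particular case noted in the text.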

Fig. 19. a: Square, d = log 4 / log 2 = 2.0. b: Cube, d = log 8 / log 2 = 3.0.
In the case of the Koch curve, self-similarity is obvious for, after indefinite generation
of the form, any part is a miniature model of the whole. Each element contains 4 elements 3 times
smaller, so that the dimension is:

d = log 4 / log 3 = 1.2619
This is a fractional, or "fractal", dimension. Another example of a fractal line is a "tree" (Fig. 20),
whose ecological significance is described in the main part of this paper. Starting again with a
straight segment of length 1, two branches are added, branching out from the middle point of the

". ..... ------- .... ---,..


.... .... ,
,--,
,/
"",
"."
.....
.....
-- .............
I "I " X , ,
I I \
11/'''', \
, , I ' J I
I \ \ J I
,,
,''''--'''' -
\
"/ , ...
--
\ \
\
,'.....
,, ,/
/
' ....
......
... ".

'......... ........ _ - - -*"'* ..,,"


".

Fig. 20. .a: Geometric fractal tree, d = log 3/log 2 = 1.585. The partial trees (or" branches" , that
are miniature models of the tree) are surrounded by dashed lines, and are self-similar.
11: Statistical fractal tree, d '" 1.6; partial trees are statistically self-similar.

Fig. 21. Cantor dusts. a: On a line, d = log 2 / log 3 = 0.631. b: In the plane, d = log 4 / log 7 = 0.712.

previous segment, giving three branches, each of length 1/2, plus one stem. Then each of the
three branches is submitted to the same generator, each one giving three sub-branches of length
1/4, and so on. At each step of the generation, each of the terminal segments is replaced by one
trunk and three branches, so that its total length is multiplied by 2. Since the remaining part of the
tree remains the same, the total length of the ramified object tends to infinity. The final object is
self-similar because, after indefinitely branching out, any branch or sub-branch is a miniature
model of the whole "tree". The fractal dimension is found by considering that, at each step, one
tree bears 3 sub-trees whose linear size is twice smaller, hence

d = log 3 / log 2 = 1.5850
That represents a higher degree of occupancy of a portion of the plane by a fractal curve than in
the case of the Koch curve. For other examples, refer to the books of Mandelbrot, which provide a
wide variety of fractal patterns with dimensions between 1 and 2.

A fractal dimension less than 1 can be obtained with a generator rather close to that of
the Koch curve. The middle segment of the three is removed, at each step, without being replaced
(Fig. 21a). At the limit, there remains an infinite set of points (or "Cantor dust") showing couples
of points, couples of couples, and so on. At each step of the generating process, any segment is
replaced by 2 segments 3 times smaller, so that the fractal dimension is

d = log 2 / log 3 = 0.6309

The fractal picture represents a rather low degree of occupancy of a line by an infinite set of
points. The total length of the set of points is obviously zero.

In the plane, a Cantor dust can be built in two dimensions, for example (Fig. 21b), by

constructing groups of 4 squares, each one containing 4 squares 7 times smaller (in linear size).
The final picture is self-similar, with dimension

d = log 4 / log 7 = 0.7124

The total length and area are, of course, zero. If we had 4 squares 4 times smaller instead, d
would be equal to log 4 / log 4 = 1, although the object is not a line. This shows that a fractal
dimension can happen to be an integer.

Conversely, a geometric fractal with a dimension between 2 and 3 can easily be


constructed: either by fragmenting a square of surface area 1 and making it more rugged,
following a generative process inside the three-dimensional space; or by removing more and
more parts from an initial cube, following a repeated pattern of excavation of the parts left solid by
a previous step. It results in a kind of regular "sponge" that represents a degree of occupancy of
the three-dimensional space by a "fractal surface". Many more examples are found in
Mandelbrot's books.

A bounded (finite) object of integer dimension d has a measure of zero with respect to a
higher dimension, infinite with respect to a smaller one, and a finite measure only in its own
dimension d. For example, an area has a volume of 0, an infinite length (that is, the length of a line
filling the whole area), and a finite measure in square metres only.

For a fractal object of fractional dimension d, its measure is 0 in any dimension larger
than d, and infinite in any dimension smaller than d; it is a finite number only in the fractal dimension
d. A Koch curve built up starting from a 1 metre segment has an infinite length (in metres), and an
area or volume of 0 m² or 0 m³. The following table illustrates that rule, for d varying from 0.5 to 3:

Dimension of the object              Dimension of measure
                                     0.5     1       1.26    2       2.71    3

0.5  (fractal dust)                  finite  0       0       0       0       0
1    (curve)                         ∞       finite  0       0       0       0
1.26 (fractal line = Koch curve)     ∞       ∞       finite  0       0       0
2    (surface)                       ∞       ∞       ∞       finite  0       0
2.71 (fractal surface = sponge)      ∞       ∞       ∞       ∞       finite  0
3    (volume)                        ∞       ∞       ∞       ∞       ∞       finite

Fig. 22. Fractal dimension of a rocky shoreline. a: Statistical self-similarity, illustrated by successive blow-ups of a stretch of coastline (initial yardstick 100 mm). b: Computation of the fractal dimension. If one segment of length ℓ is replaced by 20 segments of length ℓ/10, then d = log 20 / log 10 = 1.301.
2 - Statistical fractals. Another way of constructing fractals consists of adding a random
element to the generator. Hence, from one step to the next, only the statistical or stochastic
characteristics of the fragmenting process are maintained. The object shows a much greater
resemblance to a natural, physical object. For example, a rocky coastline (Fig. 22a) can be

described by considering that the roughness has the same statistical characteristics at all
observation scales. An approximate description of the coastline is given by a broken line made of
equal segments of length ℓ (Fig. 22b). In order to estimate the length of that coast as it appears at
that observation scale, we add the lengths of all the segments necessary to cover the whole coast.
When we want to detail the coastline, replacing each straight segment by a rugged line, the coast
appears longer. Choosing a unit segment N times smaller than the previous one, one has to
insert more than N small segments, because of the contorted shape of the coast; that is true at
every observation scale. At each step of the decreasing scale, if we assume for example that one
segment has to be replaced, on the average, by 20 segments 10 times smaller, then the fractal
dimension of the coastline is estimated as

d = log 20 / log 10 = 1.3010
The final length is obviously infinite, as more details are taken into account at each step. So, the
usual concept of the "length of a coastline" is a non-concept, because the real length is always
infinite. The length of a coast, as measured from a map, is arbitrary and depends on the
cartographic scale; it can be indefinitely enlarged, as more and more details of the coastline are
taken into account.

Similarly, a fractal tree can be built up "statistically" by adding a random element at


each stage of the branching out, namely by adding a statistical variability in the size and/or the
number of branches. This results in a pattern like Figure 20b, resembling more a real tree than the
geometric tree of Figure 20a does.

Physical phenomena often evoke a fractal generating process with a random component,
so that a fractal dimension can often be assigned to them. A classical example is the Brownian
motion. When observing at time intervals the displacement of a particle on a plane, we see the
movement as a broken line; observing the same movement at intermediate times, each of the
straight line segments previously seen is replaced by a finer broken line, whose length is greater
(Fig. 9a). The trajectory clearly appears as a fractal line; it can be calculated that its fractal
dimension is 2, that is, an integer, meaning that the particle is equally likely to be found at any
point of the plane.
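This length divergence can be illustrated with a short simulation, a sketch under my own conventions (step counts, seed) rather than anything from the text: one planar Brownian path is generated at fine resolution, then measured as a broken line at coarser and coarser observation intervals. The finer the sampling, the longer the measured trajectory.

```python
import math
import random

random.seed(42)

# Simulate one planar Brownian path at fine resolution: 4096 steps,
# each coordinate increment Gaussian with standard deviation sqrt(dt).
n, dt = 4096, 1.0 / 4096
xs, ys = [0.0], [0.0]
for _ in range(n):
    xs.append(xs[-1] + random.gauss(0, math.sqrt(dt)))
    ys.append(ys[-1] + random.gauss(0, math.sqrt(dt)))

def broken_line_length(stride):
    """Length of the trajectory as seen when the particle is observed
    only every `stride` time steps."""
    pts = list(zip(xs[::stride], ys[::stride]))
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

# Observing the same movement at finer and finer intervals
# yields longer and longer broken lines, with no finite limit:
lengths = [broken_line_length(s) for s in (64, 16, 4, 1)]
print([round(L, 1) for L in lengths])
```

Each fourfold refinement of the observation interval roughly doubles the measured length here, consistent with the trajectory having fractal dimension 2.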

Another complex physical phenomenon corresponding to a statistical fractal is


turbulence, well-studied by Mandelbrot. Turbulence in a water body is made of eddies, that
resolve in smaller and smaller self-similar eddies. The geometry of that mixing process has a
fractal dimension of approximately 2.6, representing a rather high degree of occupation of the
space by an infinitely contorted contact surface. This process is central to limnology and
oceanography, as discussed in the main part of this paper.

Fig. 23. Length L of a boundary as a function of the length ℓ of the "yardstick" used to measure it. The slope of the line in a log-log graph is a = -0.2, so that the fractal dimension is d = 1 - a = 1.2. The conversion of measurements from km^1.2 to m^1.2 is done as follows: X (m^1.2) = 1000^1.2 · Y (km^1.2) = 3981.07 · Y. For example, 18838 m^1.2 = 4.732 km^1.2.

A Cantor dust can also be randomized, as seen in Figure 7. As such it could model
either the dispersion of galaxies in the sky, or of plankton in the sea.

With many real fractals, there is no geometric generator that would allow one to calculate a
fractal dimension through self-similarity considerations, since they have a statistical component.
In that case, the fractal dimension has to be inferred through observing the increase of (for
example) the length of a line between two points, as the unit of measure decreases. The greater
the fractal dimension of a coastline -- that is, the more pronounced its roughness -- the faster the
measured length will increase when the unit segments used to cover the curve decrease in length.
Precisely, if the length of the unit segment is! and the number of segments covering the fractal
line is N , then the length measured at that step is L =N.£. Choosing another unit segment, of
length .l!k , the number of segments gets multiplied by k d, so that the new length is Nk d •
( ilk) = L ·k d -1; now, .£ being inversely proportional to k, L is proportional to fd-l. Then,
putting Land 1- on a graph with log-log scale, we obtain a straight line of slope (d -1), from
which the unknown fractal dimension can immediately be inferred. For example in Figure 23, a
376

Fig. 24. Fractal dimension of the geometric tree in Figure 20a, with a residue corresponding to the "trunk". The asymptotic slope is 1 - d = -0.585, so that d = 1.585, as previously shown by the self-similarity rule.

slope of -0.2 is observed, hence the fractal dimension is 1.2. A fractal measure of the line has to
be expressed in m^1.2 ("metres to the 1.2"), or km^1.2, or cm^1.2... Since 1 km = 1000 m, the
measure in m^1.2 is equal to 1000^1.2 ≈ 3981 times the measure in km^1.2.
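The whole procedure of Figure 23 reduces to a least-squares fit in log-log coordinates. Here is a hedged Python sketch (the function name and the synthetic measurements are mine, for illustration): given lengths L measured with several yardsticks ℓ, regress log L on log ℓ and recover d as 1 minus the slope.

```python
from math import log

def fractal_dimension_from_yardsticks(yardsticks, lengths):
    """Divider (yardstick) method: the measured length follows
    L ∝ ℓ**(1 - d), so the least-squares slope of log L against
    log ℓ is (1 - d)."""
    xs = [log(e) for e in yardsticks]
    ys = [log(L) for L in lengths]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return 1 - slope

# Synthetic coastline measurements obeying L = 10 * ℓ**(-0.2),
# i.e. a fractal dimension d = 1.2, as in Figure 23:
ells = [1.0, 0.5, 0.25, 0.125, 0.0625]
Ls = [10 * e ** (-0.2) for e in ells]
print(round(fractal_dimension_from_yardsticks(ells, Ls), 3))  # 1.2
```

With real measurements the points scatter about the fitted line, and the slope (hence d) carries a corresponding uncertainty.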

In the "fractal tree" (with or without a random component), at each step a given number
of self-similar smaller trees appear, plus a stem (or fractal "residue") that increases the total
length. For that reason, the length measured at any step increases more rapidly, as the
branching-out goes on, than predicted by the mere self-similarity rule, as seen in Figure 24. The
curve is asymptotic to a straight line of slope (1 - d), giving again the fractal dimension d.

For a Cantor dust, an estimation of the fractal dimension can be made from the decrease
of the density of points inside spheres of increasing diameters, as explained in section 3 above and
in Figure 8. The slope of the line describing the decrease in log-log scale gives, here again, the
dimension of the fractal object.
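This density-based estimate is easy to try on a synthetic dust. The sketch below (my own construction, not from the text) builds a depth-10 approximation of the triadic Cantor dust, computes the fraction of point pairs closer than r for decreasing radii, and reads the dimension off the log-log slope, which should come out near log 2 / log 3 ≈ 0.63.

```python
from itertools import product
from math import log

# Depth-10 approximation of the triadic Cantor dust: the 2**10 numbers
# whose first ten ternary digits are all 0 or 2.
DEPTH = 10
points = sorted(sum(d / 3 ** (i + 1) for i, d in enumerate(digits))
                for digits in product((0, 2), repeat=DEPTH))

def correlation_sum(pts, r):
    """Fraction of point pairs lying closer together than r;
    pts must be sorted in increasing order."""
    n, close = len(pts), 0
    for i in range(n):
        for j in range(i + 1, n):
            if pts[j] - pts[i] >= r:
                break          # sorted, so no later pair can be closer
            close += 1
    return 2 * close / (n * (n - 1))

radii = [3.0 ** -k for k in range(1, 6)]
sums = [correlation_sum(points, r) for r in radii]

# Least-squares slope of log C(r) against log r estimates the dimension.
xs = [log(r) for r in radii]
ys = [log(c) for c in sums]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
d = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
     / sum((x - mx) ** 2 for x in xs))
print(round(d, 2))  # near log 2 / log 3 ≈ 0.63
```

The same pair-counting estimate applies to points scattered in the plane or in a phase space, as with the strange attractors discussed below.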

3 - Fractals in abstract representational space. Finally, fractals can be conceptual rather


than geometrical in nature, resulting in abstract structures in which typical fractal properties and
behaviour are again recognizable. They lead, for example, to the concept of the fractal dimension
of a classification or of a flow-diagram. I give two examples of abstract fractals in the main part
of this paper, namely:
(a) Strange attractors (Section 5.1 and Fig. 12). In a climatological problem for
instance, Nicolis and Nicolis (1984) demonstrated the presence of an attractor of dimension 3.1 in
a phase space of more than 4 variables. The fractal dimension of the figure given by the
intersection of the trajectory by a plane can be discovered by the same method as that of a set of
points scattered in the physical space: start from a point, include points into circles of increasing

diameter, and finally observe the decrease of the mean density of points per unit volume.
(b) Lexicographic trees (Fig. 15), used by Mandelbrot for linguistic analysis, may also
be applied to ecology, as shown in Section 5.2.

REFERENCES

Adams, G.F., and C.H. Oliver. 1977. Yield properties and structure of boreal percid
communities in Ontario. J. Fish. Res. Bd. Canada 34: 1613-1625.
Besicovitch, A.S., and H.D. Ursell. 1937. Sets of fractional dimensions (V): On dimensional
numbers of some continuous curves. J. London Math. Soc. 12: 18-25.
Bradbury, R.H., and R.E. Reichelt. 1983. Fractal dimension of a coral reef at ecological scales.
Mar. Ecol. Progr. Ser. 10: 169-171.
Bradbury, R.H., R.E. Reichelt, and D.G. Green. 1984. Fractals in ecology: methods and
interpretation. Mar. Ecol. Progr. Ser. 14: 295-296.
Burrough, P.A. 1981. Fractal dimensions of landscapes and other environmental data. Nature
(Lond.) 294: 240-242.
Burrough, P.A. 1983. Multiscale sources of spatial variation in soil. I. The application of fractal
concepts to nested levels of soil variation. J. Soil Science 34: 577-597.
Devaux, J. 1980. Structure des populations phytoplanctoniques dans trois lacs du Massif Central:
successions ecologiques et diversite. Acta Oecol./Oecol. Gener. 1: 11-26.
Ekeland, I. 1984. Le calcul, l'imprevu. Seuil, Paris. 170 p.
Fournier d'Albe, E.E. 1907. Two new worlds: I The infra world; II The supra world. Longmans
Green, London.
Frechette, M. 1984. Interactions pelago-benthiques et flux d'energie dans une population de
moules bleues, Mytilus edulis L., de l'estuaire du Saint-Laurent. These de Ph.D.,
Universite Laval, Quebec. viii + 172 p.
Frontier, S. 1976. Utilisation des diagrammes rang-frequence dans l'analyse des ecosystemes. J.
Rech. oceanogr. 1: 35-48.
Frontier, S. 1978. Interfaces entre deux ecosystemes. Exemples dans Ie domaine pelagique.
Ann. Inst. oceanogr., Paris 54: 96-106.
Frontier, S. 1985. Diversity and structure in aquatic ecosystems. Oceanogr. mar. BioI. ann.
Rev. 23: 253-312.
Goodman, D. 1975. The theory of diversity-stability relationship in ecology, Quart. Rev. BioI.
50: 237-266.
Grassberger, P., and I. Procaccia. 1983. Characterization of strange attractors. Phys. Rev. Lett.
50: 346-349.
Hausdorff, F. 1919. Dimension und äußeres Maß. Mathematische Annalen 79: 157-179.
Hily, C. 1983. Modifications de la structure ecologique d'un peuplement a Melinna palmata. Ann.
Inst. oceanogr., Paris 59: 37-56.
Hutchinson, G.E. 1957. A treatise on limnology. Wiley and Sons, New York.
Ibanez, F., and M. Etienne. The fractal dimension of a chlorophyll record. (Submitted).
Kent, C., and J. Wong. 1982. An index of littoral zone complexity and its measurement. Can.
J. Fish. Aquat. Sci. 39: 847-853.
Landman, B.S., and R.L. Russo. 1971. On a pin versus block relationship for partition of logic
graphs. I.E.E.E. Tr. on Computers 20: 1469-1479.
Legendre, L. 1981. Hydrodynamic control of marine phytoplankton production. In J. Nihoul
[ed.] Ecohydrodynamics. Elsevier Scient. Publ. Co., Amsterdam.
Legendre, L., and S. Demers. 1984. Towards dynamic biological oceanography and limnology.
Can. J. Fish. Aquat. Sci. 41: 2-9.
Legendre, L., and S. Demers. 1985. Auxiliary energy, ergoclines and aquatic biological
production. Naturaliste can. (Rev. Ecol. Syst.) 112: 5-14.
Legendre, L., and P. Legendre. 1983. Numerical ecology. Developments in Environmental
Modelling, 3. Elsevier Scient. Publ. Co., Amsterdam. xvi + 419 p.

Mandelbrot, B. 1953. Contribution a la theorie mathematique des jeux de communication. These
de Doctorat d'Etat, Univ. Paris. Publ. Inst. Stat. Univ. Paris 2: 1-121.
Mandelbrot, B. 1974. Intermittent turbulence in selfsimilar cascades: divergence of high
moments and dimension of the carrier. J. Fluid Mech. 62: 331-358.
Mandelbrot, B. 1975. Les objets fractals: forme, chance et dimension. Flammarion, Paris.
[Second edition in 1984.]
Mandelbrot, B. 1977. Fractals. Form, chance, and dimension. Freeman & Co., San Francisco.
365 p.
Mandelbrot, B. 1982. The fractal geometry of nature. Freeman & Co., San Francisco. 468 p.
Margalef, R. 1980. La biosfera. Ediciones Omega, Barcelona. 236 p.
Mark, D.M. 1984. Fractal dimension of a coral reef at ecological scales: a discussion. Mar.
Ecol. Progr. Ser. 14: 293-296.
May, R.M. 1974. Stability and complexity in model ecosystems. 2nd ed. Princeton Univ.
Press. 265 p.
May, R.M. 1975. Deterministic models with chaotic dynamics. Nature (London) 256:
165-166.
May, R.M. 1981. Nonlinear phenomena in ecology and epidemiology. Ann. N.Y. Acad. Sci.
357: 267-281.
Meyer, J.A. 1980. Sur la dynamique des systemes ecologiques non lineaires. J. Physique
(Colloque C5, 1978: suppl. au nO 8) 38: C5.29-C5.37.
Meyer, J.A. 1981. Sur la stabilite des systemes ecologiques plurispecifiques. 335-351 in B.E.
Paulre [ed.] System dynamics and analysis of chance. North Holland Publ. Co.
Morowitz, H.J. 1968. Energy flow in biology. Acad. Press, New York. 179 p.
Nicolis, C., and G. Nicolis. 1984. Is there a climatic attractor? Nature (London) 311: 529-532.
Pareto, V. 1896, 1965. Cours d'economie politique. Reimprime dans un volume d' "Oeuvres
Completes", Droz, Geneve.
Pielou, E.C. 1975. Ecological diversity. Wiley Interscience, New York. viii + 165 p.
Platt, T., and K.L. Denman. 1977. Organization in the pelagic ecosystem. Helgoland Wiss.
Meeresunters. 30: 575-581.
Platt, T., and K.L. Denman. 1978. The structure of pelagic marine ecosystems. Rapp. P.-v.
Reun. CIEM 173: 60-65.
Ripley, B.D. 1981. Spatial statistics. John Wiley & Sons, New York. x + 252 p.
Ryder, R.A. 1965. A method for estimating the potential fish production of north-temperate
lakes. Trans. Amer. Fish. Soc. 94: 214-218.
Safran, P. Etude d'une nurserie littorale a partir des peches accessoires d'une pecherie artisanale
de crevettes grises (Crangon crangon). Oceanol. Acta (in press).
Villermaux, J., D. Schweich, and J.R. Hautelin. 1986a. Le peigne du diable, un modele
d'interface fractale bidimensionnelle. C. R. hebd. Seances Acad. Sci., Paris. In press.
Villermaux, J., D. Schweich, and J.R. Hautelin. 1986b. Transfert et reaction a une interface
fractale representee par le peigne du diable. C. R. hebd. Seances Acad. Sci., Paris. In
press.
Wetzel, R.G. 1975. Limnology. Saunders, Toronto.
Zipf, G.K. 1949-1965. Human behavior and the principle of least-effort. Addison-Wesley,
Cambridge, Mass.
Path analysis for mixed variables
PATH ANALYSIS WITH OPTIMAL SCALING

Jan de Leeuw
Department of Data Theory FSW, University of Leiden
Middelstegracht 4
2312 TW Leiden, The Netherlands

Abstract - In this paper we discuss the technique of path analysis, its extension to
structural models with latent variables, and various generalizations using optimal
scaling techniques. In these generalizations nonlinear transformations of the
variables are possible, and consequently the techniques can also deal with nonlinear
relationships. The precise role of causal hypotheses in this context is discussed. Some
applications to community ecology are treated briefly, and indicate that the method
is a promising one.

INTRODUCTION

In this paper we shall discuss the method of path analysis, with a number of
extensions that have been proposed in recent years. The first part discusses path
analysis in general, because the method is not very familiar to ecologists. In fact we
have been able to find only a very few papers using path analysis in the literature of
community ecology. With the help of Pierre and Louis Legendre we located Harris
and Charleston (1977), Chang (1981), Schwinghamer (1983), Gosselin et al.
(1986), and Troussellier et al. (1986).
In this paper we combine classical path analysis models, first proposed by
Wright (1921, 1934), with the notion of latent variables, due to psychometricians
such as Spearman (1904) and to econometricians such as Frisch (1934). This
produces a very general class of models. If we combine these models with the
notion of least squares optimal scaling (or quantification, or transformation),
explained in De Leeuw (1987), we obtain a very general class of techniques.
Now in many disciplines, for example in sociology, these path analysis
techniques are often discussed under the name causal analysis. It is suggested,
thereby, that such techniques are able to discover causal relationships that exist
between the variables in the study. This is a rather unfortunate state of affairs (De
Leeuw 1985). In order to discuss it more properly, we must start the paper with
some elementary methodological discussion.
One of the major purposes of data analysis, in any of the sciences, is to arrive at a
convenient description of the data in the study. By 'convenient' we mean that the
data are described parsimoniously, in terms of a relatively small number of
NATO AS! Series, Vol. G 14
Developments in Numerical Ecology
Edited by P. and L. Legendre
© Springer-Verlag Berlin Heidelberg 1987

parameters. If possible this description should be linked as tightly as possible to


existing scientific theory, and consequently the parameters should not be merely
descriptive, but they must preferably be part of a model for the phenomenon that is
studied. This makes it possible to communicate efficiently, and to fit the results into
an existing body of theory. Fitting data into existing theory, or creating new theory
to incorporate the data, is called explanation. If the theory is formulated in terms of
if-then relationships, or more generally in terms of functional relationships, then we
can call this explanation causal.
Thus causality is interpreted by us as a way of formulating theories, a way of
speaking about the world. Whether everything, or almost everything, moves or
develops deterministically according to causal laws is, from a scientific point of
view, not an interesting question. It is an undeniable fact that everybody, including
scientists, uses causal language all the time. It is also true that in most contexts the
word causality suggests a necessary connection, a notion of the cause producing the
effect, and the idea that it must be possible to change the effect by manipulating the
cause. This does not imply, as we sometimes hear, that causal connections can only
be established by experimental methods. Causal connections, if they are necessary
connections, cannot be established at all, in the same way as natural laws cannot be
proven inductively. Causality is a figure of speech, and there is no need to 'establish'
a figure of speech.
This does not mean, of course, that persons engaged in scientific discourse can
afford to choose their terminology in a misleading and careless way. The word
'causality' has all the connotations we have mentioned above (necessity,
productivity, manipulation), and if social scientists, for instance, want to use the
word, they must realize that it has these connotations. If social scientists set out to
prove that 'social economic status' causes 'school achievement', and 'school
achievement' causes 'income', then they will have a hard time convincing others that
they are using the word 'cause' in the same sense as somebody who says that putting
a kettle of water on the fire causes it to boil.
We briefly mention some other points that are important in this connection.
There has been a justifiable tendency in statistical methodology either to avoid the
word 'cause' altogether, or to give it a precise meaning which does not necessarily
have much to do any more with the common sense notion. Simon (1953) and Wold
(1954), for instance, define 'causality' as a property of systems of linear
regressions: some are causal and some are not. This is not very objectionable,
although of course not without its dangers. A very important point of view,
defended for example by Pearson (1911), is that causation is merely the limiting
case of perfect correlation. This resulted from a conscious attempt, started by the
Belgian astronomer Quetelet, to bring the laws of the social and life sciences on an

equal footing with the laws of the physical sciences. Pearson eloquently argued that
correlation is the more fundamental scientific category, because causality is merely
a degenerate special case, which does not really occur in practice. Again this point
of view is not inherently wrong, provided we broaden the definition of correlation
sufficiently.
This is related to the fact that lawlike relationships in the social sciences and the
life sciences are usually described as probabilistic instead of deterministic. If we
have ten kettles, and we put them on the fire, then the water will boil in six or seven
of them. But this difference is mainly a question of choosing the appropriate unit. A
probabilistic relationship between individual units is a deterministic relationship, in
fact a functional relationship, between the random variables defined on these units.
A linear regression between status and income is a deterministic relationship
between averages, even though it does not make it possible to predict each
individual income precisely from a known status-value. If we call a law-like
relationship between the parameters of multivariate probability distributions a
correlation, then Pearson's point of view about causality makes sense. Of course we
must again be careful, because another far more specific meaning of the word
'correlation', also connected with the name of Pearson, is around too. Compare
Tukey (1954) for more discussion on this point.
Up to now we have concentrated on data analysis as a method of description. We
summarize our data, preferably in the context of a known or conjectured model
which incorporates the prior information we have. At the same time we also
investigate if the model we use describes the data sufficiently well. But science does
not only consist of descriptions, we also need to make predictions. It is not enough to
describe the data at hand, we must also make statements about similar or related data
sets, or about the behaviour of the system we study in the future. In fact it is
perfectly possible that we have a model which provides us with a very good
description, for example because it has many parameters, but which is useless for
prediction. If there are too many parameters they cannot be estimated in a stable
way, and we have to extrapolate on a very uncertain basis. Or, to put it differently,
we must try to separate the stable components of the situation, which can be used for
prediction, from the unstable disturbances which are typical for the specific data set
we happen to have.
We end this brief methodological discussion with a short summary. The words
'correlation' and 'causality' have been used rather loosely by statisticians, certainly
in the past. Causal terminology has sometimes been used by social scientists as a
means of making their results sound more impressive than they really are, and this
is seriously misleading. It is impossible, by any form of scientific reasoning or
activity, to prove that a causal connection exists, if we interpret 'causal' as
'necessary'. What we are really looking for is invariant functional relationships
between variables, or between the parameters of multivariate probability
distributions. These invariant relations can be used for prediction. The method of
path analysis, that we shall discuss in detail below, has the specific advantage over
other data analysis techniques that it makes causal hypotheses explicit by translating
them into regression equations. Thus it becomes possible to integrate prior 'causal'
knowledge in the data analysis, and to test 'causal' hypotheses. These important
positive aspects of the technique are important in so far as this prior knowledge is
relatively well-established, and in so far the hypotheses really make sense.
Incorporating prior knowledge which is just conjectural means that we are treating
prejudice as certainty, and this can lead to very undesirable consequences (as the
nature-nurture debate about the genetics of intelligence amply shows; compare for
instance Jaspars and De Leeuw 1980).

PATH MODELS IN GENERAL

We shall now define formally what we mean by a path model. In the first place
such a model has a qualitative component, presented mathematically by a graph or
arrow diagram. In such a graph the variables in our study are the corners, the
relationships between these variables are the edges. In the path diagrams the
variables are drawn as boxes; if there is an arrow from variable V1 to variable V2
then we say that V1 is a direct cause of V2 (and V2 is a direct effect of V1).

Figure 1.
Path diagram.
Compare Figure 1, for example. Observe that we use causal terminology without
hesitation, but we follow the Simon-Wold example and give a precise definition of
causes and effects in terms of graph theory. If there is a path from a variable V1 to
another variable V2, then we say that V1 is a cause of V2 (and V2 is an effect of
V1). In Figure 1, for instance, V1 is a cause of V6 and V7, although not a direct
cause.

Table 1.
Causal relations in Figure 1.

        level   causes    direct causes   predecessors
Var 1     0     ****      ****            ****
Var 2     0     ****      ****            ****
Var 3     1     {1,2}     {1,2}           {1,2}
Var 4     1     {1}       {1}             {1,2}
Var 5     1     {2}       {2}             {1,2}
Var 6     2     {1,4}     {4}             {1,2,3,4,5}
Var 7     2     {1,4}     {4}             {1,2,3,4,5}

An important class of graphs is transitive, by which we mean that no path
starting in a corner ever returns to that corner. Figure 1 would not be transitive
any more with an arrow from V7 to V1, because of the path V1 → V4 → V7 → V1,
but it would still be transitive with an arrow from V7 to V2. There have been heated
discussions about the question whether or not non-transitive models can still be
called causal. With our definition of causality they obviously can.
In transitive models we can define an interesting level assignment to the
variables. This concept is due to De Leeuw (1984). Variables at which no arrows
arrive are often called exogenous variables. They get level 0. The level of an
endogenous (i.e. not exogenous) variable is one larger than the maximum level of
its direct causes. We call V1 a predecessor of V2 (and V2 a successor of V1) if the
level of V1 is less than that of V2. In Table 1 we give causes, direct causes, and
predecessors for the variables in Figure 1. Clearly the direct causes are a subset of
the causes, and the causes are a subset of the predecessors. If x is any variable, we
write this symbolically as pred(x) ⊇ cause(x) ⊇ dcause(x). By using lev(x) for
the level, we can now say dcause(x) = ∅ ⇒ lev(x) = 0, and lev(x) = 1 + max
{lev(y) | y ∈ dcause(x)}. A model is transitive if (∀x){x ∉ cause(x)}. These
qualitative concepts make it possible to explain what the general idea of path analysis
is. We have defined our notion of causality in terms of the path diagram. Other
notions which are important in path analysis will be discussed below.
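The level assignment and the sets of Table 1 can be computed mechanically from the arrow diagram. A minimal sketch (the edge list for Figure 1 is read off the diagram; all function names are ours, not the chapter's):

```python
# Levels, causes, and predecessors for the arrow diagram of Figure 1.
# dcause maps each variable to its set of direct causes (read off the
# diagram; variables 1 and 2 are exogenous, so their sets are empty).
dcause = {1: set(), 2: set(), 3: {1, 2}, 4: {1}, 5: {2}, 6: {4}, 7: {4}}

def causes(v, dc):
    """All variables from which a path leads to v (transitive closure)."""
    out = set(dc[v])
    for u in dc[v]:
        out |= causes(u, dc)
    return out

def level(v, dc, memo={}):
    """lev(x) = 0 if dcause(x) is empty, else 1 + max level of the direct causes."""
    if v not in memo:
        memo[v] = 0 if not dc[v] else 1 + max(level(u, dc, memo) for u in dc[v])
    return memo[v]

lev = {v: level(v, dcause) for v in dcause}
pred = {v: {u for u in dcause if lev[u] < lev[v]} for v in dcause}

for v in sorted(dcause):
    print(v, lev[v], sorted(causes(v, dcause)), sorted(dcause[v]), sorted(pred[v]))
```

Running this reproduces the rows of Table 1, including the inclusions dcause(x) ⊆ cause(x) ⊆ pred(x).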

TRANSITIVE PATH MODELS

We now make the path diagram quantitative, by embedding the qualitative
notions in a numerical model for the variables. We restrict ourselves to linear
structural models. There exist nonlinear path analysis techniques, developed in the
framework of log-linear analysis (Goodman, 1978, Kiiveri and Speed, 1982), but
these are outside our scope. They are discussed and compared with our approach in
De Leeuw (1984). The only nonlinearity we allow for, at a later stage, is that
connected with the transformation or quantification of variables. We assume, for
the moment, that all variables are completely known, and, moreover, standardized
to zero mean and unit variance. Thus VAR(x) = 1 for all variables x, and AVE(x) = 0.
The model in Figure 1 can be made numerical in the following way. We take all
the endogenous variables in turn, and we suppose that they are a linear function of
their direct causes, plus a disturbance term. The linear model corresponding with
Figure 1 becomes

x3 = β31x1 + β32x2 + ε3,  (1a)
x4 = β41x1 + ε4,  (1b)
x5 = β52x2 + ε5,  (1c)
x6 = β64x4 + ε6,  (1d)
x7 = β74x4 + ε7.  (1e)

The assumptions we make about the disturbance terms εj are critical. These
assumptions are in terms of uncorrelatedness, for which we use the symbol ⊥. First
assume for each j that the εj are uncorrelated with dcause(xj). Thus

ε3 ⊥ {x1,x2},  (2a)
ε4 ⊥ {x1},  (2b)
ε5 ⊥ {x2},  (2c)
ε6 ⊥ {x4},  (2d)
ε7 ⊥ {x4}.  (2e)

Now model (1)(2) describes any data set of seven variables perfectly. To see this it
suffices to project each xj on the space spanned by its direct causes, i.e. to perform a
multiple regression with xj as the dependent variable and dcause(xj) as the
independent ones, and to take εj equal to the residual. Then the disturbance is, by
definition, uncorrelated with the direct causes in the same equation, and description
is perfect. We can also say that the model is saturated, or just identified. It does not
impose any restrictions, it merely provides us with an alternative description which
is perhaps preferable to the original one because it links the data with some existing
theory. But although description is, in a trivial sense, perfect, the performance of
(1)(2) as a predictive model may still be very bad. The predictive power of the
model is measured by the variances of the disturbances or residuals. If this is large,
then we do not predict the corresponding variable efficiently. Thus we can have
models which are good descriptors but poor predictors.
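This can be checked numerically: fitting model (1)(2) to any data set by per-equation least squares makes each residual exactly uncorrelated with its direct causes (perfect description), while the residual variances, which measure predictive power, can still be large. A small sketch with arbitrary simulated data (the data and all names are illustrative, not from the chapter):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Arbitrary standardized data for seven variables; the model is fitted, not assumed true.
X = rng.standard_normal((n, 7))
X = (X - X.mean(0)) / X.std(0)
x = {j + 1: X[:, j] for j in range(7)}

dcause = {3: [1, 2], 4: [1], 5: [2], 6: [4], 7: [4]}
resid = {}
for j, dc in dcause.items():
    A = np.column_stack([x[l] for l in dc])
    b, *_ = np.linalg.lstsq(A, x[j], rcond=None)
    resid[j] = x[j] - A @ b

# Weak orthogonality holds exactly: each residual is uncorrelated with its direct causes.
for j, dc in dcause.items():
    for l in dc:
        assert abs(np.dot(resid[j], x[l])) < 1e-6
# ...but prediction can be poor: on this noise data the residual variances stay near 1.
print({j: round(resid[j].var(), 2) for j in dcause})
```

The description is perfect by construction; the large residual variances show that the same model is a poor predictor here.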
Path models can also be poor descriptors. But in that case we clearly must make
stronger assumptions about the distribution of the disturbances. Let us call for any
path model the assumption that for each j we have εj ⊥ dcause(xj) the weak
orthogonality assumptions. The strong orthogonality assumptions are defined for
transitive models only. They are (i) that the disturbances are uncorrelated with the
exogenous variables, and (ii) that disturbances of variables of different levels are
uncorrelated with each other. In symbols this reads εj ⊥ {x | lev(x) = 0} and εj ⊥
{εk | lev(xk) ≠ lev(xj)}. Thus, in a convenient compact notation, in our Figure 1,

{ε3,ε4,ε5,ε6,ε7} ⊥ {x1,x2},  (3a)
{ε3,ε4,ε5} ⊥ {ε6,ε7}.  (3b)

Assumption (3) is much stronger than (2), and not all sets of seven variables satisfy
(1) and (3). Because ε4 ⊥ {x1,x2}, for example, regression of x4 on x1 and x2 will
give β42 = 0 if (1)(3) is true, and this is clearly restrictive. Thus model (1)(3) can
be a poor descriptor as well as a poor predictor. It is clear, by the way, that a model
which is a good predictor is automatically a good descriptor.
For the causal interpretation the following argument is useful. It extends to all
transitive models. We have ε6 ⊥ {x1,x2} and ε6 ⊥ ε3. Thus, from (1a), ε6 ⊥ x3. In
the same way ε6 ⊥ x4 and ε6 ⊥ x5. Thus ε6 ⊥ {x1,x2,x3,x4,x5}, which implies that
proj(x6 | x1,x2,x3,x4,x5) = proj(x6 | x4), with proj(y | x1,...,xm) denoting least
squares projection of y on the space spanned by x1,...,xm. In words this says that the
projection of x6 on the space spanned by its predecessors is the projection of x6 on
the space spanned by its direct causes. The interpretation is that, given the direct
causes, a variable is independent of its other predecessors. Thus the strong
orthogonality assumptions in transitive models imply a (weak) form of conditional
independence.
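The implication can be illustrated by simulation. If we generate data satisfying (1) with mutually independent disturbances, so that the strong assumptions (3) hold, the regression of x6 on all five of its predecessors puts essentially all of its weight on x4 (a sketch; the coefficient values are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
# Exogenous variables and independent disturbances: strong orthogonality (3) holds.
x1, x2 = rng.standard_normal((2, n))
e3, e4, e5, e6 = rng.standard_normal((4, n))
x3 = 0.6 * x1 + 0.3 * x2 + e3          # illustrative beta values
x4 = 0.8 * x1 + e4
x5 = 0.5 * x2 + e5
x6 = 0.7 * x4 + e6

# Regress x6 on all predecessors {x1,...,x5}.
A = np.column_stack([x1, x2, x3, x4, x5])
b, *_ = np.linalg.lstsq(A, x6, rcond=None)
print(np.round(b, 3))  # only the x4 coefficient is clearly nonzero
```

Up to sampling noise the coefficients of x1, x2, x3 and x5 vanish: given its direct cause x4, the variable x6 is independent of its other predecessors.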
We shall now treat some more or less familiar models in which description is
perfect. These models are consequently saturated. The structural equations defining
the model can be solved uniquely, and the model describes the data exactly. The
first, and perhaps simplest, example is the multiple regression model. An example is
given in Figure 2.


Figure 2.
Multiple regression model.

If we compare this with Figure 1 we see some differences which are due to the
fact that we have made the model quantitative. In the first place the arrows now have
values, the regression coefficients. In the second place it is convenient to use curved
loops indicating the correlations between the exogenous variables. The curved loops
can also be used to represent correlated disturbances. This becomes more clear
perhaps if we add dummy equations like xj = εj for each of the exogenous variables,
which is consistent with the idea that exogenous variables have no causes; exogenous
variables are, in this sense, identical with disturbances. The strong orthogonality
assumptions on disturbances can now be stated more briefly, because they reduce to
the single statement εj ⊥ {εk | lev(xk) ≠ lev(xj)}. Arrows are also drawn in
Figure 2 to represent uncorrelated disturbance terms.
In Figure 2, and in multiple regression in general, there is only one endogenous
variable, often called the dependent variable. There are several exogenous
variables, often called predictors or independent variables. The linear structural
model is

y = β1x1 + ... + βmxm + ε.  (4)

The orthogonality assumptions on the disturbances are ε ⊥ dcause(y) =
{x1,...,xm}. In this case the strong assumptions are identical with the weak
assumptions, because dcause(y) are exactly the exogenous variables. Thus (4) is a
saturated model. If we project the dependent variable on the space spanned by the
predictors, then the residual is automatically uncorrelated with each of the
predictors. The description is perfect, although the prediction may be lousy. We
measure quality of prediction by the multiple correlation coefficient R² = 1 −
VAR(ε), in this context also known as the coefficient of determination.
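With standardized variables this identity can be verified directly: because the residual is orthogonal to the fitted values, 1 − VAR(ε) equals the squared correlation between the fitted and observed dependent variable. A sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1, x2 = rng.standard_normal((2, n))
y = 0.5 * x1 - 0.4 * x2 + rng.standard_normal(n)
# Standardize everything to zero mean and unit variance, as assumed throughout.
x1, x2, y = [(v - v.mean()) / v.std() for v in (x1, x2, y)]

A = np.column_stack([x1, x2])
b, *_ = np.linalg.lstsq(A, y, rcond=None)
eps = y - A @ b

r2 = 1 - eps.var()                         # coefficient of determination
r2_alt = np.corrcoef(A @ b, y)[0, 1] ** 2  # squared multiple correlation
print(round(r2, 6), round(r2_alt, 6))
```

The two numbers agree to machine precision, which is the content of R² = 1 − VAR(ε) for standardized data.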
Figure 3 shows a somewhat less familiar model. Its linear structure is

x2 = β21x1 + ε2,  (5a)
x3 = β31x1 + β32x2 + ε3.  (5b)

The weak orthogonality assumptions, which make (5) a saturated model, are ε2 ⊥
{x1} and ε3 ⊥ {x1,x2}. It follows from this that ε2 is the residual after projection of
x2 on x1. Thus β21 is equal to the correlation between x1 and x2, and ε2 = x2 −
β21x1 is a linear combination of x1 and x2. This implies that ε3 ⊥ ε2, and
consequently the strong orthogonality assumptions are true as well. Although we
did not require it, we automatically get uncorrelatedness of the disturbance terms.
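This automatic uncorrelatedness is an algebraic fact, so it holds for any data, not only data generated by the model. A quick check with arbitrary standardized data:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
x1, x2, x3 = rng.standard_normal((3, n))
x1, x2, x3 = [(v - v.mean()) / v.std() for v in (x1, x2, x3)]

b21 = x1 @ x2 / n                 # slope of x2 on x1 = correlation r12 (standardized data)
e2 = x2 - b21 * x1
A = np.column_stack([x1, x2])
b3, *_ = np.linalg.lstsq(A, x3, rcond=None)
e3 = x3 - A @ b3

# e3 is orthogonal to span{x1, x2}, and e2 lies in that span, so e3 is uncorrelated with e2.
print(abs(e2 @ e3))
```

The printed inner product is zero up to rounding error, for any input data whatsoever.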

Figure 3.
A simple saturated recursive model.
If we try to generalize the structure in Figures 2 and 3 we find something like
Figure 4. Variables are partitioned into sets, and variables in the same set have the
same level. In saturated block-transitive models dcause(x) = pred(x) for all
variables x. Thus there are arrows from each variable to all variables of a higher
level. There are no arrows within sets. The arrows indicating errors in Figure 4
actually indicate correlated errors. Saturated simple transitive models (also called
causal chains) have only one variable in each set, and thus all variables have a
different level. For both block transitive models and simple transitive models the
weak orthogonality assumptions, together with the structure, imply the strong
orthogonality assumptions. And, consequently, imposing the strong orthogonality
assumptions leaves the model saturated and the description perfect. Residuals of
variables of different levels are uncorrelated, and residuals are uncorrelated with
variables of a lower level. There can be correlation between the residuals of
variables of the same level, or between residuals and variables of a higher level. We
can find path coefficients by regressing each endogenous variable on the set of its
predecessors. We have seen that transitive models are path models corresponding
with transitive graphs having no 'causal loops'. Saturated transitive models, of
which the block transitive models and simple transitive models are special cases,
describe the dispersion matrix of the variables precisely. Non-saturated or
restrictive transitive models, of which the model in Figure 1 is a special case, arise
from saturated models by leaving out certain arrows. It is still the case that an
unambiguous level assignment is possible, and the terminology of predecessors and
successors still applies.

Figure 4.
General recursive saturated model. (Blocks of variables at levels 0, 1, and 2, linked by 18, 24, and 12 arrows.)

In quantifying any path model we can simply use the path diagram to write down
the linear structural equations. We also have to assume something about the
disturbances in terms of their correlation with each other and with the xj. The
weak orthogonality assumptions can be applied in all cases. They make the model
saturated, and have as a consequence that consistent estimation of the regression
coefficients is possible by projecting a variable on the space spanned by its direct
causes. In all transitive models, saturated or not, the strong orthogonality conditions
follow from the weak orthogonality conditions and the linear structure. Thus the
causal interpretation in terms of conditional independence is available.
The notion of a linear structural model is more general than the notion of a
transitive model, of course. If we assume a structural model, such as (1), then we
can make alternative assumptions about the residuals, for instance that they are all
uncorrelated. In fact we can easily build linear structural models which are not
transitive at all. Simply write down the model from the path diagram, one equation
for each endogenous variable, and make some sort of assumption about the
disturbances. By allowing for correlations between the disturbances we can create
saturated nontransitive models, and we can also get into problems with
identifiability. For these identification problems we refer to the econometric
literature, for instance to Hsiao (1983) or Bekker (1986). Observe that
nontransitive models can not be translated into conditional independence statements,
which has caused some authors to say that nontransitive models are not causal.
For a small ecological example we use a part of the correlation matrix given by
Legendre and Legendre (1983, Table 5.6). The data have to do with primary
production, and were collected in 1967 in the Baie des Chaleurs (Quebec). There
are 40 measurements on four variables. These are:
K: the biological attenuation coefficient which represents
the relative primary production,
C: the concentration of chlorophyll a,
S: the degree of salinity,
T: the temperature.
The correlation matrix, and some simple path models, are given in Table 2.
Model (a) is the saturated model which has T and S as exogenous variables (level 0),
has C as a variable of level 1, and K as the innermost variable of level 2. Model (b) is
not saturated, because the paths from T and S directly to K are eliminated. All
effects of T and S on K go through C, or, to put it differently, K is independent of T
and S, given C. Model (c) is also saturated, but no choice is made about the causal
priority of C or K. Thus C and K have correlated errors, because they both have
level 1. In the part of Table 2 that gives the fitted coefficients we see that the
covariance of the errors in (c) is .721. Because of this covariance, variable K has a
much larger error variance in model (c).
Table 2.
Legendre and Legendre Primary Production Data.

Correlations, Baie des Chaleurs:

        K       C       T
C    +.842
T    +.043   +.236
S    -.146   -.369   -.925

Three recursive models (a), (b), (c):

              (a)       (b)       (c)
T → C       -0.730    -0.730    -0.730
S → C       -1.044    -1.044    -1.044
T → K       +0.031    *****     -0.638
S → K       +0.220    *****     -0.736
C → K       +0.916    +0.842    *****
VAR ERR C    0.787     0.787     0.787
VAR ERR K    0.260     0.291     0.920
COV ERR C,K                      0.721

Models (a) and (c) give a perfect description of the correlations, so the choice
between them must be made purely on the basis of prior notions the investigator has.
We are not familiar with the problems in question, so we cannot make a sensible
choice. Model (b) is restrictive. If we compare it with (a) we still see that its
description is relatively good. If we want to decide whether to prefer it to (a) we can
either use statistics, and see if the description is 'significantly' worse, or we can
use (a) and (b) predictively, and see which one is better. Our guess is that on
both counts (b) is the more satisfactory model.
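The coefficients of model (a) can be reproduced directly from the published correlations by solving the normal equations for each endogenous variable (a sketch; the variable ordering and the helper name are ours):

```python
import numpy as np

# Correlations from Table 2, in the order K, C, T, S.
R = np.array([[1.000,  .842,  .043, -.146],
              [ .842, 1.000,  .236, -.369],
              [ .043,  .236, 1.000, -.925],
              [-.146, -.369, -.925, 1.000]])
K, C, T, S = 0, 1, 2, 3

def path_coefs(R, dep, dc):
    """Regression coefficients of the standardized variable dep on its direct causes dc."""
    return np.linalg.solve(R[np.ix_(dc, dc)], R[dc, dep])

b_C = path_coefs(R, C, [T, S])           # C regressed on T and S
b_K = path_coefs(R, K, [C, T, S])        # K regressed on C, T and S
var_eC = 1 - b_C @ R[[T, S], C]          # error variance of C
var_eK = 1 - b_K @ R[[C, T, S], K]       # error variance of K
print(np.round(b_C, 3), np.round(b_K, 3), round(var_eC, 3), round(var_eK, 3))
```

The output matches column (a) of Table 2 to three decimals: T → C = −0.730, S → C = −1.044, T → K = +0.031, S → K = +0.220, C → K = +0.916, with error variances .787 and .260.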
DIRECT AND INDIRECT EFFECTS

In this paragraph we discuss the calculus of path coefficients explained by
Wright (1921, 1934). We do not present the general theorems here, but we illustrate
the calculus by using our examples. First consider the model in Figure 3. Let us use
equations (5) to compute the correlations between x1, x2 and x3. We find r21 = β21
and r31 = β31 + β32r21 = β31 + β32β21. In terms of Figure 3 the equation for r31
can be interpreted as follows: there is a direct effect of x1 on x3 with size β31, and
an indirect effect (via x2) of β32β21. The indirect effect comes about because there
is a path from x1 to x3, which passes by x2. Coefficients along the path are
multiplied to quantify the indirect effect. In the same way we find r32 = β31r12 +
β32 = β31β21 + β32. Again a direct and an indirect effect, but now the indirect
effect does not correspond with a path in the directed graph but with the path in the
corresponding undirected graph.
An even clearer example can be obtained from Table 2. In model (a), for
instance, we have K = βKC C + βKT T + βKS S + εK and C = βCT T + βCS S + εC.
Thus rKC = βKC + βKT rCT + βKS rCS = βKC + βKT(βCT + βCS rST) +
βKS(βCT rST + βCS) = βKC + βKT βCT + βKS βCS + βKT βCS rST + βKS βCT rST.
Thus rKC is the sum of a direct effect, an indirect effect via T and another indirect
effect via S. The two remaining contributions to rKC come from the (undirected)
paths from K to T to S to C and from K to S to T to C. In model (b) we have rKC =
βKC, because this direct effect is the only path. In model (c) K = βKT T + βKS S + εK
and C = βCT T + βCS S + εC. Thus rKC = βKT βCT + βKS βCS + βKT βCS rST +
βKS βCT rST + r(εK,εC), and there is no direct effect.
The terminology of direct and indirect effects is causal, of course, and our
earlier warnings against taking this terminology too literally apply. For model (a)
in Table 2 we find, for instance, for the direct effect from C on K +.916, the
indirect effect via T is -.023, the indirect effect via S is -.230, and the two effects 'K
to T to S to C' and 'K to S to T to C' are +.030 and +.149. The sum of these effects is
+.842, which is indeed the correlation between C and K. It is difficult, and risky, to
give a causal interpretation, because the values depend strongly on the model that we
have chosen. In model (c), for instance, the indirect effect via T is +.466 and the
indirect effect via S is +.768. The equation for rKC in (c) becomes .842 = .466 +
.768 - .616 - .497 + .721. The model also fits perfectly, but presumably the causal
interpretation would be quite different.
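Using the model (a) coefficients, the decomposition of rKC quoted above can be reproduced to rounding accuracy (a sketch; the coefficient values are taken from Table 2):

```python
# Decomposition of r(K,C) = .842 under model (a) of Table 2.
bKC, bKT, bKS = 0.916, 0.031, 0.220     # K-equation coefficients
bCT, bCS = -0.730, -1.044               # C-equation coefficients
rST = -0.925                            # correlation between the exogenous T and S

direct = bKC
via_T = bKT * bCT                       # indirect effect via T
via_S = bKS * bCS                       # indirect effect via S
undirected_KTSC = bKT * bCS * rST       # path K-T-S-C in the undirected graph
undirected_KSTC = bKS * bCT * rST       # path K-S-T-C in the undirected graph

total = direct + via_T + via_S + undirected_KTSC + undirected_KSTC
print(round(via_T, 3), round(via_S, 3),
      round(undirected_KTSC, 3), round(undirected_KSTC, 3), round(total, 3))
```

The five terms come out as +.916, −.023, −.230, +.030 and +.149, summing to the observed correlation +.842.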
Although the calculus of path coefficients in transitive models is an interesting
and perfectly legitimate way to decompose correlation coefficients, causal
interpretation in terms of direct and indirect effects seems valuable only if there are
strong reasons to prefer the particular model in the study over other competing
models. And this happens only if we already have a pretty good idea about the
mechanisms that are at work in the situation we are studying. If the sociologist says
that father's income only has an indirect effect on the career of the child, this is
either just a figure of speech, or a statement that a particular partial correlation
coefficient is small. In Chang (1981), and Troussellier et al. (in press), it is shown
that the decomposition of the correlation coefficients in direct and indirect
contributions (with respect to a particular path model) can lead to useful
interpretations in community ecology.

LATENT VARIABLES

Now consider the path models in Figures 5 and 6. They are different from the
ones we have seen before, because they involve latent or unobserved variables. In
the diagrams we indicate these latent variables by using circles instead of squares.
First we give the causal interpretation of Figure 5. If we project the observed
variables on the space spanned by the unobserved variables then the residuals are
uncorrelated. Thus the observed variables are independent given the unobserved
variable. All relationships between the observed variables can be 'explained' by the
latent variable, which is their common factor. In somewhat more intuitive terms a
good fit of this common factor model to the data means that the variables all
measure essentially the same property. A good fit, and small residuals, means that
they all measure this property in a precise way. Again we see that the model can be a
good description of the data without being a good predictor. Uncorrelated
variables, for instance, are described perfectly by the model, but cannot be
predicted at all.
The structural equations describing the model are

xj = αjξ + εj,  j = 1, 2, 3.  (6)

The εj are assumed to be uncorrelated with ξ. Model (6) is saturated and transitive,
but it has the peculiar property that the exogenous variable is not measured. In De
Leeuw (1984) it was suggested that latent variables are just another example of
variables about which not everything is known. We have nominal variables, ordinal
variables, polynomial variables, splinical variables, and we also have latent
variables. About latent variables absolutely nothing is known, except for their place
in the model. Thus the basic optimal scaling idea that transformations and
quantifications must be chosen to optimize prediction also applies to latent variables.
Consequently latent variables fit very naturally into the optimal scaling approach to
path analysis.
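For three indicators the one-factor model (6) is just identified, and a standard identity (not stated in the chapter, but implied by it) recovers the loadings directly from the correlations: the model gives r_jk = αjαk for j ≠ k, so for instance α1 = sqrt(r12·r13/r23). A sketch with invented loadings:

```python
import math

# Invented loadings; under model (6) the correlations are r_jk = a_j * a_k for j != k.
a = [0.9, 0.8, 0.7]
r12, r13, r23 = a[0] * a[1], a[0] * a[2], a[1] * a[2]

# Recover the loadings from the correlations alone (triad formulas).
a1 = math.sqrt(r12 * r13 / r23)
a2 = math.sqrt(r12 * r23 / r13)
a3 = math.sqrt(r13 * r23 / r12)

# Residual variances of the standardized indicators: VAR(eps_j) = 1 - a_j**2.
print([round(v, 6) for v in (a1, a2, a3)], round(1 - a1 ** 2, 2))
```

The recovered loadings equal the ones we started from, and the model reproduces all off-diagonal correlations exactly, which is what saturation means here.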

Figure 5.
One-factor model.

The model in Figure 6 is a special case of the MIMIC model proposed by


Joreskog and Goldberger (1975). In MIMIC models there are two sets of variables.
The exogenous variables influence the observable endogenous variables through the
mediation of one or more latent variables.

Figure 6.
MIMIC model.

The MIMIC model combines aspects of psychometrical modelling with aspects
of econometric modelling. It follows from the MIMIC equations that the
observable endogenous variables satisfy a factor analysis model, while the joint
distribution of exogenous and endogenous variables is a reduced rank regression
model. For Figure 6 these equations are

ξ = β1x1 + β2x2 + β3x3 + δ,  (7a)
y1 = α1ξ + ε1,  (7b)
y2 = α2ξ + ε2.  (7c)

The MIMIC model is closely related to canonical correlation analysis (Bagozzi,
Fornell, and Larcker, 1981) and to redundancy analysis (Gittins, 1985, section
3.3.1).

Figure 7.
MIMIC model, Legendre data.

Figure 7 illustrates an application of the MIMIC model to the Baie des Chaleurs
data of Legendre and Legendre. The values of the path coefficients and the error
variances are given in the diagram. The model provides a reasonably good
description, compared with the transitive models in Table 2. The causal
interpretation of Figure 7 is that temperature and salinity determine the unmeasured
variable ξ, which in its turn determines primary production and chlorophyll
concentration.
In our experience some people find it difficult to accept the concept of a latent
variable. But there are several reasons why we still think that such a concept is
useful. In the first place in many of the sciences measurement errors can not be
neglected. This means that the observed variable is an indicator of the latent 'true'
variable. The concept of an indicator can be generalized considerably, and this has
happened mainly in psychometrics and in sociological methodology. It is not
possible to measure 'intelligence' directly, but it is possible to measure a large
number of indicators for intelligence. If the common factor model is acceptable,
then we have found a way to measure intelligence as a linear combination of


indicators (it is still possible, under these circumstances, that measurement of
intelligence is poor in a predictive sense). The situation can be compared with
determining the weight of a number of objects if we have a number of spring
balances with unknown characteristics. This can be done quite well by common
factor analysis. Social scientists happen to use a large number of concepts such as
intelligence (or attitude, or status, or power), which can not be measured directly
but for which indicators are available. It seems to us that the situation in ecology is
not really different. This means that the path models in terms of the observed
variables are theoretically not very satisfactory, because the theory says something
about the relationships between constructs or concepts, which should not be
confused with their indicators. And finally, we have already used latent variables in
classical path analysis as well. The errors or disturbances in the equations are also
unobserved, and measurable only by making linear combinations of observed
variables. If we allow for 'errors in equations', we may as well allow for 'errors in
variables'.

OPTIMAL SCALING OF VARIABLES

We now briefly indicate where the theory of optimal scaling comes in. We have
seen in De Leeuw (1987) that optimal scaling (or transformation, or quantification)
can be used to optimize criteria defined in terms of the correlation matrix of the
variables. In path analysis the obvious criteria are the coefficients of determination,
i.e. the multiple correlation coefficients. In De Leeuw (1987) we already analyzed
an example in which the multiple correlation between predictors SPECIES and
NITRO and dependent variable YIELD was optimized. In path analysis we deal with
nested multiple regressions, and we can choose which one (or which combination)
of the multiple correlations we want to optimize. If there is no prior knowledge
dictating otherwise, then it seems to make most sense to maximize the sum of the
coefficients of determination of all the endogenous variables. But in other cases we
may prefer to maximize the sum computed only over all variables of the highest
level.
In general nontransitive models the methods of optimal scaling can be used
exactly as in transitive models. We have one coefficient of determination for each
endogenous variable, and we can scale the variables in such a way that the sum of
these coefficients is optimized. This amounts to finding transformations or
quantifications optimizing the predictive power of the model. Moreover it is
irrelevant for our approach if the model contains latent variables or not. We have
seen that latent variables are simply variables with a very low measurement level,
and that they can be scaled in exactly the same way as ordinal or nominal variables.
This point of view, due to De Leeuw (1984), makes our approach quite general. It is
quite similar to the NIPALS approach of Wold, described most fully in Joreskog
and Wold (1982) and Lohmoller (1986).
It is of some interest that we do not necessarily optimize the descriptive efficiency
at the same time. Optimizing predictive power is directed towards the weak
orthogonality assumptions. It is possible, at least in principle, that a model with
optimized coefficients of determination has a worse fit to the strong orthogonality
assumptions. Scaling to optimize predictability does not guarantee an improved fit
in this respect. This has as a consequence that there is a discrepancy between the
least squares and the maximum likelihood approach to fitting nontransitive path
models. We do not go into these problems, but refer the interested reader to
Dijkstra (1981), Joreskog and Wold (1982), and De Leeuw (1984) for extensive
discussions.
We now outline the algorithm that we use in nonlinear path analysis in somewhat
more detail. We minimize the sum

σ = Σj SSQ( xj − Σl βjl xl )  (8)

over both the regression coefficients βjl and the quantifications (or transformations)
of the variables. The outer summation, over j, is over all endogenous variables, the
inner summation, over l, is over all variables that are direct causes of variable j. The
algorithm we use is of the alternating least squares type (Young, 1981). This means
that the parameters of the problem are partitioned into sets, and that each stage of
the algorithm minimizes the loss function over one of the sets, while keeping the
other sets fixed at their current values. By cycling through the sets of parameters we
obtain a convergent algorithm. In this particular application of the general
alternating least squares principle each variable defines a set of parameters, and the
regression coefficients define another set.
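As a concrete sketch of this alternating least squares idea, the toy below fits a single path equation y ≈ b1·x1 + b2·q[x2], alternating a regression step for the coefficients with a least-squares update of the category quantifications. The data, the single-equation setting, and the unnormalised quantification update are illustrative assumptions, not the PATHALS program itself.

```python
# Toy PATHALS-style alternating least squares for one path equation
#   y ≈ b1*x1 + b2*q[x2],  where x2 is categorical and q must be quantified.
# Step (a) is ordinary least squares for (b1, b2) given q; step (b) is the
# least-squares update of the quantifications given (b1, b2).

def ols2(u, v, y):
    """2-predictor least squares via the normal equations (no intercept)."""
    suu = sum(a * a for a in u); svv = sum(a * a for a in v)
    suv = sum(a * b for a, b in zip(u, v))
    suy = sum(a * b for a, b in zip(u, y)); svy = sum(a * b for a, b in zip(v, y))
    det = suu * svv - suv * suv
    return (svv * suy - suv * svy) / det, (suu * svy - suv * suy) / det

def loss(y, x1, cat, b1, b2, q):
    return sum((yi - b1 * a - b2 * q[c]) ** 2 for yi, a, c in zip(y, x1, cat))

# deterministic toy data: y depends on x1 and on a nonlinear recoding of cat
x1 = [i / 10 for i in range(30)]
cat = [i % 3 for i in range(30)]
true_q = {0: -1.0, 1: 0.0, 2: 1.0}
y = [2.0 * a + 1.5 * true_q[c] + 0.05 * ((i * 7) % 5 - 2)
     for i, (a, c) in enumerate(zip(x1, cat))]

q = {0: 0.0, 1: 1.0, 2: 2.0}            # naive initial category scores
history = []
for _ in range(20):
    b1, b2 = ols2(x1, [q[c] for c in cat], y)       # step (a)
    resid = [yi - b1 * a for yi, a in zip(y, x1)]
    for c in set(cat):                              # step (b)
        members = [r for r, ci in zip(resid, cat) if ci == c]
        q[c] = (sum(members) / len(members)) / b2   # scale left unnormalised here
    history.append(loss(y, x1, cat, b1, b2, q))

print(round(history[0], 4), round(history[-1], 4))  # the loss only decreases
```

Each step solves a least squares problem over one parameter set with the other fixed, so the recorded loss is non-increasing: this is exactly the convergence argument made above.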
We give an ecological illustration of this nonlinear PATHALS algorithm. The
data are taken from Van der Aart and Smeenk-Enserink (1975), who reported
abundance data for 12 species of hunting spiders in a dune area in the Netherlands.
A total of 28 sites was studied, and the sites were also described in terms of a
number of environmental variables. We have used a selection and coding from these
data made by Ter Braak (1986a). He used the six environmental variables:
WC Water content, percentage dry weight,
BS Percentage bare sand,
CM Percentage covered by moss layer,
LR Reflection of soil surface under a cloudless sky,
FT Percentage covered by fallen leaves or twigs,
CH Percentage covered by herbs layer.
Ter Braak categorized all variables into 10 discrete categories, in order to present
them succinctly. We have taken over his categorization, and used it in our analysis.
The results of a MIMIC analysis with two latent variables (factors) are given in
Table 3. Analyses with only a single latent variable were not very successful. We
first performed a linear analysis, using the category scores from the coding by Ter
Braak, and we then computed optimal monotone transformations. As an illustration,
the optimal transformations for the environmental variables are given in Figure 8.
We see a large variety of shapes. It would carry us too far astray to give a detailed
analysis of these nonlinearities. Of course, these transformations are only optimal
given the path model; in this case, given the number of latent variables, for instance.
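Optimal monotone transformations of this kind are conventionally computed with the pool-adjacent-violators algorithm for least-squares monotone regression. The sketch below shows that building block on its own; this is an illustrative stand-alone version, while PATHALS embeds the step inside the alternating least squares loop.

```python
# Pool-adjacent-violators (PAVA): the classical algorithm for least-squares
# monotone regression.  Non-monotone stretches of the input are pooled into
# level sets whose value is the (weighted) mean of the pooled entries.

def pava(values, weights=None):
    """Return the non-decreasing sequence closest to `values` in
    weighted least squares."""
    if weights is None:
        weights = [1.0] * len(values)
    # each block stores [sum of w*v, sum of w, number of entries]
    blocks = []
    for v, w in zip(values, weights):
        blocks.append([w * v, w, 1])
        # merge backwards while the previous block's mean is too large
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, w2, n = blocks.pop()
            blocks[-1][0] += s; blocks[-1][1] += w2; blocks[-1][2] += n
    out = []
    for s, w, n in blocks:
        out.extend([s / w] * n)
    return out

# category means that violate monotonicity get pooled
print(pava([1.0, 3.0, 2.0, 4.0, 3.5, 5.0]))
# → [1.0, 2.5, 2.5, 3.75, 3.75, 5.0]
```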

Table 3. Hunting spider data. Metric and nonmetric MIMIC analysis.

                 weights                      residual variances
          metric        nonmetric            metric   nonmetric

WC      -.77   .20    -.97   .24
BS      -.02   .11    -.30   .53
CM       .20   .17     .09   .27
LR       .17   .52    -.07   .54
FT       .62  -.32     .29  -.02
CH      -.26   .13    -.52   .41

S1      -.77   .21    -.87   .19             .39      .21
S2      -.10  -.79     .21  -.86             .37      .21
S3      -.89  -.04    -.89  -.20             .21      .16
S4      -.91   .23    -.97   .18             .12      .04
S5      -.92   .26    -.96   .18             .08      .04
S6      -.88   .16    -.93   .16             .20      .10
S7      -.95  -.15    -.97  -.08             .07      .04
S8      -.75   .15    -.85  -.01             .42      .27
S9      -.25   .63    -.40   .73             .54      .31
S10      .30   .83     .15   .90             .22      .16
S11      .59   .53     .54   .72             .37      .18
S12      .57   .35     .60   .58             .56      .31

For a more detailed discussion and interpretation of the data we refer to Van der
Aart and Smeenk-Enserink (1975) and to Ter Braak (1986a), who both performed
forms of canonical analysis. Actually, Ter Braak used canonical correspondence
analysis, a form of nonlinear canonical analysis, also discussed in Ter Braak
(1986b). We merely point out some 'technical' aspects of our analysis, and we
Figure 8. Optimal monotone transformations, environmental variables. (Panels plot the transformed values against category numbers for Water Content, Bare Sand, Cover Moss, Light Reflection, Fallen Twigs and Covered Herbs.)


compare the linear and nonlinear solutions. It is clear that the 'explained' variances
of the transformed abundance variables increase considerably. The table does not
give the 'explained' variance of the two latent variables. For the metric analysis the
residuals are .06 and .14, for the nonmetric analysis they are .01 and .01. Thus the
latent variables in the nonmetric analysis are almost completely in the space of the
transformed environmental variables, which implies that our method is very close
to a nonmetric redundancy analysis.
The interpretation of the latent variables is facilitated, as is usual in forms of
canonical analysis, by correlating the latent variables with the transformed
variables. This gives canonical loadings. If we do this we find, for example, that
the first latent variable correlates -.75 with both Water Content and Cover Herbs,
while the second one correlates +.80 with Light Reflection and -.80 with Fallen
Twigs. The analysis clearly shows some of the advantages of nonlinear multivariate
analysis. By allowing for transformations of the variables we need fewer
dimensions to account for a large proportion of the variance. Much of the
remaining variation after a linear analysis is taken care of by the transformations,
and instead of interpreting high-dimensional linear solutions we can interpret
low-dimensional nonlinear solutions, together with the transformations computed
by the technique. Using transformations allows for simple nonlinear relationships in
the data, and the optimal transformations often give additional useful information
about the data.
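The canonical loadings mentioned above are plain correlations between the latent-variable scores and the (transformed) variables. A minimal sketch, with made-up scores and purely illustrative variable names:

```python
# Canonical loadings as described in the text: correlate each latent
# variable with each transformed observed variable.  All scores below are
# invented for illustration; only the computation matters.
from math import sqrt

def corr(a, b):
    """Pearson correlation of two equal-length score vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    ca = [x - ma for x in a]
    cb = [x - mb for x in b]
    return (sum(x * y for x, y in zip(ca, cb))
            / sqrt(sum(x * x for x in ca) * sum(y * y for y in cb)))

latent = {"F1": [0.1, 0.4, -0.2, 0.9, -0.5, 0.3],
          "F2": [1.0, -0.3, 0.2, -0.1, 0.6, -0.8]}
observed = {"WC": [0.2, 0.5, -0.1, 1.1, -0.4, 0.2],
            "LR": [0.9, -0.2, 0.1, 0.0, 0.7, -0.9]}

loadings = {(f, v): round(corr(fs, vs), 2)
            for f, fs in latent.items() for v, vs in observed.items()}
for key in sorted(loadings):
    print(key, loadings[key])
```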

CONCLUSIONS

Discussions of multivariate analysis, also in the ecological literature, often limit
themselves to various standard situations and the associated techniques. Thus
multiple regression, principal component analysis, and canonical correlation
analysis are usually discussed, for situations in which we want to predict one
variable from a number of others, in which we want to investigate the structure of
a single set of variables, or in which we want to relate two sets of variables. The path
analysis techniques, with latent variables, discussed in this paper, make it possible to
use a far greater variety of models, and even to design a model which may be
especially suited for the data or the problem at hand. Usually the choice of the path
model will be based on prior knowledge the investigator has about the causal
relationships of the variables in the study. Although this far greater flexibility may
have its dangers, it is clearly a very important step ahead because incorporating
prior information into the analysis can enhance both the stability and the
interpretability of the results.
The nonlinear extensions of path analysis discussed in this paper allow for even
more flexibility. Not only can we choose the overall structure of the analysis by
choosing a suitable path model, but within the model we can also choose the
measurement level of each of the variables separately. Or, if one prefers this
terminology, we can define a suitable class of transformations for each variable
from which an optimal one must be chosen. The use of transformations can greatly
increase the explanatory power of path models, at least for the data set in question.
Whether the transformations we obtain are indeed stable, and also increase the quality of
the predictions, is quite another matter. This must be investigated by a detailed
analysis of the stability and the cross-validation properties of the estimates, which is
a very important component of any serious data analysis.
Thus we can say that this paper adds a number of very powerful and flexible
tools to the toolbox of the ecologist, with the logical and inevitable consequence that
these new tools can lead to more serious forms of misuse than the standard tools,
which are more rigid and less powerful. The major hazard is chance capitalization,
i.e. instability, and the user of these tools must take precautions against this danger.
But if suitable precautions are taken, the path analysis methods and the
generalizations discussed in this paper provide us with a convenient and useful way
to formalize scientific theories in situations in which there is no precise knowledge
of the detailed mechanisms, or in which there are too many factors influencing the
system to make a precise deterministic description possible.

REFERENCES

BAGOZZI, R.P., C. FORNELL, AND D.F. LARKER. 1981. Canonical


correlation analysis as a special case of a structural relations model. Multivariate
Behavioural Research 16: 437-454.
BEKKER, P. 1986. Essays on the identification problem in linear models with
latent variables. Doctoral Dissertation. Department of Econometrics, Tilburg
University, Tilburg, The Netherlands.
CHANG, W.Y.B. 1981. Path analysis and factors affecting primary productivity.
Journal of Freshwater Ecology 1: 113-120.
DE LEEUW, J. 1984. Least squares and maximum likelihood for causal models
with discrete variables. Report RR-84-09, Department of Data Theory,
University of Leiden, The Netherlands.
DE LEEUW, J. 1985. Review of four books on causal analysis. Psychometrika 50:
371-375.
DE LEEUW, J. 1987. Nonlinear multivariate analysis with optimal scaling. This
Volume.
DIJKSTRA, T.K. 1981. Latent variables in linear stochastic models. Doctoral
Dissertation. Department of Econometrics, University of Groningen, The
Netherlands.
FRISCH, R. 1934. Statistical confluence analysis by means of complete regression
systems. Economic Institute, University of Oslo, Norway.
GITTINS, R. 1985. Canonical analysis. Springer, Berlin, BRD.
GOODMAN, L.A. 1978. Analyzing qualitative categorical data. Abt, Cambridge,
Ma.
GOSSELIN, M., L. LEGENDRE, J.-C. THERRIAULT, S. DEMERS, AND M.
ROCHET. 1986. Physical control of the horizontal patchiness of sea-ice
microalgae. Marine Ecology Progress Series 29: 289-298.
HARRIS, R.E., AND W.A.G. CHARLESTON. 1977. An examination of the marsh
microhabitats of Lymnaea tomentosa and L. columella (Mollusca: Gastropoda)
by path analysis. New Zealand Journal of Zoology 4: 395-399.
HSIAO, C. 1983. Identification. In Z. Griliches, and M.T. Intriligator [eds.]
Handbook of Econometrics I. North Holland Publishing Co., Amsterdam, The
Netherlands
JASPARS, J.M.F., AND J. DE LEEUW. 1980. Genetic-environment covariation in
human behaviour genetics. In L.J.Th. van der Kamp et al. (eds.) Psychometrics
for Educational Debates. John Wiley and Sons, New York, NY.
JÖRESKOG, K.G., AND A.S. GOLDBERGER. 1975. Estimation of a model with
multiple indicators and multiple causes of a single latent variable. Journal of the
American Statistical Association 70: 631-639.
JÖRESKOG, K.G., AND H. WOLD. 1982. Systems under indirect observation.
North Holland Publishing Co., Amsterdam, The Netherlands.
KIIVERI, H., AND T.P. SPEED. 1982. Structural analysis of multivariate data. In
S. Leinhardt (ed.) Sociological Methodology. Jossey-Bass, San Francisco, CA.
LEGENDRE, L., AND P. LEGENDRE. 1983. Numerical ecology. Elsevier
Scientific Publishing Company, Amsterdam, The Netherlands.
LOHMÖLLER, J.B. 1986. Die Partialkleinstquadratmethode für Pfadmodelle mit
latenten Variablen und das Programm LVPLS. In L. Hildebrand et al. (eds.)
Kausalanalyse in der Umweltforschung. Campus, Frankfurt, BRD.
PEARSON, K. 1911. The grammar of science. Third Edition.
SCHWINGHAMER, P. 1983. Generating ecological hypotheses from biomass
spectra using causal analysis: a benthic example. Marine Ecology Progress
Series 13: 151-166.
SIMON, H.A. 1953. Causal ordering and identifiability. In W.C. Hood, and T.C.
Koopmans (eds.) Studies in Econometric Method. John Wiley and Sons, New
York, NY.
SPEARMAN, C. 1904. General intelligence objectively measured and defined.
American Journal of Psychology 15: 201-299.
TER BRAAK, C.J.F. 1986a. Canonical correspondence analysis: a new eigenvector
technique for multivariate direct gradient analysis. Ecology, in press.
TER BRAAK, C.J.F. 1986b. The analysis of vegetation-environment relationships
by canonical correspondence analysis. Vegetatio, in press.
TROUSSELIER, M., P. LEGENDRE, AND B. BALEUX. 1986. Modeling of the
evolution of bacterial densities in an eutrophic ecosystem (sewage lagoons).
Microbial Ecology 12: 355-379.
TUKEY, J.W. 1954. Causation, regression, and path analysis. In O. Kempthorne
(ed.) Statistical Methods in Biology. Iowa State University Press, Ames, Iowa.
VAN DER AART, P.J.M., AND N. SMEENK-ENSERINK. 1975. Correlation
between distributions of hunting spiders (Lycosidae, Ctenidae) and
environmental characteristics in a dune area. Netherlands Journal of Zoology
25: 1-45.
WOLD, H. 1954. Causality and econometrics. Econometrica 22: 162-177.
WRIGHT, S. 1921. Correlation and causation. Journal of Agricultural Research 20:
557-585.
WRIGHT, S. 1934. The method of path coefficients. Annals of Mathematical
Statistics 5: 161-215.
YOUNG, F.W. 1981. Quantitative analysis of qualitative data. Psychometrika 46:
347-388.
Spatial analysis
SPATIAL POINT PATTERN ANALYSIS IN ECOLOGY
B.D. Ripley
Department of Mathematics
University of Strathclyde, Glasgow G1 1XH, U.K.

Abstract - Statistics has been applied to ecological problems


involving spatial patterns for most of this century. Even in
the 1950's quite specialised methods had been developed for
detecting "scale" in grassland and to census mobile animal
populations (especially game). After a general discussion this
paper concentrates on point patterns and their analysis by quadrat
methods, distance methods and by fitting point-process models to
mapped data. Methods for detecting an interaction between species
are also discussed.

1. SOME HISTORY
Spatial statistics has a long history in fields related to
ecology. Forestry examples go back at least to Hertz (1909),
and ecologists have been proposing new methods since the pioneer-
ing work of Greig-Smith (1952) and Skellam (1952). The concerns
in those early days were principally to census populations and to
detect "scales" of pattern in plant communities. These problems
are still alive today, and many methods have been proposed.
(Unfortunately the statistical problems are subtle and by no means
all these methods are statistically valid.) Some specialist
techniques such as those for enumerating game from transect counts
have a history of thirty years or more.
It seems that the computer revolution has yet to make much
impact on spatial studies in ecology. Laborious studies to map


bird populations, for example, have not been matched by similar
efforts in analysis (Ripley 1985). Automated data collection
by remote sensing is in its infancy but will raise many new
problems. The methods of spatial statistics available today
are undoubtedly somewhat subtler than the basic statistical
methods known to most biologists, and involve some computer
programming to be used effectively. However, the subject is now
in a fairly mature state and deserves to be better known (amongst
statistical consultants as well as by ecologists). Several texts
are available at different levels (Cormack and Ord 1980;

NATO ASI Series, Vol. G14


Developments in Numerical Ecology
Edited by P. and L. Legendre
© Springer-Verlag Berlin Heidelberg 1987
Diggle 1983; Pielou 1977 parts II and III; Ripley 1981).


Applications in other subjects can also be helpful (Hodder and
Orton 1976; Upton and Fingleton 1985).

2. WHAT IS SPATIAL STATISTICS?

Statisticians have the advantage of a central position in


being consulted by scholars from a wide range of disciplines
about problems with a common structure. Thus spatial statistics
has grown up as a collection of methods distilled from typical
problems in agriculture, archaeology, astronomy, biology,
ecology, forestry, geography, geology, mining, oceanography, ...
In addition it has a mathematical life of its own in studying
these methods.
Not all of the strands of spatial statistics appear to be
directly relevant to ecology, and here I will concentrate on
point patterns. That is, we will study populations made up of
distinct individuals or clumps of individuals, such as
(a) trees in a forest
(b) flowering plants in grassland or heathland
(c) nesting birds on an island
(d) schools of whales in an ocean
(e) ants' nests.
As far as the methods are concerned the points might equally be
crystals in a rock or stars in the sky. The points need not all
be of the same type. For example, Harkness and Isham (1983)
studied the interactions of the populations of two species of
ants.
There are two basic questions to be addressed:
(I) How many individuals are there in the population,
(II) Do the individuals interact on any characteristic 'scales'?

The obvious answer to the first question, to count the population,
may be economically infeasible and some form of sampling is
required. The second question can be considered either for
sampled or completely mapped populations. Modern trends in the
subject are to work with mapped patterns and to try to summarize
the structure in the pattern(s) by a mathematical model. Some
examples are described below.


One potentially important area which has had little impact
on ecology is the analysis of set patterns. Pielou (see part
III of Pielou 1977) considered mosaics of plant communities of a
small number of types. Diggle (1981) and Ripley (1986) analyse a
20m x 10m plot of heather (Calluna vulgaris). Part of the plot
was covered by heather and part was bare, the analyses aiming to
characterize the shape of the patches of heather. I suspect such
analyses would be quite widely used if they were better known.
Some other spatial problems in ecology are much more
specialized. For example, the study of the potential spread of
rabies by foxes has used specialized models of space-time
epidemiology (Bacon 1985). More generally, the mapping of
mobile animal populations by detailed observation or telemetry can
provide a great deal of data to which few formal techniques have
been applied. We should note that the human visual system forms
a very efficient pattern recognition system which can often make
sense of a computerized replay of such data. Unfortunately (in
this context) humans appear to be pre-programmed to expect
patterns and so readily detect patterns where none exist. The
desire for objectivity is behind much of the development of formal
methods of spatial analysis.
Very many ad hoc methods have been suggested by ecologists
and others to answer questions (I) and (II) above. Statisticians
have in general preferred a model-based approach. For example,
there are many tests of "randomness" of a point pattern. Most of
these are pure significance tests which when they reject the null
hypothesis give no indication of a suitable alternative. The
model-based approach embeds "randomness" into a spectrum of models
and selects the best-fitting model in that class. This avoids the
problem of a departure from "randomness" which is statistically
significant but ecologically minute.

3. QUADRAT SAMPLING
A traditional way to sample grassland is to use quadrats.
These are small (5 cm - 1 m) metal squares used to select a sampling
region. Three types of sampling are in common use:
(a) random sampling. Here the quadrats are thrown down at


random (often literally thrown) and their spatial
positions ignored.
(b) grid sampling. A square or rectangular area is
systematically divided up into squares.
(c) transect sampling. A long line of squares (64-512) is
marked out.

A score is calculated for each quadrat. This can be a count of


plants or animals or a measurement of yield or "cover".
Similar principles apply on other scales. The quadrats can
be squares etched on a microscope slide, marked areas in forest
or moorland, or superimposed squares on an aerial photograph.
Quadrat sampling has two distinct aims corresponding to
questions (I) and (II) of the introduction. Suppose we are
interested in estimating a total population and we count the
individuals in each square. Define the intensity λ to be the
number of individuals per unit area. Suppose the quadrats have
area A and the quadrat counts are x₁, …, xₙ. Then a very obvious
estimator of λ is

λ̂ = x̄ / A
Under random sampling this is unbiased but its variance depends on
the spatial pattern. Intuitively, the variance will be low if
the pattern is rather regular, and high if the individuals occur
in small (relative to the quadrat) clumps. The unbiasedness of
this estimator makes it a good choice for censusing populations
whenever it is feasible.
The benchmark for spatial point patterns is the Poisson
process, the mathematical model for complete randomness. The
numbers of points in non-overlapping subregions are independent.
In a region of area A the total number has a Poisson distribution
of mean λA. Thus

E λ̂ = λ,    var λ̂ = λ/A

This can be used to give confidence limits on the total population
size but will be optimistic for clustered patterns.
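A minimal sketch of this quadrat estimator with hypothetical counts; the quadrat area and the normal-approximation 95% limits below are illustrative assumptions.

```python
# Quadrat estimate of intensity with Poisson-based confidence limits:
# lambda_hat = xbar / A, and under complete randomness the variance of
# lambda_hat is lambda divided by the total area sampled (n quadrats of
# area A each).  All numbers are made up for illustration.
from math import sqrt

counts = [3, 1, 4, 2, 0, 5, 2, 3, 1, 4]   # hypothetical quadrat counts
A = 0.25                                   # quadrat area in m^2 (assumed)
n = len(counts)

xbar = sum(counts) / n
lam_hat = xbar / A                         # individuals per m^2
se = sqrt(lam_hat / (n * A))               # Poisson standard error
lo, hi = lam_hat - 1.96 * se, lam_hat + 1.96 * se
print(lam_hat, round(lo, 2), round(hi, 2))
```

As the text warns, these limits are optimistic for clustered patterns, where the true variance exceeds the Poisson value.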
Some workers have tried to turn the dependence of var(xᵢ) on
the pattern to advantage. Many indices have been developed which
are combinations of x̄ and s², the sample mean and variance of the
quadrat counts. Some are given by Ripley (1981, pp. 104-6). Their
proposers have given heuristic interpretations for these indices
but without exception they have failed to survive closer
examination.
Another approach of long standing amongst ecologists has
been to fit a discrete distribution to the counts (x₁, …, xₙ).
Early examples of such studies include Thomas (1949) and Skellam
(1952). Rogers (1974) gives an elementary introduction to the
theory whereas Douglas (1979) is more advanced.
In a sense these methods are all doomed to failure. Although
there is some information on the spatial pattern in the counts
(and more can be extracted if several sizes of quadrat are used)
it is negligible compared to the information lost when the positions
of the quadrats are ignored. This is the advantage of grid and
transect sampling, techniques associated with Greig-Smith (1952)
and Kershaw (1957).

4. BLOCKS OF QUADRATS

With a systematic layout of quadrats, information on different


spatial scales can be extracted in one of two ways.

(a) Look at pairs of quadrats distance r apart and compute


a measure of dependence such as their correlation.
(b) Aggregate quadrats into larger rectangles, and see how
the variability of counts varies with the size of the
quadrat.

Greig-Smith's original method was of type (b). The grid of


squares is combined alternately horizontally and vertically, so
a 16 x 16 grid becomes successively 16 x 8, 8 x 8, 8 x 4, 4 x 4,
4 x 2, 2 x 2, 2 x 1 and 1 x 1. His analysis was a nested
analysis of variance, measuring, for example, the variability of
4 x 4 squares within 8 x 4 rectangles. This is then plotted
against block size (here 16 = 4²).
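The aggregation scheme can be sketched as follows. For brevity the variability measure used at each block size is simply the variance of the block totals, a simplified stand-in for the nested analysis of variance described above; the grid values are synthetic and a 2^k x 2^k grid is assumed.

```python
# Greig-Smith-style block aggregation: starting from a grid of quadrat
# counts, alternately pool pairs of columns and pairs of rows, recording a
# variability measure at each block size.
from statistics import pvariance

def pool_columns(grid):
    return [[row[i] + row[i + 1] for i in range(0, len(row), 2)] for row in grid]

def pool_rows(grid):
    return [[a + b for a, b in zip(grid[i], grid[i + 1])]
            for i in range(0, len(grid), 2)]

# synthetic 8 x 8 grid of quadrat counts with a coarse patch structure
grid = [[(r // 4 + c // 4) % 2 + (r + c) % 2 for c in range(8)] for r in range(8)]

block_size, results, pool_next = 1, [], "cols"
while len(grid) * len(grid[0]) > 1:
    cells = [x for row in grid for x in row]
    results.append((block_size, pvariance(cells)))     # variability at this size
    if pool_next == "cols" and len(grid[0]) > 1:
        grid, pool_next = pool_columns(grid), "rows"   # e.g. 16x16 -> 16x8
    else:
        grid, pool_next = pool_rows(grid), "cols"      # e.g. 16x8 -> 8x8
    block_size *= 2

for size, var in results:
    print(size, round(var, 3))
```

Plotting the recorded variability against block size is the Greig-Smith diagnostic: a peak indicates patchiness at around that scale.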
Many later modifications (Hull 1973; Usher 1969, 1975;
Mead 1974; Orloci 1971; Zahl 1974, 1977) use the same idea but
with different measures of variability. Fewer analyses of type
(a) have been proposed. Goodall (1974) is one, fallacious
(Zahl 1977, p.684), example. The main alternative has been
spectral analysis, proposed for a transect by Hill (1973), Usher


(1975) and Ripley (1978) and illustrated for grids by Ripley
(1981).
This is a specialized area with considerable dispute between
ecologists as to which methods are valid. As a statistician I
have been very critical of much of the work in this area (Ripley
1978, 1981) and am least unhappy about spectral analysis.
However, some of the realistic synthetic examples given in Ripley
(1978) show that none of the transect methods detect visually
obvious spatial patterns. Methods to study data on grids of
quadrats, especially spectral analysis, seem to be a little less
disappointing.
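For a transect, the spectral approach amounts to computing a periodogram of the quadrat counts: a peak at frequency k signals pattern repeating about every n/k quadrats. A self-contained sketch with synthetic counts (plain O(n²) discrete Fourier transform, so only suitable for short transects):

```python
# Periodogram of a transect of quadrat counts.  The mean (zero frequency)
# is removed first; the ordinate I(k) measures the strength of periodic
# structure at frequency k.  The counts below are synthetic, with a
# dominant period of 8 quadrats plus a small ripple.
from math import cos, sin, pi

def periodogram(x):
    n = len(x)
    mean = sum(x) / n
    x = [v - mean for v in x]
    I = []
    for k in range(1, n // 2 + 1):
        a = sum(v * cos(2 * pi * k * t / n) for t, v in enumerate(x))
        b = sum(v * sin(2 * pi * k * t / n) for t, v in enumerate(x))
        I.append((a * a + b * b) / n)
    return I

n = 64
counts = [3 + 2 * cos(2 * pi * t / 8) + 0.3 * cos(2 * pi * t / 3) for t in range(n)]
I = periodogram(counts)
peak = max(range(len(I)), key=lambda i: I[i]) + 1   # frequency index k
print(peak, n / peak)    # dominant scale in quadrats  → 8 8.0
```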

5. DISTANCE METHODS
The basis of distance methods for estimating intensity is
that if the points are densely packed the distances from each
point to its nearest neighbour will be small. Let d denote this
distance. Then dimensional considerations show that

Unfortunately the constant of proportionality depends on the


pattern of the points.
These methods were originally developed in forestry, and to
reduce the load on the word "point" we will consider estimating
the intensity of a forest of trees. Suppose this is a Poisson
forest, so completely random. Then

P(d > r) = P(no tree within disc of radius r)
         = exp(−λ × area of disc) = exp(−πλr²)

From this we can deduce that d² has an exponential distribution of
rate πλ. Suppose we select m sample points and measure the
distance dᵢ from each to the nearest tree. Then the maximum
likelihood estimator λ̂ of λ is

λ̂ = m / (π Σ dᵢ²)

This is not unbiased, but E(1/λ̂) = 1/λ! Unfortunately for regular
patterns dᵢ will tend to be smaller than for a Poisson forest
and so λ̂ will be an over-estimate. Conversely, for a clustered
pattern λ̂ will be an under-estimate.
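This estimator is easy to try out by simulation. The sketch below plants a fixed number of trees uniformly on the unit torus (a stand-in for a conditioned Poisson forest, with torus distances avoiding edge effects) and applies the maximum likelihood estimator; the tree count, seed and sample size are arbitrary choices for illustration.

```python
# Point-to-nearest-tree estimate of intensity for a simulated random
# forest, using  lambda_hat = m / (pi * sum(d_i^2)).
import random
from math import pi, hypot

random.seed(1)
n_trees = 55        # stand-in for a draw from Poisson(lambda), lambda = 55
trees = [(random.random(), random.random()) for _ in range(n_trees)]

def torus_dist(p, q):
    # wrap-around distances on the unit torus, to avoid edge effects
    dx = min(abs(p[0] - q[0]), 1 - abs(p[0] - q[0]))
    dy = min(abs(p[1] - q[1]), 1 - abs(p[1] - q[1]))
    return hypot(dx, dy)

m = 200             # number of sample points
pts = [(random.random(), random.random()) for _ in range(m)]
d = [min(torus_dist(p, t) for t in trees) for p in pts]

lam_hat = m / (pi * sum(di * di for di in d))
print(lam_hat)      # roughly the realized density of 55 trees per unit area
```

Replacing the uniform tree positions by a regular or clustered pattern reproduces the over- and under-estimation described in the text.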
A related idea is to measure distances from randomly chosen
trees to the nearest tree. This gives the same distribution for
a Poisson forest. However, with distances measured this way λ̂
tends to under-estimate for regular patterns and over-estimate for
clustered patterns.
These comments have been known for a long time and have led
to three responses. One of the earliest ideas was to make
something of this apparent drawback. Hopkins (1954) suggested using
the ratio of λ̂ for point-tree to tree-tree sampling as a test of
randomness. Skellam (1952) had the same idea for the ratio of
point-tree and quadrat estimators.
Another idea was to combine the two estimators in order to
try to cancel out their biases. Many such studies were done in
the 1970's using simulation, of which the most recent survey is
Byth (1982).
The final response is to seek other sampling schemes. Tree-
tree measurements as described above are pointless, since to
select a tree at random one needs to have enumerated all the trees!
It was not until Byth and Ripley (1980) that a valid way was found
to implement Hopkins' scheme. However, two earlier schemes have
similar (but not identical) properties. An early idea was to
select a sample point, move to the nearest tree and then measure
the distance to its nearest neighbour. The first tree is not
selected at random and the distribution theory is complicated but
has been used by Cox (1976) and Cox and Lewis (1976) to produce
estimators of λ and tests of randomness (respectively).
Perhaps the most promising scheme is the T-square method of
Besag and Gleaves (1973), illustrated below

[Diagram: a mapped scatter of trees illustrating the T-square sampling scheme.]
A sample point is chosen, and the distance to the nearest tree


measured. The distance to its nearest tree outwards (away from
the sample point) is then measured. Since searching for this
tree is over an area disjoint from the first search, the distances
are independent in a Poisson forest. Let the distances be d₁ and
d₂, and let u = πd₁², v = ½πd₂² be the areas searched in selecting
the nearest tree. For samples (uᵢ, vᵢ), i = 1, …, m, the
recommended estimator is

λ̂ = m / √( Σuᵢ × Σvᵢ )

and a good test of randomness is

t = 2m Σ(uᵢ + vᵢ) / { Σ(√uᵢ + √vᵢ) }²

For further details see Ripley (1981, §7.1).
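With field measurements in hand the T-square computations are short; the distances below are hypothetical.

```python
# Besag-Gleaves T-square computations.  From each sample point, d1 is the
# distance to the nearest tree and d2 the T-square distance from that tree
# onwards; the corresponding search areas are u = pi*d1^2 (a full disc) and
# v = (pi/2)*d2^2 (a half disc).  The estimator and test statistic follow
# the formulas in the text; all distances are made-up field measurements.
from math import pi, sqrt

d1 = [0.8, 1.1, 0.5, 0.9, 1.3, 0.7, 1.0, 0.6]   # sample point to nearest tree (m)
d2 = [1.2, 0.9, 1.4, 1.0, 0.8, 1.5, 1.1, 1.2]   # T-square distance onwards (m)
m = len(d1)
u = [pi * a * a for a in d1]
v = [0.5 * pi * b * b for b in d2]

lam_hat = m / sqrt(sum(u) * sum(v))             # trees per square metre
num = 2 * m * (sum(u) + sum(v))
den = sum(sqrt(ui) + sqrt(vi) for ui, vi in zip(u, v)) ** 2
t = num / den                                   # test statistic from the text
print(round(lam_hat, 3), round(t, 3))
```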


Distance methods have not proved reliable in practice and
foresters have adopted other methods, principally the use of a
relascope (an instrument which is used to look for trees which
subtend an angle exceeding some fixed value α). Ord (1978) gives an
elementary account of forest enumeration, but from the point of view
of a mathematical exercise.

6. COUNTING MOBILE POPULATIONS

Special problems arise in trying to census deer, grouse and


other game species; indeed all animals which will flee when
approached. Line transect methods have been developed to combat
these difficulties.
The observer walks along a long transect across the study
region. When an animal is flushed he marks the spot and measures
the distance to the line. This can either be the perpendicular
distance x or the direct distance d

observer
415

The idea is to use the number of animals flushed, n, in a walk


of distance L to assess the intensity A and hence the total
population. To do so we assume
(a) birds move only after detection,
(b) no bird is detected more than once,
(c) the probability of detecting a bird is a decreasing
function g(x) with g(O) = 1, and
(d) birds are flushed independently.

With these assumptions one can show that

E n = 2λLμ

so

λ̂ = n / (2Lμ)

where

μ = ∫₀^∞ g(y) dy < ∞

To use this estimator we need to know g. However, the probability
density function f of x is g(x)/μ, so we can estimate
1/μ = g(0)/μ = f(0) from our measurements of x.

A large variety of methods have been proposed to estimate
f(0). One can fit a parametric model or use non-parametric
density estimation, plus many ad hoc techniques. No clear
consensus has emerged. Burnham et al. (1980) is the
main reference. More complicated methods infer the pdf of (xᵢ)
from observations (dᵢ) but seem less often used. (See Upton and
Fingleton 1985, §2.3.)
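One common parametric choice, used here purely as an illustrative assumption rather than a recommendation from the text, is the half-normal detection function g(x) = exp(−x²/2σ²), for which μ = σ√(π/2) and the maximum likelihood estimate of σ² is the mean of the squared perpendicular distances.

```python
# Line-transect estimate  lambda_hat = n * f_hat(0) / (2L)  with a fitted
# half-normal detection function.  The perpendicular distances and the
# transect length are hypothetical.
from math import sqrt, pi

x = [2.0, 5.5, 1.0, 8.0, 3.5, 0.5, 6.0, 4.0, 2.5, 7.0]  # distances (m)
L = 1000.0                                               # transect length (m)
n = len(x)

sigma2 = sum(xi * xi for xi in x) / n   # half-normal MLE of sigma^2
mu = sqrt(sigma2 * pi / 2)              # mu = integral of g = sigma*sqrt(pi/2)
f0 = 1.0 / mu                           # f(0) = g(0)/mu with g(0) = 1
lam_hat = n * f0 / (2 * L)              # animals per square metre
print(round(mu, 3), lam_hat)
```

Here μ plays the role of an effective strip half-width: the estimator treats the survey as a fully censused strip of width 2μ and length L.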
The problems with this method lie more with its assumptions
than with technical points. Animals will move whilst the survey
takes place, and will occasionally be counted more than once.
The assumption of independent flushing is not essential for the
estimator of λ, but the approximation

var λ̂ ≈ λ(1 + σ²)/(2μL),    where n var f̂(0)/f(0)² ≈ σ²,

depends critically on independence. Such surveys seem more useful
for assessing trends in population numbers than for estimating
absolute population sizes.
Other specialized methods are available for, for example,
censusing songbirds, and for small mammals (Anderson et al. 1983).

7. MAPPED POPULATIONS

When a completely mapped population (such as a map of nests)


is available the interest is, of course, entirely in the spatial
pattern of the points. One might wish to study the pattern to
estimate the maximum capacity of a region, for example. Harkness
and Isham (1983) studied the nesting patterns of two species of
ant, one of which provided a food supply for the other.
The simplest analysis of a mapped pattern is merely to test
for "randomness". However, this is unlikely to have much point
since ecologists are unlikely to go to all the trouble of mapping
a pattern unless they expect interesting features. Thus in this
case the null hypothesis of "randomness" is probably not tenable
before any data are collected. Such tests are used frequently
but apparently merely to add an aura of statistical respectability.
The next stage is a trichotomy into "regular", "random" and
"clustered". At least in such cases the test gives some
indication of the departure from randomness. One of the most
(mis)used of such tests is that of Clark and Evans (1954).
Suppose N points are observed in a region of area A. From each
point compute the distance dᵢ to its nearest neighbour. Then the
Clark-Evans test is

CE = (d̄ − E d̄) / stdev(d̄)

which is referred to the standard normal distribution. The claim
is that

E d̄ = ½ √(A/N)

var d̄ = (4 − π) A / (4πN²) ≈ 0.0683 A/N²

Then small values of R = d̄ / E d̄ correspond to a clustered pattern
and large values to a regular pattern.
The test is widely used in this form, but the derivations of
E d̄ and var d̄ ignore both edge effects and dependence between dᵢ
and dⱼ. A large number of remedies have been proposed, the most
effective being that of Donnelly (1978), who kept the same formula
with

E d̄ ≈ 0.5 √(A/N) + (0.0514 + 0.041/√N) P/N

var d̄ ≈ 0.070 A/N² + 0.037 P √A / N^(5/2)

for a rectangle of area A and perimeter P. Since Clark and
Evans underestimated E d̄ they would overestimate R and hence bias
their trichotomy towards "regularity".
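Putting the edge-corrected moments to work, the sketch below computes the Clark-Evans statistic with Donnelly's constants as usually quoted from Donnelly (1978), for a small made-up pattern in a square.

```python
# Clark-Evans statistic with Donnelly's edge-corrected mean and variance
# for a mapped pattern in a rectangle of area A and perimeter P.  The
# coordinates form a roughly grid-like (hence regular) invented pattern.
from math import sqrt, hypot

pts = [(0.5, 0.5), (1.5, 0.6), (2.6, 0.4), (0.4, 1.6), (1.6, 1.5),
       (2.5, 1.4), (0.6, 2.5), (1.4, 2.4), (2.5, 2.6)]
A, P, N = 9.0, 12.0, len(pts)      # 3 x 3 square: area, perimeter, points

d = []
for px, py in pts:                 # nearest-neighbour distance of each point
    d.append(min(hypot(px - qx, py - qy)
                 for qx, qy in pts if (qx, qy) != (px, py)))
dbar = sum(d) / N

Ed = 0.5 * sqrt(A / N) + (0.0514 + 0.041 / sqrt(N)) * P / N
var_d = 0.070 * A / N ** 2 + 0.037 * P * sqrt(A) / N ** 2.5
CE = (dbar - Ed) / sqrt(var_d)
print(round(dbar, 3), round(Ed, 3), round(CE, 2))  # CE > 0 suggests regularity
```

For this near-grid pattern CE comes out positive and large, signalling regularity, as the territorial-spacing discussion below would lead one to expect.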
Brown and Rothery (1978) considered a problem in which the
region and hence A was not well defined. Newton and Campbell
(1975) had studied the spacing of the nests of ducks on an
island. These ducks nested densely only within Deschampsia tussocks,
the boundary of this area being ill-defined. Brown and Rothery
proposed scale-free tests of "randomness", the coefficient of
variation S and the ratio G of the geometric mean to the arithmetic
mean of (dᵢ²). However, whereas the mean values of S and G may
not depend on A, their distribution does depend on the shape of
the region. This raises another problem. The theory for all
these assumes that the intensity λ is uniform throughout the
study region. In the case of nesting ducks this is probably not
true, the nesting density reducing as the tussocks thin out. Thus
the problem is not really one of an ill-defined region but one of
heterogeneity, a variation of intensity across the region.
A basic question we have avoided so far is precisely what is
meant by a regular, clustered or heterogeneous pattern. It
transpires that the concepts are not exclusive. A pattern can
be regular at a very small scale but clustered at a larger scale
or conversely. Consider the nests on the island again. At the
scale of 1m they are regularly spaced from the birds' territorial
behaviour. Yet at a scale of 100m the nests clump together on
the favourable nesting areas. Further, clustering and
heterogeneity cannot be distinguished from a single sample. The
patterns produced by birds choosing to nest together (clustering)
are statistically indistinguishable from those governed by
environmental factors (heterogeneity). The two mechanisms can
only be distinguished by a series of samples.
These points are not well understood in the ecological
literature and have led to much confusion. There are even tests

proposed to distinguish clustering from heterogeneity! In studies


of spatial pattern it helps enormously to set up carefully
ecological hypotheses about what might be happening. In this sense
modelling becomes an essential pre-requisite to data analysis. It
is essential also to be able to summarize whether clustering or
regularity is occurring at different scales. This is the aim of the
more refined analyses presented in Ripley (1981) and Diggle (1983).
We will give only the most popular such analysis here and refer
the reader to the texts for further details.
Ripley's K-function is based on distances between all pairs
of points. Up to any distance apart there could be more or fewer
pairs than we would expect under randomness. If there are too
many there will be clustering at that scale, if too few,
regularity, and the extent of the excess or shortfall measures the
size of the effect. The problem is that what we expect depends
heavily on the size and shape of the region under study. This
could be a very complicated shape such as patches of woodland as
in Ripley (1985). By correcting for edge effects we can produce a
distribution of interpoint distances independent of the shape of
the study region.
Formally,

K(t) = (A/N²) Σ k(x,y)

the sum being over ordered pairs (x,y) of points. Here k(x,y) is
a weighting factor to allow for edge effects;
1/k(x,y) = proportion of circle centre x through y
which is within the study region.
For a Poisson pattern ("randomness")
EK(t) = πt²

so this is the standard against which we measure regularity or


clustering at scale t. To stabilize the variance, and to give a
visually simpler plot it is easier to consider
L(t) = √(K(t)/π)

for which L(t) = t is the datum of randomness. Some intricate


theory shows that if we do have randomness we would not expect

L(t) to stray from t by more than 1.5/√N at any t-value. This gives
a very sensitive formal significance test of randomness, but the
plot of L vs t is more useful in describing the ecologically
significant features of the pattern.
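A bare-bones version of these estimates can be sketched in Python. For
brevity the edge weight k(x,y) is set to 1 here (i.e. no edge
correction, so counts near the boundary are deflated); a full
implementation would use the circle-proportion weight defined above.
Names and the unit-square assumption are ours:

```python
# Uncorrected K- and L-function estimates for points in a region of
# area A (default: the unit square). Illustrative sketch only.
import math

def K_estimate(points, t, A=1.0):
    """K(t) = (A/N^2) * number of ordered pairs within distance t."""
    N = len(points)
    pairs = sum(
        1
        for i, x in enumerate(points)
        for j, y in enumerate(points)
        if i != j and math.dist(x, y) <= t
    )
    return A * pairs / N**2

def L_estimate(points, t, A=1.0):
    # L(t) = sqrt(K(t)/pi); under randomness L(t) is close to t
    return math.sqrt(K_estimate(points, t, A) / math.pi)
```

Plotting L_estimate against t for a range of t-values reproduces the
L-plots discussed below.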
Some examples of this analysis are shown in Figures 1-4.
All the examples are within a metre square, and all distances
are in metres. Figure 1a is a "random" pattern, a sample of a
Poisson process. Its L-plot in Figure 1b shows conformity to
L(t) = t. Figure 2 is a regular pattern, of points restrained
from being closer than 40cm apart, a feature which is seen quite
clearly in Figure 2b. The pattern in Figure 3a could be either
heterogeneity or clustering; Figure 3b indicates "clustering" at
a scale of 250cm. Finally, Figure 4a is the type of pattern
which defeats the indices referred to in section 3. As Figure 4b
shows, there is regularity, clustering and regularity at
successively increasing scales.
Biological case studies in the use of K are given by Ripley
(1981, 1985) for nest spacings, Ripley (1977) (see also Diggle
1983) for redwood seedlings and biological cells, Diggle (1983)
for bramble canes, and Pedro et al. (1984) and Appleyard et al.
(1985) for features in membranes of muscle fibres.
These summaries can be used both to suggest suitable models
for the patterns under study and to help fit such models. For
example, the studies of birds' nests concluded with a model that
inhibited pairs of nests closer than a certain distance and, for
some species, a less rigorous exclusion for slightly larger
distances. This provides both a biologically useful summary of
the pattern and reassurance that there is nothing significant in
the data not explained by such a simple description.

8. INTERACTION BETWEEN SPECIES

Thus far we have only considered patterns of indistinguishable
points. Interesting ecological problems often involve
the interaction of two or more species. We have already mentioned
the study of Harkness and Isham (1983). A more complicated and
extensive study by Byth (1980) involved the association of

Fig. 1. (a) A plot of 300 points within a 1 metre square.
(b) L-plot of this set of data.

Fig. 2. (a) A plot of 200 points with (b) its L-plot.

Fig. 3. (a) A plot of 314 points. (b) L-plot of this dataset,
with the lines L = t and L = t ± 1.5/√N.

Fig. 4. (a) A plot of 80 points with regularity at scales of
20 and 200cm but clustering at scale of 80cm.
(b) The L-plot.

three species of fungi with birch trees (Betula pendula). The


patterns of each of the species around a single tree were mapped
in three successive years. Thus the total pattern contained
nine types of points, identified by species and year. This study
was unusual in that the pattern was clearly not homogeneous, and
only radial symmetry about the tree was assumed. Newton and
Campbell (1975) in studying patterns of ducks' nests considered
four species, and distances from nests to others of any species
as well as to the nearest of the same species.
Some traditional ways to analyse association and segregation
are given by Pielou (1977 Chapters 13-15). Consider first counts
of individuals in each of k species. These could be in discrete
habitable units (e.g. rock pools) or in quadrats. The analysis
is then spatial only in the sense that we are testing whether
or not the species tend to occur together.
For simplicity consider k = 2 and species A and B. Then the
data can be summarized as

                        species B
                    present   absent
species A present      a         b
          absent       c         d

giving the counts of all four possibilities of A and/or B being


present in the quadrat. This is a 2 x 2 contingency table. If
there was no interaction between the species we would expect

E(a) = (a+b)(a+c) / (a+b+c+d)

and similar formulae for b,c and d. These reduce to the single
condition ad = bc. The cross-product ratio

ψ = ad/bc

measures association. For ψ = 1 there is no association. If
ψ > 1 species A and B tend to occur together whereas if ψ < 1
then segregation occurs. Another indicator of association is
the X²-test statistic

X² = N(ad - bc)² / [(a+b)(c+d)(a+c)(b+d)],   where N = a+b+c+d

Here, X² > 0 indicates either association or segregation. It is


tempting (and commonly done) to test X² against a chi-squared
distribution. This will only be valid if the individuals occur
in the quadrats independently. This implies a null hypothesis of
no association and randomness for each species separately. (In
fact randomness for one species would suffice.) The important
point here is that a slight association between two species with
very regular patterns is much less likely to happen by chance than
if the species were each clustered.
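As an illustration, both indicators can be computed directly from the
four cell counts. The helper name is ours; note that referring the X²
value to a chi-squared distribution still requires the independence and
randomness assumptions just discussed:

```python
# Cross-product ratio and X^2 statistic for a 2x2 presence/absence
# table with cells a, b, c, d as in the text. Illustrative sketch.
def association_2x2(a, b, c, d):
    N = a + b + c + d
    psi = (a * d) / (b * c)        # equals 1 under independence (ad = bc)
    x2 = N * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    return psi, x2
```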
A similar drawback applies to another traditional analysis.
Consider just species A and B. For each individual we record
whether its nearest neighbour is of species A or B. The data
are summarized as

              neighbour
              A      B
point    A    a      b
         B    c      d

This is again a 2 x 2 contingency table to which a X²-test could


be applied (and has been). However, here there is the problem of
'reflexive neighbour pairs' in which two individuals are mutual
nearest neighbours. Thus the chi-squared distribution is never
appropriate, not even if we are considering two independent
completely random populations!
These simple examples indicate a rather general problem with
testing for interaction. It is fundamentally impossible to test
for interactions without assuming something about (or conditioning
on aspects of) the pattern of each species separately. Two
successful approaches have been taken.
One simple idea is to condition on both patterns. That is,
the pattern of each species separately is assumed given but the
two patterns are allowed to be moved relative to each other. If
there is no interaction the distribution of any measure of inter-

action must be unchanged. Thus one takes random displacements


of the pattern of species B and computes the measure of inter-
action with the fixed pattern of species A. If the measure is
extreme for the true position then there is evidence of genuine
interaction. Besag and Diggle (1977) describe an example for
blackbirds and Harkness and Isham (1983) give an extended example
for their two species of ants.
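A sketch of this conditioning-on-both-patterns idea for the unit
square, using toroidal (wrap-around) random shifts of species B. The
interaction measure used here (mean distance from each A individual to
its nearest B) and all names are our illustrative choices; any measure
of interaction could be substituted:

```python
# Random-displacement (toroidal-shift) Monte Carlo test for
# interaction between two species in the unit square.
import math
import random

def mean_nn_AB(A_pts, B_pts):
    # mean distance from each A point to its nearest B point
    return sum(min(math.dist(a, b) for b in B_pts) for a in A_pts) / len(A_pts)

def shift_test(A_pts, B_pts, n_shifts=99, seed=0):
    rng = random.Random(seed)
    observed = mean_nn_AB(A_pts, B_pts)
    more_extreme = 0
    for _ in range(n_shifts):
        dx, dy = rng.random(), rng.random()
        # shift species B, wrapping round the edges; species A stays fixed
        shifted = [((x + dx) % 1.0, (y + dy) % 1.0) for x, y in B_pts]
        if mean_nn_AB(A_pts, shifted) <= observed:
            more_extreme += 1
    # small p suggests A is closer to B than chance, i.e. attraction
    return (1 + more_extreme) / (1 + n_shifts)
```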
A more sophisticated type of analysis is to attempt to
describe and model the patterns of two or more species
simultaneously. This is ambitious and has been done rarely. The
statistic K(t) of section 7 can be extended to pairs of species,
using pairs of individuals of species A and species B. Byth (1980)
and Harkness and Isham (1983) both took this approach, with some
limited success.

9. EPILOGUE

A lot is known about spatial analysis in ecology, and


ecological examples have been important in the development of
spatial analysis in statistics. Yet many weak, inappropriate
or even misleading methods continue to be used and can be seen
in almost any issue of an ecological journal. It is certainly
true that a full statistical analysis of an ecological data set
is time consuming and needs some statistical maturity, yet the
time involved is unlikely to be significant compared to the
fieldwork involved in collecting the data. Perhaps we
statisticians must accept that ecologists prefer to be out in
the field and both sides should seek more effective collaboration.

REFERENCES

ANDERSON, D.R., K.P. BURNHAM, G.C. WHITE, and D.L. OTIS. 1983.
Density estimation of small-mammal populations using a
trapping web and distance sampling methods. Ecology 64:
674-680.

APPLEYARD, S.T., J.A. WITKOWSKI, B.D. RIPLEY, D.M. SHOTTON, and


V. DUBOWITZ. 1985. A novel procedure for the pattern analysis of
features present on freeze-fractured plasma membranes.
J. Cell. Science. 74: 105-117.

BACON, P.J. 1985. Mathematical Aspects of Rabies Epizootics.


Academic Press, London.
BESAG, J. and P.J. DIGGLE. 1977. Simple Monte Carlo tests for
spatial pattern. Appl. Statist. 26: 327-333.

BESAG, J.E. and J.T. GLEAVES. 1973. On the detection of spatial


pattern in plant communities. Bull. Int. Statist. Inst.
45(1): 153-158.

BROWN, D. and P. ROTHERY. 1978. Randomness and local regularity


of points in a plane. Biometrika 65: 115-122.

BURNHAM, K.P., D.R. ANDERSON, and J.L. LAAKE. 1980. Estimation


of Density from Live Transect Sampling of Biological
Populations. Wildlife Monograph no. 72 (with J. Wild. Mang.
44).

BYTH, K. 1980. The Statistical Analysis of Spatial Point Patterns.


Univ. London Ph.D. thesis.

1982. On robust distance-based intensity estimators.


Biometrics 38: 127-135.

BYTH, K. and B.D. RIPLEY. 1980. On sampling spatial patterns


by distance methods. Biometrics 36: 279-284.

CLARK, P.J. and F.C. EVANS. 1954. Distance to nearest neighbour


as a measure of spatial relationships in populations.
Ecology 35: 445-453.
CORMACK, R.M. and J.K. ORD. (eds) 1980. Spatial and Temporal
Analysis in Ecology. Int. Co-op. Publ. House, Burtonsville,
Md.

COX, T.F. 1976. The robust estimation of the density of a


forest stand using a new conditioned distance method.
Biometrika 63: 493-499.
COX, T.F. and T. LEWIS. 1976. A conditioned distance ratio
method for analysing spatial patterns. Biometrika 63:
483-491.

DIGGLE, P.J. 1981. Binary mosaics and the spatial pattern of


heather. Biometrics 37: 531-539.

1983. Statistical Analysis of Spatial Point


Patterns. Academic Press, London. 148p.

DONNELLY, K.P. 1978. Simulations to determine the variance and


edge effect of total nearest neighbour distance. In
I. Hodder [ed] Simulation Methods in Archaeology, Cambridge
University Press, London.

DOUGLAS, J.B. 1979. Analysis with Standard Contagious


Distributions. Int. Co-op. Publ. House, Burtonsville, Md.

GOODALL, D.W. 1974. A new method for the analysis of spatial


pattern by random pairing of quadrats. Vegetatio 29:
135-146.
GREIG-SMITH, P. 1952. The use of random and contiguous quadrats
in the study of the structure of plant communities.
Ann. Botany 16: 293-316.

HARKNESS, R.D. and V. ISHAM. 1983. A bivariate spatial point


pattern of ants' nests. Appl. Statist. 32: 293-303.

HERTZ, P. 1909. Über den gegenseitigen durchschnittlichen Abstand
von Punkten, die mit bekannter mittlerer Dichte im Raum
angeordnet sind. Math. Ann. 67: 387-398.

HILL, M.O. 1973. The intensity of spatial pattern in plant


communities. J. Ecology 61: 225-235.

HODDER, I. and C. ORTON. 1976. Spatial Analysis in Archaeology.


Cambridge University Press, London. 270p.

HOPKINS, B. 1954. A new method of determining the type of


distribution of plant individuals. Ann. Botany 18: 213-227.

KERSHAW, K.A. 1957. The use of cover and frequency in the


detection of pattern in plant communities. Ecology 38:
291-299.

MEAD, R. 1974. A test for spatial pattern at several scales


using data from a grid of contiguous quadrats. Biometrics 30:
295-307.

NEWTON, I. and C.R.G. CAMPBELL. 1975. Breeding of ducks at


Loch Leven, Kinross. Wildfowl 26: 83-103.

ORD, K. 1978. How many trees in a forest? Math. Scientist 3:


23-33.
ORLOCI, L. 1971. An information theory model for pattern
analysis. J. Ecology 59: 343-349.
PEDRO, N., M. CARMO-FONSECA, and P. FERNANDES. 1984. Pore
patterns on prostate nuclei. J. Microscopy 134: 271-280.
PIELOU, E.C. 1977. Mathematical Ecology, Wiley, New York. 384p.

RIPLEY, B.D. 1977. Modelling spatial patterns. J.R. Statist.


Soc. B39: 172-212.

1978. Spectral analysis and the analysis of


pattern in plant communities. J. Ecology 66: 965-981.

1981. Spatial Statistics. Wiley, New York. 252p.



1985. Analyses of nest spacings. p.151-158.


In B.J.T. Morgan and P.M. North [eds] Statistics in
ornithology. Lecture Notes in Statistics 29.

1986. Statistics, images and pattern recognition.


Can. J. Statist. (in press).

ROGERS, A. 1974. Statistical Analysis of Spatial Dispersion.


The Quadrat Method. Pion, London. 164p.

SKELLAM, J.G. 1952. Studies in statistical ecology. I. Spatial


pattern. Biometrika 39: 346-362.

THOMAS, M. 1946. A generalization of Poisson's binomial limit


for use in ecology. Biometrika 36: 18-25.

UPTON, G. and B. FINGLETON. 1985. Spatial Data Analysis by


Example. Volume I. Point Pattern and Quantitative Data.
Wiley, Chichester, 410p.

USHER, M.B. 1969. The relation between mean square and block
size in the analysis of similar patterns. J. Ecology 57:
505-514.

1975. Analysis of pattern in real and artificial


plant populations. J. Ecology 63: 569-586.

ZAHL, S. 1974. Application of the S-method to the analysis of


spatial pattern. Biometrics 30: 513-524.

1977. A comparison of three methods for the analysis


of spatial pattern. Biometrics 33: 681-692.
APPLICATIONS OF SPATIAL AUTOCORRELATION IN ECOLOGY

Robert R. Sokal and James D. Thomson


Department of Ecology and Evolution
State University of New York at Stony Brook
Stony Brook, New York 11794-5245 USA

Abstract The methods of spatial autocorrelation analysis for


both continuous and nominal variables are explained. Spatial
correlograms depict autocorrelation as a function of geographic
distance. They permit inferences from patterns to process. The
Mantel test and its extensions are special ways of detecting
autocorrelation in ecology. The methods are applied to the
spatial distributions of ecological variables in two understory
plants in the genus Aralia.

INTRODUCTION

Most problems in ecology have a spatial dimension because


organisms are distributed over the surface of the earth.
Ecologists have, for many years, studied problems involving the
spatial distribution of individuals of a species and the joint
distributions of several species. One way to examine such
distributions is through the study of point distributions, a
subject reviewed in another chapter, by B.D. Ripley, in this
volume. Other spatial approaches in ecology are biogeographic
and deal with the distribution of species over the face of the
earth and with the congruence between spatial distribution
patterns of different species (Lefkovitch 1984, 1985). The
present chapter deals with yet another spatial aspect of
ecological research, the statistical properties of surfaces
formed by variables of ecological interest.
Typical data for such studies are sampling stations in
geographic space, represented as points in the plane. These
stations may be regularly spaced as in a linear transect or a
lattice; in most applications they are irregularly distributed,
as are plants in a field or islands in an archipelago. Defined
regions or areas can be used as well. For purposes of analysis,

NATO ASI Series, Vol. G14


Developments in Numerical Ecology
Edited by P. and L. Legendre
© Springer-Verlag Berlin Heidelberg 1987

each such unit would be considered a point. Irregular spatial


distribution of the sample locations may reflect no more than
the haphazardly chosen sites for specimen collection. However,
the distribution of the sample stations may often impart
important information about the populations. Because organisms
are more common in one area than another, different densities of
collection sites result. Such a pattern of distribution of
sites may well be of interest and is dealt with by Ripley (1987)
in this volume. However, for purposes of this chapter we shall
consider the distribution patterns of points as given and focus
attention on the variables mapped onto the points, one value per
variable for each point. The variables may run the gamut of
those studied in ecology, including biomass, population density,
morphometrics, species diversity, gene frequency, and others.
The data values observed at a set of sampling localities
constitute a set of discrete observations assumed to have been
taken from an underlying "surface". The observations may or may
not have measurement error and the surface may or may not be
continuous.
We shall focus on the spatial autocorrelation exhibited by
the variables observed at the sampling stations. Spatial
autocorrelation is the dependence of the values of a variable on
values of the same variable at geographically adjoining
locations. Early work in this field (Moran 1950; Geary 1954)
was rapidly followed by applications to ecological work (Whittle
1954; Matern 1960). However, only with the important summary
furnished by Cliff and Ord (1973) and its renewed application to
biology (Jumars, Thistle and Jones 1977; Jumars 1978; Sokal and
Oden 1978a,b) did the study of spatial autocorrelation begin to
make an impact on ecological and population biological research.
Biological variables are spatially autocorrelated for two
reasons: inherent forces such as limited dispersal, gene flow,
or clonal growth, tend to make neighbors resemble each other;
and organisms may be restricted by, or may actively respond to
environmental factors such as temperature or habitat type, which
themselves are spatially autocorrelated. Spatial
autocorrelation methods may be used for description of surfaces
as well as for making inferences from pattern to the process

that has produced the pattern. We shall detail both aspects in


the ensuing account, which is arranged as follows. The
methodology is introduced first, followed by an account of its
application. This will include aspects of inference about
ecological processes from spatial patterns in the data.
Finally, we shall present two ecological examples to illustrate
the application of the methods.

THE METHOD

Spatial autocorrelation computations. Two coefficients are


most frequently employed to describe spatial autocorrelation in
continuous variables. Moran's coefficient (Moran 1950) is
computed as

and Geary's ratio (Geary 1954) as

In these formulas, n is the number of localities studied; Sjk
indicates summation over all j localities from 1 to n and over
all k localities from 1 to n, j ≠ k; Sj indicates summation over
all j localities from 1 to n; Wjk is the weight given to a
connection between localities j and k (these weights are
discussed below; Wjk need not equal Wkj); Zj = Yj - Ȳ, where Yj
is the value of variable Y for locality j and Ȳ is the mean of Y
for all localities; and W = SjkWjk, the sum of the matrix of
weights, j ≠ k. Details of the computation, as well as standard
errors for testing the statistical significance of the spatial
autocorrelation coefficient, are furnished by Cliff and Ord
(1981) and, in simplified form, by Sokal and Oden (1978a).
Moran's I-coefficient resembles a product-moment
correlation coefficient. It usually varies between -1 and +1;
Cliff and Ord (1981) have shown that its upper bound ordinarily
will be less than unity, but could exceed unity for an irregular

pattern of weights. The limits for Geary's c are 0 for perfect
positive autocorrelation (similar neighbors) and a positive,
variable upper bound for negative autocorrelation (dissimilar
neighbors). In the absence of spatial autocorrelation, the
expected value of I is -1/(n - 1) and of Geary's c is 1. The
results of employing I- and c-coefficients are generally
similar, although, with unusually distributed weight matrices,
results by the two methods may differ substantially (Sokal
1979). Following a Monte Carlo simulation study, Cliff and Ord
(1981) conclude that "the I-test is generally better than the c-
test although the margin of advantage may be slight".
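As a concrete illustration, Moran's I can be computed directly from
its definition, with Zj = Yj - Ȳ and W the sum of the off-diagonal
weights. The function name is ours and this is a plain sketch, not an
optimized implementation:

```python
# Moran's I for values y observed at n localities, given an n x n
# weight matrix w (list of lists; diagonal ignored). Illustrative.
def morans_I(y, w):
    n = len(y)
    ybar = sum(y) / n
    z = [v - ybar for v in y]                 # deviations from the mean
    W = sum(w[j][k] for j in range(n) for k in range(n) if j != k)
    num = sum(
        w[j][k] * z[j] * z[k]
        for j in range(n) for k in range(n) if j != k
    )
    den = sum(zj * zj for zj in z)
    return (n / W) * num / den
```

Positive values indicate that connected localities resemble each
other; the expected value under no autocorrelation is -1/(n - 1).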
The weights in the above formulas measure the connection or
influence of locality j upon locality k. They can be functions
of geographic distances between pairs of localities, such as
inverse distances or inverse squared distances. These weights
are assembled in an n x n matrix with a weight for each locality
pair jk. An alternative approach uses a binary weight matrix,
where 1 indicates connection or adjacency between two localities
and 0 signifies the lack of such a connection. When the
sampling stations represent regions, all regions sharing a
common boundary may be connected, and those lacking such a
boundary left unconnected. When the sample localities are
points in a space, various geometric rules for establishing
connectivity can be imposed (Tobler 1975). A common method for
biological applications assumes that spatial influences take a
direct path: In a Gabriel graph (Gabriel and Sokal 1969; Matula
and Sokal 1980) two localities A and B are connected if, and
only if, the square of the distance between A and B is less than
the sum of the squares of the distances to any other locality C.
Because a Gabriel graph connects nearest neighbors, it
represents the paths of likely interaction (such as gene flow)
among localities (Gabriel and Sokal 1969). An alternative
design, the nearest neighbor or minimum spanning tree
connection, is a subgraph of a Gabriel graph (Matula and Sokal
1980).
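The Gabriel criterion as stated can be checked pair by pair. The
helper below is hypothetical; for n localities one would apply it to
every pair, passing the remaining n - 2 localities as candidates:

```python
# Gabriel-graph connectivity check: localities a and b are connected
# if and only if d(a,b)^2 < d(a,c)^2 + d(b,c)^2 for every other c.
import math

def gabriel_connected(a, b, others):
    d2ab = math.dist(a, b) ** 2
    return all(
        d2ab < math.dist(a, c) ** 2 + math.dist(b, c) ** 2
        for c in others
    )
```

Geometrically this says no other locality lies inside the circle
having the segment ab as its diameter.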
From a binary matrix connecting the localities, geographic
distances between localities can be computed along the
connections rather than directly (great circle or Euclidean

distances). The shortest distance between any pair of


localities along a connecting graph is computed by a so-called
cascade algorithm. Distances between adjacent localities will
be the same for great circle distances or distances along
Gabriel graphs. But distant localities will be farther apart
when measured along a connectivity graph. In studies with a
large number of localities, it probably does not matter which
approach is chosen; direct distances require fewer computational
steps.
Graphs of the relation between spatial autocorrelation
coefficients and geographic distance are called spatial
correlograms. They are computed by preparing a frequency
distribution from the matrix of geographic distances between all
pairs of localities and grouping these distances into a number
of classes, each based on predetermined distance limits. For
example, the first distance class might contain all locality
pairs 0 to 20 m apart, the second distance class all those
between 20 and 40 m, and so forth. The widths of the class
intervals need not be the same. Some workers include
approximately the same number of locality pairs in each distance
class. It is furthermore not likely that the process under study
is linear with distance, and greater refinement is generally
required at close than at far distances. Both of these
considerations lead to distance classes with unequal intervals.
More than 10 to 15 distance classes are generally not useful.
In our investigations, when the number of localities is small,
we set up fewer distance classes so that no class contains fewer
than 40 point pairs.
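The grouping of locality pairs into distance classes might be
sketched as follows; the function name and the class limits passed in
are illustrative. Each returned 0/1 matrix can then serve as the
binary weight matrix for one point of the correlogram:

```python
# Build one binary weight matrix per distance class from locality
# coordinates and a list of class limits. Illustrative sketch.
import math

def class_weight_matrices(coords, limits):
    """limits, e.g. [0, 20, 40] (metres) -> one 0/1 matrix per class."""
    n = len(coords)
    mats = [[[0] * n for _ in range(n)] for _ in limits[1:]]
    for j in range(n):
        for k in range(n):
            if j == k:
                continue
            d = math.dist(coords[j], coords[k])
            for c in range(len(limits) - 1):
                # pair (j, k) falls in class c if limits[c] <= d < limits[c+1]
                if limits[c] <= d < limits[c + 1]:
                    mats[c][j][k] = 1
    return mats
```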
The weight matrix for each distance class is binary, a
weight of 1 between a pair of localities indicating that the
pair falls in this distance class and 0 that it does not. Using
the binary weight matrix for each distance class, one computes
the corresponding spatial autocorrelation coefficients and plots
them against the geographic distance implied by the distance
classes. The resulting correlogram summarizes the pattern of
geographic variation exhibited by the surface of a given
variable. Correlograms describe the underlying spatial
relationships for a surface rather than its appearance, and are

probably closer guides to the processes that have generated the


surfaces than are the surfaces themselves. Sokal and Oden
(1978a) have illustrated the characteristic correlograms of
various types of surface patterns. A unidirectional gradient
shows a monotonically decreasing correlogram from positive to
negative autocorrelation as distances increase from near to far.
A bowl-like depression yields a similar correlogram that
eventually reverts to positive autocorrelation at the farthest
distance classes. Other surfaces show similarly characteristic
correlograms. The distance at which the correlogram first
reaches -l/(n-l) is the distance at which positive spatial
autocorrelation vanishes. In certain patchy environments this
measure may be an indicator of the average size of homogeneous
patches (Sokal 1979).
When the data are nominal, spatial autocorrelation is not
estimated in the form of a coefficient, but as deviations of
observed frequencies of like and unlike neighboring pairs from
their expectations based on random spatial arrangement. Thus,
when a distribution of individuals comprising three species, A,
B, and C, is studied, one computes the frequencies of AA, BB,
and CC pairs by a criterion of connectivity or adjacency as for
continuous data. Then one computes the expected frequency of
such pairs on the assumption of a random spatial arrangement.
One also counts the frequency of adjacent unlike pairs, AB, AC,
and BC, and compares them with their expectations, under a null
hypothesis of spatially random placement of the three species.
Thus, in this example, six deviations would be tested.
Sometimes the frequencies of all unlike neighbors are summed for
a single test irrespective of the particular pairs involved.
The deviations have been shown to be asymptotically normally
distributed and are tested against their standard deviation
units (Cliff and Ord 1973, 1981). To construct a correlogram
for each deviation type, one needs to plot the signed deviations
from expectation as a function of spatial distance. As in the
computation of distance classes for continuous measurement data,
one can compute binary connectivity matrices showing neighbors
at specified distances. For any one type of pair (species
combination), great spatial distances will generally show no

departure from expectation. However, an area with two


ecological regions in which the proportions of species differ,
and for which interregional distances are greater than
intraregional distances, would necessarily show a decrease in
homotypic pairs over expectations at the higher distance and a
corresponding increase in heterotypic pairs. An analogous
phenomenon has been observed in two medieval cemeteries whose
ABO blood groups have been determined by paleoserological
methods and where graves in two regions of the cemetery were
settled by different ethnic groups, apparently differing in
their ABO gene frequencies (Sokal et al. 1986).
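The observed and expected join counts for nominal data can be
sketched as below, using the standard randomization expectations
(labels allocated to localities without replacement). The helper name
is ours:

```python
# Observed vs expected counts of like and unlike neighbouring pairs.
# With m joins, n localities, and n_A individuals of species A:
#   E(AA) = m * n_A (n_A - 1) / (n (n - 1))
#   E(AB) = 2 m * n_A n_B   / (n (n - 1))
from collections import Counter

def join_counts(labels, edges):
    """labels: one species per locality; edges: neighbouring pairs (j, k)."""
    n, m = len(labels), len(edges)
    obs = Counter(tuple(sorted((labels[j], labels[k]))) for j, k in edges)
    tot = Counter(labels)
    exp = {}
    for a in tot:
        for b in tot:
            if a == b:
                exp[(a, a)] = m * tot[a] * (tot[a] - 1) / (n * (n - 1))
            elif a < b:
                exp[(a, b)] = m * 2 * tot[a] * tot[b] / (n * (n - 1))
    return obs, exp
```

The signed deviations obs - exp are what one plots, class by class,
to build the nominal-data correlogram described in the text.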
Ordinary spatial correlograms do not indicate the direction
of clines. Oden and Sokal (1986) have developed a method of
computing directed correlograms which permit the evaluation of
spatial trends for different compass directions. The procedure
is carried out by dividing the pairs of localities into
direction/distance classes that indicate not only distance but
also the compass bearing between the sampling stations.
Mantel approaches. An alternative procedure for estimating
and testing spatial autocorrelation is the Mantel test. This
test is carried out by an element-by-element multiplication of
the weight matrix with a proximity matrix representing some
similarity function between all pairs of localities, either with
respect to a single variable or to numerous variables. Examples
are genetic, morphologic, serologic, or geographic distances.
Designating the elements of these two matrices as Wjk and djk,
respectively, the Mantel test statistic Z is computed as

Z = Sjk Wjk djk
The null hypothesis tested is independence of the elements of


the two matrices--the weight matrix (representing spatial
distances) and the proximity (distance) matrix for the
variable(s) studied. Expectations for moments of Z under this
null hypothesis have been derived by Mantel (1967) who showed
the distribution of Z to be asymptotically normal, leading to a
straightforward significance test. Because of distributional
uncertainties, the preferred way to test the significance of the

Mantel statistic is by a Monte Carlo test, in which rows and
columns of one of the two matrices are randomly permuted,
followed each time by recalculation of Z. Proposals for
normalizing Z to a coefficient ranging from -1 to +1 have been
made by Hubert and Golledge (1982), Hubert (1985), and Smouse et
al. (1986). The Mantel test is a very general test with
considerable appeal because of its simplicity. Hubert et al.
(1981) have shown that by specifying the proximity matrix
appropriately, spatial autocorrelation coefficients I and c can
both be expressed as Mantel statistics. Among other useful
applications, the Mantel test enables one to compute spatial
correlograms for proximity matrices representing overall
distances between pairs of localities based on numerous traits
(such as biogeographic or genetic distances). In such cases
conventional 1- or c-coefficients cannot be evaluated. An
example of an ecological application of Mantel tests is the work
of Setzer (1985) on spatial and space-time clustering of
mortality in gall-forming aphids of the genus Pemphigus.
Because distance data are so common in population biology
and ecology. investigators have attempted to extend the Mantel
test to analyzing three or more matrices simultaneously. Such
multiple tests examine the interactions of several types of
distances, for example, spatial, ecological, and genetic
distances, or geographic, climatic, and faunistic distances.
Three different approaches have been suggested within the last
year for investigating the relations among three distance
matrices. Let the three matrices to be compared be designated
as A, B, and C. Dow and Cheverud (1985) propose to compare
matrices A and (B-C), that is, they carry out a Mantel test
between matrix A and the difference matrix, B-C. The matrices B
and C must be comparably scaled before the subtraction. The
Mantel test indicates whether rAB = rAC and, by its sign,
suggests which of the two distance matrices B or C has the
greater correlation with distance matrix A. The method assumes
that associations of A with B and A with C exist, and that A, B,
and C represent potentially spatially autocorrelated surfaces.
Hubert (1985) computes A.(BC), in which the matrix BC is the
Hadamard (element-by-element) product of matrices B and C, and
tests the association between A and BC by means of the Mantel
statistic. The question posed by Hubert is whether A has a
significant matrix correlation with the BC product matrix which
is supposed to embody the relations between B and C. It is
assumed in this method that B and C have a significant
association, and, as before, that A, B, and C are separately
autocorrelated. Smouse et al. (1986) consider the correlation
rBC to be fixed and do not permit this correlation to be
destroyed by permutation of either B or C. They compute the
partial correlations rAB.C and rAC.B of the matrix elements.
These authors test the significance of partial correlation rAB.C
by computing residual matrices from the regressions of A on C
and B on C, then obtaining the distribution of the partial
correlation as a normalized Mantel product of the two residual
matrices, permuting rows and columns of either matrix. This
method assumes that rAB and rAC are significant and A, B, and C
separately spatially autocorrelated. None of the methods has
yet been corroborated by a Monte Carlo analysis of suitable
autocorrelated surfaces to see whether independent but spatially
autocorrelated surfaces fall into the acceptance region of the
distribution of outcomes. An example of an ecological
application of multiple Mantel tests is given in an analysis of
causal factors of floristic composition of granite outcrops by
Burgman (1986). Other examples are furnished below in this
paper.
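As a concrete sketch of the Smouse et al. (1986) procedure, the fragment below regresses the off-diagonal elements of A on C and of B on C, then runs a Mantel permutation test on the two residual matrices. The function names and the use of numpy are our own illustrative choices, not the authors' code:

```python
import numpy as np

def offdiag(M):
    """Off-diagonal elements of a square matrix as a flat vector."""
    return M[~np.eye(M.shape[0], dtype=bool)]

def residual_matrix(Y, X):
    """Residuals of the simple regression of matrix Y on matrix X,
    fitted over the off-diagonal elements."""
    y, x = offdiag(Y), offdiag(X)
    xc = x - x.mean()
    b = (xc * (y - y.mean())).sum() / (xc * xc).sum()   # slope
    a = y.mean() - b * x.mean()                          # intercept
    R = Y - (a + b * X)
    np.fill_diagonal(R, 0.0)
    return R

def partial_mantel(A, B, C, n_perm=999, seed=None):
    """Permutation test of the partial correlation rAB.C: correlate the
    residuals of A on C with the residuals of B on C, permuting rows
    and columns of one residual matrix."""
    rng = np.random.default_rng(seed)
    RA, RB = residual_matrix(A, C), residual_matrix(B, C)
    corr = lambda X, Y: np.corrcoef(offdiag(X), offdiag(Y))[0, 1]
    r_obs = corr(RA, RB)
    n = A.shape[0]
    count = 1
    for _ in range(n_perm):
        p = rng.permutation(n)
        if corr(RA, RB[np.ix_(p, p)]) >= r_obs:
            count += 1
    return r_obs, count / (n_perm + 1)
```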
In some situations ordinary Mantel tests will not provide
sufficient information on spatial relationships. Although the
null hypothesis may be rejected in a given case, this does not
automatically permit us to distinguish between two competing
alternative hypotheses H1 and H2. Thus, if a set of populations
for which densities or gene frequencies have been obtained can
be grouped by two separate ecological criteria, how can one
decide which criterion more nearly coincides with the spatial
genetic pattern? When each of the alternative hypotheses
specifies a set of mutually exclusive and jointly exhaustive
groups (equivalence classes), as in the example just postulated,
such alternative hypotheses can be tested by the appropriate use
of restricted randomization techniques developed by N.L. Oden in
Sokal et al. (1986). An example will make this clear. Suppose
we carry out a standard Mantel test of some variable against the
grouping implied by the habitats of Figure 1a. Distances with

Figure 1. a. An area divided into 3 contiguous ecological
regions A, B, and C. Sampling stations in each region are shown
as tiny squares. b. The same area as in Figure 1a but divided
up differently to represent a competing alternative hypothesis.
There are only two ecological regions, A and B, by this scheme.

respect to the variable mapped onto the area studied are
compared with distances implying occurrence of a pair of
localities in the same or a different habitat by H1. The
complete permutation of the matrix for the standard Mantel test
would test the null hypothesis that the grouping of the
localities into three habitats creates no greater homogeneity
within these habitats than any other arrangement of the
localities. There may be, however, a competing alternative
hypothesis H2 as in Figure 1b. Suppose that two Mantel tests
reject the null hypothesis of random arrangement against both
alternative hypotheses. We may now carry out test (a) of H1 as
the null hypothesis against the alternative of H2. This test
involves the connection matrix of H2 in the Mantel product, but
allows permutations of points only within the groups of H1. A
test (b) of H2 as the null against an H1 alternative is similar.
Suppose H1 is closer to the truth than H2, but the null
hypothesis of no spatial pattern is rejected against both
alternative hypotheses because of the correlation between
alternatives. In this case, we would expect test (b) to be
significant but not test (a). The reverse results should occur
when H2 is closer to the truth than H1. A pilot experiment
along these lines has been carried out by Sokal et al. (1986).
The approach of restricted randomization has a large, as yet
unexplored, range of possibilities for hypothesis and
significance testing in spatial analysis.
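A restricted randomization of the kind just described can be sketched as follows, assuming numpy; here the H2 connection matrix marks "same group" pairs, D is the distance matrix for the variable, and localities are permuted only within the groups of H1. The function name and the lower-tail decision rule (small within-group distances indicate H2 structure) are our illustrative assumptions:

```python
import numpy as np

def restricted_mantel(D, h1_groups, h2_groups, n_perm=999, seed=None):
    """Mantel product of D with the H2 'same group' indicator matrix,
    with permutations restricted to the groups of the null hypothesis H1."""
    rng = np.random.default_rng(seed)
    g2 = np.asarray(h2_groups)
    W = (g2[:, None] == g2[None, :]).astype(float)   # H2 connection matrix
    np.fill_diagonal(W, 0.0)
    z_obs = (W * D).sum()      # small when H2 groups are internally homogeneous
    g1 = np.asarray(h1_groups)
    n = D.shape[0]
    count = 1
    for _ in range(n_perm):
        p = np.arange(n)
        for g in np.unique(g1):    # shuffle localities only within each H1 group
            idx = np.where(g1 == g)[0]
            p[idx] = rng.permutation(idx)
        if (W * D[np.ix_(p, p)]).sum() <= z_obs:     # lower tail
            count += 1
    return z_obs, count / (n_perm + 1)
```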
Significance tests. Individual spatial autocorrelation
coefficients are tested using standard errors based on the
expectations of their moments. Cliff and Ord (1981) have shown
that both I and c are asymptotically normally distributed;
significance is tested in the conventional manner. Adjustments
are given by these authors for small sample sizes, and are
usually built into the available computer programs. The
overall significance of a correlogram cannot be evaluated on the
basis of the individual autocorrelation coefficients, because
these are not independent of each other. Oden (1984) developed
a test for the significance of a correlogram against the null
hypothesis of no autocorrelation whatsoever. He has also shown
that the significance of an entire correlogram can be tested
approximately using a Bonferroni or Sidak approach. After a
spatial correlogram has been computed, it should routinely be
tested for significance in this manner.
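Oden's full procedure is more elaborate, but the Bonferroni (or Sidak) screen mentioned here reduces to a one-line criterion: with k distance classes, the whole correlogram is declared significant at level alpha if its smallest individual p-value falls below alpha/k (Bonferroni) or 1 - (1 - alpha)^(1/k) (Sidak). A minimal sketch, with a hypothetical function name:

```python
def correlogram_significant(p_values, alpha=0.05, method="bonferroni"):
    """Approximate global test of a spatial correlogram from the
    per-distance-class p-values of its autocorrelation coefficients."""
    k = len(p_values)
    if method == "bonferroni":
        threshold = alpha / k
    else:                       # Sidak-corrected threshold
        threshold = 1.0 - (1.0 - alpha) ** (1.0 / k)
    return min(p_values) <= threshold
```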
Two further tests are important in spatial autocorrelation
analysis, but generally accepted procedures have not yet been
worked out for them. These are tests of the following two null
hypotheses, which concern different variables mapped onto the
same set of localities and connections. 1. The spatial
autocorrelation coefficients for the two variables are equal and
at the same time significantly different from zero. 2. The
spatial correlograms of the two variables represent the same
spatial autocorrelation structure. An approach toward testing
these hypotheses is currently being worked on by Neal L. Oden,
based on results obtained by Wolfe (1976, 1977) and Dow and
Cheverud (1985).
The issue of the reliability of correlograms obtained from
surfaces is an important one in spatial autocorrelation work.
Two kinds of errors should be considered. One is the
subsampling error that would be observed if we were to take a
single realization of n points from a surface, repeatedly
subsample a number n' < n points from it, and calculate
correlograms based on these n' points. If we did this, we would
then have a distribution representing not only a generating
function with the same parameter, but also the exact same
realization. However, because the number of points would be
less than the total number from which we sampled, there would be
an error attached to the correlogram. This error should become
greater as n', the number of points sampled, decreases. Because
one would only rarely encounter an example when this particular
sampling model needs to be tested, this model of error is less
useful biologically than the second type of error, realization
error. Null hypotheses for most tests between correlograms in
population biology involve different realizations of the same
process. This is true whether the variable is different (the
usual case, as in two population densities or gene frequencies),
or the variable is identical (the rarer case, as when the same
variable is studied at different time periods). Work estimating
the relative magnitudes of these errors is currently under way
in the laboratory of one of us (RRS).

APPLICATIONS OF SPATIAL AUTOCORRELATION ANALYSIS

Beyond the mere description of the spatial properties of
the surfaces of variables, the methods outlined above are
employed for reasoning from pattern to process. Such inferences
are complicated by several difficulties. Different processes
may give rise to the same pattern; two realizations of the same
process may engender different patterns, and several processes
may be working to produce a mixed or intermediate pattern that
needs to be resolved into its components if the system is to be
understood. We must be alert for these complications in the
account and the examples that follow.
Inferences concerning population structure are based on the
results of four procedures (Sokal 1983; Sokal and Wartenberg
1981). The first procedure is to calculate significance tests
for heterogeneity of localities. These test the null hypothesis
that the variable under consideration is identical in mean (or
in frequency) for the set of localities being studied. For
measurement data one employs analysis of variance, whereas for
frequency data this is carried out by a G-test of homogeneity
(see Sokal and Rohlf 1981, for a discussion of both methods).
The second procedure is the computation of spatial correlograms
by the techniques described above. The third procedure is the
computation of similarity of spatial patterns. For those
variables that show significant spatial structure, i.e.,
significant spatial correlograms following the methods of Oden
(1984), one computes a measure of similarity of the pattern for
all pairs of variables over the set of localities. To this end,
product-moment correlation coefficients of all pairs of
variables with each other are calculated over the localities and
assembled in a matrix. The fourth procedure is the computation
of similarity of significant correlograms. This can be done by
computing the average Manhattan distance (Sneath and Sokal 1973)
between these pairs of correlograms. Both matrices are
subjected to UPGMA or k-means clustering (Sneath and Sokal 1973;
Spath 1983) to detect interesting structure in the results.
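Procedures three and four lend themselves to a compact sketch. Below, the data array holds the variables over localities (rows = localities) and the correlograms array holds one row of autocorrelation coefficients per variable; both function names are ours, and the subsequent UPGMA or k-means clustering of the two resulting matrices is left to any standard package:

```python
import numpy as np

def pattern_correlations(data):
    """Procedure 3: product-moment correlations between all pairs of
    variables, computed over the localities (columns = variables)."""
    return np.corrcoef(data, rowvar=False)

def correlogram_distances(correlograms):
    """Procedure 4: average Manhattan distance between every pair of
    correlograms (one row of coefficients per variable)."""
    C = np.asarray(correlograms, dtype=float)
    m = C.shape[0]
    D = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            D[i, j] = np.abs(C[i] - C[j]).mean()
    return D
```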
Samples statistically homogeneous for one variable will
usually lack spatial differentiation for that variable,
permitting the rejection of some ecological hypotheses and the
erection of others. Thus, homogeneity, when based on adequate
sample sizes, is incompatible with adaptation to regional
environmental differences or with genetic differentiation. But
statistical homogeneity is compatible with an environmentally
homogeneous area, or with random mating within the entire area
under study. Spatial patterning in the variable may reflect the
influence of a correspondingly patterned environmental variable.
Alternatively, the spatial dynamics of the populations may be
circumscribed in direction and/or distance, resulting in
regional patterns. For example, if there are two populations
that differ with respect to a given variable and one of these
populations migrates into the area of the second and interbreeds
with it, the resulting spatial pattern for this variable will
reflect the diffusion process. Setzer's (1985) work on aphid
migration is an application of these principles.
Further inferences can be made by examining several
variables for each population, studying similarities among their
patterns, as well as among their spatial correlograms.
Dissimilar patterns will reflect differences in the processes
producing them. Examples would be differential responses by
several variables to diverse environmental factors differing in
spatial patterns, or migration at different rates and in
different directions from several source populations. Different
patterns usually result in different correlograms, but random
processes, such as genetic drift, are an exception. Here, the
same generating function yields independent patterns for
frequencies of different genes, yet results in similar
correlograms because the patterns have the same variance-
autocovariance structure (Sokal and Wartenberg 1983). Variation
patterns similar for two or more variables will also result in
similar correlograms. Patterns may be similar because the
variables concerned are functionally related. Thus dispersal
patterns of seed-eating rodents and of the seedlings resulting
from this dispersal should be similar. An alternative
explanation for similar patterns would be responses to the
identical environmental factor.
The types of inferences that can be made for ecological
data have been enumerated by Sokal (1979). Homogeneity of
variables of ecological interest in a study area is relatively
rare, its coupling with spatially significant patterns even
rarer. It could arise when observations drawn from the same
population subsequently ordered themselves spatially. No such
cases are known to us. Homogeneous variables that also lack
spatial pattern indicate uniformity of the environment and of
the source populations inhabiting it. Statistically
heterogeneous variables of ecological interest will typically
have spatial pattern. This may be due to differences in source
populations inhabiting local areas, asynchrony of population
growth among local population samples, or spatial patterning of
the resources or other environmental factors affecting the
populations. The combination of statistical heterogeneity for
the variables coupled with lack of spatial pattern should be the
result of random settlement patterns from heterogeneous
populations or random arrangement of environmental factors and
resources. Similarities and differences between correlograms
for different variables measured on the same population may be
indicative of the differences in patterning of resources or in
causation of the variables studied.
The potential range of application of the spatial
autocorrelation techniques to ecology is considerable. The
distance at which the correlogram first reaches -1/(n-1)
indicates the average distance at which the value of the
variable cannot be predicted from its value at a given location.
Sokal (1979) has shown that this value is related to patch size
but because of the diverse shapes and distributions of patches
and patch sizes in nature, the relation between this distance
and patch parameters is a complex one. However, this is a
subject well worth further investigation, since the underlying
patch structure of much of the environment is cryptic and
unknown. Inferences about patch structure must be made from
biological response variables (population counts, biomass, gene
frequencies). This aspect of inference is illustrated in one of
the examples furnished below.
The mobility of organisms is another important ecological
dimension. Whether the particular process investigated deals
with dispersal and vagility or with migration of individuals or
populations, the results of the process leave their record in
terms of population counts and as frequencies of genetic or
other markers. Spatial autocorrelation analysis also permits
the testing of the observed patterns against different
alternative hypotheses and the evaluation of the relative
likelihoods of the separate alternative hypotheses. Although we
furnish no example of such a test in this paper, relevant cases
have been analyzed for large scale migration in humans (Sokal
1979; Sokal and Menozzi 1982) and for a small scale spatial data
set testing alternative models in an archaeological example by
Sokal et al. (1986).
When the variables studied are nominal or categorical, the
questions addressed by spatial autocorrelation relate to the
interdependence of observations. Cases in point are
distributions of two or more species, the two sexes of one
species (Sakai and Oden 1983), and of genotypes. Spatial
patterns in such variables reveal something about the inherent
populational and ecological processes of these organisms and
about the spatial structure of the underlying environment that
affects their distribution. We show an example in the
distribution of the two sexes of Aralia nudicaulis below. Other
examples are distributions of tree species (Sokal and Oden
1978b) and of fine structure in populations of mice (Sokal and
Oden 1978b) and humans (Sokal et al. 1986).
Spatial autocorrelation takes on a special importance in
ecology when one organism (say, a plant) constitutes a
harvestable resource for a second organism (an animal), and the
distribution of the former is nonrandom. In such a case, the
autocorrelation pattern of the plant resource should influence
the harvesting behavior of the animal. Such examples are likely
to involve patterns in both time and space. For example,
positive spatial and temporal autocorrelation of a food resource
might favor site fidelity, either in the form of feeding
territoriality or "trapline" behavior, in which an animal
repeatedly visits a series of rewarding sites. Negative
autocorrelation of resources should result in flexible behavior
by the visitors: Pleasants and Zimmerman (1979) describe nectar
standing crops in bee-pollinated plants as fitting a "hotspot-
coldspot" pattern. Recently unvisited patches are "hot" because
nectar has accumulated; recently visited patches are "cold"
because their nectar has been drained. Bees forage
systematically, making short flights after being rewarded at a
flower, and flying longer distances after a disappointment.
Thus they tend to stay in hot spots, turning them cold, and to
pass over cold spots, allowing nectar resecretion to turn them
hot again. Here, the foraging behavior generates and maintains
the patchy resource pattern, and is at the same time well-suited
for the exploitation of that pattern. The idea that foraging
behavior should be responsive to the spatial distribution of the
food resource is an appealing one, but existing treatments tend
to be highly informal, for want of an explicit language for
describing such patterns. Spatial autocorrelation analysis can
improve this situation; in this spirit, we offer two examples
below, featuring two bee-pollinated species of Aralia. In these
cases, the plants vary with respect to sexual expression, which
might be expected to influence not only the foraging of the bees
for pollen and nectar, but also the reproductive success of the
plants.

EXAMPLES

Aralia nudicaulis. The first example is from a study of
the spatial pattern of an understory plant, Wild Sarsaparilla
(Aralia nudicaulis L.) (Barrett and Thomson 1982). This is a
rhizomatous perennial common to the boreal forest of North
America. It forms large clones that grow by means of an
extensive subterranean rhizome system. Clones are composed of
aerial shoots (ramets), which can be vegetative or reproductive.
Each ramet produces a single compound leaf and, if it is
reproductive, a single umbellate inflorescence. A. nudicaulis
is dioecious, each clone possessing flowers of one sex only.
The study area in New Brunswick was visited during the first
three weeks of June. In common with earlier observations
(Barrett and Helenurm 1981), the study area in a forest site
contained a larger number of males (1244) than of females (499).
The pattern of distribution of the male and female ramets is
shown in Figure 2. Vegetative ramets, which outnumber flowering
ones by several times, are not shown in the figure.
The method of sampling the area has been described in
detail by Barrett and Thomson (1982). For our purposes we need
record only that the one-hectare sampling block was subdivided
into one-hundred 10 x 10 m plots within each of which the
position of each flowering ramet was mapped and its sex
recorded. To determine fruit set without losses to frugivores,
the female inflorescences were protected by nylon mesh bags
after anthesis. This bagging was done only in the central 64

Figure 2. Distribution of male (circles; n = 1244) and female
(triangles; n = 499) flowering ramets of Aralia nudicaulis
within a 1-ha block of spruce-fir forest in central New
Brunswick, June 1979. From Barrett and Thomson (1982).

quadrats of the block. When fruits were nearly ripe but not yet
abscised, the infructescences were harvested. Fecundity was
calculated as the number of fruits divided by the number of
flowers. The unbagged infructescences were attacked heavily by
animals, so that analyses involving fecundity consider only the
inner 64 quadrats. Since 20 of these quadrats contained only
males, fecundity could be defined for only 44 quadrats. The
variables analyzed were Aralia density (numbers of male plus
female ramets), percent female per quadrat, and three habitat
variables, density of Clintonia borealis (Ait.) Raf.
(Liliaceae), development of bracken (and shrubs), and canopy
cover (degree of tree canopy closure). Clintonia blooms
synchronously with A. nudicaulis in early June; both species are
primarily pollinated by bumble bees. The three habitat
variables were scored subjectively, using a 5-point scale.
The first analysis carried out was an examination of the
randomness of the distribution pattern of the sexes. As can be
seen from an examination of Figure 2, the sexes seem to be
nonrandomly distributed, with clusters of each sex interspersed
in the area. This question can easily be tested by means of
nominal spatial autocorrelation analysis, considering males and
females to be two nominal classes and calculating a correlogram
of the deviations from expectation under the hypothesis of
spatial randomness. Because the total number of 1743 ramets
exceeded the capacity of our computer program, we drew 5 north-
south transects traversing the sample area at equal intervals
and recorded all plants within 0.5 m of the transect. The
results for the three possible combinations and the 5 transects
are shown in Table 1. In summary, male-male combinations show
positive spatial autocorrelation (excess of observed over
expected pairs) up to 20 m, whereas female-female combinations
show significant positive autocorrelation up to 30 m (up to 60 m
for transect 5). There is a large cluster of females in the
eastern region of the study area (see Figure 2) so that it is
easy to travel 60 m along transect 5 while still remaining
within the female cluster. The male-female pairs show negative
autocorrelation up to 20 m and positive values thereafter. On
the basis of these findings we can show that the two sexes of
this species are significantly spatially clumped. The clumps
are somewhat larger for females with respect to area. In terms
of ramet numbers, the clumps are larger for males, which are
denser. The spatial nonrandomness of the data is corroborated.
Spatial correlograms for the six variables investigated are
shown in Figure 3. We divided the distances into 10 distance
classes of unequal intervals, to provide approximately equal
frequencies of pairs in each distance class. We illustrate only
the I-correlograms of these variables in Figure 3. All variables
except fecundity show correlograms significantly different from
the expectation of no autocorrelation by Bonferroni
Table 1. Nominal autocorrelations between sexes for 5 transects
in A. nudicaulis.

Male-Male
Meters
Transect 10 20 30 40 50 60 70 80 90 100

1 + +

2 + +

3 + +

4 + +

5 + +

Female-Female
Meters
Transect 10 20 30 40 50 60 70 80 90 100

1 +

2 + +

3 + +

4 +

5 + + + +

Male-Female
Meters
Transect 10 20 30 40 50 60 70 80 90 100

1 + + + + + +

2 + + + + +

4 + + + + +

5 + + + + + + + +

Note: Entries in the table show the signs of deviations significant at P < 0.05.

Figure 3. Spatial correlogram of 5 variables potentially related
to reproduction in Aralia nudicaulis. Abscissa shows spatial
distance in meters (upper limits of distance classes); ordinate
gives Moran's I-coefficient. Abbreviations: AN--Aralia density,
BR--Bracken development, CA--Canopy cover, CL--Clintonia
density, F--fecundity, PF--percent female.

tests (Oden 1984). As is evident from the figure,
the correlograms are quite dissimilar, furnishing evidence for
different spatial structure in these variables. Canopy cover
shows moderate significant positive autocorrelation (0.18) at 20
m and significant negative autocorrelation (-0.17) at 73 m and
beyond. Bracken shows only moderate significant positive
autocorrelation (0.15) at 20 m and no negative autocorrelation
at substantial distances. Clintonia density has an even weaker
local structure (0.10) at 20 m, with some negative
autocorrelation at 85 m. Aralia density shows moderate but
significant positive autocorrelation (0.17) at 20 m, with
negative autocorrelation (-0.14) commencing at 45 m but no
significant patterns beyond 51 m. Percent female shows the
strongest spatial pattern with highly significant substantial
positive autocorrelation (0.50) at 20 m extending to distances
of 30 m. Negative autocorrelation (-0.19) commences at 45 m as
for Aralia density, but unlike that variable, continues
significantly negative all the way to 73 m. Note that percent
female has a significant positive autocorrelation of 0.22 at the
greatest distance, 127 m, probably because females predominate
in three corners of the plot and thus the majority of the
largest distances possible are those with high female
percentages. Finally, fecundity shows no spatial structure at
all. Thus, it would appear that each of these variables, even
though they may be functionally related to some degree, has its

Figure 4. Values of ecological variables assessed for each
quadrat in the one-hundred 10 x 10 meter plots. Shading
indicates codes as follows: white--0, horizontal hatching--1,
diagonal hatching--2, cross hatch--3, black--4.
own spatial pattern within the area.


In connection with our analysis of fecundity we had
occasion to carry out a spatial autocorrelation analysis using
only the inner 64 quadrats of the study area. To conserve
space, the correlograms of this reduced data set are not shown.
While the correlograms for the rest of the variables remained
more or less the same, the correlogram for canopy cover changed
appreciably. The reason for this change can be seen from the
map for this variable (Figure 4a), where low values are found
along the southern margin and there are patches of high canopy
cover in the east center and in the northwest. Once the outer
quadrats are removed there is little structure left in the
variable, as reflected in the resulting nonsignificant
correlogram. In contrast with canopy cover, the amount of
bracken shows relatively smooth contours from west to east, but
with sufficient noise so as not to be a clearcut cline (Figure
4b). There is only the moderate significant positive
autocorrelation at 20 m. This value was not changed by reducing
the data matrix to the inner 64 quadrats.
The lack of similarity among correlograms is borne out by
the lack of correlations among the variables over the area. The
only even moderately sized correlation of real interest is
between percentage female and Aralia density (-0.45). This
occurs apparently because females are more sparsely distributed
than the males, as can be seen in Figure 2. This in turn may be
due to a higher flowering rate of the males; the overall ramet
densities may be similar, if non-flowering ramets were taken
into account. There is a weak correlation (-0.23) between
Clintonia density and Aralia density. It is not surprising to
find low correlations between these variables in view of the
lack of similarity of the correlograms. However, it would have
been possible for variables to be highly correlated yet show no
spatial structure, as pointed out by Hubert et al. (1985).
Multiple regression analysis of fecundity on the other
ecological variables showed that only one variable seems to be
affecting fecundity in any way--canopy cover with a negative
effect on fecundity.
The data were also examined by pairwise Mantel tests of
various variables against spatial distances, and by multiple
Mantel tests. We first examined pairwise relations between
distances with respect to percent females, fecundity and Aralia
density for the subarea reduced to 64 quadrats. Aralia density
and percent female versus fecundity have nonsignificant and low
correlations. The relationship between percentage females and
Aralia density is marginally significant and yields a
coefficient of 0.087. This confirms the earlier findings with
respect to the negative correlation of Aralia density and
percentage females. It must be remembered that in the Mantel
analysis we are not dealing with correlations of variables but
with correlations of distances between pairs of localities.
Thus the new result informs us that localities that differ with
respect to Aralia density also differ with respect to percentage
females.
The multiple Mantel results are all based on residuals from
multiple regression of spatial distances and distance matrices
for Aralia density, fecundity, and percent females on distance
matrices for canopy cover, bracken and Clintonia density. The
residual matrices for spatial distances are paired with those
for Aralia density, fecundity and percent females. Here the
results are more clear cut. Aralia density is independent of
space, as is fecundity, once the other three variables are kept
constant. This is not surprising for fecundity, which showed no
spatial structure at all. But apparently Aralia density also
shows no further spatial pattern, once it is regressed on canopy
cover, bracken and Clintonia density. Percent females, however,
continues to show a clear spatial pattern, with a highly
significant partial correlation of 0.150 for space versus
percent females, the three habitat variables kept constant.
This means that whatever factor determines female ramet
production has a clear spatial pattern, not determined by either
canopy cover, bracken or Clintonia density.
Barrett and Thomson (1982) measured fecundity because it
seemed reasonable that the pollination process might be affected
by the spatial patterning of the habitat variables or of the
sexual morphs of A. nudicaulis for pollinators; dark shade from
the tree or shrub layer might discourage pollinator flights;
pollinators might feed preferentially in areas of high Aralia
density; they might prefer male plants for their pollen reward;
or the pollination of females near the interior of large female
clones might be limited by the lack of local pollen sources. In
fact, however, none of these effects was strong enough to
influence the spatial patterning of fecundity in a detectable
way; the reproductive output of female ramets appeared to be
independent of all the measured variables, which in turn
suggests that fecundity may have been limited more by resources
than by insufficient pollination.
The autocorrelation analysis does, however, economically
describe the pattern of males and females in statistical terms.
Table 1 is a summary of the main patterns evident in Figure 2:
the large size of the (presumably clonal) patches, the larger
size of the female patches than of the males, and the variation
in patch sizes within a sexual type (as shown by the disparity
among the transects). Similarly, the correlograms of Figure 3
abstract the spatial information content of the habitat
variables. Although analysis of the interrelations of the
variables gave mostly negative results, some inferences about
process are still possible. For example, the persistence of
clear spatial pattern in percent females, after the removal of
all the habitat variables, is probably best attributed to the
history of clone establishment. Indeed, there is reason to
believe that the long-lived clones of A. nudicaulis--and
possibly even some of the existing ramets (Bawa et al. 1982)--
antedate the present forest, which has grown up since being
clear-cut in 1940.
Aralia hispida. The second example comes from an
investigation of bee foraging behavior on Aralia hispida
(Thomson, Peterson, and Harder 1986). A. hispida plants are
hermaphroditic, unlike those of A. nudicaulis, but their sexual
functions are separated in time, rendering the plants
"temporally dioecious". They bear numerous small flowers in
inflorescences comprising several orders of umbels. Within each
order of umbels, the flowers open synchronously; thus, flowering
begins with a single primary umbel. After all of its flowers
have opened and completed their function, the several secondary
umbels open in synchrony, then the tertiaries, etc. Larger
plants commonly have three orders; four is very rare. All
flowers open in a male or staminate condition, offering both
nectar and pollen to insects. After all the flowers of an umbel
have opened, shed their pollen, and stopped secreting nectar, a
subset of them enter a female phase. In the female phase, the
five previously connate styles separate, the stigmas become
receptive, and nectar secretion usually resumes. Thus A.
hispida is andromonoecious, i.e., it bears perfect flowers (with
temporally separated male and female phases) and male-only
flowers. The proportion of perfect flowers declines with
increasing umbel order, so the proportion of male-only flowers
increases through time. As a consequence of the synchronized
sexual changes within each order of umbels, a typical plant
undergoes a series of temporal switches from male to female,
one alternation per umbel order. The male phases last longer
than the female phases--approximately 4-6 days and 2-3 days,
respectively, depending on weather and on the clone. Thomson
and Barrett (1981) give details on the temporal patterns of
gender expression.
Furthermore, A. hispida, like A. nudicaulis, forms clonal
patches through rhizomatous spreading, and the plants within a
clone usually bloom in synchrony, such that all are male at the
same time, then female at the same time, promoting outcrossing.
This clonal synchrony should produce a pattern that, at any
point in time, resembles that of A. nudicaulis--male and female
patches--but is unlike that of A. nudicaulis in that the gender
of the patches is continually changing. The sex ratio of a grid
square would be expected to show temporal cycles if the area is
dominated by a single clone or multiple clones that are in
synchrony. If a square contains multiple clones that are out of
synchrony, temporal patterns in sex ratio may be blurred. A
stand of A. hispida was divided into 2 m squares and the
boundaries marked by spray-painted lines. On three dates (10,
14, and 18 July 1984) during the A. hispida bloom, the numbers
of open flowers in each square were counted. Flowers were
either male or female, depending on their developmental stage.
Numbers of male and female flowers and percent female flowers
were recorded for each square.
In addition, a pollinator removal experiment was carried
out as follows. Numerous bumble bee workers, of several
species, were caught while feeding on A. hispida in the grid and
given individual paint markings. These bees typically maintain
small foraging areas that are stable for several days (Thomson,
Maddison, and Plowright 1982; Thomson, Peterson, and Harder
1986). To determine whether bees would shift their foraging
areas toward local areas of lowered competition, Thomson et al.
(1986) performed the following experiment on 17 July 1984.
During the morning, four Bombus ternarius workers were followed
as continuously as possible, and the time spent by each bee in
each grid square was recorded. Beginning at 1250 hours, all
other bees that appeared in the northeast quarter of the grid
were removed, while the four bees remained under observation for
the rest of the day. Thomson et al. (1986) concluded that all
four bees, as expected, shifted their foraging areas toward the
removal area, and also rejected fewer umbels than control bees
foraging elsewhere, an indication that the experimental bees
were able to forage more efficiently following the reduction of
competition (rejections indicate that an umbel has recently been
drained of nectar).
The correlograms for A. hispida are shown in Table 2 for
the three variables studied, separately for the three dates. For
July 14, the correlogram has meaning only up to 24 m because
only an 8 x 10 grid was censused. For number of male flowers on
10 July, there is moderate spatial structure with significant
positive autocorrelation (0.19) at 4 m, and a weak, but
significant negative trend at 16 m. On 14 July, there is
significant positive autocorrelation (0.16) at 4 m, an
appreciable negative value (-0.10) at 16 m and a significant
positive autocorrelation (0.13) also for the last distance class
(24 m). On 18 July the correlogram is not unlike that on 10
July. For number of female flowers on 10 July there is stronger
autocorrelation (0.29) at 4 m, with weak but significant
negative autocorrelation (-0.04) again at 16 m. One can
conclude that there are relatively small patches with respect to
numbers of female flowers with the change from positive to
Table 2. Spatial autocorrelation coefficients for three flower census variables in A. hispida on
three dates in 1984.

Distance classes in m
          4     8     12    16    20    24    28    32    36    40

Number of male flowers in bloom
10 July  .19*** .01 .00 -.04* -.02 .00 -.04 .01 .00 -.01
14 July  .16*** .01 -.06* -.10** .13**
18 July  .17*** -.04* -.02 .00 -.04** .03 .01 .02 .00 -.02

Number of female flowers in bloom
10 July  .29*** .02 .00 -.04** -.02 -.02 -.03 -.05 .02 .02
14 July  .09 -.06 -.01 -.04 .08
18 July  .17*** -.01 .01 -.01 -.02 -.02 -.03* -.03 .01 .00

Percent female flowers in bloom
10 July  .28*** .10*** -.01 -.06** -.03 -.03 -.05 -.03 -.02 -.04
14 July  .03 .04 -.08* -.06 .16*
18 July  .14*** .00 .05*** .05** -.06** -.05* -.06* -.06* -.05 -.04

Notes: Distance classes are identified by upper class limit only. The 14 July correlograms
extend only to 24 m (an 8 x 10 grid was censused), so the fifth entry in each 14 July row is
the 24 m class.

* 0.01 < P ≤ 0.05
** 0.001 < P ≤ 0.01
*** P ≤ 0.001

negative autocorrelation taking place between 8 and 12 m. On 14
July no significant spatial structure is shown and on 18 July
there is a pattern similar to that of 10 July for female flowers
as well as to that of 18 July for male flowers. For percent
female flowers in bloom, there is clear spatial structure on 10
July--significant autocorrelations (0.28 and 0.10) at 4 and 8 m,
respectively. Weak significant negative autocorrelation (-0.06)
appears at 16 m. On 14 July there is weak negative
autocorrelation (-0.08) at 12 m and an appreciable positive
value (0.16) at 24 m. The data argue for a change to negative
autocorrelation between 8 and 12 m. For the last census date
(July 18) spatial autocorrelation at 4 m is 0.14. There are
some significant weakly positive autocorrelations, at 12 and 16
m, and weakly negative values between 20 and 32 m. For this date
it is not too clear at what distance positive autocorrelation
ceases.
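Correlograms such as those of Table 2 consist of spatial autocorrelation coefficients of the kind introduced by Moran (1950), computed over successive distance classes. The sketch below shows the computation in outline; the grid coordinates and flower counts are invented for illustration and are not the field data.

```python
import numpy as np

def morans_i(x, w):
    """Moran's I for a variable x under a binary connection matrix w."""
    z = x - x.mean()
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)

def correlogram(coords, x, upper_limits):
    """Moran's I per distance class; classes identified by their upper
    limits, as in Table 2."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    coeffs, lower = [], 0.0
    for upper in upper_limits:
        w = ((d > lower) & (d <= upper)).astype(float)  # pairs in class
        coeffs.append(morans_i(x, w))
        lower = upper
    return coeffs

# Invented example: a 10 x 10 grid of 2 m squares with one large patch
# of high counts occupying the left half of the stand.
xs, ys = np.meshgrid(np.arange(10) * 2.0, np.arange(10) * 2.0)
coords = np.column_stack([xs.ravel(), ys.ravel()])
counts = (xs.ravel() < 10).astype(float)
print(correlogram(coords, counts, [4, 8, 12, 16]))
```

For a crisp half-and-half patch such as the one simulated here, the shortest distance class yields a strongly positive coefficient, comparable in kind to the 4 m column of Table 2.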
There is also a temporal structure to the gender patterns,
as expected from our knowledge of the flowering biology of the
plants. This emerges clearly when we compute appropriate
multiple Mantel tests in the manner of Smouse et al. (1986) as
partial correlations of the surfaces of percent females at the
two dates with spatial distance kept constant. Between 10 July
and 14 July, there is a negative partial correlation (r =
-0.506, P = 0.008), but between 10 July and 18 July, the partial
correlation of percent female is positive (r = 0.161, P =
0.008). As would be expected, the correlation for 14 July and 18
July is also negative in sign (r = -0.217, P = 0.008). The
alternation of negative and positive correlations through time
is due, of course, to the synchronized gender shifts of the
clones of A. hispida. There are various reasons why any
particular 2 x 2 m square might not show gender cycling in this
analysis. First, the square may contain two or more clones that
are out of synchrony, such that some turn female as others turn
male. In this case, little change in percent female would be
apparent at the scale of the spatial sampling unit, although
such changes are occurring within each plant contained in the
sampling unit. Second, the four-day census interval may be
shorter than the length of a given plant's gender phase. For
instance, if a clone is male for five days, and if it has just
turned male at the first census, it will still be male at the
second census four days later. Because the male phases are
several days longer than the female phases (Thomson and Barrett
1981), we would predict that squares with high values of percent
female flowers on one census would be highly likely to yield low
values on the succeeding census, whereas squares with initially
low values would often remain low, i.e., continue in the male
phase for four days. This effect shows up very clearly in the
scattergrams; there are virtually no squares that are
predominantly female on consecutive censuses, but many that are
predominantly male. Detection of the cyclic nature of gender in
the A. hispida stand thus depends on a double correspondence of
our sampling units with the scale of the variation. The spatial
sampling units (2 x 2 m) had to be small enough to fall inside
the patch size as revealed by spatial autocorrelation, and the
temporal sampling units (4 day census intervals) had to
correspond to the length of the gender phases. Had the censuses
been eight days apart, our analysis would be blind to the
existing variation.
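The partial Mantel statistics used above--the correlation of the unrolled off-diagonal elements of two distance (or similarity) matrices, with those of a third held constant, tested by permutation--can be sketched as follows. This illustrates the general approach of Smouse et al. (1986) rather than the program actually used; the permutation variant chosen (simultaneous row and column permutation of one matrix) is one common option.

```python
import numpy as np

def partial_mantel(A, B, C, n_perm=999, seed=0):
    """Partial Mantel statistic r(A,B|C) for symmetric matrices A, B, C,
    with a one-sided permutation test for positive association."""
    def upper(m):
        i, j = np.triu_indices_from(m, k=1)
        return m[i, j]
    def pcor(a, b, c):
        rab = np.corrcoef(a, b)[0, 1]
        rac = np.corrcoef(a, c)[0, 1]
        rbc = np.corrcoef(b, c)[0, 1]
        return (rab - rac * rbc) / np.sqrt((1 - rac**2) * (1 - rbc**2))
    a, b, c = upper(A), upper(B), upper(C)
    r_obs = pcor(a, b, c)
    rng = np.random.default_rng(seed)
    hits = 1                      # the observed value counts as one case
    for _ in range(n_perm):
        p = rng.permutation(len(A))
        # permute rows and columns of A jointly, recompute the statistic
        if pcor(upper(A[np.ix_(p, p)]), b, c) >= r_obs:
            hits += 1
    return r_obs, hits / (n_perm + 1)
```

With two matrices built from nearly identical variables and a third matrix of spatial distances, the routine returns a strong positive partial correlation and a small permutational probability.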
The small-scale shifts of gender should have consequences
for the bees that collect nectar and pollen from A. hispida
flowers. The autocorrelational properties of pollen and nectar
are conspicuously different. Both are patchily distributed in
space, with similar, small patch sizes produced by the synchrony
and spatial contiguity of clone members. The temporal
distribution of nectar at anyone patch will show positive
temporal autocorrelation, because both sex phases produce nectar
and because a patch with many flowers at one census is likely to
have many flowers at the next census. Thus, bees might be
expected to be conservative in their feeding locations, and to
return repeatedly to flower-rich areas. They do this (Thomson
et al. 1982).
The distribution of pollen, unlike that of nectar, will
show strong negative temporal autocorrelation at short time
intervals and strong positive temporal autocorrelation at longer
intervals. A good spot for pollen collecting, therefore, will
not remain a good spot for long. The spatio-temporal exigencies
of pollen collection would then be expected to counter the
conservative foraging-area tendencies favored by the nectar
distribution; given that bees do maintain small foraging areas,
we would expect that these areas should be larger than the
spatial patch size so as to encompass numerous clones, or that
the bees should move their foraging areas through time to track
the shifting locations of resource-rich patches. Both appear to
be the case: the surfaces for 18 July (the census date closest
to the removal experiment) indicate X-intercepts of 8 m for both
male and female flower numbers. At that distance, on the
average, the numbers of each gender were independent to slightly
negatively autocorrelated. It appears that the average diameter
of the patches of high (and low) numbers of each gender is 4 m.
Frequency distributions of the time spent in each grid square by
individual bees (Figure 5) permit an estimate of the average
side length of the visited area (described as a quadrilateral).
For the four bees these estimates are 4.5, 6.5, 7.5, and 9.0 m,
all greater than the patch diameter of the flowers. The moving
of bees to less competitive areas has been demonstrated by

[Figure 5 here: four bar-plot panels, one per color-marked bee
(GREEN-AQUA, RED-SILVER, RED-YELLOW, and a fourth bee); total
observation times 38, 53, 35, and 88 min.]
Figure 5. Representation of the use of space for foraging by
four color-marked Bombus ternarius workers in a 20 X 44 m mapped
stand of Aralia hispida on 17 July 1984. Heights of the
vertical bars are proportional to the total amount of time spent
by a bee in each 2 X 2 cell of the grid. The total observation
time (min) is shown for each bee; in all cases, several
different foraging trips contribute to the total. These
observations were made after the bee removal experiment
described in the text. From Thomson et al. (1986).
Thomson et al. (1986).
These autocorrelation analyses paint very different
pictures of the two Aralia species. Both present a spatially
patchy gender surface, but in A. nudicaulis the patches are
large in size and stable in nature throughout the 2-3 week
blooming period. In contrast to this rather calm surface, the
gender surface of A. hispida is vividly dynamic, changing its
character over the space of a few meters and the span of a few
days. Clearly, these two congeneric plants of the North Woods
present very different problems in resource tracking to their
pollinators. We hope that our presentation of these examples
will stimulate others to explore the usefulness of spatial
autocorrelation techniques in describing patterns and inferring
processes in ecology.

ACKNOWLEDGEMENTS

Contribution No. 599 in Ecology and Evolution from the
State University of New York at Stony Brook. This research was
supported by grant No. GM28262 from the National Institutes of
Health to Robert R. Sokal and by grant No. DEB-8206959 to James
D. Thomson. Barbara Thomson and Rosalind Harding carried out
the computations. Word processing, table preparation and
illustrations were handled by Cheryl Daly, Donna DiGiovanni and
Joyce Schirmer. We thank two anonymous reviewers for useful
suggestions toward improving the manuscript.
REFERENCES
BARRETT, S.C.H., AND K. HELENURM. 1981. Floral sex ratios
and life history in Aralia nudicaulis (Araliaceae).
Evolution 35:752-762.
BARRETT, S.C.H., AND J.D. THOMSON. 1982. Spatial pattern,
floral sex ratios, and fecundity in dioecious Aralia
nudicaulis (Araliaceae). Can. J. Bot. 60:1662-1670.
BAWA, K.S., C.R. KEEGAN, AND R.H. VOSS. 1982. Sexual
dimorphism in Aralia nudicaulis L. (Araliaceae). Evolution
36:371-378.

BURGMAN, M. 1986. Species coexistence: Factors affecting the
distribution of plant species on granite outcrops.
(Submitted to Vegetatio).
CLIFF, A.D., AND J.K. ORD. 1973. Spatial autocorrelation.
Pion, London. 175 pp.
CLIFF, A.D., AND J.K. ORD. 1981. Spatial processes. Pion,
London. 266 pp.

DOW, M.M., AND J.M. CHEVERUD. 1985. Comparison of distance
matrices in studies of population structure and genetic
microdifferentiation: quadratic assignment. Amer. J. Phys.
Anthro. 68:367-373.

GABRIEL, K.R., AND R.R. SOKAL. 1969. A new statistical
approach to geographic variation analysis. Syst. Zool.
18:259-278.
GEARY, R.C. 1954. The contiguity ratio and statistical
mapping. Incorp. Statist. 5:115-145.
HUBERT, L. 1985. Combinatorial data analysis: association and
partial association. Psychometrika 50:449-467.
HUBERT, L.J., AND R.G. GOLLEDGE. 1982. Measuring association
between spatially defined variables: Tj0stheim's index and
some extensions. Geogr. Anal. 14:273-278.
HUBERT, L.J., R.G. GOLLEDGE, AND C.M. COSTANZO. 1981.
Generalized procedures for evaluating spatial
autocorrelation. Geogr. Anal. 13:224-233.

HUBERT, L.J., R.G. GOLLEDGE, C.M. COSTANZO, AND N. GALE. 1985.
Measuring association between spatially defined variables:
an alternative procedure. Geogr. Anal. 17:36-46.

JUMARS, P.A. 1978. Spatial autocorrelation with RUM (Remote
Underwater Manipulator): vertical and horizontal structure
of a bathyal benthic community. Deep-Sea Res. 25:589-604.
JUMARS, P.A., D. THISTLE, AND M.L. JONES. 1977. Detecting two-
dimensional spatial structure in biological data.
Oecologia 28:109-123.
LEFKOVITCH, L.P. 1984. A nonparametric method for comparing
dissimilarity matrices, a general measure of biogeographic
distance, and their application. Amer. Nat. 123:484-499.
LEFKOVITCH, L.P. 1985. Further nonparametric tests for
comparing dissimilarity matrices based on the relative
neighborhood graph. Math. Biosci. 73:71-88.
MANTEL, N. 1967. The detection of disease clustering and a
generalized regression approach. Canc. Res. 27:209-220.
MATERN, B. 1960. Spatial variation: stochastic models and
their application to some problems in forest surveys and
other sampling investigations. Meddelanden från Statens
Skogsforskningsinstitut 49:1-144.
MATULA, D.W., AND R.R. SOKAL. 1980. Properties of Gabriel
graphs relevant to geographic variation research and the
clustering of points in the plane. Geogr. Anal. 12:205-
222.

MORAN, P.A.P. 1950. Notes on continuous stochastic phenomena.
Biometrika 37:17-23.
ODEN, N.L. 1984. Assessing the significance of a spatial
correlogram. Geogr. Anal. 16:1-16.
ODEN, N.L., AND R.R. SOKAL. 1986. Directional autocorrelation:
an extension of spatial correlograms to two dimensions.
Syst. Zool. 35: 608-617.
PLEASANTS, J.M., AND M. ZIMMERMAN. 1979. Patchiness in the
dispersion of nectar resources: evidence for hot and cold
spots. Oecologia 41:283-288.
RIPLEY, B.D. 1987. Spatial analysis in ecology. This volume.
SAKAI, A., AND N.L. ODEN. 1983. Spatial pattern of sex
expression in silver maple (Acer saccharinum). Amer. Nat.
122:489-508.

SETZER, R.W. 1985. Spatio-temporal patterns of mortality in
Pemphigus populicaulis and P. populitransversus on
cottonwoods. Oecologia 67:310-321.
SMOUSE, P.E., J.C. LONG, AND R.R. SOKAL. 1986. Multiple
regression and correlation extensions of the Mantel test of
matrix correspondence. Syst. Zool. 35: 627-632.
SNEATH, P.H.A., AND R.R. SOKAL. 1973. Numerical taxonomy.
W.H. Freeman, San Francisco. 573 pp.
SOKAL, R.R. 1979. Ecological parameters inferred from spatial
correlograms, p. 167-196. In G.P. Patil and M.L.
Rosenzweig [ed.] Contemporary quantitative ecology and
related ecometrics. International Co-operative Publishing
House, Fairland, MD.
SOKAL, R.R. 1983. Analyzing character variation in geographic
space, p. 384-403. In J. Felsenstein [ed.] Numerical
taxonomy. Springer-Verlag. New York.

SOKAL, R.R., I.A. LENGYEL, P. DERISH, M. WOOTEN, AND N.L. ODEN.
1986. Spatial autocorrelation of ABO phenotypes in
medieval cemeteries. (MS in preparation).

SOKAL, R.R., AND P. MENOZZI. 1982. Spatial autocorrelation of
HLA frequencies in Europe support demic diffusion of early
farmers. Amer. Nat. 119:1-17.

SOKAL, R.R., AND N.L. ODEN. 1978a. Spatial autocorrelation in
biology 1. Methodology. Biol. J. Linn. Soc. 10:199-228.

SOKAL, R.R., AND N.L. ODEN. 1978b. Spatial autocorrelation in
biology 2. Some biological implications and four
applications of evolutionary and ecological interest.
Biol. J. Linn. Soc. 10:229-249.

SOKAL, R.R., AND F.J. ROHLF. 1981. Biometry, 2nd edition.
W.H. Freeman, San Francisco. 859 pp.

SOKAL, R.R., AND D.E. WARTENBERG. 1981. Space and population
structure, p. 186-213. In D. Griffith and R. McKinnon [ed.]
Dynamic Spatial Models. Sijthoff and Noordhoff, Alphen aan
den Rijn, The Netherlands.

SOKAL, R.R., AND D.E. WARTENBERG. 1983. A test of spatial
autocorrelation using an isolation-by-distance model.
Genetics 105:219-237.
SPATH, H. 1983. Cluster-Formation und -Analyse. R. Oldenbourg
Verlag, Munich. 236 pp.
THOMSON, J.D., AND S.C.H. BARRETT. 1981. Temporal variation of
gender in Aralia hispida Vent. (Araliaceae). Evolution
35:1094-1107.

THOMSON, J.D., W.P. MADDISON, AND R.C. PLOWRIGHT. 1982.
Behavior of bumble bee pollinators of Aralia hispida Vent.
(Araliaceae). Oecologia 54:326-336.

THOMSON, J.D., S.C. PETERSON, AND L.D. HARDER. 1986. Response
of traplining bumble bees to competition experiments:
shifts in feeding location and efficiency. Oecologia
(Berlin), (submitted).
TOBLER, W.R. 1975. Linear operators applied to areal data, p.
14-37. In J.C. Davis and M.J. McCullagh [ed.] Display and
analysis of spatial data. John Wiley, London.

WHITTLE, P. 1954. On stationary processes in the plane.
Biometrika 41:434-449.

WOLFE, D.A. 1976. On testing equality of related correlation
coefficients. Biometrika 63:214-215.

WOLFE, D.A. 1977. A distribution-free test for related
correlation coefficients. Technometrics 19:507-509.
II. Working Group Reports
NUMERICAL ECOLOGY: DEVELOPMENTS FOR MICROBIAL ECOLOGY

Manfred Bölter* (Chairman), Pierre Legendre, Jan de Leeuw,
Richard Park, Peter Schwinghamer, Stanley E. Stevens, and
Marc Troussellier

* Institute for Polar Ecology, University of Kiel,
Olshausenstr. 40-60, D-2300 Kiel 1, F.R.G.

INTRODUCTION

The working group first recognized that in microbiology we
have two different but complementary topics where numerical
methods are relevant:
1) statistical definition of taxonomic and/or functional
entities,
2) statistical descriptors and/or mathematical analysis of
relationships between bacterial and environmental variables.
Hence, we meet the problems of pelagic systems (cf. Flos et al.,
this volume), benthic communities (Field et al., this volume) and
those of general interest from limnology and oceanography
(Legendre et al., this volume) as well as from terrestrial
environments.

Our data matrices from taxonomic studies are generally in
the form of matrices containing binary information about various
qualitative results from biochemical tests. The analysis of such
matrices can involve first a taxonomic study, to replace a vector
of binary biochemical results by a species name or number,
followed by an analysis of the species-by-sites data table as in
classical numerical ecology; or microbiologists may wish to
analyze directly the isolates x biochemical descriptors x sites

NATO ASI Series, Vol. G14


Developments in Numerical Ecology
Edited by P. and L. Legendre
© Springer-Verlag Berlin Heidelberg 1987
data table, insofar as methods are available to do so. This is
the peculiarity of microbiological data.

Environmental data include binary or ordinal variables, both
quantitative and qualitative, describing an ecosystem, such as
physical-chemical descriptors, numbers, ratios or allometric
values (e.g. Schwinghamer et ale 1986).

Following is a list of the main problems specific to micro-
biological data.
1) Investigations on micro-organisms often require indirect
methods, such as the estimation of uptake parameters of organic
substances or the measurement of ATP as an indicator of overall
active microbial biomass. Those methods, however, produce high
variability and raise the question of the validity of these
indirect methods. This holds especially true for the estimation
of an "actively metabolizing" population when specific substrates
are regarded as models for the description of general metabolic
processes (Bölter 1982).

2) Further discussions in this working group considered that
micro-organisms act at special scales in time and space with
regard to their small size and metabolism, implying the
definition of an adequate sampling scale (Troussellier et al.
1986a). This has been stressed by more holistic approaches to
ecosystem analysis when micro-organisms were shown to be notable
exceptions in the allometric relationships between particle size
and their turnover rate (Azam et al. 1983, Field et al. 1985).
As illustrated by Steele (1978), scales in ecosystems - both time
and space - are indicative of the relationships between physical
and biological processes (Legendre and Demers 1984).

3) Autocorrelation is one of the most important facts that
have to be mentioned during sampling in the pelagic environment
(cf. Legendre et al., this volume). Furthermore, the microbial
environment is strongly patterned and during its analysis we have
to consider various niches and size classes. Although physical
descriptors may act over wide gradients and/or distances, the
distribution of organic material, which is the main controlling
factor of microbial activity, is very patchy. This holds true for
the pelagic environment as well as for the benthos or soils.
Thus, it is difficult to fulfill statistical requirements during
sampling strategies.

4) Another crucial point to be considered is the
"translation" of ecological descriptors into numerical variables.
This includes the above-mentioned high variability of ecological
data. In some cases, variability may be expressed in statistical
terms, such as standard deviation or variance. If so, these terms
may be used in weighting individual parameters. Further methods
include data transformations, such as conversion to
ranks or to octave scale (Gauch 1977). This is of special
interest with regard to high internal variability, to eliminate
noise of original field data or to detect thresholds. This may
have fruitful consequences in further analyses.
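As an illustration, the two transformations just mentioned can be coded directly; the octave classes below follow one common reading of the log2 scale of Gauch (1977), and the function names are ours.

```python
import numpy as np

def to_ranks(x):
    """Replace values by their ranks, averaging ranks over ties."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    for v in np.unique(x):          # average the ranks of tied values
        tied = x == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def to_octaves(counts):
    """Octave (log2) abundance classes: 0 -> 0, 1 -> 1, 2-3 -> 2,
    4-7 -> 3, 8-15 -> 4, and so on."""
    counts = np.asarray(counts, dtype=float)
    out = np.zeros_like(counts)
    pos = counts > 0
    out[pos] = np.floor(np.log2(counts[pos])) + 1
    return out
```

Either transformation compresses the high internal variability of raw field counts while preserving their order, which is what makes such rescalings useful for noise removal.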

Most of the methods discussed below are well suited to
describe ecosystems in different ways. However, we recommend that
microbial ecologists try them on various data sets, and publish
the results comparing the new methods with the more classical
ones, in order to assess their relevance and applicability in the
microbial world. The following will reflect the discussions
of the working group with special reference to the topics of the
main lectures given during the individual sessions.

MULTIDIMENSIONAL SCALING

During the discussions on the use of the techniques for
multidimensional scaling (Gower, this volume), we recognized that
these methods are applicable both to problems of fabricating
species-like entities by means of numerical taxonomy, and to
studies of the structure of microbial communities in relation to
environmental parameters. When analyzing communities of different
geographical or temporal origin the problem of overlap between
populations arises leading to autocorrelation problems. This may
be due to a recurrent pattern in the communities or to a
homogeneous "background" population.

In such cases we can adopt Procrustes analysis. This
method also allows us to compare different measures which have
been used to set up a similarity matrix. Thus, we can elucidate
various interrelationships between original variables or detect
effects of noise. The comparison of results from multiple
correspondence analysis with those from Procrustes analysis is of
further interest when analyzing multivariate data sets.
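As an illustration of such a comparison, the sketch below matches two hypothetical ordinations of the same 20 samples--say, principal coordinates obtained from similarity matrices built with two different coefficients--using the procrustes routine of SciPy; the data are simulated, not from a real study.

```python
import numpy as np
from scipy.spatial import procrustes

rng = np.random.default_rng(1)
ord1 = rng.normal(size=(20, 2))          # first hypothetical ordination

# second ordination: the same configuration rotated, rescaled and
# slightly perturbed, as two similarity measures might produce
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
ord2 = 3.0 * ord1 @ rot + rng.normal(scale=0.05, size=(20, 2))

# procrustes translates, scales and rotates ord2 onto ord1; the
# disparity (sum of squared residuals) measures their disagreement
m1, m2, disparity = procrustes(ord1, ord2)
print(disparity)
```

A disparity near zero, as here, indicates that the two measures lead to essentially the same configuration of samples.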

R-mode Principal Component Analysis can be used to replace
sets of correlated variables by a few synthetic but independent
variables. It was mentioned that another approach to discover
redundancy in numerical taxonomy is the establishment of "median
strains" (Sneath and Sokal 1973) which may serve as a carrier of
information of a group of bacteria. This has been applied during
studies of marine and limnic bacteria by Bölter (1977) and by
Bölter et al. (1986).

Q-mode analysis can be used on data matrices to obtain
information on the metabolically active component of the
population. This provides a functional description of a
community. Problems of numerical taxonomy can be avoided by
analyzing the multi-sample (isolates-by-variables) data table
using canonical variables, or more simply by canonical coordinate
analysis (Digby and Gower 1981) as described by Gower (this
volume, section 7.4).

The partitioning of the original data matrix (e.g. into
results from biochemical properties, or into results from
antibiotics and sera) is recommended in order to compare the
different results of grouping procedures. Such comparison can be
carried out either by canonical correlations or by the Procrustes
approach. It should, however, be kept in mind that any comparison
by this method takes place at the level of the computed distances
(or similarities) and not among the original data. A systematic
comparison of results of these approaches should be done for the
same data sets.
SCALING OF MULTIWAY DATA

Multidimensional scaling of multiway data (Carroll, this
volume) can be used to solve problems of asymmetric ecological
matrices, for instance by doubling the entries of the affinity
matrices. This matrix may contain data on relationships between
bacterial communities and their substrates or other environmental
descriptors. A further advantage of this kind of scaling is that
it can be computed from incomplete distance or affinity matrices,
with the computer programs presently available.

The input data into a three-way matrix may include
information about stations, times and environmental variables or
data on the abundance of taxa. Such a data structure often occurs
when analyzing survey data with different dimensions of i, j, and
k. The data can be ordinal variables as well as ranked or binary
data.

The results from such analyses will give ordinations of
sites, times or variables, including taxa, and may lead to
information about successions or environmental gradients
(Legendre et al. 1985a, b, Sprules 1980). However, as far as we
know such analyses on three-way matrices have yet to be done.
Special problems of interpretation may arise during the
ordination of time-samples when the time span of the analysis
runs over more than one biological cycle of the population
(Legendre et al. 1985a,b). It should also be kept in mind that
physical and biological cycles of different scales of magnitude
may act on micro-organisms.

The imprecision of quantitative measurements may have strong
effects on the results of such analyses. For instance, the
definition of microbial taxa by numerical taxonomy may have a
high degree of uncertainty. This holds also true for many
environmental variables and data on abundance. Scaling techniques
can take the standard deviation of a measurement, or some other
measure, as weighting factor.
The duality diagram (Escoufier, this volume) emphasizes the
fact that ordination techniques offer the choice of weighting
factors and data, and choosing the distance coefficients best
suited for the goal. There is no statistical method to weight
variables a priori; weights can be obtained for instance from a
panel of experts, by the Delphi method.

NONLINEAR MULTIVARIATE ANALYSIS

Data from microbial ecology are often in the form of
continuous physical-chemical measurements, but may be species
abundances or presence/absence data of functional groups. Since
any transformation of a data matrix modifies its original
information, the model adopted must be consistent with the
original data in order to get a satisfactory structure. The
representation space Y is a transformation of the data space X
and is characterized by a loss of information (de Leeuw, this
volume). This holds true for the analysis of both taxonomic data
and environmental descriptors. By using the GIFI system of
nonlinear methods, it becomes possible to analyze data which
include mixed types of variables, such as nominal and ordinal
ones as well as physical measurements.

Unfolding analysis (Heiser, this volume) may be useful in
ecological studies under the assumption that (taxonomic)
abundance data are replaced by data on microbial activity.
Examples for possible applications of unfolding analysis may be
found in the sedimentary environment where typical zonations of
different types of metabolic activities occur (cf. Rheinheimer
1981). Unfolding analysis of those data could offer an innovative
approach to the description of interrelationships between micro-
organisms and their environment. It would be interesting to
compare this method with direct gradient analysis (Gauch 1982) or
to see whether the relation between descriptors (e.g., aerobic and
anaerobic activity), which exhibits a horseshoe effect in peA,
could be resolved into two unimodal distributions by unfolding
analysis. In such an analysis, it is useful to emphasize the
475

guild concept rather than to describe taxa or even species. Thus


the measure of abundance can be replaced by data on different
types of physiological activities, in order to relate them to
other environmental descriptors such as time, depth, etc.

Another use may be the analysis of stratified signals in sediments or at other boundary layers. The ordination of vertical or horizontal gradients in depth assemblages of, for example, foraminifera or diatoms may be improved by application of unfolding analysis, again in comparison to gradient analysis as applied by Cisne and Rabe (1978). Furthermore, this technique may also be applied to the ordination of organic or inorganic residual compounds indicative of past microbial activities.

However, there may be limitations in the ecological interpretation of an unfolding plot. Unfortunately, we have no example of such analyses, or of studies of the stability of the results, in microbial ecology. A restriction of this kind of analysis may also be the existence of autocorrelation among descriptors of biological gradients or along time series. A correct specification of the dimensionality of the model is crucial to the success of unfolding analysis. Stress diagrams (Carroll, this volume) are proposed to assess this dimensionality.

CLUSTERING OF FUZZY SETS

Ecological data handled by numerical methods often include probabilistic terms, as mentioned previously. Both environmental and taxonomic data include various sources of imprecision with unknown distributions. Especially data used in numerical taxonomy may be in the form of a "more or less" positive or negative response to a biochemical or physiological test. For instance, young cultures of bacterial strains may give results that are quite different from cultures of older strains, or characters may depend on culture conditions such as nutrient concentration or temperature. Thus, the final information in the data matrix may be to some extent uncertain.

Hierarchical clustering imposes clear distinctions among clusters, while fuzzy clustering admits the uncertainty of the cluster space (Bezdek, this volume). Another use of fuzzy sets may be the analysis of ecological niches by environmental variables. This technique has not been applied in ecological studies at the microbial level, although the concept of fuzzy clusters applies very well to the ecological problem of species associations.
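As a concrete illustration of that "soft" cluster space, here is a minimal fuzzy c-means sketch in NumPy — the textbook form of Bezdek's algorithm, run on two invented "species associations" in a toy environmental space:

```python
import numpy as np

def fuzzy_cmeans(X, c=2, m=2.0, n_iter=100, seed=0):
    """Minimal fuzzy c-means. Returns (centroids, U) where U[i, k] is the
    degree of membership of sample i in cluster k; each row of U sums to 1,
    so cluster boundaries remain 'soft'."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m                                   # fuzzified memberships
        centroids = (W.T @ X) / W.sum(axis=0)[:, None]
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)                   # guard against division by 0
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)     # membership update
    return centroids, U

# Two loose groups of 20 samples each in a 2-D environmental space:
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(2.0, 0.3, (20, 2))])
centroids, U = fuzzy_cmeans(X, c=2)
```

Samples near a group core receive memberships close to 1, while intermediate samples split their membership — exactly the behaviour wanted for species associations that are not sharply disjunct.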

CONDITIONAL CLUSTERING

Conditional clustering (Lefkovitch, this volume), with or without pairwise resemblances, may well be used in the analysis of ecosystems described by occurrences of functional groups of micro-organisms, instead of taxonomic units. This approach
may be of special interest in identifying spatially separated
communities comprising similar functional groups. Furthermore, it
stresses the identification of similar complex structures rather
than their partitioning into numerous subgroups. This is a
particular advantage when analyzing tables (species-by-sites) in
order to find the most probable associations by eliminating
redundancy. In contrast to hierarchical clustering, conditional
clustering looks for "true groups" in the original matrix. For
the purpose of numerical taxonomy, conditional clustering has
been used to choose relevant taxonomic features for the
identification of yeasts (Lefkovitch, pers. comm.). The algorithm
is apparently efficient, which is important for its application
to large data sets.

CONSTRAINED CLUSTERING

The application of constrained clustering (Legendre, this volume) is useful for the analysis of time series, spatially distributed data and environmental gradients when autocorrelation occurs. Users of such methods should make sure, however, that the sampling frequency exceeds the frequency of possible stochastic disturbance factors and environmental cycles.

The use of environmental constraints such as edaphic or nutritional spaces is especially promising. In some microbiological studies, time and geographic space may be considered jointly, for example during sampling at different instances along a river system. Likewise, different environmental gradients, such as vertical distributions of organic matter in the sediment or temperature/salinity gradients in a pelagic system, may also be considered.

Another application of this method is the investigation of microbial processes at different time scales. In sewage plants, for example, the time scales of the various active microbial populations, i.e. of their metabolic processes, are superimposed by externally determined time scales, e.g., the input of large amounts of sewage at certain times. An example of such a study is given by Legendre et al. (1985b). It has been mentioned that constrained clustering is rather stable compared to other methods, in the sense that small variations in the data, or changes in the clustering algorithm, are unlikely to produce large changes in the clustering results. It would be very promising to compare results from this approach with those from unfolding analysis.
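The contiguity constraint at the heart of these methods is simple to demonstrate. The sketch below fuses only temporally adjacent groups; it is a deliberately simplified stand-in (Legendre's chronological clustering additionally applies a permutation test at each fusion), and the "abundance" series is invented:

```python
import numpy as np

def chronological_clusters(series, k):
    """Agglomerative clustering of a time series in which only temporally
    adjacent groups may fuse (the contiguity constraint). At each step the
    adjacent pair whose means differ least is merged, down to k groups."""
    segments = [[i] for i in range(len(series))]
    while len(segments) > k:
        costs = [abs(np.mean([series[i] for i in a]) -
                     np.mean([series[i] for i in b]))
                 for a, b in zip(segments, segments[1:])]
        j = int(np.argmin(costs))
        segments[j:j + 2] = [segments[j] + segments[j + 1]]
    return segments

# A toy 'succession': abundance shifts regime twice along the sampling dates.
series = [1.0, 1.2, 0.9, 5.1, 5.0, 4.8, 9.9, 10.2]
print(chronological_clusters(series, k=3))
# -> [[0, 1, 2], [3, 4, 5], [6, 7]]
```

Because only neighbours in time may merge, the result is always a segmentation of the series into successive phases, never a grouping that jumps across dates.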

There is an urgent need for comparison of results from different cluster algorithms used in ecological studies. Such comparison is crucial for the interpretation of any classification. In many cases, during such data analyses, the data are regarded as "hard" and they are analyzed under this assumption. However, one must define criteria with special reference to the analytical procedure.

FRACTAL THEORY

Fractal theory (Frontier, this volume) can be used to describe ecosystems in terms of their hierarchical structure. The working group discussed many examples in which fractal theory might be useful in describing features of ecosystems that are of direct interest in microbial ecology. For example, we know that

microbial activity is generally enhanced at boundary layers (e.g., Liebezeit et al. 1980) or in frontal systems (Lochte 1985), both of which can be described by their fractal nature.

Changes in the fractal dimension of a phenomenon may point to changing interactions which have to be considered in ecosystem analysis. This may be used to find the correct scales for measuring microbial activity. These scales may be defined by changes in the slope of the Mandelbrot plot describing the spatial distribution of particles, e.g., in the benthic environment. In general, fractal analysis may aid in the ecological interpretation of any size distribution and related biological processes. Again, this method has apparently not yet been used in microbial ecology.
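The slope-of-the-Mandelbrot-plot idea can be illustrated with a box-counting estimate of fractal dimension — a standard construction rather than anything specific to this volume, run here on synthetic point sets:

```python
import numpy as np

def box_counting_dimension(points, scales):
    """Slope of log N(eps) against log(1/eps) over the given box sizes:
    the box-counting analogue of the Mandelbrot plot. `points` is an
    (n, 2) array of coordinates; `scales` a list of box widths."""
    counts = []
    for eps in scales:
        # count the distinct boxes of side eps that contain at least one point
        boxes = set(map(tuple, np.floor(points / eps).astype(int)))
        counts.append(len(boxes))
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(scales)), np.log(counts), 1)
    return slope

rng = np.random.default_rng(0)
square = rng.random((20000, 2))          # plane-filling points: dimension ~2
t = rng.random(20000)
line = np.column_stack([t, t])           # points along a line: dimension ~1
scales = [0.2, 0.1, 0.05, 0.025]
print(round(box_counting_dimension(square, scales), 2))   # close to 2
print(round(box_counting_dimension(line, scales), 2))     # close to 1
```

A break in the slope of the underlying log-log plot, rather than a single straight line, is precisely the signal of a change of regime between observation scales mentioned above.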

PATH ANALYSIS

Multiple regression models have been widely used in ecological studies (e.g., Dale 1974, Bolter et al. 1977). Classical path analysis, as introduced by de Leeuw (this volume), uses linear regression-type models to assess the validity of various cause-and-effect relationships. Schwinghamer (1983) used this approach in describing relationships in benthic microbiology, while Troussellier et al. (1986b) used it to model the behaviour of bacteria in a eutrophic ecosystem. The advantage of this method is the restriction to a limited model. This offers a way of setting hypotheses that can be further tested by field observations.

A valuable extension of this method is the incorporation of latent variables into the model. These are composites of observed variables which describe theoretical constructs that are not measured directly. Many processes in natural systems are linked by feed-back mechanisms. Path analysis models are not very well suited to model those systems in which such processes are dominant features. However, such processes can be introduced into the classical model using explanatory variables with a lag of (t-1), for instance. Furthermore, classical path analysis is not adapted to handling non-quantitative variables, while non-linear path analysis can easily do so (de Leeuw, this volume).

SPATIAL AUTOCORRELATION

Analysis of spatial autocorrelation (Sokal, this volume) must be considered very carefully with regard to the scales of observation in microbial ecology. This is a major problem for sampling design common to most studies in microbial ecology, because the interactions of interest occur over a broad range of size and distance scales. Sampling strategies commonly used in microbiology (e.g., Colwell and Morita 1974) are not likely to avoid natural spatial autocorrelation completely. Spatial autocorrelation of microbes is important both in the range of a few micrometers (contagious growth) and at much larger scales.

Random sampling schemes may ensure the absence of sampling autocorrelation, yet the values of the variables may still be autocorrelated in space due to underlying processes. Autocorrelation must be tested for, by methods such as those outlined by Sokal (this volume), and disproved before the usual ANOVA or correlation analyses can be done.
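As an illustration of such a test, Moran's I — a standard autocorrelation statistic of the kind Sokal discusses — can be computed directly. A NumPy sketch with binary neighbour weights and a synthetic east-west gradient:

```python
import numpy as np

def morans_i(values, coords, cutoff):
    """Moran's I with a binary weight matrix: stations closer than
    `cutoff` count as neighbours. Under the null hypothesis of no spatial
    autocorrelation, E[I] = -1/(n - 1), i.e. close to zero."""
    z = values - values.mean()
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    W = ((d > 0) & (d < cutoff)).astype(float)
    n, s0 = len(values), W.sum()
    return (n / s0) * (z @ W @ z) / (z @ z)

# A smooth east-west gradient over a 6 x 6 station grid is strongly
# positively autocorrelated:
coords = np.array([[x, y] for x in range(6) for y in range(6)], dtype=float)
gradient = coords[:, 0]                 # the value simply increases with x
I = morans_i(gradient, coords, cutoff=1.5)
print(round(I, 2))                      # well above the null expectation of -1/35
```

A value of I far from -1/(n-1) is the warning sign that the independence assumption behind ANOVA or ordinary correlation is violated.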

Plots of the Mantel statistics (Sokal, this volume) against distance classes, or correlograms, may be used to describe the autocorrelation structure, but care must be taken because correlations between distance matrices are not easy to interpret. The data surface may be reconstructed by contour maps. Trend surface analysis using polynomial regression, or kriging, may be used to analyze spatial patterns in autocorrelated data. An example analyzing plankton community structure has been presented recently by Mackas (1984).
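A minimal Mantel permutation test might look like the following — a generic sketch, not Sokal's implementation, with twelve invented stations whose "community" distances track the geographic ones:

```python
import numpy as np

def mantel(D1, D2, n_perm=999, seed=0):
    """Mantel statistic: Pearson correlation between the upper triangles
    of two distance matrices; significance assessed by permuting the rows
    and columns of one matrix simultaneously."""
    rng = np.random.default_rng(seed)
    iu = np.triu_indices_from(D1, k=1)
    r_obs = np.corrcoef(D1[iu], D2[iu])[0, 1]
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(D1.shape[0])
        if np.corrcoef(D1[p][:, p][iu], D2[iu])[0, 1] >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)

rng = np.random.default_rng(3)
xy = rng.random((12, 2))                                   # station positions
geo = np.linalg.norm(xy[:, None] - xy[None, :], axis=2)    # geographic distances
comm = geo + rng.normal(scale=0.05, size=geo.shape)        # 'community' distances
comm = (comm + comm.T) / 2.0
np.fill_diagonal(comm, 0.0)
r, p = mantel(geo, comm)
```

The permutation distribution, rather than a parametric formula, supplies the p-value — which is exactly why distance-matrix correlations need careful interpretation.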

POINT PATTERN ANALYSIS

Point pattern analysis (Ripley, this volume) was not originally designed for microbiological purposes; however, it may be useful for many problems in microbial ecology, despite the fact that it is difficult to make direct observations of microbial populations and their distributions in the natural (undisturbed) environment. In fact, methods such as scanning electron microscopy have been used to get insight into this ecosystem (e.g., Zimmermann 1977). Other methods are mainly in use to find colonies of bacteria in the natural environment rather than to describe their original distributional pattern. Those patterns on plates or filters (e.g., epifluorescence microscopy) are rather considered to be artifactual distributions. Nevertheless, they can contain information about interactions between growing colonies.

A possible use for this method has been recognized in the description of the pattern of physiological (functional) groups in the natural environment. In this case, abundances of functional groups or other variables describing microbial activities may be regarded as points distributed over a certain area. However, it is not known whether this approach to describing microbial communities has been carried out successfully. A special problem may arise with regard to unstable environments like the pelagic system in the oceans.

Ripley (pers. comm.) has mentioned the application of this method to the epidemiology and geography of human disease. Plant and animal diseases, as well as other associations with microbes (e.g., in the rhizosphere), may also be studied by similar methods.

"Marked point" methods are available (e.g., Diggle 1983) which allow point patterns to be related to other variables, discrete or continuous. Thus, distributional patterns of microbial functional or taxonomic groups may be analyzed in relation to other spatially varying factors. Given precise positioning techniques, these methods may be useful in systems with few stable structures, as mentioned for the marine pelagic zone. As such, point pattern analysis may be a more sensitive approach to detecting spatial patterns of microbial communities in nature than other currently available techniques which rely on spectral analysis and related methods. In addition, simulation methods are available to test the significance of cross-correlations between spatial patterns of more than one variable.
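The simulation-based testing mentioned above can be sketched with a Monte Carlo test of complete spatial randomness based on the mean nearest-neighbour distance — a simple stand-in for Ripley's methods, run on synthetic "microcolony" data:

```python
import numpy as np

def mean_nn_distance(pts):
    """Mean nearest-neighbour distance of a 2-D point pattern."""
    d = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
    np.fill_diagonal(d, np.inf)
    return d.min(axis=1).mean()

def csr_test(pts, n_sim=199, seed=0):
    """Monte Carlo test against complete spatial randomness (CSR) on the
    unit square: simulate random patterns of the same size and report the
    fraction of simulations at or below the observed statistic. A small
    observed value (and fraction) signals clustering; a large one, regularity."""
    rng = np.random.default_rng(seed)
    obs = mean_nn_distance(pts)
    sims = np.array([mean_nn_distance(rng.random(pts.shape))
                     for _ in range(n_sim)])
    return obs, (sims <= obs).mean()

# A tightly clustered pattern (think bacterial microcolonies):
rng = np.random.default_rng(4)
centres = rng.random((5, 2))
pts = centres[rng.integers(0, 5, 100)] + rng.normal(scale=0.01, size=(100, 2))
obs, frac_below = csr_test(pts)
print(obs, frac_below)
```

The same envelope idea extends to cross-correlations between two point patterns by simulating one pattern conditionally on the other.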

CONCLUDING REMARKS

Only a few of the methods discussed above have yet found applications in microbial ecology. Though numerical taxonomy has been widely used in general microbiology (Baleux and Troussellier 1985) since its introduction to this field by Sneath (1957), even this method is rarely applied for ecological purposes. Oliver and Colwell (1974), Troussellier and Legendre (1981) and Legendre et al. (1984) tried this method in describing fluctuations of microbial populations. Bolter (1977), Witzel et al. (1981) and Bolter et al. (1986) used this method for taxonomic purposes in the marine and limnetic environments.

Hierarchical cluster analysis has also been used in structuring correlation matrices from variables of microbial ecology (Bolter et al. 1981, Bolter and Meyer 1983). Only a few attempts are known, however, at using non-hierarchical clustering on microbiological data (Bolter and Meyer 1986) or constrained chronological clustering (Legendre et al. 1985b).

This lack of applications of numerical methods in microbial ecology also holds true for other methods. Schwinghamer (1983) introduced path analysis, while Troussellier et al. (1986b) used this approach in analysing biological wastewater treatment.

Many other methods discussed during the sessions seemed to be very promising for use in microbial ecology. However, the working group could not go further than making recommendations to ecologists for adopting methods like scaling techniques. For many of the other methods, like unfolding analysis, fractal theory or point pattern analysis, their value for ecological purposes will be known only after they have been applied to many real problems.

The working group thought that it was a great advantage to obtain knowledge about these advanced mathematical methods and to introduce them to ecological science. We would like to encourage more microbiologists to get in closer contact with people who are familiar with these methods. This would yield new insights into the system of the micro-organisms, thanks to the stimulation from methods that help generate hypotheses, complementing the more usual method of hypothesis testing.

REFERENCES

Azam, F., T. Fenchel, J.G. Field, J.S. Gray, L.-A. Meyer-Reil, and F. Thingstad. 1983. The ecological significance of water-column microbes in the sea. Mar. Ecol. Progr. Ser. 10: 257-263.
Baleux, B., and M. Troussellier. 1985. Méthodes de classification et d'identification des bactéries, p. 167-219. In G. Martin [coord.] Bactériologie des milieux aquatiques. Série: Point sur l'épuration et le traitement des effluents (eau, air), Volume 2, Tome 2. Technique et Documentation Lavoisier, Paris.
Bolter, M. 1977. Numerical taxonomy and character analysis of saprophytic bacteria isolated from the Kiel Fjord and the Kiel Bight, p. 148-178. In G. Rheinheimer [ed.] Microbial ecology of a brackish water environment. Ecol. Stud. 25. Springer-Verlag, Berlin.
Bolter, M. 1982. DOC-turnover and microbial biomass production. Kieler Meeresforsch. Sonderh. 5: 304-310.
Bolter, M., L.-A. Meyer-Reil, and B. Probst. 1977. Comparative analysis of data measured in the brackish water of the Kiel Fjord and the Kiel Bight, p. 249-280. In G. Rheinheimer [ed.] Microbial ecology of a brackish water environment. Ecol. Stud. 25. Springer-Verlag, Berlin.
Bolter, M., L.-A. Meyer-Reil, R. Dawson, G. Liebezeit, K. Wolter, and H. Szwerinski. 1981. Structure analysis of shallow water ecosystems: Interaction of microbiological, chemical and physical characteristics measured in the overlying waters of sandy beach sediments. Estuar. Coast. Shelf Sci. 13: 579-585.
Bolter, M., and M. Meyer. 1983. The sandy beach area of Kiel Fjord and Kiel Bight (Western Baltic Sea) - A structural analysis of a shallow water ecosystem, p. 263-270. In A. McLachlan and T. Erasmus [ed.] Sandy beaches as ecosystems. Junk Publishers, The Hague, Boston, Lancaster.
Bolter, M., and M. Meyer. 1986. Structuring of ecological data sets by methods of correlation and cluster analysis. Ecol. Modelling 32: 1-13.
Bolter, M., M. Meyer, and G. Rheinheimer. 1986. Mikrobiologische Untersuchungen in Flüssen. V. Taxonomische Analyse von Bakterienstämmen aus Elbe und Trave zu verschiedenen Jahreszeiten. Arch. Hydrobiol. 107: 203-214.
Cisne, J.L., and B.D. Rabe. 1978. Coenocorrelation: Gradient analysis of fossil communities and its application to stratigraphy. Lethaia 11: 341-364.
Colwell, R.R., and R.Y. Morita [ed.]. 1974. Effect of ocean environment on microbial activities. University Park Press, Baltimore.
Dale, N.G. 1974. Bacteria in intertidal sediments: Factors related to their distribution. Limnol. Oceanogr. 19: 509-518.
Digby, P.G.N., and J.C. Gower. 1981. Ordination between- and within-groups applied to soil classification, p. 63-75. In D.F. Merriam [ed.] Down to earth statistics: solutions looking for geological problems. Syracuse University Geological Contributions, Syracuse.
Diggle, P.J. 1983. Statistical analysis of spatial point patterns. Academic Press, London.
Field, J.G., F.V. Wulff, P.M. Allen, M.J.R. Fasham, J. Flos, S. Frontier, J.J. Kay, W. Silvert, and L. Trainor. 1985. Ecosystem theory in relation to unexploited marine ecosystems, p. 241-247. In R.E. Ulanowicz and T. Platt [ed.] Ecosystem theory for biological oceanography. Can. Bull. Fish. Aquat. Sci. 213.
Gauch, H.G. Jr. 1977. ORDIFLEX - A flexible computer program for ordination techniques: Weighted averages, polar ordination, principal component analysis, and reciprocal averaging, release B. Cornell University Press, Ithaca, N.Y.
Gauch, H.G. Jr. 1982. Multivariate analysis in community ecology. Cambridge University Press, Cambridge.
Legendre, L., and S. Demers. 1984. Towards dynamic biological oceanography and limnology. Can. J. Fish. Aquat. Sci. 41: 2-19.
Legendre, P., M. Troussellier, and B. Baleux. 1984. Indices descriptifs pour l'étude de l'évolution des communautés bactériennes, p. 79-86. In A. Bianchi [ed.] Bactériologie marine: Colloque international no 331. Éditions du CNRS, Paris.
Legendre, P., S. Dallot, and L. Legendre. 1985a. Succession of species within a community: chronological clustering, with applications to marine and freshwater zooplankton. Am. Nat. 125: 257-288.
Legendre, P., B. Baleux, and M. Troussellier. 1985b. Dynamics of pollution-indicator and heterotrophic bacteria in sewage treatment lagoons. Appl. Environ. Microbiol. 48: 586-593.
Liebezeit, G., M. Bolter, J.F. Brown, and R. Dawson. 1980. Dissolved free amino acids and carbohydrates at pycnocline boundaries in the Sargasso Sea and related microbial processes. Oceanol. Acta 3: 357-362.
Lochte, K. 1985. Biological studies in the vicinity of a shallow-sea tidal mixing front. III. Seasonal and spatial distribution of heterotrophic uptake of glucose. Phil. Trans. R. Soc. Lond. B 310: 445-469.
Mackas, D.L. 1984. Spatial autocorrelation of plankton community composition in a continental shelf ecosystem. Limnol. Oceanogr. 29: 451-471.
Oliver, J.D., and R.R. Colwell. 1974. Computer program designed to follow fluctuations in microbial populations and its application in a study of Chesapeake Bay microflora. Appl. Microbiol. 28: 185-192.
Rheinheimer, G. 1981. Mikrobiologie der Gewässer. Gustav Fischer, Jena.
Schwinghamer, P. 1983. Generating ecological hypotheses from biomass spectra using causal analysis: a benthic example. Mar. Ecol. Progr. Ser. 13: 151-166.
Schwinghamer, P., B. Hargrave, D. Peer, and C.M. Hawkins. 1986. Partitioning of production and respiration among size groups of organisms in an intertidal benthic community. Mar. Ecol. Progr. Ser. 31: 131-142.
Sneath, P.H.A. 1957. The application of computers to taxonomy. J. Gen. Microbiol. 17: 201-226.
Sneath, P.H.A., and R.R. Sokal. 1973. Numerical taxonomy. W.H. Freeman, San Francisco.
Sprules, W.G. 1980. Nonmetric multidimensional scaling analyses of temporal variation in the structure of limnetic zooplankton communities. Hydrobiologia 69: 139-146.
Steele, J.H. 1978. Some comments on plankton patches. In J.H. Steele [ed.] Spatial pattern in plankton communities. Plenum Press, New York.
Troussellier, M., and P. Legendre. 1981. A functional evenness index for microbial ecology. Microb. Ecol. 7: 283-296.
Troussellier, M., B. Baleux, and P. André. 1986a. Echantillonnage de variables bactériologiques dans les milieux aquatiques. GERBAM/CNRS, Deuxième colloque international de bactériologie marine, Brest, octobre 1986. IFREMER, Actes de Colloques 3: 23-33.
Troussellier, M., P. Legendre, and B. Baleux. 1986b. Modelling of the evolution of bacterial densities in an eutrophic ecosystem (sewage lagoons). Microb. Ecol. 12: 355-379.
Witzel, K.-P., H.J. Krambeck, and H.J. Overbeck. 1981. On the structure of bacterial communities in lakes and rivers - a comparison with numerical taxonomy on isolates. Verh. Internat. Verein. Limnol. 21: 1365-1370.
Zimmermann, R. 1977. Estimation of bacterial number and biomass by epifluorescence microscopy and scanning electron microscopy, p. 103-120. In G. Rheinheimer [ed.] Microbial ecology of a brackish water environment. Ecol. Stud. 25. Springer-Verlag, Berlin.
NUMERICAL ECOLOGY:
DEVELOPMENTS FOR STUDYING THE BENTHOS

John G. Field* (chairman), Roger H. Green (rapporteur), Francisco A. de L. Andrade, Eugenio Fresi, Philippe Gros, Brian H. McArdle, Michele Scardi, and Daniel Wartenberg.

*Marine Biology Research Institute, Zoology Department, University of Cape Town, Rondebosch 7700, South Africa.

INTRODUCTION

In discussing the use of techniques, it is first necessary to note the aims of the potential users of those techniques, in order to judge whether they are applicable. Some of the main aims of benthic community ecologists include the following:

1. To analyse patterns in biotic data (species/sites/times);


2. To relate biotic patterns to patterns in the environment in time and space;
3. To predict responses of benthic communities to changes in the biotic and/or environmental
(abiotic) patterns, sometimes via experiments done in the field or in mesocosms;
4. To study the functioning of benthic communities and processes (e.g., energy flows and
nutrient cycles).

Aims (3) and (4) are in the forefront of benthic ecology at present, but are only discussed briefly
in passing, since most of the techniques dealt with at the workshop are more relevant to the first
two. In considering aims (1) and (2), there are three alternative approaches in relating biotic and
environmental data:

a) Analyse patterns in the biotic data first, then relate these patterns to environmental factors;
b) Analyse patterns in the environmental data, then relate these to changes in the biotic data
(common in pollution studies);
c) Analyse the patterns and relationships within and between biotic and environmental data
simultaneously.

All three approaches have been used in benthic ecology for some 20 years. Conventional
clustering and classical scaling (ordination) have been used for analysing patterns in both biotic
and environmental data (Legendre and Legendre 1983), whereas canonical correlations have been
used rather rarely to analyse patterns in both biotic and environmental data simultaneously. In this report, we present an overview of methods that appear to be of potential use to benthic ecologists, although they may have only been tested so far in other fields, such as psychometrics.

NATO ASI Series, Vol. G 14
Developments in Numerical Ecology
Edited by P. and L. Legendre
© Springer-Verlag Berlin Heidelberg 1987

Sampling

Benthic sampling methods to a large extent dictate the kind of data collected and therefore
the type of analysis that might be appropriate. Data collected in the littoral region, and by
photography or underwater by SCUBA may be quantitative, and the exact position may be
mapped by co-ordinates. Since relatively immobile organisms are collected, and they are
essentially in a two-dimensional surface, this type of sampling may utilize grid, quadrat, or
transect techniques; the methods and data are to a large extent similar to those collected by plant
ecologists. In deeper waters, benthic ecologists are often forced to sample "blind", using grabs
and/or cores from ships to collect quantitative samples at roughly positioned locations, or using
cruder dredges and trawls dragged over an unmeasured area to collect at best semi-quantitative
data. The scale of observation is an important consideration in interpreting ecological structures.

Most benthic data are obtained from biased sampling. The bias is typically in one direction
(under-estimation) and thus the bias cannot be "averaged out" by sampling in different ways.
Benthic ecologists often wish to compare things (sites, times, conditions) which are estimated
with different biases (e.g., comparing communities on sand versus mud using a grab, which will
of course penetrate differently in sand and mud).

The sampling design needs to take into account the numerical methods which are to follow.
This is critical for meaningful results and interpretation. Furthermore, the high cost of obtaining
raw benthic data usually prevents feedbacks from the analysis to the sampling design and
analytical procedures.

Data pre-treatment

When one is looking for structure in the biotic data, it is often advisable to transform the data in order to stabilize variances, and there are good arguments for recommending either Y = log(X + c), where 0.2 < c < 1 (logarithmic transform), or Y = X^0.25 (fourth-root transform). The value of c appears to have little influence on the ability of the transformation to stabilise the variance. Both transformations are special cases of the general Taylor's power-law variance-to-mean relationship.
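A quick synthetic check of the variance-stabilising effect (a NumPy sketch; here the standard deviation is made proportional to the mean, the Taylor exponent b = 2 case that the log transform is meant for):

```python
import numpy as np

# Synthetic abundances at a 'sparse' and a 'dense' site, with standard
# deviation proportional to the mean:
rng = np.random.default_rng(5)
sparse = 2.0 * rng.lognormal(0.0, 0.7, size=500)
dense = 200.0 * rng.lognormal(0.0, 0.7, size=500)

raw_ratio = dense.var() / sparse.var()     # enormous, on the order of 100^2

c = 0.5                                    # any c in (0.2, 1) behaves similarly
log_ratio = np.log(dense + c).var() / np.log(sparse + c).var()

print(round(raw_ratio), round(log_ratio, 2))   # the ratio collapses towards 1
```

For Taylor exponents nearer b = 1.5, the fourth-root transform plays the analogous role.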

NUMERICAL METHODS

Table 1 summarises the main aims of benthic community ecologists in the columns, with
some of the numerical techniques discussed during the workshop as rows. The columns describe
categories of ecological questions to be investigated, from analysing biotic distribution patterns in
space only (sites x species data), to consideration of both space and time (sites x species x times),
to relating the 2- to 3-dimensional biotic features to environmental ones, and finally to questions
concerned with modelling or analysing how the systems function. The techniques (rows) are
approximately arranged from simpler to more complex under each heading, and at the same time,
in general from more to less dependent upon assumptions.

Some of the main features of the techniques and their potential for benthic ecology are
highlighted below.

2-way scaling (ordination): Metric scaling

Principal Components Analysis (PCA: Gower, this volume): This should be restricted to analysing the correlation or covariance structure among variables (e.g., species), and care should be taken since it may be sensitive to non-linearity and non-normality in the data. With Principal Co-ordinates Analysis (PCO), one can achieve the same solution starting from a matrix of inter-object (e.g., site) distances, with the advantage that one can choose different measures of inter-site distance. Classical metric scaling is equivalent to Principal Co-ordinates Analysis. Correspondence Analysis (of contingency table count data) differs in that one is tied to the chi-squared distance measure. Detrended Correspondence Analysis is not recommended, since "horseshoes", if they occur, show real relationships in the data.
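The PCA/PCO equivalence for Euclidean distances can be verified numerically — a generic NumPy sketch on invented sites-by-species data, using Gower's double-centring construction:

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(10, 4))                # 10 sites x 4 species (toy data)
Xc = X - X.mean(axis=0)

# PCA site scores from the SVD of the centred data matrix:
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pca_scores = U * s

# PCO: double-centre the squared Euclidean distance matrix, then take
# eigenvectors scaled by the square roots of the eigenvalues:
D2 = ((Xc[:, None] - Xc[None, :]) ** 2).sum(axis=2)
J = np.eye(10) - np.ones((10, 10)) / 10.0
B = -0.5 * J @ D2 @ J
evals, evecs = np.linalg.eigh(B)
order = np.argsort(evals)[::-1][:4]
pco_scores = evecs[:, order] * np.sqrt(evals[order])

# Up to reflections of the axes, the two sets of site scores coincide:
print(np.allclose(np.abs(pca_scores), np.abs(pco_scores), atol=1e-6))
```

Substituting a non-Euclidean inter-site distance into `D2` is precisely where PCO gains its extra flexibility over PCA.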

2-way scaling (ordination): Non-metric scaling

Non-metric Scaling (Carroll, this volume): Here one finds a reduced-space solution that preserves the rank order of inter-object distances (monotonicity), as opposed to the linear relationship of classical (metric) scaling. Non-metric scaling has the advantage of robustness, in that it is not sensitive to outliers (e.g., the chance occurrence of one individual of massive biomass in a site).

Non-linear (non-monotonic) Scaling (de Leeuw, this volume): A generalised framework for scaling which subsumes the others, but it may be more difficult for ecologists to familiarise themselves with this technique. The framework and its methods should be explored by
experienced ecologists and the methods compared.

Asymmetric Matrix Analysis (de Leeuw, this volume): The resolution of a matrix into two parts, one symmetric (for example where interactions are reversible) and the other skew-symmetric (e.g., irreversible interactions), may have applications in showing successional and competitive phenomena in benthic ecology. There are no known published benthic examples to date.

Unfolding (Heiser, this volume): Unlike other scaling techniques, it applies directly to a
rectangular matrix (e.g., sites vs species distances, or species affinities for different sites). It aims
at producing a geometric representation in a subspace of reduced dimension maximising
conservation of rank-order relationships of distances among species, among sites, and between
sites and species. A behavioral analogue is given by a rectangular matrix of boy-girl relationships,
from which unfolding may infer two triangular matrices, one of girl-girl relationships and another
of boy-boy relationships. It produces a true joint-space (as opposed to a projection), unlike other
techniques such as PCA. It has great potential, but there are no ecological examples except that of
Heiser (this volume) and it needs exploring.

Path Analysis (de Leeuw, this volume): This is a way of testing the fit of an a priori
model of a causal structure, by means of generalized least squares (e.g., as an interpretation of a
matrix of correlations among variables). Non-linear path analysis is the non-parametric
equivalent. In both cases the structure is expressed as a web of arrows joining the variables.
Current methods are capable of handling unobserved latent variables in the causal structure, a
potentially useful feature. The path analysis structure diagram may be a useful complement to
regression and contingency table techniques already in use, but there are limitations to its use in
systems with feed-backs, such as many ecological ones.

Canonical Correlations: They differ from Procrustes analysis (below) in that, in reaching a solution, they take into account the correlations between matrices simultaneously with the intra-matrix correlations. In fact, they produce pairs of linear combinations within the two original
sets of variables so as to maximise the correlation between them. Useful results have been
obtained in benthic ecology despite its theoretical limitations (sensitivity to heterogeneity and
assumption of linearity).
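For reference, the canonical correlations themselves can be computed as the singular values of the whitened cross-covariance matrix — a standard textbook construction, sketched here in NumPy on invented environment/species data:

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between two variable sets, as singular
    values of the whitened cross-covariance matrix."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    def inv_sqrt(C):
        w, V = np.linalg.eigh(C)            # C symmetric positive definite
        return V @ np.diag(w ** -0.5) @ V.T

    M = inv_sqrt(Xc.T @ Xc) @ (Xc.T @ Yc) @ inv_sqrt(Yc.T @ Yc)
    return np.linalg.svd(M, compute_uv=False)

# Toy data: two 'species' driven by the same two environmental variables.
rng = np.random.default_rng(7)
env = rng.normal(size=(50, 2))              # e.g., depth and grain size
spp = env @ np.array([[1.0, 0.2], [0.1, 0.9]]) \
      + rng.normal(scale=0.3, size=(50, 2))
rho = canonical_correlations(env, spp)
print(np.round(rho, 2))                     # strong biotic-environment coupling
```

The linearity assumption mentioned above is visible here: only linear combinations of each set are considered, so unimodal species responses can defeat the method.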

Procrustes Analysis (Gower, this volume): With species sampled at different times,
Procrustes Analysis can be used to measure the relative variability of each species with time.

Similarly, within-site variability can be compared from site to site if replicate samples are taken at
each site. Different sampling devices or techniques can also be compared. Another application
would be to compare matrices based on biological and environmental data. It has been applied to
marine ecological data by Fasham and Foxton (1979) who compared various environmental
hypotheses for goodness of fit to the biotic data. It appears to have great potential in benthic
ecology.
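A minimal orthogonal-Procrustes sketch (the standard SVD construction, not a method specific to this volume; the two "ordinations" are synthetic):

```python
import numpy as np

def procrustes_residual(A, B):
    """Orthogonal Procrustes residual m2 between two configurations, after
    centring and scaling each to unit sum of squares; m2 = 0 means B is just
    a rotated/translated/rescaled copy of A, larger values mean disagreement."""
    def norm(C):
        C = C - C.mean(axis=0)
        return C / np.sqrt((C ** 2).sum())
    A, B = norm(A), norm(B)
    s = np.linalg.svd(A.T @ B, compute_uv=False)
    return 1.0 - s.sum() ** 2

rng = np.random.default_rng(8)
A = rng.normal(size=(8, 2))                     # ordination of 8 sites, method 1
theta = 0.7                                     # an arbitrary rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
B = 3.0 * A @ R + 1.5                           # same configuration, transformed
res_exact = procrustes_residual(A, B)
res_noisy = procrustes_residual(A, B + rng.normal(scale=0.5, size=B.shape))
print(res_exact, res_noisy)                     # ~0 versus clearly positive
```

Comparing residuals across species, sites or sampling devices is the quantitative basis for the uses listed above.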

Multiple Correspondence Analysis (Gower, this volume): A useful way of analysing multi-way contingency tables. It can be used in benthic ecology in grouping species into
age-classes, food items into size classes, and animals into different sediment types, or indeed any
situation in which an observation falls into one of several possible categories (multi-way
contingency tables).

Individual Distance Scaling (INDSCAL) (de Leeuw, this volume): This is a metric
method for comparing Euclidean distance matrices. There are no known benthic examples, and
the method needs exploring. The non-metric version has degenerate solutions.

Constrained Scaling (Heiser, this volume): A multi-dimensional scaling technique in
which an external (e.g., environmental) variable can be imposed as a constraint. This may be
useful where the constraint is continuous or ranked, in contrast to the discrete constraint imposed
in constrained clustering (see below). There are no examples of applications in ecology to date.

3-Way Unfolding (Heiser, this volume): A three-way version exists but has not been
tested. It has potential in benthic ecology but the large amount of data required may limit its
application in practice.

Clustering

Conventional Clustering (Legendre and Legendre 1983): This family of techniques is
useful for grouping sites or times into dendrograms and is widely used in benthic ecology.

Conditional clustering (Lefkovitch, this volume): Using conventional clustering
techniques, it has often been difficult to distinguish species groups in benthic species/site data,
although site-groups may be more apparent in the same data. The new Lefkovitch algorithm
should separate strong species groups if they exist in such data. It has the attraction of being free
of indices of distance or similarity, and allows species to be members of more than one group, an
important improvement on conventional clustering. It should be explored on benthic data.

Fuzzy Sets (Bezdek, this volume): The idea of fuzzy sets is intellectually appealing, since
there is no reason to believe that benthic communities are discrete and disjunct. The concept of
fuzzy sets is intermediate between those of clustering and ordination. The techniques for
delineating fuzzy sets involve easy algorithms, and one should try to use several of them to gauge
the stability of the solutions with each particular data set. In particular, it is worth exploring the
C-means algorithm for use on benthic data, and using output from this to speed up the more
time-consuming maximum-likelihood function for fuzzy sets.
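The fuzzy C-means iteration mentioned above is indeed an easy algorithm; here is a hedged sketch in Python (numpy assumed; the membership exponent m = 2 and the toy data are illustrative choices, not prescriptions). Each observation receives a membership in every cluster, rather than a hard assignment.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Fuzzy C-means: returns memberships U (rows sum to 1) and centres."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        centres = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)            # guard against zero distances
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U, centres

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
U, centres = fuzzy_c_means(X, 2)
labels = U.argmax(axis=1)                   # hardened memberships
```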

Constrained clustering (P. Legendre, this volume): This is useful for tracing successional
data, and for exploring the historical and spatial evolution of dispersion. One should try both
constrained and un-constrained analyses on the same data. The technique has been used in
ecology and needs further application. It may also be possible to test a null hypothesis such as
that there is no spatial auto-correlation (no patches) against a specific alternative hypothesis, in
order to investigate the processes underlying patch formation. One may be able to test the
clustering of the (biotic) x-variables in the environment space by setting up a connection matrix on
the basis of similarity of environmental variables. However, further investigation into the logical
validity of such hypothesis testing is needed.
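The flavour of a time-constrained (chronological) clustering can be conveyed by a deliberately simple greedy sketch in which only temporally adjacent groups are allowed to fuse; this is a stand-in illustration, not the algorithm of the chapter cited, and the fusion criterion (difference of group means) is an assumption.

```python
import numpy as np

def chronological_clusters(series, k):
    """Greedy agglomeration in which only temporally adjacent groups
    may fuse; fusion cost = difference of group means."""
    series = np.asarray(series, dtype=float)
    segments = [[i] for i in range(len(series))]
    while len(segments) > k:
        means = [series[s].mean() for s in segments]
        gaps = [abs(means[i + 1] - means[i]) for i in range(len(means) - 1)]
        j = int(np.argmin(gaps))
        segments[j] = segments[j] + segments.pop(j + 1)  # fuse neighbours
    labels = np.empty(len(series), dtype=int)
    for lab, s in enumerate(segments):
        labels[s] = lab
    return labels

# A succession with an abrupt change between observations 3 and 4:
labels = chronological_clusters([1.0, 1.1, 1.05, 5.0, 5.2, 5.1], k=2)
```

Comparing this constrained result with an unconstrained clustering of the same series is exactly the exercise recommended above.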

Spatial analyses

Fractal theory (Frontier, this volume): This describes how a structure may occupy a
space of dimension greater than the structure itself (e.g., surface or volume). It may be of use in
describing the physical dimension of a niche such as the rugosity of hard substrata, or in
predicting the surface area available as an environment at the appropriate scale for particular
organisms (the area available for larval settlement, or growth, or photosynthesis). Changes in
fractal dimension might account for scale transitions which imply changes in structural or
functional properties of the object/system (e.g., transition from a physical to a biological scale). Its
utility in describing soft sediments is unclear at present and examples are needed.
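The box-counting estimate, one common operational version of fractal dimension, can be sketched as follows (numpy assumed): count the boxes of side eps occupied by the structure and take the slope of log N(eps) against log(1/eps). A set that fills the plane should come out near dimension 2; a transect of points near 1.

```python
import numpy as np

def box_counting_dimension(points, sizes):
    """Slope of log N(eps) vs log(1/eps), where N(eps) is the number of
    boxes of side eps occupied by the point set (unit square assumed)."""
    counts = []
    for eps in sizes:
        boxes = set(map(tuple, np.floor(points / eps).astype(int)))
        counts.append(len(boxes))
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(sizes)), np.log(counts), 1)
    return slope

# A dense regular grid that fills the plane: dimension should be ~2.
g = np.linspace(0.0, 0.999, 64)
grid = np.array([(x, y) for x in g for y in g])
dim = box_counting_dimension(grid, [1/2, 1/4, 1/8, 1/16])
```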

Kriging (Matheron 1969, 1970; Scardi et al. 1986): This is an interpolation technique
useful for mapping and contouring single variables (e.g., species densities, biomasses, sediment
parameters). Kriging also provides an estimate of the interpolation error for each point, which
may indicate where more sampling is needed or where spatial patterns are very irregular. It
appears to be an improvement on trend-surface analysis. Since Kriging is based on variograms, it
should be regarded as a complex and powerful tool for spatial analysis rather than as a simple
interpolator.
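A minimal ordinary-kriging sketch (numpy assumed) shows both features noted above: the prediction and its kriging variance follow from solving one linear system built from the variogram. The exponential variogram and its parameters are illustrative assumptions; note that at a sampled point the predictor is exact and the kriging variance is zero.

```python
import numpy as np

def ordinary_kriging(xy, z, x0, sill=1.0, a=2.0):
    """Ordinary kriging with an exponential semivariogram
    gamma(h) = sill * (1 - exp(-h / a)); returns (prediction, variance)."""
    gamma = lambda h: sill * (1.0 - np.exp(-h / a))
    n = len(z)
    H = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(H)
    A[n, n] = 0.0                 # Lagrange-multiplier row/column
    b = np.ones(n + 1)
    b[:n] = gamma(np.linalg.norm(xy - x0, axis=1))
    sol = np.linalg.solve(A, b)
    w, mu = sol[:n], sol[n]
    return w @ z, w @ b[:n] + mu  # prediction, kriging variance at x0

xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
z = np.array([1.0, 2.0, 3.0, 4.0, 2.5])      # e.g. species densities
pred, var = ordinary_kriging(xy, z, np.array([0.0, 0.0]))
```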

Spatial Autocorrelation (Sokal and Thomson, this volume): The correlogram is useful for
revealing spatial patterns of a single variable (e.g., density, or a compound or a discontinuous
variable). It can be used to show patterns such as clines, isotropy and anisotropy. It has been
successfully used in benthic ecology to demonstrate the scale of variation of single species.
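Moran's I, the statistic usually plotted in such correlograms, can be sketched as follows (numpy assumed; the transect data are invented, and a full correlogram would recompute I for successive distance classes):

```python
import numpy as np

def morans_i(x, W):
    """Moran's I for values x under spatial weight matrix W."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return (len(x) / W.sum()) * (d @ W @ d) / (d @ d)

# A transect of 10 stations carrying a smooth gradient, with adjacent
# stations taken as neighbours (first distance class).
x = np.arange(1, 11)
W = np.zeros((10, 10))
for i in range(9):
    W[i, i + 1] = W[i + 1, i] = 1.0
I = morans_i(x, W)    # strongly positive for a cline
```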

The Mantel Test (Sokal and Thomson, this volume): This test is useful for comparing
distance matrices. It has been used successfully for analysing spatial and spatio-temporal
relationships. It appears to have much potential for more general use, e.g. for comparing biotic
and environmental dissimilarity matrices.
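A permutational Mantel test is easy to sketch (numpy assumed; the data below are invented): correlate the unfolded upper triangles of the two distance matrices, then re-correlate under random simultaneous row/column permutations of one matrix to obtain a reference distribution.

```python
import numpy as np

def mantel(D1, D2, n_perm=999, seed=0):
    """Mantel permutation test: returns (r, one-tailed p for r >= r_obs)."""
    iu = np.triu_indices_from(D1, k=1)
    a = D1[iu]
    r_obs = np.corrcoef(a, D2[iu])[0, 1]
    rng = np.random.default_rng(seed)
    count = 1                                 # the observed value itself
    for _ in range(n_perm):
        p = rng.permutation(len(D1))
        # Permute rows and columns of D2 simultaneously.
        r = np.corrcoef(a, D2[np.ix_(p, p)][iu])[0, 1]
        if r >= r_obs:
            count += 1
    return r_obs, count / (n_perm + 1)

coords = np.arange(10.0)
D_bio = np.abs(coords[:, None] - coords[None, :])      # biotic distances
D_env = 2.0 * D_bio                                    # perfectly related
r, p = mantel(D_bio, D_env)
```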

Point Pattern Analysis (Ripley, this volume): In contrast to spatial autocorrelation, this is
used to analyse spatial patterns described by co-ordinates in space (as opposed to continuous
variables with values at each point). The K(t) method depends on having all the organisms
mapped and counting the average number of organisms within a radius t of each organism in turn.
Distances need not be exact; one needs to know the positions to about 1/3 of the distance between
points (preserving rank order). It would be useful for describing univariate patterns of
aggregation and dispersion, when mapping of the benthic species is possible.
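A naive K(t) estimator (numpy assumed; no edge correction, which a serious application would need) can be sketched as:

```python
import numpy as np

def ripley_k(points, t, area):
    """Naive K(t): area * (number of ordered pairs closer than t) / n^2.
    No edge correction, so it is biased near the plot boundary."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    within = (d <= t) & (d > 0)              # exclude self-pairs
    return area * within.sum() / len(points) ** 2

# Four organisms on the corners of a unit square: at t = 1 each sees
# its two edge-neighbours (the diagonal pair is further away),
# so K = 1 * 8 / 16 = 0.5.
pts = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
K = ripley_k(pts, 1.0, area=1.0)
```

Comparing K(t) with its value under complete spatial randomness (pi * t^2) is what reveals aggregation or regularity.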

DISCUSSION

We have emphasised the multivariate description of observational data in the above
section. The accent has been on the descriptive and analytical techniques for revealing structure
and relationships in complex data, and there is little emphasis on developing analytical or
predictive models of system functioning. There is nevertheless great potential for using
multivariate numerical methods to analyse multivariate responses in experimental work. For
example, in a factorial MANOVA design with orthogonal treatments/blocks, each main effect or
interaction, including higher-level interactions, generates eigenvalues and vectors corresponding to
the degrees of freedom involved. One can then cluster or scale from these to further describe the
multivariate structure of the responses (Green 1979). Response surfaces or kriging could be used
to display multivariate as well as univariate responses where the interactions between the
dimensions are significant.

Description and analysis have been emphasized, rather than hypothesis testing.
"Significance testing" can be a good screening method preceding descriptive multivariate analysis
(NOT to validate "significance"!). For example, one can perform a test of sphericity (H0: |R| = 1)
and IF the null hypothesis is rejected, then proceed to describe the correlation structure. If the null
hypothesis is not rejected, then there is no evidence of any correlation structure to describe, and
the analysis should be abandoned. Another example is provided by contingency table data,
including multiway tables. Beginning with log-linear models, one tests the highest-level
interactions first and the main effects last, in the normal way. The table should be collapsed over
dimensions not involved in significant interactions. Correspondence analysis can be performed on
this reduced table; in effect this is an approach which describes a sufficient model representation
of the data. It is, in a sense, a testing procedure for descriptive multivariate methods such as
clustering and ordination, in that one has found evidence that there is structure present to be
described.
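The sphericity screen mentioned above (H0: |R| = 1) is usually carried out with Bartlett's chi-square approximation, sketched here (numpy and scipy assumed; the example data are invented):

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(X):
    """Bartlett's test of H0: |R| = 1 (no correlation structure).
    Returns (chi-square statistic, p-value)."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    stat = -(n - 1 - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
    return stat, chi2.sf(stat, p * (p - 1) / 2.0)

# Three strongly interrelated variables: the null hypothesis should be
# rejected, so a descriptive analysis of R is worth pursuing.
t = np.linspace(0.0, 1.0, 30)
X = np.column_stack([t, t ** 2, np.sin(t)])
stat, p_value = bartlett_sphericity(X)
```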

The most promising methods for benthic ecology are also promising in other areas of
ecology. At present each group or school has its favourite techniques and computer programs,
and tends to put data through them. It is not yet clear to what extent the more traditional
techniques such as PCA, metric scaling (PCO), and canonical correlation analysis give distorted
results when data are increasingly heterogeneous, full of zeros, and assumptions about linearity do
not apply. The newer non-metric techniques such as non-linear non-metric scaling, asymmetric
matrix analysis and unfolding are all very attractive because of their generality and lack of
assumptions about the data. Their generality and approximate (loose) nature may make them
particularly suited to analysing ecological data, which have much in common with psychometric data
in their approximate nature. However, it is worth noting that, as with non-parametric
statistical methods, one loses in power and rigor what one gains in generality; this is especially
true if one wishes to turn the description into some sort of predictive model afterwards. At the
same time, benthic ecologists have developed some expertise at more conventional techniques,
which in general have given interpretable results, and it will have to be demonstrated that the gain
in robustness and flexibility is worth the effort of learning to use new sets of techniques with
many variants.

In exactly the same way, traditional clustering techniques have become part of the
standard tool box of benthic community ecologists. Conditional clustering, fuzzy sets and
constrained clustering are to a large extent untested and hold much promise for the future.

The spatial analysis techniques all have applications in benthic ecology. Perhaps the most
exciting is the Mantel test, which has applications on all types of data, including relationships
between species, space, time, and environmental factors (Table 1). In particular, this may be
combined with the descriptive technique of constrained or weighted scaling.

Table 1. Relationships of some principal aims of benthic ecologists to numerical techniques. See
text for details. Key: • = applies, NR = not recommended, (1) = univariate analysis, blank =
inappropriate.

QUESTIONS / AIMS

                     BIOTIC PATTERNS                RELATION TO      SYSTEM
                     Space          Space/time      ENVIRONMENT      FUNCTIONING
                     sites x spp    sites x spp
                                    x times
TECHNIQUES           2-WAY          3-WAY           2-, 3-, n-WAY
                     sites  spp.    s x spp x t     single  multiple
2-WAY SCALING

Metric
PCA and Biplot
PCO
Correspondence A.
Detrend. corresp. A. NR NR

Non-metric
N-M Scaling
NL N-M Scaling
Asymm. Matrix A.
Unfolding
Path Analysis

n-WAY SCALING

INDSCAL
Canonical Corr.
Multiple Corresp. A.
Constrained Scaling
Procrustes
Unfolding (3-Way)

CLUSTERING

Conventional ?
Conditional
Fuzzy Sets
Constrained

SPATIAL ANALYSIS

Fractals
Kriging (1) (1)
Autocorrelation (1)
Mantel
Point pattern (1)

The two techniques of asymmetric matrix analysis and path analysis are the only methods
considered at the workshop which spill over directly into the important area of generating and
testing hypotheses about how benthic systems function. The new methods of approximate
reasoning (Bezdek, this volume; L. Legendre et al., this volume) also have exciting possibilities
for generating and testing ecological hypotheses.

It is clearly very important that traditional and newly available techniques be evaluated and
compared using different types of data by experienced ecologists and data analysts working
together. This evaluation procedure may be referred to as gauging (see also de Leeuw, this
volume). It is proposed that a gauging workshop be held. Both aspects of gauging are important:

a) Varying the techniques, coefficients and, where appropriate, distance measures on common
data; and,
b) Analysing different types of real or artificial data (more or fewer empty cells, semi-quantitative,
quantitative, continuous and contingency data) using a common technique.

In particular, the traditional scaling techniques need to be compared with the many
variants available from the Gifi School of Leiden (de Leeuw, this volume; Heiser, this volume)
and traditional and newer clustering techniques need to be compared with the fuzzy set algorithms
(Bezdek, this volume). This should result in the production of a guide to the suitability of the
techniques to each purpose and type of data, so that appropriate data may be collected in the first
place. Only after such exercises will it be possible to recommend confidently which of the old
and which of the exciting newly available techniques are most appropriate for which type of data,
and which are robust or sensitive, and to what. It is very likely that benthic ecologists will still be
advised to perform several analyses on each data set, with most confident interpretation of the
patterns and relationships when the results of several techniques agree.

REFERENCES

Fasham, M.J.R., and P. Foxton. 1979. Zonal distribution of pelagic Decapoda (Crustacea) in
   the eastern North Atlantic and its relation to the physical oceanography. J. exp. mar. Biol.
   Ecol. 37: 225-253.
Green, R.H. 1979. Sampling design and statistical methods for environmental biologists.
   Wiley, New York. 257 p.
Legendre, L., and P. Legendre. 1983. Numerical ecology. Elsevier, Amsterdam. 419 p.
Matheron, G. 1969. Le krigeage universel. Cah. Cent. Morphol. Math. 1: 1-83.
Matheron, G. 1970. La théorie des variables régionalisées et ses applications. Cah. Cent.
   Morphol. Math. 5: 1-212.
Scardi, M., E. Fresi, and G.D. Ardizonne. In press. Cartographic representation of sea-grass
   beds: application of a stochastic interpolation technique (Kriging). In C.F. Boudouresque, A.
   Jeudi de Grissac and J. Olivier [ed.] 2nd International Workshop on Posidonia oceanica
   beds. G.I.S. Posidonie Publ., France.
DATA ANALYSIS IN PELAGIC COMMUNITY STUDIES

Jordi Flos* (Chairman), Fortunato A. Ascioti, J. Douglas Carroll, Serge Dallot, Serge Frontier,
John C. Gower, Richard L. Haedrich, and Alain Laurec

*Departament d'Ecologia, Universitat de Barcelona,
Avinguda Diagonal 645, E-08028 Barcelona, Spain

INTRODUCTION

Since the interior of the sea is not directly accessible to
us (terrestrial beings), our present view of the pelagic
ecosystem is mainly a product of many "blind observations".
Unlike with terrestrial systems, there was little ordinary
(natural and direct) knowledge of the sea before its scientific
study. Water, as a support for life, has to be studied by
combining sampling and field measurements with models and
theoretical work (e.g. Herman and Platt 1980).
Progress in ecology is often influenced by progress in other
sciences. In the case of the pelagic system, physical
oceanography precedes ecology, although biological evidence or
phenomena often indicate what kind of physical phenomena have to
be sought and where. An upwelling region, for example, can be
tracked not only by surface temperatures or other physical
parameters, but also by its high productivity. Maybe the recent
coining of the word "ergocline" (Nihoul (ed) 1985) is an
indication of present trends in the study of the pelagic system,
trends which are not new but have become quite widespread.
Ecologists are especially interested in the "tuning" or "matching"
of physical and biological processes. Biological structures, from
organisms to populations or communities, are now seen as
dissipative structures that last for a longer or shorter time
span, feeding on fluxes of energy and matter. The concept of
external energy (Margalef 1978, 1985), that is, energy indirectly
used and partly incorporated by organisms, takes on a fuller

NATO ASI Series, Vol. G14


Developments in Numerical Ecology
Edited by P. and L. Legendre
© Springer-Verlag Berlin Heidelberg 1987

meaning, as its usefulness depends on its dynamic "shape" and
"size". The possibility for organismic structures to profit by
some form of external energy depends on their capability of
adapting their characteristics ("tuning") to the physical
structures (also dissipative). We could say that ecologists are
now concentrating more on the functional anatomy of the sea. So,
ecologists are still permanently faced with the problem of
describing reality, that is, the space-time structures, physical
and biological, that form the pelagic ecosystem. Communities, the
assemblages of populations of several species that live together
in a given, arbitrarily limited space, are part of the ecosystem.
Any description relies upon sampling, measuring, and analysing
huge sets of data. Historically, the questions were first
centered separately on physical and biological aspects, but now
that it is clear that biological structures are linked to
physical ones, questions focus on combining information on both.
However, because of the complexity of the sea, structures
embedded at different scales are recognized, sometimes closely
following a fractal geometry (Mandelbrot 1977, 1982). Historical
aspects, always present because of time delays and inertia in
systems far from equilibrium, introduce additional difficulties.
In short, ecologists are faced with irreversible historical
processes.
Although the conceptual model of the pelagic ecosystem has
changed in the past few years, we depend on sampling procedures
as much as ever. However, new concepts bring new questions and
force new sampling designs, usually trying to get a more detailed
description of the system under investigation. New sampling
strategies also call for new methods of data analysis (Frontier
1986) .
The discrete constituents of the pelagic system span a wide
range of sizes (from microscopic organisms or particles to the
biggest animals) that can be passive or active. The same spanning
is true for biologically relevant physical structures. Different
aspects of the ecosystem have to be sampled with appropriate
devices. Many measurements may be sampled directly ("in situ"
measurements as fluorometry, nephelometry, temperature, salinity
or currents ... ). Other measurements have to be performed on water

samples taken with hydrographic bottles, and use from 10^-1 to
10^-2 litres (chemistry, phytoplankton, particles, microbiological
counts, ...). Other measures need greater volumes of water (1
litre) and others require the filtering of 10 to 10^3 litres
(zooplankton, fishes, ...). Information is also obtained through
experiments (productivity, grazing, excretion, ...). Assembling
all the information gathered at a given location and time is a
very exacting task, not without dangers, only possible upon
restrictive assumptions. In addition to the problem of sampling
organisms of different sizes, there is the unavoidable fact that
sampling extends over time and space while the "real object" is a
dynamic space-time structure. Assembling data always implies some
kind of filtering (e.g. Haury et al. 1979), which represents a
loss of information. Noise may be introduced at some levels,
while at other levels additional information may be introduced
that is not always well understood (or precisely formulated).
Some of the difficulties of sampling strategies have been
discussed in Frontier (ed. 1982, 1986).
Figure 1 tries to sketch the way in which the pelagic
ecosystem is studied and indicates where and how field data
arise. Based on the same figure, we shall set out the main questions
that ecologists try to answer. Many of these questions are

[Figure 1: a box diagram. Physical ("continuous") and biological
("discrete") processes, together with external information, shape a
dissipative structure, the REAL OBJECT: the space-time distribution
of physical and chemical properties and of organisms. The real
sampling procedure turns this object into SAMPLES and, through
measurement, into DATA.]

Fig. 1. Scheme of the way in which data are obtained from the
environment.

concerned with the physical and biological processes and their
interactions. Partial answers to these questions help us to
understand mechanisms which, put together and aided by theory and
reasoning, help us to understand the nature of the ecological
dissipative structures. When the system under study is driven by
a powerful and well-known physical process, the ecosystem can be
fairly well understood as it usually becomes relatively simple in
its main trends. In general, this is not the case and one is
often forced to aim at a pure description of a given "real
object", usually arbitrarily limited in space and time. Sampling
can be directed to the evaluation of populations. The
understanding of spatial distributions or of temporal evolution is
also a common objective of the sampling procedures, but these can

[Figure 2: the real object, a temporal process and a spatial
structure, is sampled sequentially (different sites at different
moments), so the measurements of the variables fill only part of
the sites x variables x time array.]

Fig. 2. Sequential sampling produces an incomplete data matrix
(sites x variables x time) with many empty cells (combinations
with no data).

seldom overcome the problem of long distances and time-consuming
measurements. Simultaneous sampling at many places can rarely be
achieved (unless we think of pictures taken from aircraft or
satellites, which can resolve only surface features of the
aquatic system). Figure 2 illustrates these problems as the data
matrix (sites x variables x time) has many empty cells. However,
we often assume that our samples are simultaneous for the
purposes of statistical analysis, model application or model
construction. We also assume that our system is stationary, at
least within the space and time scales considered. In this way,
the general matrix can be reduced to a time-fixed or space-fixed
bidimensional one by using an appropriate sampling strategy, well
tuned to space or time variability.
The simplest and most common sets of data are thus
represented by a two-way table whose rows represent samples and
the columns the measured variables. Often, the variables are
quantities (numbers, weights, concentrations, ...) of taxonomic
units or chemical substances. They can also be physical
measurements like temperature or water density. Each row of data
(a vector whose components are the values for the different
variables) is a descriptor of an object, usually a sample (or
subsample) identified as a "point" in a time-space frame.
Most of the techniques or methods presented during the
present NATO ARW on Numerical Ecology are devoted to the analysis
of this kind of data matrix, and we shall discuss their specific
applicability in the case of pelagic ecosystem data. We shall
comment, too, on some of the dangers and misuses in the
application of some mathematical analytical tools.

STATISTICAL AND NON-STATISTICAL INFERENCE

Historically, all that we know about pelagic communities
comes from the more or less sophisticated analysis of data
obtained from samples (field and laboratory experiments). In
fact, the analysis of field data gives information which is used
to build up a conceptual model of nature. The model is used
afterwards to design further sampling, and so on. The first days

of pelagic ecology have passed, and so we find ourselves planning
our sampling with some more or less well-defined ecological model
in mind. Sometimes, our mental picture is by no means simple and
we ask our data to give answers to very difficult questions.
A pelagic community shows a great variability, both in time
and space. Water masses or water bodies are often responsible at
several scales for the distribution of the different communities
we can find at sea. When water circulation or water structure can
be tracked at a given space scale, then the communities can
usually be defined too at the same space scale. However, even the
structure of water is not always clear. Moreover, organisms
dwelling in pelagic waters modify themselves according to the
characteristics of the distribution imposed by the physical
medium. Random fluctuations can develop into different
compositions of communities at several space and time scales,
thus complicating the final structure of the pelagic ecosystem.
In this context, sampling is unavoidable and essential, and
we can design many different sampling devices as well as formal
sampling strategies (e.g. at random, regular grid, Lagrangian,
continuous or high frequency recording on transects, ... ). In all
the cases, one of the basic problems is to assess what the
different measurements represent. Usually the scientist has in
mind the idea that his numbers estimate means, but he is by no
means able to declare the time or space interval for which the
estimated means are valid. Moreover, the ecologist often wants to
obtain results revealing some general laws and not a mere
description of the existing samples. Inferences are thus almost
systematically attempted but they cannot necessarily be put in
the frame of probabilistic statistics. However, when possible, it
is more satisfactory to refer to probabilistic statistics, which
makes it possible to appreciate the reliability of the
extrapolations of the observed properties of the data fitted to a
general law.
Figure 3 is parallel to Figure 1. It shows how a formalism
should be worked in order to allow statistical inferences. A
causal or conceptual model can be the foundation of a physical
and lor biological model of the ecological structure. If a formal
distri bution in time and space is taken as an hypothesis and a

formal sampling procedure is chosen, the expected probabilistic
and statistical structure of the data (measures on samples) can be

[Figure 3: a formal counterpart of Figure 1. A causal model
specifies a physical plus biological structure with probabilistic
properties; a formal sampling function transforms it into a model
of the data with statistical and probabilistic properties, on
which the statistical analysis then operates.]

Fig. 3. Formalization of the different steps in the process of
obtaining data allows statistical inference. Formalization can
mix deterministic and stochastic processes or functions.

stated. Then, if sampling is correctly performed in practice, it
should be possible to estimate the properties of the underlying
stochastic model, with confidence intervals, and so on. Because
statistical inferences are possible only if a well defined
probabilistic model, combining deterministic and random
components, has been defined, and because modeling and
formalization are not always possible, many of the (implicit or
explicit) inferences that ecologists make are not strictly
statistical (which does not mean that they are not wise, correct
or relevant). However, even if there is no formal model for the
ecosystem, nor a well formalized sampling procedure, the
ecologist gets a matrix of data, which is a most valuable and
unique piece of information on his ecosystem. The ecologist uses
these data to estimate biomasses, to define communities, to
discover and describe relationships among variables and to
analyse and draw geographical and temporal distributions about
them. Many statistical methods or models are used to extract the
information contained in the matrix of data. Some of the methods
assume specific probabilistic distributions for variables
(allowing valid statistical inferences).
In statistical inference, it is considered that a sample
comes from a population (of objects, defined by the way of
sampling, and of values, on which the statistical calculations
are performed). The population of possible values is known when
the population of objects is defined and the measured variables
have been chosen. In practice, although it is not necessary for a
population to have precise space-time limits, a sample is always
considered as representative of something. It is known that the
repetition of the measurement within a short time and space
interval shows a variability (nugget effect). Then, in a simple
way, the deduction of the properties attributed to a set bigger
than a sample from the characteristics found in the sample, can
be considered as inference, which is statistical only if it
follows the statistical rules stated by mathematical theory.
In general, it is not possible to assume that our data come
from an underlying continuous stochastic multivariate process
that has been sampled systematically. A common practice is to
associate each observed site or sample with a given block
(limited in space and time). Then, the observed values are
considered as estimations of unknown mean values (averages over
space and time within the block). In this way, the deterministic
component of a model is associated with the interblock variations
while random components of the stochastic model are related to
intrablock variability. In such a model, the distribution of the
observed values in the block should be known or estimated.
However, in pelagic studies (e.g. planktonic) the blocks have
little, if any, practical meaning. Too often, a single point is
available within a block. When several samples are available
within a block, they are seldom distributed in space and time
(they are often aggregated in time). So, they will generally
underestimate the "real" variances and covariances, unless the
block size is "small" enough and the so-called "nugget" effects
are predominant. In practice, the ecological boundaries for
blocks are fuzzy and we suspect that the system is strongly
structured (whichever space-time scale is considered). Quite
often, the analysis of data is primarily aimed at finding those
"block boundaries".
In conclusion, we would make little progress in pelagic
ecology if we had to wait for those conditions allowing real

statistical inference (with all their associated mathematical
securities) to be fulfilled. Currently, many ecologists are
moving away from statistical inferential methods. They content
themselves with analysing signals that come from the sampling of
a process. Although the approximation is not statistical, it uses
statistical operators like means, variances, covariances and
frequency distributions. Similarly, although multivariate
analytical methods make use of statistical operators, they are
not necessarily inferential in the statistical sense, but they are
used to describe the structures of the samples and of the variables,
and to infer (epistemologically) the properties of the sampled
medium.

SCALING (ORDINATION) TECHNIQUES

Some scaling techniques have been widely used in the
analysis of ecological data (Principal Component Analysis: PCA;
Correspondence Analysis: CA; Principal Coordinates Analysis; ...).
Most of these techniques, if not all, have been borrowed from
other fields of research. They have not been designed
specifically to extract information from ecological data sets.
Their application to ecology (at least in the study of pelagic
communities) has been useful because they help to reduce the
dimensionality of the data, as inertia methods concentrate the
total variation within the first few components (axes or
dimensions). These methods offer the possibility of drawing
projections of the data into Euclidean planes. The distances
calculated on the data are well understood by the human brain
when they are visualized in a Euclidean space.
From another point of view, it is important to know what can
be asked of a given scaling technique. Often, ecologists ask
mathematicians to apply some of these techniques to a given set
of data not to get an answer but to get a question. Sometimes,
ecologists have tried to identify and to test a model in a single
step, while scaling techniques only use dimension reduction in
the hope of better understanding the data. No causal relationships
can be extracted from these analyses. We could say that we can

infer or generate a hypothesis from what we learn, and then act
in consequence. As we have already seen, the use of the word
"inference" may be misleading. Dealing with pelagic ecosystems,
the scientist does not attempt to make statistical inferences
from multivariate analysis but is content to learn from
ordination something about variables or samples. Sometimes, the
result is an outline to be followed in order to "interrogate"
the original data with skill. Sometimes, learning means
replacing old ideas by new ones that seem more precise or more in
accordance with the results: the extracted information encourages
the ecologist to modify previous models. From this point a
recursive process is usually followed.
Among the scaling techniques so far developed, two seem
especially interesting for the analysis of pelagic communities.
Multiple correspondence analysis offers the possibility of
handling qualitative variables and quantitative variables with
non-monotonic behaviour. Discrete sampling of a very complicated
system (where non-linear processes take place) results in
relationships between pairs of variables that are not only non-
linear but may be even discontinuous. With multiple
correspondence analysis the ecologist can analyse these variables
by "expanding" them into several nominal classes which need not
necessarily be disjoint (overlapping sets are allowed). Changing
the coding of variables (the class limits) in successive analyses
might be fundamental in elucidating the critical levels or values
of a given variable. However, the method has also a disadvantage
because we lose ordinal information when we code quantitative
measures into several nominal classes.
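The coding step can be sketched as follows; the variable and the class limits are invented for illustration, and the middle class deliberately overlaps its neighbours, as allowed in this form of coding.

```python
# hypothetical, overlapping class limits for one quantitative variable
CLASSES = [("low", 0.0, 1.0), ("medium", 0.5, 3.0), ("high", 2.0, 10.0)]

def code(value, classes=CLASSES):
    """Expand one quantitative value into 0/1 class indicators;
    classes may overlap, so several indicators can be 1 at once."""
    return [1 if lo <= value < hi else 0 for (_, lo, hi) in classes]

def code_table(values, classes=CLASSES):
    """Indicator table (one row per sample), ready for a multiple
    correspondence analysis."""
    return [code(v, classes) for v in values]
```

Here `code(0.7)` yields `[1, 1, 0]`: the value belongs to both the low and the medium class. Changing `CLASSES` between successive analyses is the recoding of class limits discussed above.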
The second technique which seems potentially useful is
Generalized Procrustes Analysis (GPA) (Gower, this volume)
because it allows us to compare different metrics used in the
construction of the distance or similarity matrix. This technique
seems also well suited to analyse different sets of data
belonging to the same sites (biological data and oceanographic
data, for example) helping to detect overall relationships among
them. Also it could be used to discriminate between relevant and
irrelevant variables.
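The core of the Procrustes idea, matching one configuration to another by translation and rotation, can be sketched for two planar configurations. GPA extends this to several configurations and usually adds scaling and reflection; the coordinates below are invented, and the closed-form rotation angle is the standard two-dimensional one.

```python
import math

def procrustes_2d(X, Y):
    """Least-squares rotation (after centring) of configuration X
    onto configuration Y; returns the angle and the residual."""
    n = len(X)
    cx = (sum(p[0] for p in X) / n, sum(p[1] for p in X) / n)
    cy = (sum(p[0] for p in Y) / n, sum(p[1] for p in Y) / n)
    Xc = [(p[0] - cx[0], p[1] - cx[1]) for p in X]
    Yc = [(p[0] - cy[0], p[1] - cy[1]) for p in Y]
    dot = sum(x[0] * y[0] + x[1] * y[1] for x, y in zip(Xc, Yc))
    cross = sum(x[0] * y[1] - x[1] * y[0] for x, y in zip(Xc, Yc))
    theta = math.atan2(cross, dot)              # optimal rotation angle
    ct, st = math.cos(theta), math.sin(theta)
    fit = [(ct * x - st * y, st * x + ct * y) for x, y in Xc]
    resid = sum((f[0] - y[0]) ** 2 + (f[1] - y[1]) ** 2
                for f, y in zip(fit, Yc))
    return theta, resid

# second configuration = first one rotated by 0.5 rad and shifted
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
Y = [(math.cos(0.5) * x - math.sin(0.5) * y + 2.0,
      math.sin(0.5) * x + math.cos(0.5) * y - 1.0) for x, y in X]
theta, resid = procrustes_2d(X, Y)   # recovers the 0.5 rad rotation exactly
```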
Another possibility might be to combine a Procrustes analysis
with multiple correspondence analysis. We could compare in this
way "classical" analysis (PCA, CA, non-metric multidimensional
scaling, ... ) and a multiple correspondence analysis performed on
the same original set of data but in coded form. No matching
should mean that something "interesting" has happened during the
codification of the variables. In the interpretation it is
important to remember that in GPA, the matching being "tested"
is among distances or similarities, not among more basic data.
One approach to a form of GPA is to use INDSCAL (see
Carroll, this volume) with distances calculated from two or more
multivariate solutions (from CA, PCA, two-way MDS ... ) as a
General Procrustes approach. This would result in a "group" space
containing the units of observation and a "source" space in which
the various multivariate data analyses would be represented as
points corresponding to the weights of the common dimensions or
factors in the group object (observational units) space for each
of the different analyses.
Most scaling techniques take into account neither the
temporal nor the spatial order of samples (which are basic
features of pelagic communities). Spatial and temporal proximity
cause "autocorrelation", a recognized problem which we
think we understand and are willing to analyse. Scaling
techniques can be applied separately to several sets of parallel
information (e.g. physical, biological and spatial) to relate the
results in an a posteriori analysis (e.g. Ibanez 1973, 1976).
Perhaps multiway tables of data could help us to treat this
problem, but other mathematical tools are usually preferred to
deal with it, such as time series analysis, multiple regression, and
so on.
Multiway tables (Carroll, this volume) are difficult to fill
with observations in pelagic ecology. Because of the unavoidable
sequential sampling, too many combinations of spatial and
temporal variates have to remain empty. However, it seems a wise
way of dealing with long-term sets of multivariate data embracing
very wide areas. In this sense, nested data (filtered for high
frequency fluctuations) could be organized into multiway tables
and explored for long-term trends or structures (to study
biogeography, evolution or historical oceanography).

FURTHER COMMENTS ON SCALING TECHNIQUES

The duality diagram (Escoufier, this volume) seems of most
interest to those statisticians whose aim is to unify theory to
aid the construction of general purpose software packages. On the
other hand, its understanding can be useful to ecologists as
users of these techniques because it gives a generalized or
unified exposition of methods and possibilities often presented
separately. An ecologist who uses scaling techniques must be
aware that he or she always loses information at the very first
step of the analysis, when the data matrix is transformed into a
distance or similarity matrix. The choice of a specific metric is
not without a price.
In pelagic community analysis, it is uncommon to weight the
variables and indeed most programs give equal weight to all the
variables used. The ecologist dealing with pelagic communities
has usually no reliable information to guide the possible
weighting of variables. In fact, he or she is often looking for
this kind of information when applying a scaling technique.
Non-linear relationships are to be expected among variables.
We have seen that multiple correspondence analysis could help
detect these situations. Non-linear multivariate analysis with
optimal scaling (de Leeuw, this volume) could be applied to
detect non-linear relationships. It probably could be used as an
"experimental treatment" of data, for the limits of classes are to
be chosen as an experimental procedure. It seems to fit quite
well the needs of ecologists, who know that relationships
between variables are often non-monotonic as well as non-linear.
In any case, a general recommendation to users of scaling
techniques is to make clear, when publishing a paper, what
specific model and scaling techniques they are using. For
example, in Correspondence analysis, it is very important to
state clearly whether the barycentric principle is applicable to
variables and/or to samples (or to neither, which is an
alternative in certain forms of Correspondence analysis). The
scales must be precisely stated. In general, if the mathematical
part of the study were well described, the discussions among
ecologists would less often result in misunderstandings and would
be much more fruitful.

Biplots and related problems

In ecology, at least in pelagic ecosystem research, we often
measure things that are ill-defined. We give them a name but we
must always clarify the method to make sure that people
understand what has been done (e.g. there are many methods that
"measure" primary production). The measurement of several
variables at different times and/or sites is how we try to
understand their meaning or role in the ecosystem. We try to
describe the samples by the variables but we also try to
understand the variables through the values they take at
different sites and times. We are seldom in the position of
defining without doubt the sites by the variables; more often we
see the whole as a real multiway entity. This is the reason why
techniques like Correspondence analysis (or reciprocal averaging)
have been so widely used. Unfolding techniques such as those presented
by Heiser (this volume) or those discussed by Carroll (1972,
1980) might be used for the same purpose. All these techniques
offer the possibility of putting together, in the same picture,
the samples and the variables; in the case of unfolding, through
the definition of a distance between every sample-variable pair.
Ecologists should pay special attention to the correct
interpretation of biplot representations (variables and samples
on the same axes). A clear difference between biplot
representation and dual representation has to be made. A biplot
of sites and variables can be used for several purposes.
Proximities between sites illustrate the structure of the sets of
sites and proximities between variables illustrate the structure
of variables. When a simple biplot is used, proximities between
sites and variables can suggest either that a species appears
only in a limited number of sites, or that a site contains only a
limited number of species (thus being characterized by their
presence). The two possibilities are ecologically very different.
It has to be remembered that unfolding techniques can help
with the joint representation and interpretation of sites and
variables, and that it is possible to do a CA in such a way that
all the distances among categories of the same variable and those
of two or more different variables are simultaneously meaningful
(Carroll et al., in press).

Horseshoes

Contiguity of samples in space or time is detected in the
graphical representations by a distribution of samples (or sites)
that looks like a horseshoe. A similar pattern is seen in the
distribution of variables when the different measures are not
independent, as is the case when applying scaling techniques to
granulometric data or variables coming from the discretization of
continuous measurements (Flos 1978). In these cases one has to
understand that the axes are not independent (they are non-
linearly dependent, Fig. 4). One must be especially careful in the
interpretation of the third axis, although it is never
recommended to reify a dimension as this may oversimplify the
results (e.g. eutrophication, light, time, ...). The dimensions
are usually the result of a combination of several factors. When
a snake-like distribution of points is found in the plane which
includes the third axis, it is almost certain that this is a
combination of the two previous ones. In other cases the strong
Guttman or horseshoe effect in the first plane makes it difficult
or impossible to understand the third dimension. Then the samples
at the two ends of a horseshoe might be removed and the analysis
repeated so as to allow a more thorough interpretation of data.
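The non-linear dependence between axes can be seen on a caricatural example (data invented): sites regularly spaced along a gradient x, with a second variable lying exactly on the curve x²/4. By symmetry the two variables are uncorrelated, so the principal axes coincide with the original axes, and the second component is then an exact quadratic function of the first; with real data the same dependence shows up as the arch.

```python
# sites regularly spaced along a hypothetical gradient
xs = [-3.0, -1.5, 0.0, 1.5, 3.0]
ys = [x * x / 4.0 for x in xs]        # second variable: a curve, not noise

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
# cov_xy is zero by symmetry, and var(x) > var(y), so the PCA axes are
# the original axes: PC1 scores are centred x, PC2 scores centred y
pc1 = [x - mx for x in xs]
pc2 = [y - my for y in ys]
# the second axis is an exact function of the first -- the "horseshoe"
arch = [p * p / 4.0 - my for p in pc1]
```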
The high correlations or similarities found between sites or
samples, due to their proximity or overlapping, absorb much of
their variability. It is information whose source is already well
understood and we often would like to eliminate it. This is not
always possible but it represents only a minor problem if we are
aware of it. One of the easiest ways of proceeding is to repeat
the analysis with subsets of samples. In Figure 5 we can see the
vertical depth distribution of scores for the two first principal
components of an analysis of oceanographic data (temperature,
salinity, oxygen and nutrients) obtained in the Mediterranean
during the stratified period. The first component reflects the
variability due to the distribution of warmer and less saline
water on top of a denser water. The second component reflects the
existence of subsurface relative oxygen and/or nitrite maxima.

[Figure 4: variables plotted in the PC1-PC3 and PC1-PC4 planes; see caption below.]

Fig. 4. Distribution of 16 variables in the space of the four
first factors of PCA. Variables 1 to 14 (united by a line)
correspond to particle volumes in the Coulter Counter channels
(the mean volume of the characteristic particle in each channel
is half the volume of the characteristic particle in the
following, higher order, channel). Variables 15 and 16 are the
total number of particles and the chlorophyll, respectively. From
Flos (1976).

The two axes reflect almost banal information but the analysis
reduces dimensionality from 8 hydrographic variables to 2
components, which is useful for geographical description of the
hydrographic situation. To extract more information, three groups
of samples should be analysed separately (surface layer, samples
on the middle gradients and samples from the bottom layers).
Quite often, ecologists have available additional
information on the samples which does not enter the multivariate

[Figure 5: vertical profiles of the component scores at stations E 12 and E 13; see caption below.]

Fig. 5. Distribution with depth of the first (C1) and second (C2)
components resulting from a PCA of hydrographic data at two
stations (E 12 and E 13) in the Mediterranean in October 1976
(Flos 1980a). The depth scale is logarithmic.

analysis. This information (on geographical situation, time of
the year, depth or tide, and so on) is reflected in the analyses
which then corroborate something that may already be known.
However, after an initial general analysis, it may be possible to
reassemble samples into subsets so as to uncover the hidden
information. For example, in the analysis of zooplankton from an
estuary, a first analysis can reflect the seasonal variability. A
second one, restricted to a subset of data obtained in July, can
reflect different geographical situations in the estuary as well
as depth variability. In this case, it is possible to understand
and describe the relationship between depth and geographical
situation which depends on tidal movements (Fig. 6).

[Figure 6: samples plotted in the PC1-PC2 plane at 0, 3 and 8 m depth for stations 5 and 2; see caption below.]

Fig. 6. Distribution of samples in the first plane of a PCA made
on zooplankton data from the estuary of the Nervión river
(Bilbao, Spain) in July. Only the samples for two stations (5 and
2) are represented. The lines linking the samples indicate the
temporal relationship in accordance with the tide (B is low
water). At the beginning of the sampling (simultaneous in stations
2 and 5) the tide is high, and low tide is approximately in the
middle of the set of samples. From B the tide is rising. a)
Surface samples; b) 3 m depth samples; c) 8 m depth samples (Flos
(ed) 1985).

CLUSTERING TECHNIQUES

It can be seen in the literature that many clustering
techniques are used to analyse data in pelagic ecosystem studies.
Our aim here is to make some recommendations for the general
application of classical clustering algorithms and to comment on
the applicability and possibilities of those techniques presented
in this book.
Clustering methods are used to identify either species
associations or groups of similar samples. The first approach is
often used to describe the communities while the second is more
related to their mapping. In fact, there exists a definition of a
community formulated from a theoretical ecological point of view
(=mixed population of different species living in a continuous
space which is limited in a conventional way, Margalef 1974) but
in practice, the typification or classification of communities
is obtained by applying scaling or clustering
techniques to census data.
Before using any clustering procedure it is important to
think of the shape of the clusters we expect to find, as the
different clustering algorithms, implicitly or explicitly, favour
different cluster shapes in the multidimensional space. Single
linkage techniques build clusters that can be elongated. Complete
linkage algorithms, on the contrary, search for spherical clusters,
implying that within them, the dissimilarity between points
cannot exceed a given level. Ecologists should be aware of the
ecological meaning of the "implicit" shape of clusters. Complete
linkage clustering algorithms applied to sites will thus search
for the existence of homogeneous groups of sites. This can be
related to the well-known but controversial theory assuming that
most, if not all, species compositions derive from a limited
number of "ecotypes" (located at the "centers" of the clusters).
On the other hand, single linkage algorithms do not imply such a
strong hypothesis. Clusters may be heterogeneous since there may
be gradients linking extremes by intermediate points, to form a
single cluster. On the other hand, the clustering may show that
discontinuities separate the different classes.
Algorithms that can be considered intermediate between
complete and single linkage are also related to the shape of the
clusters being pursued. Among those techniques, k-linkage
algorithms deserve further consideration since they retain the
continuity/discontinuity point of view (that of single linkage)
but reduce chaining.
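The contrast between the two linkage rules can be made concrete with invented one-dimensional data: a chain of equally spaced samples plus one sample slightly apart. Single linkage keeps the elongated chain whole and isolates the extra sample; complete linkage, seeking compact groups, cuts the chain and attaches the extra sample to its nearest neighbours.

```python
def agglomerate(points, linkdist, k):
    """Naive agglomerative clustering of 1-D points down to k groups."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best, pair = None, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkdist(clusters[i], clusters[j])
                if best is None or d < best:
                    best, pair = d, (i, j)
        merged = clusters[pair[0]] + clusters[pair[1]]
        clusters = [c for m, c in enumerate(clusters) if m not in pair]
        clusters.append(merged)
    return [sorted(c) for c in clusters]

def single_link(a, b):     # allows elongated clusters
    return min(abs(x - y) for x in a for y in b)

def complete_link(a, b):   # bounds the within-cluster dissimilarity
    return max(abs(x - y) for x in a for y in b)

pts = [0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 12.]
single = agglomerate(pts, single_link, 2)
complete = agglomerate(pts, complete_link, 2)
```

Here `single` returns the full chain {0, ..., 9} plus the singleton {12}, while `complete` groups 12 with 8 and 9 and splits the chain instead.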
Anyway, clustering is often tried even when there is no
previous reason for suspecting distinct groups either of samples
or of variables. As with ordination, clustering techniques are
used routinely to explore data. Often the exploration helps the
design of further sampling strategies (e.g. stratified sampling).
In this case (as in others) it is important to remember that
there is generally no reason to expect a clear relationship
between groups of species and of sites or samples.
In pelagic ecosystems, data analysis is commonly used to
identify discontinuities rather than associations. If this is so,
the interpretation of the results (e.g. by dendrograms) differs
from other cases.
One of the problems with pelagic ecosystem data sets is that
often no well-defined clusters appear. This is usually the case
when there is no clear driving force governing the system. The
sampled gradients seldom result in clearcut boundaries (whether
spatial, temporal or abstract). Possible clusters are then ill-
defined or fuzzy. Uncertainty may come from inexact measures,
random occurrences or difficulties in defining precisely the
meaning of some aspect (Bezdek, this volume). In some cases a PCA
applied before the cluster analysis might help to get rid of
undesirable data noise (Flos 1976; Laurec 1979; Flos 1982).
However, there are basic reasons for expecting fuzzy sets of data
in oceanography. This is why techniques similar to those
presented by Bezdek (this book) might be appropriate for pelagic
community analysis. Another approach that could well be used in
ecology is that of the overlapping clustering methods (MAPCLUS, INDCLUS)
of Shepard and Arabie (1979), Arabie and Carroll (1980) and
Carroll and Arabie (1983).
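As a sketch of the fuzzy point of view, here is a bare-bones fuzzy c-means loop (a generic textbook version on invented one-dimensional data, not the algorithm of any particular chapter): each sample receives a membership grade in every cluster instead of a hard label.

```python
def fuzzy_c_means(data, centers, m=2.0, iters=25):
    """Fuzzy c-means for 1-D data; m > 1 controls the fuzziness."""
    U = []
    for _ in range(iters):
        U = []
        for x in data:
            d = [abs(x - c) for c in centers]
            if any(di == 0.0 for di in d):        # sample sits on a centre
                U.append([1.0 if di == 0.0 else 0.0 for di in d])
                continue
            w = [di ** (-2.0 / (m - 1.0)) for di in d]
            s = sum(w)
            U.append([wi / s for wi in w])
        # update centres as membership-weighted means
        centers = [sum((U[i][k] ** m) * data[i] for i in range(len(data)))
                   / sum(U[i][k] ** m for i in range(len(data)))
                   for k in range(len(centers))]
    return centers, U

data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]             # two small groups
centers, U = fuzzy_c_means(data, centers=[0.0, 5.2])
```

The centres converge near 0.1 and 5.1; the interest of the method lies in the membership grades U, which stay graded rather than 0/1 when the groups are less well separated than here.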

Clustering with constraints

Sequential sampling is quite unavoidable in biological
oceanography. So, it is clear that most samples are not connected
in space or time, unless they are contiguous. Between each pair
of samples there are usually many other samples. Space and time
are two kinds of constraints that can easily be introduced into
clustering procedures (Legendre, this volume). Although in some
cases it may seem that these constraints are not adequate, there
is always the possibility of discussing the results after the
analysis. Time constraints seem especially appropriate when
different stages of succession of a community are suspected. In
this case, a typical clustering technique could give a unique
group with a meaningless centroid, while clustering in time or
space might separate several meaningful groups.
Constraints other than temporal or spatial have to be
considered. Two possible steps have been proposed. One is the
previous analysis of autocorrelation among variables which could
help to forewarn of a possible physical or biological constraint.
The other is an abstract spatial constraint performed on a
factorial plane obtained through some ordination technique. Both
possibilities should be investigated.
Some constraints can easily be introduced by means of a
"penalty" matrix with weights (some being possibly zero) to be
associated with the matrix of similarities.
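One reading of the penalty idea, sketched on invented data: give each pair of samples a weight, with a zero weight (here, non-adjacency in time) forbidding the merge altogether. The constrained run can only produce clusters that are contiguous segments of the series, while the unconstrained run groups the similar samples across the aberrant one.

```python
def constrained_single_linkage(values, k, allowed):
    """Single-linkage clustering of sample indices; a merge is
    permitted only where `allowed` (a non-zero penalty weight) holds."""
    clusters = [[i] for i in range(len(values))]
    while len(clusters) > k:
        best, pair = None, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                if not allowed(clusters[a], clusters[b]):
                    continue                      # zero weight: merge forbidden
                d = min(abs(values[i] - values[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best:
                    best, pair = d, (a, b)
        if pair is None:                          # no permitted merge remains
            break
        merged = sorted(clusters[pair[0]] + clusters[pair[1]])
        clusters = [c for m, c in enumerate(clusters) if m not in pair]
        clusters.append(merged)
    return sorted(clusters)

def time_adjacent(ca, cb):
    return any(abs(i - j) == 1 for i in ca for j in cb)

values = [1.0, 1.1, 9.0, 1.05, 1.15]   # a time series with one aberrant sample
free = constrained_single_linkage(values, 2, lambda a, b: True)
tied = constrained_single_linkage(values, 2, time_adjacent)
```

`free` isolates the aberrant sample ({2} against all the others), whereas `tied` returns the contiguous segments {0, 1} and {2, 3, 4}.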

Clustering without almost any constraint

The method presented by Lefkovitch (this volume) seems
especially well suited to look for communities or species
clusters. It is one method with few assumptions and little
manipulation of data. Groups found may not have the same internal
variance, nor the same shape or density. The method preserves all
(and uses only) the information contained in the original data.
It is a system that is clearly aimed at simplifying the data
without destroying the information. Besides its utility in
defining natural communities, it can help to generate hypotheses.
There are examples of other useful clustering techniques
that operate directly on the original data, rather than on
similarity or dissimilarity indices derived from that data (e.g.
Hartigan 1973).

FRACTALS

Ecologists have been trying to describe, to model, to
simulate and to classify ecosystems for a long time. These
approaches seldom go beyond a half-way point and are not totally
satisfying, although many partial mechanisms are now quite well
understood (see for example Ulanowicz and Platt 1985).
Theoretical ecologists are quite active and move very much in an
intuitive world, being open to all the new theories that appear
from time to time wi thin the scientific community. This open
attitude is justified, for the classical approaches to ecosystem
analysis have proved rather limited, even with the use of modern
computers. Among the theories that are presently inspiring
theoretical ecology we quote the thermodynamics of systems far
from equilibrium (Nicolis and Prigogine 1977; Allen 1985), the
theory of adaptability (Conrad 1983, 1985), that of fractals
(Mandelbrot 1977, 1982) or that of catastrophes (Thom 1972;
Wilson 1981). All of them are developing theories that inspire,
and are inspired by, ecology. Other concepts like "attractors" or
"fuzzy" sets or relationships are also popular.
Fractal theory has brought a striking approach to the
geometry of nature. It suggests not only a fractional occupation
of space-time, which models nature better than classical geometry
does, but also that the web of interactions follows similar laws.
Ecologists had suspected for a long time that interactions and
functions were intimately related to space-time patterns and that
emergent properties of systems were much related to topological
properties (Margalef 1979, 1985).
In the present context, fractal theory seems more
interesting for the development of theoretical ecology than as a
primary tool to analyse data. However, it can be used to give a
formal shape to some ecological models (Frontier 1985), thus
helping to design optimal sampling strategies (Frontier 1982,
1986). Specific and partial applications of fractals to ecology
have been made (see Frontier and Legendre 1986, Frontier this
volume and literature cited in both).
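As one concrete entry point, the box-counting dimension, the slope of log N(ε) against log(1/ε), can be computed on a standard mathematical example: the triadic Cantor set, whose dimension is log 2 / log 3 ≈ 0.63 (nothing ecological is implied by these data; the two function names are ours).

```python
import math

def cantor_points(depth):
    """Integer codes of the left endpoints of the level-`depth`
    intervals of the middle-thirds Cantor set, in units of 3**-depth."""
    pts = [0]
    for _ in range(depth):
        pts = [3 * p for p in pts] + [3 * p + 2 for p in pts]
    return pts

def box_dimension(pts, depth, scales):
    """Least-squares slope of log N(eps) versus log(1/eps)."""
    xs, ys = [], []
    for k in scales:                      # box size eps = 3**-k
        boxes = {p // 3 ** (depth - k) for p in pts}
        xs.append(k * math.log(3.0))      # log(1/eps)
        ys.append(math.log(len(boxes)))   # log N(eps)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

pts = cantor_points(10)
dim = box_dimension(pts, 10, scales=range(1, 7))
# dim is close to log 2 / log 3 = 0.6309...
```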
We think that the possibility of using fractal dimensions as
a holistic measure in the ecosystem, the problem of matching
different fractals, their interpretation, and the study of
interruptions of fractals (catastrophes at different scales)
should be investigated.

PATH ANALYSIS

Non-linear path analysis with optimal scaling (de Leeuw,
this volume) can be applied to pelagic ecosystem studies, but may
be of help only in the study of simple models (no loops for
example). Ecologists are often more interested in models with
sets of differential equations or in simulation. However, usually
our data do not allow precise modelling and then, the path
analysis method may help identify the principal features of an
ecological model. The approach could help to discipline oneself
(or to direct thinking) before any analysis.
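For a loop-free model with standardized variables, the mechanics reduce to solving the normal equations. A toy three-variable chain with invented correlations (x1 affecting x2, and both affecting x3) illustrates the decomposition of a total correlation into a direct and an indirect effect:

```python
def path_coefficients(r12, r13, r23):
    """Standardized path coefficients for the loop-free model
    x1 -> x2, x1 -> x3, x2 -> x3."""
    p21 = r12                                   # only path into x2
    # normal equations for x3:  r13 = p31 + p32 * r12
    #                           r23 = p31 * r12 + p32
    det = 1.0 - r12 * r12
    p31 = (r13 - r12 * r23) / det
    p32 = (r23 - r12 * r13) / det
    return p21, p31, p32

# invented correlations among three standardized variables
p21, p31, p32 = path_coefficients(0.80, 0.70, 0.74)
direct = p31                  # direct effect of x1 on x3
indirect = p21 * p32          # effect of x1 on x3 routed through x2
# direct + indirect reproduces r13: 0.30 + 0.80 * 0.50 = 0.70
```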

SPATIAL ANALYSIS

Two different approaches were presented during the ARW
(Ripley and Sokal, this volume). Common pelagic data do not match
point pattern analysis (Ripley, this volume). However, we think
that in the study of pelagic fish or mammal distributions, the
method could be useful and we appreciate its general interest in
the description of non-random distributions.
Concerning the spatial analysis methods presented by Sokal
(this volume) we find a special interest in the computation of
the weight matrix for the Mantel correlogram, as it can derive
from ecological models as well as from statements about the
purpose of the analysis.
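The statistic itself is simple: a correlation between the corresponding off-diagonal entries of two distance (or weight) matrices, with significance assessed by permuting the rows and columns of one matrix together. A minimal sketch on invented transect data (the permutation count and seed are arbitrary choices):

```python
import math, random

def mantel(D1, D2):
    """Pearson correlation between corresponding off-diagonal
    elements of two symmetric distance matrices."""
    n = len(D1)
    a = [D1[i][j] for i in range(n) for j in range(i + 1, n)]
    b = [D2[i][j] for i in range(n) for j in range(i + 1, n)]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a)
                    * sum((y - mb) ** 2 for y in b))
    return num / den

def mantel_test(D1, D2, n_perm=199, seed=1):
    """Permutation test: rows and columns of D2 permuted together."""
    rng = random.Random(seed)
    n = len(D1)
    observed = mantel(D1, D2)
    hits = 1                                   # count the observed value itself
    for _ in range(n_perm):
        p = list(range(n))
        rng.shuffle(p)
        Dp = [[D2[p[i]][p[j]] for j in range(n)] for i in range(n)]
        if mantel(D1, Dp) >= observed:
            hits += 1
    return observed, hits / (n_perm + 1)

# invented data: positions of six samples along a transect, and a
# variable that simply tracks position
pos = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
var = [2.0 * p for p in pos]
D_space = [[abs(a - b) for b in pos] for a in pos]
D_var = [[abs(a - b) for b in var] for a in var]
r, p_value = mantel_test(D_space, D_var)
```

The weight matrix discussed above would replace `D_space`, encoding whichever model-based notion of proximity the analysis is meant to probe.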
In either case, we think it worth commenting that testing
the null hypothesis is rarely worthwhile (though it is the only
hypothesis that can be tested). It must be remembered that often
significant differences are not important, while non-significant
ones are.

CONCLUSIONS

Time is needed to accumulate many more practical examples of
applications of the techniques presented during the ARW on
Numerical Ecology. At first sight we recognize that some
approaches can be readily understood by ecologists and are
immediately applicable to pelagic ecology (e.g. clustering with
constraints or Procrustes analysis and multiple correspondence
analysis). These are interesting techniques or methods, clearly
adapted to deal with some common problems in practical pelagic
ecology.
Other methods (like point pattern analysis) seem less
adapted to pelagic ecology (at least not with most existing
data). Finally, other techniques or topics like fractals or path
analysis, are seen as potentially interesting.
From the point of view of practical ecologists, it could be
thought that we are witnessing an inflation of techniques and
methods for data analysis. We feel that we have few and poor
data. Better data and an improved fitness of sampling patterns
with subsequent data treatments are needed rather than an
increase in the sophistication of statistical or mathematical
methods. New technological tools and increased funding may allow
new sampling designs, more adapted to present and future
mathematical and computational power. We should encourage the
investigation of new measures in the ecosystem related to general
theoretical ecology (holistic approaches), as well as to continue
with classical concepts or with older measurements that may be
difficult to get.
Finally, the best technique is often the one best known to
the user. Ecologists should not dissipate their time and effort
in mastering irrelevant techniques. The danger exists of
confounding mathematical or methodological problems with genuine
ecological problems. So, be careful.

REFERENCES

Allen, P.M. 1985. Ecology, thermodynamics, and self-organization:
Towards a new understanding of complexity, p. 3-26. In R.E.
Ulanowicz and T. Platt (eds) Ecosystem theory for biological
oceanography. Can. Bull. Fish. Aq. Sci. 213. Ottawa.
Arabie, P., and J.D. Carroll. 1980. MAPCLUS: A mathematical
programming approach to fitting the ADCLUS model.
Psychometrika 45: 211-235.
Carroll, J.D. 1972. Individual differences and Multidimensional
Scaling. p. 105-155. In R. N. Shepard, A. K. Romney and S.
Nerlove (eds) Multidimensional Scaling. Theory and
Application in Behavioral Sciences (Vol. I, Theory). New
York. Seminar Press.
Carroll, J.D. 1980. Models and methods for multidimensional
analysis of preferential choice (or other dominance) data.
p. 234-289. In E.D. Lantermann and H. Feger (eds) Similarity
and choice. Bern, Stuttgart, Vienna: Hans Huber Publishers.
Carroll, J. D., and P. Arabie. 1983. An individual differences
generalization of the ADCLUS model and the MAPCLUS
algorithm. Psychometrika 48: 157-169.
Carroll, J.D., P.E. Green, and C.M. Schaffer. 1986. Interpoint
distances interpretation in correspondence analysis. Journal
of Marketing Research (in press.)
Conrad, M. 1983. Adaptability: the significance of variability
from molecule to ecosystem. Plenum Press, New York.
Conrad, M. 1985. The statistical basis of ecological
potentiality, p. 179-186. In R. E. Ulanowicz and T. Platt
(eds) Ecosystem theory for biological oceanography. Can.
Bull. Fish. Aq. Sci. 213. Ottawa.
Dallot, S., M. Etienne, S. Frontier, Ph. Gallet, and F. Ibanez.
1986. Étude des séries chronologiques dans le plancton. In
S. Frontier (ed) Évaluation et optimisation des plans
d'échantillonnage en écologie littorale. P.I.R.E.N., ATP nº
9.82.65. Polycopié, Université des Sciences et Techniques de
Lille.
Estrada, M., and F. Vallespinós. 1976. Análisis estadístico de
extractos de pigmentos de algas macrófitas. Inv. Pesq. 40:
551-559.
Flos, J. 1976. Seston superficial de la zona de afloramiento del
NW of Africa. Oecologia aquatica 2: 27-39.
Flos, J. 1978. El análisis de las componentes principales
aplicado a una serie de variables espectrales. Inv. Pesq.
42: 53-64.
Flos, J. 1980a. Material en suspensió oceànic en la Mediterrània
Occidental. Tesi de doctorat. Universitat de Barcelona.
Flos, J. 1980b. Ordination and cluster analysis applied to
oceanographical data. Est. Coast. Mar. Sci. 11: 393-406.
Flos, J. (ed). 1985. Estudio oceanográfico del Abra de Bilbao y su
entorno. Vol. II. p. 98-146. Fundación Euskoiker, Bilbao.
(Technical Report).
Frontier, S. (ed). 1982. Stratégies d'échantillonnage en
écologie. Masson, Paris.
Frontier, S. 1985. Diversity and structure in aquatic ecosystems.
Oceanogr. Mar. Biol. Ann. Rev. 23: 253-312.
Frontier, S. (ed). 1986. Évaluation et optimisation des plans
d'échantillonnage en écologie littorale. P.I.R.E.N., ATP nº
9.82.65. Polycopié, Université des Sciences et Techniques de
Lille.
Frontier, S., and P. Legendre. 1986. Théorie des fractals en
écologie, p. 293-324. In S. Frontier (ed) Évaluation et
optimisation des plans d'échantillonnage en écologie
littorale. P.I.R.E.N., ATP nº 9.82.65. Polycopié, Université
des Sciences et Techniques de Lille.
Hartigan, J.A. 1973. Direct clustering of a data matrix. J. Amer.
Stat. Assoc. 67: 123-129.
Haury, L. R., J. A. McGowan, and P. H. Wiebe. 1979. Patterns and
processes in the time-space scales of plankton
distributions, p. 277-327. In J.H. Steele (ed) Spatial
pattern in plankton communities. Plenum Press, New York.
Herman, A.W., and T. Platt. 1980. Meso-scale spatial distribution
of plankton: co-evolution of concepts and instrumentation,
p. 204-225. In M. Sears and D. Merriam (eds) Oceanography:
the past. Springer-Verlag, New York.
Ibanez, F. 1973. Méthode d'analyse spatio-temporelle du processus
d'échantillonnage en planctologie, son influence dans
l'interprétation des données par l'analyse en composantes
principales. Ann. Inst. Oceanogr. 49: 83-111.
Ibanez, F. 1976. Contribution à l'analyse mathématique des
événements en écologie planctonique. Bull. Inst. Oceanogr.
Monaco nº 72. 96 pp.
Laurec, A. 1979. Analyse des données et modèles prévisionnels en
écologie marine. Thèse d'État. Université d'Aix-Marseille.
Laurec, A., P. Chardy, P. de la Salle, and M. Rickaert. 1979.
Use of dual structures in inertia analysis: ecological
implications, p. 127-174. In L. Orlóci, C. R. Rao and W. M.
Stiteler (eds) Multivariate methods in ecological work.
International Co-operative Publishing House, Fairland,
Maryland.
Mandelbrot, B.B. 1977. Fractals. Form, chance and dimension.
Freeman & Co., San Francisco.
Mandelbrot, B.B. 1982. The fractal geometry of nature. Freeman &
Co., San Francisco.
Margalef, R. 1978. Life-forms of phytoplankton as survival
alternatives in an unstable environment. Oceanol. Acta. 1:
493-510.
Margalef, R. 1979. The organization of space. Oikos 33: 152-159.
Margalef, R. 1985. From hydrodynamic processes to structure
(information) and from information to process, p. 200-220.
In R. E. Ulanowicz and T. Platt (eds) Ecosystem theory for
biological oceanography. Can. Bull. Fish. Aq. Sci. 213.
Ottawa.
Nicolis, G., and I. Prigogine. 1977. Self-organization in non-
equilibrium systems. Wiley Interscience, New York.
Nihoul, J. (ed). 1985. Dynamic biological processes at marine
ergoclines. Proc. 17th Intern. Liège Colloq. on Ocean
Hydrodynamics (Liège 13-17 May 1985).
Shepard, R.N., and P. Arabie. 1979. Additive clustering:
representation of similarities as combinations of discrete
overlapping properties. Psychological Review 86: 87-123.
Thom, R. 1972. Stabilité structurelle et morphogenèse. W.A.
Benjamin, Reading, Mass.
Ulanowicz, R.E., and T. Platt (eds). 1985. Ecosystem theory for
biological oceanography. Can. Bull. Fish. Aq. Sci. 213.
Ottawa.
Wilson, A. G. 1981. Catastrophe theory and bifurcation. Univ. of
California Press, Berkeley and Los Angeles.
NUMERICAL ECOLOGY: DEVELOPMENTS
FOR BIOLOGICAL OCEANOGRAPHY AND LIMNOLOGY

Louis Legendre* (Chairman), Carol D. Collins and Clarice M. Yentsch (Rapporteurs),
James C. Bezdek, Janet W. Campbell, Yves Escoufier, Marta Estrada, and Frederic Ibanez

*Département de biologie, Université Laval, Québec, Québec G1K 7P4, Canada

INTRODUCTION

Numerical techniques used in biological oceanography and limnology must take into
account ecological hypotheses to be tested and the specific nature of aquatic data. These
techniques can be used either for examining data sets (exploratory data analysis) or for estimating
population or subpopulation characteristics from samples (inferential statistics). These two
approaches now tend to be considered as complementary, and there will be continuing need for
both exploratory and critical/confirmatory methods (Mallows and Tukey 1982). To some extent,
through appropriate sampling design and experimental planning, the nature of the data can be
controlled to accommodate constraints of the numerical methods. Nevertheless, it often occurs
that aquatic data do not meet the basic assumptions of the numerical techniques (see below).
Despite these problems numerical methods can provide new ecological insights, which is precisely
the aim of numerical ecology.

SCALING (ORDINATION) TECHNIQUES

In order to better understand the data and to facilitate communication, it is common
practice to summarize the data sets. Scaling techniques can be used in biological oceanography
and limnology to summarize the data in various ways. First, they can serve to reduce the number
of dimensions (variables) into which the data are represented (e.g., Pearson 1901). They can also
be used to combine the observed variables into summary variables, which are linear (e.g.,
Hotelling 1933) or nonlinear combinations of the original variables. Additionally, they can be
used to reduce noise, thus producing more meaningful representations (e.g., enhancing satellite
images: Lowitz 1978). Finally, they can be used to recognize subsets in the data.
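
In modern numerical terms, the dimension-reduction role of scaling can be illustrated by principal component analysis; the following is a generic sketch (all function and variable names are ours, and the random data are purely illustrative):

```python
import numpy as np

def principal_components(X, k=2):
    """Summarize an (n x p) data table by its first k principal axes.

    Each component is a linear combination of the original variables;
    dropping the trailing components reduces dimensionality and noise."""
    Xc = X - X.mean(axis=0)                 # centre each variable
    S = Xc.T @ Xc / (X.shape[0] - 1)        # covariance matrix
    eigval, eigvec = np.linalg.eigh(S)      # eigh returns ascending order
    order = np.argsort(eigval)[::-1]        # largest variance first
    scores = Xc @ eigvec[:, order[:k]]      # object coordinates
    return scores, eigval[order]

rng = np.random.default_rng(42)
data = rng.normal(size=(100, 5))            # 100 samples, 5 variables
scores, variances = principal_components(data, k=2)
```

The variance of each column of scores equals the corresponding eigenvalue, which is how the "proportion of variance explained" by a reduced representation is usually judged.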

Scaling techniques discussed in this volume, together with other feature extraction and
display methods (e.g., linear projection pursuit, Sammon mapping, and triangulation: Friedman

and Tukey 1974; Biswas et al. 1981; Huber 1985), are in an area of rapidly advancing computer
data analysis aimed at the visual perception of high dimensional data (> 3d). While such aims are
not new, the field has been reactivated by recent advances in computer technology that allow the
analyst a high level of rapid, dynamical interaction with the data (Becker and Chambers 1984).
For example 3-d data can be visualized as a rotating cloud of points, using computers capable of
rapidly displaying 2-d projections. A fourth dimension can be depicted with the use of colour
(Breiman and Friedman 1982; McDonald 1982). For a more extensive discussion of feature
analysis as used for graphical interpretation of multidimensional data, refer to section 2.B of
Bezdek (this volume).
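
The rotating-cloud display mentioned above amounts, computationally, to applying a rotation matrix and dropping the depth coordinate; a minimal sketch of this idea (our own construction, not an algorithm from the cited papers):

```python
import numpy as np

def project_rotated(points, angle):
    """Rotate an (n x 3) point cloud about the vertical axis and project
    onto the screen plane by dropping the depth coordinate; sweeping
    `angle` over a sequence of values produces the rotating display."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    return (points @ R.T)[:, :2]

cloud = np.random.default_rng(0).normal(size=(50, 3))
frames = [project_rotated(cloud, a) for a in np.linspace(0.0, 2 * np.pi, 36)]
```

Displaying the successive frames in rapid sequence gives the depth cue that a single static 2-d projection lacks.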

Scaling techniques can be used with two types of data: measurements and assessments.
Principal component or correspondence analyses are limited to the first type of data, while
multidimensional scaling and related methods (Carroll, this volume) can be applied to both types of data.
On one hand, quantitative or qualitative variables observed for each object (site, sample, etc.) may
be used to compute similarity or distance matrices. On the other hand, similarities or distances
between objects can be defined by the observer as based on global appreciation. For example, the
relative strength of vertical mixing at various stations is easier to infer than to measure directly.
The same is true for ecological niches. However, to extract useful information, data must be
organized a priori in an ecologically meaningful way. Analyses always give information, and
even the result that the data set cannot be approximated by a given analytical model is useful
information in itself.

One problem common to biological oceanography and limnology is that aquatic data are
in general strongly autocorrelated in both space and time. This means that values observed at
given points in space and/or time are to some extent functions of values observed at other points.
This can obscure relationships among the observed variables, so that sampling constraints may
become as severe or more severe than numerical problems. In principle, standard scaling
techniques do not give optimal representations when applied to autocorrelated data. The taking
into account of autocorrelation in scaling techniques is however possible, but not routinely applied
in numerical ecology. One possibility could be to compute a correlation matrix on the basis of the
spatio-temporal coordinates of the objects, followed by the method of Aragon and Caussinus
(1980) for principal component analysis with correlated statistical units. Another possibility could
be to use Procrustes (Gower, this volume) or PCARIV (principal component analysis with respect
to instrumental variables: Rao 1965; Bonifas et al. 1984), to compare the scaling of the objects
based on their spatio-temporal coordinates to that computed from measured variables. When there
is more than one subset of variables (e.g., comparing environmental to biological variables),
methods analogous to partial correlation but for Procrustes or PCARIV could be useful in
exploring the relationships between the scalings of these subsets while controlling for the
spatio-temporal scaling. Such methods remain to be developed, following for example the
generalized procedures of Carroll and Chang (1970), Gower (1975) or Escoufier (1980) (see also
the June 1985 issue of Statistique et analyse des donnees). One could also use spatio-temporal
coordinates as additional variables to put restrictions on the ordination when applying techniques
such as unfolding (Heiser, this volume).
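
Before choosing among these remedies, the degree of spatial autocorrelation can itself be estimated; the sketch below computes Moran's I with binary connectivity weights (a standard index, but the implementation and the neighbourhood threshold are our illustrative choices):

```python
import numpy as np

def morans_i(values, coords, threshold):
    """Moran's I spatial autocorrelation coefficient with binary
    connectivity weights: pairs of samples closer than `threshold`
    are neighbours.  Values near +1 indicate strong positive spatial
    autocorrelation; values near 0, spatial randomness."""
    z = values - values.mean()
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    W = ((d > 0) & (d < threshold)).astype(float)
    n, s0 = len(values), W.sum()
    return (n / s0) * (W * np.outer(z, z)).sum() / (z @ z)

# a smooth gradient along a transect is strongly autocorrelated
transect = np.column_stack([np.arange(10.0), np.zeros(10)])
gradient = np.arange(10.0)
```

A large positive value warns that the samples are not independent, so standard scaling results should be interpreted with caution.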

In ecology, the precision of the data is often overemphasized. Semi-quantitative
(rank-ordered) and qualitative information can be legitimately treated numerically. First, several
types of information can at present only be collected in a semi-quantitative or qualitative manner.
Second, the effects of observational, experimental and analytical errors on the results of numerical
analyses could be reduced by transforming fully quantitative data into semi-quantitative or
qualitative forms. Gifi methods (de Leeuw, Heiser, this volume) have been developed to
simultaneously process quantitative, semi-quantitative and qualitative data. Multiple
correspondence analysis is also often used in this case, after transforming the data. There are
several studies to document that ecologically meaningful information is not reduced or lost from
such transformations (Ibanez 1971, 1973b; Frontier and Ibanez 1974; Devaux and Millerioux
1976; Ibanez 1983). In several instances, the ecologist uses external information in defining
categories and in so doing may even enrich the data set. One must determine whether different
choices of qualitative categories change the ecological interpretation of the analyses.
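
As an illustration of such a transformation, counts can be recoded into semi-quantitative classes whose limits follow a geometric progression, in the spirit of the abundance scale of Frontier cited below; the class boundaries here are illustrative, not Frontier's exact scale:

```python
import numpy as np

def abundance_class(counts, base=3, n_classes=6):
    """Recode raw counts into semi-quantitative classes whose upper
    limits follow a geometric progression (1, base, base**2, ...):
    class 0 = absent, class 1 = one to a few individuals, and so on.
    The boundaries are illustrative, not Frontier's exact scale."""
    bounds = base ** np.arange(n_classes)       # 1, 3, 9, 27, 81, 243
    return np.searchsorted(bounds, np.asarray(counts), side='right')
```

Such a recoding damps the influence of counting errors on a subsequent ordination, at the price of a coarser measurement scale.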

Some scaling techniques, such as unfolding and correspondence analysis, allow dual
projection of variables and objects in the ordination space. This may be useful in biological
oceanography and limnology, in order to relate objects and variables. One must realize, however,
that unfolding represents the proximities between objects and variables but not the proximities
among objects. This contrasts with some other scaling techniques (e.g., principal component
analysis or multidimensional scaling), where proximities among objects and also among variables
can be represented but where proximities between objects and variables are not immediately
accessible.

CLUSTERING TECHNIQUES

In order to extract information, it is common practice in biological oceanography and
limnology to cluster the data. Clustering techniques are often used in conjunction with various
scaling techniques. In recent years, several new clustering algorithms have been developed,
including clustering under constraints (Legendre, this volume), fuzzy sets (Bezdek, this volume)
and conditional clustering (Lefkovitch, this volume). These are not mutually exclusive methods:
i.e., constraints such as those discussed by Legendre (this volume) can be applied to data that are
clustered using any method, including the fuzzy sets approach. In the aquatic context, where
samples are often spatio-temporally autocorrelated, it appears that clustering under constraints
leads to more realistic subsets. Constrained clustering has the additional advantage of reducing
the number of pairwise comparisons, thus facilitating rapid processing of large data sets such as
satellite images or flow cytometric records.
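
The idea of a contiguity constraint can be sketched as a single-linkage agglomeration restricted to linked pairs; this is our own minimal illustration of the principle, not any of the published constrained algorithms:

```python
import numpy as np

def constrained_single_linkage(X, links, n_clusters):
    """Single-linkage agglomeration in which two groups may fuse only
    if at least one contiguity link joins them, so every cluster stays
    spatially (or temporally) connected.  `links` lists index pairs of
    contiguous samples."""
    labels = np.arange(len(X))
    # examine only permitted (linked) pairs, shortest distances first
    for i, j in sorted(links, key=lambda ij: np.linalg.norm(X[ij[0]] - X[ij[1]])):
        if len(np.unique(labels)) <= n_clusters:
            break
        if labels[i] != labels[j]:
            labels[labels == labels[j]] = labels[i]   # fuse the two groups
    return labels

# six samples along a transect, linked in sampling order
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
chain = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
groups = constrained_single_linkage(X, chain, n_clusters=2)
```

Because only linked pairs are ever examined, the number of comparisons grows with the number of links rather than with the number of all sample pairs, which is the computational advantage noted above.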

Fuzzy set algorithms give a relative measure of association of each object or variable to
each cluster, thus defining inliers and outliers. In the special case when there is insufficient
information to properly assign objects or variables to any one cluster, these become outliers of
fuzzy sets. It is unrealistic in biological oceanography and limnology to assume that each object
or variable should be a member of one and only one cluster. Conditional clustering offers the
possibility for any object or variable to become a member of two or several overlapping clusters.
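
The membership degrees produced by fuzzy clustering can be sketched with the standard fuzzy c-means membership formula (Bezdek); cluster centres are taken as given here, whereas the full algorithm alternates membership and centre updates:

```python
import numpy as np

def fuzzy_memberships(X, centers, m=2.0):
    """Fuzzy c-means membership computation: each object receives a
    degree of membership in every cluster (rows sum to 1), rather
    than an all-or-nothing assignment.  The exponent m > 1 controls
    the fuzziness of the partition."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.fmax(d, 1e-12)                   # guard: object sitting on a centre
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

centers = np.array([[0.0, 0.0], [10.0, 0.0]])
objects = np.array([[0.5, 0.0], [9.5, 0.0], [5.0, 0.0]])
U = fuzzy_memberships(objects, centers)
```

An object midway between two centres receives membership 0.5 in each, which is exactly the graded, overlapping assignment that crisp clustering cannot express.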

It is unclear for the time being which of the above clustering approaches will lead to more
ecologically meaningful results, and the answer may well vary from case to case. In all instances it would be
advisable to cluster the data with algorithms from several different families, so as to judge the
robustness of the resulting clusterings. In the case of robust ecological structures, different
methods should lead to relatively similar results.

FRACTALS

It appears that, in lakes and oceans, several physical and biological structures have fractal
dimensions (e.g., ergocline structures such as fronts, pycnocline, ice-water and water-sediment
interfaces: Legendre et al. 1986). These structures may be characterized by complex geometry,
changing species diversity, patchiness, high biological production, and so on. Considering the
complexity of these phenomena occurring at different scales, and the lack of models, it is
presently very difficult to appropriately sample such environments. Fractal theory (Frontier, this
volume) may provide the framework for modelling complex aquatic ecosystems, and designing
new sampling approaches and new techniques for numerical data analysis (e.g., Ibanez 1986;
Ibanez and Etienne, submitted).
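
One standard way to estimate a fractal dimension from sampled data is box counting; the sketch below is a generic illustration (the test object is synthetic, not an aquatic record):

```python
import numpy as np

def box_counting_dimension(points, sizes):
    """Box-counting estimate of a fractal dimension: count the boxes
    of side s occupied by the point set at each size s, then take the
    slope of log N(s) versus log(1/s)."""
    counts = [len({tuple(b) for b in np.floor(points / s).astype(int)})
              for s in sizes]
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(sizes)), np.log(counts), 1)
    return slope

# sanity check on a non-fractal object: a line segment has dimension ~1
line = np.column_stack([np.linspace(0.0, 1.0, 4096), np.zeros(4096)])
dim = box_counting_dimension(line, sizes=[1/8, 1/16, 1/32, 1/64])
```

Applied to a digitized front or interface, a non-integer slope between 1 and 2 would be the signature of fractal structure.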

CAUSALITY MODELS: PATH ANALYSIS AND BEYOND

One of the major objectives of biological oceanography and limnology is to identify
cause-and-effect relationships in aquatic ecosystems. In all the available numerical techniques, there are
implicit and/or explicit hypotheses concerning the causal relationships among variables. Available
techniques allow different levels of input from the scientists, different levels of precision of
causality, and different levels of model rigidity. For example, multiple linear regression requires
limited input from the scientists, assumes very precise causal relationships, and is therefore a very
rigid model. As a consequence, multiple linear regression may be poorly suited to modelling
ecological relationships, and may thus give the investigator a false sense of confidence concerning
his quantitative understanding.

Path analysis is a more sophisticated method, where the scientist defines the causal
relationships from a priori knowledge by specifying the paths among variables. These variables
may also include "latent variables", which are not observed by the investigator but whose paths can
be included in the model. In path analysis, only the direction of the causality is specified, which
makes it a less rigid model than for instance multiple linear regression. "Nonlinear path analysis"
(de Leeuw, this volume) can be applied to quantitative, semi-quantitative and qualitative ecological
data (e.g., PATHALS algorithm). In ecology, path analysis is used to explore the consequences
of hypotheses concerning causal relationships among variables, given the computed regression
and correlation coefficients.
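
Under this usage, path coefficients reduce to standardized partial regression coefficients computed separately for each endogenous variable; a hedged sketch, in which the simulated causal chain and the variable names (light, chlorophyll, production) are purely illustrative:

```python
import numpy as np

def path_coefficients(data, model):
    """Path coefficients as standardized partial regression
    coefficients: for every endogenous variable, regress its z-scores
    on the z-scores of the causal parents specified a priori in
    `model` (a dict: variable name -> list of parent names)."""
    z = {k: (v - v.mean()) / v.std() for k, v in data.items()}
    paths = {}
    for child, parents in model.items():
        Xp = np.column_stack([z[p] for p in parents])
        coefs, *_ = np.linalg.lstsq(Xp, z[child], rcond=None)
        paths[child] = dict(zip(parents, coefs))
    return paths

# simulated causal chain: light -> chlorophyll -> production
rng = np.random.default_rng(7)
light = rng.normal(size=2000)
chl = 0.8 * light + 0.6 * rng.normal(size=2000)
prod = 0.7 * chl + 0.71 * rng.normal(size=2000)
paths = path_coefficients({'light': light, 'chl': chl, 'prod': prod},
                          {'chl': ['light'], 'prod': ['chl']})
```

The a priori causal structure lives entirely in the `model` dictionary; the data decide only the magnitudes of the specified paths, which is the division of labour described above.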

A recent development in computational techniques for the analysis of relational data,
termed "approximate reasoning", allows an even higher level of input from the investigator. It
requires low precision causality, which allows an increase in model elasticity and thus might be
more adapted to ecological data. Approximate reasoning enables one to construct inferential
models which combine numerical (sensor) data with "higher level" facts and knowledge supplied
to the system by domain experts (i.e., biological oceanographers and limnologists). For example,
one can represent object-pair relationships with linguistic terms (e.g., phytoplankton
photosynthesis is highly dependent upon light; zooplankton are rarely encountered in surface
waters during daytime). The underlined terms become linguistic data, that may be represented in a
variety of ways. Manipulation of this kind of information is being studied by many authors
representing various schools (e.g., fuzzy logic, Dempster-Shafer, probabilistic, heuristic).
Numerical ecologists should be aware of the evolution of these techniques as a means for
capturing the effects of "imprecise causality". Readers interested in pursuing these ideas may
begin with the surveys of Dubois and Prade (1986) and Shafer (1986); computational
algorithms are discussed, among others, by Bonnisone and Decker (1985) and Bezdek (this
volume). Path analysis and approximate reasoning show promise for biological oceanography
and limnology.
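
A linguistic term such as those underlined above is typically encoded as a fuzzy membership function over a numerical scale; a minimal sketch, in which the trapezoidal form and all numerical limits are illustrative assumptions:

```python
import numpy as np

def trapezoid(x, a, b, c, d):
    """Membership function for a linguistic term: 0 below a, rising on
    [a, b], fully true (1) on [b, c], falling on [c, d], 0 above d.
    Terms such as "highly dependent" or "rarely" can be encoded this way."""
    x = np.asarray(x, dtype=float)
    rise = np.clip((x - a) / (b - a), 0.0, 1.0)
    fall = np.clip((d - x) / (d - c), 0.0, 1.0)
    return np.minimum(rise, fall)

# e.g. degree to which an irradiance (% of surface value) counts as "high"
high = trapezoid(np.array([10.0, 55.0, 80.0, 95.0]), 40.0, 70.0, 90.0, 100.0)
```

Once encoded this way, linguistic statements from domain experts can be combined with sensor data in the inferential models mentioned above.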

SPATIO-TEMPORAL PATTERN ANALYSIS

In the field of spatial pattern analysis, one makes the distinction between two situations.
(1) Point patterns (Ripley, this volume), in which the data are the spatial coordinates of the
objects, e.g., the location of spawning sites or whale sightings. (2) Cases where variables
changing continuously over space and time are sampled at discrete points whose coordinates are
determined by the observer (surface patterns). In general, biological oceanographers and
limnologists are mainly concerned with the latter situation.

Methods of spatial analysis (Sokal and Thomson, this volume), in particular those that
treat anisotropic environments, can be readily applied in oceanography to satellite images.
Normally, field data result from the interaction of two spatio-temporal patterns: that of the
measured variables (natural pattern) and that of the sampling design (sampling pattern). This
means that, for the same natural pattern, different spatio-temporal sampling patterns may give
different results (Ibanez 1973a, 1976). This problem is magnified in aquatic environments, as
natural patterns change rapidly in both space and time. This is obviously a fundamental problem
in limnology and oceanography, which will require further advances in the methods of
spatio-temporal analysis. It appears that methods such as partial Mantel analysis (Dow and
Cheverud 1985; Hubert 1985; Smouse et al. 1986) may be a step towards controlling for the
spatial and temporal organization of the samples.
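
The simple (unpartialled) Mantel statistic underlying these extensions can be sketched as a correlation between distance matrices with a permutation test; a generic illustration, with perfectly concordant matrices constructed only for the check:

```python
import numpy as np

def mantel(D1, D2, n_perm=99, seed=0):
    """Simple Mantel test: Pearson correlation between the upper
    triangles of two distance matrices, with a permutation p-value
    obtained by relabelling the objects of the second matrix."""
    rng = np.random.default_rng(seed)
    iu = np.triu_indices(D1.shape[0], 1)
    r_obs = np.corrcoef(D1[iu], D2[iu])[0, 1]
    hits = sum(
        np.corrcoef(D1[iu], D2[np.ix_(p, p)][iu])[0, 1] >= r_obs
        for p in (rng.permutation(D1.shape[0]) for _ in range(n_perm))
    )
    return r_obs, (hits + 1) / (n_perm + 1)

rng = np.random.default_rng(1)
pts = rng.normal(size=(12, 2))
D_space = np.linalg.norm(pts[:, None] - pts[None], axis=2)
D_biol = 2.0 * D_space                  # perfectly concordant matrices
r, p = mantel(D_space, D_biol)
```

The partial Mantel extensions cited above follow the same scheme, but correlate residual rather than raw distance matrices.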

REFERENCES

Aragon, Y., and H. Caussinus. 1980. Une analyse en composantes principales pour des unites
statistiques correlees, p. 121-131. In E. Diday et al. [ed.] Data analysis and informatics.
North Holland Pub!. Co., New York.
Becker, R. A., and J. M. Chambers. 1984. S: an interactive environment for data analysis and
graphics. Wadsworth Adv. Book Program, Belmont, CA. 550 p.
Biswas, G., A. K. Jain, and R. C. Dukes. 1981. Evaluation of projection algorithms. IEEE
Trans. PAMI 3: 701-708.
Bonifas, L., Y. Escoufier, P. L. Gonzalez, and R. Sabatier. 1984. Choix de variables en analyse
en composantes principales. Revue de Statistique appliquee 32 (2): 5-15.
Bonnisone, P., and K. Decker. 1985. Selecting uncertainty calculi and granularity: an experiment
in trading-off precision and complexity. GE TR85.5C38, Schenectady, N.Y.
Breiman, L., and J. H. Friedman. 1982. Estimating optimal transformations for multiple
regression and correlation. Dept. Stat., Stanford Univ., Calif., 81 p.
Carroll, J. D., and J. J. Chang. 1970. Analysis of individual differences in multidimensional
scaling via an N-way generalization of "Eckart-Young" decomposition. Psychometrika 35:
283-319.
Devaux, J., and G. Millerioux. 1976. Possibilite de l'utilisation de la cotation d'abondance de
Frontier (1969) pour l'analyse multivariable des populations phytoplanctoniques. C. R.
hebd. Seances Acad. Sci., Ser. D Sci. nat. 283: 41-44.
Dow, M. M., and J. M. Cheverud. 1985. Comparison of distance matrices in studies of
population structure and genetic microdifferentiation: quadratic assignment. Am. J. Phys.
Anthropol. 68: 367-373.
Dubois, D., and H. Prade. 1986. Fuzzy numbers: an overview. In J. C. Bezdek [ed.] The
analysis of fuzzy information, Vol. 1, CRC Press, Boca Raton, Florida.
Escoufier, Y. 1980. Exploratory data analysis when data are matrices, p. 45-53. In K. Matusita
[ed.] Recent developments in statistical inference and data analysis. North Holland.
Friedman, J. H., and J. W. Tukey. 1974. A projection pursuit algorithm for exploratory data
analysis. IEEE Trans. Computers, C-23: 881-889.
Frontier, S., and F. Ibanez. 1974. Utilisation d'une cotation d'abondance fondee sur la
progression geometrique, pour l'analyse en composantes principales en ecologie
planctonique. J. Exp. Mar. Biol. Ecol. 14: 217-224.
Gower, J. C. 1975. Generalized Procrustes analysis. Psychometrika 40: 33-41.
Hotelling, H. 1933. Analysis of a complex statistical variable into principal components. J. Educ.
Psychol. 26: 417-441, 498-520.
Huber, P. J. 1985. Projection pursuit. Ann. Stat. 13: 435-525.
Hubert, L. J. 1985. Combinatorial data analysis: association and partial association.
Psychometrika 50: 449-467.
Ibanez, F. 1971. Effet des transformations des donnees dans l'analyse factorielle en ecologie
planctonique. Cah. Oceanogr. 23: 545-561.
Ibanez, F. 1973a. Methode d'analyse spatio-temporelle du processus d'echantillonnage en
planctologie, son influence sur l'interpretation des donnees par l'analyse en composantes
principales. Ann. Inst. Oceanogr. Paris 49: 83-111.
Ibanez, F. 1973b. Une cotation d'abondance reduite a trois classes: justification de son emploi en
analyse en composantes principales, mise en oeuvre et interet pratique en planctologie. Ann.
Inst. Oceanogr. Paris 50: 185-198.
Ibanez, F. 1976. Contribution a l'analyse mathematique des evenements en ecologie planctonique.
Bull. Inst. Oceanogr. Monaco 72: 1-96.
Ibanez, F. 1983. Optimisations de la representation des series chronologiques planctoniques
multivariables. Rapp. Comm. Int. Mer Medit. 28: 113-115.
Ibanez, F. 1986. Le determinisme du chaos. J. Rech. Oceanogr. 11: (in press).
Ibanez, F., and M. Etienne. The fractal dimension of a chlorophyll record. (Submitted).
Legendre, L., S. Demers, and D. Lefaivre. 1986. Biological production at marine ergoclines, p.
1-29. In J. C. J. Nihoul [ed.] Marine interfaces ecohydrodynamics. Elsevier, Amsterdam.
Lowitz, G. E. 1978. Stability and dimensionality of Karhunen-Loeve multispectral image
expansions. Pattern Recognition 10: 359-363.
Mallows, C. L., and J. W. Tukey. 1982. An overview of techniques of data analysis,
emphasizing exploratory aspects. In J. Tiago de Oliveira and B. Epstein [ed.] Some recent
advances in statistics. Academic Press.
McDonald, J. A. 1982. Interactive graphics for data analysis. Ph. D. Dissertation, Stanford
Univ., Calif. 60 p.
Pearson, K. 1901. On lines and planes of closest fit to systems of points in space. Phil. Mag. 2:
559-572.
Rao, C. R. 1965. The use and interpretation of principal component analysis in applied research.
Sankhya, A 26: 329-358.
Shafer, G. 1986. Belief functions and possibility measures. In J. C. Bezdek [ed.] The analysis
of fuzzy information, Vol. 1. CRC Press, Boca Raton, Florida.
Smouse, P. E., J. C. Long, and R. R. Sokal. 1986. Multiple regression and correlation
extensions of the Mantel test of matrix correspondence. Syst. Zool. 35: 627-632.
NUMERICAL METHODS IN TERRESTRIAL PLANT ECOLOGY 1

R. Gittins* (Chairman), S. Amir, J.-L. Dupouey, W. J. Heiser, M. Meyer,
R. R. Sokal, and M. J. A. Werger

*Biometry Unit, Faculty of Agriculture, University of Sydney,
Sydney, NSW, Australia 2006.

1. INTRODUCTION

Studies of terrestrial plant communities are generally directed towards describing the
vegetation of some suitably circumscribed area and then interpreting the features described in
terms of environmental, biological or historical processes or events. More specifically, the usual
objectives are (a) to study spatial and temporal patterns in the occurrence and representation of
terrestrial plants; (b) to elucidate the causes of such patterns in terms of environmental factors and
species interactions; and (c) to establish whether recognisable communities exist, and, if so, to
describe them and to account for the dynamics in their species composition. Numerical ecology
offers one avenue of approach towards the attainment of these goals. Quantitative studies of
vegetation begin with observations or measurements in the field and proceed by means of
algebraic manipulation of the data to graphical expression or display. Such displays, in
themselves, are succinct descriptions of the vegetation which, in addition, may provide insight
into the nature and relative importance of underlying ecological controls.
Terrestrial plant communities are composed of individuals that belong to numerous
co-existing species. It is the higher-plant species which are the focus of attention in most, though
not all, studies of terrestrial vegetation. These species constitute the variables of interest. As a
rough guide, the number of higher-plant species, p, likely to be encountered in most studies can
be taken to be bounded by 50 and 500, that is 50 ::;; p ::;; 500. Thus, a salient characteristic of
terrestrial plant ecology is that the domain with which it deals is multivariate. Put rather
differently, we may say that field observations in community ecology are vector-valued.
Terrestrial vegetation is just one component of a larger entity - the ecosystem, the remaining
components of which are the fauna, fungi, climate and soils of the area in question, together with
the interactions and reactions which bind these constituents together. Definitive studies of
terrestrial vegetation are therefore inseparable from the study of the ecosystem as a whole, or at
any rate of some significant part of it. Accordingly, a second characteristic of terrestrial ecology is
that it is relations between variables of two or more distinct but associated kinds which ultimately

1 This report represents Dr. Gittins' summary of the debates of the Working Group. It was not submitted to the
other members' approval before publication, for lack of time. [Editor]


are the primary focus of interest in most if not all studies. A third characteristic of terrestrial
ecology is associated with the heterogeneity of terrestrial vegetation. The meaning of heterogeneity
here has been well-described by Webb (1954), who observed that variation in random samples of
vegetation hovers in a tantalising way between the continuous and the discontinuous. That is to
say, units comprising the sample are likely to have been drawn from some unknown mixture of
underlying plant communities or p-variate distributions and thus are unlikely to consist of
identically distributed representatives of a single homogeneous community or distribution.
Together, it is the high-dimensionality of the data, the complexity of the network of
interrelations among species and between them and environmental variables, and the heterogeneity
of the samples which combine to make the description and comprehension of vegetation the
challenge that it is. These features explain why narrative accounts of vegetation, in themselves,
should have proved insufficient in efforts to place terrestrial plant ecology on a firm scientific
footing. Vector-valued observations are simply not amenable to verbal description though they are
readily handled algebraically.
Ecology by definition deals with relations between variables of distinct but associated
kinds (e.g., Walter and Breckle 1985; Begon et al. 1986). It follows as a consequence that the
data matrix with which we deal in terrestrial plant ecology is a partitioned matrix of the form A =
[A1 | A2 | ... | Am], where A1 (N x p), A2 (N x q), ..., Am (N x z) are submatrices whose
columns correspond to biological, physical, chemical, spatial or other variables. The case A = [A1
| A2] is usual in applications, although reports of investigations for which m = 3 or m = 4 are not
unknown. The partitioned nature of A is frequently overlooked. In the ecological literature, for
example, the matrix A = [A1 | A2] is all too frequently misspecified as the non-partitioned matrix
A = [A1], where the distinction between variables of different kinds is either entirely disregarded
or else one of the two sets is neglected, in the initial stages of analysis at least. Misspecification
results in the inappropriate analysis of A, for example by principal components or correspondence
analysis. Studies of vegetation in which interest is strictly confined to variables of just one kind,
namely the species comprising the vegetation, are encountered from time to time. Since relations
between variables of different kinds are not involved, such studies are by definition (above)
outside the domain of ecology. If desired, they might be brought within the confines of the subject
by treating them as a special case where the data matrix A = [A1] is genuinely non-partitioned and
so corresponds to a degenerate case. It would be a serious error, however, if a degenerate case of
this kind were allowed to obscure the partitioned nature of A in ecological studies generally.
In addition to the partitioned nature of the data matrix, the rows (sample-units) of A are
rarely identically distributed, in terrestrial plant ecology. Rather, the topology of the implied
structure in A corresponds to an unknown mixture of multivariate distributions which differ in
location and which may or may not overlap, but whose characteristics are otherwise unspecified.
In other words, the sample is heterogeneous. An account of one such empirical ecological data
distribution has been described with admirable clarity by Noy-Meir (1971). The partitioned form
of A together with the distributional features noted serve to distinguish the data matrix usually
encountered in terrestrial ecology rather sharply from its counterpart in classical multivariate
analysis and in experimental psychology, fields from which the impetus for the development of
methodology appropriate for analyzing multiresponse data has largely been derived. Evidently,
what is required above all in terrestrial plant ecology are procedures that will render relations
between m ~ 2 vector-valued variables tractable even where the sample in question is
heterogeneous.
The principal aim of this report is to evaluate, as far as possible, the opportunities afforded
by new or recently developed numerical methods for attaining research goals in terrestrial plant
ecology. Little or nothing in the way of a coherent body of ecological theory exists which might be
used to structure or guide such an endeavour. Accordingly, having stated above what we consider
to be the principal methodological challenges posed by terrestrial vegetation data, we have opted to
proceed to evaluate methods in terms of their ability or promise to address one or more of the
challenges identified. The principal sections of the paper focus on problems associated with
high-dimensional data and heterogeneity. Sections 2 and 3 are devoted to different classes of
techniques for the reduction of dimensionality in high-dimensional data, and section 4 to
procedures for dealing with heterogeneity. Finally, in section 5, our assessment of the
opportunities for the productive use of numerical methods in terrestrial plant ecology is expressed.

2. REDUCTION OF DIMENSIONALITY

Field observations in terrestrial ecology consist of simultaneous observations or
measurements on several variables; they are vector-valued. Where the dimensionality exceeds
about four or five, vector-valued observations are difficult to comprehend. Scaling is a general
approach for simplifying high-dimensional data in such a way that their comprehension is eased.
More specifically, scaling is directed towards arriving at a simple geometric description of the
sample in two or three dimensions, which is revealing with respect to its internal structure.
Geometric representation of the sample, following algebraic manipulation of the initial field data,
enormously eases its comprehension (Gower, this volume).
A large number of scaling methods have been developed. From a practical viewpoint, one
important distinction among these is that between methods for linear (Gower, Escoufier, this
volume) and nonlinear (Carroll, de Leeuw, Heiser, this volume) reduction of dimensionality,
respectively.

2.1 Linear reduction of dimensionality

Classical multivariate analysis provides several methods that lead directly or indirectly to
linear reduction of dimensionality. By classical multivariate analysis we mean that part of the
subject in which the multivariate normal distribution plays a prominent role. We shall assume
initially that m = 1, but will remove this restriction later. For a single set of variables (m = 1), a
useful set of summary statistics where the data distribution is p-variate normal is based on the
means, variances and covariances of the variables in question. Suppose that there is interesting
structure in A. Then an important consequence of p-variate normality is that the configuration
generated by A in p-space will be a p-dimensional ellipsoid with linear axes. Further, the
covariance matrix, S = (N-1)^(-1) A^t A, where A is the data matrix in deviation form, A = (A - 1a^t),
and a^t is the row vector of column means, will capture this structure with remarkable efficiency.
The covariance matrix, or some related matrix of scalar products, is the starting point for most
classical multivariate analyses. The solutions such methods provide very often result in a reduction
of dimensionality, a reduction which is linear in the sense that the coordinates of sample-points in
the reduced space are linear functions of the original coordinates in p-space. We sketch three
methods of this kind together with a closely related method (principal coordinates analysis), all of
which have application in terrestrial plant ecology.
Principal component analysis. Principal component analysis is the classical method for the
linear reduction of dimensionality in an unstructured sample. Though the plane (or hyperplane) is
best fitting, in the sense that it minimizes the sum of squares of the residuals, this does not assure
that it will necessarily yield the most useful view of the sample for practical purposes. Thus, an
outlying sample-point might dominate one of the first two dimensions and result in a poor fit to
many other points. Or, where sample-points lie close to some nonlinear manifold in p-space rather
than a linear one, the principal components will at best provide only a poor approximation of the
sample.
Principal coordinates analysis. The starting point for principal coordinates analysis (Gower
1966, and this volume) is a symmetric matrix of distances or of distance-like quantities between
samples, rather than a covariance matrix between variables. The distances may be observed
directly or may be computed from the data matrix A. Principal coordinates analysis of a distance
matrix D (N x N) provides a mapping of N sample-points in a reduced t-dimensional space such
that the distances between points in the plot approximate the observed or calculated distances in D.
Principal coordinates analysis may be applied to matrices whose elements are computed from one
of the many coefficients that measure similarity or dissimilarity, Euclidean or otherwise. For a
review of such coefficients and their properties, the interested reader is referred to Gower and
Legendre (1986). The freedom of choice regarding the elements of!!.. represents an important
advantage over principal component analysis, as the distance used in components analysis
(Euclidean distance) is arbitrary and is rarely appropriate in practice. In this sense, principal
coordinates analysis represents a significant generalization and improvement over components
analysis. On the other hand, unlike principal component analysis, principal coordinates analysis
provides no information on the role of the variables in the analysis. In applications, such
considerations as these are pertinent to the matter of selecting a method which is appropriate to a
specified ecological goal.
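Gower's (1966) procedure may be sketched as follows (Python with NumPy; the function name and the test data are ours, added for illustration): the squared distances are double-centred and the resulting matrix of scalar products is eigendecomposed.

```python
import numpy as np

def principal_coordinates(D, t=2):
    """Classical scaling (Gower 1966) of a symmetric distance matrix D (N x N)."""
    N = D.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N   # centring matrix
    B = -0.5 * J @ (D**2) @ J             # matrix of scalar products
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1]     # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Coordinates in t dimensions; negative eigenvalues,
    # which arise with non-Euclidean D, are clipped to zero
    keep = eigvals[:t].clip(min=0.0)
    return eigvecs[:, :t] * np.sqrt(keep)
```

When D consists of Euclidean distances among points that in fact lie in a t-dimensional space, the mapping reproduces the distances exactly, up to rotation and reflection.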
We consider next two Euclidean mappings, both extensions of principal component
analysis, which may be useful where relations between species and samples are important to the
success of an investigation. These mappings are the biplot and correspondence analysis. Each
yields a graphical representation of the sample that discloses relations within the set of
sampling-units, within the set of species, and also relations between the sample-units and the
species.
The biplot. Let A = (aji) be an N x p matrix consisting of N realizations of a p-valued
variable expressed as deviations from the variate means. Starting from A, the biplot (Gabriel
1971, 1982) provides a display in which rows and columns are simultaneously represented in a
low-dimensional vector space with the aim of obtaining more insight into the data than could be
obtained from separate inspections of samples and variables. It is usual to represent samples
(rows) by points and variables (columns) by vectors emanating from the origin which is at the
centroid. Relations among samples are proportional to the distance between pairs of
sample-points. With respect to the variables, the length of a vector is proportional to the standard
deviation of the variable in question, while the cosine of the angle between any two vectors is the
correlation between the corresponding variables. Relations between samples and variables are
interpretable in terms of the angular separation of the row and column markers or symbols
involved. More specifically, the scalar product between the jth sample-point and the ith variable
vector approximates the (j,i)th element of A, aji. These between-set relations are especially valuable
for interpreting the configuration of sample points. In terrestrial ecology all too often one comes
across attempts to interpret the sample in terms of its principal components - in effect, that is,
attempts are made to interpret the components themselves in ecological terms. Though useful for
plotting, however, it is far from clear that the components are necessarily useful for interpretation.
Interpretation of the sample is more likely to be productive in terms of the original variables, hence
the appeal of the biplot. The biplot is a flexible tool, many variants and extensions of which have
been described. For an introduction to this work, see Bradu and Gabriel (1978), Gabriel (1981),
Cox and Gabriel (1982), Greenacre (1984, p. 341).
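The underlying factorization may be sketched as follows (Python with NumPy; the simulated data and the particular scaling shown, in which row markers come from the left singular vectors and column vectors absorb the singular values, are our illustrative choices among the variants Gabriel describes):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(15, 4))
A = A - A.mean(axis=0)          # deviations from the variate means

# Biplot factorization A ~ G H^t via the singular value decomposition
U, s, Vt = np.linalg.svd(A, full_matrices=False)
G = U[:, :2]                    # row (sample) markers
H = Vt[:2, :].T * s[:2]         # column (variable) vectors

# The scalar product between the jth row marker and the ith column
# vector approximates a_ji; the rank-2 product is the least-squares fit
A_hat = G @ H.T
```

With this scaling the squared lengths of the full-rank column vectors equal the column sums of squares of A, so that in two dimensions vector lengths are approximately proportional to the standard deviations of the variables, and cosines of angles approximate their correlations.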
Correspondence analysis. Correspondence analysis (Gower, Escoufier, and de Leeuw,
this volume; Greenacre 1984) provides a simultaneous, two-dimensional display of the rows and
columns of a data matrix A which is useful where row and column entities are commensurate.
Rows and columns are each represented by points in the display. The matrix A is first
standardized to R^-1/2 A C^-1/2, where R (N x N) is the diagonal matrix of row totals and C (p x p)
the diagonal matrix of column totals. When A is a contingency table this standardization results in
displays with appealing metric properties. Ecological affinities between row-entities are inversely
proportional to the distance between row-points in the display; similarly, affinities between
column-entities are inversely proportional to the distance between column-points. In contrast, the
distance between a row-point and a column-point is not open to straightforward interpretation in
these terms, although samples and species that occur in proximity on the display are likely to be
related at least to some extent in terms of the original data. The nature of the interpretations to
which sample/species relations are open in correspondence analysis and the biplot is one important
way in which the methods differ. We have seen that correspondence analysis embodies a very
specific standardization. The effect of this standardization is to equalize the influence of samples
rich or poor in terms of the number of species they contain, and the influence of abundant or rare
species in the analysis. This standardization has no counterpart in the biplot, and this constitutes a
second important difference between the two methods. There are two further points to bear in
mind, when using correspondence analysis, if sensible interpretation of results is to be achieved,
in addition to the meaning of row/column distances and the effects of standardization. First, the
homogeneity of the sample has a bearing on sensible interpretation (see Lebart et al. 1984, p.
162). Secondly, a weight is attached to each point in the display, given by the corresponding row
or column total, which must be taken into account in interpreting the display. We remark finally
that although correspondence analysis is primarily defined for matrices of numerical frequencies,
data matrices of several kinds can be recoded in order to render them amenable to analysis.
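The standardization and decomposition may be sketched as follows (Python with NumPy; the function name and example table are ours, added for illustration; the coordinates shown differ from the usual mass-based formulation only by a constant factor, since we work with A directly rather than with the matrix of relative frequencies):

```python
import numpy as np

def correspondence_analysis(A, t=2):
    """Simple correspondence analysis of a matrix of nonnegative frequencies."""
    A = np.asarray(A, dtype=float)
    r = A.sum(axis=1)                 # row totals (weights of the row points)
    c = A.sum(axis=0)                 # column totals
    # The standardization discussed in the text: R^-1/2 A C^-1/2
    S = A / np.sqrt(np.outer(r, c))
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    # The first singular triple is trivial (singular value 1) and is discarded
    rows = (U[:, 1:t+1] * s[1:t+1]) / np.sqrt(r)[:, None]
    cols = (Vt[1:t+1, :].T * s[1:t+1]) / np.sqrt(c)[:, None]
    return rows, cols, s[1:t+1]
```

The weights attached to the points, mentioned above, are the row and column totals r and c; the weighted centroids of the row points and of the column points both lie at the origin of the display.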
Our discussion to this point has focused on applications involving a single set of variables
(m = 1). We turn now to consider the more general case for which m >= 2. We have noted that
relations among variables of different kinds feature prominently in most if not all definitions of the
term ecology. Accordingly, questions concerning relationships comprise the essential research
questions of the discipline. For concreteness and simplicity, we shall confine our attention to
relations between two sets of variables (m = 2), noting that the extension to the more general case
m > 2 is straightforward. In terrestrial plant ecology, the variables of interest are generally drawn
from among the following domains: vegetation, fauna, fungi, soil, climate, fire, space, and time.
In exploratory studies, the question of clarifying relations among variables from any two or more
of these domains almost always arises. One method which is well-suited to this purpose is
canonical correlation analysis.
Canonical correlation analysis. Canonical analysis (Hotelling 1935, 1936) has the aim of
reducing the correlation structure between two sets of variables, x (p x 1) and y (q x 1) say, to its
simplest possible form. Hotelling's solution to this problem was to seek linear transformations
from the x's to new variables u1, u2, ..., up (say), and from the y's to new variables v1, v2, ...,
vq (say), whose correlation matrix R(u,v) has a particularly simple and appealing form. More
specifically, transformations are sought that disentangle relations (correlations) within each set of
variables while simultaneously emphasizing relations (correlations) between sets. The correlation
matrix of the x's and y's, R(x,y), is in fact reduced to a form, R(u,v), that involves in most
cases only two or three nonzero quantities.
This remarkable result enormously simplifies the task of comprehending relations among
the original x's and y's. What canonical analysis does is to identify variables (the u's and v's) that
preserve and clearly reveal the internal structure of R(x,y). The new variables are associated in
conjugate pairs (uk, vk), for k = 1, 2, ..., s, where s = min(p, q), and are known as canonical
variates. Not all pairs (uk, vk) are equally useful, and in applications it is often found that all but
the two or three most highly correlated pairs can be discarded with little loss of important
information. In this way a reduction in dimensionality is achieved. Interpretation of results is
based on the point configuration that results on mapping sample-points into the low-dimensional
vector-space associated with the retained canonical variates. The resulting configuration is then
examined for its substantive implications, with the distance between points again being the
primary interpretive device. Thus, we have seen that canonical analysis is a particular form of
scaling which is very specifically oriented towards clarifying the correlation structure of
multiresponse data where the variables in question fall naturally into classes of different kinds.
The notion of arriving at a low-dimensional spatial representation of the sample, however, was not
explicit in Hotelling's original formulation.
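The canonical correlations themselves can be computed compactly as the singular values of the product of orthonormal bases for the two column spaces (Python with NumPy; the function name and test data are ours, added for illustration; this is one of several numerically equivalent routes to Hotelling's solution):

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between two data sets X (n x p) and Y (n x q)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # QR decompositions give orthonormal bases for the two column spaces
    Qx, _ = np.linalg.qr(X)
    Qy, _ = np.linalg.qr(Y)
    # The singular values of Qx^t Qy are the s = min(p, q)
    # canonical correlations, ordered from largest to smallest
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)
```

When the y's are exact linear functions of the x's the leading canonical correlations equal one; in applications it is the decline of these values that suggests how many conjugate pairs merit retention.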
The results sketched above for m = 2 have been generalized to the case m > 2. Where all m
sets of variables are on an equal footing, solutions have been proposed by Horst (1961),
Kettenring (1971), van de Geer (1984) and Verdegaal (1986). Generalizations of quite different
kinds have also been proposed. It is usual in canonical analysis to assume, for example, that the
relationship between x and y is symmetric. The theory has however been extended to situations in
which the relationship is asymmetric (van den Wollenberg 1977; Tyler 1982; Israels 1984). Other
extensions that allow the efficient analysis of nonmetric data have been proposed by van der Burg
and de Leeuw (1983), van der Burg (1985), and Verdegaal (1986). The practical significance of
these and similar developments for terrestrial plant ecology is that canonical analysis clearly
emerges as a method of remarkable flexibility and of correspondingly wide applicability.
Convincing ecological applications of canonical analysis are nevertheless few.
Concluding remarks. Applications of methods such as principal components analysis, the
biplot, correspondence analysis, and canonical analysis, which depend on the data only through
the covariance matrix or some similar matrix, will be most satisfactory when the sample is
homogeneous and the data distribution elliptically symmetric, not too long-tailed and free from
contamination by outlying or extraneous observations. Thus, before embarking on linear reduction
of dimensionality, one is well-advised to examine the data and to proceed only where these are
shown to conform to the specifications mentioned. Procedures for assessing multiresponse data
for this purpose have been described by Gnanadesikan (1977). In addition, it is well to bear in
mind that all real problems are nonlinear. Linear methods are therefore at best only first-level
approximations to problems of greater complexity. Yet, where they are applicable, linear methods
have the merits of being sufficiently simple to be mathematically tractable and sufficiently realistic
to allow sensible interpretation of results. Algebraically, the methods lead to straightforward
matrix decompositions with closed-form solutions for which efficient, numerically stable
algorithms are widely available. For these reasons, methods of classical linear multivariate
analysis are likely to remain the methods of choice for aiding the comprehension of
high-dimensional data whenever the data meet the specifications set out above.
In selecting a method for linear reduction of dimensionality, several matters have to be
weighed. Where relations between two or more sets of variables are the focus of attention,
canonical analysis in one form or another is almost always likely to be the preferred method. In
studies involving variables of just one kind, principal component analysis offers an uncomplicated
approach and one which in addition provides insight as to the role of the species in the analysis.
Yet, the distance coefficient implicitly used is arbitrary and often difficult to justify in practice.
Principal coordinates analysis overcomes the latter difficulty, but at the expense of forfeiting
information on the species. The biplot and correspondence analysis are available where samples
and species jointly are the focus of attention. Of the two, the biplot has the greater flexibility and
yields a precise representation of species/sample relations. Correspondence analysis may be
preferred where the data are of a particular kind, namely commensurate, where the declared
ecological objective unequivocally calls for a quite specific standardization of the data, and where
an exact specification of species/samples relations is not of overriding importance.
Although the theory of classical multivariate analysis has existed for something like fifty
years, and despite its straightforward algebraic basis and computational requirements, the impact
of classical multivariate analysis on our understanding and appreciation of terrestrial vegetation has
scarcely been beneficial. Convincing ecological applications in which due care and attention has
been paid to the selection of a method, to the implementation of the chosen method, and to the
interpretation and reporting of results are few and far between. It is clear from the ecological
literature that the methods themselves are generally poorly understood by terrestrial plant
ecologists. Three requirements for incisive analysis are almost always overlooked: the need to
ensure that a chosen method is properly matched to the declared ecological goal, the need to
carefully consider the effects of the implicitly or explicitly chosen origin and unit of scale on the
outcome of the analysis, and the need for care and sound judgement in implementing a method if
analysis is to be productive. We shall defer consideration of the reasons for this state of affairs and
its implications until section 5.

2.2 Nonlinear reduction of dimensionality

Where the data distribution is other than elliptically symmetric, important structures may be
present that cannot be captured adequately by linear associations or correlations between variables
- in other words, by a covariance or correlation matrix. Such features include the tendency for
sample-points to be concentrated close to certain kinds of curved, t-dimensional surfaces or to be
aggregated into clusters, either discrete or overlapping. In contrast to linear effects, the variety of
shapes and other attributes of nonlinearity are many. Linear methods in these cases will be
deficient and there is a need for methods that are sensitive to effects of the kinds described, even
though it is impossible to specify all the many possibilities in advance. Before proceeding to
consider methods for nonlinear reduction of dimensionality, we make some general observations
regarding these methods, contrasting them in the process with procedures for linear reduction of
dimensionality.
We remark first of all that the starting point for classical, linear multivariate analysis is a
model which is assumed to describe the distribution of the data to be analyzed. Distributional
specifications figure prominently, the model being fitted under the assumption that the
distributional specifications are in fact satisfied. Nonlinear scaling methods, in contrast, begin
with the data rather than a model, the analysis being directed towards finding a structure or model
that describes the data. The tightly-specified distributions of classical statistics are thus entirely
circumvented. In short, nonlinear reduction of dimensionality is data analytic rather than
confirmatory in character, exploratory rather than inferential. Second, the search for coordinates in
a reduced dimensional space in nonlinear scaling is not restricted to coordinates which are linear
functions of the original coordinates of the data-points. This freedom imparts a degree of elasticity
to the shape of configurations implied by the data which can profitably be fitted by nonlinear
scaling which is not shared by methods for linear reduction of dimensionality.
A third point concerns the measurement level of the data for analysis. The methods of
classical, linear multivariate analysis were proposed with metric (interval or ratio scaled) data
primarily in mind. Nonmetric (nominal or ordinal) data are for the most part less amenable to
incisive analysis by classical methods. Nonlinear scaling in contrast is directed primarily towards
the analysis of nonmetric data. The preoccupation with nonmetric data is justified on several
grounds. For example, the apparent precision of metric or quantitative data in terrestrial plant
ecology is all too often spurious, the extent of measurement error being such that the data contain
little reliable information beyond their rank order. A second point is that nonmetric data are
comparatively resistant to the effects of outlying observations or other peculiarities in distribution
to which ecological data are all too prone. Further, nonmetric data are almost always speedier and
cheaper to acquire than metric data. Finally, we shall see that nonmetric data can be transformed in
such a way that nonlinear structure in the data can frequently be linearized and hence captured
parsimoniously if analyzed appropriately. We find these arguments in favour of nonmetric data
compelling and advocate their widespread adoption in ecological studies of terrestrial plant
communities. Indeed, the unreliability of some data collected, and assumed to be metric for the
purpose of data analysis, together with the availability of efficient means for analyzing nonmetric
data, suggest that it would often be advantageous to replace these data by their ranks and then to
analyze (or re-analyze) them accordingly. In much of what follows we shall be primarily, although
not exclusively, concerned with nonmetric data.
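Replacing data by their ranks is straightforward; the following sketch (Python with NumPy; the function name is ours, added for illustration) ranks each variable separately, assigning average ranks to tied values:

```python
import numpy as np

def to_ranks(A):
    """Replace each column of A by its ranks (average ranks for ties)."""
    A = np.asarray(A, dtype=float)
    R = np.empty_like(A)
    for j in range(A.shape[1]):
        col = A[:, j]
        order = np.argsort(col, kind="stable")
        ranks = np.empty(len(col))
        ranks[order] = np.arange(1, len(col) + 1)
        # average the ranks within each group of tied values
        for v in np.unique(col):
            tie = col == v
            ranks[tie] = ranks[tie].mean()
        R[:, j] = ranks
    return R
```

The ranked matrix can then be submitted to any of the analyses discussed above in place of the original, possibly unreliable, metric values.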
The principles used to guide and inform the analysis of nonmetric data have been
well-summarized by Takane (1985) and by Carroll (this volume). A key notion in this area is that
nonmetric data are nonlinear transforms of metric data. Thus, if appropriate transformations of
initial, nonmetric data can be found, the transformed data can be analyzed by some such
quantitative procedure as multidimensional scaling. Unlike other procedures that require data
transformations, the form of any particular transformation in nonlinear reduction of dimensionality
does not have to be specified in advance; optimization of a suitable index or loss function will
yield both the best transformations and the best parameter estimates for a given model within the
least squares framework.
An important special case of multidimensional scaling is represented by multidimensional
unfolding (Heiser, this volume). Both are scaling methods used to represent dissimilarity data, the
distinction being that whereas multidimensional scaling involves a single set of objects 0, in
multidimensional unfolding the objects are partitioned into two finite subsets, O1 and O2. In many
applications O1 corresponds to the row-objects (e.g., samples) of a rectangular data matrix A (N
x p) and O2 to the column-objects (e.g., variables) of the same matrix. Multidimensional scaling
and unfolding are likely to be most useful for nonlinear reduction of dimensionality where the
structure in the data is largely continuous.
Multidimensional scaling (Carroll, this volume) consists of a family of methods for the
spatial description of relations among objects which are free from both the distributional
restrictions and the requirement for metric data associated with classical, variance-based
multivariate analysis. Multidimensional scaling therefore significantly widens the domain of
applicability of multivariate methods in areas such as terrestrial ecology. Nevertheless,
characteristics of many ecological data sets, such as the partitioning of variables into subsets and
the occurrence of heterogeneous data structures, have not featured prominently in the development
of multidimensional scaling methodology. As a result, a sizeable gap exists between the
methodological requirements of terrestrial plant ecology, on the one hand, and the scaling methods
available to meet these needs on the other. Nevertheless, the range of substantive goals that can be
addressed by multidimensional scaling and the very general conditions under which scaling
methods are applicable, will in themselves suffice to ensure that multidimensional scaling will
come to make a significant contribution to numerical ecology.
The basic methods of two-way metric and nonmetric scaling have been shown to have
much to contribute to large-scale investigations of vegetation (e.g., Kenkel 1986; Kenkel and
Orlóci 1986). In studies where temporal as well as spatial variation is of interest, and in a variety
of other situations, three-way scaling provides a wealth of opportunities. Three-way scaling with
linear constraints, for example, extends the applicability of multidimensional scaling to factorially
designed analysis of variance experiments. Accordingly, three-way scaling enables time trends,
treatment effects, individual differences, and numerous comparable conditions to be explored
directly within the context of multidimensional scaling. Opportunities of a different kind are
afforded by constrained multidimensional scaling. Constraints on the coordinates of a
configuration matrix X, for example, may be useful where relations between objects are generated
by seasonal or other cyclical processes, not necessarily time-related. Or, constraints may be used
to introduce information on external variables of various kinds in order to arrive at more
parsimonious representations or richer interpretations. In studies of vegetation, such external
variables could be soil or climatic variables, the geographical coordinates of sites, or measures of
competition between species. In large-scale vegetation surveys, nonlinear structure of some
complexity can reasonably be anticipated in the data. Parametric mapping has proved useful in
such cases (Noy-Meir 1974a, 1974b). The question of the appropriateness of imposing a
Euclidean structure on the observed dissimilarities, which arises throughout multidimensional
scaling, is especially pertinent here. Further, parametric mapping is known to be sensitive to the
presence of error in the data, while the discontinuity index, k, is not a reliable guide to
goodness-of-fit (Kruskal and Carroll 1969). Consequently, it can be difficult to judge whether a
solution (configuration) merits serious consideration or whether it should be discarded. Where an
interest is declared in relations between vegetation samples and their component species, unfolding
offers promise. The spatial representations provided are succinct characterizations of vegetation
which simultaneously provide a wealth of insight into structure, composition and variation. Here
also, opportunities for the introduction of constraints on the configuration open the way for
incorporating information on environmental, experimental design, or other variables.
Altogether, it is plain that the scope of the opportunities for the productive use of
multidimensional scaling in terrestrial plant ecology is considerable. At the same time, it is well to
bear in mind that scaling methods are delicate tools which must be used with care and good
judgement if sensible results are to be attained. Even the initial selection of a method is unlikely to
be straightforward for the uninitiated. We turn now to consider some of the issues that arise in
using multidimensional scaling and in interpreting the results provided.
Metric scaling is computationally more straightforward than nonmetric scaling but involves
the assumption that the relation between observed dissimilarities and fitted distances is linear. In
nonmetric scaling, a model is fitted while attempting as far as possible to preserve a rank-order
relation between dissimilarities and distances. This feature imparts to nonmetric scaling a degree of
flexibility not possessed by metric scaling in fitting configurations that lie close to mildly curved
manifolds in p-space. On the other hand, nonmetric scaling is less resistant to degenerate solutions
and local optima than most varieties of metric scaling. Further, nonmetric scaling has no
closed-form solution; fitting is an iterative process that proceeds by successive refinement of a trial
configuration in a space of chosen dimensionality. Often, the process converges to an optimum
other than the global optimum, or it may not converge at all. Sound judgement based on
experience is required to recognise such occurrences. A useful check as to whether a loss function
has been successfully minimized is provided by a plot of dissimilarities or disparities (transformed
dissimilarity values) against fitted distances. Factors that have a strong bearing on the solution
obtained are the choices for the starting configuration and the dimensionality. Evidently, nonmetric
multidimensional scaling calls for the exercise of considerable skill and expertise - familiarity with
the model in question and its properties, together with the care and insight necessary for its proper
implementation.
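The loss function and the disparities just mentioned can be made concrete as follows (Python with NumPy; the function names are ours, added for illustration): given a configuration, the disparities are obtained by monotone (isotonic) least-squares regression of the fitted distances on the rank order of the dissimilarities, using the pool-adjacent-violators procedure, and the badness of fit is summarized by Kruskal's stress-1.

```python
import numpy as np

def pava(y):
    """Monotone nondecreasing least-squares fit by pool-adjacent-violators."""
    merged = []
    for v in np.asarray(y, dtype=float):
        merged.append([v, 1.0])               # [block mean, block weight]
        while len(merged) > 1 and merged[-2][0] > merged[-1][0]:
            m2, m1 = merged.pop(), merged.pop()
            w = m1[1] + m2[1]                 # pool adjacent violating blocks
            merged.append([(m1[0]*m1[1] + m2[0]*m2[1]) / w, w])
    out = []
    for mean, w in merged:
        out.extend([mean] * int(w))
    return np.array(out)

def stress1(dissim, dist):
    """Kruskal's stress-1 for vectors of dissimilarities and fitted distances."""
    order = np.argsort(dissim)
    dhat = np.empty_like(dist)
    dhat[order] = pava(dist[order])           # disparities
    return np.sqrt(((dist - dhat)**2).sum() / (dist**2).sum())
```

A stress of zero indicates that the rank order of the dissimilarities is perfectly preserved by the fitted distances; plotting disparities against distances provides the diagnostic check described above.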
The robustness of multidimensional scaling against disturbances of different kinds has
received attention. Classical scaling has been shown to be little affected by random error (Kruskal
1977; Sibson 1979; Sibson et al. 1981). Scaling is also robust under variation in the method used,
except perhaps in certain unusual circumstances. Indeed, robustness against variation in method is
such that for most interesting sets of data, metric and nonmetric methods tend to yield
configurations which are remarkably similar, despite the algebraic and computational differences
involved (Carroll and Kruskal 1978). The results of multidimensional scaling do, however,
depend on the domain from which the objects of interest are drawn, particularly on its composition
and extent. That the composition of the domain should be influential in this way is perhaps not
altogether surprising. Fortunately, graphical procedures for assessing the extent of
sample-dependence or sample-specificity are available (Heiser and Meulman 1983a, 1983b;
Weinberg et al. 1984; de Leeuw and Meulman 1986). The effect of extent or breadth is more
subtle. As breadth increases so also do opportunities for the data distribution to become
increasingly heterogeneous. With heterogeneity, the likelihood that important or interesting
structures will go undetected is greatly increased. Where the data structure is neither strictly
continuous nor strictly discontinuous, but is somewhere between the two, as is common in
terrestrial ecology, the result of scaling is open to domination by chance features of the data. What
is more, no warning is given that a configuration has been so determined. In other words, where a
data distribution is not well-behaved, the results of scaling, while having every appearance of
normality and of being acceptable at face value, are more than capable of leading one astray. Thus,
it seems to us that in the context of large-scale vegetation surveys, the lack of robustness of
multidimensional scaling as the breadth of the domain in question increases represents perhaps its
most vulnerable characteristic. Yet it is very much part of the flavour of multidimensional scaling
to neglect the data distribution altogether.
Remedial measures are nevertheless possible. Thus, a simple scatterplot of dissimilarity
values against fitted distances may be used to diagnose heterogeneity following analysis. An even
stronger case exists for examining the form of the data distribution itself before embarking on
scaling. As the dissimilarity matrix D in terrestrial plant ecology is almost invariably derived from
an N x p matrix A of multiresponse observations, such a step would be perfectly feasible. A
quantile-quantile (Q-Q) probability plot constructed from the rows, ai, of A would shed light on
the shape of the data distribution, and in particular on its coherence (Gnanadesikan 1977; Campbell
1980). Where coherence is demonstrated, D can be calculated and scaling undertaken in the usual
way. Otherwise, one or more dissimilarity matrices, each corresponding to a substantially
coherent subset of the rows of A, might be calculated, and each then scaled in turn. The procedure
would unavoidably fragment the analysis, but would have the considerable merit of being less
likely to lead one astray than an analysis based on D as a whole. A complementary step would be
to employ a loss function other than a least-squares one, since the sensitivity of least-squares
functions to disturbances of the kind in question is well-known. Some opportunities in this
direction are described and illustrated by Greenacre and Underhill (1982, p. 239).
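The Q-Q assessment of coherence rests on the squared Mahalanobis distances of the rows of A from their centroid, whose ordered values are plotted against quantiles of a chi-squared distribution with p degrees of freedom (Gnanadesikan 1977). A minimal sketch of the distance computation (Python with NumPy; the function name is ours, and the plotting step is omitted):

```python
import numpy as np

def mahalanobis_sq(A):
    """Squared Mahalanobis distances of the rows ai of A from the centroid."""
    A = np.asarray(A, dtype=float)
    Ac = A - A.mean(axis=0)
    S = np.cov(Ac, rowvar=False)    # sample covariance matrix
    Sinv = np.linalg.inv(S)
    # d_i^2 = (a_i - abar)^t S^-1 (a_i - abar), for i = 1, ..., N
    return np.einsum('ij,jk,ik->i', Ac, Sinv, Ac)
```

Rows whose distances depart markedly from the reference line of the plot signal outliers or a lack of coherence, in which case the analysis might proceed on coherent subsets as suggested above.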
The preceding survey draws attention to certain topics that require vigilance in
implementing multidimensional scaling and in interpreting its results; scaling methods are delicate
tools whose responsible use calls for insight and for the exercise of appreciable levels of expertise
and care. On the other hand, it seems to us that while there is certainly scope for further
methodological development, multidimensional scaling offers a variety of opportunities for the
productive analysis of ecological data. Terrestrial ecologists are fortunate to have at their disposal
such a rich variety of methods which liberate them from the linearity constraint of classical
multivariate analysis. The substantive goals that can be addressed by multidimensional scaling and
the very general conditions under which its methods are applicable will suffice to ensure that
multidimensional scaling will come to make a significant contribution to numerical ecology. At the
time of writing, however, multidimensional scaling remains largely unknown among terrestrial
plant ecologists. Furthermore, there are two areas of potential concern. First, we have seen that
the available methodology is not rich in techniques for dealing with two essentially ecological
problems: clarifying relations between sets of associated variables of different kinds, and
manipulating data sets whose distributions characteristically are complex mixtures of several
component multivariate distributions. Constrained multidimensional scaling goes some way
towards meeting the first of these problems and may be amenable to further development. With
respect to data distributions that are not well-behaved, distance-based models that are robust under
such conditions would be enormously beneficial. There is relevant work. Degerman (1970)
proposed a procedure for the spatial representation of data that simultaneously embody discrete
and continuous elements. While Degerman's results are restricted to a rather specific class of data
structures (those with non-overlapping discrete elements), his work could provide a valuable
starting point for further work. Related work within the classical multivariate analysis framework
has been described by Noy-Meir (1971, 1973) and, in the multidimensional scaling tradition, by
Carroll and Pruzansky (1984). Second, the wise selection of a multidimensional scaling model
presupposes familiarity with the existence and properties of a large and growing number of
techniques (see Carroll, this volume). Related challenges are those of the satisfactory
implementation of a selected model and the adequate interpretation and reporting of results, each of
which calls for a comparable level of skill to that of model selection.

3. NON-LINEAR MULTIVARIATE ANALYSIS

We use the term nonlinear multivariate analysis to refer to a class of methods which are
invariant under nonlinear transformations of the variables, that is, in the sense of Gifi (1981)
and de Leeuw (1984, 1987a, 1987b). The ALSOS-system of Young et al. (1980) is a
related approach which overlaps with Gifi's system (de Leeuw 1987a). For present purposes,
distinctions between the two approaches (Meulman 1986) are not important and are neglected.
Fundamental to both conceptualizations of multivariate analysis are the notions that all data
irrespective of measurement level are qualitative, and that qualitative data are nonlinear
transformations of metric data.
The idea that all data are qualitative is justified by appeal to the finite precision of the
measurement process (Takane et al. 1977; Young 1981). In terrestrial plant ecology, the precision
of measurements is generally low, so that data collected and assumed for the purpose of analysis
to be metric, commonly do not meet this standard. For Gifi, the well-known principal
measurement levels of Stevens (1962), that is nominal, ordinal, interval and ratio, are regarded not
as a property of data but as a set of restrictions that may or may not be imposed in any subsequent
manipulation of the data. The notion that qualitative data are nonlinear transformations of metric
data draws empirical support from the common observation that nonlinear configurations in
geometric representations of multiresponse data are more frequent with qualitative than with metric
data. It follows that if suitable transformations of nonmetric data can be found, the transformed
data will be metric and therefore amenable to analysis by any appropriate method of standard linear
multivariate analysis. In contrast to other situations that call for the use of re-expression, the form
of a particular transformation in nonlinear multivariate analysis does not have to be pre-specified;
optimization of a suitable loss function will yield the best transformations and the best parameter
estimates for the model in question, within the least squares framework.
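The alternating scheme just described can be sketched for ordinal variables and a rank-1 model. The sketch below assumes numpy; the function names pava and als_optimal_scaling are ours and purely illustrative, and a serious analysis would of course use HOMALS or a comparable program:

```python
import numpy as np

def pava(y, order):
    """Pool-adjacent-violators: least-squares fit of y that is
    non-decreasing along the ordering 'order' of the raw variable."""
    z = y[order].astype(float)
    vals, wts = [], []
    for v in z:
        vals.append(v)
        wts.append(1)
        while len(vals) > 1 and vals[-2] > vals[-1]:   # merge violating blocks
            w = wts[-2] + wts[-1]
            vals[-2:] = [(vals[-2] * wts[-2] + vals[-1] * wts[-1]) / w]
            wts[-2:] = [w]
    out = np.empty(len(y))
    out[order] = np.repeat(vals, wts)
    return out

def als_optimal_scaling(X, iters=25):
    """Alternating least squares in the spirit of ALSOS/HOMALS (a sketch):
    alternate a rank-1 least-squares model fit with an ordinal (monotone)
    re-quantification of each column.  Assumes no constant columns."""
    Z = (X - X.mean(0)) / X.std(0)                 # initial metric coding
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        Zhat = s[0] * np.outer(U[:, 0], Vt[0])     # current model approximation
        for j in range(Z.shape[1]):
            order = np.argsort(X[:, j], kind="stable")
            q = pava(Zhat[:, j], order)            # optimal monotone transform
            sd = q.std()
            Z[:, j] = (q - q.mean()) / sd if sd > 0 else 0.0
    return Z
```

Each transformed column is, by construction, a monotone function of the original ordinal variable, so the measurement level is respected while the transformed data are brought as close as possible to the low-rank model.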
The emphasis in nonlinear multivariate analysis is very much on the analysis of linear
relations among nonmetric data, and not at all on nonlinear approximation. Study of linear
relations between any functions, f(x) and g(y), of two variables x and y, is precisely the study of
nonlinear relations between x and y. Accordingly, the nonlinear multivariate analysis problem is to
find optimal transformations - transformations of qualitative variables for which the transformed
variables are as linearly related as possible. Thus the nonlinear part of nonlinear multivariate
analysis concerns the transformations of the variables. Equivalently, the problem can be stated as
that of finding transformations that lead to the optimization of some pertinent criterion or loss
function. The general objective of nonlinear multivariate analysis is very much the same as that of
multidimensional scaling, namely to arrive at a spatial representation of a data matrix that conveys
as many useful relations as possible. At the same time, linear reduction of dimensionality is an
essential ingredient of nonlinear multivariate analysis, and in this respect nonlinear multivariate
analysis resembles classical multivariate analysis. In the nonlinear case, however, reduction of
dimensionality follows nonlinear transformation of variables.
Nonlinear multivariate analysis provides a very general framework for the quantitative
analysis of qualitative data. In facilitating the application of classical linear multivariate methods to
nominal and ordinal data, the new methodology represents a significant development. We turn
now to consider several implications of nonlinear multivariate analysis for terrestrial plant ecology.
In the first place, the metric or quantitative data still widely used in terrestrial ecology can
be dispensed with entirely. Metric field data are decidedly time consuming and expensive to
acquire. Nonmetric data, in contrast, are far easier and cheaper to obtain, so enabling larger
samples to be obtained for a given expenditure of time and effort. We have seen that nonlinear
multivariate analysis opens the way for the efficient analysis of nonmetric data. There is, however,
a price to be paid for this facility. The price is that two sets of parameter estimates are required
(those of the optimal scaling and model parameters) as opposed to a single set of estimates (model)
for the usual linear analyses. The precision of parameter estimates is very much a function of
sample size. It so happens that the larger samples, which become feasible with the acquisition of
nonmetric data, very nicely offset the reduced precision of estimates which would otherwise result
in the nonlinear case because of the additional estimates required. Second, nonlinearities due to the
coding of nonmetric variables are circumvented by finding optimal transformations. The practical
significance of this feature is that an improvement in the fit of a given model could reasonably be
anticipated. In other words, more informative graphical summaries of the plant communities or
ecosystems of interest could be expected than would result from a linear analysis of the same data.
A third aspect of nonlinear multivariate analysis is that questions of data expression do not
have to be solved before a chosen method is applied. Data expression includes as special cases
centering, standardization and the choice of a similarity measure. In practice, a rationale for
choosing between the various options is sometimes lacking and arbitrary choices which lack
theoretical justification are often made. In nonlinear multivariate analysis, however, the expression
of a variable in a data matrix is regarded as essentially a convention, merely a coding. As a
consequence, the question of re-expression does not have to be solved before a technique is
applied; rather, it is an important part of the methodology of nonlinear multivariate analysis to find
appropriate re-expressions. In other words, optimal scaling removes the arbitrariness from
re-expression. Fourth, nonlinear multivariate analysis is very much oriented towards statistical
data analysis as distinct from statistical inference. In this respect, the new methodology accords
closely with the realities of ecological data, where departure from p-variate normality or even
elliptical symmetry is the rule rather than the exception. It is for precisely this reason that the
models of classical multivariate analysis all too often prove to be too tightly-specified to be used
responsibly in studies of terrestrial vegetation. Nonlinear multivariate analysis, in contrast, is
comparatively free from distributional constraints. More realistic analyses and sounder
conclusions are to be expected. Lastly, nonlinear multivariate analysis with optimal scaling can
under appropriate conditions be used to generate a surprisingly wide class of techniques, as de
Leeuw (1987a) has shown. We have already seen that where a least squares procedure for
analyzing metric data is known, then that procedure can also be used to analyze qualitative data
simply by alternating the procedure with optimal scaling. In fact, nonlinear multivariate analysis
subsumes and generalizes all linear multivariate methods to yield a unified framework for linear
and nonlinear multivariate analysis, and so it provides a common approach to a diversity of
ecological goals.
Having mentioned some of the ecological benefits to be expected from nonlinear
multivariate analysis, we draw attention to precautions which for best results need to be observed
in applying nonlinear methods. These are (a) the desirability of assessing the joint distribution of
the optimally scaled variables before fitting a linear model; (b) the need for care in model fitting;
and (c) the importance of assessing the stability of the results obtained.
Joint distribution of optimally scaled variables. Nonlinear multivariate analysis can be
viewed as a class of methods that have as their common starting point a correlation matrix (de
Leeuw 1987a, 1987b). Our remarks in this section are made with this observation very much in
mind.
It is very much part of the flavour of nonlinear multivariate analysis that statements about
the joint distribution of the data are almost entirely avoided. The resulting freedom from
distributional constraints will be appreciated in community ecology, where it is a matter of
common experience that the distributional requirements of linear multivariate analysis are
sometimes restrictive. Yet, in ecological applications of nonlinear multivariate analysis, it seems to
us that it would be unwise to disregard the data distribution entirely. In a previous section, the
view was expressed that the kind of data distributions with which we deal in terrestrial plant
ecology consist of some unknown mixture of multivariate distributions which differ in location
and which may or may not overlap, but whose characteristics are otherwise unspecified. It is
pertinent at this point to enquire as to the effect of such a distribution on nonlinear multivariate
analysis. Just how far can a data distribution depart form a homogeneous, p-dimensional ellisoid
of the kind implied by multinormality and yet yield sensible results? Given the sensitivity of
second-order linear multivariate methods to disturbances of the kind described (Devlin et al.
1981), it seems to us that the consequences of heterogeneity, at least, on the structure of the
correlation matrix merit attention in nonlinear multivariate analysis.
The sensitivity of least-squares based methods to disturbances in the data distribution is
leading, in careful applications of linear multivariate methods, to the fitting of a model only where
the data distribution has first been examined and shown to be well-behaved. Thus, the first step is
to probe the data distribution, and to proceed to the next stage (fitting) only where evidence
justifying this step can be adduced. Good examples of this practice are provided by Campbell
(1980) and by Smith et al. (1983). Such measures do not seem to be part of the new
methodology. Instead, in nonlinear multivariate analysis, the impression is conveyed that the
optimal transformations will, in themselves, suffice to bring the joint distribution to an acceptable
form, irrespective of the presence of gross heterogeneity or other more subtle disturbances. There
is evidence in support of the robustness of nonlinear methods against such features. Nominal and
ordinal data are certainly less sensitive than metric data to any peculiarities which may be present
in the data. Further, the stability of the data transformations themselves has been clearly
demonstrated in certain applications (e.g., van der Burg and de Leeuw 1983). The use of loss
functions other than unweighted least squares functions might further strengthen robustness.
Nevertheless, it would seem to us unwise in ecological applications to disregard the data structure
altogether.
In the absence of some such evidence as a coherent and substantially linear Q-Q probability
plot of the data prior to fitting a model, the question of the impact of the data on the outcome of
analysis must remain equivocal. From the alternating least squares method of algorithm
construction, which at the time of writing is an integral part of nonlinear multivariate analysis, it is
plain that the data distribution cannot be examined in the usual way before fitting a model. What
can, however, be done is to first obtain the matrix of optimally transformed variables, using for
example the step-one HOMALS procedure of Gifi (1981, sect. 3.8.2). An even better alternative
might be to obtain the optimally transformed variables from a preliminary run of the nonlinear
analysis in question, performed solely for this purpose, as the scaling of optimally transformed
variables is dependent on the criterion (loss function) minimized. The joint distribution of the
transformed variables could then be examined, using a Q-Q plot for the purpose, and, where
found to be well-behaved, the analysis completed by the routine application of a standard classical
linear multivariate method. Where the joint distribution proves to be other than homogeneous and
elliptically symmetric, steps to bring the distribution to a more acceptable form might be feasible.
In place of the matrix of transformed variables, the induced correlation matrix of the optimally
transformed variables (van der Burg 1985, p. 38; de Leeuw 1987a) might be used to shed light on
the data distribution.
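One way to carry out such an examination can be sketched as follows, assuming numpy and scipy (the function name chi2_qq is ours): the sorted squared Mahalanobis distances of the transformed variables are paired with chi-square quantiles, which is precisely the material of the Q-Q probability plot referred to above. A roughly straight plot, and hence a correlation near one between the two sequences, is consistent with a homogeneous, elliptically symmetric distribution:

```python
import numpy as np
from scipy import stats

def chi2_qq(Z):
    """Return sorted squared Mahalanobis distances of the rows of Z,
    paired with the corresponding chi-square quantiles, for a Q-Q check
    of the joint distribution of the (optimally transformed) variables."""
    n, p = Z.shape
    Zc = Z - Z.mean(0)
    Sinv = np.linalg.inv(np.cov(Zc, rowvar=False))
    d2 = np.sort(np.einsum('ij,jk,ik->i', Zc, Sinv, Zc))   # row quadratic forms
    q = stats.chi2.ppf((np.arange(1, n + 1) - 0.5) / n, df=p)
    return d2, q
```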
The detection of disturbances in a data distribution by a rather different procedure from that
described above has been described by van der Burg (1985, p. 41). Van der Burg's procedure is
as follows. Where the presence of outlying observations is disclosed by a nonlinear analysis,
either the implicated variables are re-coded, or the samples in question are otherwise dealt with.
The entire analysis is then re-run. The procedure proposed above, however, is both more
systematic and more revealing as to the shape of the data distribution as a whole than van der
Burg's method. A price would have to be paid for the added refinement. Thus, every nonlinear
analysis would involve at least three steps: a preliminary analysis to obtain the optimally
transformed variables, a probability plot to disclose the joint distribution of the transformed
variables, followed by a classical linear multivariate analysis to fit some suitable model to the
optimally scaled variables where the shape of the distribution of these is shown to be acceptable
for this purpose.
Model fitting. As in multidimensional scaling, fitting a nonlinear multivariate model is an
iterative procedure, with all the attendant sensitivity to a variety of factors. Iteration normally
commences by taking as starting values the actual values of the data to be analyzed, having first
re-expressed quantitative data in discrete form where necessary. Other starting values, however,
are admissible. Fitting is effected by optimizing an appropriate criterion or loss function with
respect to a solution in some pre-specified dimensionality. Convergence may be to a local rather
than a global optimum, an outcome which is not revealed by inspection of the loss function itself.
Trial and error using different starting values and different choices of dimensionality may be
helpful in distinguishing local from global optima. Unlike the equivalent linear method, the
solution in nonlinear multivariate analysis is not nested. That is to say, the coordinates of a
solution in t dimensions are not equal to the first t coordinates of the (t+1)-dimensional solution.
Stability. Results in nonlinear multivariate analysis are dependent on characteristics of the
particular sample analyzed, to an even greater extent than in classical linear multivariate analysis,
since results in nonlinear analysis tend to be sample-specific. As a consequence, while the results
and any conclusions drawn from them may well be valid for the sample actually analyzed, there is
an element of uncertainty or vagueness about any wider validity which the results and conclusions
may possess. This feature of nonlinear multivariate analysis is a direct consequence of the large
number of parameter estimates required. Invariably, there are two sets of these (estimates for
optimal scaling parameters and for model parameters), compared with just one set in linear
multivariate analysis (model parameters). Sample-specificity is very much a function of the total
number of parameter estimates required relative to sample size. In view of the implications of
sample-specificity, it is always worthwhile to assess the extent of this condition in applications.
Stability refers to the extent to which the results of an analysis are resistant to small
perturbations in the data. Small perturbations might reasonably be expected in taking one or more
additional samples, with the same specifications and from the same domain as the original sample.
Alternatively, comparable additional samples may be obtained by resampling from the original. The
jackknife and the bootstrap are both computer-intensive re-sampling schemes which have been
used to examine the stability of results in nonlinear multivariate analysis (van der Burg 1985; van
Rijckevorsel et al. 1985). The use of some such procedure to assess the sample-specificity of
results is best regarded as an integral part of any nonlinear multivariate analysis.
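A bootstrap assessment of this kind can be sketched as follows, restricted for simplicity to the leading eigenvector of the correlation matrix; the function name is ours and the scheme is merely illustrative of the resampling idea:

```python
import numpy as np

def bootstrap_loading_stability(X, n_boot=200, seed=0):
    """Mean |cosine| between the leading eigenvector of the sample
    correlation matrix and its bootstrap replicates.  Values near one
    indicate loadings that are stable under resampling."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    ref = np.linalg.eigh(np.corrcoef(X, rowvar=False))[1][:, -1]
    sims = []
    for _ in range(n_boot):
        Xb = X[rng.integers(0, n, n)]              # resample rows with replacement
        v = np.linalg.eigh(np.corrcoef(Xb, rowvar=False))[1][:, -1]
        sims.append(abs(ref @ v))                  # |cos| is sign-invariant
    return float(np.mean(sims))
```

A value of the returned index appreciably below one signals sample-specificity, and hence results whose wider validity is open to question.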
Concluding remarks. The essential point about nonlinear multivariate analysis is that it
extends the applicability of all the methods of classical linear multivariate analysis to nonmetric
data. The result is models that fit better, yet are more parsimonious, than would otherwise be the case.
Nonlinear multivariate analysis also subsumes methods that are well-suited to the analysis of
relationships between sets of variables of different kinds, a problem that we have seen is pertinent
in a large class of ecological endeavours. There are other benefits. Nonlinear methods are
well-suited to dealing with large, relatively unstructured (i.e., homogeneous) data sets for which
there is little in the way of prior information about physical or causal mechanisms. A wealth of
new opportunities for productive analysis is therefore provided.
Despite the flexibility that nonlinear multivariate analysis brings to data analysis, the
approach has its limitations and it is only proper that these be examined. It is well to recognize, for
example, that nonlinear methods comprise a very restricted class of multivariate techniques; they
are methods that depend on the data only through second-order moments and product-moments.
More specifically, nonlinear multivariate analysis is confined strictly to methods that have a
correlation matrix as their starting point (de Leeuw 1987a, 1987b). This represents a very severe
cutting operation. Furthermore, the essential role of the correlation matrix immediately raises two
issues, namely a need to consider the consequences of (a) the implied centering and scaling of the
data; and (b) the likely impact of the data themselves on the outcome of the analysis. Now,
centering and standardization may each be called for where there are compelling ecological
grounds and where the sample is substantially homogeneous. If the sample is not homogeneous,
operations involving sample means and sample standard deviations are not well-founded. The
point to be emphasized here is that in nonlinear multivariate analysis, freedom to allow substantive
or other pertinent considerations to guide and inform the crucial issue of choice of scale unit and
origin is lost. Even where a sample is substantially homogeneous, it is necessary to bear in mind
that a correlation matrix has a breakdown point of zero percent. The sensitivity of the correlation
matrix to possible disturbances in the data distribution accordingly has to be taken very seriously.
Evidently, in ecological studies, it could be prudent if not mandatory to examine global and local
features of the data structure before embarking on nonlinear multivariate analysis. The Q-Q
probability plot would provide one convenient means of obtaining insight into the coherence
of the sample as a whole, as well as the symmetry and tail characteristics of the joint
distribution. Other procedures for the same purpose are also available (e.g., Friedman and Rafsky
1981). Where the data on examination prove to be free from irregularities, one might proceed to fit
some appropriate nonlinear model. Otherwise, steps to bring the data distribution into closer
conformity with the desirable norms for any standard linear analysis first deserve to be
contemplated.
We have also seen that nonlinear methods have a strong tendency to capitalise on chance
characteristics of the sample analyzed. Accordingly, it is good practice to assess the extent of
sample-dependence by analysis. Jackknife and bootstrap analyses are available for this purpose.
Further, as model fitting in nonlinear multivariate analysis is iterative, considerably more skill is
called for in implementing a chosen method than is the case for a standard linear procedure. In short,
it is plain that the methods comprising nonlinear multivariate analysis are delicate tools, to be used
with sound judgement and care, if trustworthy results are to be obtained. It seems therefore that if
nonlinear methods are to be exploited to ecological advantage, users will first have to acquire the
necessary insight and skill.

4. HETEROGENEITY

Consider a sample drawn from an unknown mixture of multivariate distributions whose
locations differ and which may or may not overlap, but whose characteristics are otherwise
unspecified. Such a distribution is neither strictly continuous nor strictly discontinuous, but
somewhere between the two. We refer to data distributions of this kind as heterogeneous. The
data distributions encountered in terrestrial plant ecology are generally of this sort, being
especially characteristic of large-scale vegetation surveys. Heterogeneous data sets pose
difficulties for statistical data analysis. With most scaling methods, for example, there is an
implicit or explicit requirement that, for sensible interpretation of results, sample-units should not
deviate too far from being identically distributed, at least. Clustering procedures are free from any
such requirement but are all too likely to impose structure on the data, and thus to destroy features
that may have ecological significance. Scaling methods generally, as we shall see, while much
less drastic in their effects, are no less misleading when applied to large sets of heterogeneous
data. Evidently, there is a need for methods that are sensitive to data structures, which, in the
words of Webb (1954), hover in a tantalising way between the continuous and the discontinuous.
Most, if not all, scaling methods in common use are centered or, what amounts to the same
thing, are applied to centered data. The lack of robustness of centered scaling methods generally
(classical multivariate analysis and multidimensional scaling) to discontinuities in data arises as
follows. The first dimension of any centered scaling method applied to heterogeneous data is
likely to divide the sample into two (or perhaps more) subgroups, not necessarily of equal size or
coherence. The second dimension in such cases is open to being unduly influenced by chance
characteristics of one or another subgroup, such as a difference in size or scatter, or by some
uneasy compromise of properties of the two, rather than by characteristics of the sample as a
whole. In either case, the dimension extracted will poorly represent the total sample;
consequently, the second axis is all too likely to be uninformative if not actually misleading.
These remarks apply with even greater force to the third and higher dimensions, extraction of
which will only confound an already confused situation. No warning that an analysis may have
been affected in this way is signalled, the results having the appearance of being normal in every
respect. We stress that it is not the interpretation of dimensions in ecological terms that is at issue
here; we regard the axes simply as a convenient coordinate system in relation to which to study
the sample after projection. The point of general interest here is that scaling methods applied to
heterogeneous data after centering are likely to yield misleading or incorrect results and hence to
lead to confusion rather than to insight.
Observe that the notion of correcting for row or column effects (or both), that is of
centering, for other than a homogeneous sample is not well-founded. Evidently, a very strong
case exists in terrestrial ecology for probing the data structure prior to the application of any
scaling method, and for centering the data and proceeding only where the homogeneity of the
sample is beyond question. We note also that scaling methods that have proved useful with
complex data distributions of the kind in question do exist, and we turn now to briefly consider two
of these.
Non-centered principal component analysis. Effects of centering and non-centering on the
principal component analysis of heterogeneous ecological data have been studied by Noy-Meir
(1971, 1973). Noy-Meir was able to demonstrate that, by non-centering and varimax rotation of
extracted non-centered components, the productive analysis of heterogeneous data becomes
perfectly feasible. Non-centered principal components after rotation possess two desirable
properties. First, they provide an indication of the presence and sharpness of any disjunction that
may be present in the data. Second, they minimize interference between variation on different
sides of any such disjunction. An analysis of this type yields noncentered components of two
kinds: unipolar and bipolar. Unipolar components identify and characterize the salient discrete
elements (clusters) that may be present; bipolar components specify and account for continuous
variation within particular clusters. Clusters are allowed to overlap. The resulting gain in clarity
and interpretability, compared with the results of the usual centered analysis applied to the same
data, is immense. Depending on the complexity of the vegetation, as many as ten or even twenty
non-centered components may be required to adequately account for the data. Results cannot
therefore be summarized by a single two-dimensional or three-dimensional display. The
additional dimensions may nevertheless help to clarify the meaning of the structure as a whole, as
Wish and Carroll (1982, p. 322) have remarked. A second limitation is that there is some
arbitrariness as to the number of noncentered components to retain for rotation. These limitations,
it seems, are the price to be paid for an analysis of complex data. The robustness of non-centered
principal component analysis against heterogeneity suggests that the properties of noncentered
solutions in a variety of other contexts may repay attention. These include (a) noncentered
versions of other classical multivariate methods; (b) noncentered forms of multidimensional
scaling.
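The computational essentials of Noy-Meir's procedure can be sketched as follows, assuming numpy: the singular value decomposition is taken of the raw, uncentered data matrix, and the retained loadings are then rotated by the standard varimax algorithm. The function names are ours:

```python
import numpy as np

def varimax(L, iters=100, tol=1e-8):
    """Orthogonal varimax rotation of a loading matrix
    (the standard SVD-based algorithm)."""
    p, k = L.shape
    R = np.eye(k)
    crit = 0.0
    for _ in range(iters):
        Lr = L @ R
        U, s, Vt = np.linalg.svd(
            L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(0)) / p))
        R = U @ Vt
        new = s.sum()
        if new < crit * (1 + tol):                 # criterion has converged
            break
        crit = new
    return L @ R

def noncentered_components(X, r):
    """Principal components of the raw (uncentered) data matrix,
    followed by varimax rotation of the first r variable loadings."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)   # no column centering
    return varimax(Vt[:r].T * s[:r])
```

With heterogeneous data of the kind discussed, the rotated loadings tend to separate into unipolar components marking discrete clusters and bipolar components describing variation within them.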
In applications, noncentered component analysis has been found to yield a wealth of
insight into the composition, variation and structure of vegetation (Noy-Meir 1971, Gittins and
Ogden 1977). Much can be accomplished by means of two-dimensional or three-dimensional
scattergrams of the principal components in various combinations, despite the comparatively large
number of components necessary in connection with really large-scale surveys. Andrews' (1972)
method for plotting high-dimensional data may prove to be a useful auxiliary device where large
numbers of components are involved.
Canonical variate analysis. The classical technique of canonical variate analysis (Fisher
1936, Rao 1948) is a procedure for mapping a p-variable sample whose units fall naturally into
g ≥ 2 prespecified groups or subsamples into a low-dimensional vector-space in such a way that
relations among the groups are clearly revealed. For sensible results, it is desirable that the data
not deviate too far from certain norms. In particular, the point distribution of variables within
groups should be reasonably symmetric, not too long-tailed and uncontaminated by outliers, while
the covariance structure should be reasonably stable across groups. Procedures for examining the
data prior to analysis in order to establish whether these requirements are satisfied are available
(e.g., Campbell 1980, 1981). In the context of vegetation survey, the requirements mentioned
may be rather strong. Modifications of the usual analysis have, however, been developed
(Campbell 1982, 1984) which allow the specifications placed on the data to be relaxed; this
widens the applicability of the analysis. With these modifications, canonical variate analysis may
be useful in connection with vegetation surveys and investigations of vegetation succession where
the discrete element in the data is clearly dominant and where global rather than local variation -
variation between groups as distinct from within groups - is of overriding interest.
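The classical computation reduces to a generalized symmetric eigenproblem, sketched below with numpy and scipy; the robust modifications cited above would replace the group means and scatter matrices by resistant estimates (the function name is ours):

```python
import numpy as np
from scipy.linalg import eigh

def canonical_variates(X, groups):
    """Canonical variate analysis: solve the generalized eigenproblem
    of the between-groups scatter B against the within-groups scatter W."""
    mean = X.mean(0)
    p = X.shape[1]
    B = np.zeros((p, p))
    W = np.zeros((p, p))
    for g in np.unique(groups):
        Xg = X[groups == g]
        mg = Xg.mean(0)
        B += len(Xg) * np.outer(mg - mean, mg - mean)
        W += (Xg - mg).T @ (Xg - mg)
    evals, evecs = eigh(B, W)               # B v = lambda W v, ascending
    order = np.argsort(evals)[::-1]         # largest canonical roots first
    return evals[order], evecs[:, order]
```

The columns of evecs, ordered by decreasing canonical root, give the canonical variates; sample scores are obtained as X @ evecs.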
Two further developments in the spirit of Campbell's suggestions for widening the
applicability of canonical variate analysis deserve mention. Digby and Gower (1981) have
described a robust form of canonical variate analysis, called canonical coordinate analysis (see
Gower, this volume), which is both distribution-free and for which the requirement for a relatively
stable covariance structure is entirely dispensed with. In this robust version, canonical variate
analysis is applicable under very general conditions. The second development is due to Hawkins
and his co-workers (Hawkins and Merriam 1974, Hawkins and Ten Krooden 1979) and is
directed towards extending the use of the method to situations where discrete communities or other
comparable sample-groups cannot be recognised at the outset. Their proposal uses constrained
cluster analysis to create discrete groups of neighbouring samples in abstract or geographical
space, to which canonical variate analysis or a robust variant thereof may be applied in the usual
way. The appeal of this development will be self-evident in the context of data distributions that
are neither strictly continuous nor strictly discontinuous but are somewhere between the two.
Heterogeneity may be dealt with in yet other ways. We mention finally a two-stage
procedure for its analysis in which clustering and scaling each have a role, and which in some
ways is reminiscent of an extreme form of Noy-Meir's procedure. The data are first clustered by
means of a suitable standard clustering procedure (e.g., Hartigan 1975). Some at least of the
resulting clusters may be expected to be substantially homogeneous, a point that is readily checked
by means of projection pursuit or of a Q-Q probability plot. The internal structure of at least the
larger of the homogeneous clusters thus established may then be examined by means of any
scaling method appropriate to the problem in hand. This two-stage strategy, though more
cumbersome than either procedure described above, does provide a convenient and informative
means of dealing with large, heterogeneous data sets, which might otherwise prove difficult to deal
with.
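The two-stage strategy can be sketched as follows, with Ward's method standing in for a suitable standard clustering procedure and a centered principal component analysis as the within-cluster scaling step (the function name is ours):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_then_scale(X, k, dims=2):
    """Stage 1: partition the sample into k groups by Ward clustering.
    Stage 2: a centered principal-component ordination within each group."""
    labels = fcluster(linkage(X, method="ward"), t=k, criterion="maxclust")
    scores = {}
    for c in np.unique(labels):
        Xc = X[labels == c]
        Xc = Xc - Xc.mean(0)                       # centre within the cluster only
        U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        scores[c] = Xc @ Vt[:dims].T               # cluster-internal coordinates
    return labels, scores
```

The homogeneity of each cluster would, as noted, be checked (for example by a Q-Q probability plot) before its internal ordination is interpreted.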

5. CONCLUSIONS

Recent advances in scaling theory promise to sharpen the analysis of ecological data.
Conceptual advances in the comprehension of vegetation may follow. Salient characteristics of
ecological data are their high dimensionality, variables that fall naturally into two or more sets, and
distributions that are composites of some unknown mixture of component multivariate
distributions. Characteristically, the data matrix is a partitioned matrix A = [A1 | A2 | ... | Am]
whose constituent observations (rows) are not identically distributed. Our attention has been
confined to three related families of techniques for analyzing arrays of this kind: methods for linear
reduction of dimensionality, for nonlinear reduction of dimensionality, and for nonlinear
multivariate analysis. The approaches represented are united by a common theme, namely that they
are all concerned, directly or indirectly, with scaling high-dimensional data. Methodological
developments of several other kinds were presented at the workshop. These include conditional
and constrained clustering, fractal theory, spatial analysis, qualitative path analysis, and the duality
diagram, some or all of which seem likely to be useful for analyzing ecological data. Our
discussion focused principally on scaling methods because it is these which in our view seem
likely to have the greatest impact on terrestrial plant ecology in the foreseeable future and because
the benefits and limitations of these methods are the most tangible at present. Unquestionably,
however, there is an important place for approaches other than scaling.
Of the three families of methods, classical multivariate analysis provides techniques for
linear reduction of dimensionality which are conceptually simple and computationally
straightforward. But with these methods are also associated restrictions on the data for analysis
which all too often prove stringent or even unrealistic in practice. Multidimensional scaling
provides varied opportunities under much more general conditions for nonlinear reduction of
dimensionality, for analyzing three-way arrays, and for revealing relations between samples,
species and external variables of several kinds. Nonlinear multivariate analysis opens the way for
the scaling of nonmetric data by means of classical linear and bilinear methods, so dispensing with
the need for quantitative data in terrestrial ecology. Classical and nonlinear multivariate analysis
both contain methods which specifically address the question of the relatedness of variables of
different kinds, and for this reason more closely match the prime substantive goal of a large and
important class of ecological endeavours than multidimensional scaling. Indeed, variables almost
always play a more prominent role in multivariate analysis than in multidimensional scaling, a
point of some importance in selecting a method for a given purpose. Nonlinear multivariate
analysis and multidimensional scaling are both appreciably more demanding computationally than
classical multivariate analysis. Moreover, none of the three families is conspicuously rich in
methods for dealing with data distributions that are complex mixtures of several underlying
multivariate distributions.
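To make the contrast concrete, the following sketch (in Python with NumPy, purely for illustration) implements principal coordinate analysis, the classical scaling of a distance matrix discussed by Gower (1966). The five sites and their Euclidean distances are simulated toy data, not drawn from any study cited here.

```python
import numpy as np

def principal_coordinates(D, k=2):
    """Classical scaling: embed an n x n symmetric distance matrix D in k dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered inner-product matrix
    eigvals, eigvecs = np.linalg.eigh(B)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # take the largest roots first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    pos = eigvals[:k].clip(min=0.0)          # guard against small negative roots
    return eigvecs[:, :k] * np.sqrt(pos)

# Toy data: five simulated sites and their pairwise Euclidean distances.
rng = np.random.default_rng(0)
sites = rng.normal(size=(5, 3))
D = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
coords = principal_coordinates(D, k=2)       # two-dimensional ordination
```

When the distances are Euclidean, the configuration reproduces them exactly in full dimensionality; with fewer axes it gives the best-fitting approximation.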
Much of the new methodology discussed rests heavily on notions or techniques of classical
multivariate analysis. Thus, the robust scaling procedures of Campbell (1982, 1984) and of
Digby and Gower (1981) can each be thought of as direct extensions of the classical method of
canonical variate analysis. The noncentered principal component analysis of Noy-Meir (1971,
1973) is another case of the same general kind. In the same spirit also is the nonlinear multivariate
analysis of Gifi (1981), which amounts to nothing less than a generalization of virtually the whole
of classical multivariate analysis to encompass data whose measurement-level characteristics are
very general indeed. There are also close affinities between multidimensional scaling and classical
multivariate analysis, though the derivation of a particular technique in each case is usually quite
different. Meulman (1986) gives a clear exposition of the relationships from a distance-geometric
point of view. The impetus for the above and other similar developments has been a need for
methods that are free from the tightly-specified restrictions associated with standard methods,
which are often unrealistic or inconvenient in practice. In this way, the realities of ecological data
are respected and the applicability of classical methods widened. The simple device of
noncentering, in the context of the hitherto somewhat intractable problem posed by heterogeneity,
illustrates just how much can be accomplished in this direction. A need for scaling methods
suitable for partitioned, heterogeneous data sets remains.
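The effect of noncentering on heterogeneous data can be sketched in a few lines. In the spirit of Noy-Meir's (1973) noncentered component analysis, the singular value decomposition is applied to the raw rather than the column-centered matrix, so that leading axes can pick out disjoint clusters. The abundance matrix below is hypothetical (Python with NumPy):

```python
import numpy as np

def component_scores(X, k=2, center=True):
    """Component scores from the SVD of X, with or without column centering."""
    if center:
        X = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]

# Hypothetical abundances: two disjoint groups of sites and species.
X = np.array([[5., 4., 0., 0.],
              [6., 5., 0., 0.],
              [0., 0., 3., 4.],
              [0., 0., 4., 5.]])
noncentered = component_scores(X, center=False)
centered = component_scores(X, center=True)
# Without centering, axis 1 is confined to sites 1-2 and axis 2 to
# sites 3-4: each homogeneous cluster is isolated on its own component.
```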
The second-order linear methods of classical multivariate analysis constitute an extremely
restricted class of procedures; they are methods whose dependence on the data is channeled
entirely through the covariance matrix. Algebraically, they represent little more than different
aspects of a single matrix operation, namely singular value decomposition. Furthermore, they are
usually unable to handle outliers or other disturbances in the data distribution. It would be
unreasonable to expect such a narrowly-based class of methods to provide solutions to all or
indeed most problems posed by multiresponse data. Thus, it is likely to be advantageous to
develop a variety of other, perhaps quite different approaches. Several of the methodological
developments mentioned at the outset of the present section are of this kind. In contrast to
methods whose development can be traced to the 1930's as variations on a single theme, methods
which would be inconceivable without the aid of the high-speed computer would be especially
appealing. There is relevant work, some of which is directed towards the provision of
information-rich graphical displays similar in spirit to those of multivariate analysis and
multidimensional scaling (Friedman and Tukey 1974, Friedman and Rafsky 1983).
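The algebraic point that these methods are facets of the singular value decomposition is easily verified numerically: the eigenvalues of the covariance matrix equal the squared singular values of the centered data matrix divided by n - 1. A sketch with arbitrary simulated data (Python with NumPy):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))            # arbitrary simulated data
Xc = X - X.mean(axis=0)

# Route 1: eigenvalues of the covariance matrix, largest first.
S = np.cov(Xc, rowvar=False)
eig_cov = np.sort(np.linalg.eigvalsh(S))[::-1]

# Route 2: squared singular values of the centered data matrix, rescaled.
sing = np.linalg.svd(Xc, compute_uv=False)
eig_svd = sing ** 2 / (X.shape[0] - 1)

# The two routes coincide term by term.
```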
Reference to centering, above, serves as a reminder of a serious drawback of scaling
methods generally: they are scale-dependent. Where substantive or other considerations provide
clear guidelines, the dependence can be used to good advantage. In practice, however, arbitrary
choices for the unit and origin of the scale often have to be made and it is particularly important
then to bear the consequences of these choices in mind when interpreting results.
In turning to consider the likely impact of scaling methods generally on terrestrial plant
ecology, it will be instructive to reflect first on the success actually achieved by the use of
classical, linear multivariate analysis in this sphere. The underlying theory has been well
understood for something like fifty years, while the methods themselves have been
computationally feasible for 25 years or so. Moreover, linear methods have a fairly extensive
history of application in terrestrial plant ecology. Thus, there is an adequate foundation on which
assessment can be based. Regrettably, it is plain from the ecological literature that classical
multivariate analysis is poorly understood by terrestrial ecologists. Familiarity with the subject
rarely extends beyond a superficial acquaintance with one or two methods. As a result, all too
often a method, chosen in relation to some specified ecological purpose, fails to match the declared
purpose. Indeed, considerations such as the purpose of an investigation frequently seem to play
little or no part in the selection process. Even where a plausible method is used, deficiencies in its
implementation and in the interpretation and reporting of results are often apparent. These are very
disturbing matters. Further, they are often exacerbated in practice by uncritical reliance on widely
distributed computer programs or program packages. Convincing applications of classical
multivariate analysis in terrestrial plant ecology are accordingly very few.
Altogether, there is an alarming gap between theory and practice in this area, so much so
that the impact of multivariate analysis can scarcely be regarded as beneficial. Similar views have
been expressed by many workers (e.g., Jeffers 1972, Innis 1979, Levin 1980, Van Valen 1985,
Freeman 1987). Lindley (1984) has remarked that the success of multivariate analysis in
applications generally has been small in relation to the body of theory. Gnanadesikan and
Kettenring (1984) suggest reasons as to why this should be so. Such shortcomings in the use of
statistical methods are by no means confined to multivariate analysis or to terrestrial ecology; they
are simply one aspect of a much more pervasive malaise (see Underwood 1981, Preece 1982,
1986, Gnanadesikan and Kettenring 1984, Hamill 1985). The spread of methodological
innovations from the research laboratory where they were developed to the applied scientist is
known to be slow (Jeffers 1971, Bentler 1986). Gani (1985) has estimated the time lag in
question to be of the order of 20 to 30 years, which in the light of ecological experience may be a
modest underestimate. If this argument is accepted, then it would be prudent to recognise that a
lengthy delay is in order before classical, linear methods come to be employed effectively in
community ecology in a general way. Yet it seems to us that if terrestrial plant ecology is to
become a science it must be more than a collection of anecdotes. This is precisely where
mathematics in our view has a contribution to make. Algebraic models provide the only way of
dealing adequately with the complexity of plant communities and ecosystems; they are able to
abstract the essential elements of a problem and to identify the minimum number of dimensions or
parameters necessary to describe such complex systems.
Before turning to consider the likely impact of recent developments in multidimensional
scaling and nonlinear multivariate analysis, it is worth recalling the elementary and unified
algebraic foundation of classical linear multivariate analysis (Krzanowski 1971) and its
comparatively simple computational requirements. At the same time it is well to be aware that
linear models are delicate tools to be applied with care and good sense if trustworthy results are to
be achieved. Multidimensional scaling and nonlinear multivariate analysis are both less
straightforward algebraically and more demanding computationally than classical multivariate
analysis. Further, unlike classical multivariate analysis, multidimensional scaling and nonlinear
multivariate analysis have been generally available for perhaps only ten years or less, and their
impact on terrestrial plant ecology to date has been negligible. Using the record of classical
multivariate analysis in terrestrial plant ecology, we shall argue that the new methodology carries
with it stringent responsibilities if it is to be used productively. De Leeuw (1987a) has remarked
in connection with nonlinear multivariate analysis that nonlinear methods require even more care
and even more expert knowledge than standard linear methods. This observation is no less true of
multidimensional scaling (cf. Section 2.2 above). Given that classical multivariate analysis has yet
to be generally applied to useful advantage in terrestrial ecology, we have no assurance that the
still more demanding new methodology will be properly used. The existence of pertinent methods
is no guarantee of their sensible use, far from it. Success in endeavours of this kind is to be
expected only where methodological innovations are accompanied by a commensurate effort on
the part of practitioners to make sure that they understand the nature and properties of the methods
available and to acquire the skills necessary for their successful implementation. Unless the effort
is made, there is every danger that the opportunities afforded by the new methodology, far from
leading to new ecological insights, will paradoxically only further widen the already alarming gap
between theory and practice in vegetation ecology.
The often suggested course of seeking the advice of a professional statistician
(Gnanadesikan and Kettenring 1984) does not work well in practice; statisticians with the requisite
expertise, time and interest are simply too few and too far between. This certainly has been the
case with ecological applications of classical, linear methods, as the record clearly shows. There
are no grounds for supposing that matters will improve in the foreseeable future. In any case,
while a statistician may be asked for guidance, he cannot be expected to make good fundamental
deficiencies in the content and structure of ecological research programs. It is for ecologists to
work out how this is to be done. Steps which in our view would go some way towards rectifying
shortcomings in current applications of scaling methods in terrestrial ecology include the
following. First, to recognise that field observations in ecology are vector-valued quantities
(strictly partitioned vectors). It follows immediately from the nature of field observations that the
description and analysis of vegetation are largely algebraic matters. Second, the need to check
data for irregular features before embarking on scaling. A variety of measures are available for
this purpose (Gnanadesikan 1977, Friedman and Stuetzle 1982). Third, the cardinal importance
of selecting a method of data analysis that is suitable for the purpose in hand. There are two
aspects to this question. Ensuring (a) that a chosen method is properly matched to the declared
substantive goal; and (b) that assumptions explicit and implicit in the method concerning estimates
of fitted quantities are satisfied by the data for analysis. Fourth, the need to carefully consider the
unit and origin of the scale of measurement and their effect on the outcome of the analysis.
Choices should be based on ecological considerations whenever possible. Noy-Meir (1973) and
Noy-Meir et al. (1975) provide useful guidelines. There are numerous other aspects of good
statistical data analysis: appreciation that realistic research goals can be set only by familiarity with
and reference to the range of methods available for their attainment, that much can be done in the
design stage to ensure that subsequent data analysis will be manageable and efficient, that large
samples are advantageous in increasing the precision of parameter estimates and that the
measurement level of the data to be collected is best decided with this point in mind. Data
re-expression before or during analysis and computer-intensive resampling schemes following
analysis can do much to sharpen both the results and the conclusions drawn from them. Until
principles of statistical data analysis such as these are widely used to guide and inform the use of
scaling methods in terrestrial ecology, misgivings about the worth of much numerical work in this
field are bound to persist.
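The second step, checking data for irregular features before scaling, can be illustrated with squared Mahalanobis distances, among the simplest of the screening measures treated by Gnanadesikan (1977). The data and the single gross outlier below are simulated (Python with NumPy):

```python
import numpy as np

def mahalanobis_sq(X):
    """Squared Mahalanobis distance of each row from the sample mean."""
    Xc = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(Xc, rowvar=False))
    # Quadratic form x' S^{-1} x for every row at once.
    return np.einsum('ij,jk,ik->i', Xc, S_inv, Xc)

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 3))        # 50 well-behaved observations
X[0] = [8.0, -8.0, 8.0]             # one gross outlier, planted for illustration
d2 = mahalanobis_sq(X)
suspect = int(np.argmax(d2))        # observation farthest from the bulk
```

Large distances flag observations for inspection; robust estimates of location and scatter (cf. Campbell 1980) are preferable when several outliers may be present.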
In short, a large and growing body of scaling methods for aiding the comprehension of
terrestrial vegetation exists. There is nevertheless room for further methods whose development is
guided by greater attention to the salient characteristics of ecological data. On the other hand, the
availability of the new methodology has unfortunately not been matched by a commensurate
increase in our understanding of vegetation. Far from it. In our view, the impact of scaling
methods generally in community ecology has not been beneficial, principally because of the
existence of an alarming gap between theory and practice in this area. Progress in describing and
comprehending vegetation will result, in our view, only when plant ecologists are equipped to use
existing and emerging methods with insight and ingenuity.

REFERENCES

Andrews, D.F. 1972. Plots of high-dimensional data. Biometrics 28: 125-136.
Begon, M., J.L. Harper, and C.R. Townsend. 1986. Ecology: individuals, populations and
communities. Blackwell, Oxford.
Bentler, P.M. 1986. Structural modeling and Psychometrika: an historical perspective on growth
and achievements. Psychometrika 51: 35-51.
Bradu, D., and K.R. Gabriel. 1978. The biplot as a diagnostic tool for models of two-way tables.
Technometrics 20: 47-68.
Burg, E. van der. 1985. CANALS user's guide. Internal Report UG-85-05, Department of Data
Theory, University of Leiden.
Burg, E. van der, and J. de Leeuw. 1983. Nonlinear canonical correlation. Brit. J. Mathem. Stat.
Psychol. 36: 54-80.
Campbell, N.A. 1980. Robust procedures in multivariate analysis. I. Robust covariance
estimation. Appl. Stat. 29: 231-237.
Campbell, N.A. 1981. Graphical comparison of covariance matrices. Aust. J. Stat. 23: 21-37.
Campbell, N.A. 1982. Robust procedures in multivariate analysis. II. Robust canonical variate
analysis. Appl. Stat. 31: 1-8.
Campbell, N.A. 1984. Canonical variate analysis with unequal covariance matrices:
generalizations of the usual solution. J. Int. Assoc. Math. Geology 16: 109-124.
Carroll, J.D., and J.B. Kruskal. 1978. Scaling, multidimensional, p. 892-907. In W.H. Kruskal
and J.M. Tanur [eds.] International encyclopedia of statistics, Vol. 2. Free Press, New
York.
Carroll, J.D., and S. Pruzansky. 1984. The CANDECOMP-CANDELINC family of models and
methods for multivariate data analysis, p. 372-402. In H.G. Law, W. Snyder, J. Hattie, and
R.P. McDonald [eds.] Research methods for multimode data analysis. Praeger, New York.
Cox, C., and K.R Gabriel. 1982. Some comparisons of biplot display and pencil-and-paper
exploratory data analysis methods, p. 45-82. In R.L. Launer and A.F. Siegel [eds.] Modern
data analysis. Academic Press, New York.
Degerman, R. 1970. Multidimensional analysis of complex structure: mixture of class and
quantitative variation. Psychometrika 35: 475-491.
de Leeuw, J. 1984. The Gifi system of nonlinear multivariate analysis, p. 415-424. In E. Diday et
al. [eds.] Data analysis and informatics, IV. North Holland, Amsterdam.
de Leeuw, J. 1987a. Nonlinear multivariate analysis with optimal scaling, p. 157-187. In this
volume.
de Leeuw, J. 1987b. Path analysis with optimal scaling, p. 381-404. In this volume.
de Leeuw, J., and J. Meulman. 1986. Principal component analysis and restricted
multidimensional scaling. In W. Gaul and M. Schader [eds.] Classification as a tool of
research. North Holland, Amsterdam.
Devlin, S.J., R. Gnanadesikan, and J.R. Kettenring. 1981. Robust estimation of dispersion
matrices and principal components. J. Amer. Stat. Ass. 76: 354-362.
Digby, P.G.N., and J.C. Gower. 1981. Ordination between- and within-groups applied to soil
classification, p. 63-75. In D.F. Merriam [ed.] Down to earth statistics: Solutions looking
for geological problems. Syracuse University Geological Contributions.
Fisher, R.A. 1936. The use of multiple measurements in taxonomic problems. Ann. Eugen.
Lond. 7: 179-188.
Freeman, G.H. 1987. Letter to the editor. Nature 325: 656.
Friedman, J.H., and L.C. Rafsky. 1981. Graphics for the multivariate two-sample problem. J.
Amer. Stat. Ass. 76: 275-295.
Friedman, J.H., and L.C. Rafsky. 1983. Graph-theoretic measures of multivariate association
and prediction. Ann. Stat. 11: 377-391.
Friedman, J.H., and W. Stuetzle. 1982. Projection pursuit methods for data analysis, p. 123-147.
In R.L. Launer and A.F. Siegel [eds.] Modern data analysis. Academic Press, New York.
Friedman, J.H., and J.W. Tukey. 1974. A projection pursuit algorithm for exploratory data
analysis. IEEE Transactions on Computers C-23: 881-890.
Gabriel, K.R. 1971. The biplot graphic display of matrices with application to principal
component analysis. Biometrika 58: 453-467.
Gabriel, K.R. 1981. Biplot display of multivariate matrices for inspection of data and diagnosis,
p. 147-173. In V. Barnett [ed.] Interpreting multivariate data. Wiley, Chichester.
Gabriel, K.R. 1982. Biplot, p. 263-271. In S. Kotz and N.L. Johnson [eds.] Encyclopaedia of
the statistical sciences, Vol. 1. Wiley, New York.
Gani, J. 1985. In L. Rode and T. Speed [eds.] Teaching of statistics in the computer age.
Chartwell Bratt, Bromley.
Geer, J.P. van der. 1984. Relations among k sets of variables. Psychometrika 49: 79-94.
Gifi, A. 1981. Nonlinear multivariate analysis. Department of Data Theory, University of Leiden,
Leiden.
Gittins, R., and J. Ogden. 1977. A reconnaissance survey of lowland tropical rain forest in
Guyana. [Unpublished manuscript]
Gnanadesikan, R. 1977. Methods for statistical data analysis of multivariate observations. Wiley,
New York.
Gnanadesikan, R., and J.R. Kettenring. 1984. A pragmatic review of multivariate methods in
applications, p. 309-337. In H.A. David and H.T. David [eds.] Statistics: an appraisal. Iowa
State Univ. Press, Ames.
Gower, J.C. 1966. Some distance properties of latent root and vector methods used in
multivariate analysis. Biometrika 53: 325-338.
Gower, J.C., and P. Legendre. 1986. Metric and Euclidean properties of dissimilarity
coefficients. J. Class. 3: 5-48.
Greenacre, M.J. 1984. Theory and applications of correspondence analysis. Academic Press,
New York.
Greenacre, M.J., and L.G. Underhill. 1982. Scaling a data matrix in a low-dimensional Euclidean
space, p. 183-268. In D.M. Hawkins [ed.] Topics in applied multivariate analysis.
Cambridge Univ. Press, Cambridge.
Hamill, L. 1985. On the persistence of error in scholarly communication: the case of landscape
aesthetic. Can. Geographer 29: 270-273.
Hartigan, J.A. 1975. Clustering algorithms. Wiley, New York.
Hawkins, D.M., and D.F. Merriam. 1974. Zonation of multivariate sequences of digitized
geologic data. J. Int. Assoc. Math. Geology 6: 263-269.
Hawkins, D.M., and J.A. Ten Krooden. 1979. Zonation of sequences of heteroscedastic
multivariate data. Computers and Geosciences 5: 189-194.
Heiser, W.J., and J. Meulman. 1983a. Analyzing rectangular tables by joint and constrained
multidimensional scaling. Journal of Econometrics 22: 139-167.
Heiser, W.J., and J. Meulman. 1983b. Constrained multidimensional scaling, including
confirmation. Applied Psychological Measurement 7: 381-404.
Horst, P. 1961. Relations among m sets of measures. Psychometrika 26: 129-150.
Hotelling, H. 1935. The most predictable criterion. J. Educ. Psychol. 26: 139-142.
Hotelling, H. 1936. Relations between two sets of variates. Biometrika 28: 321-377.
Innis, G.S. 1979. Letter to the editor. Science 204: 242.
Israels, A. 1984. Redundancy analysis for qualitative variables. Psychometrika 49: 331-346.
Jeffers, J.N.R. 1971. The challenge of modern mathematics. In J.N.R. Jeffers [ed.] Mathematical
models in ecology. Blackwell, Oxford.
Jeffers, J.N.R. 1972. The statisticians' role in the environmental sciences. The Statistician 21:
3-17.
Kenkel, N.C. 1986. Structure and dynamics of jack pine stands near Elk Lake, Ontario: a
multivariate approach. Can. J. Bot. 64: 486-497.
Kenkel, N.C., and L. Orlóci. 1986. Applying metric and nonmetric multidimensional scaling to
ecological studies: some new results. Ecology 67: 919-928.
Kettenring, J.R. 1971. Canonical analysis of several sets of variables. Biometrika 58: 433-451.
Kruskal, J.B. 1977. The relationship between multidimensional scaling and clustering, p. 17-44.
In J. van Ryzin [ed.] Classification and clustering. Academic Press, New York.
Kruskal, J.B., and J.D. Carroll. 1969. Geometric models and badness-of-fit functions. In P.R.
Krishnaiah [ed.] Multivariate analysis, Vol. II. Academic Press, New York.
Krzanowski, W.J. 1971. The algebraic basis of classical multivariate methods. The Statistician
20: 51-61.
Lebart, L., A. Morineau, and K.M. Warwick. 1984. Multivariate descriptive analysis. Wiley,
New York.
Levin, S.A. 1980. Mathematics, ecology and ornithology. The Auk 97: 422-425.
Lindley, D.V. 1984. Prospects for the future: the next 50 years. J. Roy. Stat. Soc., Ser. A 147:
359-367.
Meulman, J.J. 1986. A distance approach to nonlinear multivariate analysis. DSWO Press,
Leiden.
Noy-Meir, I. 1971. Multivariate analysis of the semi-arid vegetation in southeastern Australia:
nodal ordination by component analysis, p. 159-193. In N.A. Nix [ed.] Quantifying
ecology, Proc. Ecol. Soc. Aust. 6.
Noy-Meir, I. 1973. Data transformations in ecological ordination. I. Some advantages of
non-centering. J. Ecol. 61: 329-341.
Noy-Meir, I. 1974a. Multivariate analysis of the semiarid vegetation in southeastern Australia. II.
Vegetation catenae and environmental gradients. Aust. J. Bot. 22: 115-140.
Noy-Meir, I. 1974b. Catenation: quantitative methods for the definition of coenoclines. Vegetatio
29: 89-99.
Noy-Meir, I., D. Walker, and W.T. Williams. 1975. Data transformations in ecological
ordination. II. On the meaning of data standardization. J. Ecol. 63: 779-800.
Preece, D.A. 1982. The design and analysis of experiments: what has gone wrong? Utilitas
Mathematica 21A: 201-244.
Preece, D.A. 1986. Illustrative examples: illustrative of what? The Statistician 35: 33-44.
Rao, C.R. 1948. The utilization of multiple measurements in problems of biological classification.
J. Roy. Stat. Soc., Ser. B 10: 159-203.
Sibson, R. 1979. Studies in the robustness of multidimensional scaling: perturbational analysis of
classical scaling. J. Roy. Stat. Soc., Ser. B 41: 217-229.
Sibson, R., R. Bowyer, and C. Osmond. 1981. Studies in the robustness of multidimensional
scaling: Euclidean models and simulation studies. J. Stat. Comp. Simul. 13: 273-296.
Smith, R.E., N.A. Campbell, and J.L. Perdrix. 1983. Identification of some Western Australian
Cu-Zn and Pb-Zn gossans by multi-element geochemistry, p. 109-126. In R.E. Smith [ed.]
Geochemical exploration in deeply weathered terrain. C.S.I.R.O. Institute of Energy and
Earth Sciences, Division of Mineralogy, Floreat Park.
Stevens, S.S. 1962. Mathematics, measurement and psychophysics. In S.S. Stevens [ed.]
Handbook of experimental psychology. Wiley, New York.
Takane, Y. 1985. The nonmetric data analysis, p. 314-318. In S. Kotz and N.L. Johnson [eds.]
Encyclopaedia of statistical sciences, Vol. 6. Wiley, New York.
Takane, Y., F.W. Young, and J. de Leeuw. 1977. Nonmetric individual differences
multidimensional scaling: an alternative least squares method with optimal scaling features.
Psychometrika 42: 7-67.
Tyler, D.E. 1982. On the optimality of the simultaneous redundancy transformations.
Psychometrika 47: 77-86.
Underwood, A.J. 1981. Techniques of analysis of variance in experimental marine biology and
ecology. Oceanogr. Mar. Biol. Ann. Rev. 19: 513.
van den Wollenberg, A.L. 1977. Redundancy analysis. An alternative for canonical correlation
analysis. Psychometrika 42: 207-219.
van Rijckevorsel, J., B. Bettonvil, and J. de Leeuw. 1985. Recovery and stability in nonlinear
principal component analysis. Internal Report RR-85-21, Department of Data Theory,
University of Leiden.
Van Valen, L.M. 1985. Letter to the editor. Nature 314: 230.
Verdegaal, R. 1986. OVERALS. Department of Data Theory, University of Leiden.
Walter, H., and S.-W. Breckle. 1985. Ecological systems of the geobiosphere. I. Ecological
principles in global perspective. Springer-Verlag, Berlin.
Webb, D.A. 1954. Is the classification of plant communities either possible or desirable? Botanisk
Tidsskrift 51: 362-370.
Weinberg, S.L., J.D. Carroll, and H.S. Cohen. 1984. Confidence regions for INDSCAL using
the jackknife and bootstrap techniques. Psychometrika 49: 475-491.
Wish, M., and J.D. Carroll. 1982. Multidimensional scaling and its applications. In P.R.
Krishnaiah and L.N. Kanal [eds.] Handbook of statistics, Vol. 2. North Holland,
Amsterdam.
Young, F.W. 1981. Quantitative analysis of qualitative data. Psychometrika 46: 347-388.
Young, F.W., J. de Leeuw, and Y. Takane. 1980. Quantifying qualitative data. In E.D.
Lantermann and H. Feger [eds.] Similarity and choice. Hans Huber Verlag, Wien and Bern.
NOVEL STATISTICAL ANALYSES IN TERRESTRIAL ANIMAL ECOLOGY:
DIRTY DATA AND CLEAN QUESTIONS

D. Simberloff* (Chairman), P. Berthet, V. Boy, S. H. Cousins,
M.-J. Fortin, R. Goldburg, L. P. Lefkovitch, B. Ripley,
B. Scherrer, and D. Tonkyn

*Department of Biological Science, Florida State University,
Tallahassee, FL 32306-2043 USA

INTRODUCTION

The discipline of terrestrial animal ecology has developed
somewhat differently from other branches of ecology (e.g., plant
ecology, marine ecology) and concerns itself with somewhat dif-
ferent questions. At the outset, it is useful to consider these
differences briefly because they may color the ways in which we
view the promise of new numerical techniques.

First, there is a strong historical component. Plant and
animal ecology developed from rather different traditions during
the late 19th and first half of the 20th centuries (Simberloff
1980). Plant ecologists, soil ecologists, and limnologists were
almost wholly concerned with the community and ecosystem levels
of organization and perceived such a high degree of order and
pattern at these levels that they viewed the community as a
"superorganism" and argued that the study of individuals and
populations is not properly included in the science of ecology.
Thus, plant and aquatic ecologists tended to deal with very large
masses of static data, and it seemed natural to seek to reduce
the size and complexity of the data base. Ordination allows
exactly this simplification. In the ecology of birds, mammals,
and insects, on the other hand, there coexisted with community
ecology a strong tradition of studying single populations, pairs
of them (such as a predator and its prey), or other small subsets
of the community. Disappointment with the fruits of the I.B.P.,

NATO ASI Series, Vol. G14


Developments in Numerical Ecology
Edited by P. and L. Legendre
© Springer-Verlag Berlin Heidelberg 1987
with its programs explicitly aimed at the ecosystem, especially
with the concept of trophic level as an organizing principle,
solidified the tendency of terrestrial animal ecologists to focus
on small sets of species (Cousins 1985). Some would argue that
this focus is an unwarranted retreat and that we should be more
ambitious, moving on to multi-species models (e.g., May 1979).
However, such pleas have not (yet, at least) caused much of a
shift in the objects that terrestrial animal ecologists study.
The modelling framework of this tradition was mathematical at the
outset, often based on the Lotka-Volterra equations, and began
with deduction of the dynamic behavior of a system based on
demographic and behavioral traits of the species. This approach
is still dominant.

Another characteristic of terrestrial animal ecology that
seems to distinguish the discipline from other areas of ecology
is that, rightly or wrongly, terrestrial animal ecologists seem
usually able to generate hypotheses rather easily, without the
aid of preliminary data reduction or exploratory data analysis.
They are inclined to move rather directly towards mechanistic
explanations of phenomena and patterns, often by formal statisti-
cal means such as testing hypotheses. Terrestrial animal ecolo-
gists are certainly no brighter than ecologists of any other
stripe; rather it seems likely that the frequent focus on one or
a few species makes it easier to see patterns. Also, terrestrial
animals generally operate on a faster time scale than do plants;
they move and behave, whereas plants only grow or die and often
do both slowly. Animal movements and/or behaviors often suggest
hypotheses, frequently necessitating dynamic models. Finally,
terrestrial organisms are generally easier to sample and to
observe than are aquatic ones. Small wonder that patterns are
more easily perceived.

ORDINATION

As noted by Gower (this volume), ordination and the associated
scaling are typically used not to test hypotheses or to
estimate parameters, but rather as an exploratory tool, a
convenient graphical way to depict relationships among entities.

Because terrestrial animal ecology has a wide repertoire of
hypotheses, it appeared to us that such techniques are not of the
highest priority to terrestrial animal ecologists and are
occasionally misguided. They may conveniently represent multidi-
mensional data for illustrative purposes but do not often seem to
generate new hypotheses or insights. This problem is manifested
in several ways. For example, in certain situations in which one
wishes to cluster sites by which species they contain (or by
abundances of species), logistic regression or contingency-table
methods such as log-linear models are appropriate ways to test a
pre-existing hypothesis--certain sites should form natural
clusters. As another example, principal components analysis has
been used to reduce the dimensions of a space, after which the
relative positions of two sets of samples in the reduced space
are examined to see whether one variable (e.g., pesticide or
pollutant) affects position. However, a multivariate regression
or analysis of variance can usually answer this question
directly.
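As a sketch of the direct route, a permutation test on the difference between multivariate group means asks the question without any prior reduction of dimensionality. The test statistic and simulated data below are illustrative stand-ins for a full multivariate regression or analysis of variance (Python with NumPy):

```python
import numpy as np

def perm_test(Y, groups, n_perm=999, seed=0):
    """Permutation p-value for a difference between two group centroids."""
    rng = np.random.default_rng(seed)
    def stat(g):
        diff = Y[g == 0].mean(axis=0) - Y[g == 1].mean(axis=0)
        return np.sum(diff ** 2)          # squared distance between centroids
    observed = stat(groups)
    exceed = sum(stat(rng.permutation(groups)) >= observed
                 for _ in range(n_perm))
    return (exceed + 1) / (n_perm + 1)

# Simulated sites: 12 controls and 12 exposed sites, three responses each.
rng = np.random.default_rng(2)
control = rng.normal(0.0, 1.0, size=(12, 3))
exposed = rng.normal(1.5, 1.0, size=(12, 3))   # shifted mean: a real effect
Y = np.vstack([control, exposed])
groups = np.repeat([0, 1], 12)
p_value = perm_test(Y, groups)
```

The permutation reference distribution makes no distributional assumptions about the responses, which suits the "dirty data" the chapter title alludes to.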

Another problem that arises not infrequently in terrestrial
animal ecology is that an ordination is performed without even
general questions in mind and so fails to inspire hypotheses or
new, more specific questions. In other words, the wide availabi-
lity of computer programs to perform ordinations as well as the
ease of interpretation of some ordinations has occasionally led
to ordination for its own sake. This is not to say, of course,
that there are no uses for PCA or other ordination techniques in
terrestrial animal ecology. For example, in a large-scale
sampling scheme in which many variables are measured at many
sites, it may be possible to reduce the expense of the research
program by using ordination to suggest which sites and/or
variables are redundant. However, this is a far cry from
generating hypotheses or suggesting new insights.

Nor did it appear to us that the particular advanced tech-
niques described by Gower, Carroll, Escoufier, de Leeuw, and
Heiser (this volume)--Procrustes analysis, non-metric scaling,
multidimensional scaling with stress minimization, versatility in
the ability to weight particular individuals or to choose the
distance measure between individuals, unfolding, and the use of
multivariate observations--would lead to greater utility of ordi-
nation in terrestrial animal ecology. We were then naturally led
to ask why an approach that has a venerable tradition in other
areas of ecology (e.g., plant community ecology) and in other
sciences (e.g., psychology) seems not to have made much inroad
into the ecology of terrestrial animals. Several answers, not
necessarily mutually exclusive, suggest themselves.

One is the historical development of terrestrial animal eco-
logy noted above with its emphasis on deductive, dynamic models
of one or, at most, a few species. Another is the fact alluded
to in the introduction that terrestrial animal ecologists typi-
cally have specific hypotheses in mind when they collect and exa-
mine a data set.

A third reason may be that the nature of a "site" in the
species-by-site matrix (the usual starting point of ecological
ordination) is often problematic for terrestrial animal ecolo-
gists. First, sites often have no objective boundaries and loca-
tions. They may be quadrats, placed in random or stratified
fashion, that could as well have been of different sizes or loca-
tions. The effect of quadrat size is of particular concern, as a
number of ecological patterns can be erected, obliterated, or
changed simply by the use of different sized quadrats (Pielou
1977). Second, many terrestrial animal species are highly
mobile. For many taxa the species and abundances found at a site
during one census represent only a snapshot or sample from an
unknown distribution. Of course there are instances in which
sites are well-defined even for mobile animals--host plant indi-
viduals for phytophagous insects, tree holes for birds that nest
in them, hosts for parasites or inquilines, etc.--but these are
likely a minority. It is probably a reflection of the rather
uncertain status of "site" for terrestrial animal ecologists that
ordination and scaling have found their greatest application when
there is a set of naturally and objectively defined sites, often
large ones, such as islands (including habitat islands such as
mountaintops). Thus various ordination procedures have been used
in biogeographic analyses that seek to show relationships among
sites in terms of their biotae or among taxa in terms of their
biogeographic distributions.

There also is a difference between psychometric uses of
scaling and ordination and potential uses in terrestrial animal
ecology: psychometricians for the most part design their
variables, perhaps to describe psychological concepts for which
there may be a priori intuition, whereas ecologists usually can
perceive a set of variables as already given to them. We observe
and measure physical factors like temperature and humidity as
well as biological factors such as the presence or absence of
other species. Even biomass is a physically based composite
variable. Furthermore, we usually have preliminary hypotheses
about the underlying relationships of these variables. Also,
animal abundances may interact in ways characteristically dif-
ferent from human psychological traits. Competing species may
affect one another's abundances, as can a predator and prey or
parasite and host.

So far, our assessment sounds quite pessimistic, but we cer-
tainly do not rule out the possibility of exciting advances in
terrestrial animal ecology through advanced ordination methods.
It would be a rash individual in as young and controversial a
field as animal ecology who would predict with assurance that no
exciting advances will come from a particular direction.
Furthermore, no group as small as this can hope to have both
expertise in and insight into all areas of terrestrial animal
ecology: there may be obvious applications that are simply out-
side our ken. In particular, we have no ethologists among us,
and it is quite likely that effective uses of ordination tech-
niques would arise in the area of behavioral ecology. After all,
much of the primary literature on ordination is in psychology.

One such direction that might be fruitfully explored was
suggested by the INDSCAL analysis described by Carroll (this
volume) and arises in the intersection of ecology and ethology.
Closely related phytophagous insect species often have differing
relative preferences for host plant species, and there is every
reason to believe that different individuals would have different
relative preferences as well. It is also known that different
individual plants within a species are of different attrac-
tiveness to insects of one species, and it is likely that insect
individuals would express somewhat different preferences for
individual conspecific plants. We can easily imagine repre-
senting such preferences in a series of perceptual spaces.
Having represented these preferences, however, we would ask how
to proceed to estimate differences or to test hypotheses about
differences. There is optimism (Carroll, pers. comm.) that such
recent statistical innovations as the bootstrap will yield an
understanding of the statistical properties and reliability of
scaling methods. It is quite possible, however, that these same
statistical innovations plus other advances in traditional, for-
mal statistics will render ordination passé by facilitating
direct statistical inference from large masses of data. To some
extent this may already be happening.

One area of terrestrial animal ecology in which ordination
may be particularly useful is among soil animals. Here spatial
and temporal structure are less immediately apparent and there
are frequently large masses of data. These data often encompass
many species and present a bewildering welter to the human eye:
soil mites, earthworms, insects, etc. This seems to be precisely
the sort of system in which preliminary data analysis with
graphical output, such as ordination, would be useful in helping
one to establish working hypotheses. In addition, detritivore
assemblages, such as those in the soil, dung, and rotting car-
casses, may generally be more highly and intricately organized
into something approximating the interactive community of classi-
cal ecology than are other terrestrial animal groups, in which
such large-scale organization is often difficult to detect.

A particular phenomenon in terrestrial animal ecology
(indeed, in all ecology) that might bear a new look through
modern ordination procedures is succession. As part of the move-
ment away from perceiving assemblages of organisms as highly
ordered superorganismic communities, Drury and Nisbet (1973) and
others argued that the traditional view of succession--the whole-
sale extinction of one community by another--is misguided.
Instead, a careful examination of data from numerous vegetational
successions did not appear to indicate anything more than the
temporal comings and goings of individual species. Succession
has thus come to be less studied than previously. Of course,
terrestrial animal ecologists as a group were never as heavily
committed to successional studies as were plant ecologists. The
basis for choosing the superorganismic view of succession or the
individualistic view was rarely based on a formal statistical
analysis or even on informal quantitative methods such as
ordination. Choices seem often to have been made rather
impressionistically. One can imagine a more thorough approach
using modern ordination, particularly in well-defined microcosmic
successions such as those involving decomposers (in dung, in
rotting wood, etc.) but also possibly in larger communities. The
underlying approach would be to gather data on a series of com-
munities at one site at several times. One might then perform an
ordination at each time and attempt to look at the differences:
Is there an operator that transforms each ordination into its
successor? One might use generalized Procrustes techniques to
examine the differences between the ordinations in the sequence.
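
The comparison of successive ordinations sketched above can be
made concrete in its simplest two-configuration form. The code
below is a minimal illustration only (invented site coordinates;
the ordinary orthogonal Procrustes fit, not the generalized
procedure of Gower discussed in this volume): it rotates one
ordination onto another and reports the residual sum of squares,
which measures how much the two configurations differ beyond
translation, scaling, and rotation.

```python
import numpy as np

def procrustes_residual(X, Y):
    """Fit configuration Y to X by orthogonal Procrustes rotation,
    after centering and scaling both; return the residual sum of
    squares (0 means the ordinations agree up to rigid motion)."""
    X = X - X.mean(axis=0)                  # remove translation
    Y = Y - Y.mean(axis=0)
    X = X / np.linalg.norm(X)               # remove overall size
    Y = Y / np.linalg.norm(Y)
    U, s, Vt = np.linalg.svd(Y.T @ X)       # optimal rotation R = U Vt
    R = U @ Vt
    return float(np.sum((X - Y @ R) ** 2))

# Invented example: the time-2 ordination is the time-1 ordination
# rotated by 90 degrees, so the residual is essentially zero.
rng = np.random.default_rng(1)
t1 = rng.normal(size=(10, 2))                      # 10 sites, 2 axes
t2 = t1 @ np.array([[0.0, -1.0], [1.0, 0.0]])
print(procrustes_residual(t1, t2))                 # ~0
```

A large residual between consecutive censuses would flag the kind
of structural turnover one wishes to examine.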

In applied ecology or ecological management there appear to
be potential uses of ordination to generate criteria for
decision-making. For example, in Canada a "potential key" func-
tion is used to determine which part of a site planned for deve-
lopment is the most suitable for wildlife and therefore the best
one to set aside for conservation purposes. In construction of
the function, a PCA is first performed on several physical
variables and species, then a discriminant analysis is employed,
and finally clustering. Such indicator functions have the advan-
tage of being concise, though there is always some concern that
the information lost in the distillation down to one function
might have been key to a fuller understanding.

We are aware of several terrestrial animal studies in which
ordination has proven useful in preliminary data analysis when
preliminary hypotheses were lacking. For example, correspondence
analysis has been used to indicate the habitat variables most
closely associated with high beaver density (Erome 1982) and to
suggest the feeding grounds of individual ducks by comparing the
distribution of seeds in their guts to the distribution at a
number of potential feeding sites (Pirot et al. 1984). However,
such examples are not common. Ordination seems to be a procedure
of last resort, fruitfully employed only if there are not obvious
hypotheses and formal statistical means to test them. For most
terrestrial animals, the latter situation seems to obtain.

CLUSTERING

We foresee more application of clustering than ordination in
terrestrial animal ecology, though clustering, like ordination,
is a tool to generate hypotheses for further testing rather than
to test hypotheses directly. Probably the chief reason why we
perceive clustering to be a more exciting tool than ordination is
that we already have sets of hypotheses about clusters, even
though we may not have delineated the clusters. Also, we already
have experience with examples in which cluster analysis yielded
hypotheses that would not have been obvious without the clus-
tering, even though it is far from clear that the best clustering
algorithms were used in each instance. One interesting example
(with plants rather than animals) will suffice to show how
clustering has been helpful. In Belgium, a phytosociological
survey of many pastures was accompanied by chemical analyses of
the soil. The resulting matrices were subjected separately to
cluster analyses and generated virtually identical clusters. The
clusters were subsequently found to consist of sets of pastures
each belonging to the same farmer, and cultural methods used by
individual farmers turned out to be key to which sites had which
plants.

In spite of the demonstrated usefulness of cluster analysis,
there is a clear danger: typology. Once the clustering is
achieved, particularly if it is to be used in some management
procedure, it is quite possible that the degree of scatter within
a cluster and the validity of the entire clustering procedure
will no longer be examined and the clusters will automatically be
viewed as real entities, rather than as provisional groupings for
further consideration. Typology is, of course, a tendency of the
human mind, and there is small wonder that it can creep into a
procedure that so easily lends itself to suggesting that entities
belong to types. This is not to say that clustering algorithms,
including modern ones, should be avoided, only that one must
constantly bear in mind what the clusters are.

Whether fuzzy set clustering (Bezdek, this volume) will prove
useful for such delineation, we cannot tell until ecologists have
more experience with such algorithms as fuzzy c-means. We will
have to evaluate the principles on which these procedures are
based, determine whether the ecological data to which we would
apply the procedures do not violate assumptions, and examine the
results of several studies to insure that the objective of this
kind of clustering is appropriate for the sorts of questions that
ecologists ask.
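
For readers who want that experience cheaply, the core fuzzy
c-means iteration can be written in a few lines. This is a
schematic sketch only (invented data, a fixed iteration count,
fuzzifier m = 2), not Bezdek's full algorithm with its
convergence tests and cluster-validity indices.

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, n_iter=100, seed=0):
    """Bare-bones fuzzy c-means: returns (centers, U), where
    U[i, k] is the degree to which sample i belongs to cluster k."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)          # memberships sum to 1
    for _ in range(n_iter):
        W = U ** m                              # fuzzified weights
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared distance from every sample to every center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)              # guard against /0
        inv = d2 ** (-1.0 / (m - 1.0))          # standard update
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U

# Two invented groups of sites in a 2-D species space
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(2, 0.3, (20, 2))])
centers, U = fuzzy_cmeans(X, c=2)
hard = U.argmax(axis=1)     # provisional hard groups, if one insists
```

The membership matrix U, rather than the hard assignment, is what
distinguishes the approach; intermediate memberships flag exactly
the borderline sites a typological reading would hide.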

On the other hand, we expect that the proposals by Legendre
(constrained clustering) and Lefkovitch (conditional clustering
without pairwise resemblances) will rather quickly be deployed by
terrestrial animal ecologists. In each instance there already
appear to be problems for which the algorithm is suited and the
interpretation seems straightforward. Constrained clustering
might be used to solve some longstanding biogeographic controver-
sies, such as the appropriate division of the earth or of par-
ticular continents into mammal provinces. Several methods have
been used to generate these provinces, from intuitive, qualita-
tive approaches to multidimensional scaling with stress
minimization. There is no real consensus, partly because none of
the clusterings have been very convincing. Constraints would be
appropriate, because certain regions must cluster together (if
they cluster at all) because of pre-existing information such as
taxonomic or genetic data. Another controversial problem is the
nature of succession, discussed above. One wishes to know the
extent to which the comings and goings of individual species are
co-ordinated, as envisioned in the superorganismic view, or truly
independent, as in the individualistic hypothesis. There exist
sets of data (on both plants and animals) taken at several times
during a successional sere. If the superorganismic view is
correct, these should form well-defined clusters, each corre-
sponding to those samples that belonged to one particular com-
munity type. Clearly the clustering should be constrained,
because the only candidates for clustering are communities that
are temporally contiguous. In the marine and aquatic literature
there has already been an examination of succession by cluster
analysis (Legendre et al. 1985).
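
The temporal-contiguity constraint is easy to state in code. The
sketch below is a greedy illustration of the idea only, with
invented abundance data; it is not the chronological clustering
algorithm of Legendre et al. (1985), which adds a permutation
test of cluster fusion. Only groups adjacent in time are allowed
to merge.

```python
import numpy as np

def time_constrained_clustering(samples, n_clusters):
    """Agglomerate time-ordered samples into n_clusters groups,
    merging only temporally adjacent groups (nearest centroids)."""
    samples = np.asarray(samples, dtype=float)
    groups = [[i] for i in range(len(samples))]
    while len(groups) > n_clusters:
        cents = [samples[g].mean(axis=0) for g in groups]
        # Distances between adjacent groups only: the constraint
        d = [np.linalg.norm(cents[i] - cents[i + 1])
             for i in range(len(groups) - 1)]
        j = int(np.argmin(d))                 # closest adjacent pair
        groups[j:j + 2] = [groups[j] + groups[j + 1]]
    return groups

# Invented succession: abundances drift, then shift abruptly at t=5
data = [[1, 0], [1.1, 0], [0.9, 0.1], [1, 0.1], [0.9, 0],
        [0, 1], [0.1, 1], [0, 0.9], [0.1, 1.1], [0, 1]]
print(time_constrained_clustering(data, 2))
# -> [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
```

Well-defined, temporally contiguous clusters of this kind are what
the superorganismic view predicts; their absence favors the
individualistic hypothesis.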

Conditional clustering also is immediately appealing, relative
to several older clustering algorithms in wide use, on
several counts. First, it moves directly from the data matrix to
the clusters, without forcing one to choose an arbitrary simi-
larity index and then to calculate pairwise similarities. Often
"similarity" seems poorly defined, and one would like to be able
to form clusters without recourse to similarity. Second, the
clusters that arise may or may not be overlapping (not unlike
fuzzy sets), and often there seems no prior reason why clusters
should not be overlapping. Finally, a straightforward
information-theoretic interpretation of a particular clustering
results from use of conditional clustering.

FRACTALS

Fractals did not generate much enthusiasm in the terrestrial
ecology working group. Possibly our interest will be enhanced
when applications are presented in which new insights or questions
arise, but for now we have the following concerns:

Fractals are defined rather loosely. A fractal is strictly
defined only through passing to an infinite or infinitesimal
limit that does not have physical reality. Of course the central
limit theorem and other widely used mathematical and statistical
results also rest on passage to the limit, but, in these cases,
the effect of the hypothetical construct has been thoroughly
studied. There seems as yet no exploration of whether applica-
tion of this mathematics to ecological phenomena in spite of this
unreality renders the results suspect. The informal definition
of fractals--entities that are geometrically self-similar--is
often stated to apply to particular ecological situations
without a strict demonstration. For example, there seems some
resemblance of Koch coastlines to some real coastlines. Is this
resemblance sufficient to indicate that coasts are fractals?
Certainly no trees have an aboveground structure remotely like
that depicted in the caricature of the Forêt Montmorency. Tree
branching is emphatically not self-similar. What exactly is
gained (or lost) by modelling trees or forests as if they were?

Beyond this issue--whether entities claimed to have fractal
geometry actually do have fractal geometry--there is a larger
issue. If we grant for the moment the fractal nature of the phe-
nomena addressed by Frontier (this volume), we find that, at
least for terrestrial animals, all have already been treated by
other means, and we would ask what new insights fractals have
introduced. In a general way, it seems that any relationship
that is a line in a log-log plot can be viewed as a fractal, and
many have been thus construed. However, power laws have been
around for decades, and many already have mechanistic or causal
interpretations independently of fractals. Furthermore, there
appears to be a danger that expressing the fractal character of
an ecological entity by its fractal dimension d will subsume so
much information into one number that the result will be unin-
terpretable or uninteresting. Such condensation of distributions
into numbers seems to be irresistible to ecologists--diversity is
expressed as H', evenness as J, species-abundance distributions
as slopes or other fitted parameters of particular kinds of cur-
ves, spatial pattern as a single Clark-Evans statistic, etc. In
the end, we seem always to realize that the simplicity embodied
in such condensation is not worth the loss of ecological infor-
mation. Years of viewing diversity as adequately expressed by H'
have left a legacy--we cannot answer some ecological questions
because the original abundance data were not reported. H' is no
longer acceptable as a sufficient measure of diversity, and it
would be sad if the history of d mirrored that of H', J, or other
summary statistics.
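
For concreteness, the fractal dimension d that concerns us is
usually estimated exactly as described: as the slope of a log-log
plot. The box-counting sketch below (an invented smooth curve and
arbitrarily chosen grid sizes, purely for illustration) shows both
the procedure and its character as a one-number summary.

```python
import numpy as np

def box_count_dimension(points, sizes):
    """Estimate a box-counting dimension as the slope of
    log N(eps) versus log(1/eps), where N(eps) is the number of
    grid cells of side eps occupied by the point set."""
    counts = []
    for eps in sizes:
        # Snap each point to its grid cell and count distinct cells
        cells = np.unique(np.floor(points / eps), axis=0)
        counts.append(len(cells))
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(sizes)),
                          np.log(counts), 1)
    return slope

# A smooth curve (quarter circle) sampled densely: d should be
# close to 1, the topological dimension of a rectifiable curve.
t = np.linspace(0.0, np.pi / 2, 20000)
curve = np.column_stack([np.cos(t), np.sin(t)])
sizes = [0.1, 0.05, 0.02, 0.01, 0.005]
d = box_count_dimension(curve, sizes)
```

That the whole exercise ends in a single slope is precisely the
condensation worried about above.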

PATH ANALYSIS

Path analysis has a venerable tradition in terrestrial animal
ecology, perhaps because most animal ecologists are familiar with
Wright's original elaboration of the technique. Furthermore,
because of the continuing interest in multiple regression of spe-
cies richness on various physical and biotic variables following
the pioneering paper by Hamilton et al. (1964), most applications
have been in the regression setting described by de Leeuw (this
volume). In addition to the uses outlined by de Leeuw,
terrestrial ecologists have employed path analysis to determine
signs of particular paths and to estimate approximate slopes of
unknown relationships from those known through independent evi-
dence. A problem occasionally arises, not because of a defi-
ciency in the technique but because of a misunderstanding of its
proper use: the path analysis is occasionally taken to validate,
in some sense, the path model that is set up on independent
grounds, as if the analysis were a proper test of the hypothesis
embodied in the path diagram.
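
In the regression setting, path coefficients are simply
standardized partial regression coefficients, which makes the
computation itself trivial; the danger noted above lies entirely
in the interpretation. The sketch below uses an invented causal
scheme (area and elevation driving species richness; all variable
names and effect sizes are hypothetical, chosen only to echo the
species-richness regressions cited in the text).

```python
import numpy as np

def path_coefficients(X, y):
    """Path coefficients of the predictors in X on response y:
    standardized partial regression coefficients, as in the
    simple recursive-regression use of Wright's method."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize predictors
    z = (y - y.mean()) / y.std()               # standardize response
    coef, *_ = np.linalg.lstsq(Z, z, rcond=None)
    return coef

# Invented data: richness depends strongly on area, weakly on
# elevation, plus a little noise.
rng = np.random.default_rng(7)
area = rng.normal(size=500)
elevation = rng.normal(size=500)
richness = 0.8 * area + 0.2 * elevation + 0.1 * rng.normal(size=500)
p = path_coefficients(np.column_stack([area, elevation]), richness)
# The area path dominates: p[0] > p[1] > 0
```

Nothing in the computation tests whether the arrows in the path
diagram point the right way; that, as stated above, must be
justified on independent grounds.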

Although path analysis seems quite appropriate for several
ecological problems, recent advances in multiway contingency
table analysis may render the latter technique an even better one
for most of the same problems. Both methods deal explicitly
with interactions of the same sorts of variables. Multiway
contingency table analysis seems somewhat less restrictive than
path analysis because the latter appears to imply causation, and
often the passage of time, through the very nature of the path
diagram. On the other hand, multiway contingency table analyses
generally require more data than path analyses. It would be very
interesting to attempt a comparison of analyses of the same data
sets using both methods.

SPATIAL ANALYSIS

There is a long tradition of spatial analysis in terrestrial
ecology, even more of plants than of animals because plants are
sessile. Furthermore, terrestrial ecologists seem to be well
aware of many of the recent advances such as those described by
Ripley and by Sokal and Thompson (this volume). This is not to
say that the recommendations have been uniformly adopted. For
example, Ripley (this volume) inveighs against using quadrats in
spatial analysis, and his arguments are well known. However,
quadrats are still used by many workers for this purpose. There
appear to be two main reasons. First, many long-term monitoring
programs rest on quadrats sampled repeatedly, and, because these
data exist, they are analyzed for all sorts of things, including
spatial pattern. Second, particularly for rare animals, it is
often easier to estimate the numbers in a quadrat than to esti-
mate the distance between individuals. The last problem will
always be with us, and the biology of particular species may
require that quadrat methods be used to assess their density and
aspects of their spatial dispersion, but we are confident that
there will be a growing use of plotless techniques such as those
outlined by Ripley (this volume).

Spatial correlation is also widely known among terrestrial
animal ecologists, largely because many of them are familiar with
the population genetics literature. There have already been
several applications in ecology, mostly of a descriptive nature,
and the new multiple and partial Mantel tests described by Sokal
and Thompson (this volume) address questions that terrestrial
ecologists already ask--we frequently have three or more spa-
tially distributed variables and desire information on their
interrelationships. Whether any of the three approaches outlined
by Sokal and Thompson (this volume) becomes the method of choice
remains to be seen. However, because we already have hypotheses
about relationships of this sort, whatever method turns out to
have the proper statistical properties will probably be used not
only as an exploratory technique but to test hypotheses.
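
The simple two-matrix Mantel test underlying these extensions is
easy to state: correlate the unrolled distance matrices and judge
the correlation by permuting the objects of one matrix. The toy
sketch below uses invented site coordinates and shows the basic
test only, not the multiple and partial versions of Sokal and
Thompson (this volume).

```python
import numpy as np

def mantel_test(D1, D2, n_perm=999, seed=0):
    """Simple Mantel test: correlation between the upper triangles
    of two symmetric distance matrices, with a one-sided p-value
    from permuting the rows and columns of the second matrix."""
    rng = np.random.default_rng(seed)
    iu = np.triu_indices_from(D1, k=1)
    r_obs = np.corrcoef(D1[iu], D2[iu])[0, 1]
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(D1.shape[0])       # relabel the objects
        r = np.corrcoef(D1[iu], D2[np.ix_(p, p)][iu])[0, 1]
        if r >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)

# Invented sites: ecological distance tracking geographic distance
rng = np.random.default_rng(11)
xy = rng.random((15, 2))                       # site coordinates
diff = xy[:, None, :] - xy[None, :, :]
geo = np.hypot(diff[:, :, 0], diff[:, :, 1])   # geographic distances
eco = geo + 0.05 * rng.random((15, 15))        # correlated + noise
eco = (eco + eco.T) / 2
np.fill_diagonal(eco, 0)
r, p = mantel_test(geo, eco)
# Expect a strong positive association: large r, small p
```

The permutation framing is what makes the approach attractive for
hypothesis testing rather than mere description, once the
statistical properties of the multivariable extensions are settled.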

REFERENCES

Cousins, S. H. 1985. Ecologists build pyramids again. New
Scientist 406:50-54.

Drury, W. H., and I. C. T. Nisbet. 1973. Succession. J. Arnold
Arboretum (Harvard University) 54:331-368.

Erome, G. 1982. Contribution à la connaissance éco-éthologique
du castor (Castor fiber) dans la vallée du Rhône. Thèse de
Docteur de l'Université, Université Claude Bernard, Lyon.

Hamilton, T. H., R. H. Barth, Jr., and I. Rubinoff. 1964. The
environmental control of insular variation in bird species
abundance. Proc. Nat. Acad. Sci. (U.S.A.) 52:132-140.

Legendre, P., S. Dallot, and L. Legendre. 1985. Succession of
species within a community: chronological clustering, with
applications to marine and freshwater zooplankton. Am. Nat.
125:257-288.

May, R. M. 1979. The structure and dynamics of ecological com-
munities. In R. M. Anderson, B. D. Turner, and L. R. Taylor
[eds.], Population dynamics. Blackwell Scientific Press,
Oxford.

Pielou, E. C. 1977. Mathematical ecology. Wiley, New York.

Pirot, J. Y., D. Chessel, and A. Tamisier. 1984. Exploitation
alimentaire des zones humides de Camargue par cinq espèces
de canards de surface en hivernage et en transit:
modélisation spatio-temporelle. Rev. Ecol. (Terre Vie)
39:167-192.

Simberloff, D. 1980. A succession of paradigms in ecology:
essentialism to materialism and probabilism. Synthese
43:3-39.
List of participants

Michel AMANIEU,
Laboratoire d'Hydrobiologie marine,
Université des Sciences et Techniques du Languedoc,
Place Eugène Bataillon,
F-34060 Montpellier Cedex, France.

Shmuel AMIR,
Department of Applied Physics and Mathematics,
Soreq Nuclear Research Center,
Yavne 70600, Israel.

Francisco A. de L. ANDRADE,
Laboratorio Maritimo da Guia da Universidade de Lisboa,
Forte de N. Sa da Guia, Estrada do Guincho,
2750 Cascais, Portugal.

Fortunato A. ASCIOTI,
Dipartamento Biologia Animale e Ecologie Marina,
Via dei Verdi, 75, Università di Messina,
I-98100 Messina, Italy.

Paul BERTHET,
Laboratoire d'Ecologie théorique et de Biométrie,
Université de Louvain,
B-1348 Louvain-la-Neuve, Belgium.
Earn: BOLA @ BUCLLN11

James C. BEZDEK,
Department of Computer Science,
University of South Carolina,
Columbia, South Carolina 29208, U.S.A.

Manfred BÖLTER,
Institut für Polarökologie, Universität Kiel,
Olshausenstraße 40/60,
D-2300 Kiel 1, Federal Republic of Germany.

Vincent BOY,
Fondation Sansouire,
Station biologique de la Tour du Valat,
Le Sambuc, F-13200 Arles, France.

Janet W. CAMPBELL,
Bigelow Laboratory for Ocean Sciences,
West Boothbay Harbor, Maine 04575, U.S.A.

J. Douglas CARROLL,
AT&T Bell Laboratories, Room 2C-553,
Murray Hill, New Jersey 07974, U.S.A.

Carol D. COLLINS,
Biological Survey, New York State Museum, Science Service,
The University of the State of New York,
Albany, New York 12230, U.S.A.

Steve H. COUSINS,
Energy Research Group, The Open University,
Walton Hall, Milton Keynes,
England MK7 6AA, United Kingdom.

Serge DALLOT,
Station Zoologique,
Université Pierre et Marie Curie,
F-06230 Villefranche-sur-Mer, France.

Jan DE LEEUW,
Department of Data Theory, Faculty of Social Sciences,
University of Leiden, Middelstegracht 4,
NL-2312 TW Leiden, The Netherlands.
Earn: DELEEUW @ HLERUL55

Jean-Luc DUPOUEY,
Laboratoire de Phyto-écologie forestière,
Institut national de la Recherche agronomique,
Champenoux, F-54280 Seichamps, France.
Transpac: 178021240 DUPOUEY.RFPENOI

Yves ESCOUFIER,
Unité de Biométrie,
Institut national de la Recherche agronomique,
9, Place Pierre Viala,
F-34060 Montpellier Cedex, France.

Marta ESTRADA,
Instituto de Investigaciones Pesqueras,
Paseo Nacional, s/n,
E-08003 Barcelona, Spain.

John G. FIELD,
Zoology Department, University of Cape Town,
Rondebosch, South Africa 7700.

Jordi FLOS,
Departament d'Ecologia, Universitat de Barcelona,
Avinguda Diagonal, 645,
E-08028 Barcelona, Spain.

Marie-Josée FORTIN,
Département de Sciences biologiques,
Université de Montréal,
C.P. 6128, Succursale A,
Montréal, Québec H3C 3J7, Canada.

Eugenio FRESI,
Stazione Zoologica di Napoli,
Laboratorio di Ecologia del Benthos,
Punta S. Pietro,
I-80077 Ischia Porto (Napoli), Italy.

Serge FRONTIER,
Laboratoire d'Ecologie numérique (SN3),
Université des Sciences et Techniques de Lille Flandres Artois,
F-59655 Villeneuve-d'Ascq Cedex, France.

Robert GITTINS,
30 Cowper Street, Unit 19, Randwick,
Sydney, New South Wales, Australia 2031.

Rebecca GOLDBURG,
Department of Ecology, University of Minnesota,
109 Church St S.E.,
Minneapolis, Minnesota 55455, U.S.A.

John C. GOWER,
Statistics Department, Rothamsted Experimental Station,
Harpenden, Hertfordshire AL5 2JQ, United Kingdom.

Roger H. GREEN,
Department of Zoology, University of Western Ontario,
London, Ontario N6A 5B7, Canada.
Bitnet: A271 @ UWOCC1

Philippe GROS,
Institut français de Recherche pour l'Exploitation
de la Mer (IFREMER), B.P. 337,
F-29273 Brest Cedex, France.

Richard L. HAEDRICH,
Newfoundland Institute for Cold Ocean Science,
Memorial University, St-John's,
Newfoundland A1B 3X7, Canada.

Willem J. HEISER,
Department of Data Theory, Faculty of Social Sciences,
University of Leiden, Middelstegracht 4,
NL-2312 TW Leiden, The Netherlands.
Earn: HEISER @ HLERUL55

Jean-Marie HUBAC,
Laboratoire de Systématique et Écologie végétales,
Bâtiment 362, Université Paris-Sud,
F-91405 Orsay Cedex, France.

Frédéric IBANEZ,
Station Zoologique,
Université Pierre et Marie Curie,
F-06230 Villefranche-sur-Mer, France.

Pierre LASSERRE,
Station Marine de Roscoff,
Place Georges Teissier,
F-29211 Roscoff, France.

Alain LAUREC,
Institut français de Recherche pour l'Exploitation de la Mer,
Laboratoire d'Evaluation des Ressources halieutiques,
B.P. 1049, F-44037 Nantes Cedex, France.

Leonard P. LEFKOVITCH,
Statistical Research, ESRC,
Central Experimental Farm,
Ottawa, Ontario K1A 0C5, Canada.

Louis LEGENDRE,
Département de biologie, Université Laval,
Ste-Foy, Québec G1K 7P4, Canada.

Pierre LEGENDRE,
Département de Sciences biologiques,
Université de Montréal,
C.P. 6128, Succursale A,
Montréal, Québec H3C 3J7, Canada.
Bitnet: FDI0 @ POLYTECL

Brian R. McARDLE,
Department of Zoology, University of Auckland,
Private Bag, Auckland, New Zealand.

Michael MEYER,
Institut für Polarökologie, Universität Kiel,
Olshausenstraße 40/60,
D-2300 Kiel 1, Federal Republic of Germany.

Richard A. PARK,
Holcomb Research Institute, Butler University,
4600 Sunset Avenue,
Indianapolis, Indiana 46208, U.S.A.

Brian RIPLEY,
Department of Mathematics, University of Strathclyde,
26, Richmond Street,
Glasgow, Scotland G1 1XH, United Kingdom.
Janet: B.D.RIPLEY @ UK.AC.STRATH.VAX

Michele SCARDI,
Stazione Zoologica di Napoli,
Laboratorio di Ecologia del Benthos,
Punta S. Pietro,
I-80077 Ischia Porto (Napoli), Italy.

Bruno SCHERRER,
Département des Sciences biologiques,
Université du Québec à Montréal,
C.P. 8888, Succursale A,
Montréal, Québec H3C 3P8, Canada.

Peter SCHWINGHAMER,
Marine Ecology Laboratory,
Bedford Institute of Oceanography,
P.O. Box 1006, Dartmouth,
Nova Scotia B2Y 4A2, Canada.

Daniel SIMBERLOFF,
Department of Biological Science,
Florida State University,
Tallahassee, Florida 32306-2043, U.S.A.

Robert R. SOKAL,
Department of Ecology and Evolution,
State University of New York,
Stony Brook, New York 11794-5245, U.S.A.
Bitnet: CHERYL @ SBBIOVM

S. Edward STEVENS, Jr.,
Department of Molecular and Cellular Biology,
Pennsylvania State University,
University Park, Pennsylvania 16802, U.S.A.

David W. TONKYN,
Department of Biological Sciences,
Clemson University,
Clemson, South Carolina 29634, U.S.A.

Marc TROUSSELLIER,
Laboratoire d'Hydrobiologie marine,
Université des Sciences et Techniques du Languedoc,
Place Eugène Bataillon,
F-34060 Montpellier Cedex, France.
Earn: HAIR @ FRMOP11

Daniel E. WARTENBERG,
Department of Environmental and Community Medicine,
UMDNJ -- Robert Wood Johnson Medical School,
675 Hoes Lane, Piscataway,
New Jersey 08854-5635, U.S.A.

Marinus J. A. WERGER,
Department of Plant Ecology, University of Utrecht,
Lange Nieuwstraat, 106,
NL-3512 PN Utrecht, The Netherlands.

Clarice M. YENTSCH,
Bigelow Laboratory for Ocean Sciences,
West Boothbay Harbor, Maine 04575, U.S.A.
Subject Index

algorithm (see also analysis; computer pro- - Guttman's principal components of scale
grams and packages) a, 179
- annealing a., 318, 329 - homogeneity a., 53, 179,214,215
- clustering a., 225-287, 291, 294, 325 - individual differences scaling a. (INDS-
- constrained clustering a., 291, 294, 295 CAUL), 59, 80,489,493,505
- cutting plane a, 318 - individual differences in orientation scal-
- dynamic programming a., 291 ing (IDIOSCAUL), 91
- nonlinear path analysis a., 398 - item a., 326
- unfolding a., 201, 203 -linear projection pursuit, 521
alternating least squares procedure (ALS), -loglinear a., 159,166,386,492
86,172,213,398 - maximum likelihood nonmetric 2-way
analysis MDS,66
- ACE-method of nonlinear multivariate a., - metric scaling, 487, 493, 539
174 - monotonic analysis of variance, 66
- asymmetric matrix a., 488, 492, 493, 494 - multidimensional preferences scaling, 111
- autocorrelation a.: see autocorrelation - multidimensional scaling a.: see analysis
- canonical a., 172, 183 (nonmetric multidimensional scaling)
- canonical coordinate analysis, 42, 472 - multidimensional unfolding a.: see analy-
- canonical correlation a., 173, 183, 401, sis (unfolding a)
472,485,488,493,534,535,536 - multiple correlation a., 397
- canonical correspondence a., 214 - multiple correspondence a., 52, 176, 180,
- canonical decomposition of N-way tables, 183,472,489,493,504,523
81,85 - multiple regression a.: see regression
- canonical variates a., 41, 125, 154, 166, - multiple-set canonical a., 183
183,472,549,551,565 - multiplicative analysis of a two-way table,
- Chernofffaces, 231 39,43
- classical scaling: see analysis (principal - multivariate a., ix, 157, 158, 163, 401,
coordinates a.) 531,537
- cluster a.: see clustering - nonlinear iterative least squares, 86
- common factor a., 397 - nonlinear iterative partial least squares,
- confirmatory data a., 183,537 86
-constrained scaling, 305, 489,493 - non-linear mapping, 32
- contingency table a., 488, 561, 570 - nonlinear multivariate a. with optimal
- correspondence a., 47, 56, 153, 161, scaling (see also under the specific entry),
179, 181, 183, 196, 206, 208, 209, 210, 157-187, 210, 214, 401, 474, 506, 537,
212, 213, 215, 216, 325, 487, 492, 493, 541,551,553
522,533,535,536,566 - nonlinear ordination with optimal scaling,
- detrended correspondence a., 39, 161, 183,487,493
214,487,493 - nonlinear path analysis (see also analysis,
- discriminant a.: see analysis (canonical path a. with optimal scaling), 210, 386,
variates a.) 398,479,525
- distance methods, 412-414 - nonlinear principal component a., 180,
- dual scaling: see correspondence a. 181, 183
- exploratory data a., 103, 183, 230, 521, - nonmetric multidimensional scaling
537,560,572 (MDS), 32, 43, 56, 65-138, 158, 183,
- factor a., 183,230 209, 216, 230, 471, 473, 487, 492, 493,
- feature a., 228, 230 522, 523, 538, 539, 541, 542, 551, 553,
- generalized canonical a., 172, 181 562,567
- generalized canonical correlation a., 66, - nonmetric unfolding a., 212
183 - of partial covariances, 150
- generalized Procrustes a., 57, 565 - of three-way data matrix, 59

analysis (continued) animal (see also benthos, birds, fishes,


- of variance (ANOVA), 166, 183, 443, humans, insects, mammals, spiders,
479,561 zooplankton) 337, 340, 353, 354, 534,
- one-dimensional MDS, 212 559-572
- ordination a.: see ordination; scaling - sampling mobile populations, 414
- orthogonal Procrustes a. (see also analy- - trajectory, 351, 352
sis, Procrustes a.), 55, 129 approximate reasoning, 494, 525
- parametric mapping, 32, 217, 539 association: see species association
- parametric mapping of nonlinear data assumptions
structures, 66 - distributional a in PCA, 36
- path a., 183, 381, 401, 478,481,488, - orthogonality a. (weak, strong) in path
493,494,516,525,570 modelling, 387
- path a. with optimal scaling, 183,381- asymmetry (in resemblance matrix), 43, 71
404,488,516,525,551 attribute: see variable
- periodogram, 349 autocorrelation
- point pattern a., 407-429, 480, 491, 493, - definition of spatial a., 432, 522
516,526,571 - in time series, 296, 475, 505, 522
- polynomial factor a., 66, 192 - nominal a. analysis, 436
- preference mapping of stimulus space, - spatial a., 296, 470, 471, 475, 476, 505,
93,113 522
- principal components a. (PCA), 8, 36, - spatial a. analysis, 296, 297, 299,
126, 139, 170, 173, 182, 183, 191,209, 431-466,479,491,493,514,571
230, 401, 472, 487, 493, 509, 522, 523, - univariate a. analysis, 296
532,535,536,561,565
- principal components a. (non-centered),
548,551 bacteria (see also microbial ecology), 293,
- principal components a. with respect to 338,340
instrumental variables, 522 barycenter, 36, 37, 49
- principal coordinates a., 28, 56, 303, behaviour, 351, 353, 560, 563
487,493,532,536 Bell Laboratories, 65
- Procrustes a., 55, 57, 128, 129, 472, benthos,346,360,469,485-494
488,493,504,522,562 Bezdek, James C., 225
- property fitting, 66 biogeography,295,431,563,567
- Q, R a., 15, 165 biological oceanography and limnology, 469,
- reciprocal averaging, 179, 208, 325 521-527
- regression a.: see regression biplot, 13,20, 194,208,507,533,535,536
- Sammon mapping, 230,521 birds,44,163,415,417,559,562,566
- Sammon triangulation, 230,521 blood-group genotypes, 37, 437
- scaling a.: see ordination; scaling Bolter, Manfred, 469
- scaling by majorizing a complicated boot-strapping, 4, 546, 547, 564
function, 198 botany: see vegetation
- scaling by maximizing a convex function, boundary layer, 478
198 Brownian motion, 351, 352, 374
- simultaneous linear equation scaling, 66
- spatial autocorrelation a.: see autocorrela-
tion canonical loading, 401
- spectral a., 412, 481 Cantor dust, 371, 375, 376
- step-across method, 39 Carroll, J. Douglas, 65
- surface pattern a., 431, 526 causal
- three-way, three-mode a., 79, 92, 100, - analysis, 381
538 -chain,389
- time series a. (see also autocorrelation), causality: see model
183 - imprecise c., 525
- triangulation, 230 chance capitalization (in modelling), 183,
- two-way MDS, 68, 538 402
- unfolding a., 32, 39, 43, 93, 115, 120, chemiometric analysis, 252
189-221, 474, 477, 488, 492, 493, 507, chlorophyll,349,357,391
523,538,539,562 Church-Rosser reduction system, 318
angel,351 Clark-Evans test, 416

classification (see also clustering), 158, 161, - integer c., 167


228,376 - interactive c., 164
- distinction between clustering and classifi- coefficient, 22, 32, 68
cation, 232 - chi-square distance, 49, 209
classifier design, 232, 254 - correlation c. (see also matrix), 90
climatology, 255, 376, 534 - dissimilarity c., 26, 90, 313, 532
climax adaptation number, 204 - discontinuity index, 539
closure - distance between ordinations, 55, 155
-data, 36 - Euclidean distance, 9, 17, 23, 28, 72, 82,
- numerical transitive c., 274 90,92,103,116,197,253,532
cluster, 311 - for binary variables, 23
- analysis: see clustering - for combinations of variable types, 27
- statistical test for c., 292, 294 - for qualitative variables, 26
- validity, 231, 255 - for quantitative variables, 26
clustering (see also classification), 62, 189, - Geary's c c., 433
228,481,485,489,493,511,565,566 - Jaccard c., 295, 297, 299
-ADCLUS, 66 - metric c., 23
- algorithms, 225-287, 291, 294, 325 - Minkowski metric, 72
- assignment-prototype (AP) c., 245 - Moran's I c., 433
- average linkage c., 291 - of determination, 389, 397
- binary division c., 291, 294 - Pythagorean distance (see also coefficient,
- chronological c., 291, 481 Euclidean distance), 9,17,23,28
- c-means c.: see clustering (k-means c.) - Rajski's metric, 295
- complete linkage c., 301 - Rand index, 295, 297, 299
- conditional c., 309-327, 476, 489, 492, - similarity c., 23, 27, 90, 532, 543
493,514,524,551,567,568 - spatial autocorrelation c., 433
- constrained c. (see also algorithm), 289- community ecology, ix, 296,559
307, 476, 490, 492, 493, 513, 523, 524, competition, 290
550,551,567,568 complexity (concept of), 236, 237
- convex decomposition c., 269 computer programs and packages
- crisp c., 231 -ACE,183
- criterion, 309 - ALSCAL, 34, 59
- definition of, 231 -ALSOS, 183,541
- distinction between clustering and classifi- - AT & T Bell Labs Computer Information
cation,232 Library, 65
- fuzzy c., 231, 475, 490, 492, 493, 494, - BIOGEO, 295
513, 523, 524, 567 -CANCOR,66
- fuzzy c-varieties (FCV) c., 247 - CANDECOMP, 81, 85, 88, 101
- individual difference c., 66 - CANDELINC, 86
- ISODATA c., 253 - constrained clustering, 305
- k-means c., 253, 443 -GENSTAT,4
-linkage hierarchical c., 294 - GIFI system, 34, 183, 214, 474, 523,
- minimum-variance hierarchical c., 290, 542
294 -mCLUS,66
- nearest neighbor (generalized), 278 - IDIOSCAL, 91, 101
- non-hierarchical c., 481 - INDCLUS, 66, 513
- overlapping c., 66, 513 - INDSCAL, 59, 66, 80, 564
- proportional-link linkage c., 291, 293, - INDSCALS, 66, 88
295,297,303 - KYST, 34, 57, 68, 75, 107, 126
- relational c-means (RCM) c., 247 - MAPCLUS, 66, 513
- single linkage c., 275, 277, 291, 301 - MAXSCAL, 66
- time-constrained c., 291 -MDPREF,66, 107,111,121
- ultrametric tree hierarchical c., 66 - MDSCAL, 66, 68
- under a priori models, 223 - MINISSA, 34
- UPGMA c., 294,443 - MONANOYA, 66
coding (see also variable transformation), - MULTISCALE, 34, 211
160,163,215,543,545 -NILES, 86
- conjoint c., 215 - NINDSCAL, 66, 81
- convex c., 215 -NIPALS,86

computer programs and packages (continued) - relation c. f., 244


- nonlinear multivariate analysis programs cycle of matter, 336
with optimal scaling, 183
- optimal set covering, 329 data (see also variable), 228
-PARAFAC,92,100 - analysis: see analysis
- PARAMAP, 66 - assessment d., 522
-POLYFAC,66 - design d., 229
- PREFMAP, 65, 66, 93, 108, 113, 193 - frequency d., 313, 316
- PREFMAP models I, II, ill and N, 115 -labelled d., 267
-PROFIT,66 -large d. sets, 295, 524
- set covering probabilities, 328 - measurement d., 522
- set representation probabilities, 328 - missing d., 31, 71, 316, 318
- SIMULES, 66 - mixed-type d., 474
- SINDSCAL, 65, 88,127 -reduction, 181,560
- SMACOF, 60, 198, 199,200,207,209, - relational d., 229, 313
210,212,214 - test d., 229
-TORSCA,71 - transformation: see variable
confusion data, 6 Delaunay triangulation (see also Dirichlet tes-
connection (see also contiguous samples), sellation),294,295,303
434 de Leeuw, Jan, 157,381
consensus ordination, 58 Delphi method, 474
constrained clustering: see clustering dendrogram, 311
constrained ordination or scaling: see ordina- description (in data analysis), 383
tion; scaling descriptive efficiency of a model, 398
constraint descriptor: see variable
- centroid c. (in restricted unfolding), 213 Devil's comb, 347
- equality c. (in restricted unfolding), 213 diatoms, 475
- inequality c. (in restricted unfolding), 213 dimensionality of space (reduction of), 10,
- linear c. on parameters (in CANDEL- 521,531
INC),86 Dirichlet tessellation (see also Delaunay trian-
- other than space or time (in clustering), gulation),314
303 discriminant: see analysis
- space c. (in clustering), 290 dissimilarity: see coefficient
- time c. (in clustering), 290, 291 - approximation, 196
contiguous samples, definition of (see also dissipative system, 337, 495
connection),293-294 distance: see coefficient
contingency table, 47, 492 - approximation, 198
- analysis, 488 - minimization, 196, 199,206,207
contour distribution
- analysis, 252 - multivariate normal d., 159
- mapping: see mapping - size frequency d., 336
convex - spatial d. (see also spatial), 293, 336
- combination, 251, 269 diversity of species, 336, 357, 358, 364,
- decomposition, 268, 272 524,569
-hull, 238 dominance data, dominance scale, 111
coral reef, 293, 343 duality,15
correlation (see also coefficient; matrix), 383 - diagram, 139-156,474,506,551
correlogram, 451,458,479
- definition of, 435 Eckart-Young decomposition, Eckart-Young
- multivariate Mantel c., 296, 479, 516 theorem, 14, 112, 143
- nominal data, 436, 450 ecological
- test of overall significance, 441 - gradient: see gradient
correspondence analysis: see analysis - hypothesis: see hypothesis
crisp - model: see model
- classifier, 232 - process: see process
-label matrix, 239 - theory, 290
- partition, 231, 238, 240 ecology (definition of), 530
criterion function (in clustering) edaphic condition, 303
- object c. f., 247 edge effect, 418

effect (direct, indirect) in path analysis, 393 - classifier, 232


element: see object -label matrix, 239
engineering, 254 -limit, 349
epidemiology, 480 -model,235
equivalence relation, 231 -partition,231,233,237,238,240
ergocline, 290,352,354,495,524 - relation, 233, 240
Escoufier, Yves, 139 - scatter matrix, 249
ethology, 564 - set, 225-287
Euclidean - similarity relation, 231,240,272
- distance: see coefficient - subset (definition of), 233
-model,81
- properties, 24 G-test,443
-space,23,24,28,368 galaxies,349,351,375
evenness,358,364 gauging, 494
experimental design, 539 genetic drift, 444
expert system, 277 geographic information, 289
explanation, 382 geology, 252, 254
external energy, 495 geomorphology, 303
external unfolding problem (see also analy- gill, 337, 345
sis), 198, 199,200,201,213 Gittins, Robert, 529
extreme value theory, 313 Gower, John C., 3
gradient, 193
F-test - analysis, 190, 193,474
- in IDIOSCAL, 99 - ecological g., 39, 48, 190, 191,473
-inPREFMAP,119 - method (in MDS), 69
factor graph,384
- analysis: see analysis - Gabriel g., 294, 314,434
- common f., 394, 396, 397 -random g., 314
farming, 566 - relative neighbourhood g., 313
feature - theory, 357
- analysis, 228, 230 Greig-Smith's method, 411
- extraction, 230 Guttman's effect: see horseshoe
- nomination, 229
- selection, 230 health sciences, 296, 480
Field, John G., 485 Heiser, Willem J., 189
fishes,293,295,359,365 histological structure, 337
floristics: see vegetation horseshoe, 39, 161, 181, 195, 214, 216,
Flos, Jordi, 495 217,474,508
foraminifera, 475 humans (see also blood group genotypes),
fossil fishes, 293 37,445,480,563
four-way, four-mode analysis, 100 Hutchinson's fundamental niche, 157
fractal, 335-378, 477, 490, 493, 496, 515, hypothesis
524,551,568 - ecological h., ix, 560, 562
- elements off. theory, 367-377 - generation, 482, 566
- forms (in ecology), 337 - testing (see also test, and under the spe-
- in abstract representational space, 336, cific name), 319, 482, 566, 572
355,376
- in geometric space, 368 identifiability,391
- in physical space, 336, 355 image analysis, 254
- statistical f., 373 indifference principle, 318, 324
- tree, 370, 374 inference,499,521
fractal dimension information theory, 362
- computation for real objects, 375 initial configuration (in MDS), 70,74
- computation through self-similarity rule, insects, 359,438,446,455,559,562,564
369 intelligence, 91
front (hydrological), 354,478 - G factor (general intelligence), 92
Frontier, Serge, 335 interface, 335, 354
fungi,534 internal unfolding problem (see also analy-
fuzzy (see also clustering, fuzzy c.) sis), 202,203,204,208,213

jack-knifing, 4, 100,546,547 maximum entropy principle, 310, 324


joint plot: see biplot maximum joint probability (principle of), 324
maximum likelihood, 210, 268, 324
Koch triadic curve, 369, 370, 372 measurement error, 396, 537
kriging, 479, 490, 493 medicine, 254
medieval cemeteries, 437
labelling meteorology, 349
- probabilistic, 268 metric: see coefficient
- relaxation, 268 metric scaling (see also under analysis, for
lake morphology, 340 specific forms of metric scaling), 28
language: see linguistics microbial ecology, 469-484
latent variables, in path analysis, 381, 394, minimum cross-entropy (principle of), 324
396,401,478,488 minimum spanning tree, 294, 314, 434
Lefkovitch, Leonard P., 309 misclassification (probability of), 233
Legendre, Louis, xi, 521 missing values: see data (missing d.)
Legendre, Pierre, xi, 289 mixture problem, 268
lexicographic tree, 362, 363, 364, 377 mobile population, 414
limnology, 469, 521-527, 559 mode (definition of), 80
linear programming, 318 model
linear variety, 248, 250 - causal m., 382, 385, 488, 501, 525
linguistics, 361, 362 - common factor m., 396
location problem, 198 - descriptive efficiency of a m., 398
loss function, 196, 197,207,210,211, 214, - ecological m., ix, 478, 491, 524, 562
539-546 - for clustering, 223, 289
-normalization, 198,202 - fuzzy m., 235
lung, 296,337,343,347 - just identified m., 387
-linear structural m., 386
mammals, 446, 559, 566, 567 -log-linear m., 561
management (ecological), 565, 567 - Mandelbrot m., 361, 365, 366
Mantel test, 292, 437,453,491,492,493 - mathematical m., statistical m., 4, 39,
- multiple M. t., partial M. t., 438-439, 160,537,544,545,553
454,526,571 - MIMIC m., 395
- restricted randomization in M. t., 439-441 - multilinear m., 86
- test of significance, 437-438 - multiple regression m., 388, 478
mapping - multi-species m., 560
- contour m., 305, 479, 490 - of processes, 355
- ecological communities, 512 - path m., 384, 478
matrix - predictive power of am., 397, 398
- asymmetric m., 43 - probabilistic m., 235
- Burt m., 53, 179 - saturated m., 387
- correlation m., 90,397,534,536,544, - transitive m. (block, simple), 389
547 - trilinear m., 86
- covariance m., 90, 532, 535, 536, 552 -Zipfm.,361
- dissimilarity m., distance m. (see also co- monotonicity, 173
efficient, dissimilarity), 102,532 Monte Carlo method, 112,297,303
- incidence m., 310 Morowitz' principle, 337
- indicator m., 52, 169 morphoedaphic index, 340
- maximum membership m., 272 morphology of living beings, 335, 337, 347,
- multiway m., 505 348
- normalized m., 10, 16 mosaic, 409
- orthogonal m., 15 moss, 30
- orthonormal m., 15 metric multidimensional scaling
- partitioned m., 530, 550 metric multidimensional scaling
-penalty m., 305,514 multiple correspondence analysis: see analy-
- resemblance m., 90, 289 sis
- scalar product m., 90, 532 multiple Mantel test: see Mantel test
- skew-symmetric m., 44 multivariable, 161
- three-way m., 59, 473 - definition of, 163
- two-way m., 499 multivariate statistical analysis: see analysis

niche, 157, 190,476,522 pattern


noise (in data), 521 - analysis, 485
nonlinear data transformation, 192 - clustered p., 416
nonlinearity, 190 - random p., 416
nonmetric multidimensional scaling: see -regularp., 416
analysis pattern recognition, 226, 409
nonsymmetry: see matrix (asymmetric m.) - numerical p. r. system, 226
normalization, in unfolding analysis, 198, - syntactic p. r. system, 226
203,208 pelagic ecosystem, 469, 495-520
normalized matrix: see matrix - production, 338
nugget effect, 502 permutation
numerical ecology, ix, 469, 485, 517, 521, - matrix, 175
529,538,541 - restricted p. test, 439-441
numerical taxonomy, 255, 471, 473, 476, -test, 438
481 phytoplankton, 338, 348, 353, 359, 360,
nutrition, 254 479
plankton (see also phytoplankton, zooplank-
object ton), 348, 349, 351, 375
- definition of an o., 162 plants: see vegetation
- distinction between object and variable, point pattern (see also analysis, point pattern
16, 162 a.),408
-linear sequence of o., 314 Poisson process, 410
objective function, 310, 317 pollen stratigraphy, 291, 314
oceanography, 469, 521-527 pollution, 485
optimal scaling: see scaling Polychaetes, 75, 76, 100, 120, 121, 131,
ordinal 132,255
- information, 25 population ecology, 559
- variable: see variable potential key, 565
ordination (see also analysis), 3-64, 158, power law, 486, 569
161, 189, 191, 230, 296, 485, 503, 521, predator-prey, 290, 351, 352
559,560 predictive power of a model, 397, 398
- comparison of ordinations, 54, 216 preference data, preference scale, 111
- consensus o., 58 principal components analysis: see analysis
- constrained o., 306,489,493,538,541 principal coordinates analysis: see analysis
- definition of an o., 10 probability
- distance between ordinations, 55, 155 - of membership (in clustering), 234, 235
- Gaussian o., 209 - posterior p., 279
- joint o. of species and sites, 189,205, process (biological, ecological), x, 4, 442,
206 470,478,485,490
- three-way methods of o., 59 production, productivity (for primary pro-
orthogonality assumptions, in path model- duction, see also phytoplankton), 338,
ling, 387 340,352,353,354,391,524
outlier, 532,537,552 program (computer): see computer programs
and packages
programming
package (computer): see computer programs - dynamic p., 291
and packages - integer p., 318, 325
paired-comparisons data, 6, 111, 112 -linear p., 318
parasites, 562 proximity data, 66, 68
partial Mantel test: see Mantel test (multiple pseudo-distance, 198,217
M. t.) pseudo-species, 215
participants in ARW, vii, 573 Pythagorean distance: see coefficient
particle size, 470 (Euclidean distance)
partition (see also classification, clustering),
311,319
patchiness, 297, 299, 306, 348, 351, 366, Q-mode analysis: see analysis
445,446,471,490,524 quadrat, 409, 411,562,571
path analysis: see analysis quantification: see variable
path coefficients (computation of), 393 quantile plot, 312

R-mode analysis: see analysis - optimal s. c., 317, 329


randomness (concept of), 236, 237 set representation problem, 316, 328
rank: see variable (semi-quantitative) sewage treatment, 293
rank-frequency diagram, 359, 364, 365, 366 shoreline, 340, 342, 373
Rayleigh flight, 351, 353 sigma-algebra, 316
reciprocal averaging (see also analysis, cor- Simberloff, Daniel, 559
respondence a.), 49 similarity
regional analysis, 289 - coefficient: see coefficient
regression -data, 6
-inMDS,73 - judgments, 81
- in SMACOF, 198,203 - matrix, 289
-linearr.,174 single-peaked response function, 190, 192,
-logistic r., 561 194,196,204,205,213,215,216
- monotone r., 69, 118, 173, 212, 213, singular value decomposition, 14, 55, 112,
217 552
- multiple linear r., 108, 117, 183, 193, socio-economics, 361,364
388,401,488,561 soil, 291, 303, 337, 340, 345, 346, 534,
- polynomial r., 174 559,564
- spline r., 174, 175 Sokal, Robert R., 431
relocation problem, 198, 200, 202 space, 289, 290, 534
remote sensing, 407 spatial
restricted randomization technique (for - analysis, 405, 490, 492, 493, 516, 526,
Mantel test), 439-441 551,571
Ripley, Brian D., 407 - autocorrelation: see autocorrelation
Roscoff, vii, x, xi, 344 - contiguity, 293, 320
rotation - correlogram: see correlogram
- orthogonal r., 95, 116,548 -heterogeneity, 417
- to optimal congruence, 130 - pattern: see pattern
- to simple structure, 131 - point pattern analysis: see analysis
- statistics, 407
sampling, 16,290,365,486,501 - surface pattern analysis: see analysis
- constraint, 522 species
- design, 479, 486, 496, 500, 524, 526, - association, 309, 310, 311, 315, 424,
561 476,489,512,514
- grid s., 294,410,500 - diversity: see diversity
- intensity, 410 - indicator s., 320
- Lagrangian s., 500 - interaction between s., 419
- mobile populations, 414 -list of s., 309
- quadrat s., 409, 452 -segregation, 424
- random s., 410, 500, 562 species-environment relations, 290
- stratified s., 562 spiders, 398
- transect s., 410, 414, 500 spline, 175
scale (of activity, of observation), 335, 336, sstress, 31, 32, 33, 59, 210
340, 343, 347, 365, 374, 407, 470, 477, stability
478,479,486,500 - of generalized canonical correlations, 183
scaling (see also ordination, and the specific - in nonlinear multivariate analysis, 546
entry under analysis), 1, 166, 296, 485, starting configuration (in MDS), 70
493,503,521,531,564 statistical package: see computer programs
- constrained s., 306, 489, 493, 538, 541 and packages
- criterion s., 167 statistical test: see test; see also under the
- geographic s., 289 specific entry
- optimal s., 163, 167,170,381,395,397 statistical unit (see also object), 139
- three-mode s., 92 steepest descent method (in MDS), 69, 193,
seaworms: see Polychaetes 199
sediment, 290, 340, 475 step function, 313
self-similarity rule, 369, 373, 376 stimulus, 79, 81
seriation, 191 Stirling approximation, 320
set covering, 310,319, 328 strain, 32, 59, 82
- minimum s. c., 317 strange attractor, 336, 355, 356, 357, 376

stress, 31, 32, 33, 60, 68, 74, 77, 197, 210, - distinction between object and variable,
212,213,216 16,162
- diagram, 77, 475 - dummy v., 162, 169
stretching of coordinate system, 116 - endogenous v., 385, 397, 398
structure -exogenous v. 385
- data s., 5, 41 - indicator v., 396
- ecological s., 296 -latent v., 381, 394, 396, 401
subgraph (connected), 314 - metric v.: see variable (quantitative v.)
succession theory, 290, 291, 361, 473, 490, - mixed-type v., 474
565,568 - nominal v.: see variable (qualitative v.)
surface pattern: see analysis (surface pattern - numerical v.: see variable (quantitative
a.) v.)
- ordered v., ordinal v.: see variable (quan-
titative v., semi-quantitative v.)
T-square method, 413 - qualitative v., 6, 26, 47, 162, 174, 312,
table (data): see matrix 436, 446, 479, 522, 523, 525, 535, 537,
target: see variable 542-546
Taylor's power law, 486 - quantification of a v., 163, 166, 169,
terrestrial ecosystem, 469 381,397
test (statistical; for specific tests, see under - quantitative v., 5, 26, 162, 174, 312,
the specific entry) 433,522,523,525,537,542-544
- randomization t., 292, 295 - semi-quantitative (ordinal, rank-ordered)
Thomson, James D., 431 v., 6, 162, 174, 312, 479, 486, 523,
three-way, three-mode analysis: see analysis 525,535,537,542-546
ties (in MDS), 71 - standardization, 197
time (see also constraint), 289,534 - state attribute (see also variable, binary
-scale, 477 v.), 312
- series (see also autocorrelation), 289 - summary variable, 521
trajectories of organisms, 336 - target of a v., 162
transect, 290, 293, 314 - transformation of v., 7, 38, 102, 160,
transformation (of data, of variables): see 163, 166, 192, 381, 397, 401, 471, 486,
variable 537,542-544
transitivity, 241,242 - unordered s-state v.: see variable (qualita-
trees, 337, 343, 345,347,370,413 tive v.)
trend surface analysis, 479, 490 variance, analysis of (ANOVA): see analysis
triangle inequality, 83 variate: see variable ,
trophic level, 560 vegetation, 5, 18, 30, 34, 50, 164, 191, 204,
turbulence, 338, 345, 349, 351, 352, 354, 303, 322, 340, 343, 407, 409, 413, 424,
374 439, 446, 447, 455, 529-558, 559, 564-
two-way analysis: see analysis 566
typology, 567 - nitrogen treatments of grass, 174
viscosity, 338, 345,346
unimodality, 190, 191
units: see objects ways (definition of number of w., in a model
or a method), 79
weather, 255
variable Weber problem (generalized), 199
- binary v., 6, 23,162,310,312 weighting, 75, 143, 148,474
- categorical v.: see variable (qualitative v.) working groups, x, 467-
- category quantification of a v., 166, 169 worms: see Polychaetes
- continuous v.: see variable (quantitative
v.) zooplankton, 165, 168, 171, 173, 179, 292,
- definition of a v., 161, 162 359,360,479
