Perceptual Organization
The Oxford Handbook of Perceptual Organization
Edited by
Johan Wagemans
Great Clarendon Street, Oxford, OX2 6DP,
United Kingdom
Oxford University Press is a department of the University of Oxford.
It furthers the University’s objective of excellence in research, scholarship,
and education by publishing worldwide. Oxford is a registered trade mark of
Oxford University Press in the UK and in certain other countries
© Oxford University Press 2015
The moral rights of the author have been asserted
First Edition published in 2015
Impression: 1
All rights reserved. No part of this publication may be reproduced, stored in
a retrieval system, or transmitted, in any form or by any means, without the
prior permission in writing of Oxford University Press, or as expressly permitted
by law, by licence or under terms agreed with the appropriate reprographics
rights organization. Enquiries concerning reproduction outside the scope of the
above should be sent to the Rights Department, Oxford University Press, at the
address above
You must not circulate this work in any other form
and you must impose this same condition on any acquirer
Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America
British Library Cataloguing in Publication Data
Data available
Library of Congress Control Number: 2014955474
ISBN 978-0-19-968685-8
Printed and bound by
CPI Group (UK) Ltd, Croydon, CR0 4YY
Oxford University Press makes no representation, express or implied, that the
drug dosages in this book are correct. Readers must therefore always check
the product information and clinical procedures with the most up-to-date
published product information and data sheets provided by the manufacturers
and the most recent codes of conduct and safety regulations. The authors and
the publishers do not accept responsibility or legal liability for any errors in the
text or for the misuse or misapplication of material in this work. Except where
otherwise stated, drug dosages and recommendations are for the non-pregnant
adult who is not breast-feeding
Links to third party websites are provided by Oxford in good faith and
for information only. Oxford disclaims any responsibility for the materials
contained in any third party website referenced in this work.
Foreword
Stephen E. Palmer
The topic of perceptual organization typically refers to the problems of how the visual informa-
tion is structured into qualitatively distinct elements over time and space during the process
of perceiving and how that structuring influences the visual properties observers experience.
Corresponding work on analogous topics in other sensory modalities is also an active area of
research (see Section 7), but the vast majority of the literature concerns perceptual organization
in vision (as reflected in the rest of the volume). If one grants that the smallest, lowest-level visual
elements are likely to be the outputs of retinal receptors and that the largest, highest level ele-
ments are the consciously experienced, meaningful environmental scenes and events that human
observers use to plan and execute behaviors in their physical and social environments, then the
fundamental question of perceptual organization is nothing less than this: how does the visual
system manage to get from locally meaningless receptor outputs to globally meaningful scenes
and events in the observer’s perceived environment? When stated in this way, the field of percep-
tual organization encompasses most of human perception, including the perception of groups,
patterns, and textures (Section 2), contours and shapes (Section 3), figures, grounds, and depth
(Section 4), surfaces and colors (Section 5), motion and events (Section 6), as well as analogous
issues in other sensory modalities (Section 7). (The present volume also includes two further
sections on topics that have evolved from the material covered in Sections 2-7, one on special-
ized topics (Section 8) and another on practical applications (Section 9).) Indeed, nearly the only
aspects of perception typically excluded from discussions of perceptual organization are very low-
level sensory processing (such as detecting lines and edges) and very high-level pattern recogni-
tion (such as recognizing objects and scenes). This division has led to a somewhat unfortunate
and uninformative classification of vision into low-level, mid-level, and high-level processing,
with perceptual organization being identified with mid-level processing: essentially, whatever is
left over between basic sensory processing and pattern recognition of known objects and scenes.
Even so, some topics are more closely associated with the field of perceptual organization than
others, and the ones represented in this volume constitute an excellent sample of those topics.
Perceptual organization not only spans a wide array of empirical phenomena in human vision,
but the approaches to understanding it encompass four distinct, but tightly interrelated domains:
phenomenology, physiology, ecology, and computation. Phenomenology concerns the conscious
appearance of the visible world, seeking to answer questions about the structural units of visual
experience (e.g., regions, surfaces, and volumetric objects) and the properties people experience
as defining them (e.g., their colors, shapes, sizes and positions). Physiology (i.e., neuroscience)
concerns how neural events in the brain produce these experiences of perceived elements and
properties, addressing the problem of how the brain achieves that organization of visual experi-
ences. Ecology concerns the relation between observers and their environments (including physi-
cal, social, and cultural aspects), attempting to determine why the world is experienced in terms
of these units rather than others and why the brain processes the corresponding sensory informa-
tion in the way it does. Computation concerns formal theories of how perceptual organization
might be achieved by the processing of information at a more abstract level than that of physi-
ological mechanisms in the brain. Computation thus provides a theoretical interlingua in which
the other three domains can potentially be related to each other. All four domains are crucial in
understanding perceptual organization and are mentioned throughout this volume. They are also
addressed quite explicitly in the final, theoretical section (Section 10).
The topic of perceptual organization in vision has a fascinating, roller-coaster history that is
relevant to understanding the field. Until the late 19th and early 20th centuries, organizational
issues in vision, at least as they are currently considered, were virtually nonexistent. The reason
is that the dominant theoretical paradigm in 18th century philosophy came from British empiricists,
such as Locke, Berkeley, and Hume, who proposed that high-level perceptions arose from
a mechanistic, associative process in which low-level sensory atoms — i.e., primitive, indivisible,
basic elements (akin to the outputs of retinal receptors) — evoked other sensory atoms that were
linked together in memory due to repeated prior joint occurrences. The result of these activated
associations, they believed, was the perception of meaningful objects and scenes. This atomistic,
associative view, which became known as “Structuralism” in the hands of 19th century psycholo-
gists, such as Wundt and Titchener, includes no interesting role for structure between low-level
sensory atoms and high-level perceptions, as if the latter arose from unstructured concatenations
(or “summative bundles”) of the appropriate sensory atoms.
The theoretical landscape became more interesting in the late 19th century with the develop-
ment of philosophical phenomenology (see Chapter 2), in which the structure of internal experi-
ences was ascribed a much more important role. Phenomenologists, such as Brentano, Husserl,
and Merleau-Ponty, analyzed the subjective organization and content of internal experiences (i.e.,
the appearance of perceptual objects) into a sophisticated taxonomy of parts and wholes. The
development of such ideas in the hands of philosophers and early psychologists eventually led
to the seminal singularity in the history of perceptual organization: the advent of the Gestalt
revolution in the early 20th century. “Gestalt” is a German word that can roughly be translated
as “whole-form” or “configuration,” but its meaning as the name for this school of psychology
goes considerably beyond such superficial renderings because of its deep theoretical implications.
Gestalt psychology was nothing less than a revolutionary movement that advocated the over-
throw of Structuralism’s theoretical framework, undermining the assumptions of both atomism
and associationism. Following important earlier work by von Ehrenfels on the emergent quali-
ties of melodies, Gestalt psychologists, most notably including Wertheimer, Köhler and Koffka,
argued forcefully against the Structuralist views of Wundt and his followers, replacing their claims
about atomism and associationism with the opposing view that high-level percepts have intrinsic
emergent structure in which wholes are primary and parts secondary, the latter being determined
by their relations to and within the whole. This viewpoint is often expressed through the well-
known Gestalt rallying cry that “the whole is different from the sum of its parts.” Indeed, it was
only when the Gestaltists focused attention on the nature and importance of part-whole organiza-
tion that it was recognized as a significant problem for the scientific understanding of vision. It is
now a central – though not yet well understood – topic, acknowledged by virtually all perceptual
scientists. The historical evolution of the Gestalt approach to perceptual organization is described
in scholarly detail in Chapter 1.
Gestalt psychologists succeeded in demolishing the atomistic, associative edifice of
Structuralism through a series of profound and elegant demonstrations of the importance of
organization in visual perception. Indeed, these demonstrations, which Koenderink (Chapter 3)
calls “compelling visual proofs,” were so clear and definitive that they required only a solid
consensus about the subjective experiences of perceivers when viewing the examples, usually
without reporting quantitative measurements. Their success is evident in the fact that many
of these initial demonstrations of organizational phenomena have spawned entire fields of
subsequent research in which more sophisticated, objective, and quantitative research meth-
ods have been developed and employed (see Chapter 3). Indeed, the primary topic of this
handbook is the distillation of current, cutting-edge knowledge about the phenomenologi-
cal, physiological, ecological, and computational aspects of perceptual organization that have
been achieved using these modern methods.
Research on the initial organizational phenomena discovered by Gestalt psychologists, such as
grouping (Chapter 4), apparent motion (Chapter 23), and other forms of organization in motion
and depth (Chapter 25), got off to a quick start, impelled largely by their crucial role in undermin-
ing the Structuralist dogma that held sway during the early 20th century, especially in Europe. (The
Gestalt approach was not as successful in the US, largely because American psychology was mired
in theoretical and methodological Behaviorism.) Indeed, Gestalt theorists advanced some claims
about alternatives to Structuralism that were quite radical. Among them were Köhler’s claims
that the brain is a “physical Gestalt” and that it achieves perception through electrical brain fields
that interact dynamically to minimize physical energy. Gestalt theorizing encountered resistance
partly because it went against the accepted consensus that science makes progress by analyzing
complex entities into more elementary constituents and the interactions among them, a claim
explicitly rejected by Gestalt theorists. More importantly, however, acceptance of Gestalt theory
plummeted when Köhler’s electrical field hypothesis was tested physiologically and found to be
inconsistent with the results (see Chapter 1 for details).
The wholesale rejection of Gestalt ideas that followed was an unfortunate example of throwing
the baby out with the bathwater. The problem, poorly understood at the time, is that Gestalt theory
was (and is) much more general and abstract than Köhler’s electrical field theory or indeed any other
particular implementation of it (see Palmer, 2009, for further explanation). For example, one of the
most central tenets of Gestalt theory is the principle of Prägnanz (or simplicity), which claims that
the organization of the percept that is achieved will be the simplest one possible given the available
stimulation. That is, the visual system attempts both to maximize the “goodness-of-fit” between
the sensory data and the perceptual interpretation and to minimize the perceptual interpretation’s
complexity (see Chapters 50 and 51). Köhler identified complexity with the energy of the electri-
cal brain field, which tends naturally toward a minimum in dynamic interaction within a physical
Gestalt system, which he claimed the brain to be. It is tempting to suppose that if electrical field
theory is incorrect, as implied by the results of experiments, then Gestalt theory in general must
be incorrect. However, subsequent analyses have shown, for example, that certain classes of neural
networks with feedback loops exhibit behavior that is functionally isomorphic to that of energy
minimization in electrical fields. If perception is achieved by activity in such recurrent networks
of neurons, then Gestalt theory would be vindicated, even though Köhler’s electrical field conjec-
ture was incorrect.
An equally important factor in the stagnation of research on perceptual organization was the
advent of World War II, which turned attention and resources away from scientific enterprises
unrelated to the war effort and sent many prominent German Gestaltists into exile in the US. The
Gestalt movement retained significant prominence in Italy, however, where psychologists such
as Musatti, Metelli, and Kanizsa kept the tradition alive and made significant discoveries concern-
ing the perception of transparency (Chapters 20 and 22) and contours (Chapters 10–12). Other
important findings about perceptual organization were made by Michotte (in Leuven, Belgium),
whose analysis of the perception of causality challenged the long-held philosophical belief that
causality was cognitively inferred rather than directly perceived. These and other contributions to
the phenomena of perceptual organization kept the field alive, but the period from the 1940s to
the 1960s was a nadir for research in this field.
A variety of forces have converged since the 1960s to revitalize interest in perceptual organization
and bring it into the mainstream of the emerging field of vision science. One was the use of mod-
ern, quantitative methods to understand and extend classic Gestalt phenomena. These include
both direct psychophysical measures of organization (e.g., verbal reports of grouping) and visual
features (e.g., surface lightness) and indirect measures of performance in objective tasks (e.g.,
reaction time measures of interference effects). Among the many important examples of such
research are Wallach’s and Gilchrist’s contributions to understanding lightness constancy, Rock’s
work on reference frames in shape perception, Palmer’s studies of new grouping principles and
measures, Kubovy’s quantitative laws for integrating multiple grouping principles, Peterson’s
exploration of the role of past experience in figure-ground organization, Navon’s work on global
precedence, and Pomerantz’s research into configural superiority effects. Such empirical findings
intrigued a new generation of vision scientists, who failed to find low-level sensory explanations
of them – hence the invention of the term “mid-level vision.” A second force was the healthy desire
to shore up the foundations of Gestalt theory by formalizing and quantifying the Gestalt principle
of Prägnanz. This enterprise was advanced considerably by seminal contributions from Attneave,
Hochberg, Garner, Leeuwenberg, van der Helm, and others who applied concepts from informa-
tion theory and complexity theory to phenomena of perceptual organization. A third force that
eventually began to have an effect was the study of the neural mechanisms of organization. Hubel
and Wiesel revolutionized sensory physiology by discovering that the receptive fields of neurons
in visual cortex corresponded to oriented line- and edge-based structures. Their results and the
explosion of physiological research that followed are not generally discussed as being part of the
field of perceptual organization – rather, it is considered “low-level vision” – but it surely can be
viewed that way, as it specifies an early level of structure between retinal receptor outputs and
high-level perceptual interpretations. Subsequent neuroscientific research and theory by pioneers
such as von der Heydt, Lamme, von der Malsburg, and van Leeuwen addressed higher-level
structure involved in figure-ground organization, subjective (or illusory) contours, and grouping.
A fourth converging force was the idea that perception – indeed, all psychological processes –
could be modeled within an abstract computational framework. This hypothesis can ultimately be
traced back to Turing, but its application to issues of visual organization is perhaps most clearly
represented by Marr’s influential contributions, which attempted to bridge subjective phenom-
ena with ecological constraints and neural mechanisms through computational models. More
recently, Bayesian approaches to the problem of perceptual organization are having an increas-
ing impact on the field due in part to their generality and compatibility with hypotheses such as
Helmholtz’s likelihood principle and certain formulations of a simplicity principle. Many of the
theoretical discussions in this volume are couched in computational terms, and it seems almost
certain that computational theory will continue to loom large in future efforts to understand per-
ceptual organization.
The present volume brings together all of these diverse threads of empirical and theoretical
research on perceptual organization. It will rightly be considered a modern landmark in the com-
plex and rapidly evolving history of the field of perceptual organization. It follows and builds upon
two extensive scholarly review papers that were published exactly 100 years after Wertheimer’s
landmark 1912 article on the phi phenomenon that launched the Gestalt movement (see
Wagemans, Elder, Kubovy, Palmer, Peterson, Singh, & von der Heydt, 2012; Wagemans, Feldman,
Gepshtein, Kimchi, Pomerantz, van der Helm, & van Leeuwen, 2012). The 51 scholarly chapters it
contains are authored by world-renowned researchers and present comprehensive, state-of-the-art
reviews about how perceivers arrive at knowledge about meaningful external objects, scenes, and
events from the meaningless, ambiguous, piecemeal evidence registered by sensory receptors.
This perceptual feat is nothing short of a miracle, and although we do not yet understand how it
is accomplished, we know a great deal more than was known a century ago when the enterprise
began in earnest. This handbook is thus equally suitable for students who are just beginning to
explore the literature on perceptual organization and for experts who want definitive, up-to-date
treatments of topics with which they are already familiar. And it is, above all, a fitting tribute to the
founding of an important field of scientific knowledge that was born a century ago and the quite
remarkable progress scientists have made in understanding it during that time.
Stephen E. Palmer
Professor of the Graduate School
Psychology & Cognitive Science
University of California, Berkeley, CA
U.S.A.
References
Palmer, S. E. (2009). Gestalt theory. In T. Bayne, A. Cleeremans, & P. Wilken (Eds.), The Oxford
Companion to Consciousness (pp. 327–330). Oxford, UK: Oxford University Press.
Wagemans, J., Elder, J. H., Kubovy, M., Palmer, S. E., Peterson, M. A., Singh, M., & von der Heydt, R.
(2012). A century of Gestalt psychology in visual perception: I. Perceptual grouping and figure–ground
organization. Psychological Bulletin, 138(6), 1172–1217.
Wagemans, J., Feldman, J., Gepshtein, S., Kimchi, R., Pomerantz, J. R., van der Helm, P. A., & van Leeuwen,
C. (2012). A century of Gestalt psychology in visual perception: II. Conceptual and theoretical
foundations. Psychological Bulletin, 138(6), 1218–1252.
Preface
Editing a handbook such as this is a serious undertaking. It has been high on my list of priorities
for over 3 years, from the first draft of the proposal to the writing of this paragraph. I was aided
in my initial steps by the helpful suggestions of many colleagues, including those who accepted
invitations to become members of the Scientific Advisory Board: Marlene Behrmann, Patrick
Cavanagh, Walter Gerbino, Glyn Humphreys, Stephen E. Palmer, and Pieter Roelfsema. I was
struck by the great level of enthusiasm I received from those I approached to write specific
chapters. Almost all accepted right away, and those who did not explained how much they regretted
being unable to contribute due to other commitments. I thank everyone for tolerating my
persistence during the more difficult aspects of the editorial process, such as the coordination
of submissions, reviews, revisions, author proofs, and copyright forms. I would especially like to
thank all of the authors for their excellent contributions, and all of the reviewers (many of them
authors themselves or current and former postdoctoral collaborators) for the useful feedback and
specific suggestions for further improvements. A word of gratitude is in order for Martin Baum
(Senior Commissioning Editor for Psychology and Neuroscience at Oxford University Press),
for his enthusiasm and support throughout the whole process, from the very beginning to the
very end. I would also like to thank Charlotte Green (Senior Assistant Commissioning Editor for
Psychology and Social Work at Oxford University Press) and all the staff at OUP (and their service
companies) for their professional assistance during all steps from manuscript to final production
in electronic and book form. You have all done a marvellous job, thanks a lot!
I would like to thank my university (KU Leuven) and faculty (Psychology and Educational
Sciences) for allowing me a sabbatical when I started to work on this handbook, and the Research
Foundation–Flanders (K8.009.12N) for funding it. In addition, I thank the “Institut d’études avan-
cées” (IEA), Paris for providing an excellent environment to work on a large and time-consuming
project such as this. Last but not least, I thank the Flemish Government for the long-term struc-
tural funding of my large-scale research program, aimed at reintegrating Gestalt psychology into
contemporary vision science and cognitive neuroscience (METH/08/02 and METH/14/02). With
this handbook I hope to significantly contribute to realizing this ambition.
Contents
Contributors xix
General background
Chapter 1
Introduction
In 2012, it was exactly 100 years since Wertheimer had published his paper on phi motion (1912) –
the perception of pure motion, that is, without object motion – which many consider to be the start of
Gestalt psychology as an important school of thought. The present status of Gestalt psychology is
quite ambiguous. On the one hand, most psychologists believe that the Gestalt school died with its
founding fathers in the 1940s, after some devastating empirical findings regarding electrical field
theory in the 1950s, or that it declined naturally because of fundamental obstacles to further progress
and the rise to dominance of stronger theoretical and experimental frameworks since the 1960s
and 1970s (e.g., cognitive science, neuroscience). On the other hand, almost all psychology textbooks
still contain a Gestalt-like chapter on perceptual organization (although often quite detached from the
other chapters), and new empirical papers on Gestalt phenomena are published on a regular basis.
I believe that Gestalt psychology remains quite relevant to current psychology in several ways.
Contemporary scientific research has continued to address classic questions regarding the
emergence of structure in perceptual experience and the subjective nature of phenomenal
awareness (e.g., visual illusions, perceptual switching, context effects), using advanced methods
and tools that were not at the Gestaltists’ disposal. I also believe that the revolutionary ideas of the
Gestalt movement can still function as a dissonant element to question some of the fundamental
assumptions of mainstream vision science and cognitive neuroscience (e.g., elementary build-
ing blocks, channels, modules, information-processing stages). Indeed, much progress has been
made in the field of non-linear dynamical systems, theoretically and empirically (e.g., techniques
to measure and analyze cortical dynamics), which allows us to surpass some of the limitations in
old-school Gestalt psychology, as well as in mainstream vision research.
To be able to situate all the reviews of a century of theoretical and empirical work on perceptual
organization in this handbook against the background of this special position of Gestalt psychol-
ogy, I will first introduce the key findings and ideas in old-school Gestalt psychology, its historical
origin and development, rise and fall. I will sketch only the main lines of thought and major steps
in the history. For a more extensive treatment of the topic, I refer to Ash (1995).
but a special case. It concerned perceived motion without seeing an object moving, so rather than
the standard case of seeing an object first at location a, and then, after an interval, at location b
(i.e., apparent motion from a to b), here it concerned pure φ, without a percept of a or b. The gen-
eral phenomenon of apparent motion had already been observed as early as 1850 by the Belgian
physicist Joseph Plateau; Sigmund Exner (one of Wertheimer’s teachers) had obtained it with two
electric sparks in 1875, and in 1895 the Lumière brothers had patented the ‘cinématographe’, an
invention based on the phenomenon. (For an excellent discussion of its historical importance, see
Sekuler, 1996; for a demonstration of the phenomenon and for a review of its misrepresentation
in later sources, see Steinman, Pizlo, & Pizlo, 2000; for a recent review of apparent motion, see
Herzog & Ogmen, this volume.)
According to a famous anecdote, Wertheimer came to the idea for this experiment when he saw
alternating lights on a railway signal, while on his way from Vienna to the Rhineland for vaca-
tion in the autumn of 1910. He got off the train in Frankfurt, bought a toy stroboscope and began
constructing figures to test the idea in his hotel room. He then called Wolfgang Köhler, who had
just begun to work as an assistant at the Psychological Institute there. Köhler provided him with
laboratory space and a tachistoscope with a rotating wheel, especially constructed by Schumann
(the Institute’s Director) to study successive exposures. According to the conventional view of
apparent motion perception, we see an object on several positions successively and something is
then added subjectively. If this were correct, then an object would have to be seen moving, and
at least two positions, the starting and end points, would be required to produce seen motion.
Neither of these conditions held in the case of phi motion. By systematically varying the form,
color, and intensity of the objects, as well as the exposure intervals and stimulus distances between
them, and by examining the role of attitude and attention, Wertheimer was able to refute all of the
current theories of motion perception.
In the standard experiment, a white strip was placed on a dark background in each slit, while
the rotation speed of the tachistoscope wheel was adjusted to vary the time required for the light
to pass from one slit to the next. Above a specific threshold value (~200 ms), observers saw the
two lines in succession. With much faster rotation (~30 ms), the two lines flashed simultane-
ously. At the so-called optimal stage (~60 ms), observers saw a definite motion that could not
be distinguished from real motion. When the time interval was decreased slightly below 60 ms,
after repeated exposures, observers saw motion without a moving object. Although he used only
three observers (Wolfgang Köhler, Kurt Koffka, and Koffka’s wife Mira), he was quite confident
in the validity of the results: the characteristic phenomena appeared in every case unequivocally,
spontaneously, and compellingly. After confirming Exner’s observation that apparent motion pro-
duces negative after-images in the same way as real motion, Wertheimer proposed a physiological
model based on some kind of physiological short circuit, and a flooding back of the current flow,
creating a unitary continuous whole-process. He then extended this to the psychology of pure
simultaneity (for the perception of form or shape) and of pure succession (for the perception of
rhythm or melody). This extension was the decisive step for the emergence of the Gestalt theory.
Implications: Gestalt theory
The phi phenomenon was simply a process, a transition (‘an across in itself’) that cannot be
composed from the usual optical contents of single object percepts at two locations. In other words,
perceived motion was not just added subjectively after the sensory registration of two spatiotem-
poral events (or snapshots), but something special with its own phenomenological characteris-
tics and ontological status. Indeed, based on the phi phenomenon, Wertheimer argued that not
sensations, but structured wholes or Gestalten are the primary units of mental life. This was the
key idea of the new and revolutionary Gestalt theory.
The notion of ‘Gestalt’ had already been introduced into psychology by Christian von Ehrenfels in
his essay ‘On Gestalt qualities’ (1890), one of the founding documents of Gestalt theory. Because we
can recognize two melodies as identical, even when no two notes in them are the same, he argued
that these forms must be something more than the sum of the elements. They must have what
he called a ‘Gestalt quality’: a characteristic that is immediately given, along with the elementary
presentations that serve as its fundament, dependent upon the objects but rising above them. In
his discussion of the epistemological implications of his discovery of phi motion, Wertheimer
went considerably beyond von Ehrenfels’s notion of one-sided dependence of Gestalt qualities
on sense data, which made wholes more than the sum of their parts, while maintaining the parts
as foundations (‘Grundlage’). He claimed instead that specifiable functional relations exist that
decide what will appear or function as a whole and as parts (i.e., two-sided dependency). Often
the whole is grasped even before the individual parts enter consciousness. The contents of our
awareness are mostly not summative, but constitute a particular characteristic ‘togetherness’, a
segregated structure, often comprehended from an inner centre, to which the other parts of the
structure are related in a hierarchical system. Such structures were called ‘Gestalten,’ which are
clearly different from the sum of the parts. They were assumed to arise on the basis of continuous
whole-processes in the brain, rather than associated combinations of elementary excitations.
With this significant step, Wertheimer separated himself from the Graz school of Gestalt psy-
chology, represented by Alexius Meinong, Christian von Ehrenfels, and Vittorio Benussi, who
maintained a distinction between sensation and perception, the latter produced on the basis of the
former (Boudewijnse, 1999; for further discussion, see Albertazzi, this volume). The Berlin school,
represented by Max Wertheimer, Kurt Koffka, and Wolfgang Köhler, went further and considered
a Gestalt as a whole in itself, not founded on any more elementary objects. Instead of perception
being produced from sensations, a percept organizes itself by mutual interactions; a percept arises
non-mechanically, by an autonomous process in the brain. The Berlin school also did not accept a
stage theory of perception and, hence, distinguished itself from the Leipzig school, represented by
Felix Krüger, Friedrich Sander, and Erich Jaensch, in which the stepwise emergence of Gestalten
(‘Aktualgenese’ or ‘microgenesis’) played a central role (see van Leeuwen, this volume).
Although the Berlin theorists adhered to a non-mechanistic theory of causation and did not
want to analyze the processes into stages, they did believe that the critical functional relations in
the emergence of Gestalts could be specified by several so-called Gestalt laws of perceptual organ-
ization. They were inspired by Johann Wolfgang Goethe, who introduced the notion of ‘Gestalt’
to refer to the self-actualizing wholeness of organic forms. For Goethe, the functional role of an
organism’s parts is determined by a dynamic law inherent in the whole, filled with comings and
goings, but not mechanical operations. The ideal end results of these dynamic interactions are clas-
sically proportioned forms, signs of balance, lawfulness, and order realizing itself in nature, not
imposed upon it by an ordering mind. However, at the same time, the Berlin theorists wanted to
give this notion a naturalistic underpinning to avoid the anti-physicalist attitude of Felix Krüger’s
holistic psychology (‘Ganzheitspsychologie’), which was characteristic of the Leipzig school.
They were all trained in experimental psychology by Carl Stumpf in Berlin, who strongly
believed in the immediately given as the basis of all science (cf. Brentano) and in the lawfulness
of the given, which included not only simple sensations of color or tone, but also spatially and
temporally extended and distributed appearances, as well as relationships among appearances,
such as similarity, fusion, or gradation. According to Stumpf, the laws of these relationships are
neither causal nor functional, but immanent structural laws. It is these structural laws that
6 Wagemans
the Berlin school was about to uncover. Already at a meeting of the Society for Experimental
Psychology in 1914, Wertheimer announced that he had discovered a general kind of Gestalt law,
a tendency towards simple formation (‘Gestaltung’), called the law of the Prägnanz of the Gestalt.
Unfortunately, the promised publication did not appear until 1923, although the experiments
were essentially from the years 1911–1914.
all the other parts of the system. Köhler then showed that stationary electric currents, heat cur-
rents, and all phenomena of flow are strong Gestalten in this sense. These he distinguished from
what he called ‘weak Gestalten,’ which are not immediately dependent on the system’s topography
(e.g., a group of isolated conductors connected by fine wires). Weak Gestalten are satisfactorily
treated with simultaneous linear algebraic functions, whereas strong Gestalten must be described
either with integrals or with series of partial differential equations.
In addition, Köhler tried to construct a specific testable theory of brain processes that could
account plausibly for perceived Gestalten in vision. In short, he presented visual Gestalten as the
result of an integrated Gestalt process in which the whole optic sector from the retina onward
is involved, including transverse functional connections among conducting nerve fibres. The
strongest argument for proposing that the brain acted as a whole system was the fact that Gestalts
were found at many different levels: seen movement, stationary Gestalten, the subjective geom-
etry of the visual field, motor patterns, and insightful problem solving in animals. This theory
had dramatic consequences. For Gestalt theory, the 3-D world that we see is not constructed by
cognitive processes on the basis of insufficient sensory information. Rather, the lines of flow are
free to follow different paths within the homogeneous conducting system, and the place where a
given line of flow will end in the central field is determined in every case by the conditions in the
system as a whole. In modern terms, Köhler has described the optic sector as a self-organizing
physical system.
Based on this general theory of physical Gestalten and this specific theory of the brain as a
self-organizing physical system within which experienced Gestalten emerge, Köhler then came
to the postulate of ‘psychophysical isomorphism’ between the psychological facts and the brain
events that underlie them. By this he meant, as Wertheimer before him, functional rather than
geometrical similarity: it is not the case that brain processes must somehow look like per-
ceived objects. Köhler also insisted that such a view does not prescribe featureless continuity in
the cortex, but is perfectly compatible with rigorous articulation. He conceded that experiments
to establish the postulated connections between experienced and physical Gestalten in the brain
were nearly unthinkable at the time from a practical point of view, but that this should not detract
from its possibility in principle. In the meantime, Köhler tried to show that his postulate was
practical by applying it to the figure-ground phenomena first reported by Edgar Rubin in 1915.
Decades later, after Köhler emigrated to the USA, he attempted to carry out such experiments (see
Section “In the USA” below).
All of the examples Köhler had offered of physical Gestalten were equilibrium processes, such as
the equalization of osmotic pressures in two solutions by the migration of ions across the boundary
between them, or the spontaneous distribution of charged particles on conductors. As Maxwell’s
field diagrams showed, we could predict from a purely structural point of view the movements
of conductors and magnets, and the groupings of their corresponding fields, in the direction of
increased evenness of distribution, simplicity, and symmetry. This was a qualitative version of the
tendency (described by Planck) of all processes in physical systems left to themselves to achieve
the maximum level of stability, which is synonymous with the minimum expenditure of energy,
allowed by the prevailing conditions. Köhler explained this tendency – based on the second law of
thermodynamics or the entropy principle – with an example from hydrostatics. When dipping wire
frames of different forms into a solution of water and soap, one can see that such physical sys-
tems tend toward end states characterized by the simplest and most regular form, a tendency that
Köhler called the tendency to the simplest shape or toward ‘the Prägnanz of the Gestalt,’ alluding
to the principle already enunciated but rather vaguely by Wertheimer at the meeting of the Society
for Experimental Psychology in 1914.
2 The German word ‘Prägnanz’ is derived from the verb ‘prägen’ – to mint a coin. Hence, by describing the
principle of Prägnanz as the tendency towards the formation of Gestalten, which are as regular, simple, and
symmetric (‘ausgezeichnet’, according to Wertheimer’s term) as possible given the conditions, a connection is
made to the notion of ‘Gestalt’ as the characteristic shape of a person or object, or the likeness of a depiction to
the original (which was the colloquial German meaning before Goethe and von Ehrenfels assigned it its more
technical meaning as we know it today). For this reason, ‘Prägnanz’ has often been translated as ‘goodness.’
history, see Vezzani et al., 2012). Wertheimer, instead, maintained that they are determinative for
the perception of figures and for form perception in general. Wertheimer also recognized the pow-
erful effect of observers’ attitudes and mental set, but by this he understood primarily a tendency
to continue seeing the pattern initially seen, even under changed conditions. Nor did he deny the
influence of previous experience, such as habit or drill, but he insisted that these factors operate only
in interaction with the autonomous figurative forces at work in the immediate situation. Moreover,
Wertheimer did not exclude quantitative measurements from his program but he made it clear that
such measurements should be undertaken only in conjunction with detailed phenomenological
description to discover what ought to or meaningfully could be measured. In fact, Wertheimer had
not elaborated a finished theory, but had presented an open-ended research program. He converted
the culturally resonant term ‘Gestalt’ and the claim that the given is ‘gestaltet’ into a complex research
program to discover the principles of perceptual organization in both its static and dynamic aspects.
called a rotating light-shadow apparatus, yielding what is now known as the ‘kinetic depth effect’
(Wallach & O’Connell, 1953; see also Vezzani, Kramer, & Bressan, this volume). In-between
Ternus and Metzger, Karl Duncker (1929) altered both the research modus and the terms of
discourse about these issues in his research on what he called ‘induced motion.’ In this work, he
combined some remarks from Wertheimer’s 1912 paper about the role of the observer’s position
in motion perception with terminology from relativity theory in physics (borrowing the term
‘egocentric frames of reference’ from Georg Elias Müller). More parametric follow-up studies
were carried out by Brown (1931a,b,c) and Hans Wallach (1935). For recent reviews of motion
perception in the Gestalt tradition, see Herzog & Öğmen (this volume) and Bruno & Bertamini
(this volume).
In the meantime, Gestalt thinking also affected research on other sense modalities (e.g., bin-
aural hearing by Erich von Hornbostel), on learning and memory (e.g., Otto von Lauenstein
and Hedwig von Restorff, both working under Köhler in search of physiological trace fields),
and on thought (e.g., Karl Duncker’s work on stages in productive thinking, moving away from
Wertheimer’s work on re-centering and Köhler’s work on sudden insight). At first sight, Gestalt
theory seemed to develop, rather consistently, from studying the fundamental laws of psychol-
ogy first under the simplest conditions, in rather elementary problems of perception, and then
including more and more complex sets of conditions, turning to memory, thinking, and acting.
At the same time, however, the findings did not always fit the original theories, which consti-
tuted serious challenges to the Gestalt framework. This was even more true for applications of
Gestalt theory to action and emotion (by Kurt Lewin), to neuropathology and the organism
as a whole (by Adhemar Gelb and Kurt Goldstein), to film theory and aesthetics (by Rudolf
Arnheim).
In summary, the period from 1920 to 1933 marked the high point, but not the end of Gestalt
psychology’s theoretical development, its research productivity, and its impact on German science
and culture. At the same time, Gestalt theory had some impact on research in the USA, as well,
mainly owing to Kurt Koffka (e.g., the notion of vector field inspired some interesting empirical
work published in the American Journal of Psychology; see Brown & Voth, 1937; Orbison, 1939).
Reviews of Gestalt psychology appeared in Psychological Review on a regular basis (e.g., Helson,
1933; Hsiao, 1928), a comprehensive book on state-of-the-art Gestalt psychology was published
as early as 1935 (Hartmann, 1935), and three years later Ellis’s (1938) influential collection of
translated excerpts of core Gestalt readings made some of the original sources accessible to a
non-German-speaking audience. Already in 1922, at Robert Ogden’s invitation, Koffka had published
a full account of the Gestalt view on perception in Psychological Bulletin. He emigrated to the USA
mainly for professional reasons, after accepting a job at Smith College in 1927, long before such a
step became politically necessary, as it later did for many other Gestaltists.
between the Gestalt psychologists at German universities during this period, and the political
attitudes and acts of the Nazi regime (e.g., Mandler, 2002; Prinz, 1985; Wyatt & Teuber, 1944),
which clearly went beyond pragmatic survival behavior in some cases (e.g., Erich Jaensch’s empir-
ical anthropology). I will focus only on the scientific contributions and impact of Gestalt psy-
chology here. Compared with the flourishing previous period, the institutional conditions for
Gestalt-theoretic research in the Nazi period were considerably reduced, but it was possible to
continue at least some of the lines of work already begun.
After the appearance of a pioneering monograph, ‘Thing and Shadow,’ by Vienna psychologist
Ludwig Kardos in 1934, Gestalt researchers pursued the issue further, for instance, examining
spatial effects of brightness contrast or applying Duncker’s work on induced motion to bright-
ness perception. Perhaps the most interesting research in this period was Erich Goldmeier’s study
of judgment of similarity in perception, published in 1937. His starting point was the problem
originally raised by Harald Höffding and Ernst Mach in the 1890s. How do we know an object
or feature is the same as one we have seen before? Or, how do we recognize forms as the same
even when they are presented in different positions? In Goldmeier’s view, his results showed that
what is conserved in perceived similarity is the phenomenal function of the parts within the
perceived whole, or the agreement of those qualities that determine the phenomenal organiza-
tion of the field in question. He found that similarity of form properties was best preserved by
proportional enlargement, while it was best to keep their measure constant for the similarity of
material properties.
Around the same time, two major developments in Gestalt theory occurred that have generally
been ignored outside Germany: Edwin Rausch’s monograph on ‘summative’ and ‘nonsummative’
concepts (1937), and Wolfgang Metzger’s theoretical masterpiece, ‘Psychology’ (1941).
Edwin Rausch
Rausch’s aim was to develop a more systematic account of the concepts of part and whole, with
the aid of innovations in symbolic logic pioneered by Bertrand Russell, Rudolf Carnap, Giuseppe
Peano, and others. Despite some conceptual difficulties, Rausch’s work had an immediate impact
(although not outside Germany). In an analysis of the Gestalt concept published in 1938, the emi-
grated logical empiricist philosophers Kurt Grelling and Paul Oppenheim attempted, in explicit
agreement with Rausch, to clarify the notions of sum, aggregate, and complex, in a way that would
elucidate the actual content of von Ehrenfels’s and Köhler’s Gestalt concepts and differentiate
them from one another. Such analyses could have saved the Gestalt concept from the recurring
charge of vagueness, if they had not been ignored at the time. However, because they presupposed
an empiricist standpoint, Grelling and Oppenheim failed to engage the epistemological core of
Gestalt theory – Wertheimer’s claim that Gestalten are immanent in experience, not categories
imposed upon experience. For a thorough discussion, see Smith (1988).
Wolfgang Metzger
After Wertheimer’s dismissal, Wolfgang Metzger became de facto head of the Frankfurt Institute,
and he was able to maintain his major lines of research by taking a collaborative stance regarding
the Nazi regime. In 1936, Metzger published a synoptic account of research on the Gestalt theory
of perception entitled ‘Gesetze des Sehens’ (‘Laws of seeing’), since reissued and vastly expanded
three times, and translated in 2006.
Even more important from a theoretical perspective was Metzger’s (1941) book, ‘Psychology: The
development of its fundamental assumptions since the introduction of the experiment.’ The original
title was ‘Gestalt theory,’ but he changed it to make clear that his aim was to make Gestalt theory
the conceptual foundation of general psychology. To achieve this, he employed a strategy rather
different from that of Kurt Koffka’s major text of the same period, ‘Principles of Gestalt Psychology’
(1935), written in the USA. Koffka wrote mainly against positivism (materialism, vital-
ism, E. B. Titchener, and behaviorism), while Metzger wrote mainly against non-positivists who
opposed natural-scientific psychology, or those who criticized Gestalt theory for its alleged lack
of biological orientation. Koffka structured his textbook in a standard way, enunciating general
Gestalt principles and then applying them to standard topics, beginning with a detailed account
of visual perception, proceeding to a critical reworking of Lewin’s work on action and emotion,
incorporating research by Wertheimer, Duncker, and Köhler on thinking, learning, and memory,
and finally applying Gestalt principles to personality and society. Metzger, however, presented not
a conventional textbook, but an attempt to revise the theoretical presuppositions of modern psy-
chology. His hope was that this approach would put an end to the misunderstanding that Gestalt
theory was merely a psychophysical theory that seeks to explain the entire psychical realm at any
price by means of known physical laws. The assumption that he questioned was that real causes
of events must be sought only behind, not within phenomena. The strategy he employed was to
convert Gestalt principles into meta-theoretical concepts and depict them as names for intrinsic
natural orderings. His chapter headings were, therefore, not standard textbook topics, but rather
terms from Gestalt-type phenomenology of perception, such as qualities, contexts, relational sys-
tems, centering, order, and effects.
Of particular interest and originality was Metzger’s discussion of psychological frames of ref-
erence or relational systems. The presupposition under attack was that of psychological space
as a collection of empty, indifferent locations. Instead, he argued that all location in space and
time, as well as all phenomenal judgment, is based on relations in more extended psychological
regions. To explain why relatedness is ordinarily hidden from immediate experience, and why
in ordinary life the absolute quality of things appears as their most outstanding characteristic, he
recognized that Wertheimer’s application of the word Gestalt to both seen objects and the struc-
ture of the perceptual field as a whole required modification. Specifically, Metzger acknowledged
that the characteristic membership of regions in a relational system is correlative to but different
from the relation of parts to their whole. A true part is in a two-sided relation with its whole;
a part of a relational system is in a one-sided, open-ended relation with the system as a whole.
A thing in space, for example, leaves no gap on removal, but a piece of a puzzle does. With this
modification, Metzger could get a conceptual grip on the myriad tendencies he and his stu-
dents had to suppose to account for the results that could not be explained by simple analogies
to Wertheimer’s Gestalt laws. To cover these, he posited a principle of branched effects, which
stated that wherever the experienced field had more dimensions than the stimulus field, an infi-
nite variety of experiences can emerge from the same stimulus constellation, depending on the
structure of the environmental situation and the state of the perceiving organism. With this
principle, it became possible to portray processes considered psychological, such as attention
and attitudes, as relational systems, and thus bring them into the purview of Gestalt theory. It also
implied the possibility of extending Gestalt theory from perception and cognition to personality
and the social realm.
Metzger’s book was an eloquent statement of Gestalt principles and their conceptual founda-
tions but it was problematic both as a summary of what Gestalt theory had achieved and as a
response to its critics. Unexperienced entities such as Gestalt centres of gravity are not causes of what
we perceive, but parts of a larger, self-organizing Gestalt context that includes the given. In addi-
tion, the organism-environment nexus is a relational system, not a Gestalt. In this way, Metzger
had reached Gestalt theory’s conceptual limits, for which he tried to compensate in part with
terminological concessions to Leipzig’s holistic psychology. Like that of Koffka from the same
period, Metzger’s book considerably expanded the conceptual range of Gestalt theory. Precisely
that elaboration gave Gestalt theory a new, more finished look – the look of a system – during the
1930s, which it had not had before. However, because it now lacked the necessary institutional
base in Germany (e.g., very few PhD students), the book did not have a major impact on the field
as a whole in this period. Hence, this was at the same time the culmination of Gestalt theory and
the start of its decline.
After World War II
In the USA
After their emigration to the USA, the founding fathers of Gestalt psychology did not perform
much new experimental work. Instead, they mainly wrote books in which they outlined their
views (e.g., Koffka, 1935; Köhler, 1940; Wertheimer, 1945). The big exception was Köhler, who had
taken up physiological psychology, using EEGs and other methods in an attempt to verify his iso-
morphism postulate directly. Initially, his results with Hans Wallach on so-called figural afteref-
fects appeared to support his interpretation in terms of satiation effects of direct cortical currents
(Köhler & Wallach, 1944). Afterwards, he was able to measure cortical currents directly – as EEG
responses picked up from electrodes on the scalp – which flowed in directions corresponding to the
movements of bright objects in the visual field (Köhler & Held, 1949).
However, soon after that breakthrough, Lashley and colleagues (Lashley et al., 1951) per-
formed a more critical test of Köhler’s electric field theory (and its underlying postulate of iso-
morphism). If the flows of current picked up from the scalp in Köhler and Held’s experiments
were supposed to reflect the organized pattern of perception and not merely the applied stimu-
lation, and if that pattern of perception would result from a global figure-field across the whole
cortex, a marked alteration of the currents should distort visual figures and make them unrec-
ognizable. By inserting metallic strips and metal pins in large regions of the visual cortex of rhe-
sus monkeys, Lashley et al. could short-circuit the cortical currents. Surprisingly, the monkeys
could still perform the learned shape discriminations, which demonstrated that global cortical
currents were not necessary for pattern perception. In subsequent experiments, Sperry and
colleagues (Sperry et al., 1955) performed extensive subpial slicing and dense impregnation
with metallic wires across the entire visual cortex of cats, and showed that these animals too
could still perform rather difficult shape discriminations (e.g., between a prototypical triangle
and several different ones with small distortions). Together, these two studies effectively ruled
out electrical field theory as an explanation of cortical integration and, therefore, removed the
empirical basis of isomorphism between cortical flows of current and organized patterns of
perception.
Of course, Köhler (1965) reacted to these experiments. Lashley’s experiments he rejected
because he thought that the inserted gold foils had probably depolarized at once, which would
have made them incapable of conducting, and hence of deflecting the cortical currents and disturbing
pattern vision. Sperry’s results he found too good to be acceptable as reliable evidence. Based
on the many deep cuts in large parts of the visual cortex, the cats should have been partially
blind when they were tested, and yet they made very few mistakes on these difficult discrimi-
nation tasks. Because the learning was initially already so difficult (forcing reliance on local
details), the animals probably learned to react not only to visual cues associated with the pro-
totypical test figure (which was repeated over and over again), but to other, non-visual cues
(e.g., smell) as well. The necessary methodological precautions to rule out these alternative cues
(e.g., changing all objects from trial to trial) had not been taken. However, Köhler’s rather con-
vincing counter-arguments and suggestions for further experiments were largely ignored, and
for most scientists at the time (especially, for physiological psychologists), the matter was closed
and electrical field theory, which was one of the pillars of Gestalt psychology’s scientific basis,
was considered dead and buried.
In Germany
In Germany, Gestalt psychology made little further progress after World War II.
Under Metzger’s guidance, the Psychological Institute in Münster became the largest in Western
Germany in 1965. This had much to do with Metzger’s public defense of experimental psychology,
presenting Gestalt theory as a humanistic worldview based on experimental science. Metzger also
worked steadily to develop links with American psychologists, but that involvement did not actu-
ally rehabilitate the Gestalt position because, in doing so, he conceded much to conventional views
of machine modelling as causal explanation. In contrast to Metzger’s broad range and willingness
to address non-academic audiences, Rausch devoted nearly all of his publications to extremely
exact phenomenological illumination and conceptual clarification of issues from Gestalt theory.
For instance, in a major essay on the problem of qualities or properties in perception (Rausch,
1966), he provided an exhaustive taxonomy of Gestalt qualities (in von Ehrenfels’s sense) and
whole qualities (in Wertheimer’s sense), and he argued that whether a given complex is a Gestalt
or not is not a yes-or-no decision, but a matter of gradations on a continuum. Gottschaldt focused
mainly on clinical psychology.
Elsewhere
While Gestalt psychology declined in the English-speaking world after World War II, Italy remained
a stronghold of the tradition. For instance, Wolfgang Metzger, the most important and
orthodox Gestalt psychologist in Germany at the time, dedicated his ‘Gesetze des Sehens’ (3rd
edn, 1975) to the memory of his ‘Italian and Japanese friends.’ Among his friends were Musatti,
Metelli, and Kanizsa, three major figures in Italian psychology. In spite of being Benussi’s student
and successor (from the Graz school), Cesare Musatti was responsible for introducing the Berlin
school’s Gestalt theory in Italy and training important students in this tradition, most notably
Metelli and Kanizsa, whose contribution continues to be felt today (see Bertamini & Casati, this
volume; Vezzani, Kramer, & Bressan, this volume; Bruno & Bertamini, this volume; Gerbino,
this volume; Kogo & van Ee, this volume; van Lier & Gerbino, this volume). Fabio Metelli is best
known for his work on the perception of transparency (e.g., Metelli, 1974). Gaetano Kanizsa’s
most famous work was performed in the 1950s with papers on subjective contours, modes of
color appearance, and phenomenal transparency (Kanizsa, 1954, 1955a, b; all translated into
English in 1979).
In the edited volume, ‘Documents of Gestalt psychology’ (Henle, 1961), the most important col-
lection of Gestalt work from the 1940s and 1950s, no Italian work was included. Although it
was not recognized by the emigrated German psychologists in the USA, the work put forward
by the Italian Gestalt psychologists was in many respects very orthodox Gestalt psychology. For
instance, Kanizsa (1955b/1979) took the phenomenon of ‘subjective contours,’ already pointed
out by Friedrich Schumann (1900), and gave a Gestalt explanation of the effect in terms of the
tendency toward Prägnanz. He showed how the contour could affect the brightness of an area,
just as Berlin Gestaltists had shown that contour could affect the figural character of an area.
Kanizsa (1952) even published a polemic against stage theories of perception, in which he argued
that, since according to Gestalt principles perception was caused by simultaneous autonomous
processes, it was meaningless to hypothesize perceiving as a stage-like process. This work symbol-
ized his complete separation from Graz thinking. In fact, one could talk about this tradition as the
Padua–Trieste school of Gestalt psychology (see Verstegen, 2000).
Besides Italy, Gestalt psychology was also strong in Belgium and in Japan. Albert
Michotte became famous with his work on the perception of causality (1946/1963), in which
he could demonstrate that even a seemingly cognitive inference like causality could be linked
directly to specific higher-order attributes in the spatiotemporal events presented to observers.
This work was very much in the same spirit as work by Fritz Heider on perceived animacy
and attribution of intentions (Heider, 1944; Heider & Simmel, 1944), which was the empirical
basis for his later attribution theory (Heider, 1958). Together with his coworkers, Michotte
also introduced the notions of modal and amodal completion (Michotte et al., 1964), and
studied several configural influences on these processes (for a further discussion of Michotte’s
heritage, see Wagemans et al., 2006). Building on earlier collaborations of Japanese students
with major German Gestalt psychologists (e.g., Sakuma with Lewin, Morinaga with Metzger),
Gestalt psychology continued to develop further in Japan after World War II. For instance,
Tadasu Oyama did significant work on figural aftereffects (e.g., Sagara & Oyama, 1957) and
perceptual grouping (e.g., Oyama, 1961). The Gestalt tradition continues in Japanese
perceptual psychology today (e.g., Noguchi et al., 2008), especially in work on visual illu-
sions (e.g., Akiyoshi Kitaoka).
meaning. For Gestalt theory, in contrast, language expresses meaning that is already there in
the appearance or in the world (e.g., Pinna, 2010). Orthodox Gestalt theorists also refrained
from applying Gestalt thinking to personality and social psychology, fearing a lack of rigor. The
preferred route to such extensions was analogy or metaphor, and the further the metaphors
were stretched, the harder it became to connect them with Köhler’s concept of brain action. As
the work of Rudolf Arnheim on expression and art, and of Kurt Lewin on action and emotion
showed, extensions of the Gestalt approach were possible so long as one separated them from
Köhler’s psychophysics. Further extensions in that direction were largely an American phenom-
enon (e.g., Solomon Asch).
Ultimately decisive in the further decline of Gestalt theory was a meta-theoretical impasse
between its theoretical and research styles and those of the rest of psychology. Gestalt theory was
and remains interesting because it was a revolt against mechanistic explanations in science, as well
as against the non-scientific flavor of holism. Especially after 1950, its critics increasingly insisted
on causal explanations, by which they meant positing cognitive operations in the mind or neural
mechanisms in the brain. As sophisticated as the Gestalt theorists were in their appreciation of
the way order emerges from the flow of experience, one must ask how such a process philosophy
can be reconciled with strict causal determination, as Köhler at least wished to do. Koffka tried to
accomplish this feat by insisting that the very principles of simplicity and order that the Gestalt
theorists claimed to find in experience should also be criteria for evaluating both descriptions
and explanations. For him, the best argument for isomorphism was his desire for one universe of
discourse. Koffka and his co-workers never succeeded in convincing their colleagues that it was
logically necessary or scientifically fruitful to think that the external world, its phenomenal counterpart, and the brain events mediating interactions between them, all have the same structure or
function, according to the same dynamical principles.
James J. Gibson (1971) has written that the question Koffka asked in his ‘Principles of Gestalt
Psychology’ – ‘Why do things look as they do?’ – has fundamentally reshaped research on percep-
tion. In the last two decades, central issues of Berlin school research, such as perceptual grouping
and figure-ground organization, have returned to centre stage (e.g., Kimchi et al., 2003; see also
Wagemans et al., 2012a, for a recent review), although concepts of top-down processing offered
to deal with the question have at best a questionable relationship to Gestalt theory. The status of
Wertheimer’s Gestalt laws, and particularly of the so-called minimum principle of Prägnanz that he
enunciated remains contested, which is another way of saying that the issues involved are still
important (e.g., Hatfield & Epstein, 1985; see also Wagemans et al., 2012b; van der Helm, this vol-
ume). Although it may be true that the Gestalt theorists failed to develop a complete and accept-
able theory to account for the important phenomena they adduced, it is also true that no one else
has either. The challenges for contemporary vision scientists are still significant.
Acknowledgments
I am supported by long-term structural funding from the Flemish Government (METH/08/02).
References
Albertazzi, L. (2001). The legacy of the Graz psychologists. In The School of Alexius Meinong, edited by
L. Albertazzi, D. Jacquette, & R. Poli, pp. 321–345. Farnham: Ashgate Publishing Ltd.
Ash, M. G. (1995). Gestalt Psychology in German Culture, 1890–1967: Holism and the Quest for Objectivity.
Cambridge: Cambridge University Press.
Historical and conceptual background 17
Michotte, A. (1963). The Perception of Causality, translated by T. R. Miles & E. Miles. New York: Basic
Books. (Original work published 1946.)
Michotte, A., Thinès, G., & Crabbé, G. (1964). Les compléments amodaux des structures perceptives [Amodal
Completion of Perceptual Structures]. Leuven: Publications Universitaires de Louvain.
Müller, G. E. (1904). Die Gesichtspunkte und die Tatsachen der psychophysischen Methodik [Viewpoints
and the facts of psychophysical methodology]. In Ergebnisse der Physiologie, Vol. II, Jahrgang, II,
Abteilung Biophysik und Psychophysik, edited by L. Asher & K. Spiro, pp. 267–516. Wiesbaden:
J. F. Bergmann.
Noguchi, K., Kitaoka, A., & Takashima, M. (2008). Gestalt-oriented perceptual research in Japan: past and present. Gestalt Theory 30, 11–28.
Orbison, W. D. (1939). Shape as a function of the vector-field. Am J Psychol 52, 31–45.
Oyama, T. (1961). Perceptual grouping as a function of proximity. Percept Motor Skills 13, 305–306.
Pinna, B. (2010). New Gestalt principles of perceptual organization: an extension from grouping to shape
and meaning. Gestalt Theory 32, 11–78.
Prinz, W. (1985). Ganzheits- und Gestaltpsychologie und Nationalsozialismus [Holistic and Gestalt
psychology and National Socialism]. In Wissenschaft im Dritten Reich [Science in the Third Reich],
edited by P. Lundgreen, pp. 55–81. Frankfurt: Suhrkamp.
Rausch, E. (1937). Über Summativität und Nichtsummativität [On summativity and nonsummativity].
Psychol Forsch 21, 209–289.
Rausch, E. (1966). Das Eigenschaftsproblem in der Gestalttheorie der Wahrnehmung. [The problem of
properties in the Gestalt theory of perception]. In Handbuch der Psychologie: Vol. 1: Wahrnehmung
und Bewusstsein [Handbook of psychology: Vol. 1 Perception and consciousness] edited by W. Metzger &
H. Erke, pp. 866–953. Göttingen, Germany: Hogrefe.
Rubin, E. (1915). Synsoplevede Figurer. Studier i psykologisk Analyse /Visuell wahrgenommene Figuren.
Studien in psychologischer Analyse [Visually perceived figures. Studies in psychological analysis].
Copenhagen, Denmark/Berlin, Germany: Gyldendalske Boghandel.
Sagara, M., & Oyama, T. (1957). Experimental studies on figural after-effects in Japan. Psychol Bull 54,
327–338.
Schumann, F. (1900). Beiträge zur Analyse der Gesichtswahrnehmungen. I. Einige Beobachtungen über
die Zusammenfassung von Gesichtseindrücken zu Einheiten [Contributions to the analysis of visual
perception. I. Some observations on the combination of visual impressions into units]. Zeitschr Psychol
Physiol Sinnesorgane 23, 1–32.
Sekuler, R. (1996). Motion perception: a modern view of Wertheimer’s 1912 monograph. Perception 25,
1243–1258.
Smith, B. (1988). Foundations of Gestalt Theory. Munich: Philosophia Verlag.
Sperry, R. W., Miner, N., & Myers, R. E. (1955). Visual pattern perception following subpial slicing and
tantalum wire implantations in the visual cortex. J Comp Physiol Psychol 48, 50–58.
Steinman, R. M., Pizlo, Z., & Pizlo, F. J. (2000). Phi is not beta, and why Wertheimer’s discovery launched
the Gestalt revolution. Vision Res 40, 2257–2264.
Ternus, J. (1926). Experimentelle Untersuchungen über phänomenale Identität. Psychol Forsch 7, 81–136.
[Translated extract reprinted as ‘The problem of phenomenal identity’. In A Source Book of Gestalt
Psychology, edited by W. D. Ellis (1938), pp. 149–160. London: Routledge & Kegan Paul Ltd.]
Verstegen, I. (2000). Gestalt psychology in Italy. J Hist Behav Sci 36, 31–42.
Vezzani, S., Marino, B. F. M., & Giora, E. (2012). An early history of the Gestalt factors of organization.
Perception 41, 148–167.
von Ehrenfels, C. (1890). Über ‘Gestaltqualitäten’. Vierteljahrsschr wissenschaftl Philosoph 14, 224–292.
[Translated as ‘On “Gestalt qualities”’. In Foundations of Gestalt Theory, edited and translated by B. Smith (1988), pp. 82–117. Munich, Germany/Vienna, Austria: Philosophia Verlag.]
20 Wagemans
Wagemans, J., Elder, J. H., Kubovy, M., Palmer, S. E., Peterson, M. A., Singh, M., & von der Heydt, R.
(2012a). A century of Gestalt psychology in visual perception: I. Perceptual grouping and figure-ground
organization. Psychol Bull 138(6), 1172–1217.
Wagemans, J., Feldman, J., Gepshtein, S., Kimchi, R., Pomerantz, J. R., van der Helm, P., & van
Leeuwen, C. (2012b). A century of Gestalt psychology in visual perception: II. Conceptual and
theoretical foundations. Psychol Bull 138(6), 1218–1252.
Wagemans, J., van Lier, R., & Scholl, B. J. (Eds.). (2006). Introduction to Michotte’s heritage in perception
and cognition research. Acta Psychol 123, 1–19.
Wallach, H. (1935). Über visuell wahrgenommene Bewegungsrichtung [On visually perceived direction of
motion]. Psychol Forsch 20(1), 325–380.
Wallach, H., & O’Connell, D. N. (1953). The kinetic depth effect. J Exp Psychol 45(4), 205–217.
Wertheimer, M. (1912). Experimentelle Studien über das Sehen von Bewegung. Zeitschr Psychol 61,
161–265. [Translated as ‘Experimental studies on seeing motion’. In On Motion and Figure-ground
Organization edited by L. Spillmann (2012), pp. 1–91. Cambridge, MA: M.I.T. Press.]
Wertheimer, M. (1922). Untersuchungen zur Lehre von der Gestalt, I: Prinzipielle Bemerkungen. Psychol
Forsch 1, 47–58. [Translated extract reprinted as ‘The general theoretical situation,’ in A Source Book of
Gestalt Psychology, edited by W. D. Ellis (1938), pp. 12–16. London: Routledge & Kegan Paul Ltd.]
Wertheimer, M. (1923). Untersuchungen zur Lehre von der Gestalt, II. Psychol Forsch 4, 301–350.
[Translated as ‘Investigations on Gestalt principles, II,’ in On Motion and Figure-ground Organization
edited by L. Spillmann (2012), pp. 127–182. Cambridge, MA: M.I.T. Press.]
Wertheimer, M. (1945). Productive Thinking. New York: Harper & Brothers Publishers.
Wulf, F. (1922). Beiträge zur Psychologie der Gestalt; VI Über die Veränderung von Vorstellungen
(Gedächtnis und Gestalt). Psychol Forsch 1, 333–373. [Translated extract reprinted as ‘Tendencies in
figural variation’. In A Source Book of Gestalt Psychology, edited by W. D. Ellis (1938), pp. 136–148.
London: Routledge & Kegan Paul Ltd.]
Wyatt, F., & Teuber, H. L. (1944). German psychology under the Nazi system: 1933–1940. Psychol Rev 51,
229–247.
Chapter 2
Philosophical background:
Phenomenology
Liliana Albertazzi
Presentations
In PES Brentano defines the nature of psychic phenomena (Vorstellungen) as acts (i.e. processes) of psychic energy (a sort of Jamesian flow of awareness; hence James’s esteem for Brentano, as expressed in James, 1890/1950, I, p. 547). Presentations may originate either in perception (as seeing, noticing, observing, etc.) or in phantasy, generally understood in terms of the capacity to present or to visualize (when thinking, remembering, imagining, etc.).
Presentations usually do not exist on their own but in the context of other intentional modali-
ties like judgements and phenomena of interest, founded on presentations themselves. Whatever
their occurrence, and however complex simultaneously occurring psychic phenomena may be,
conscious experience is always unitary, because the acts are unitarily directed to the same object
(say, a landscape) and because individually they are partial phenomena (non-detachable parts)
of a single whole, i.e. of actual presenting. In Brentano’s theory, in fact, consciousness is not ‘a summative bundle’ (Hume, 1739/2007) in which perceptions arise as parcelled pieces or sensations, to be later associated with each other according to traces of earlier perceptions, memory, etc. (Wertheimer, 1925b/1938,
p. 12). A bundle, as Brentano observes, ‘strictly speaking requires a rope or wire or something else binding it together’; consciousness, on the other hand, consists of a multitude of internally related parts (Brentano, 1995b, pp. 13–14).
As to perceiving, in Brentanian terms it consists neither in the symbolic or probabilistic rep-
resentation of an objective external physical reality, as for example assumed by the inferential
approach (Marr, 1982; Rock, 1983), nor in a direct or indirect resonance of such a reality due to
action, as for example assumed in the Gibsonian (Gibson, 1979) and enactive approaches (Noë,
2004) to perception. The ecological approach to vision still plays an important role in current
studies of perception (Koenderink, 1990; Lappin et al., 2011; Mace, 1977; Todd, 2004; Warren,
2005, 2006), and it is certainly closer to a Brentanian viewpoint than inferentialism; however, in
the Brentanian stance, one perceives qualitative wholes, not physical entities or physical invari-
ants. As to inferentialism, in the Brentanian framework this plays a role only insofar as the nature
of the transcendent world is concerned: in fact, appearances, the sole objects of our experience,
have only an extrinsic relationship with entities and unknown processes (PES, p. 129). Contrary
to inferentialism, however, a descriptive approach does not need to verify/justify the veridicality
or illusoriness of appearances with respect to the stimuli, because appearances are experienced
as evidently given in actual perceiving: at issue is the coherence of the structure, not the so-called
veridicality of the objects (Brentano, 1874/1995a).
Brentano identifies the essential characteristic of intentional presentation in its being directed
towards an inner object of some kind. As he writes in a celebrated but dense passage:
Every psychic phenomenon is characterized by what the medieval scholastics termed the intentional
(i.e. mental) in/existence of an object and which I shall call, albeit using expressions not devoid of
ambiguity, reference to a content, directedness towards an object (Objectum) (which should not be
taken to be real), or immanent objectivity. Every psychic phenomenon contains something in itself as
an object (Gegenstand), although each of them does not do so in the same way. In presentation some-
thing is presented, in judgement something is accepted or rejected, in love something is loved, in hate
hated, in desire desired, etc.
(PES, p. 88).
Brentano was clearly aware from the outset of an intrinsic ambiguity in this formulation,
which was exacerbated by the medieval implications of the term intentional, whether or
not it implied an act of will related to a goal, i.e., an ‘intention’ as generally understood in
Experimental phenomenology
In Brentano’s approach the world is built from within, but not in a neurophysiological sense.
Neurophysiological aspects are not relevant to this kind of inquiry, which concerns itself only with
the modes of appearance of perceptive objects (on the relation between phenomenology of appear-
ances and neuroscience see Spillmann and Ehrenstein, 2004; Spillmann, 2009). What Brentano
affirms is that the world of experience is reducible neither to external nor to internal physiological
psychophysics (Wackermann, 2010): it is a primary, conscious, evident, qualitative level made up
of perception of colours, shapes, landscapes, movements, cats, and so on. This also means that
information is qualitative, immediately given, and endowed with meaning, not a product of the
computational retrieval and elaboration of stimuli. These are also the main tenets of an experi-
mental phenomenology focused on qualitative perceiving and its laws.
As Kanizsa put it:
The goal pursued by experimental phenomenology does not differ from that of other sectors of psy-
chology: discovery and analysis of necessary functional connections among visual phenomena, identi-
fication of the conditions that help or hinder their appearance or the degree of their evidence, in other
words: determination of the laws which the phenomenological field obeys. And this without leaving the
phenomenal domain; without, that is, referring to the underlying neurophysical processes (to a large
extent unknown) or to the concomitant non-visual psychological activities (logical, mnestic, affective
activities which are just as enigmatic as vision itself). The influence of such processes and activities cer-
tainly cannot be denied, but they must not be identified with seeing . . . The experimental phenomenol-
ogy of vision is not concerned with the brain but with that result of the brain’s activity that is seeing.
This is not a second-best choice justified by the slowness of progress in neurophysiological research and
its uncertain prospects, it is a methodological option taken for specific epistemological reasons. And
mainly the conviction that the phenomenal reality cannot be addressed and even much less explained
with a neuro-reductive approach because it is a level of reality which has its own specificity, which
requires and legitimates a type of analysis suited to its specificity. The knowledge obtained in this way
is to be considered just as scientific as the knowledge obtained in any other domain of reality with
methods commensurate to that domain.
(Kanizsa, 1991, pp. 43–44; emphasis added).
In other words, phenomenological description comes first and it is also able to explain the laws
of seeing as the conditions governing appearances in visual space. The point has also been stressed
by Metzger when describing the task and method of an experimental phenomenology:
. . . we have proceeded exclusively and without any glance into physics, chemistry, anatomy, and physi-
ology, from within, from the immediate percept, and without even thinking of rejecting any aspect of
our findings or even changing its place, just because it does not fit with our contemporary knowledge of
nature so far. With our perceptual theory we do not bow to physiology, but rather we present challenges
to it. Whether physiology will be able to address these challenges, whether on its course, by external
observation of the body and its organs, it will be able to penetrate into the laws of perception, is point-
less to argue about in advance.
(Metzger, 1936/2006, p. 197).
A phenomenological approach to perception obviously does not deny the existence of stimuli,
but it treats them as external triggers and considers them extraneous to the phenomenological
level of analysis. Nor does it deny the psychophysical correlation between the stimulus and the
behavioural response, nor its measurement. In short, it does not deny classical psychophysics but
distinguishes among what pertains to psychophysics, what pertains to brain analysis, and what
pertains to a qualitative analysis of phenomena.
The Gestaltists adopted several features of the phenomenological method outlined by Brentano, such as the description of the appearance of phenomena (Koffka, 1935, Part III). Katz, for example, in his eidetic (Gestalt) analysis of colour, furnished an exemplary description of what a phenomenological variation is (Husserl, 1913/1989, section 137) by showing that a particular appearance of red is nothing but an instance of a certain shade of red in general (as pure colour) and that there is a phenomenal difference between surface colours and film or volumetric colours (Katz, 1935, Part I). Hering provided a psychological grounding for this method of analysis in the first two chapters of his Outlines of a Theory of the Light Sense (Hering, 1920/1964), which led to the recovery of the laws of opponence among the unique colours, subsequently confirmed at the neurophysiological level (Hurvich and Jameson, 1955). Although further research has cast doubt on some of the results obtained by neuroscientific investigation (Valberg, 1971, 2001), it has not changed in the slightest the validity of Hering’s analysis at the phenomenological level, nor of Brentano’s proposed methodology.
nor a physical kind but a whole made up of merely qualitative, internally-related appearances,
and what constitutes its phenomenal permanence in the flow of our awareness, are questions to
be explained. In fact, they were later addressed by, among others, Husserl (1966a/1991), Benussi
(1913), and Michotte (1950/1991).
It should also be noted that appearances in presentations may have stronger or weaker degrees
of intentional existence like that of a presented, remembered, or dreamed cat (Albertazzi, 2010).
For example, Metzger (1941/1963, Chapter 1) would later distinguish between an occurring event
(presented reality) and the same event represented (represented reality).
Consider a play, which takes place during a certain period of physical time, and is watched
‘live’ with a subjective experiencing that varies in relation to the spectator’s attention, interest,
and emotional involvement. Then consider the representation of the event in static photographic
images or as reported in a newspaper. Mainstream science represents events in a quantitatively parametrized mode, but doing so involves structural changes in the lived experience.
A second difference within the level of phenomenal reality is given by the present reality in
its fullness, and by the reality that is equally given but present in the form of a lack, a void, or
an absence. Examples of this difference are almost structural at the presentative level, because of the organization of appearances into figure/ground, so that in the visual field there is always a ‘double presentation’ (Rubin, 1958). Other striking examples are provided by the phenomena of occlusion, film colour, the determinateness versus indeterminateness of colours, or the volume of a half-full and half-empty glass.
A further difference within the phenomenal level of reality is that between forms of reality
that present themselves as phenomenally real and forms that present themselves as phenom-
enally apparent. In the latter case, they have a lower degree of phenomenal reality. Examples
are mirror images, after-images, eidetic images, hallucinations, delusions, illusions, etc. A phenomenological conception is not a disjunctivist conception, as has sometimes been argued (see for example Smith, 2008; for a review of the varieties of disjunctivism see http://plato.stanford.edu/entries/perception-disjunctive/). In fact, what is seen is only a difference in the degree of reality among veridical, deceptive, and hallucinatory perceptions. This is because the reality of an appearance is not classifiable in terms of its possible veridicality with respect to the stimulus. As said, for Brentano a ‘physical phenomenon’ is the object of a presentation or an
appearance. A complex and paradigmatic example of this difference is provided by amodal
shadows, like those produced on the basis of anomalous contours in an unfolding stereokinetic
truncated cone (Albertazzi, 2004).
Perceptual appearances may also have different modalities of existence. One thinks of the amodal
triangle (Kanizsa), of the impossible triangle (Penrose), of the length of lines in the Müller-Lyer
illusion (1889), or of the size of the circles in the Ebbinghaus illusion (1902), or more simply of the
already mentioned diverse modes of appearance of colour (Katz, 1935), including their valence
characteristics in harmony, which is still a controversial topic (Allen and Guilford, 1936; Da Pos, 1995; Geissler, 1917; Granger, 1955; Guilford and Smith, 1959; Major, 1895; von Allesch, 1925a, b).
Distinguishing and classifying the multifarious variety of immanent object/s and content/s also
in regard to the different kinds of psychic processes (ranging among presentations, judgements,
emotional presentations, and assumptions) was the specific goal of both Twardowsky (1894/1977)
and Meinong (1910), while the subjective space-time nature and internal dependence of act,
object, and content were the specific concern of Husserl’s, Meinong’s, and Benussi’s research, as
well as the phenomenological-experimental approach to the study of consciousness.
Brentano distinguished very clearly between psychic and physical phenomena. He wrote,
Examples of physical phenomena, on the other hand, are a colour, a figure, a landscape which I see, a
chord which I hear, warmth, cold, odour which I sense; as well as similar images which appear in the
imagination.
(Brentano, 1874/1995a, pp. 79–80).
Although his theory underwent subsequent developments, Brentano always maintained his
assumption that ‘psychic phenomena’ like a seeing, a feeling, a hearing, an imagining, and so on,
constitute what effectively exists in the strong sense (Brentano, 1982, p. 21). They are mental pro-
cesses, in fact, expressed in verbal form.
Psychic phenomena are essentially distinct from ‘physical phenomena’, which for Brentano
are immanent and intentional objects of the presentations themselves, i.e. appearances, and are
expressed in nominal form (Brentano, 1874/1995a, pp. 78–79). Essentially, physical phenom-
ena are composed of two non-detachable parts, i.e. phenomenal place and quality (Brentano,
1874/1995a, pp. 79–80; 1907/1979, p. 167; 1982, pp. 89, 159 ff.). For example, if two blue spots,
a grey spot, and a yellow one appear in the visual field, they differ as to colour and place; each
of the blue spots, in its turn, is different from the yellow and the grey one. But they are also different from each other because of a difference in place, colour and place being, in fact, two (distinctional) parts of the same visual phenomenon (Brentano, 1995b, p. 17 ff.; Albertazzi, 2006a, Chapter 4).
The point is important, because readers of whatever provenance easily misunderstand what
Brentano conceives to be physical phenomena, as distinguished from psychic phenomena, mostly
because of the equivocalness of the term ‘physical’. Given that the objects of a presentation are
wholly internal to the mental process, it is not surprising, in this framework, that a seen colour, a
heard sound, an imagined cat, a loved poem, etc. are conceived as the only ‘physical phenomena’
of our subjective experience. Brentano’s ‘sublunar Aristotelian physics’ is a physics of man, or an
observer-dependent physics (Koenderink, 2010). One might think that avoiding equivocalness
and, for example, speaking in terms of processes and appearances would be more fruitful for
understanding Brentano’s theory. However, one notes that a similar radical position was later
assumed by Hering when he addressed the nature of the visual world. In defining the nature of
objects in a visual presentation, Hering declares:
Colors are the substance of the seen object. When we open our eyes in an illuminated room, we see
a manifold of spatially extended forms that are differentiated or separated from one another through
differences in their colors . . . Colors are what fill in the outlines of these forms, they are the stuff out of
which visual phenomena are built up; our visual world consists solely of different formed colors; and
objects, from the point of view of seeing them, that is, seen objects, are nothing other than colors of dif-
ferent kinds and forms.
(Hering, 1920/1964, Chapter 1, p. 1; emphasis added).
Nothing could be more Brentanian than Hering’s account of vision, both from a psychological
and an ontological viewpoint. Interlocked perceptual appearances like colour, shape, and space,
in the Brentanian/Heringian framework, are in fact the initial direct information presented to us
in awareness (Albertazzi et al., 2013). They are not the primary properties of what are commonly
understood as physical entities, even though they are correlated with stimuli defined on the basis
of physics. Appearances in visual awareness are not simply representations of ‘external’ stimuli;
rather, they are internal presentations of active perceptual constructs, co-dependent on, but quali-
tatively unattainable through, a mere transformation of stimuli (see Mausfeld, 2010). For example,
the intentional object ‘horse’ is not the ‘represented horse’, but the inner object of whoever has it in mind (Brentano, 1966/1979, pp. 119–121). The references of the phenomenal domain are not located in the transcendent world but are the subjective, qualitative appearances produced by the process of perceiving. Consequently, phenomena of occlusion, transparency, so-called illusions, trompe l’oeil, and so on, because they are almost independent of external stimuli, are entirely ordinary perceptive phenomena; they are not odd, deceptive perceptions, as has been maintained (Gregory, 1986). In fact, appearances are prior, from the point of view of experience, to any construction of physical theories: consider, for example, a visual point in which one can distinguish
between a where (the place in the field where the point appears) and a what (its ‘pointness’), some-
thing very dissimilar from the abstraction of a Euclidean point. We perceive the world and we do
so with evidence (the Brentanian concept of internal perception, innere Wahrnehmung) before
making of it an object of successive observations and scientific abstractions.
In other words, phenomenology ‘is prior in the natural order’ (Brentano, 1995b, pp. 8, 13), and provides guidance for correlated neurophysiological and psychophysical research, but it also explains the nature of appearances themselves, i.e. the conditions of their appearing.
This is why a science of phenomena must be strictly and formally constructed on the basis of subjective judgements given in first-person accounts. Experimental-phenomenological science must
then identify the specific units of representations and the specific metrics with which to measure
them and construct a generalized model of appearances (Kubovy and Wagemans, 1995). In his
criticism of Fechner (1860/1966), Brentano maintained that explanation is required not only of
the classical psychophysical just noticeable differences (jnd), but also of ‘just perceivable differ-
ences’ (jpd), i.e. magnitudes of a qualitative nature that constitute the perception of difference,
like the ‘pointness’, ‘squareness’, ‘acuteness’, or ‘remoteness’ of an appearance in presentation. Here
evaluation is made of the phenomenic magnitude of a subjective, anisotropic, non-Euclidean,
dynamic space (Koenderink et al., 2010; Albertazzi, 2012a). The nature of such units (for exam-
ple, temporal momentum), depending on the conditions and the context of their appearances,
requires a non-linear metric for their measurement. Contemporary science has not yet developed a geometry of visual awareness in terms of seeing, although this is a necessary preliminary
step in order to be able to address the question in proper terms, but there are some proposals
more or less organized into theories (Koenderink, 2002, 2010, 2013; Koenderink and van Doorn,
2006). This radical standpoint obviously raises numerous issues as to the proper science of psychology: its feasibility, its laws of explanation, its correlation with the sciences of psychophysics
and neurophysiology, its methods, and its measurement of psychic processes and their appear-
ances. Last but not least, how the construction and the final identity of the object of a presenta-
tion develops in the flow is something that cannot be explained until we have a general theory of
subjective time-space, and of the inner relations of dependence among the parts of the contents
of our awareness in their flowing.
One only need look at Brentano’s analysis of the intensity of colour perception, for example, to
understand how distant from classical psychophysics his approach is (On Individuation, Multiple
Quality and the Intensity of Sensible Appearances, Brentano, 1907/1979, Chapter 1, pp. 66–89);
or at what should be framed as a geometry of the subjective space-time continuum, presented in
the Lectures on Space, Time and the Continuum (see the contributions in Albertazzi, 2002a), to
be aware of what could be the foundations of a science of subjective experiencing or, strictly in
Brentano’s terms, a science of psychic phenomena. These pioneering studies are at the roots of a
theory of consciousness as a whole.
Perceptual Grouping
Wholes and parts
The theory of wholes and parts is a cornerstone of Gestalt psychology (Brentano, 1982). However,
closer inspection of the subject shows how complex the question may be, how many different aspects of our awareness it may concern, and, at the same time, the enormous potential that it still has for the study of perceptual organization and of awareness in current science. Gestalt mereology, in fact, concerns different aspects of perceiving, and intrinsically correlated topics such as the continuity, variance, and isomorphism of the inner relations of the parts of a perceptual whole, a process of very brief duration.
Mostly unknown in psychological studies, however, is that it was Twardowsky’s book
(1894/1977) on the object (i.e. phenomenon or appearance) and content of a presentation, and
his distinction between the different types of parts in a whole, which prompted several strik-
ing developments in mereology among the Brentanians. It was the starting point for Husserl’s
mereology (1900–01/1970, Third Logical Investigation), Stumpf ’s analyses of the process of
fusion (Verschmelzung) between the parts of an acoustic whole (Stumpf, 1883), and Meinong’s
works on relations (Meinong, 1877, 1882) and on higher order mental objects like Gestalt wholes
(Meinong, 1899). Fusion is today studied in light of the concept of ‘unitization’ (Czerwinski et al., 1992; Goldstone, 1998; Welham and Wills, 2011), but is generally seen as the product of perceptual learning.
All the above-mentioned developments were painstaking analyses that distinguished the
many ways in which something is part of a whole, and how a whole is made up of parts,
as well as the hierarchy of acts, objects, and parts of contents in a presentation. Most notably, Stumpf’s analysis of tonal fusion was based on similarity of sounds, in contrast with
Helmholtz’s neurophysiological explanation, which was framed within a quantitative summa-
tive theory (Zanarini, 2001). Wertheimer, Koffka, and Köhler, all Stumpf’s pupils, also inherited his concept of the colour of a musical interval and the Gestalt concept of vocality. The
concept of fusion was then taken up by Husserl (1891/2003, § 29) when he considered mental
aggregates and manifolds. Husserl’s Logical Investigations (Husserl, 1900–01/1970), in fact, are
dedicated to Carl Stumpf.
Over the years, the analyses concentrated mainly on the nature of the already-organized percept
and its laws of organization in the so-called Berlin style (Koffka, 1935; Metzger, 1934, 1936/2006,
1941/1963), giving rise to what today is generally conceived as the Gestalt approach to percep-
tion. Less developed was the analysis of the process itself, in the so-called ‘Graz style’, i.e. how the
percept unfolds from within, in presentation. Wertheimer himself, however, in clarifying the role
and the goal of Gestalt theory, wrote:
There are wholes, the behaviour of which is not determined by that of their individual elements, but
where the part-processes are themselves determined by the intrinsic nature of the whole. It is the hope
of Gestalt theory to determine the nature of such wholes.
(Wertheimer, 1925a/1938, p. 2).
Emphasizing that the concept of Gestalt had nothing to do with ‘sums of aggregated contents
erected subjectively upon primary given pieces’, or ‘qualities as piecemeal elements’, or ‘some-
thing formal added to already given material’, expressed by kindred concepts, Wertheimer defined
these types of wholes as ‘wholes and whole processes’ possessed of specific inner intrinsic laws
(Wertheimer, 1925a/1938, p. 14; Albertazzi, 2006b), whose ‘pieces’ almost always appear as
non-detachable ‘parts’ in the whole process: that is, they cannot be detached from it. Finally,
he stated:
The processes of whole-phenomena are not blind, arbitrary, and devoid of meaning . . . To comprehend
an inner coherence is meaningful; it is meaningful to sense an inner necessity.
(Wertheimer, 1925a/1938, p. 16).
In short, according to Wertheimer, Gestalt wholes are made up of non-independent parts; they
are presented as phenomenal appearances with different degrees of reality; and they are intrinsi-
cally meaningful, which signifies that they do not have to refer to transcendent entities for their
truth, validity, and consistency. From where do these statements derive? And, can we say that over
the years Wertheimer’s theory, with all its richness, has received adequate explanation?
One may distinguish between two main approaches in the analysis of whole and parts: a line
of inquiry that can be broadly ascribed to Stumpf, Husserl, Wertheimer, Koffka, and Köhler, and
a line of inquiry broadly ascribable to Ehrenfels, Meinong, and Benussi, although matters are not
so clear-cut. Kenkel (1913), Lindemann (1922), Hartmann (1932), and Kopferman (1930), for
example, worked on the dynamic aspects of the apprehension of Gestalten; while the positions
taken up by Meinong, Benussi, Höfler, Witasek (1899), and Ameseder (1904) exhibit features in
common with what was the main concern of the Leipzig school of Ganzheitspsychologie (Sander,
1930; Klages, 1933; Krueger, 1953; Wellek, 1954; Ehrenstein, 1965). In fact, there is a time of the
development of phenomena (what the Leipzigers called ‘actual genesis’) that inheres in the onset
of a form at a certain temporal point of consciousness. From this point of view, the individual
Gestalten are sub-wholes of a larger whole, that is, the entire content of consciousness (see also
Husserl’s theory of double intentionality in Husserl, 1966a/1991).
Briefly, the Berliners focused mainly on appearances and their laws of organization in percep-
tual fields and their physiological correlates, while the Grazers were mainly interested in the con-
struction and the deployment of appearances in the subjective duration. Both approaches were
essentially concerned with the question of relations of a specific kind: the figural qualities, and
how they appear in perceiving. The solutions, however, were different.
Gestalt qualities
The term ‘Gestalt qualities’ was initially proposed by von Ehrenfels (1890/1988), Meinong (1891),
Cornelius (1897), and Mach (1886). Specifically, Mach observed that we are able to have an imme-
diate sensation of spatial figures, and of tonal ones like melodies. As is well known, the same
melody can be played in F, G, and so forth, as long as all the relationships of tempo and the tonal
intervals among the notes are respected; even if we replace all of the melody’s sounds, the melody
is still recognizable as the same melody.
Ehrenfels (1890/1988) wrote:
By Gestalt quality we mean a positive content of presentation bound up in consciousness with the
presence of complexes of mutually separable (i.e. independently presentable) elements. That complex
of presentations which is necessary for the existence of a given Gestalt quality we call the foundation
[Grundlage] of that quality.
(Ehrenfels, 1890/1988, § 4).
The most interesting and generally unknown development of the Brentano mereological the-
ory, however, was due to Benussi (Benussi, 1904, 1909, 1922–23). What Benussi experimen-
tally discovered is that there are phases (prototypical durations) in a presentation that allow
dislocations and qualitative reorganization of the stimuli. He identified very short durations
(ca. 90–250 msec); short durations (ca. 250–600 msec); indeterminate durations (ca. 600–1100 msec); long durations (ca. 1100–2000 msec); and extremely long durations (≥ 2000 msec).
These findings addressed the subjective temporal deployment of a presentation and how mean-
ing is perceptually construed in the duration. The stereokinetic phenomenon of the rotating
ellipse, later developed by Musatti, shows the presence of ‘proto-percepts’ that processually unfold
from the first configuration in movement until the final perceptual stable outcome (Musatti, 1924,
1955, pp. 21–22).
It should be noted that Kanizsa, who initially declared his disagreement with the idea of phases in
perceiving (Kanizsa, 1952), later came to reconsider Benussi’s viewpoint (Vicario, 1994). While
Kanizsa distinguished between seeing and thinking, considering them two different processes, at
least heuristically, he never directly addressed the question as to whether there was continuity or
discontinuity between the two processes (Albertazzi, 2003). Benussi’s theory shows the temporal
transition from perceptive to mental presence (i.e. from seeing to thinking) in presentation as the
inner deployment of the part/whole structure of a presentation.
Benussi’s experiments showed that seeing has a temporal extensiveness comprising phases in which
an ordering between the parts occurs; that the parts in perceptive presence are ‘spatialized’ in a simul-
taneous whole given in mental presence; that processes and correlates develop together; and that the
duration has a progressive focus and fringes of anticipation and retention of the parts, as Husserl had
already discussed from a phenomenological viewpoint. Benussi also showed that the dependence
relation among parts is a past-present relation, not a before-after one, occurring in the simultaneity of
the time of presentness; that parts may be reorganized qualitatively (as in cases of temporal and visual
displacement); and that at the level of the microstructure of the act of presentation, the parts can give
rise to different outputs as second-order correlates (which explains the phenomena of plurivocity).
After the initial ‘critical phase’ of the presentation regarding the actual duration of a presentation, we
take note of the spatial arrangement, the symmetry, the distance of its content-elements, and take
up assertive attitudes or attitudes of persuasion, of fantasy, of fiction, etc. (again a Brentanian legacy,
Brentano PES II). These are all intellective states, concerning the types of the act.
This explanation was bitterly contested by the Berliners. In 1913 Koffka and Kenkel published a
joint article in which they conducted detailed analysis of the results from tachistoscopic presenta-
tions of the Müller-Lyer illusion, results that closely resembled Benussi’s. Kenkel found that with
stroboscopic exposure, objectively equal lines in these figures were seen to expand and contract
(α-movement) in exactly the same manner as two similarly exposed objectively unequal lines
(β-movement). From Koffka and Kenkel’s point of view, the two movements were functionally and
descriptively the same. While acknowledging Benussi’s temporal priority on this type of experi-
ment, Koffka nevertheless criticized his explanation. Benussi maintained that the cause of appar-
ent movement was the diversity of position assumed by the figure in the individual distinct phases
of the process. Koffka instead believed that the vision of movement was a unitary phenomenon,
not an aggregate of parts. Hence, he maintained, even if the phases presented are physically dis-
tinct, they are seen as a unitary, clearly structured complex (Koffka and Kenkel, 1913, 445 ff).
From his viewpoint, it was not possible to derive wholes from their parts, which he evidently
considered to be sensory contents, i.e. individual pieces.
At bottom, therefore, this was a theoretical dispute concerning: (i) the existence or otherwise
of non-detachable components of the Gestalt appearance; (ii) their nature, i.e. whether they
were sensory contents; (iii) their relation with the stimuli; (iv) their mutual inner relations; and
(v) more generally whether or not it was possible to analyse the deployments of the contents in
the presentation.
While insisting that the presence of internal phases did not imply the separateness of the parts
of the phenomenon, Benussi (1914a) in his turn criticized the physiological conception at the
basis of the Berliners’ theory, in that it did not account for the eminently psychological structure
of the event. What the Berliners lacked was a thorough theory of presentation in which stimuli
play only the role of triggers, in the absence of any constancy principle: presentations are not psy-
chophysical structures representing stimuli, as Brentano maintained.
The controversy continued in Koffka (1915/1938), who used the dispute with Benussi as an
occasion to give systematic treatment to the Berlin school’s views on the foundations of the theory
of perception, which he set in sharp contrast to those of the Graz school. The value of the con-
troversy consists in its clear depiction of the different positions taken by the two Gestalt schools
(Albertazzi, 2001b, c). From our present point of view, the controversy was grounded in the ques-
tion as to whether it is possible to test, and consequently explain, the subjective deployment of a
phenomenon at the presentational level, without necessarily having to resort to psychophysical or
brain correlates for their explanation.
indubitably depend on the workings of the nervous system―these are in large part physiological
conditions, we see that in this case psychological research must combine with physiological research.
(Brentano, 1895, p. 35; emphasis added).
The ‘genetic’ approach to which Meinong refers means neither a reduction to physiology, nor
research conducted in terms of developmental psychology, to use modern terms. The genesis,
i.e. the study of the deployment of a presentation, pioneered by Benussi, to distinguish specific
prototypical micro-durations responsible for the final output, was conducted without resorting
to underlying neurophysiological processes, but merely by analysing the characteristic of the
subjective integrations occurring in the space-time of awareness. Benussi admitted, however,
that the tools available in his day did not enable him to slow down the process in
the proper way. Recent research on attention processes, by Rensink (2000, 2002) for example,
has confirmed almost all the five prototypical durations evidenced by Benussi in his experi-
ments (Benussi, 1907, 1913, 1914b; see also Katz, 1906; Calabresi, 1930; Albertazzi, 1999, 2011).
These durations constitute the present and its fringes, i.e. they are the basic components of
presentations.
The theory of production, instead, was understood by the Berliners in terms of a mosaic theory,
as a variation of elementism, grounded on the constancy hypothesis of what, in their view, still
appeared to be ‘sensations’ (Köhler, 1913; Koffka, 1915/1938), and they interpreted it in inferentialist terms. As Kanizsa points out, in fact, in the inferentialist viewpoint:
One postulates the existence of a first ‘lower-level’ psychic phase, that of the ‘elementary sensations’.
Acting upon this are then ‘higher-level’ psychic faculties or instances, namely the memory, the judge-
ment, and the reasoning, which, through largely unconscious inferences founded upon specific and
generic past experiences, associate or integrate the elementary sensations, thus generating those
broader perceptual units which are the objects of our experience, with their forms and their meanings.
(Kanizsa, 1980, p. 38).
However, there is almost nothing in the Graz theory that can be traced back to a theory of
atomic sense data, to a Wundtian apperception, or to unconscious Helmholtzian inferences: what
the Grazers called the ‘founding elements’ on which higher-order objects (Gestalten) are subjec-
tively grounded are non-detachable parts of the whole and do not depend on probabilistic infer-
ences from past experience. Being partial contents of presentations, they are already phenomenic
materials, i.e. part-processes on their own, influenced, modified, and reorganized in the Gestalt
whole deploying in the time of presentness: for example, they are presented as ‘being past’, which
is a qualitative determination. Moreover, although they are distinguishable parts, they are not
separable. Also set out within this framework are the classic Brentanian notions concerning tem-
poral perception (specifically the difference between perceived succession and the perception of
succession), and the location in subjective space, place, and time of appearances.
References
Albertazzi, L. (1999). ‘The Time of Presentness. A Chapter in Positivistic and Descriptive Psychology.’
Axiomathes 10: 49–74.
Albertazzi, L. (2001a). ‘Back to the Origins.’ In The Dawn of Cognitive Science. Early European Contributors
1870–1930, edited by L. Albertazzi, pp. 1–27 (Dordrecht: Kluwer).
Albertazzi, L. (2001b). ‘Vittorio Benussi.’ In The School of Alexius Meinong, edited by L. Albertazzi,
D. Jacquette, and R. Poli, pp. 95–133 (Aldershot: Ashgate).
Albertazzi, L. (2001c). ‘The Legacy of the Graz Psychologists.’ In The School of Alexius Meinong, edited by
L. Albertazzi, D. Jacquette, and R. Poli, pp. 321–345 (Aldershot: Ashgate).
O’Regan, J. K., and Noë, A. (2001). ‘A Sensorimotor Account of Vision and Visual Consciousness.’
Behavioral and Brain Sciences 24(5): 939–1031.
Passmore, J. (1968). A Hundred Years of Philosophy 3rd ed. (London: Penguin Books).
Rensink, R. A. (2000). ‘Seeing, Sensing, Scrutinizing.’ Vision Research 40: 1469–87.
Rensink, R. A. (2002). ‘Change Detection.’ Annual Review of Psychology 53: 245–277.
Rock, I. (1983). The Logic of Perception (Cambridge, Mass.: MIT Press).
Rubin, E. (1958). ‘Figure and Ground.’ In Readings in Perception, edited by D. C. Beardsley and
M. Wertheimer (New York: Van Nostrand).
Sander, F. (1930). ‘Structures, Totality of Experience and Gestalt.’ In Psychologies of 1930, edited by C.
Murchison (Worcester, Mass.: Clark University Press).
Smith, A. D. (2008). ‘Husserl and Externalism.’ Synthese 160(3): 313–333.
Spiegelberg, H. (1982). The Phenomenological Movement, 2nd ed. (The Hague: Nijhoff).
Spillmann, L. (2009). ‘Phenomenology and Neurophysiological Correlations: Two Approaches to Perception
Research.’ Vision Research 49(12): 1507–1521. http://dx.doi.org/10.1016/j.visres.2009.02.022.
Spillmann, L., and Ehrenstein, W. (2004). ‘Gestalt Factors in the Visual Neurosciences?’ The Visual
Neurosciences 19: 428–434.
Stumpf, C. (1883). Tonpsychologie, 2 vols. (Leipzig: Hirzel).
Todd, J. T. (2004). ‘The Visual Perception of 3D Shape.’ TRENDS in Cognitive Sciences 8(3): 115–121.
doi:10.1016/j.tics.2004.01.006.
Twardowsky, K. (1894/1977). Zur Lehre vom Inhalt und Gegenstand der Vorstellungen (Wien: Hölder). En. tr.
(1977) by R. Grossman (The Hague: Nijhoff).
Tse, P. U. (1998). ‘Illusory Volumes from Conformation’. Perception 27(8): 977–992.
Valberg, A. (1971). ‘A Method for the Precise Determination of Achromatic Colours Including White’.
Vision Research 11: 157–160.
Valberg, A. (2001). ‘Unique Hues: An Old Problem for a New Generation.’ Vision Research 41: 1645–1657.
http://dx.doi.org/10.1016/S0042-6989(01)00041-4.
Vicario, G. B. (1994). ‘Gaetano Kanizsa: The Scientist and the Man’. Japanese Psychological Research
36: 126–137.
von Allesch, G. J. (1925a). ‘Die aesthetische Erscheinungsweise der Farben’ (Chapters 1–5). Psychologische
Forschung 6: 1–91.
von Allesch, G. J. (1925b). ‘Die aesthetische Erscheinungsweise der Farben’ (Chapters 6–12). Psychologische
Forschung 6: 215–281.
von Ehrenfels, C. (1890/1988). ‘Über Gestaltqualitäten.’ Vierteljahrsschrift für wissenschaftliche
Philosophie 14: 242–292. En. tr. in B. Smith ed. (1988), Foundations of Gestalt Psychology, pp. 82–117
(München-Wien: Philosophia Verlag).
Wackermann, J. (2010). ‘Psychophysics as a Science of Primary Experience.’ Philosophical Psychology 23:
189–206.
Wagemans, J., Elder, J. H., Kubovy, M., Palmer, S. E., Peterson, M. A., Singh, M., and von der
Heydt, R. (2012). ‘A Century of Gestalt Psychology in Visual Perception. I. Perceptual Grouping and
Figure-Ground Organization.’ Psychological Bulletin 138(6): 1172–1217. doi: 10.1037/a0029333.
Warren, W. H. (2005). ‘Direct Perception: The View from here.’ Philosophical Topics 33(1): 335–361.
Warren, W. H. (2006). ‘The Dynamics of Perception and Action.’ Psychological Review 113(2): 358–389.
DOI: 10.1037/0033-295X.113.2.358.
Welham A. K., and Wills, A. J. (2011). ‘Unitization, Similarity, and Overt Attention in Categorization and
Exposure.’ Memory and Cognition 39(8): 1518–1533.
Wellek, A. (1954). Die genetische Ganzheitspsychologie. (München: Beck).
Wertheimer, M. (1912/2012). ‘Experimentelle Studien über das Sehen von Bewegung.’ Zeitschrift für
Psychologie 61: 161–265. En tr. by M. Wertheimer and K. W. Watkins, in Max Wertheimer, On Perceived
Motion and Figural Organization, edited by L. Spillmann, pp. 1–92 (Cambridge, Mass.: MIT Press).
Wertheimer, M. (1925a/1938). ‘Untersuchungen zur Lehre von der Gestalt. I.’ Psychologische Forschung 4:
47–58. En tr. (1938; repr. 1991) in A Source Book of Gestalt Psychology, edited by W. D. Ellis, pp. 12–16
(London: Kegan Paul).
Wertheimer, M. (1925b/1938). Über Gestalttheorie (Erlangen). En tr. (1938; repr. 1991) in A Source Book of
Gestalt Psychology, edited by W. D. Ellis, pp. 1–11 (London: Kegan Paul).
Witasek, S. (1899). Grundlinien der Psychologie (Leipzig: Dürr).
Zanarini, G. (2001). ‘Hermann von Helmholtz and Ernst Mach on Musical Consonance.’ In The Dawn
of Cognitive Science. Early European Contributors 1870–1930, edited by L. Albertazzi, pp. 135–150
(Dordrecht: Kluwer).
Chapter 3
Methodological background:
Experimental phenomenology
Jan J. Koenderink
becomes of marginal interest. Consider the case of weight again. A kilogram of feathers by defini-
tion weighs as much as a kilogram of lead, yet they are experienced as ‘somehow different’ by the
human observer (Charpentier 1891).
In 1846 Ernst Heinrich Weber published Tastsinn und Gemeingefühl (Weber 1905). One result
he had found was that the human observer, in comparing weights placed upon the two hands, can
just notice a 5 per cent difference in weight—that is 50 g on a kilogram, or 5 g on 100 g. This law
of proportionality is known as ‘Weber’s Law’ (name due to Fechner). Gustav Theodor Fechner
published Elemente der Psychophysik in 1860 (Fechner 1860). He analytically ‘integrated’ Weber’s
Law, and thus framed what is commonly known as the Weber–Fechner Law: the sensation (in
this case the quantity of the feeling of heaviness) is proportional to the logarithm of the physical
stimulus (in this case weight). Fechner referred to this as ‘The Psychophysical Law’. (In all fairness to Fechner, his ‘Psychophysical Law’ properly applies to arbitrary just-noticeable differences,
Weber’s law being just a particular example.)
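Fechner’s ‘integration’ can be reconstructed in a few lines (a standard textbook derivation, not Fechner’s own notation):

```latex
% Weber's Law: the just noticeable difference \Delta I is a constant
% fraction k of the stimulus intensity I (k \approx 0.05 for lifted weights):
\[ \frac{\Delta I}{I} = k. \]
% Fechner's postulate: every just noticeable difference contributes the
% same increment of sensation, dS = c\,\frac{dI}{I}. Integrating from the
% absolute threshold I_0 (where S = 0) gives the Weber--Fechner Law:
\[ S(I) = c \ln \frac{I}{I_0}. \]
```

On this reading Weber’s proportionality enters only as one particular form of the just noticeable difference; a different dependence of ΔI on I would, under the same postulate, yield a different ‘psychophysical law’.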
Notice that we deal with a number of ontologically very different entities here1. We have at least
to reckon with the magnitude of a physical parameter, the judgment of equality of an environ-
mental property, the notion of the just noticeable difference in some environmental parameter,
and the magnitude of a certain experience. The physical parameter is often assumed to be trivial,
because physics is supposed to be the most elementary of the sciences. Of course, this is not quite
true. For one thing, physics derives from human experience, rather than the other way around, a
fact that is often forgotten. For another thing, the nature of mass in physics is not really that well
understood (does it involve an understanding of the Higgs boson2, or does it involve a composite
nature of the electron3?). However, I’ll let that be, for the elementary notions of detectability and
discriminability are more interesting. If you perform the experiment ‘right’, these notions can be
made very ‘objective’. Objectivity implies ‘independent of any first-person account’. In the highest
regarded methods the person making the judgments is largely (or even fully) unaware of experi-
encing anything at all. I will refer to such cases as ‘dry physiology’. Most of classical psychophys-
ics falls in this general ballpark. With methods like EEG-recording the ideal is actually reached.
One may derive signals from the body in response to physical stimuli that the person never (or
only after some time interval) becomes consciously aware of. The ‘magnitude of an experience’ is
in a different ballpark altogether. It is literally like a pain in the ass, in that it involves conscious
personal awareness.
Something like a ‘magnitude of experience’ may be considered mysterious, and perhaps not to
be counted as a scientific fact. One popular account would denote it ‘epiphenomenal to certain
neural events’4,5. This is like saying that ‘pain is the firing of C-fibres’, indeed a popular notion
(Puccetti 1977). The optimistic feeling is that once science prevails people will stop referring to
pre-scientific notions like pain.
A ‘magnitude of experience’ is not even the most mysterious entity around. Many naive observ-
ers actually feel that they experience (are aware of) qualities and meanings—at least that is what
they report, whatever that may be construed to mean. For instance, some visual observers, when
confronted with pieces of colored paper, are perfectly happy to grade them as ‘red’, ‘blue’, ‘yellow’,
and so forth. Notice that such observers are grading visual experiences here, not physical objects.
It is easy enough to change the state of the environment (including the observer), such that the
qualities change, relative to the identity of the objects. One may consider numerous confusions
at this point. For instance, it is not uncommon to hear remarks like ‘the red paper looks blue to
the observer’. Of course, that is a confusion of ontological levels. A thing that looks blue is a blue
visual thing. The ‘red paper’ referred to is another thing—here ‘red’ refers apparently to a physical
property. We are discussing visual things here.
I will denote the study of first-person reports such as ‘I see a blue patch’ as a function of the struc-
ture of the physical environment ‘experimental phenomenology’ (Varela, Maturana, and Uribe
1974)6. It is different from ‘dry physiology’, which I will denote ‘psychophysics’. Psychophysics is
again different from ‘physics’, which I will treat as the level at which ‘the buck stops’ as far as inquiry
goes. This is in no way necessary; for instance, the physicist will certainly want to carry the inquiry
further indefinitely.
Measurement in Psychophysics
Since I defined psychophysics as ‘dry physiology’, it only makes sense that psychophysics often
makes use of physiological measurements. These are usually physical measurements of an electri-
cal, mechanical, or thermal nature. Historically, reaction times have been very important; later
EEG-recording became a common method; at this time in history various techniques of ‘brain
scanning’ are becoming increasingly popular. Such methods are not essentially different from the
methods of animal physiology. Here I will concentrate upon methods in which the observer has
an active role.
The role of the observer can be various. In the simplest cases the observer has to indicate equal-
ity or its absence in a pair of prepared physical environments. The observer is not required to
comment on the nature of the difference. In some cases the observer may have to judge the dif-
ference between something and nothing. The ‘something’ remains undefined. In many cases, the
observer will actually be unaware of the nature of it—that is to say, will be hard-put to describe its
qualities. In such cases the observer acts as a ‘null-detector’. It is much like the case of weighing
with scales in which the person notices equilibrium, but has no experience of the quality of ‘heavi-
ness’, such as happens with objects too heavy to lift.
These are the measurements of ‘absolute thresholds’ and of ‘discrimination thresholds’. One
often assumes that such thresholds in some way ‘exist’, even when not being measured. The
experiment simply tries to measure this pre-existing value as precisely as possible. A plethora
of methods have been developed for that. The reader is referred to the standard literature for
this (Luce 1959; Farell and Pelli 1999; Ehrenstein and Ehrenstein 1999; Treutwein 1995; Pelli and
Farell 1995). Decades of work have resulted in a wealth of basic knowledge in (especially) vision
and audition. The development of modern media like television and high-fidelity sound record-
ing would have been impossible without such data. Yet it is easily possible to question the basic
assumptions. The thresholds are evidently idiosyncratic, and depend upon the present physiologi-
cal state of the observer. It is probably more reasonable to understand thresholds as operationally
defined, than as pre-existing. Indeed, different operationalizations typically yield (at least slightly)
different values. To discuss the question ‘which value is right’ seems hardly worthwhile. In a few
cases the thresholds can be related to basic physical constraints. For instance, electromagnetic
energy comes as discrete photon events (Bouman 1952), setting physical limits to the thresholds,
and Brownian movement of air molecules causes ‘noise’ that limits the audibility of weak sounds
(Sivian and White 1933). Especially in such cases, the notion of ‘dry physiology’ (essentially a
subfield of physics) appears an apt term.
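The photon limit can be made concrete. If detection requires at least k absorbed photons, and absorptions follow Poisson statistics, the frequency-of-seeing curve follows directly. A minimal sketch in the spirit of the classical quantum-of-vision analyses (the criterion k = 6 is merely illustrative, not a measured value):

```python
import math

def p_seeing(mean_photons, k=6):
    """Probability that a Poisson-distributed photon count reaches
    at least k, i.e. that the flash is 'seen' on a given trial.
    (k = 6 is an illustrative criterion, not a measured value.)"""
    p_below = sum(
        math.exp(-mean_photons) * mean_photons**i / math.factorial(i)
        for i in range(k)
    )
    return 1.0 - p_below

# The curve rises gradually from near 0 to near 1 as the mean count grows:
for mean in (2, 4, 6, 8, 12):
    print(mean, round(p_seeing(mean), 3))
```

Even a noiseless detector shows a graded frequency-of-seeing curve here: the ‘threshold’ is partly a property of the light itself, not only of the observer.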
If you have ever been an observer in a classical threshold experiment yourself, you will under-
stand that I have only indicated the tip of the iceberg. In the best, most objective, methods, the experimenter and the observer are both unaware of what they are doing. Such experiments are called
‘double blind’; these are considered the only ones to be trusted unconditionally. If the method has
been optimized for time, the observer will have a fifty-fifty chance of ‘being right’ at each trial.
‘Being right’ is relative to the notion that there exists a threshold independent of the method of
finding it. This puts the observer in a very unfortunate spot, namely maximum uncertainty. This
is especially unpleasant if you don’t know what you are supposed to ‘detect’. The best experiments
are like Chinese torture. This frequently happens in adaptive multiple forced-choice procedures.
The observer often has no clue as to what she is supposed to notice. One trick of the observer is
to respond randomly, in an attempt to have the method raise the stimulus level, so as to be able to
guess at the task. This is an idea that might not occur to actually ‘naive’ observers, which is perhaps
one reason for their popularity. Then the observer tries to remember what the task was, while—at
least in the observer’s experience—nothing is perceived at all. Such methods depend blindly on a
number of shaky assumptions, and their claims to objectivity, precision, and efficiency are argu-
able. In my view it remains hard to beat Fechner’s simple ‘method of limits’, ‘method of constant
stimuli’, and ‘method of adjustment’ (Farell and Pelli 1999; Ehrenstein and Ehrenstein 1999; Pelli
and Farell 1995), both conceptually and pragmatically.
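For concreteness, an adaptive procedure of the kind discussed here can be sketched in a few lines. This is a generic one-up/two-down staircase run against a simulated observer; the psychometric function, its ‘true’ threshold, and all parameter values are assumptions of the sketch, not of any particular published method:

```python
import math
import random

def simulated_observer(level, true_threshold=0.5, slope=10.0):
    """Answer 'detected' with probability given by a logistic
    psychometric function. The 'true' threshold exists only in this
    simulation; the staircase never gets to see it."""
    p = 1.0 / (1.0 + math.exp(-slope * (level - true_threshold)))
    return random.random() < p

def staircase(start=1.0, step=0.05, n_reversals=12):
    """One-up/two-down staircase: two consecutive detections lower
    the level, any miss raises it. This rule converges near the
    70.7%-detected point of the psychometric function."""
    level, run, direction = start, 0, -1
    reversals = []
    while len(reversals) < n_reversals:
        if simulated_observer(level):
            run += 1
            if run == 2:
                run = 0
                if direction == +1:      # was going up: a reversal
                    reversals.append(level)
                direction = -1
                level = max(0.0, level - step)
        else:
            run = 0
            if direction == -1:          # was going down: a reversal
                reversals.append(level)
            direction = +1
            level += step
    # Conventional estimate: average the last few reversal levels.
    tail = reversals[-8:]
    return sum(tail) / len(tail)

random.seed(1)
estimate = staircase()
print(round(estimate, 2))
```

Note that the delivered ‘threshold’ is a property of the procedure (the 70.7% convergence point of this particular rule) as much as of the observer, which is exactly the operational reading suggested above.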
In my experience, many observers try to ‘cheat’ by aiming at a level somewhat above threshold.
This is often possible because the experimenter will never notice. I can say from (much) experience
as an observer that it feels way better, and from (much) experience as an experimenter that it yields
much better results. Of course, this is bad, for it defeats the purpose. As an observer you are able to
manipulate the threshold. In many cases it is possible to maintain a number of qualitatively different
thresholds. For instance, in the case of the contrast threshold for uniformly translating sine-wave
gratings (about three decades worth of literature!) an observer can easily maintain thresholds for:
• Seeing anything at all;
• Seeing movement, but not its direction;
• Seeing movement in a specific direction;
• Seeing something spatially articulated moving;
• Seeing stripes, but being uncertain about their spacing or width;
• Seeing well-defined stripes moving;
• and so forth.
What one will be aware of depends upon the physical parameters. Such things have rarely
been recorded in the literature (Koenderink and van Doorn 1979). However, they must be obvious
to anyone who was ever an observer. They must have been obvious to experimenters who occa-
sionally acted as an observer themselves. However, some experimenters never act as an observer,
for fear of losing their status as an objective bystander. Many are reluctant to admit that they did.
The point I am making here is that one should perhaps take the literature with a little grain of salt.
It is hard, maybe impossible, to really understand an experiment you are reading about, unless
you were at least once an observer in it yourself. This perhaps detracts a bit from the apparently
tidy objectivity of such reports. For the hardcore brain scientist this does not pose a problem, for
on the ontological level of physiology the observer’s reports are mere subjective accounts, and do
not count as scientific data. Moreover, visual awareness is epiphenomenal with respect to the real
thing, which is electrochemical activity in the brain. Numerical threshold data are supposed to
carry their own meaning.
Perhaps more interesting cases involve supra-threshold phenomena. These are often more
important from an applications perspective. Such work also involves the observer’s perceptual awareness.
It does not necessarily involve the observer’s recognition or understanding (in reflective thought)
of the perception. The techniques almost all involve a comparison of two or more perceptual
entities. When the comparison is between successive presentations, memory is also involved. The
comparison may involve mere identity, in which case we are back in the dry physiology situation,
but more commonly involves some partial aspect of the perceptual awareness. In that case one
draws on the observer’s ability to somehow parse awareness.
An extreme example is Stanley Smith Stevens' method of intermodal comparison, presented in the famous paper 'On the Psychophysical Law' (Stevens 1957). (Stevens was the proud author of the Handbook of Experimental Psychology, running to over 1400 pages (Stevens 1951).) Stevens had people 'equate' anything with anything, such as equating the brightness of an illuminated patch with the force exerted in a handgrip (or anything else you might imagine). What could this mean? Apparently people are comparing 'magnitudes of sensation' in the Fechnerian sense. It is not easy to understand what is really going on here. Such experiments are simple enough to program on a modern computer, and it is worthwhile to gain the experience. For instance, you may try to equate brightness with loudness. Stevens' Law tells us that all magnitudes of sensation are related by power laws, the argument being that power laws form a group under concatenation. It is hard to assess how reasonable this argument is. Perhaps remarkably, in practice it works amazingly well. Moreover, silly as the task sounds, most observers have no problem with it. They simply do it.
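The group-theoretic argument is easy to make concrete. The following sketch (the exponents are illustrative placeholders, not measured data) shows that if brightness and loudness are each power functions of their stimulus variables, the cross-modal matching function is itself a power law, with exponent equal to the ratio of the two:

```python
import numpy as np

# Illustrative Stevens exponents; the numbers are placeholders, not fitted data.
A_BRIGHT = 0.33   # brightness ~ luminance ** 0.33
A_LOUD = 0.60     # loudness  ~ pressure  ** 0.60

def match_pressure(luminance):
    """Sound pressure whose (modelled) loudness equals the brightness of
    the given luminance: solve pressure**A_LOUD == luminance**A_BRIGHT."""
    return luminance ** (A_BRIGHT / A_LOUD)

# The matching function is again a power law; its exponent is the ratio 0.33/0.6.
lums = np.array([1.0, 10.0, 100.0])
matches = match_pressure(lums)
slope = np.log(matches[2] / matches[1]) / np.log(lums[2] / lums[1])
print(round(slope, 3))  # 0.55: concatenating power laws yields a power law
```

Concatenating any two such power laws again yields a power law, which is exactly the closure property the argument appeals to.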
A special case of Stevens’ method of comparison is to let the observer relate a magnitude of sensa-
tion to numbers. One starts with some instance and encourages the observer to call out a number
(any number). Then further instances are supposed to be related to this, the number scale being
considered a ratio scale. This is often called ‘direct magnitude estimation’ (Poulton 1968). It has
often been shown to lead to apparently coherent results. This might perhaps be interpreted as an
indication that the ‘magnitude of sensation’ is a kind of quality that is immediately available to the
observer.
An interesting approach is Thurstone’s method of comparison (Thurstone 1927, 1929). Given
three items, you are required to judge which item is the (relative) outlier. This is evidently a metric
method—at least it purports to be by construction. The observer is not required to know on what
basis the decision is to be made, rendering the method 'objective'. However, unlike in pairwise comparison, the observer is forced to judge on the basis of some quality (or qualities), forced by the very choice of stimuli. Moreover, the method yields a clear measure of consistency.
This is what I like best. If the task makes no sense to the observer, the results will be verifiably
inconsistent. If the data are consistent, one obtains a metric. Simple examples appear impressive
at first sight. For instance, using pieces of paper, one obtains a metric that appears to reflect the
structure of the color circle. Does this ‘objectify’ the color circle? Perhaps, but it does not do so in
an interesting way. The same structure can be obtained from judgments of pairwise equality. It has
nothing to do with the quality we know as ‘hue’.
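The consistency check can be illustrated with a toy simulation. In this sketch (all numbers hypothetical) an observer's odd-one-out judgments are generated from an underlying one-dimensional quality; a consistent observer lets items that are close on that quality 'survive together' as a pair far more often than distant ones:

```python
import itertools
import numpy as np

# A hypothetical 'hue-like' quality for five paper samples (arbitrary units).
quality = np.array([0.0, 1.0, 1.5, 3.0, 3.2])

def odd_one_out(i, j, k, q=quality):
    """Simulated observer: the outlier is the item farthest from the other two."""
    trio = [i, j, k]
    dists = [abs(q[a] - q[b]) + abs(q[a] - q[c])
             for a, b, c in [(i, j, k), (j, i, k), (k, i, j)]]
    return trio[int(np.argmax(dists))]

# Tally, for every pair, how often it survives together across all triads;
# frequent co-survival indicates small dissimilarity.
n = len(quality)
together = np.zeros((n, n))
for i, j, k in itertools.combinations(range(n), 3):
    out = odd_one_out(i, j, k)
    pair = [x for x in (i, j, k) if x != out]
    together[pair[0], pair[1]] += 1
    together[pair[1], pair[0]] += 1

# Consistent data: nearby items (0, 1) co-survive more than distant ones (0, 4).
print(together[0, 1] > together[0, 4])  # True for this simulated observer
```

With random (inconsistent) judgments the tallies flatten out, which is the verifiable inconsistency mentioned above.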
In the final analysis, if you want to study ‘hue’ as a quality, all you can do is rely on first-person
accounts of ‘what it is like’ to experience hue (e.g. to ‘have red in mind’ or ‘experience redly’). That
means moving to experimental phenomenology.
46 Koenderink
Experimental Phenomenology
Consider the instance of hue naming. It is easy enough to check whether observers can perform
this task in a coherent manner. One simply asks for the hues of a large number of objects that dif-
fer only in a few spectral parameters (e.g., the RGB colors of a CRT tube), presenting each object
multiple times. One goes to some length to keep the physical environment stable. For instance,
one shows the objects in open shade at noon on a sunny day, or uses a CRT in a dark room. This
allows one to check reproducibility. One finds that observers do indeed yield coherent results,
inconsistencies being limited to objects that appear very similar. The fuzzy equivalence sets7
appear to be fixed for a given observer. Moreover, numerous observers essentially agree in their judgments, although occasional dissenters occur. This suggests that the hue names
are not totally idiosyncratic. One might say that there exists something of a ‘shared objectivity’
among a large group of observers (Berlin and Kay 1969).
Such a shared objectivity is by no means the same as the (true) objectivity that is the ideal of the
sciences. In physics the ‘facts’ are supposed to be totally independent of the mind of any individual
observer. On closer analysis the facts of physics are defined by community opinion, the community
being a group of people that recognize each other as professionals (a ‘peer group’). They agree on
the right way to do measurements, to analyze the results, and so forth. There is no doubt that this
has been shown to work remarkably well. However, it is certainly the case that some ‘facts’ are hotly
debated in the community (like tachyonic neutrinos (Reich 2011), or the recent Higgs boson). There
are also cases where the system did not work too well, like the (in)famous case of Schiaparelli’s
Martian canals8, which played an important role in planetary science for decades9, but are now
regarded as non-existent. Thus the ideal of 'true objectivity' is evidently a fiction, at best a virtual limiting case. One should perhaps not too hastily dismiss shared objectivity as totally unscientific. That so many people are ready to judge blood 'red' and grass 'green' is hardly entirely meaningless. Nor
is it explained away by the spectral locations of the hemoglobin and chlorophyll absorption bands.
Researchers in the Gestalt tradition10 frequently use the method of 'compelling visual proof'.
One prepares an optical scene, and collects the majority community opinion on the structure of
immediate visual awareness in the presence of the scene. In cases of striking majority consensus,
one speaks of an ‘effect’, reified through shared objectivity. An example is the figure–ground struc-
ture of visual awareness. Visual objects are seen against a ground, the contour belonging to the
object, the ground apparently extending behind the object. The phenomenon of figure–ground
reversal proves that this is a purely mental phenomenon, there being no physics of the matter.
Most researchers accept compelling visual proofs as sufficient evidence for the reality of an effect.
The striking visual proof implies shared objectivity over a large group of observers, which goes
some way towards the virtual limit of ‘true objectivity’. However, it is accepted that there might be
a minority group that ‘fails to get the effect’.
Visual proofs are not limited to the psychology of Gestalt. They are actually common in math-
ematics, especially geometry. For instance, several visual proofs of the Pythagorean theorem are
well known11. Many mathematicians consider proofs useful only when they are 'intuitive', by which is meant that they can be broken up into smaller parts that are individually compelling. Such parts are often visual proofs (Pólya 1957). Other mathematicians abhor visual proofs and only recognize 'symbol pushing'. Ideally, that would lead to a mathematics that would be fully independent of the human mind, and be simply the (uninterpreted!) output of a Turing machine. In physics, visual proofs are also common enough. Famous is the 'Clootcransbewijs' of Simon Stevin (Stevin 1586), which yields an immediate insight into the truth of the vector addition of forces. Again, some physicists would prefer to limit physics to 'symbol pushing' and 'pointer readings', in the interest of true objectivity. Such would be physics beyond 'human understanding' in the usual sense. It could be the (uninterpreted!) signal transmitted by a NASA Mars explorer. Since 'true objectivity' in the sciences would exclude human intuition or understanding, it seems hardly a goal to strive for. Who might be interested? True objectivity implies zero understanding. Somehow, one has to find the right balance.
8 Le Mani su Marte: I diari di G.V. Schiaparelli. Observational diaries, manuscripts, and drawings (Historical
In experimental phenomenology such ‘symbol pushing’ or ‘pointer readings’ are to no avail, as
there are no formal theories with quantitative predictive power, and pointer readings belong to
dry physiology. Perceptual proofs have to be the major tool.
I will draw some illustrative examples from our recent work, stressing the considerations lead-
ing up to the design of the method, and the types of result that were obtained.
order. One may also look into the picture and be aware of a pictorial space, filled with pictorial
objects. Pictorial objects are volumetric and bounded by surfaces, the pictorial reliefs. Different
from the picture surface, which is a physical object coexisting with the body of the observer in a
single space, the pictorial relief is a mental object without physical existence. It lives in immedi-
ate visual awareness. As such, it is a worthy object for study in experimental phenomenology
(Koenderink, van Doorn, and Wagemans 2011).
Pictorial reliefs are two-dimensional submanifolds of three-dimensional pictorial space. Pictorial
space is quite unlike Euclidean space (the space you move in) in that the depth dimension is not com-
mensurate with the visual field dimensions. Whereas the ontological status of the visual field dimen-
sions is in no way obvious, these dimensions do at least have analogues in the physical scene, namely
the dimensions that span the picture plane. Despite these fundamental differences, it is intuitively evi-
dent that an element (small patch) of pictorial relief can be parameterized by a spatial attitude (that is
to say, it could be seen frontally or obliquely), and by a shape. The attitude can be parameterized by two
angles, a slant (measure of obliqueness) and a tilt (the direction of slanting). Being a two-dimensional
patch, it is geometrically evident that the shape can be parameterized by two curvatures in mutually
orthogonal directions and an orientation. Thus one can parameterize a smallish patch of pictorial
relief by six parameters, its ‘depth’ (one parameter), its spatial attitude (two parameters), and its shape
(three parameters). One might consider it the task of experimental phenomenology to address these.
How to go about that (Koenderink, van Doorn, and Kappers 1992)?
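For concreteness, the six-parameter bookkeeping can be sketched numerically. Assume a hypothetical local (Monge) patch z(x, y) = z0 + px + qy + (ax² + 2bxy + cy²)/2; all coefficient values below are illustrative, and foreshortening corrections are ignored:

```python
import numpy as np

# Hypothetical local patch (a Monge form; all coefficient values illustrative):
#   z(x, y) = z0 + p*x + q*y + (a*x**2 + 2*b*x*y + c*y**2) / 2
z0 = 1.0                  # depth (1 parameter, defined up to an offset)
p, q = 0.5, 0.2           # depth gradient
a, b, c = 0.8, 0.1, 0.3   # second-order (shape) coefficients

# Spatial attitude (2 parameters): slant = obliqueness, tilt = its direction.
slant = np.degrees(np.arctan(np.hypot(p, q)))
tilt = np.degrees(np.arctan2(q, p))

# Shape (3 parameters): curvatures along two orthogonal directions plus the
# orientation of those directions, via the eigendecomposition of the Hessian.
vals, vecs = np.linalg.eigh(np.array([[a, b], [b, c]]))
k_min, k_max = vals
orientation = np.degrees(np.arctan2(vecs[1, 1], vecs[0, 1])) % 180.0

print(round(slant, 1), round(tilt, 1), round(k_min, 3), round(k_max, 3))
# 28.3 21.8 0.281 0.819 (plus z0 and orientation: six parameters in all)
```

This is only a numerical restatement of the parameterization in the text, not a description of any particular experiment.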
Initially, it might seem easiest to go for the depth first, since it is a simple point property. In
the simplest implementation, one might ask an observer to do raw magnitude estimation. One
puts a mark (think of a red dot placed on a monochrome photograph) on the picture surface and
instructs the observer to call out the depth. One repeats this for many points, say in random order.
The result would be a ‘depth map’, evidently a desirable result of experimental phenomenology.
When you give this a try, you will find that it doesn’t work very well. The observer has no clue as
to absolute depth, only relative depths (depth differences between point pairs, say) appear to make
sense. Such point pair comparisons do indeed work to some extent, but—of course—they yield
depth only up to an arbitrary offset. Moreover, the spread in the result is rather high, and for some
point pairs the task is essentially an impossible one. This is an important insight: ‘depth at a point’
plays no role in visual awareness.
Spatial attitude is apparently a better target since observers can easily point out in which direc-
tion a surface element is slanted. How to measure attitude? The simplest method appears again
to be magnitude estimation. Put a mark on the picture surface, and have the observer call out the
slant and tilt angles in degrees. This experiment was actually performed by James Todd (Todd and
Reichel 1989), but unfortunately the results are not encouraging. Observers take a long time to
arrive at a conclusion, and results are very variable. Moreover, observers hate the task. It just fails
to feel ‘natural’. Are there methods to address spatial attitude that do feel natural?
One approach to the design of more natural methods relies on the method of coincidence. It is a
very general principle, also commonly used in the sciences. Consider how one measures length.
One designates a certain stick as the ‘unit of length’. One uses geometrical methods to produce
sticks of any length. For instance, cutting a unit stick into two equal pieces produces a stick of
one-half unit length. The judgment of equality does not require any length measurement itself,
thus does not introduce circularity. Likewise, putting two unit-length sticks in tandem produces
a stick of two unit lengths. And so forth. Measuring the length of an unknown stick involves
finding a stick of known length (they can be produced of any length) and judging equality. In
practice one produces a yardstick with marked subdivisions, puts the unknown stick next to it,
and notices coincidence of the endpoints of the stick with marks on the yardstick. This is the
gist of the method of coincidence13. The ancients refined it, and the same principle was applied
to weights. Later methods were found to extend the method to luminance, temperature, various
electrical variables, and so forth. Here I will mainly use the paradigm of the yardstick.
Notice what you need in order to apply this method of ‘length measurement’. First you need a
yardstick. Then you have to be able to put the yardstick next to the object to be measured. Finally
you need to be able to judge the coincidence of two fiducial points on your object with marks on
the yardstick. Each of these requirements might fail to be met. For instance, you have no yardstick
that would let you measure the distance to the moon. You are not able to apply the yardstick (use-
fully) to a coiled rope. And so forth. The method of length measurement implies that you succeed
in dealing with the various requirements.
In the case of pictorial surface attitude you have to design a ‘gauge figure’ (your analogue of the
‘yardstick’), you have to be able to place this object in pictorial space, on the pictorial surface, and
you have to be able to manipulate the gauge figure so as to bring about a ‘coincidence’. None of
these design objectives is trivial.
The gauge figure should be a pictorial object, since it should be inserted in pictorial space. This
means designing a picture of the gauge figure, in the expectation that it will produce a pictorial
object. The gauge figure should appear to have well-defined spatial attitude, for that is what we
would like to measure, and as few superfluous ‘frills’ as possible. Inspiration can be found in the art
of drawing. Artists often use ellipses to suggest spatial attitude, for instance in ‘bracelet shading’14,
spreading ripples on water, the shape of water lily leaves, the bottom hem of a dress, and so forth.
An oval makes a good gauge figure for attitude because it tends to look ‘like’ a slanted and tilted
circle.
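A minimal sketch of this geometry, assuming orthographic projection (the actual gauge figures involve further design choices not modeled here): a circle of given radius, slanted by some angle and tilted in some direction, projects to an ellipse whose minor axis is foreshortened by the cosine of the slant and aligned with the tilt direction.

```python
import numpy as np

def gauge_ellipse(radius, slant_deg, tilt_deg, n=64):
    """Picture-plane outline of a circle slanted by slant_deg and tilted in
    the direction tilt_deg, under orthographic projection (a sketch only)."""
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    # Foreshorten the diameter that points in the tilt direction by cos(slant),
    x = np.cos(t) * radius * np.cos(np.radians(slant_deg))
    y = np.sin(t) * radius
    # then rotate so the short axis lies along the tilt direction.
    phi = np.radians(tilt_deg)
    return x * np.cos(phi) - y * np.sin(phi), x * np.sin(phi) + y * np.cos(phi)

# A 60-degree slant halves the minor axis; at slant 0 the gauge stays circular.
ex, ey = gauge_ellipse(1.0, 60.0, 30.0)
radii = np.hypot(ex, ey)
print(round(radii.min(), 3), round(radii.max(), 3))  # 0.5 1.0
```

Adjusting the two free parameters of the ellipse (aspect ratio and orientation) is then equivalent to adjusting slant and tilt, which is what makes the oval a usable 'yardstick' for attitude.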
How to place the gauge figure at the right location? Perhaps surprisingly, this turns out to be
easy. Almost anything you put on the picture surface will travel into depth till it meets a pictorial
surface on which it will stick. Mustaches and black teeth on posters of politicians are a case in
point. However, it is by no means a fail-safe method; some marks stubbornly look like flyspecks on the picture surface. This is an important insight: in experimental phenomenology the awareness of the experimenter is just as important as that of the observer! The 'objectivity' of experimental phenomenology is shared subjectivity. Fortunately, the gauge figure tends to work well.
Simply superimposing an elliptical outline on the picture surface is enough to put the gauge on
the pictorial relief.
Finally, bringing about the coincidence is a simple matter. Most ellipses look like they are not
lying upon the surface, but at some angle to it. By changing the orientation and shape of the ellipse
you may bring about an awareness of the gauge figure as ‘a circle painted upon the surface’. This
is a striking visual fact; it looks very different from an ellipse that doesn’t fit. Of course, there is
little one can do in case the observer fails to agree. Such cases appear to be extremely rare though.
The only important design issue left is the interface. The observer somehow has to be able to
manipulate the ellipse. This is very important. If the interface is not ‘natural’ the method is not
going to work. You may gain an appreciation for this fact if you play with a simple kid's game: writing your name with a device that uses two knobs controlling the Cartesian coordinates of the writing implement. The 'Etch a Sketch' toy, a devilish French invention manufactured by the Ohio Art Company, does exactly that15. Writing anything, for instance your own name, is nearly impossible, which accounts for the popularity of the device. Using a proper interface, observers bring about coincidence in a few seconds. Participants consider it easy and generally fun to do. You easily do hundreds of coincidences in a session of half an hour. In contradistinction, interfaces of the Etch a Sketch type are a strain on the observer. Moreover, they lead to badly reproducible results, and take twice or thrice the time. In practice the difference is crucial. Yet from a 'formal, conceptual' perspective the interface should make no difference at all. That's why this section is entitled the 'art' of devising methods. It is desirable that eventually such 'art' should be replaced with principled methods, of course.
14 'Bracelet shading' derives from the way a (circular) bracelet reveals the shape of a cross-section of an arm, leg, or neck. The hatching used in bracelet shading follows the curves obtained by cutting the shape by planar sections perpendicular to its overall medial axis. The hatching may follow material features; for instance, folds in sleeves often lend themselves very naturally to this technique.
Notice that a natural interface is also crucial because of time constraints. The structure of picto-
rial space is volatile and may change to a noticeable degree over the span of an hour. This limits
the number of surface attitude samples that can be taken to a few hundred, even with a convenient
interface.
Such experiments are usually done on a computer screen because that makes it easy to imple-
ment the interface. Perhaps unfortunately, it also makes it trivial to put as many gauge figures
on the screen as you wish. This has induced people to plaster the surface with gauge figures, and
have the observer control the structure of an extensive gauge figure field. This is generally a bad
idea. Why? The reason is that ellipses are powerful cues (think of bracelet shading and so forth).
Indeed, you may as well remove the picture, for you will still see the pictorial surface, due to the
gauge figures alone. With the picture present it is easily possible to influence the pictorial relief
by adjusting the gauge figure field. Thus, the measurement influences the result. To minimize this
undesirable effect, we never show more than one gauge figure at a time, and do so in random
spatial order. Of course, there are many more possible artifacts of this type. Size, color, line thick-
ness, and so forth of the gauge figure are an important and integral part of the design. Such factors
co-determine the result, and should be considered part of the measurement.
Given a field of local surface attitudes, one may find an integral surface that ‘explains’ them as
well as possible. Some variations of attitude will have to be ignored by such a method, because not
just any field of attitudes admits of an integral surface. Thus, you obtain a very useful measure of
coherency of the result. If the spread in repeated settings accounts for the incoherence, then one
might say that a ‘pictorial surface exists’. This existence proof is a major advantage of these meth-
ods. In case a coherent surface exists, one obtains a depth map modulo an arbitrary offset. This is
an important point of departure for various important lines of experimental phenomenological
research.
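The 'existence' test and the integration step can be sketched as follows, under strong simplifying assumptions (a regular grid of noise-free gradient samples; real data call for a least-squares fit). The integrability of the attitude field, measured by the residual curl of the gradient field, plays the role of the coherency measure:

```python
import numpy as np

# Hypothetical attitude samples on a regular grid, expressed as depth
# gradients p = dz/dx and q = dz/dy (slant/tilt settings convert to these).
ny, nx = 8, 8
yy, xx = np.mgrid[0:ny, 0:nx].astype(float)
z_true = np.sin(xx / 3.0) + 0.1 * yy ** 2   # some smooth pictorial relief
p = np.gradient(z_true, axis=1)
q = np.gradient(z_true, axis=0)

# A gradient field admits an integral surface only if its curl vanishes;
# the residual curl is the coherency measure discussed in the text.
curl = np.gradient(q, axis=1) - np.gradient(p, axis=0)
print(bool(np.abs(curl).max() < 1e-8))  # True: a coherent surface 'exists'

# Recover the depth map, modulo an arbitrary offset, by path integration:
z = np.cumsum(p, axis=1) - p[:, :1]           # integrate along x, anchor col 0
z += np.cumsum(q[:, :1], axis=0) - q[:1, :1]  # anchor the rows via q at x = 0
```

With noisy settings the curl does not vanish; the question is then whether the spread in repeated settings accounts for the incoherence, as stated above.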
There are a number of very common misunderstandings that may need special mention. I men-
tion two of these that have a bearing on the ontological status of the measurements.
One widespread misunderstanding is due to an overly cognitive interpretation of these meth-
ods. As I have argued above, the final task of the observer is to judge a coincidence. The gauge
figure should appear as ‘a circle painted upon the surface’ in immediate visual awareness. This is a
primitive awareness; it does not involve any reasoning. At least, that should be the case, or else the
method cannot be considered to be a method of experimental phenomenology. Neither cognition
proper (noticing the coincidence in no way involves recognition of the pictorial object, and so
forth), nor (a fortiori) reflective thought, should be involved. Yet people frequently interpret the
method in the following way. The observer is supposed to:
Conclusion
Experimental psychology is a very broad discipline. It encompasses subfields like dry physiology
(or behaviorism), cognitive science, and experimental phenomenology, which operate on mutu-
ally distinct ontological levels. This is unusual among the sciences. It is not intrinsically problem-
atic, but it starts to generate countless problems when one tries to enforce the same requirements
on ‘objectivity’ throughout. This is simply not possible. Of course, it isn’t even possible in physics,
but few people are ready to acknowledge that. Here I pleaded for the notion of ‘shared subjectivity’
as a pragmatic alternative to the virtual notion of scientific ‘objectivity’. At least it admits of graded
degrees of objectivity, instead of a mere binary objective/subjective distinction.
Once one recognizes the various ontological levels for what they are, it is evident that these
various levels require distinct methods. Dry physiology is perhaps the easiest case, because its
methods are essentially those of physics. The problem here is not so much in the methodol-
ogy as in its conceptual approaches: the physiological data are often interpreted in terms of
mental entities (e.g. visual awareness), which amounts to an unfortunate confusion of levels.
The behaviorists were far more consistent in considering speech as amounting to the movement of air molecules. Cognitive science approaches perception on the functional level, which is fine; it has developed a large toolbox of very useful methods. The problems are again a frequent confusion of levels, in this case in two directions. Functional entities are often interpreted in both neural and mental terms (qualities and meanings), frequently in ways that are rather far-fetched. Finally, experimental phenomenology studies the structure (in terms of qualities and meanings) of perceptual awareness. It has to use its own methodology, in terms of first-person accounts, mainly based on immediate 'perceptual proofs'. This, again, is fine as far as it goes. Problems occur when the conceptual interpretation crosses ontological levels. A historic failure of this kind was the interpretation of Gestalt properties in terms of isomorphic neural activity.
16 A 'palm board' is a planar surface on which one may rest one's palm, and that may be rotated into any desired spatial attitude. The angles parameterizing the attitude are read out, usually in some electronic way. The palm board is useful as an interface device that may be used to indicate the perceived spatial attitude of some object.
Of course, there is no problem with any one person freely moving back and forth between
researches on distinct ontological levels. On the contrary, such frequent excursions are very much
to the benefit of experimental psychology! However, a serious attempt at the recognition of the
ontological chasms is essential. Overstepping the boundaries should require explicit mention of
the psychophysical ‘bridging hypotheses’. Unfortunately, and to its disadvantage, the scientific
community fails to enforce that.
References
Albertazzi, L. (forthcoming). ‘Philosophical Background: Phenomenology’. In The Oxford Handbook of
Perceptual Organization, ed. J. Wagemans (Oxford: Oxford University Press).
Aristotle (ca. 350 BCE). De Anima. Available for download from the Internet Classics Archive, <http://classics.
mit.edu/Aristotle/soul.html>.
Baxandall, Michael (1995). Shadows and Enlightenment (London, New Haven: Yale University Press).
Berlin, B. and P. Kay (1969). Basic Color Terms: Their Universality and Evolution (Berkeley, CA: University
of California Press).
Bouman, M. A. (1952). ‘Mechanisms in Peripheral Dark Adaptation’. JOSA 42: 941–950.
Charpentier, A. (1891). ‘Analyse expérimentale: De quelques élements de la sensation de poids’
[Experimental study of some aspects of weight perception]. Arch Physiol Norm Pathol 3: 122–135.
Eddington, Arthur Stanley (1928). The Nature of the Physical World (New York: Macmillan).
Ehrenstein, W. H. and A. Ehrenstein (1999). ‘Psychophysical Methods.’ In Modern Techniques in
Neuroscience Research, ed. U. Windhorst and H. Johansson, ch. 43 (New York: Springer).
Farell, B. and D. G. Pelli (1999). 'Psychophysical Methods, or How to Measure a Threshold, and Why.' In Vision Research: A Practical Guide to Laboratory Methods, ed. R. H. S. Carpenter and J. G. Robson, pp. 129–136 (New York: Oxford University Press).
Fechner, Gustav Theodor (1860). Elemente der Psychophysik (Leipzig: Breitkopf and Härtel). Available for
download from <http://archive.org/stream/elementederpsych02fech#page/n5/mode/2up>.
Koenderink, J. J. and A. J. van Doorn (1979). ‘Spatiotemporal Contrast Detection Threshold Surface is
Bimodal.’ Optics Letters 4: 32–34.
Koenderink, J. J., A. J. van Doorn, and A. L. M. Kappers (1992). ‘Surface Perception in Pictures.’
Perception & Psychophysics 52: 487–496.
Koenderink, J. J., A. J. van Doorn, and J. Wagemans (2011). ‘Depth.’ i-Perception (special issue on Art &
Perception) 2: 541–564.
Lowell, Percival (1911). Mars and its Canals (New York, London: Macmillan). Available for download on
<http://archive.org/details/marsanditscanals033323mbp>. Last accessed 25 Sept 2013.
Luce, R. D. (1959). ‘On the Possible Psychophysical Laws.’ Psychological Review 66(2): 81–95.
Pelli, D. G. and B. Farell (1995). ‘Psychophysical Methods.’ In Handbook of Optics, vol. I, 2nd edn, ed.
M. Bass, E. W. Van Stryland, D. R. Williams, and W. L. Wolfe, pp. 29.1–29.13 (New York: McGraw-Hill).
Pólya, George (1957). How to Solve It (Garden City, NY: Doubleday).
Poulton, E. C. (1968). ‘The New Psychophysics: Six Models for Magnitude Estimation.’ Psychological
Bulletin 69: 1–19.
Puccetti, Roland (1977). ‘The Great C-Fiber Myth: A Critical Note.’ Philosophy of Science 44(2): 303–305.
Reich, E. S. (2011). ‘Speedy Neutrinos Challenge Physicists.’ Nature News 477 (27 September): 520.
Silberstein, Michael and John McGeever (1999). ‘The Search for Ontological Emergence.’ The Philosophical
Quarterly 49(195): 201–214.
Sivian, L. J. and S. D. White (1933). 'On Minimum Audible Sound Fields.' Journal of the Acoustical Society of America 4: 288.
Stevens, S. S. (1951). Handbook of Experimental Psychology (New York: Wiley).
Stevens, S. S. (1957). ‘On the Psychophysical Law.’ Psychological Review 64(3): 153–181.
Stevin, Simon (1586). De Beghinselen der Weeghconst. Published in one volume with De Weeghdaet, De
Beghinselen des Waterwichts and an Anhang (appendix) (Leiden: Plantijn).
Thurstone, L. L. (1927). ‘A Law of Comparative Judgment.’ Psychological Review 34: 273–286.
Thurstone, L. L. (1929). ‘The Measurement of Psychological Value.’ In Essays in Philosophy by Seventeen
Doctors of Philosophy of the University of Chicago, ed. T. V. Smith and W. K. Wright, pp. 157–174
(Chicago: Open Court).
Todd, J. T. and F. D. Reichel (1989). ‘Ordinal Structure in the Visual Perception and Cognition of Smooth
Surfaces.’ Psychological Review 96: 643–657.
Treutwein, B. (1995). ‘Adaptive Psychophysical Procedures.’ Vision Research 35(17): 2503–2522.
van Doorn, A. J., J. J. Koenderink, and J. Wagemans (2011). ‘Light Fields and Shape from Shading’. Journal
of Vision 11: 1–21.
van Doorn, A. J., J. J. Koenderink, J. T. Todd, and J. Wagemans (2012). 'Awareness of the Light Field: The Case of Deformation.' i-Perception 3(7): 467–480.
Varela, F., H. Maturana, and R. Uribe (1974). ‘Autopoiesis: The Organization of Living Systems, its
Characterization and a Model.’ Biosystems 5: 187–196.
Wagemans, J., A. J. van Doorn, and J. J. Koenderink (2011). ‘The Shading Cue in Context.’ i-Perception 1:
159–178.
Wagemans, J. (forthcoming) ‘Historical and Conceptual Background: Gestalt Theory.’ In The Oxford
Handbook of Perceptual Organization, ed. J. Wagemans (Oxford: Oxford University Press).
Weber, Ernst Heinrich (1905). Tastsinn und Gemeingefühl, ed. Ewald Hering (orig. 1846), Ostwald’s
Klassiker No. 149 (Leipzig: W. Engelmann). Available for download from <http://archive.org/details/
tastsinnundgeme00unkngoog>.
Zadeh, L. A. (1965). ‘Fuzzy Sets.’ Information and Control 8(3): 338–353.
Section 2
Within the wider study of perceptual organization, research on perceptual grouping examines how
our visual system determines what regions of an image belong together as objects (or other useful
perceptual units). This is necessary because many objects in real world scenes do not project to
a continuous region of uniform color, texture, and lightness on the retina. Instead, due to occlu-
sion, variations in lighting conditions and surface features, and other factors, different parts of
a single object often result in a mosaic of non-contiguous regions with varying characteristics
and intervening regions associated with other, overlapping objects. These diverse and disparate
image regions must be united (and segregated from those arising from other objects and surfaces)
to form meaningful objects, which one can recognize and direct actions toward. Also, meaning
may appear not only in the shape of individual objects, but in the spatial and temporal relation-
ships between them. For instance, the arrangement of individual objects may form a higher-order
structure, which carries an important meaning, such as pebbles on a beach arranged to form a
word. Perceptual grouping is one process by which disparate parts of an image can be brought
together into higher-order structures and objects.
1 Although grouping is often described as the unification of independent perceptual elements, it is also possible to see this as the segmentation of a larger perceptual unit (the linear group of eight dots) into four smaller groups. Regardless of whether it is segmentation or unification, the end result is the same.
Fig. 4.1 Examples of some classic Gestalt image-based grouping principles between elements.
(a) Horizontal array of circular elements with no grouping principles forms a simple line. (b) When
the spatial positions of elements are changed, the elements separate into groups on the basis of
proximity. Elements can also be grouped by their similarity in various dimensions such as (c) color,
(d) shape, (e) size, and (f) orientation. (g) Similarity in the direction of motion (as indicated by
the arrow above or below each element) of elements is referred to as common fate and causes
elements with common motion direction to group together. (h) Curvilinear elements can be grouped
by symmetry or (i) parallelism. (j) Good continuation also plays a role in determining what parts of a curve go together to form the larger shape. In this case, the edges group on the basis of their continuous path from upper left to lower right and from lower left to upper right. (k) However, closure can reverse the organization that is suggested by good continuation and cause perception of a bow-tie
shape.
Adapted from Palmer, Stephen E., Vision Science: Photons to Phenomenology, figure 6.1.2, © 1999
Massachusetts Institute of Technology, by permission of The MIT Press.
Traditional and New Principles of Perceptual Grouping 59
Proximity: quantitative accounts
Although Wertheimer convincingly demonstrated a role for proximity in grouping, he did not
provide a quantitative account of its influence. Early work on this issue by Oyama (1961) used
simple, rectangular 4 × 4 dot lattices in which the distance along one dimension was held constant
while the distance along the other dimension varied across trials (Figure 4.2A,B). During a 120-second
observation period, participants continuously reported (by holding down one of two buttons) whether
they saw the lattice as rows or columns at any given time. The results clearly demonstrated that
as the distance in one dimension changed (e.g. horizontal dimension in Figure 4.2A,B) relative to
the other dimension, proximity grouping quickly favored the shortest dimension according to a
power function, a relationship found elsewhere in psychophysics (Luce, 2002; Stevens, 1957) and
other natural laws. Essentially, when inter-dot distances along one dimension are similar to one
another, a small change in inter-dot distance along one dimension can strongly shift perceived
grouping. However, the effect of that same change in inter-dot distance falls off as the initial dif-
ference in inter-dot distance along the two dimensions grows larger.
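Oyama's power-function relation can be sketched in a few lines of code. The exponent used below is an illustrative assumption (Oyama estimated observer-specific values in the vicinity of 2 to 3), not a fitted parameter from the original study.

```python
# Sketch of a power-function account of proximity grouping (after Oyama,
# 1961). The predicted ratio of time spent seeing columns versus rows is a
# power function of the ratio of the two inter-dot distances. The exponent
# is an illustrative assumption, not Oyama's fitted value.

def grouping_time_ratio(d_rows, d_cols, exponent=2.9):
    """Ratio of time seeing columns vs. rows.

    d_rows: inter-dot distance along rows (horizontal spacing)
    d_cols: inter-dot distance along columns (vertical spacing)
    """
    return (d_rows / d_cols) ** exponent

print(grouping_time_ratio(1.0, 1.0))  # 1.0: equal spacing, no preference
print(grouping_time_ratio(1.0, 0.5))  # ≈ 7.5: closer columns dominate
```

The power form mirrors the behavior described above: when the two spacings are similar, a small change in one of them shifts the predicted ratio strongly, whereas the same change matters progressively less once one spacing is already much shorter than the other.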
The above relationship, however, only captures the relative contributions of two (vectors a
and b, Figure 4.2C) of the many possible organizations (e.g., vectors a–d, Figure 4.2C) within the
Fig. 4.2 Dot lattices have been used extensively to study the quantitative laws governing grouping
by proximity. (a) When distances between dots along vectors a and b are the same, participants are
equally likely to see columns and rows. (b) As one distance, b, changes relative to the other, a, the
strength of grouping along the shorter distance is predicted by a negative power function. (c) Dot
lattices have many potential vectors, a–d, along which grouping could be perceived even in a simple
square lattice. (d) Dot lattices can also fall into other classes defined by the relative length of their
two shortest inter-dot distances and the angle between these vectors, γ. In all of these lattices, the
pure distance law determines the strength of grouping.
60 Brooks
lattice. Furthermore, the square and rectangular lattices in Figures 4.2A–D are only a subset of the
space of all possible 2D lattices and the power law relationship may not generalize beyond these
cases. In a set of elegant studies, Kubovy and Wagemans (1995), and Kubovy et al. (1998) first
generated a set of stimuli that spanned a large space of dot lattices by varying two basic features:
(1) The lengths of their shortest inter-dot distances (vectors a and b, Figure 4.2C,D).
(2) The angle between these vectors, γ.
They then briefly presented these stimuli to participants and asked them to choose which of four
orientations matched that of the lattice. They found that, across the entire range of lattices in all
orientations, grouping depended only on the relative distance between dots in the various pos-
sible orientations, a relationship that they called the pure distance law. Although the space of all
lattices could be categorized into six different classes depending on their symmetry properties,
this global configuration aspect did not affect the grouping in these lattices, leaving distance as
the only factor that affects proximity grouping. More recently though, it has been found that other
factors, such as curvilinear structure, can also play a role in grouping by proximity (Strother and
Kubovy, 2006).
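The pure distance law can be given a compact computational form. The exponential attraction function below follows the general shape reported in this line of work; the decay constant s is an assumed value standing in for the observer-specific fits of Kubovy et al. (1998).

```python
import math

# Sketch of the pure distance law: the attraction of each candidate
# organization depends only on its inter-dot distance relative to the
# shortest one, falling off exponentially. The decay constant s is an
# illustrative assumption; Kubovy et al. (1998) fitted it per observer.

def grouping_probabilities(vector_lengths, s=3.0):
    shortest = min(vector_lengths)
    attractions = [math.exp(-s * (v / shortest - 1.0)) for v in vector_lengths]
    total = sum(attractions)
    return [a / total for a in attractions]

# Square lattice: vectors a and b tie; the diagonals (length sqrt(2)) lose.
probs = grouping_probabilities([1.0, 1.0, math.sqrt(2), math.sqrt(2)])
```

Because only relative distance enters the formula, uniformly scaling the whole lattice leaves the predicted probabilities unchanged, which is the signature property of a pure distance account.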
Common fate
Wertheimer appreciated the influence of dynamic properties on grouping when he proposed
the well-known principle of common fate (Figure 4.1G). The common fate principle (which
Wertheimer also called ‘uniform destiny’) is the tendency of items that move together to be
grouped. Common fate is usually described with grouped elements having exactly parallel motion
vectors of equal magnitude as in Figure 4.1G. However, other correlated patterns of motion,
such as dots converging on a common point and co-circular motion can also cause grouping
(Ahlström, 1995; Börjesson and Ahlström, 1993). Some of these alternative versions of common
motion are seen as rigid transformations in three-dimensional (3D) space. Although common
fate grouping is often considered to be very strong, to my knowledge, there are no quantitative
comparisons of its strength with other grouping principles. Recently, it has been proposed that
common fate grouping may be explained mechanistically as attentional selection of a direction of
motion (Levinthal and Franconeri, 2011).
Similarity grouping
When two elements in the visual field share common properties, there is a chance that these two
elements are parts of the same object or otherwise belong together. This notion forms the basis
for the Gestalt grouping principle of similarity. One version of similarity grouping, and the one
that Wertheimer originally described, involves varying the colors of the elements (Figure 4.1C).
Items that have similar colors appear to group together. However, other features can also be varied
such as the shape (Figure 4.1D), size (Figure 4.1E), or orientation (Figure 4.1F) of the elements.
Although these variations on the principle of similarity are sometimes demonstrated separately
from one another (e.g., Palmer, 1999), Wertheimer appeared to favor the notion of a general prin-
ciple of similarity when he described it as ‘the tendency of like parts to band together.’ Thus, the
list of features given above is not meant to be an exhaustive set of features on which similarity
grouping can occur. Instead, there may be as many variations of the similarity principle as there
are features to be varied (e.g., texture, specularity, blur). However, many of these variations of
similarity grouping have not been studied systematically, if at all. Furthermore, the generality of
the similarity principle may also encompass other known principles as variations of similarity. For
instance, the principle of proximity may be thought of as similarity of position and classic com-
mon fate as similarity of the direction of movement. However, despite the ability to unify these
principles logically, the extent to which they share underlying mechanisms is unclear.
Symmetry
The world does not solely comprise dots aligned in rows or columns. Instead, elements take many
forms and can be arranged in patterns with varying forms of regularity. Mirror symmetry is a par-
ticular type of regularity that is present in a pattern when half of the pattern is the mirror image of
the other half. Such symmetrical patterns have been found to be particularly visually salient. For
instance, symmetry has clear effects on detection of patterns in random dot fields, contours, and
other stimuli (e.g., Machilsen et al., 2009; Norcia et al., 2002; Wagemans, 1995). However, when a
symmetrical pattern is tilted relative to the frontal plane, its features in the image projected to the
retinae are no longer symmetrical. Nonetheless, the detection advantage seems to be robust even
in these cases of skewed symmetry although it is clearest if symmetry is present in several axes
(e.g., Wagemans, 1993; Wagemans et al., 1991). However, not all symmetries are equal. A substan-
tial number of studies have found that symmetry along a vertical axis is more advantageous than
symmetry along other axes (e.g., Kahn and Foster, 1986; Palmer and Hemenway, 1978; Royer,
1981). However, symmetry along the horizontal axis has also been found to be stronger than sym-
metry along oblique angles (e.g., Fisher and Bornstein, 1982). Symmetry detection is also robust
to small deviations in the corresponding positions of elements in the two halves of the symmetric
pattern (Barlow and Reeves, 1979). The study of symmetry, its effects on detection, and the factors that modulate it has been extensive, and it is discussed in more detail elsewhere in this volume (van
der Helm, ‘Symmetry Perception’ chapter, this volume). It is important to point out that many
studies of symmetry (including those mentioned above) do not measure perceived grouping
directly, as was often the case for many of the other principles described above. Symmetry grouping has tended to be measured instead by its effect on pattern detection or the ability to find a pattern in noise.
The extent to which performance in these tasks reflects perceived grouping, per se, rather than
other task-related changes due to symmetry is unclear. Nonetheless, phenomenological demon-
strations of symmetry grouping are often presented as evidence of the effect (e.g., Figure 4.1H).
One rationale for symmetry grouping and detection mechanisms is that they are designed to
highlight non-accidental properties that are unlikely to have been caused by chance alignment of
independent elements. Alternatively, symmetry may allow particularly efficient mental or neural
representations of patterns (van der Helm, ‘Simplicity in Perceptual Organization’ chapter, this
volume). Symmetry also appears to be a common feature of the visual environment. Artefacts of
many organisms are often symmetrical (Shubnikov and Koptsik, 1974; Weyl, 1952). However, it is
not clear whether this is a cause of visual sensitivity to symmetry, an effect of it, or whether both
of these are caused by some other adaptive benefit of symmetry.
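The tolerance to positional jitter reported by Barlow and Reeves (1979) can be illustrated with a toy detector. The vertical axis position, the tolerance value, and the scoring rule below are all assumptions made for illustration, not a model drawn from the literature.

```python
# A toy mirror-symmetry detector: a dot pattern is scored by how many dots
# have a partner near their mirror reflection about a vertical axis at x = 0.
# The tolerance parameter is an assumption standing in for the positional
# jitter that symmetry detection tolerates (Barlow and Reeves, 1979).

def symmetry_score(dots, tol=0.1):
    matched = 0
    for (x, y) in dots:
        mx, my = -x, y   # reflection about the vertical axis x = 0
        if any(abs(mx - px) <= tol and abs(my - py) <= tol for (px, py) in dots):
            matched += 1
    return matched / len(dots)

perfect = [(-1.0, 0.0), (1.0, 0.0), (-0.5, 0.7), (0.5, 0.7)]
jittered = [(-1.0, 0.0), (1.05, 0.02), (-0.5, 0.7), (0.5, 0.68)]
print(symmetry_score(perfect))   # 1.0
print(symmetry_score(jittered))  # 1.0: small deviations fall within tolerance
```

A pattern with no mirror pairings scores near zero, so the same measure separates symmetric from asymmetric arrangements while remaining robust to small positional perturbations.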
Good continuation
Wertheimer also proposed that elements tend to group when they continue in the way ‘logically demanded’ by the original element, i.e. a ‘factor of direction’,2 as he actually called it.
In Figure 4.1J this seems to correspond roughly to collinearity, or minimal change in direction,
because at their junction ac and bd are more collinear than the alternative arrangements. However,
other examples that he used (Figure 4.3B) suggest that this may not be exactly what he meant.
Wertheimer’s definition was not specific, and largely based on intuition and a few demonstrations.
In modern work, good continuation has been largely linked with work on contour integration
and visual interpolation. Contour integration studies largely examine what factors promote group-
ing of separate (not connected) oriented elements (Figure 4.3C) into contours, which are detectable
in a field of otherwise randomly orientated elements. Collinearity, co-circularity, smoothness, and
a few other features play prominent roles in models of good continuation effects on contour integration (e.g., Fantoni and Gerbino, 2003; Field et al., 1993; Geisler et al., 2001; Hess, May, and Dumoulin, this volume; Pizlo et al., 1997; Yen and Finkel, 1998). Although these definitions of good continuation are clearly specified, the stimuli
and tasks used are very different from those of Wertheimer and may have different mechanisms.
Good continuation is also often invoked in models of interpolation that determine the likelihood
of filling in a contour between two segments on either side of an occluder (e.g., Wouterlood and
Boselie, 1992). One criterion for interpolation is whether two contours are relatable (Kellman and
Shipley, 1991), i.e. whether a smooth monotonic curve could connect them (roughly speaking).
Relatability is another possible formal definition of good continuation, although the two may be
related but distinct concepts (Kellman et al., 2010). This is an issue that needs further study. Completion
and its mechanisms are discussed at length elsewhere in this volume (Singh; van Lier & Gerbino).
Wertheimer also recognized the role for closure in grouping of contours. This is demonstrated
in the bow-tie shape in Figure 4.1K, which overcomes the grouping by good continuation that was
stronger in Figure 4.1J. Several contour integration studies have also examined the role of closure
in perceptual grouping of contour elements. Many find effects of closure on grouping and contour
detection (e.g., Mathes and Fahle, 2007), although these may be explainable by other mechanisms
(Tversky et al., 2004). Contours can also be grouped by parallelism (Figure 4.1I). However, this
effect does not appear to be particularly strong and contour symmetry seems to be better detected
(e.g., Baylis and Driver, 1994; Corballis and Roldan, 1974).
2 Wertheimer also used the term ‘factor of good curve’ in this section of his manuscript to describe an effect
that seems to be similar to his use of ‘factor of direction’ and the modern use of good continuation. However,
Wertheimer did not explicitly describe any differences between the nature of these two factors.
Fig. 4.3 (a) Good continuation favors a grouping of ac with b as an appendage. This may be due
to segment c being collinear or continuing the same direction as a. (b) Good continuation may
not always favor the smallest change in direction. Segment c seems to be a better completion of a
than b despite b being tangent to the curve (and thus having minimum difference in direction) at
their point of intersection. (c) A stimulus commonly used in contour integration experiments with a
circular target contour created by good continuation and closure in the alignment of the elements.
Fig. 4.4 When multiple grouping principles are present in the same display, they may reinforce one
another or compete against one another. (a) When both proximity and color similarity (indicated by
filled versus unfilled dots here) favor organization into rows, they reinforce each other and result in a
clear perception of rows. (b) When proximity grouping favors a rows organization and color similarity
favors columns, the factors compete against one another and this can result in perceptual ambiguity.
(c) With near maximal proximity of elements favoring rows, this factor can overcome the competition
with color similarity and result in a perception of rows.
Color similarity and proximity may also work in opposition to one another (Figure 4.4B). In this case, the grouping becomes somewhat
ambiguous. Ultimately, the resulting organization depends on the relative strengths of the two
grouping factors. With proximity at nearly maximum, it gains the upper hand and can overcome
the competing influence of color similarity (Figure 4.4C). Pitting grouping principles against
one another has served as one way to measure the relative strength of grouping principles (e.g.,
Hochberg and Silverstein, 1956; Oyama et al., 1999; Quinlan and Wilton, 1998). However, some
grouping principles may operate faster than others and this may affect their relative effectiveness
against one another in addition to the relative degree to which each principle is present in the
display (Ben-Av and Sagi, 1995).
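One simple way to formalize such competition is to let each cue contribute to the log-odds of one organization over the other, a common move in quantitative grouping models. The additive combination and the numeric cue strengths below are modelling assumptions, not measured values.

```python
import math

# A toy model of competing grouping principles: each cue contributes
# additively to the log-odds of reporting rows rather than columns.
# Positive values push toward rows, negative toward columns. The
# strengths are illustrative assumptions.

def p_rows(proximity_for_rows, similarity_for_rows):
    """Probability of a 'rows' report given signed cue strengths."""
    log_odds = proximity_for_rows + similarity_for_rows
    return 1.0 / (1.0 + math.exp(-log_odds))

print(p_rows(2.0, 1.5))    # cues agree: rows clearly dominate
print(p_rows(2.0, -2.0))   # cues conflict equally: ambiguous, p = 0.5
print(p_rows(4.0, -2.0))   # strong proximity overcomes similarity
```

When the cues reinforce each other the percept is nearly deterministic; when they balance exactly the display is maximally ambiguous; and a sufficiently strong proximity cue overcomes a conflicting similarity cue, paralleling the three cases in Figure 4.4.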
Common region
The principle of common region (Figure 4.5B) recognizes the tendency for elements that lie within
the same bounded region to be grouped together (Palmer, 1992). Elements grouped by common
region lie within a single, continuous, and homogeneously colored or textured region of space or
within the confines of a bounding contour. The ecological rationale for this grouping principle
Fig. 4.5 Grouping by common region. (a) A set of ungrouped dots. (b) Dots grouped by common
region as indicated by an outline contour. Common region can also be indicated by regions of
common color, texture or other properties. (c) Common region can compete effectively against
grouping by color similarity, as well as against (d) grouping by proximity. (e) In the repetition
discrimination task, the repetition of two shapes in the element array—two circles here—can occur
within the same object or (f) between two different objects (repeated squares in this case).
is clear. If two elements, eyes for instance, are contained within a single image region, such as a head, then
they are likely to belong together as part of that object, rather than accidentally appearing together
within the same region of space. The effects of common region can compete effectively against
other grouping principles such as color similarity (Figure 4.5C) and proximity (Figure 4.5D).
Palmer (1992) also found evidence that the common region principle operates on a 3D represen-
tation of the world. When he placed elements within overlapping regions, there was no basis for
grouping to go one way or the other. However, if the dot elements were placed in the same depth
plane as some of the oval regions (using stereoscopic displays), then the dots tended to be grouped
according to the regions within their same depth plane. These results suggest that grouping by
common region can operate on information that results from computations of depth in images
and thus may not be simply an early, low-level visual process. It is also worth noting that unlike
all of the classic Gestalt principles that are defined around the relative properties of the elements
themselves, grouping by common region depends on a feature of another element (i.e. the bound-
ing edge or enclosing region) separate from the grouped elements themselves. Although common
region can be appreciated through demonstrations like those in Figure 4.5, indirect methods have
provided corroborative evidence for this grouping factor and others. For instance, in the Repetition
Fig. 4.6 Generalized common fate was demonstrated using displays comprising (a) square elements,
each of which was initially assigned a random luminance that oscillated over time. (b) For
a subset of these elements, the target (outlined in black here), their luminances oscillated out of
phase with the rest of the elements. This means that, although the elements within the target had
varying luminances (and similar to non-target luminances) they were distinguished by their common
direction of change.
Discrimination Task (RDT; Palmer and Beck, 2007), participants see a row of elements
that alternates between circles and squares. One of the elements, either a circle or a square,
repeats at one point, and the participant’s task is to report which shape it is. Participants are faster
at this when the repeat occurs within the same group (Figure 4.5E) than when it appears between
two different groups (Figure 4.5F). Because performance on this task is modulated by grouping,
it can be used to quantify grouping effects indirectly and corroborate findings in direct subjective
report tasks. Although such indirect measures may be less susceptible to demand characteristics,
it is important to point out that there is no guarantee that they reflect purely what people actually
see. Indirect measures may also reflect a history of the processing through which a stimulus has
gone even if that history is not reflected in the final percept. Such effects have been demonstrated in
experiments on figure-ground organization in which two cues are competing against one another
to determine which side of an edge is figural. Even though one particular cue always wins the
competition and causes figure to be assigned to its side, the presence of a competing cue suggesting figural assignment to the other side affects response time both in direct report and in other tasks
such as same-different matching (e.g., Brooks and Palmer, 2010; Peterson and Enns, 2005). Even
clearer cases of the dissociation between implicit measures and conscious perception have been
seen in neurological patients. For instance, patients with blindsight can act toward an object even
though they cannot consciously see it (e.g., Goodale et al., 1991).
Generalized common fate
Grouping by generalized common fate has been demonstrated with displays in which every element was assigned a random luminance that oscillated over time; a subset of the elements (outlined in black in Figure 4.6B) was designated as the target and modulated out of phase with the rest of the elements.
Participants had to determine the orientation (horizontal or vertical) of this target. To the extent
that elements within the target group together (and segment from the other elements) based on
their common luminance changes, discrimination of the target orientation should be easier. The
results demonstrated a strong effect of generalized common fate by common luminance changes.
Importantly, the authors made significant efforts to control for the effects of static luminance
cue differences between the target and non-target areas of the image to ensure that this is a truly
dynamic cue to grouping. Although this grouping cue has been linked with classic common fate
by name, it is not clear whether it is mediated by related mechanisms.
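The logic of these generalized common fate displays can be sketched as follows. The luminance values, oscillation amplitude, and phase offsets are illustrative assumptions; the point is that static luminances overlap between target and background, and only the direction of change separates them.

```python
import math

# Sketch of a generalized-common-fate display: every element's luminance
# oscillates, and target elements oscillate out of phase with the rest.
# Base luminances overlap across the two sets, so only the common
# direction of change distinguishes the target. Values are illustrative.

def luminance(base, t, phase):
    return base + 0.2 * math.sin(t + phase)

def change_direction(base, t, phase, dt=0.01):
    return 1 if luminance(base, t + dt, phase) > luminance(base, t, phase) else -1

# Background elements share phase 0; target elements are pi out of phase.
background = [(0.4, 0.0), (0.6, 0.0), (0.5, 0.0)]
target = [(0.45, math.pi), (0.55, math.pi)]

t = 0.3
bg_dirs = [change_direction(b, t, p) for (b, p) in background]
tg_dirs = [change_direction(b, t, p) for (b, p) in target]
# All background elements brighten together while all targets dim (or vice
# versa), so grouping by common direction of change recovers the target.
```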
Synchrony
The common fate principles discussed above capture how commonalities in the direction of
motion or luminance can cause grouping. However, elements which have unrelated directions
of change can group on the basis of their temporal simultaneity alone (Alais et al., 1998; Lee and
Blake, 1999). For instance, consider a matrix of small dots that change color stochastically over
time. If a subset of the elements change in synchrony with one another, regardless of their different
changes of direction, these elements group together to form a detectable shape within the matrix.
Lee and Blake (1999) claimed that in their displays, synchrony grouping cannot be computed
on the basis of static information in each frame of the dynamic sequence. This is because, for
instance, in the color change example described above, the element colors in each frame are identically and randomly distributed within both the grouped region and the background. It is only the
temporal synchrony of the changes that distinguishes the grouped elements from the background.
This is in contrast to previous evidence of synchrony grouping which could be computed on the
basis of static image differences at any single moment in time (e.g., Leonards et al., 1996; Usher
and Donnelly, 1998). Lee and Blake argued that purely temporal synchrony requires computing
high order statistics of images across time and is a new form of grouping that cannot be explained
by known visual mechanisms. However, this claim has proved controversial (Farid, 2002; Farid
and Adelson, 2001) and some have argued that temporal structure plays a more important role
than temporal synchrony (Guttman et al., 2007). The rationale for the existence of grouping by
pure synchrony is also controversial. Although it seems reasonable that synchronous changes in
elements of the same object are common in the visual world, it seems unlikely that these are com-
pletely uncorrelated with other aspects of the change (as is required for pure synchrony grouping),
although this appears not to have been formally tested.
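A pure-synchrony display of the kind Lee and Blake (1999) describe can be sketched like this. The element counts, change probability, and two-color palette are illustrative assumptions; the essential property is that targets share change times while the color changes themselves remain independent.

```python
import random

# Sketch of a pure-synchrony display: every element changes color at random
# moments, but a target subset shares its change times (while the new colors
# stay independent and random). Frame and element counts are illustrative.

random.seed(1)
N_FRAMES, N_BG, N_TARGET = 50, 20, 5
COLORS = ["red", "green"]

def random_change_times():
    return {t for t in range(1, N_FRAMES) if random.random() < 0.2}

shared_times = random_change_times()          # one schedule for all targets
schedules = [random_change_times() for _ in range(N_BG)] + \
            [shared_times] * N_TARGET

def element_sequence(times):
    color, seq = random.choice(COLORS), []
    for t in range(N_FRAMES):
        if t in times:
            color = random.choice(COLORS)     # direction of change is random
        seq.append(color)
    return seq

sequences = [element_sequence(times) for times in schedules]
# Any single frame is an uninformative mix of the two colors; only the
# correlation of change times picks out the last N_TARGET elements.
```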
Element connectedness
Distinct elements that are connected by a third element (Figure 4.7B) tend to be seen as part of
the same group (Palmer and Rock, 1994). This effect can compete effectively against some of the
classic grouping principles of proximity and similarity (Figure 4.7C,D), and it does not require the connecting element to have the same properties as the elements themselves or to form a continuous unbroken region of homogeneous color or texture (Figure 4.7E). The ecological rationale
for element connectedness is simple. Many real-world objects comprise several parts that have
their own color, texture, and other properties. Nonetheless, the elements of these objects are often
directly connected to one another. The phenomenological demonstration of grouping by element
connectedness has also been corroborated by evidence from the RDT (Palmer and Beck, 2007)
that was used to provide indirect evidence for the common region principle. The powerful effects
of this grouping principle are also evident by how it affects perception of objects by neurological
patients. Patients with Balint’s syndrome suffer from the symptom of simultanagnosia, i.e. they are
unable to perceive more than one object at a time (see Gillebert & Humphreys, this volume). For
instance, when presented with two circles on a computer screen, they are likely to report seeing
only one circle. However, when these two circles are connected by another element to form a bar-
bell shape, the patient can suddenly perceive both of the objects (Humphreys and Riddoch, 1993).
Similar effects of element connectedness have been shown to modulate hemi-spatial neglect
(Tipper and Behrmann, 1996).
Fig. 4.8 (a) A dot-sampled structured grid with two competing patterns of curvilinear structure.
(b) Curvilinear structure along this direction in panel (a) has less curvature and is, therefore, less
likely to be perceived in comparison to structure along the direction shown in (c), which has a
stronger curve and is more likely to be perceived as the direction of curvilinear grouping.
In these studies, the more strongly curved of two competing organizations was more likely to be perceived. For instance, the dot stimulus in Figure 4.8A could be organized along the more shallow curve represented by Figure 4.8B
or along the stronger curve represented by Figure 4.8C. Greater curvature caused grouping even
if the distances between dots along the two curves were equal, ruling out an explanation in terms
of proximity. Parallel curvature is one example of non-accidentalness that could be quantified
and then systematically varied on the basis of previous work (Feldman, 2001). Other types of
feature arrangements can also have this property, but a challenge is to quantify and systematically
vary non-accidentalness more generally. One possible example of this principle is the tendency
to perceive grouping along regular variations in lightness (van den Berg et al., 2011). However, it
remains unclear whether these two aspects of grouping are mediated by similar mechanisms or
fundamentally different ones.
Edge-region grouping
Grouping has traditionally involved elements such as dots or lines grouping with other elements
of same kind. However, Palmer and Brooks (2008) have proposed that regions of space and their
edges can serve as substrates for grouping processes as well, and that this can be a powerful deter-
minant of figure-ground organization. For example, common fate edge-region grouping can be
demonstrated in a simple bipartite figure (Figure 4.9A). This stimulus has two sparsely textured
(i.e. dotted) regions of different colors that share a contrast boundary between them. If, for
instance, the edge moves in one direction in common fate with the texture of one of the regions
but not in common with the other region (Figure 4.9B; animation in Supplemental Figure 4.S1),
then participants will tend to see the region that is in common fate with the edge as figural. It is
not necessary for the edge and grouped region to be moving. In fact, if one of the textured regions
is moving, whereas the edge and the second region are both static, the edge will group with the
static region and become figural (Figure 4.9C; Figure 4.S2). Palmer and Brooks demonstrated that
proximity, orientation similarity, blur similarity (Figure 4.9D,E), synchrony, and color similarity
can all give rise to edge-region grouping, albeit with a range of strengths. Importantly, they also
showed that the strength of the induced figure-ground effect correlated strongly with the strength
of grouping (between the edge and the region) reported by the participants in a separate group-
ing task. This suggests a tight coupling between grouping processes and figure-ground processes.
However, it is not clear that the grouping mechanisms that mediate edge-region grouping are the
same as those that mediate other types of grouping. Nonetheless, edge-region grouping challenges
the claim that grouping can only occur after figure-ground organization (Palmer and Rock, 1994).
Fig. 4.9 Edge-region grouping occurs between edges and regions. (a) A bipartite display commonly
used in figure-ground paradigms contains two adjacent regions of different color (black and white
here) with a contrast edge between them. The regions here are textured with sparse dots. This can
be seen as either a black object with an edge of sharp spikes in front of a white object or as a white
object with soft, rounded bumps in front of a black object. (b) If the texture dots within one region
(right region here) move in common fate with the edge (edge motion indicated by arrow below
the central vertical edge) then that region will tend to group with the edge and be seen as figural.
The non-grouped region (left here) will be seen as background. (c) A region does not need to be
moving in order to be grouped. It (right region here; lack of movement indicated by ‘X’) can be in
static common fate with an edge if its texture and the edge are both static while the other region
(left region here) is in motion. The region which shares its motion properties with the edge (right
here) becomes figural. (d) Edge-region grouping based on blur similarity between the blurry edge
and a blurry textured region can cause figural assignment to the left in this case. (e) When the blur
of the edge is reduced to match the blur level of the texture elements in the right region then the
edge-region grouping causes assignment to the right.
Induced grouping
The elements in Figure 4.10A have no basis for grouping amongst themselves. However, when these
elements are placed near to other elements which have their own grouping relationships by prox-
imity (Figure 4.10B), color similarity (Figure 4.10C), or element connectedness (Figure 4.10D),
these other groups can cause induced grouping in the otherwise ungrouped elements (Vickery,
2008). For instance, element connectedness in the lower row of Figure 4.10D seems to group
the elements of the upper row into pairs. This impression can be seen phenomenologically, but
it is difficult to determine whether it occurs automatically or because the observer is intention-
ally looking for it (and thus induced by attention). To solve this problem, Vickery (2008) used
the RDT (see Common Region section above) to indirectly measure the effects of grouping and
avoid demand characteristics. The results demonstrated clearly that grouping can be induced by
similarity, proximity, and common fate. Based on demonstrations, other grouping principles also
seem to effectively induce grouping in surrounding elements as well. Induced grouping depends
critically on the relationship between the inducing elements (lower rows in Figures 4.10B–D) and
the elements in which grouping is being induced (top rows in Figures 4.10B–D). For instance, it
can be disrupted by using common region to put the inducing set into a separate region of space
(Figure 4.10E).
Fig. 4.10 Examples of induced grouping. (a) A set of elements with no adjacent elements to induce
grouping. (b) Placing elements grouped by proximity below ungrouped elements can induce
grouping within the otherwise ungrouped upper row. (c) Induced grouping by color similarity.
(d) Induced grouping by element connectedness. (e) Induced grouping can be disrupted by
segmenting the inducers into a separate group as done here by common region grouping.
Uniform connectedness
Grouping principles operate on elements such as lines, dots, regions, and edges. How do these
elements come about in the first place? One hypothesis has been that these elements are gener-
ated by another, early grouping process, which partitions an image to form the substrates for
the further grouping processes that have been described above (Koffka, 1935; Palmer and Rock,
1994). The principle of uniform connectedness (UC) has been proposed to fulfill this role. UC
decomposes an image into continuous regions of uniform image properties, e.g., texture, color,
motion, and depth (e.g., Figure 4.11A–F). This process is very similar to some computer vision
algorithms that have been developed to segment images based on uniform regions of texture
and other properties (e.g., Malik and Perona, 1990; Shi and Malik, 2000). The elements created
by uniform connectedness were proposed to be entry-level units because they were thought of as
the starting point for all subsequent grouping and parsing processes. However, this proposal has
been controversial. Peterson (1994) has argued that the serial ordering of perceptual organiza-
tion suggested by uniform connectedness is not consistent with modern evidence for how these
processes operate. Others have found evidence that other principles such as collinearity and
closure are as important as uniform connectedness for the initial stages of perceptual organiza-
tion (Kimchi, 2000) and that, under some conditions, proximity may operate faster than uniform
connectedness (Han et al., 1999; Han and Humphreys, 2003). Although its place in the hierarchy
of grouping principles is debated, the basic effect of uniform connectedness as a grouping prin-
ciple seems to be clear.
Fig. 4.11 Examples of uniform connectedness. (a) Each black circle defines its own unique uniformly
connected (UC) region and the grey background forms another UC region based on color.
(b) Regions of uniform texture also form UC regions. (c) When two circles are joined by a bar of the
same color or (d) texture, the two circles join together with the connecting bar to form a single
UC region. (e) A bar of different color or (f) texture from the circles leads to the circles remaining
separate UC regions and the bar yet another UC region.
Adapted from Palmer, Stephen E., Vision Science: Photons to Phenomenology, figures 6.2.1, © 1999
Massachusetts Institute of Technology, by permission of The MIT Press.
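The partitioning that UC performs can be illustrated with a standard connected-component pass over a discrete image. This is a sketch under the simplifying assumption that "uniform" means identical pixel values; the segmentation algorithms cited above (e.g., Malik and Perona, 1990; Shi and Malik, 2000) handle graded texture and color statistics far more subtly:

```python
from collections import deque

def uc_regions(image):
    """Label connected regions of uniform value in a 2D grid.

    A toy stand-in for uniform connectedness: two pixels belong to the
    same UC region if they are 4-connected and share the same value.
    """
    rows, cols = len(image), len(image[0])
    labels = [[None] * cols for _ in range(rows)]
    n_regions = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] is not None:
                continue
            # Breadth-first flood fill from this seed pixel.
            labels[r][c] = n_regions
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and labels[ny][nx] is None
                            and image[ny][nx] == image[y][x]):
                        labels[ny][nx] = n_regions
                        queue.append((ny, nx))
            n_regions += 1
    return labels, n_regions

# Two 'circles' on a background (cf. Figure 4.11a): three UC regions.
dots = [list(row) for row in ["BBBBB", "BWBWB", "BBBBB"]]
print(uc_regions(dots)[1])  # 3

# Joining them with a same-color bar (cf. Figure 4.11c): two UC regions.
barred = [list(row) for row in ["BBBBB", "BWWWB", "BBBBB"]]
print(uc_regions(barred)[1])  # 2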
because it takes into consideration the relative distance between elements within each single
frame. If the distance b is large (relative to the motion grouping directions) then spatial grouping
by proximity (along the dashed line in Figure 4.12E) is weak and motion grouping can dominate
and cause motion along either direction m1 or m2. However, when b is relatively small, then spatial
grouping by proximity is strong in each frame and it can affect perception of motion. Specifically,
it can cause motion along a direction orthogonal to the grouped line of dots (i.e. orthogonal to
the dashed line, Figure 4.12E), a totally different direction than either m1 or m2. By manipulating
both spatial and motion/temporal grouping parametrically within these displays, Gepshtein and
Kubovy (2000) found clear evidence that these two factors interact rather than operating sepa-
rately and in sequence as had been previously suggested.
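The quantitative treatment of proximity in these lattice studies can be sketched with the pure distance law of Kubovy, Holcombe, and Wagemans (1998), under which the odds of grouping along one lattice orientation rather than another decay exponentially with the relative inter-dot distance. The decay constant k below is an arbitrary placeholder, not a fitted observer-specific value:

```python
import math

def p_group_along_a(d_a, d_b, k=4.0):
    """Probability of grouping along lattice orientation a rather than b.

    Pure distance law: the odds p(a)/p(b) decay exponentially with the
    relative inter-dot distance along a.  The constant k is fitted per
    observer in the actual studies; 4.0 here is an invented placeholder.
    """
    odds = math.exp(-k * (d_a / d_b - 1.0))
    return odds / (1.0 + odds)

# Equal spacing: the two organizations are equally likely.
print(p_group_along_a(1.0, 1.0))  # 0.5
# Dots 20% closer along a: grouping along a dominates.
print(p_group_along_a(0.8, 1.0) > 0.5)  # True
```

In Gepshtein and Kubovy's (2000) displays, a motion-grouping term would enter the same competition, which is how spatial and temporal factors can interact rather than operate in sequence.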
The nature of the interaction between spatial and temporal factors in apparent motion has
been controversial, with some results supporting the notion of space-time coupling, whereas others
support space-time trade-off. Coupling is present if, in order to maintain the same perception of
apparent motion (i.e. perceptual equilibrium), increases in the time difference between two ele-
ments must be accompanied by a corresponding increase in the distance between them. In con-
trast, space-time trade-off occurs when increases in distance between elements (from one frame to
the next) must be countered with a decrease in the time between frames in order to maintain the
same perception of apparent motion. Although these two types of behavior seem incompatible,
Fig. 4.12 Apparent motion can occur when elements change position from one point in time (a) to
the next (b). If more than one element is present this can lead to ambiguous motion direction. For
instance, the change from pattern (a) to pattern (b) can occur either because of (c) horizontal motion
of the elements or because of (d) vertical motion of the elements. (e) Two frames of a motion lattice
are shown. Lattice(t=1) is shown in black and Lattice(t=2) is shown in gray. Spatial grouping along the
dashed line (not present in displays) is modulated by the distance b. Temporal grouping is modulated
by the ratio of the distances m1 and m2 from an element in Lattice(t=1) to its nearest neighbors in Lattice(t=2).
they have recently been unified under a single function: coupling occurs at slow motion speeds
and trade-off occurs at fast motion speeds (Gepshtein and Kubovy, 2007). This
unification provides a coherent account of the spatiotemporal factors that affect grouping (and
apparent motion) in discrete dynamic patterns.
Top-down/non-image factors
Probability
In the RDT paradigm, participants are faster at detecting a target pair sharing a repeated
property (e.g., color) within an alternating-color array when the targets appear within the
same group than when they appear between two groups as indicated by a grouping principle
such as common region (Palmer and Beck, 2007). In the typical version of this task, targets are
equally likely to appear within groups and between groups across all of the trials of the experi-
ment. In this case, using grouping by proximity, common region, or another factor is equally
likely to help or hinder finding the target. However, in a situation in which targets are between
groups on 75% of trials, the perceptual organization provided by grouping would actively hin-
der performance in the task. In an experiment that varied the probability of the target appearing
within the same group (25%, 50%, or 75%), participants were sensitive to this manipulation and
could even completely eliminate the disadvantage of between-group targets with the knowledge
of what type of target was more likely (Beck and Palmer, 2002). A key question about this effect
is what mechanism mediates it. One interpretation is that participants can use probability
as a grouping principle, which can itself compete against other grouping principles and
result in a different perceived grouping in the display. Alternatively, it could be that
participants intentionally change their response strategy or allocate attention differently according to
the probability knowledge. In this case, there may be no actual change in perceived grouping,
but the effects of perceived grouping may be overcome by a compensating strategy. This is a
Traditional and New Principles of Perceptual Grouping 75
difficult question to answer. However, it is clear that, at the very least, probability
manipulations can override the effects of grouping on performance. It is
also unclear to what extent participants need to be aware of the probability manipulation
in order for it to be effective.
Fig. 4.13 Example stimuli from Vickery and Jiang (2009). Participants saw shapes of alternating
colors in a row and had to determine the color of the target pair, i.e., a pair of adjacent shapes
of the same color (the RDT paradigm). Black is the target color in this example. (a) During the
training phase participants saw the shapes grouped into pairs by common region using outline
contours. In some cases the target appeared within the common region group. (b) In other cases,
the target appeared between two common region groups. (c) After training participants saw the
same stimuli paired as they were during training but without the region outlines. The target could
appear within the previously-learned group or (d) between learned groupings.
Reproduced from Attention, Perception, & Psychophysics, 71 (4), pp. 896–909, Associative grouping: Perceptual
grouping of shapes by association, Timothy J. Vickery and Yuhong V. Jiang, DOI: 10.3758/APP.71.4.896 © 2009,
Springer-Verlag. With kind permission from Springer Science and Business Media.
pair when it appeared within one of the previously seen groups (Figure 4.13C) than when the
pair was between two previously learned groups (Figure 4.13D). This suggests that association
between shapes, based on their previously observed likelihood of appearing together, can cause
grouping of those shapes in later encounters. Importantly, the task at hand was not dependent
on the shapes and only required participants to attend to the colors of the shapes. The authors
termed this effect associative grouping. In another study, they found that associative grouping
also caused shapes to appear closer together than shapes that had no association history, an effect
that mimics previously-observed spatial distortions induced by grouping (Coren and Girgus,
1980). Other results have also suggested that previous experience, both short-term and lifelong,
can have effects on the outcome of perceptual grouping processes (Kimchi and Hadad, 2002;
Zemel et al., 2002).
Some effects of previous experience on grouping are much more short-lived and may derive
from the immediately preceding stimuli. Hysteresis and adaptation are well-known carryover
effects on visual perception. Hysteresis is the tendency for a given percept to persist even in con-
tradiction to sensory evidence moving in the opposite direction, i.e., it maintains the status quo.
Adaptation, on the other hand, reduces sensitivity to the stimulus features at hand and thus reduces
their influence on subsequent perceptual decisions. Gepshtein and Kubovy (2005) demonstrated
that both of these processes have effects on perceptual grouping and, moreover, the two influ-
ences operate independently of one another. They showed participants dot lattices (Kubovy and
Wagemans, 1995) with two competing organizations, e.g., along directions a or b (Figure 4.2C).
As with previous work, they varied the proximity along these two dimensions and found the
expected effects of proximity on grouping. In a further analysis, they then split the data into trials
on which the participant perceived grouping along a, for instance, and determined the likelihood
that the participant would group along a in the next stimulus. Participants were significantly more
likely than chance to group along the same direction as the preceding stimulus. This demonstrates
an effect of hysteresis on perceptual grouping. They also found that the probability of perceiving
grouping along one dimension, say a, in a stimulus decreased with stronger perceptual evidence
for it in the preceding stimulus (i.e. greater proximity along a in the previous stimulus). This was
true regardless of whether the participant perceived grouping along a or b in the preceding stimulus. The authors
interpreted this as evidence for adaptation. Essentially, when an observer sees strong evidence for
grouping along one dimension in a stimulus, the visual system adapts to this evidence, making
the system less sensitive to that same evidence for grouping when it appears in the next stimulus.
Although the recent data described above have clarified the nature of these carryover effects,
hysteresis, for instance, was not unknown to Wertheimer, who described it as the factor of
objective set (1923).
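The two independent carryover influences can be caricatured in a toy logistic model. This is an illustration of the independence claim only, not Gepshtein and Kubovy's (2005) actual analysis, and the weights are invented:

```python
import math

def p_perceive_a(evidence_a, prev_percept_a, prev_evidence_a,
                 w_hyst=0.5, w_adapt=0.8):
    """Toy model of carryover effects on grouping (illustrative only).

    evidence_a      : current stimulus evidence for organization a
                      (positive favors a, negative favors b)
    prev_percept_a  : +1 if a was perceived on the previous trial, -1 if b
    prev_evidence_a : evidence for a in the previous stimulus
    Hysteresis pulls toward the previous percept; adaptation pushes away
    from previously strong evidence.  Both weights are invented values.
    """
    drive = (evidence_a
             + w_hyst * prev_percept_a      # hysteresis: keep the status quo
             - w_adapt * prev_evidence_a)   # adaptation: discount old evidence
    return 1.0 / (1.0 + math.exp(-drive))

# Ambiguous stimulus after perceiving a with weak prior evidence:
# hysteresis dominates and a remains more likely than chance.
print(p_perceive_a(0.0, +1, 0.2) > 0.5)  # True
# Ambiguous stimulus after perceiving a with strong prior evidence:
# adaptation dominates and a becomes less likely than chance.
print(p_perceive_a(0.0, +1, 2.0) < 0.5)  # True
```

Because the two terms enter the drive additively, each influence can be measured while the other is held constant, which is the sense in which the two processes operate independently.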
theoretical issues that place grouping in context and try to reveal the mechanisms that generate
their phenomenal consequences and effects on task performance. Below are three examples of
these theoretical issues.
whether grouping occurred before or after a particular reference point in visual processing, i.e.,
the construction of a 3D scene representation. To do this, they constructed a 2D array of
luminous beads (Figure 4.14A). In one condition, they presented this array to participants in a dark
room perpendicular to the line of sight (Figure 4.14B). Based on proximity, this array tends to
be perceived as columns. However, in another condition, the array of beads was tilted in depth
(Figure 4.14C). The tilt caused foreshortening, so in 2D image coordinates the elements
became closer together in the horizontal dimension, which should make grouping by proximity
more ambiguous. Of course, in 3D scene coordinates, the beads remained closer together
vertically. If grouping is based on a 3D representation, then the participants should see columns
based on the shorter 3D vertical distances between elements. Alternatively, if grouping is based
on the 2D representation, then they may be more likely to see rows. When viewing the arrays
with both eyes open (and thus full 3D vision), participants grouped according to the 3D
structure of the displays. However, when participants closed one eye and saw only the 2D image
information, they were more likely to group the display into rows based on the 2D proximity
of elements caused by foreshortening. Similar effects have been shown for similarity grouping,
suggesting that grouping by lightness (Rock et al., 1992) occurs on a post-constancy repre-
sentation of visual information. Other work has shown that grouping can also be affected by
the outcome of interpolation processes, such as modal (Palmer and Nelson, 2000) and amodal
completion (Palmer, Neff, and Beck, 1996). All of these results suggest that grouping occurs on
a representation beyond simple image features. Furthermore, grouping also seems to be able
to affect the results of figure-ground processing (Brooks and Driver, 2010; Palmer and Brooks,
2008), contradicting previous proposals that grouping can only occur after figure-ground
organization (Palmer and Rock, 1994). Although much of the evidence above suggests that
grouping occurs later in visual processing than previously thought, it does not always do so.
Grouping by color similarity is based on a post-constancy representation when displays are
presented for long durations, but displays presented very briefly are grouped by pre-constancy
features (Schulz and Sanocki, 2003).
Another approach to this question has been to assess whether perceptual grouping occurs
pre-attentively or only within the spotlight of attention. An early study on this issue used an
inattention paradigm (Mack et al., 1992). As with many other studies of grouping, arrays of
shapes that could be seen as arranged either in rows or columns (e.g., see Figure 4.4) were
presented to participants. However, in this case, a large cross was overlaid between the cen-
tral rows and columns, and participants were instructed to focus their attention on it and
judge whether the horizontal or the vertical part of the cross was longer. Despite the array
of elements being in the center of the participants’ visual field during this task, they were
unable to report whether the array was grouped into rows or columns. Presumably, this is
because their attention was focused on the task-relevant cross rather than on the grouping
array. This was taken as evidence that even if a pattern is at the center of
vision, grouping processes may not operate unless attention is specifically allocated to the
pattern (also see Ben-Av, Sagi, and Braun, 1992). However, since then, others, using different
paradigms, have uncovered evidence, often indirect, that at least some perceptual grouping
may be operating pre-attentively (Kimchi, 2009; Lamy et al., 2006; Moore and Egeth, 1997;
Russell and Driver, 2005), although this is not the case for all types of grouping (Kimchi and
Razpurker-Apfeld, 2004).
All of these results together have been taken to suggest that grouping may occur at many differ-
ent levels of processing, rather than being a single step that occurs at one point in time (Palmer,
Brooks, and Nelson, 2003). Furthermore, different types of grouping may occur at different levels.
It is also possible that at least some grouping is dependent on recurrent processing between dif-
ferent levels, or brain areas, rather than representing single sequential steps (e.g., Lamme and
Roelfsema, 2000; Roelfsema, 2006). This is an issue that is just starting to be addressed systemati-
cally and may most directly be approached by studying how perceptual grouping is implemented
in neural circuits.
Mechanisms of grouping
One well-known mechanism that may underlie perceptual grouping is suggested by the tem-
poral correlation hypothesis (Singer and Gray, 1995; von der Malsburg, 1981), which holds that
synchrony in neural populations serves as a binding code for information in different parts of
cortex. Grouping may be mediated by synchronization of activity between neurons represent-
ing different elements of a group. Although some neurophysiological recordings in animals
(e.g., Castelo-Branco et al., 2000; Singer and Gray, 1995) and EEG recordings in humans (e.g.,
Tallon-Baudry and Bertrand, 1999; Vidal, Chaumon, O’Regan, and Tallon-Baudry, 2006) have
supported this idea, it remains a controversial hypothesis (e.g., Lamme and Spekreijse, 1998;
Roelfsema et al., 2004). Much of that evidence applies to limited types of grouping such as
collinearity/continuity (e.g., Singer and Gray, 1995) or formation of illusory contours based
on these features (e.g., Tallon-Baudry and Bertrand, 1999). It is not clear whether synchrony
can serve as a general mechanism to explain a wider array of grouping phenomena, especially
those not based on image features. For more discussion of the role of oscillatory activity in
perceptual organization see Van Leeuwen’s Cortical Dynamics chapter (this volume). Van der
Helm’s Simplicity chapter (this volume) discusses a link between synchrony and perceptual
simplicity.
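The binding-by-synchrony idea can be illustrated with a deliberately crude sketch. Real analyses in this literature rely on cross-correlograms and oscillatory phase, not the bin-wise agreement assumed here, and the threshold is an arbitrary placeholder:

```python
def sync_index(train_a, train_b):
    """Fraction of time bins in which two binary spike trains agree.

    A deliberately crude synchrony measure, standing in for the
    cross-correlation analyses used in the actual literature.
    """
    assert len(train_a) == len(train_b)
    return sum(a == b for a, b in zip(train_a, train_b)) / len(train_a)

def bound_together(train_a, train_b, threshold=0.8):
    """Bind two represented elements if their firing is synchronous.

    The 0.8 threshold is an invented value for illustration only.
    """
    return sync_index(train_a, train_b) >= threshold

train = [1, 0, 1, 1, 0, 1, 0, 0]
print(bound_together(train, train))        # True: perfectly synchronous
print(bound_together(train, train[::-1]))  # False: agreement is only 0.5
```

The appeal of the hypothesis is that a code like this needs no extra wiring for each possible pairing of elements: any two populations, wherever they sit in cortex, are bound simply by firing together.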
Even if multiple cues use synchrony as a coding mechanism, it may be that different cues use
different parts of visual cortex or recruit additional mechanisms. However, some fMRI evidence
suggests that proximity and similarity grouping cues, for instance, share a common network
including temporal, parietal, and prefrontal cortices (Seymour et al., 2008). In contrast, some ERP
evidence has shown differences in the time-course of processing of these two grouping cues (e.g.,
Han et al., 2002; Han et al., 2001) and other cues (e.g., Casco et al., 2009). Other work has focused
specifically on interactions between different visual areas with the role of feedback from higher
order areas a critical issue (Murray et al., 2004). A significant amount of computational work has
also generated specific models of perceptual grouping mechanisms. For instance, some of this
work has aimed to explain how grouping effects may emerge from the structure of the laminar
circuits of visual cortex (e.g., Grossberg et al., 1997; Ross et al., 2000). A full review of findings on
neural and computational mechanisms of grouping is beyond the scope of this chapter but it is
clear that even with the simplest Gestalt cues there is evidence of divergence in mechanisms and
many competing proposals.
Figure 4.15A could be perceived as edges 1 and 2 forming one object and lines 3 and 4 forming
another object (as shown in Figure 4.15B). However, most people do not see this organization.
Instead, they perceive two symmetrical objects that are overlapping (shown non-overlapping in
Figure 4.15C). Wertheimer claimed that the organization in Figure 4.15B produces ‘senseless’
shapes which are not very good Gestalts or whole forms. Those produced by the organization
represented in Figure 4.15C form better wholes. Notice that in this case we follow
what seems to be a factor of good continuation in grouping the edge segments together, rather
than closure, which may have favored the other organization. Wertheimer seemed to suggest that
ultimately all of the factors that he proposed are aimed at determining the best Gestalt possible
given the stimulus available. Furthermore, competitions amongst them may be resolved by deter-
mining which of them produces the best Gestalt.
Although the idea of Prägnanz was relatively easy to demonstrate, a clear, formal definition was
not provided by the Gestaltists. To fill this gap, modern vision scientists have often framed the
problem in terms of information theory. In this framework, organizations of the stimulus that
require less information to encode them are better than those which require more information
(Hochberg and McAlister, 1953). For instance, symmetrical figures (Figure 4.15C) may require
less information to encode than similar non-symmetrical figures (Figure 4.15B) because one half
of each figure is a simple transformation of the other. This could reduce the information needed to
encode the figure by nearly one half if it is encoded as two identical halves plus one transformation.
There are multiple versions of how stimuli can be encoded, their information measured, and sim-
plicity compared (e.g., Collard and Buffart, 1983; Garner, 1970, 1974; Leeuwenberg, 1969, 1971).
Regardless of how it is computed, if the visual system uses simplicity as a criterion for
determining perceptual structure, this criterion presumably helps construct an adaptive
representation of the physical world. However, there is no guarantee that simple representations
are actually veridical. For a more detailed discussion of these important issues see van der Helm’s
chapter on Simplicity in this volume.
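The encoding argument can be made concrete with a crude code-length comparison in the spirit of Hochberg and McAlister (1953). The cost model below (one unit per stored number, one unit for the reflection transform) is an invented toy, not Leeuwenberg's coding language or any published scheme:

```python
def code_length(points, symmetric):
    """Crude description length (numbers stored) for a 2D contour.

    points    : list of (x, y) vertices
    symmetric : if True, store only half the vertices plus a single
                reflection transform (costed here as 1 unit).
    A toy cost model for illustration only.
    """
    n = len(points)
    if symmetric:
        half = n // 2 + n % 2          # vertices on one side of the axis
        return 2 * half + 1            # their coordinates + the transform
    return 2 * n                       # every coordinate stored verbatim

square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(code_length(square, symmetric=False))  # 8 numbers
print(code_length(square, symmetric=True))   # 5 numbers: nearly halved
```

On any such measure, the symmetric organization of Figure 4.15C needs less information than the non-symmetric one of Figure 4.15B, which is the sense in which it is the "better" Gestalt.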
Summary
The Gestalt psychologists discovered and popularized an enduring set of grouping principles.
Their methods were largely based on demonstrations. To some, this has been seen as a point
of weakness. However, the ability to see clear effects through demonstration alone actually
shows the strength of the effects that they found, especially in comparison to some modern
indirect methods, which only show effects, for instance, on the order of tens of milliseconds.
Modern vision scientists have elaborated some of these principles by studying them quantita-
tively and clarifying the conditions under which they operate. However, some of the original
principles still are without clear formal definitions (e.g., good continuation) and work needs
to be done on this. There has also been significant work on how different principles combine
(Claessens and Wagemans, 2008; Elder and Goldberg, 2002), an important issue given that
natural images often seem to contain many cues simultaneously. A robust set of new principles
has also been articulated. Many of these involve dynamic scene features, while others highlight
the influence of context, learning, and other aspects of cognition. Although all of these
principles can be termed grouping principles on the basis of their phenomenological effects, such a
diverse set of image-based and non-image factors is likely to involve a wide range of different neural
mechanisms. Identifying the mechanistic overlap between different principles is an issue that,
when addressed, will shed greater light on how we might further categorize them. It is also
unlikely that the principles described above form an exhaustive list. The brain likely picks up
on many sources of information in visual scenes to drive perceptual grouping and we have
likely only scratched the surface.
References
Ahlström, U. (1995). Perceptual unit formation in simple motion patterns. Scand J Psychol 36(4): 343–354.
Alais, D., Blake, R., and Lee, S. H. (1998). Visual features that vary together over time group together over
space. Nature Neurosci 1(2): 160–164.
Barlow, H. B., and Reeves, B. C. (1979). The versatility and absolute efficiency of detecting mirror
symmetry in random dot displays. Vision Res 19(7): 783–793.
Baylis, G. C., and Driver, J. (1994). Parallel computation of symmetry but not repetition within single
visual shapes. Visual Cognit 1(4): 377–400.
Beck, D. M., and Palmer, S. E. (2002). Top-down influences on perceptual grouping. J Exp Psychol Hum
Percept Perform 28(5): 1071–1084.
Ben-Av, M. B., and Sagi, D. (1995). Perceptual grouping by similarity and proximity: experimental results
can be predicted by intensity autocorrelations. Vision Res 35(6): 853–866.
Ben-Av, M. B., Sagi, D., and Braun, J. (1992). Visual attention and perceptual grouping. Percept Psychophys
52(3): 277–294.
Börjesson, E., and Ahlström, U. (1993). Motion structure in five-dot patterns as a determinant of
perceptual grouping. Percept Psychophys 53(1): 2–12.
Brooks, J. L., and Driver, J. (2010). Grouping puts figure-ground assignment in context by constraining
propagation of edge assignment. Attention, Percept Psychophys 72(4): 1053–1069.
Brooks, J. L., and Palmer, S. E. (2010). Cue competition affects temporal dynamics of edge-assignment in
human visual cortex. J Cogn Neurosci 23(3): 631–644.
Bruno, N., and Bertamini, M. (2014). Perceptual organization and the aperture problem. In Oxford
Handbook of Perceptual Organization, edited by J. Wagemans. Oxford: Oxford University Press.
Burt, P., and Sperling, G. (1981). Time, distance, and feature trade-offs in visual apparent motion. Psychol
Rev 88(2): 171–195.
Casco, C., Campana, G., Han, S., and Guzzon, D. (2009). Psychophysical and electrophysiological evidence
of independent facilitation by collinearity and similarity in texture grouping and segmentation. Vision
Res 49(6): 583–593.
Castelo-Branco, M., Goebel, R., Neuenschwander, S., and Singer, W. (2000). Neural synchrony correlates
with surface segregation rules. Nature 405(6787): 685–689.
Claessens, P. M. E., and Wagemans, J. (2008). A Bayesian framework for cue integration in multistable
grouping: proximity, collinearity, and orientation priors in zigzag lattices. J Vision 8(7): 33.1–23.
Collard, R. F. A., and Buffart, H. F. J. M. (1983). Minimization of structural information: a set-theoretical
approach. Pattern Recogn 16(2): 231–242.
Corballis, M. C., and Roldan, C. E. (1974). On the perception of symmetrical and repeated patterns.
Percept Psychophys 16(1): 136–142.
Coren, S., and Girgus, J. S. (1980). Principles of perceptual organization and spatial distortion: the gestalt
illusions. J Exp Psychol Hum Percept Perform 6(3): 404–412.
Elder, J. H., and Goldberg, R. M. (2002). Ecological statistics of Gestalt laws for the perceptual organization
of contours. J Vision 2(4): 324–353.
Fantoni, C., and Gerbino, W. (2003). Contour interpolation by vector-field combination. J Vision, 3(4): 281–303.
Farid, H. (2002). Temporal synchrony in perceptual grouping: a critique. Trends Cogn Sci 6(7): 284–288.
Farid, H., and Adelson, E. H. (2001). Synchrony does not promote grouping in temporally structured
displays. Nature Neurosci 4(9): 875–876.
Feldman, J. (2001). Bayesian contour integration. Percept Psychophys 63(7): 1171–1182.
Felleman, D. J., and Van Essen, D. C. (1991). Distributed hierarchical processing in the primate cerebral
cortex. Cereb Cortex 1(1): 1–47.
Field, D. J., Hayes, A., and Hess, R. F. (1993). Contour integration by the human visual system: evidence for
a local ‘association field.’ Vision Res 33(2): 173–193.
Fisher, C. B., and Bornstein, M. H. (1982). Identification of symmetry: effects of stimulus orientation and
head position. Percept Psychophys 32(5): 443–448.
Garner, W. R. (1970). Good patterns have few alternatives. Am Scient 58(1): 34–42.
Garner, W. R. (1974). The Processing of Information and Structure. New York: L. Erlbaum Associates.
Geisler, W. S., Perry, J. S., Super, B. J., and Gallogly, D. P. (2001). Edge co-occurrence in natural images
predicts contour grouping performance. Vision Res 41(6): 711–724.
Gepshtein, S., and Kubovy, M. (2000). The emergence of visual objects in space-time. Proc Nat Acad Sci
USA 97(14): 8186–8191.
Gepshtein, S., and Kubovy, M. (2005). Stability and change in perception: spatial organization in temporal
context. Exp Brain Res 160(4): 487–495.
Gepshtein, S., and Kubovy, M. (2007). The lawful perception of apparent motion. J Vision, 7(8): 9.
Gillebert, C. R., and Humphreys, G. W. (2014). Mutual interplay between perceptual organization and
attention: a neuropsychological perspective. In Oxford Handbook of Perceptual Organization, edited by
J. Wagemans. Oxford: Oxford University Press.
Goodale, M. A., Milner, A. D., Jakobson, L. S., and Carey, D. P. (1991). A neurological dissociation
between perceiving objects and grasping them. Nature 349(6305): 154–156.
Grossberg, S., Mingolla, E., and Ross, W. D. (1997). Visual brain and visual perception: how does the
cortex do perceptual grouping? Trends Neurosci 20(3): 106–111.
Guttman, S. E., Gilroy, L. A., and Blake, R. (2007). Spatial grouping in human vision: temporal structure
trumps temporal synchrony. Vision Res 47(2): 219–230.
Han, S., Ding, Y., and Song, Y. (2002). Neural mechanisms of perceptual grouping in humans as revealed
by high density event related potentials. Neurosci Lett 319(1): 29–32.
Han, S., and Humphreys, G. W. (2003). Relationship between uniform connectedness and proximity in
perceptual grouping. Sci China. Ser C, Life Sci 46(2): 113–126.
Han, S., Humphreys, G. W., and Chen, L. (1999). Uniform connectedness and classical Gestalt principles of
perceptual grouping. Percept Psychophys 61(4): 661–674.
Han, S., Song, Y., Ding, Y., Yund, E. W., and Woods, D. L. (2001). Neural substrates for visual perceptual
grouping in humans. Psychophysiology 38(6): 926–935.
Herzog, M. H., and Öğmen, H. (2014). Apparent motion and reference frames. In Oxford Handbook of
Perceptual Organization, edited by J. Wagemans. Oxford: Oxford University Press.
Hess, R. F., May, K. A., and Dumoulin, S. O. (2014). Contour integration: psychophysical,
neurophysiological and computational perspectives. In Oxford Handbook of Perceptual Organization,
edited by J. Wagemans. Oxford: Oxford University Press.
Hochberg, J., and McAlister, E. (1953). A quantitative approach to figural ‘goodness.’ J Exp Psychol
46(5): 361.
Hochberg, J., and Silverstein, A. (1956). A quantitative index of stimulus-similarity proximity vs.
differences in brightness. Am J Psychol 69(3): 456–458.
Hock, H. S. (2014). Dynamic grouping motion: a method for determining perceptual organization for
objects with connected surfaces. In Oxford Handbook of Perceptual Organization, edited by J. Wagemans.
Oxford: Oxford University Press.
Humphreys, G. W., and Riddoch, M. J. (1993). Interactions between object and space systems revealed
through neuropsychology. In Attention and Performance, Volume 24, edited by D. E. Meyer and
S. Kornblum, pp. 183–218. Cambridge, MA: MIT Press.
Kahn, J. I., and Foster, D. H. (1986). Horizontal-vertical structure in the visual comparison of rigidly
transformed patterns. J Exp Psychol Hum Percept Perform 12(4): 422–433.
Kellman, P. J., Garrigan, P. B., Kalar, D., and Shipley, T. F. (2010). Good continuation and
relatability: related but distinct principles. J Vision 3(9): 120.
Kellman, P. J., and Shipley, T. F. (1991). A theory of visual interpolation in object perception. Cogn Psychol
23(2): 141–221.
Kimchi, R. (2000). The perceptual organization of visual objects: a microgenetic analysis. Vision Res
40(10–12): 1333–1347.
Kimchi, R. (2009). Perceptual organization and visual attention. Progr Brain Res 176: 15–33.
Kimchi, R., and Hadad, B-S. (2002). Influence of past experience on perceptual grouping. Psychol Sci
13(1): 41–47.
Kimchi, R., and Razpurker-Apfeld, I. (2004). Perceptual grouping and attention: not all groupings are
equal. Psychonom Bull Rev 11(4): 687–696.
Koffka, K. (1935). Principles of Gestalt Psychology. New York: Harcourt, Brace.
Köhler, W. (1920). Die physischen Gestalten in Ruhe und im stationären Zustand [Static and Stationary
Physical Shapes]. Braunschweig, Germany: Vieweg.
Korte, A. (1915). Kinematoskopische Untersuchungen [Kinematoscopic investigations]. Zeitschr Psychol
72: 194–296.
Kubovy, M., Holcombe, A. O., and Wagemans, J. (1998). On the lawfulness of grouping by proximity. Cogn
Psychol 35(1): 71–98.
Kubovy, M., and Wagemans, J. (1995). Grouping by proximity and multistability in dot lattices: a
quantitative Gestalt theory. Psychol Sci 6: 225–234.
Lamme, V. A. F., and Roelfsema, P. R. (2000). The distinct modes of vision offered by feedforward and
recurrent processing. Trends Neurosci 23(11): 571–579.
Lamme, V. A. F., and Spekreijse, H. (1998). Neuronal synchrony does not represent texture segregation.
Nature 396(6709): 362–366.
Lamy, D., Segal, H., and Ruderman, L. (2006). Grouping does not require attention. Percept Psychophys
68(1): 17–31.
Lee, S. H., and Blake, R. (1999). Visual form created solely from temporal structure. Science
284(5417): 1165–1168.
Leeuwenberg, E. L. (1969). Quantitative specification of information in sequential patterns. Psychol Rev
76(2): 216–220.
Leeuwenberg, E. L. (1971). A perceptual coding language for visual and auditory patterns. Am J Psychol
84(3): 307–349.
Leonards, U., Singer, W., and Fahle, M. (1996). The influence of temporal phase differences on texture
segmentation. Vision Res 36(17): 2689–2697.
Levinthal, B. R., and Franconeri, S. L. (2011). Common-fate grouping as feature selection. Psychol Sci
22(9): 1132–1137.
Luce, R. D. (2002). A psychophysical theory of intensity proportions, joint presentations, and matches.
Psychol Rev 109(3): 520–532.
Machilsen, B., Pauwels, M., and Wagemans, J. (2009). The role of vertical mirror symmetry in visual shape
detection. J Vision 9(12): 11.1–11.11.
Mack, A., Tang, B., Tuma, R., Kahn, S., and Rock, I. (1992). Perceptual organization and attention. Cogn
Psychol 24(4): 475–501.
Malik, J., and Perona, P. (1990). Preattentive texture discrimination with early vision mechanisms. J Opt Soc
Am A, Optics Image Sci 7(5): 923–932.
Mathes, B., and Fahle, M. (2007). Closure facilitates contour integration. Vision Res 47(6): 818–827.
Moore, C. M., and Egeth, H. (1997). Perception without attention: evidence of grouping under conditions
of inattention. J Exp Psychol Hum Percept Perform 23(2): 339–352.
Murray, S. O., Schrater, P., and Kersten, D. (2004). Perceptual grouping and the interactions between visual
cortical areas. Neural Networks 17(5–6): 695–705.
Norcia, A. M., Candy, T. R., Pettet, M. W., Vildavski, V. Y., and Tyler, C. W. (2002). Temporal dynamics of
the human response to symmetry. J Vision 2(2): 132–139.
Oyama, T. (1961). Perceptual grouping as a function of proximity. Percept Motor Skills 13: 305–306.
Oyama, T., Simizu, M., and Tozawa, J. (1999). Effects of similarity on apparent motion and perceptual
grouping. Perception 28(6): 739–748.
Traditional and New Principles of Perceptual Grouping 85
Palmer, S. E. (1992). Common region: a new principle of perceptual grouping. Cogn Psychol 24(3): 436–447.
Palmer, S. E. (1999). Vision Science: Photons to Phenomenology. Cambridge, MA: MIT Press.
Palmer, S. E., and Beck, D. M. (2007). The repetition discrimination task: an objective method for studying
perceptual grouping. Percept Psychophys 69(1): 68–78.
Palmer, S. E., and Brooks, J. L. (2008). Edge-region grouping in figure-ground organization and depth
perception. J Exp Psychol Hum Percept Perform 34(6): 1353–1371.
Palmer, S. E., Brooks, J. L., and Nelson, R. (2003). When does grouping happen? Acta Psychol
114(3): 311–330.
Palmer, S. E., and Hemenway, K. (1978). Orientation and symmetry: effects of multiple, rotational, and
near symmetries. J Exp Psychol Hum Percept Perform 4(4): 691–702.
Palmer, S. E., Neff, J., and Beck, D. (1996). Late influences on perceptual grouping: amodal completion.
Psychonom Bull Rev 3: 75–80.
Palmer, S. E., and Nelson, R. (2000). Late influences on perceptual grouping: illusory figures. Percept
Psychophys 62(7): 1321–1331.
Palmer, S. E., and Rock, I. (1994). Rethinking perceptual organization: the role of uniform connectedness.
Psychonom Bull Rev 1: 29–55.
Peterson, M. A. (1994). The proper placement of uniform connectedness. Psychonom Bull Rev
1(4): 509–514.
Peterson, M. A., and Enns, J. T. (2005). The edge complex: implicit memory for figure assignment in shape
perception. Percept Psychophys 67(4): 727–740.
Pizlo, Z., Salach-Golyska, M., and Rosenfeld, A. (1997). Curve detection in a noisy image. Vision Res
37(9): 1217–1241.
Quinlan, P. T., and Wilton, R. N. (1998). Grouping by proximity or similarity? Competition between the
Gestalt principles in vision. Perception 27(4): 417–430.
Rock, I., and Brosgole, L. (1964). Grouping based on phenomenal proximity. J Exp Psychol 67: 531–538.
Rock, I., Nijhawan, R., Palmer, S. E., and Tudor, L. (1992). Grouping based on phenomenal similarity of
achromatic color. Perception 21(6): 779–789.
Roelfsema, P. R. (2006). Cortical algorithms for perceptual grouping. Ann Rev Neurosci 29: 203–227.
Roelfsema, P. R., Lamme, V. A. F., and Spekreijse, H. (2004). Synchrony and covariation of firing rates in
the primary visual cortex during contour grouping. Nature Neurosci 7(9): 982–991.
Ross, W. D., Grossberg, S., and Mingolla, E. (2000). Visual cortical mechanisms of perceptual
grouping: interacting layers, networks, columns, and maps. Neural Networks 13(6): 571–588.
Royer, F. L. (1981). Detection of symmetry. J Exp Psychol Hum Percept Perform 7(6): 1186–1210.
Russell, C., and Driver, J. (2005). New indirect measures of ‘inattentive’ visual grouping in a
change-detection task. Percept Psychophys 67(4): 606–623.
Schulz, M. F., and Sanocki, T. (2003). Time course of perceptual grouping by color. Psychol Sci
14(1): 26–30.
Sekuler, A. B., and Bennett, P. J. (2001). Generalized common fate: grouping by common luminance
changes. Psychol Sci 12(6): 437–444.
Seymour, K., Karnath, H-O., and Himmelbach, M. (2008). Perceptual grouping in the human
brain: common processing of different cues. NeuroReport 19(18): 1769–1772.
Shi, J., and Malik, J. (2000). Normalized cuts and image segmentation. IEEE Trans Pattern Anal Machine
Intell 22(8): 888–905.
Shubnikov, A. V., and Koptsik, V. A. (1974). Symmetry in Science and Art. New York: Plenum.
Singer, W., and Gray, C. M. (1995). Visual feature integration and the temporal correlation hypothesis. Ann
Rev Neurosci 18: 555–586.
Emergent features and feature combination

Emergence in perception

The Gestalt psychologists’ key claim was that a whole is perceived as something other than the sum of its parts, a claim still often misquoted as ‘more than the sum of its parts.’ Indeed, the Gestalt psychologists argued that such summing was meaningless (Pomerantz and Kubovy 1986; Wagemans et al. 2012b). That elusive ‘something other’ they struggled to define can be regarded as emergence: those properties that appear, or sometimes disappear, when stimulus elements are perceived as a unitary configuration. To take the example of apparent motion with which Wertheimer
(1912) launched the Gestalt school (Wagemans et al. 2012a, b): if one observes a blinking light
that is then joined by a second blinking light, depending on their timing, one may then see not
two blinking lights but a single light in apparent (beta) motion, or even just pure (phi) motion
itself. What is novel, surprising and super-additive with the arrival of the second light is motion.
What disappears with emergence is one or both of the lights, because when beta motion is seen
we perceive only one light, not two, and with phi we may see only pure, disembodied motion; in
this respect the whole is less than the sum of its parts.
from a field of arrows in Panel b (as fast as telling black from white) than at finding the nega-
tive diagonal in Panel a, even though the Ls add no discriminative information, rather only
homogeneous ‘noise’ with potential for impairing perception through masking and crowding.
Panels d and e show a similar configural superiority effect involving line curvature rather than
orientation. This configural superiority effect shows better processing of wholes—Gestalts—than
of their parts, and we show below how it may arise from the EFs of closure, terminator count,
and intersection type.
EFs and configural superiority pose challenges for the standard two-stage model of perception.
If the integration of basic features is slow and requires attention, why are Gestalts so salient and so
quickly perceived if they too require feature integration? How can EFs be more basic than the more
elementary features from which they arise? First we review the evidence that Gestalts are in fact
highly salient, and then we consider how their existence can be reconciled with perceptual theory.
Proximity
If the field of vision contains just a point or dot, as in Panel a’s Base displays, that dot’s only functional feature is its location (x, y coordinates in the plane). If a second dot is added from the Context displays to create the Composite display, we have its position too, but what newly emerges is the distance, or proximity, between the two. (This is separate from Gestalt grouping by proximity, which we address below.) Note that proximity is affected by viewpoint and thus is a metric rather than a non-accidental property.
Orientation
In this two-dot stimulus, a second candidate EF is the angle or orientation between the two dots.
Orientation too is an accidental property in that the angle between two locations changes with
perspective and with head tilt.
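These two pairwise quantities are directly computable from dot coordinates; a minimal sketch (the function names are ours, not the chapter’s):

```python
import math

def proximity(p, q):
    """Distance between two dots: the first EF to emerge from a dot pair."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def orientation(p, q):
    """Angle (degrees) of the line from p to q: a second pairwise EF.
    Both quantities are viewpoint-dependent, as the text notes."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))

print(proximity((0, 0), (3, 4)))    # 5.0
print(orientation((0, 0), (1, 1)))  # 45.0
```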
Linearity
Stepping up to 3-dot configurations, all three dots may fall on a straight line, or they may form
a triangle (by contrast, two dots always fall on a straight line). Linearity, as with all the potential
EFs listed below, is a non-accidental property in that if three points fall on a straight line in the
distal stimulus, they will remain linear from any viewpoint.
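The collinearity the paragraph describes reduces to a cross-product check, which is what makes it easy to state as a non-accidental property; an illustrative sketch (names ours):

```python
def is_linear(a, b, c, tol=1e-9):
    """Three dots are collinear iff the cross product of vectors AB and AC
    vanishes; a tolerance absorbs floating-point error."""
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return abs(cross) <= tol

print(is_linear((0, 0), (1, 1), (2, 2)))  # True: all on the diagonal
print(is_linear((0, 0), (1, 1), (2, 0)))  # False: a triangle
```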
Symmetry (axial)
Three dots may be arranged symmetrically or asymmetrically about an axis (by contrast, two dots
are necessarily symmetric). More will be said about other forms of symmetry in a subsequent
section.
Surroundedness
With four-dot configurations, one of the dots may fall inside the convex hull (shell) defined by
the other three, or it may fall outside (consider snapping a rubber band around the four dots and
seeing whether any dot falls within the band’s boundary).
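The ‘rubber band’ test can be sketched as a same-sign check of cross products against the triangle’s three edges; this is our illustration, not the authors’ procedure:

```python
def surrounded(p, triangle):
    """True if dot p falls inside the triangle formed by three other dots:
    p lies on the same side of all three edges (the 'rubber band' test)."""
    a, b, c = triangle
    def side(u, v, w):
        # Sign of the cross product of (v - u) and (w - u).
        return (v[0] - u[0]) * (w[1] - u[1]) - (v[1] - u[1]) * (w[0] - u[0])
    s1, s2, s3 = side(a, b, p), side(b, c, p), side(c, a, p)
    return (s1 >= 0 and s2 >= 0 and s3 >= 0) or (s1 <= 0 and s2 <= 0 and s3 <= 0)

print(surrounded((1, 1), ((0, 0), (4, 0), (0, 4))))  # True: inside
print(surrounded((5, 5), ((0, 0), (4, 0), (0, 4))))  # False: outside
```

For four or more surrounding dots the same logic applies edge-by-edge around the convex hull.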
We now consider the EFs in Panel b, which require parts that are more complex than dots to
emerge. Here we use line segments as primitive parts.
Fig. 5.3 Potential basic EFs in human vision created from simple configurations of dots (Panel a) or line segments (b) or more complex parts forming composites resembling 3D objects, faces, or motion (c). The pair of figures on the left of each row shows a base discrimination with dots or lines differing in location and/or orientation. The middle pair shows two identical context elements, one of which is added to each base to form the composite pairs on the right that contain potential EFs. In actual experiments, these stimulus pairs were placed into odd-quadrant displays with one copy of one of the two base stimuli and three copies of the other. Note that many of the rows contain additional EFs besides the primary one labeled at the far right.

Panel (a) rows (Base, Context, Composite columns): Proximity; Orientation; Linearity; Symmetry; Surroundedness.
Panel (b) rows: Parallelism; Collinearity; Connectivity; Intersection; Lateral endpoint offset; Terminator count; Pixel count.
Panel (c) rows: Topology; Depth; Motion/flicker; Faces; Kanizsa.
Parallelism
Two line segments may be parallel or not, but a minimum of two segments is required for parallelism to appear.
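Parallelism, like the relations that follow, is a simple geometric predicate on segment pairs; a sketch using direction-vector cross products (names ours):

```python
def cross(u, v):
    """2D cross product of vectors u and v."""
    return u[0] * v[1] - u[1] * v[0]

def parallel(s1, s2, tol=1e-9):
    """Two segments are parallel when their direction vectors' cross
    product vanishes (within a floating-point tolerance)."""
    d1 = (s1[1][0] - s1[0][0], s1[1][1] - s1[0][1])
    d2 = (s2[1][0] - s2[0][0], s2[1][1] - s2[0][1])
    return abs(cross(d1, d2)) <= tol

print(parallel(((0, 0), (1, 0)), ((0, 2), (1, 2))))  # True: two horizontals
print(parallel(((0, 0), (1, 0)), ((0, 0), (0, 1))))  # False: perpendicular
```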
Collinearity
Again, two line segments are the minimal requirements. Items that are not fully collinear may be relatable (Kellman and Shipley 1991), or at least show good continuation, which are weaker versions of the same EF.
Connectivity
Two line segments either do or do not touch.
Intersection
Two line segments either intersect or do not. Two lines can touch without intersecting if they are
collinear and so form a single, longer line segment.
Terminator count
This is not an emergent feature in the same sense as the others, but when two line segments configure, their total terminator count is not necessarily four; if the two lines form a T, it drops to three. This would illustrate an eliminative feature (Kubovy and Van Valkenburg 2002), where the whole is less than the sum of its parts in some way.
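The drop from four terminators to three in a T can be computed by checking whether each segment endpoint touches another segment; this is our illustrative sketch, not the authors’ method:

```python
def on_segment(pt, seg, eps=1e-9):
    """True if pt lies on segment seg (collinear and within its extent)."""
    (ax, ay), (bx, by) = seg
    cross = (bx - ax) * (pt[1] - ay) - (by - ay) * (pt[0] - ax)
    if abs(cross) > eps:
        return False
    return (min(ax, bx) - eps <= pt[0] <= max(ax, bx) + eps and
            min(ay, by) - eps <= pt[1] <= max(ay, by) + eps)

def terminator_count(segments):
    """An endpoint stops being a terminator when it touches another segment,
    as at the junction of a T."""
    count = 0
    for i, seg in enumerate(segments):
        for pt in seg:
            touches = any(on_segment(pt, other)
                          for j, other in enumerate(segments) if j != i)
            if not touches:
                count += 1
    return count

# A 'T': the vertical stem meets the middle of the horizontal bar.
bar, stem = ((0, 1), (2, 1)), ((1, 0), (1, 1))
print(terminator_count([bar, stem]))  # 3: the stem's top endpoint is absorbed
```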
Pixel count
This too is not a standard EF candidate, but the total pixel count (or luminous flux or surface
area) for a configuration of two lines is sometimes less than the sum of all the component lines’
pixel counts; if the lines intersect or if they superimpose on each other, the pixel count will fall,
sometimes sharply.
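This subadditivity of pixel count is plain set arithmetic; a toy rasterization illustrating it (the coordinates are our own example):

```python
# Represent each line segment as a set of pixel coordinates.
horizontal = {(x, 2) for x in range(5)}   # 5 pixels
vertical   = {(2, y) for y in range(5)}   # 5 pixels
union = horizontal | vertical             # the segments cross at (2, 2)

print(len(horizontal) + len(vertical))    # 10: sum of the parts
print(len(union))                         # 9: the whole has one pixel fewer
```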
Finally, Figure 5.3 Panel (c) depicts five other EFs arising from elements more complex than dots
or lines. These EFs can be compelling phenomenally even though their key physical properties
and how they might be detected are less well understood:
Topological properties
When parts are placed in close proximity, novel topological properties may emerge, and these are
often salient to humans and other organisms. Three line segments can be arranged into a triangle,
adding the new property of a hole, a fundamental topological property (Chen 2005) that remains
invariant over so-called rubber sheet transformations. If a dot is added to this triangle, it will fall
either inside or outside that triangle; this inside-outside relationship is another topological property.
Depth
Depth differences often appear as EFs from combinations of elements that are themselves seen as flat. Enns (1990) demonstrated that a flat Y shape inscribed inside a flat hexagon yields the perception of a cube. Binocular disparity, as with random dot stereograms, is another classic example.
96 Pomerantz and Cragin
Faces
A skilled artist can draw just a few lines that viewers will group into a face. We see the same, less gracefully, in emoticons and smiley faces: ☺. Does ‘faceness’ constitute its own EF, or is it better regarded as only a concatenation of simpler, lower-level grouping factors at work, including closure, symmetry, proximity, etc.? This question encounters methodological challenges that will be considered below.
single object despite the zero distance separating them. Unrelated objects piled together may form
a heap, but they usually will create no emergence or Gestalt.
A note on symmetry
Symmetry has been a pervasive property underlying Gestalt thinking from its inception (van der
Helm in press A, this volume). From its links with Prägnanz and the minimum principle (van
der Helm in press B, this volume) to its deep involvement with aesthetics, symmetry appears to be
more than just another potential EF in human perception. And well it might be, given the broad
meaning of symmetry in its formal sense in the physical and mathematical sciences. In the present
chapter, we focus on axial (mirror image) symmetry, but rotational and translational symmetry may be considered as well. Formally, symmetry refers to properties that remain invariant under transformation, and so its preeminence in Gestalt theory may come
as no surprise. We could expand our list of potential EFs to include the same versus different
distinction as a form of translational symmetry. We have only begun to explore the full status of
symmetry, so defined, using the approaches described here.
1 Although we typically use four-quadrant stimuli for convenience, there is nothing special about having four stimuli or about arranging them into a square. In some experiments we use three in a straight line or eight in a circle.
stimuli into one quadrant and the other into the remaining three quadrants. We then create the
Composite display by superimposing an identical context element in each of the four quadrants
of the Base. Any context can be tested. In the absence of EFs, the context should act as noise and
make performance worse in the composite. The logic behind this superposition method follows from the eponymous superposition principle common to physics, engineering, and systems theory.
Again, the composite is far superior to the base with the arrow and triangle displays in Figure
5.1, indicating a configural superiority effect (CSE). But it remains unclear which EF is responsible
for this CSE—it could involve any combination of closure, terminator count, or intersection type
because arrows differ from triangles in all three whereas positive diagonals differ from negatives
on none of them. As Panel c shows, shifting the position of the superimposed Ls eliminates all
three potential EFs and eliminates the CSE as well. Panels d and e show another CSE using base
stimuli varying in direction of curvature rather than in orientation. Here again, discriminating
pairs of curves such as (( and () is easier than discriminating single curves, a result that could be
due to any combination of parallelism, symmetry, or implied closure, all of which emerge in the
composite panel. Panel f shows that rotating the context curve eliminates both the EF differences
and the CSE, indicating that it is not just any inter-curve relationship from which a CSE arises but
rather only special ones giving rise to EFs.
Although these results confirm EFs arising with two-line stimuli, they do not provide independent confirmation for each individual EF because EFs often co-occur, making it hard to isolate and test them individually. Just as the arrow-triangle (three-line) example showed a confounded co-occurrence of closure, terminator count, and intersection type, it can be challenging to separate individual EFs even with two-line stimuli. For example, it is difficult to isolate the feature of intersection without engaging the feature of connectivity, because lines must be connected to intersect (albeit not vice versa). Stupina ([Cragin] 2010) has shown that our ability to discriminate two-line configurations in the odd quadrant task can be predicted well from their aggregate EF differences. As noted below, however, further work is needed to find independent confirmation of some of these EF candidates. For now, it is clear there are multiple, potent EFs lurking within these stimuli.
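The aggregate-EF-difference idea can be illustrated with the arrow/triangle example from above; the dictionary encoding and feature values here are our own illustration, not Stupina’s actual stimulus coding:

```python
def ef_difference(cfg_a, cfg_b):
    """Number of EF dimensions on which two configurations differ; the text
    reports that discriminability tracks this aggregate."""
    return sum(cfg_a[k] != cfg_b[k] for k in cfg_a)

# Illustrative feature values only (terminator counts are placeholders).
arrow    = {'closure': False, 'terminator_count': 6, 'intersection': 'arrow'}
triangle = {'closure': True,  'terminator_count': 0, 'intersection': 'triangle'}
pos_diag = {'closure': False, 'terminator_count': 2, 'intersection': 'none'}
neg_diag = {'closure': False, 'terminator_count': 2, 'intersection': 'none'}

print(ef_difference(arrow, triangle))     # 3: differ on all three EFs
print(ef_difference(pos_diag, neg_diag))  # 0: diagonals differ on none
```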
Panel c of Figure 5.3 shows additional EFs involving a number of topological features (which often yield very large CSEs), depth cues (Enns 1990), Kanizsa figures, and faces. Yet more cannot be displayed readily in print because they involve stereoscopic depth, motion, or flicker. To date, no experiments using the measurements described above have found clear EFs appearing in cartoon faces or in words, but future work with such stimuli, which do seem to have Gestalt properties, may change that.
processed. In general, however, little or no SI arises with these stimuli or with most other stimuli
that are known to yield GI (see Pomerantz et al. 1994 for dozens of examples).2
Why might this contradiction exist between GI and SI, two standard methods for assessing
selective attention? In brief, GI occurs for the reason given above: the two elements group, and
Ss attend to the EFs arising between the elements, EFs that necessarily span the irrelevant parts.
However, with SI, the same grouping of the elements precludes interference: for any two elements to conflict or be congruent, there must of course be two elements. If the two elements group into one unit, there are no longer two elements and thus no longer an opportunity for the two to be congruent or incongruent. Perceivers are looking at EFs, not elements.
There is an alternative explanation for the lack of SI when parts group. The two elements in the stimulus (( may seem congruent in that they both curve to the left; but when considered as a whole, the left element is convex and the right is concave. Thus the two agree in direction of curvature but disagree in convexity. The conclusion: when Gestalts form, the nature of the coding may change radically, and a measure like SI that presumes separate coding of elements is no longer appropriate.
In sum, GI provides a strong converging operation for confirming EFs, but SI does not.
2 Exceptions to this generalization may occur when EFs happen to be correlated with congruent vs. incongruent pairs, e.g. with the four-stimulus set ‘((, (), )(, ))’, congruent stimuli such as (( contain the EF of parallelism but lack symmetry about the vertical axis, whereas incongruous stimuli like () contain symmetry but lack parallelism. This set yields Garner but no Stroop. With the stimulus set ‘| |, | |, | |, | |’, however, congruent stimuli such as | | contain symmetry and parallelism whereas incongruous stimuli such as | | lack either. This set yields both Garner and Stroop. The key factor determining whether Stroop arises is the mapping of salient EFs onto responses; configurations by themselves yield no Stroop.
Fig. 5.4 Two progressions in which an original form A is modified in one way to create a different form B, but a second modification results in a form C that is more similar to the original than is B.

Fig. 5.5 [Diagram: potential EFs built ground-up from the simplest elements: position, orientation, proximity, linearity, symmetry, surroundedness.]
EFs. The Theory of Basic Gestalts (Pomerantz and Portillo 2011) addresses this challenge by combining the Ground-Up Method for constructing configurations from the simplest possible elements in Figure 5.5 with a Constant Signal Method that minimizes these confounds by adding context elements incrementally to a fixed base discrimination. This allows EFs to reveal their presence through new CSEs in the composites.
Figure 5.6 Panel a shows a baseline odd quadrant display containing one dot per quadrant, with one quadrant’s dot placed differently than in the other three quadrants. In Panel b, a single, identically located dot is added to each quadrant, which nonetheless makes locating the odd quadrant much faster. This is a CSE demonstrating the EF of proximity (Pomerantz and Portillo 2011). In Panel c, another identically located dot is added to make a total of three per quadrant, and again we see a CSE in yet faster performance in Panel c than in the baseline Panel a.

Fig. 5.6 Building EFs with the Ground-Up Constant Signal method. Panel (a) shows the base signal, with the upper left quadrant having its dot at the lower left, versus the lower right in the other three quadrants. Panel (b) adds a first, identical context dot to each quadrant in the upper right, yielding a composite containing an EF of the orientation between the two dots now in each quadrant, a diagonal versus vertical angle. Panel (c) adds an identical, third context dot to each quadrant, near to the center, yielding a composite containing an EF of linearity versus nonlinearity/triangularity. Speed and accuracy of detecting the odd quadrant improve significantly from Panel (a) to (b) to (c), although the signal being discriminated remains the same.

This second CSE could be taken as confirmation of the EF of linearity, in that it is so easy to find the linear
triplet of dots in a field of nonlinear (triangular) configurations. But first we must rule out the possibility that the CSE in Panel c relative to Panel a is merely the result of the already-demonstrated EF of proximity in Panel b. Dot triplets do indeed contain the potential EF of linearity vs. triangularity, but they also contain EFs of proximity and/or orientation arising from their component dot pairs, so the task is to tease these apart.
The first key to dissociating these two is that the identical stimulus difference between the odd quadrant and the remaining three exists in Panel c just as it does in Panels a and b of Figure 5.6. This is the unique contribution of the Ground-Up Constant Signal Method: the signal that Ss must detect remains the same as new context elements are added. The second key is that Panel c shows a CSE not only with respect to Panel a but also with respect to Panel b. This indicates that the third dot does indeed create a new EF over and above the EF that already had emerged in Panel b. That in turn supports linearity’s being an EF in its own right, over and above proximity. It shows how EFs may exist in a hierarchy, with higher-order EFs like linearity arising in stimuli that contain more elements.
Pomerantz and Portillo (2011) used this Ground-Up Constant Signal method to demonstrate that linearity is its own EF with dot triplets, whether the underlying signal contained a proximity or an orientation difference with dot pairs. They also showed that the EF of proximity is essentially identical in salience to the EF of orientation, in that the two show comparably sized CSEs compared with the same base stimulus with just one dot per quadrant. Over the past 100 years, it has been difficult to compare the strengths of different Gestalt principles of grouping because of ‘apples vs. oranges’ comparisons, but because the Ground-Up Constant Signal Method measures the two on a common scale, their magnitudes may be compared directly and fairly.
To date this method has confirmed that the three most basic or elemental EFs in human vision are proximity, orientation, and linearity. They are most basic in the sense that they emerge from the simplest possible stimuli and that their EFs do not appear to be reducible to anything more elemental (i.e., the CSE for linearity occurs over and above the CSEs for the proximity or orientation EFs it necessarily contains). Axial symmetry has yielded mixed results; further tests will be needed to determine whether it is or is not a confirmed EF. The results for surroundedness have been somewhat less ambiguous: it does not appear to be an EF, although the evidence is not totally conclusive (Portillo 2009).
Work is ongoing to test additional potential EFs using the same Ground-Up, Constant Signal Method to ensure fair comparisons and to isolate the unique contribution made by each EF individually, given that they often co-occur. As a lead-up to that, Stupina ([Cragin] 2010) has explored several regions of two-line stimulus space using this method, and she has found up to eight EFs there.
Color as a Gestalt
Color is usually treated as a property of the stimulus and in fact makes the list of ‘basic features’
underlying human vision (Wolfe and Horowitz 2004). However, color is not a physical feature but
rather a psychological one; wavelength is the corresponding physical feature, and color originates
‘in the head’, from interactions of units that are sensitive to wavelength. Color certainly meets the
criterion of a non-linear, surprising property emerging when wavelengths are mixed: combining
wavelengths seen as red and green on a computer monitor to yield yellow is surely an unexpected
outcome (Pomerantz 2006)! What is more, even color fails to qualify as a basic feature in human vision, because it is color contrast to which we are most sensitive; colors in a Ganzfeld fade altogether. Moving (non-stabilized) edges providing contrast are required for us to see color.
Hyper-emergent features?
If novel features can emerge from combinations of more elementary, ‘basic’ features, then can novel features arise from combinations of EFs too, creating something we may call hyper-emergent features? Given that our ultimate goal is to understand how we perceive complex objects and scenes, these may play an essential role there.
Conclusions
This chapter aims to define EFs, explaining how they are identified and quantified, and enumerating those that have been confirmed to date. The Gestalt psychologists struggled to define grouping, likening it variously to a belongingness or to a glue binding parts together, and advancing ambiguous claims such as, ‘A strong form coheres and resists disintegration by analysis into parts or by fusion with another form’ (Boring 1942). Working from the Theory of Basic Gestalts (Pomerantz and Portillo 2011), we view grouping neither as a coherence, as a glue or a belongingness, nor as a loss of independence when two items form a single perceptual unit. Instead we see grouping as the creation of novel and salient features—EFs—to which perceivers can and do preferentially attend. When we view an isolated stimulus such as a dot, we can roughly determine its x and y coordinates in space, but we are much better at determining the distance and angle between two dots than we are at determining the position of either dot. This superiority of configurations, even simple ones, is the defining feature of EFs, and we have uncovered over one dozen that meet this criterion. The goal of future work is to explore additional EFs meeting this criterion and to ensure that these new EFs are detectable through other, converging operations such as those derived from selective attention tasks.
References
Biederman, I. (1987). ‘Recognition-by-components: A theory of human image understanding’. Psychological
Review 94, 2: 115–47.
Boring, E. G. (1942). Sensation and Perception in the History of Experimental Psychology.
(New York: Appleton-Century-Crofts).
Chen, L. (2005). ‘The topological approach to perceptual organization’. Visual Cognition 12: 553–637.
Cragin, A.I., Hahn, A.C., and Pomerantz, J.R. (2012) Emergent Features Predict Grouping in Search and
Classification Tasks. Talk presented at the 2012 Annual meeting of the Vision Sciences Society, Naples,
FL, USA. In: Journal of Vision 12(9): article 431. doi:10.1167/12.9.431.
Duncker, K. (1929). Über induzierte Bewegung. Ein Beitrag zur Theorie optisch wahrgenommener
Bewegung. [On induced motion. A contribution to the theory of visually perceived motion].
Psychologische Forschung 12: 180–259.
Enns, J. T. (1990). ‘Three dimensional features that pop out in visual search’. In Visual Search, edited by
D. Brogan, pp. 37–45 (London: Taylor and Francis).
Feldman, J. (in press). ‘Bayesian models of perceptual organization’. In Oxford Handbook of Perceptual
Organization, edited by J. Wagemans. (Oxford: Oxford University Press).
Garner, W. R. (1974). The Processing of Information and Structure. (Potomac, MD: Erlbaum).
Garner, W. R., Hake, H. W., and Eriksen, C. W. (1956). ‘Operationism and the concept of perception’.
Psychological Review 63, 3: 149–56.
Hubel, D. H. and Wiesel, T. N. (1962). ‘Receptive fields, binocular interaction and functional architecture in
the cat’s visual cortex’. Journal of Physiology 160: 106–54.
Julesz, B. (1971). Foundations of Cyclopean Perception. Chicago: The University of Chicago Press.
Julesz, B. (1981). ‘Textons, the elements of texture perception, and their interaction’. Nature 290 (March 12,
1981): 91–7.
Kanizsa G. (1979). Organization in Vision: Essays on Gestalt Perception. (New York: Praeger Publishers).
Kellman, P. J. and Shipley, T. F. (1991). ‘A theory of visual interpolation in object perception’. Cognitive
Psychology, 23: 141–221.
Kogo, N. and van Ee, R. (in press). ‘Neural mechanisms of figure-ground organization: Border-ownership,
competition and perceptual switching’. In Oxford Handbook of Perceptual Organization, edited by J.
Wagemans. (Oxford: Oxford University Press).
Kubilius, J., Wagemans, J., and Op de Beeck, H. P. (2011). ‘Emergence of perceptual Gestalts in the human
visual cortex: The case of the configural superiority effect’. Psychological Science 22: 1296–303.
Kubovy, M. and Van Valkenburg, D. (2002). ‘Auditory and visual objects’. In Objects and Attention,
edited by B. J. Scholl, pp. 97–126 (Cambridge, MA: MIT Press).
Levi, D. M. (2008). ‘Crowding—an essential bottleneck for object recognition: a mini-review’. Vision
Research 48 (5): 635–54.
Neisser, U. (1967). Cognitive Psychology. (New York: Appleton, Century, Crofts).
Overvliet, K. E., Krampe, R.T., and Wagemans, J. (2012). ‘Perceptual Grouping in Haptic Search: The
Influence of Proximity, Similarity, and Good Continuation’. Journal of Experimental Psychology: Human
Perception and Performance 38(4): 817–21.
Pomerantz, J. R. (2006). ‘Color as a Gestalt: Pop out with basic features and with conjunctions’. Visual
Cognition 14: 619–28.
Pomerantz, J. R. and Kubovy, M. (1986). ‘Theoretical approaches to perceptual organization’. In
Handbook of Perception and Human Performance, edited by K. R. Boff, L. Kaufman, and J. Thomas,
pp. 36–46. (New York: John Wiley & Sons).
Pomerantz, J. R. and Portillo, M. C. (2011). ‘Grouping and emergent features in vision: Toward a theory of
basic Gestalts’. Journal of Experimental Psychology: Human Perception and Performance 37: 1331–49.
Pomerantz, J. R. and Portillo, M.C. (2012). ‘Emergent Features, Gestalts, and Feature Integration Theory’.
In Perception to Consciousness: Searching with Anne Treisman, edited by J. Wolfe and L. Robertson, pp.
187–92. (New York: Oxford University Press).
Pomerantz, J. R., Sager, L. C., and Stoever, R. J. (1977). ‘Perception of wholes and their component
parts: Some configural superiority effects’. Journal of Experimental Psychology: Human Perception and
Performance 3: 422–35.
Pomerantz, J. R., Carson, C. E., and Feldman, E. M. (1994). ‘Interference effects in perceptual organization’.
In Cognitive Approaches to Human Perception, edited by S. Ballesteros, pp. 123–52. (Hillsdale,
NJ: Lawrence Erlbaum Associates).
Portillo, M. C. (2009). Grouping and Search Efficiency in Emergent Features and Topological Properties in
Human Vision. Unpublished doctoral dissertation, Rice University, Houston, Texas, USA.
Ramachandran, V. S. (1988). ‘Perception of shape from shading’. Nature 331, 14: 163–66.
Rock, I. (1983). The Logic of Perception. (Cambridge, MA: MIT Press).
Stephan, A. (2003). ‘Emergence’. Encyclopedia of Cognitive Science. (London: Nature Publishing Group/
Macmillan Publishers).
Stupina, A.I. [now Cragin, A.I] (2010). Perceptual Organization in Vision: Emergent Features in Two-Line
Space. Unpublished master’s thesis, Rice University, Houston, Texas, USA.
Townsend, J. T. (1971) ‘A note on the identifiability of parallel and serial processes’. Perception and
Psychophysics 10: 161–3.
Treisman, A. and Gelade, G. (1980). ‘A feature integration theory of attention’. Cognitive Psychology
12: 97–136.
Treisman, A. and Gormican, S. (1988). ‘Feature analysis in early vision: evidence from search asymmetries’.
Psychological Review 95: 15–48.
Treisman, A. and Souther, J. (1985). ‘Search asymmetry: a diagnostic for preattentive processing of
separable features’. Journal of Experimental Psychology: General 114: 285–310.
Emergent features and feature combination 107
Symmetry perception
Peter A. van der Helm
Introduction
Mirror symmetry (henceforth, symmetry) is a visual regularity that can be defined by configura-
tions in which one half is the mirror image of the other (see Figure 6.1a)—these halves then are
said to be separated by a symmetry axis.1 Albeit with fluctuating degrees of asymmetry, it is abun-
dantly present in the world. For instance, the genetic blueprint of nearly every organism implies
a symmetrical body—if the mirror plane is vertical, this conveniently yields gravitational stabil-
ity. Furthermore, many organisms tend to organize things in their environment such that they are
symmetrical—think of bird nests and human art and design (Hargittai 1986; Shubnikov and Koptsik
1974; Washburn and Crowe 1988; Weyl 1952; Wynn 2002; van Tonder and Vishwanath, this volume;
Koenderink, this volume). Presumably, for organisms with symmetrical bodies, symmetrical things
are practical to make and to work with (Allen 1879). Think also of the preference which many
organisms have for more symmetrical shapes over less symmetrical ones in mate selection and, by
pollinators, in flower selection (Møller 1992, 1995; Johnstone 1994; Swaddle and Cuthill 1993). This
preference presumably favors mates and flowers with high genetic quality (Møller 1990). Currently
relevant is that it also requires a considerable perceptual sensitivity to symmetry—which many spe-
cies of mammals, birds, fish, and insects indeed are known to have (Barlow and Reeves 1979; Beck
et al. 2005; Giurfa et al. 1996; Horridge 1996; see also Osorio and Cuthill, this volume).
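The defining property stated at the outset, that one half is the mirror image of the other, can be sketched in code (a purely illustrative sketch, not a perceptual model; the function name and tolerance parameter are our own):

```python
# Minimal illustration: test whether a set of 2-D dots is mirror-symmetric
# about a vertical axis at x = 0. Every dot off the axis must have a
# mirrored partner (-x, y); dots on the axis pair with themselves.
def is_mirror_symmetric(dots, tol=1e-9):
    remaining = list(dots)
    while remaining:
        x, y = remaining.pop()
        if abs(x) < tol:          # lies on the axis itself
            continue
        # search for the mirrored partner within tolerance
        for i, (px, py) in enumerate(remaining):
            if abs(px + x) < tol and abs(py - y) < tol:
                del remaining[i]
                break
        else:
            return False          # no partner found
    return True

print(is_mirror_symmetric([(-1.0, 2.0), (1.0, 2.0), (0.0, 0.5)]))  # True
print(is_mirror_symmetric([(-1.0, 2.0), (1.0, 1.0)]))              # False
```

The tolerance parameter anticipates the matching tolerance discussed later under "Jitter".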
In human perception research, detection of symmetry is in fact assumed to be an integral part of
the perceptual organization process that is applied to every incoming visual stimulus (Tyler 1996;
van der Helm and Leeuwenberg 1996; Wagemans 1997). This assumption has been related to the
idea that extraction of regularities like symmetry can be used to model the outcome of the percep-
tual organization process, because it would allow for efficient mental representations of patterns
(for more details about this idea and its potentially underlying neuro-cognitive mechanisms, see
van der Helm, this volume). It has also been related to the idea that the high perceptual sensitivity
to symmetry arose because the evolution of visual systems selected individual regularities on the
basis of their relevance in the world (Tyler 1996). It may, however, also have arisen because the evo-
lution selected a general regularity-detection mechanism with sufficient survival value (cf. Enquist
and Arak 1994). The latter option suggests a package deal: to survive, a visual system’s detection
mechanism may pick up irrelevant regularities as long as it also picks up relevant regularities.
The foregoing indicates that perceptual organization and evolutionary relevance provide an
appropriate context for an appreciation of symmetry perception. It also indicates that, to this end,
1 This definition reflects the common usage of the word symmetry. In mathematics, the word symmetry is
also used to refer to any configuration that remains invariant under certain transformations; this definition
is suited to classify visual regularities, but another definition is needed to model their perception (see Section
“The scope of formal models of symmetry detection”).
Fig. 6.1 Visual regularity. (a) A symmetry—left and right hand halves are mirror images of each other. (b) A Glass pattern with coherently-oriented dot dipoles at random positions. (c) A repetition with four identical subpatterns (the repeats). (d) Multiple symmetries with two and three global symmetry axes, respectively.
symmetry detection. Indeed, various neuro-scientific studies used symmetry patterns as stimuli,
but thus far, the data are too divergent to draw firm conclusions about locus and timing of sym-
metry detection in the brain. One thing that seems clear, however, is that the lateral occipital com-
plex (LOC) is prominently involved (Beh and Latimer 1997; Sasaki et al. 2005; Tyler and Baseler
1998; Tyler et al. 2005; van der Zwan et al. 1998). The LOC in fact seems a hub where different
perceptual-grouping tendencies interact, which agrees with ideas that it is a shape-selective area
associated with perceptual organization in general (Grill-Spector 2003; Malach et al. 1995; Treder
and van der Helm 2007). Hence, the neuro-scientific evidence may still be scanty, but all in all, it
adds to the above-mentioned idea that symmetry is relevant in perceptual organization.
In cognitive science, behavioral research into this idea yielded evidence that symmetry plays a
role in issues such as object recognition (Pashler 1990; Vetter and Poggio 1994), figure–ground
segregation (Driver et al. 1992; Leeuwenberg and Buffart 1984; Machilsen et al. 2009), and amodal
completion (Kanizsa 1985; van Lier et al. 1995). It further finds elaboration in structural descrip-
tion approaches, that is, formal models which—using some criterion—predict preferred stimu-
lus interpretations on the basis of view-independent specifications of the internal structure of
objects. Some of these approaches work with a-priori fixed perceptual primitives like the volu-
metric building blocks called geons (e.g., Biederman 1987; Binford 1981), which is convenient for
object recognition. Other approaches (e.g., Leeuwenberg 1968, 1969, 1971; Leeuwenberg and van
der Helm 2013) allow primitives to be assessed flexibly, that is, in line with the Gestaltist idea that
the whole determines what the perceived parts are. The latter is more plausible regarding object
perception (Kurbat 1994; Leeuwenberg et al. 1994; Palmer and Rock 1994), but in both cases,
symmetry is taken to be a crucial component of how perception imposes structure on stimuli. In
Leeuwenberg’s approach, for instance, symmetry is one of the regularities exploited to arrive at
simplest stimulus organizations in terms of objects arranged in space (van der Helm, this volume).
Furthermore, in Biederman’s approach, symmetry is taken to define geons because it is a so-called
nonaccidental property: if present in the proximal stimulus, it is also likely to be present in the
distal stimulus (see also Feldman, this volume).
However, the proximal features of symmetry vary with viewpoint, and this drives a wedge
between the perception of symmetry as such and its role in object perception (Schmidt and
Schmidt 2013; Wagemans 1993). That is, symmetry is effective as nonaccidental property only
when viewed orthofrontally—then, as discussed later on, it indeed has many extraordinary detect-
ability properties. Yet, in structural description approaches, it is taken to be effective as group-
ing factor also when viewed non-orthofrontally. This touches upon the more general problem of
viewpoint generalization: how does the visual system arrive at a view-independent representation
of a three-dimensional (3D) scene, starting from a two-dimensional (2D) view of this scene?
Viewpoint generalization has been proposed to involve normalization, that is, a mental rotation
yielding a canonical 2D view of a scene (e.g., Szlyk et al. 1995). This presupposes the generation of can-
didate 3D organizations which, subsequently, are normalized. However, Sawada et al. (2011) not only
showed that any pair of 2D curves is consistent with a 3D symmetry interpretation, but also argued that
it is implausible that every such pair is perceived as being symmetrical. View-dependent coincidences,
for instance, have a strong effect on how a scene is perceptually organized, and may prevent interpreta-
tions involving symmetry (van der Helm, this volume). Likewise, detection of symmetry viewed in
perspective or skewed (i.e., sheared plus rotated, yielding something close to perspective) seems to rely
on proximal features rather than on hypothesized distal features. That is, it deteriorates as its proximal
features are more perturbed (van der Vloed et al. 2005; Wagemans et al. 1991).
Even when viewed orthofrontally, the grouping strength of symmetry is elusive. Symmetry is
often thought to be a cue for the presence of a single object—as opposed to repetition which the
Gestaltists had identified as a grouping factor too (under the umbrella of similarity), but which
rather is a cue for the presence of multiple objects. However, it seems safer to say that symme-
try is better detectable when it forms one object than when the symmetry halves form separate
objects, and that repetition is less detectable when it forms one object than when the repeats
form separate objects. At least, this is what Corballis and Roldan (1974) found for dot patterns
in which grouping by proximity was responsible for the perceived objects. To tap more directly
into the grouping process, Treder and van der Helm (2007) used stereopsis to assign symmetry
halves and repeats to different perceived depth planes. The process of depth segregation is known
to take a few hundred milliseconds, and they found that it hardly interacts with repetition
detection but strongly with symmetry detection. This suggests that the segregation into separate
objects (i.e., the depth planes) agrees with the perceptual structure of repetition but not with
that of symmetry. In a similar vein, Morales and Pashler (2002) found that grouping by color
interferes with symmetry detection, in a way that suggests that individual colors are attended
one at a time.
The foregoing perhaps questions the grouping capability of symmetry, but above all, it shows
the relevance of interactions between different grouping factors. In any case, further investigation
is required to see if firmer conclusions can be drawn regarding the specific role of symmetry in the
build-up of perceptual organizations. Furthermore, notice that the foregoing hardly affects con-
siderations about the functionality of symmetry in the world—after all, this functionality takes
effect once symmetry has been established. It also stands apart from the extraordinary detectabil-
ity properties that are discussed next.
Absolute orientation
The absolute orientation of symmetry axes is known to be relevant (for effects of the relative
orientation of symmetry axes, see Section “Representation models of symmetry detection”). The
effect usually found is that vertical symmetry (i.e., with a vertical axis) is more salient than hori-
zontal symmetry which, in turn, is more salient than oblique symmetry (see, e.g., Barlow and
Reeves 1979; Baylis and Driver 1994; Kahn and Foster 1986; Palmer and Hemenway 1978; Rock
and Leaman 1963). This usually found vertical-symmetry advantage has been attributed to the
neural architecture of the brain (Julesz 1971), but the evidence for that is not conclusive (Corballis
et al. 1971; Herbert and Humphrey 1996; Jenkins 1983). Furthermore, other studies did not find
this usual effect or found even an opposite effect (see, e.g., Corballis and Roldan 1975; Fisher and
Bornstein 1982; Jenkins 1983, 1985; Locher and Smets 1992; Pashler 1990; Wagemans et al. 1992).
In any case, notice that horizontal symmetry and vertical symmetry are not different regularities
but are the same regularities in different absolute orientations. Hence, it might well be that effects
of absolute orientation result from visuo-cognitive interactions (e.g., with the vestibular system)
rather than from purely visual processes (cf. Latimer et al. 1994; Wenderoth 1994).
Eccentricity
Detection of symmetry deteriorates as it is presented more eccentrically (Saarinen 1988), but if
scaled up properly, it can maintain the same level of detectability (Tyler 1999). This scaling-up
compensates for the fact that eccentric receptive fields are sensitive to relatively large-scale infor-
mation, as opposed to foveal receptive fields which are sensitive to relatively small-scale informa-
tion. Hence, this is a general property of the visual system and not specific to symmetry which,
apparently, remains equally detectable across the visual field if this factor is taken into account
(see also Sally and Gurnsey 2001).
Jitter
Jitter refers to relatively small, dynamic displacements of stimulus elements. In that case, but also in the case of small, static displacements, regularity detection depends on the visual system's tolerance in matching potentially corresponding elements in symmetry halves or repeats. This tolerance too is a general property of the visual system and not specific to regularity detection. In
any case, Barlow and Reeves (1979) found that symmetry detection is quite resistant to jitter.
Furthermore, Dry (2008) proposed Voronoi tessellation as a scale-independent mechanism yielding stimulus-dependent tolerance areas. Such a mechanism can, in any model, be adopted to account for the visual system's tolerance in matching elements.
Proximity
Proximity effects refer to the fact that stimulus elements that are closer to each other can be
matched more easily (this is not to be confused with the Gestalt law of proximity, which is not
about matching but about grouping). For instance, whereas detection of n-fold repetition (i.e., n
juxtaposed repeats) can only start to be successful by matching elements that are one repeat apart,
symmetry detection can already start to be successful by matching elements near the axis of sym-
metry. Jenkins (1982) in fact proposed that symmetry detection integrates information from only
a limited region about the axis of symmetry: his data suggested that this integration region (IR)
is a strip approximately 1 degree wide, irrespective of the size of the texture at the retina. Dakin
and Herbert (1998) specified this further: their data suggested that the IR has an aspect ratio of
about 2:1, and that its size scales with the spatial frequency content of the pattern. Thus, for homo-
geneous blob patterns for instance, the IR scales with blob size, so that it steadily covers a more or
less constant number of features.
Noticing this scale invariance, however, Rainville and Kingdom (2002) proposed that the size of
the IR is not determined by spatial frequency but by the spatial density of what they called ‘micro-
elements’: their data suggested that the IR covers about 18 such informational units regardless of
their spatial separation. This agrees with studies reporting that the detectability of symmetry does
not vary with the number of elements (i.e., no number effect) for symmetries with more than about
20 elements (e.g., Baylis and Driver 1994; Dakin and Watt 1994; Olivers et al. 2004; Tapiovaara
1990; Wenderoth 1996a). For symmetries with fewer than about 20 elements, however, these studies
reported opposite effects, and this hints at an explanation that takes into account that symmetry
detection is an integral part of perceptual organization, as follows (see also van der Helm, 2014).
For any stimulus—including symmetry stimuli—a symmetry percept is basically just one of the
possible outcomes of the perceptual organization process; it results only if it is stronger than other
percepts. It is true that a symmetry percept is bound to result for a really otherwise-random sym-
metry stimulus, but such stimuli are rare if not impossible. A symmetry structure with many sym-
metry pairs is usually strong enough to overcome spurious structures, but the smaller the number
of symmetry pairs is, the harder it is to construct a symmetry stimulus without spurious struc-
tures. This also implies that, in dense stimuli, such spurious structures are more prone to arise in
the area near the axis. In case of small numbers of symmetry pairs, such spurious structures may
have various effects on detection (see below), and in general, they may give the impression that
only the area near the axis is decisive.
In sum, it is true that proximity plays a role in symmetry perception, and the area near the sym-
metry axis is indeed relatively important. Notice, however, that Barlow and Reeves (1979) already
found that also symmetry information in the outer regions of stimuli is picked up quite effec-
tively (see also Tyler et al. 2005; van der Helm and Treder 2009; Wenderoth 1995). Furthermore,
even if symmetry processing would be restricted to a limited stimulus area, then this would not
yet specify which stimulus information in this area is processed, and how. The latter reflects the
fundamental question that formal models of symmetry detection focus on. That is, the factors
discussed here can of course be taken into account in model applications, but are usually not at
the heart of formal models. This is already an indication of their scope, which is discussed next.
The scope of formal models of symmetry detection
The transformational approach relies on the same formalization as used in the classification of crystals and regular wall patterns
(Shubnikov and Koptsik 1974; Weyl 1952). It holds that symmetry and repetition are visual regu-
larities because they remain invariant under a 180° 3D rotation about the symmetry axis and a 2D
translation the size of one or more repeats, respectively. Because these transformations identify
entire symmetry halves or entire repeats with each other, they can be said to assign a block struc-
ture to both regularities (see Figure 6.2a).
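This transformational characterization can be sketched numerically (an illustrative sketch on exact integer coordinates; helper names are our own). Note that a finite repetition is invariant under translation only up to its outermost repeat, which the sketch makes explicit:

```python
def maps_onto_itself(points, transform):
    """True if `transform` maps the point set exactly onto itself."""
    return {transform(p) for p in points} == set(points)

# Mirror about the vertical axis x = 0 (the 2-D effect of a 180-degree
# 3-D rotation about that axis).
reflect = lambda p: (-p[0], p[1])

symmetry = {(-2, 0), (-1, 1), (1, 1), (2, 0)}
print(maps_onto_itself(symmetry, reflect))   # True

# A repetition with repeat width 2: translating by one repeat maps each
# repeat onto the next, so the translated copy coincides with the original
# everywhere except at the outermost repeat.
repetition = {(0, 0), (1, 1), (2, 0), (3, 1), (4, 0), (5, 1)}
moved = {(x + 2, y) for (x, y) in repetition}
print(sorted(moved & repetition))   # the four points of the overlap
```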
However, its applicability is unclear for Glass patterns (which are as detectable as symmetry; see
below). Originally, Glass (1969) constructed the patterns named after him by superimposing two
copies of a random dot pattern—one slightly translated or rotated with respect to the other, for
instance. With the transformational approach in mind, this construction method suggests that the
resulting percept too is that of a whole consisting of two overlapping identical substructures (i.e.,
those two copies). This also seems to comply with a grouping over multiple views as needed in
case of binocular disparity and optic flow (Wagemans et al. 1993). However, the actually resulting
percept rather seems to require a framing in terms of relationships between randomly positioned
but coherently oriented dot dipoles (see Section “Representation models of symmetry detection”).
Furthermore, in original rotational Glass patterns, the dipole length increases with the distance
from the center of the pattern, but later, others consistently constructed rotational Glass patterns
by placing identical dot dipoles in coherent orientations at random positions (as in Figure 6.1b).
The two types of Glass patterns do not seem to differ in salience but, by the transformational
construction above, the latter type would be a perturbed regularity. Because transformational
invariance requires perfect regularity, however, the transformational approach has a problem with
perturbed regularity. A formal solution might be to cross-correlate corresponding parts, but in
symmetry for instance, a simple cross-correlation of the two symmetry halves does not seem to
agree with human performance (Barlow and Reeves 1979; Tapiovaara 1990).
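The two construction methods described above can be sketched as follows (an illustrative sketch only; function names and parameter values are our own, not from the literature):

```python
import math, random
random.seed(1)

def glass_original(n=200, angle=math.radians(3)):
    """Original method: superimpose a copy of a random dot pattern,
    rotated by a small angle about the center. Dipole length then
    grows with distance from the center."""
    dots = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(n)]
    c, s = math.cos(angle), math.sin(angle)
    rotated = [(c*x - s*y, s*x + c*y) for x, y in dots]
    return list(zip(dots, rotated))          # each pair is one dipole

def glass_fixed_dipoles(n=200, length=0.02):
    """Later method: identical, tangentially oriented dot dipoles at
    random positions -- dipole length is constant everywhere."""
    pairs = []
    for _ in range(n):
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        t = math.atan2(y, x) + math.pi / 2   # tangential orientation
        dx, dy = length * math.cos(t), length * math.sin(t)
        pairs.append(((x, y), (x + dx, y + dy)))
    return pairs

def dipole_len(pair):
    (x1, y1), (x2, y2) = pair
    return math.hypot(x2 - x1, y2 - y1)
```

In the first construction the dipole length at radius r equals 2r·sin(angle/2), so it grows with eccentricity; in the second it is constant, which fits the observation above that the two types nonetheless do not seem to differ in salience.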
This unclarity regarding Glass patterns adds to the fact that the transformational approach does not
account for the key phenomenon—discussed later on in more detail—that symmetries and Glass
patterns are about equally detectable but generally better detectable than 2-fold repetitions (notice
that they all consist transformationally of the same number of corresponding parts; cf. Bruce and
Morgan 1975). Hence, the transformational approach may account for how visual regularities can
be classified, but not for how they are perceived preceding classification.
This drawback does not hold for the holographic approach (van der Helm and Leeuwenberg
1996, 1999, 2004). This approach is also based on a rigorous mathematical formalization of regu-
larity in general (van der Helm and Leeuwenberg 1991), but the difference is that it relies on invar-
iance under growth (which agrees with how mental representations can be built up). To give a
gist, according to this approach, symmetries, repetitions, and Glass patterns are visual regularities
because, preserving the regularity in them, they can be expanded stepwise by adding symmetry
pairs, repeats, and dot dipoles, respectively. This implies that these regularities can be said to be
assigned a point structure, a block structure, and a dipole structure, respectively (see Figure 6.2b).
Thereby, this mathematical formalization supports a structural differentiation that, as discussed
next, seems to underlie detectability differences between visual regularities (see also Attneave
1954; Bruce and Morgan, 1975).
Perfect symmetry
In the holographic model, the support for the presence of a regularity is quantified by the number
of nonredundant relationships (E) between stimulus parts that, according to this model, constitute
a regularity. Thus, for symmetry E equals the number of symmetry pairs; for repetition E equals
the number of repeats minus one; and for Glass patterns E equals the number of dot dipoles minus
one. Furthermore, the total amount of information in a stimulus is given by the total number
of elements in the stimulus (n), so that the holographic weight-of-evidence metric (W) for the
detectability of a regularity is: W = E/n.
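Written out as code (a direct transcription of the definitions just given; the function names are our own):

```python
# Holographic weight of evidence W = E/n for the three regularities,
# with n the total number of stimulus elements.
def W_symmetry(n):        # E = n/2 symmetry pairs
    return (n / 2) / n    # always 0.5, hence no number effect

def W_glass(n):           # E = n/2 - 1 dot-dipole relationships
    return (n / 2 - 1) / n

def W_repetition(n, m):   # m-fold repetition: E = m - 1 repeats-minus-one
    return (m - 1) / n

print(W_symmetry(100))          # 0.5
print(round(W_glass(100), 3))   # 0.49 -> close to symmetry for large n
print(W_repetition(100, 2))     # 0.01 -> strong number effect
```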
A perfect symmetry on n elements is constituted by E=n/2 symmetry pairs, so that it gets W=0.5
no matter the total number of elements—hence, symmetry is predicted to show no number effect,
which agrees with empirical reports (e.g., Baylis and Driver 1994; Dakin and Watt 1994; Olivers et al.
2004; Tapiovaara 1990; Wenderoth 1996a; see also Section “Modulating factors in symmetry detec-
tion”). Furthermore, E=n/2-1 for a Glass pattern on n elements, so that, for large n, it is predicted
to show more or less the same detectability as symmetry—empirical support for this is discussed in
the next subsection. For an m-fold repetition on n elements, however, E=m-1, so that its detectabil-
ity is predicted to depend strongly on the number of elements per repeat—hence, a number effect,
which found empirical support (Csathó et al. 2003). In particular, 2-fold repetition is predicted to
be generally less detectable than symmetry—which also found empirical support (Baylis and Driver
1994, 1995; Bruce and Morgan 1975; Csathó et al. 2003; Corballis and Roldan 1974; Zimmer 1984).
Hence, the foregoing shows that holographic weight of evidence accounts for the key phenom-
enon that symmetry and Glass patterns are about equally detectable but generally better detect-
able than repetition. This differentiation holds not only for perfect regularities, but as discussed
next, also for perturbed regularities.
Perturbed symmetry
A perfect regularity can be perturbed in many ways, and there are of course limits to the detect-
ability of the remaining regularity. Relevant in this respect is that the percept of an imperfect
regularity results from the perceptual organization process applied to the stimulus. This means
that the percept generally cannot be said to be some original perfect regularity plus some per-
turbation. For instance, if a perfect repetition is perturbed by randomly added noise elements
(which is the form of perturbation considered here), then there may be some remaining repeti-
tiveness depending on the location of the noise. In general, however, repetition seems quite easily
destroyed perceptually—some evidence for this can be found in Rappaport (1957) and in van der
Helm and Leeuwenberg (2004).
Symmetry and Glass patterns, however, are quite resistant to noise, and this is fairly independ-
ent of the location of the noise (e.g., Barlow and Reeves 1979; Maloney et al. 1987; Masame 1986,
1987; Nucci and Wagemans 2007; Olivers and van der Helm 1998; Troscianko 1987; Wenderoth
1995). In fact, both symmetry and Glass patterns exhibit graceful degradation, that is, their detect-
ability decreases gradually with increasing noise proportion (i.e., the proportion of noise elements
relative to the total number of stimulus elements). Their behavior is explicated next in more detail.
By fitting empirical data, Maloney et al. (1987) found that the detectability (d’) of Glass patterns
in the presence of noise follows the psychophysical law
d′ = g / (2 + N/R)
with R the number of dot dipoles that constitute the regularity; N the number of added noise
elements; and g an empirically determined proportionality constant that depends on stimulus
type and that enables more detailed data fits than rank orders. Maloney et al. (1987) arrived at
this on the basis of considerations from signal detection theory, and the holographic model pre-
dicts the same law on the basis of structural considerations. In the holographic model, W=E/n is
proposed to be proportional to the detectability of regularity, and for Glass patterns in the pres-
ence of noise, it implies n=2R+N and E=R-1 or, for large R, approximately E=R. Substitution in
W=E/n then yields the psychophysical law above.
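A quick numerical check (arbitrary demonstration values with g = 1; these are not fitted data) confirms that the substitution reproduces the law:

```python
def d_prime(R, N, g=1.0):
    """Psychophysical law: detectability of a regularity of R
    pairs/dipoles in the presence of N added noise elements."""
    return g / (2 + N / R)

def W_regularity_in_noise(R, N):
    """Holographic W = E/n with E ~= R (large R) and n = 2R + N."""
    return R / (2 * R + N)

# With g = 1 the two expressions coincide:
print(round(d_prime(50, 30), 4))                # 0.3846
print(round(W_regularity_in_noise(50, 30), 4))  # 0.3846
```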
The holographic model also predicts this psychophysical law for symmetry (with R equal to the
number of symmetry pairs), and it indeed yields a near perfect fit on Barlow and Reeves’ (1979)
symmetry data (van der Helm 2010). In the middle range of noise proportions, this fit is as good
as that for the Weber-Fechner law (Fechner 1860; Weber 1834) if, in the latter, the regularity-to-
noise ratio R/N is taken as signal (cf. Zanker 1995). In both outer ranges, it is even better because,
unlike the Weber-Fechner law, it accounts for floor and ceiling effects. This means that, in both
outer ranges of noise proportions, the sensitivity to variations in R/N is disproportionally lower
than in the middle range, so that disproportionally larger changes in R/N are needed to achieve
the same change in the strength of the percept (which is also supported by Tjan and Liu (2005),
who used morphing to perturb symmetries).
Interestingly, this account of perturbed symmetry also predicts both symmetry and asymme-
try effects, that is, apparent overestimations and underestimations of the symmetry in a stim-
ulus when compared triadically to slightly more and slightly less symmetrical stimuli (Freyd
and Tversky 1984). These effects are context dependent, and the psychophysical law above sug-
gests that they are due not to incorrect estimations of symmetry but to correct estimations of
symmetry-to-noise ratios. For more details on this, see Csathó et al. (2004), but notice that these
effects are evolutionary relevant for both prey and predators. As discussed in van der Helm and
Leeuwenberg (1996), overestimation by oneself may occur in the case of partly occluded oppo-
nents, for instance, and is helpful to detect them. Furthermore, underestimation by opponents
may occur if oneself is camouflaged, for instance, and is helpful to avoid being detected. The
occurrence of such opposite effects is consistent with the earlier-mentioned idea of a package deal
in the evolutionary selection of a general regularity-detection mechanism. This idea is supported
further by the above-established fact that symmetry and Glass patterns exhibit the same detect-
ability properties, even though symmetry clearly has more evolutionary relevance. A further hint
at such a package deal is discussed at the end of the next subsection.
Multiple symmetry
Regularities can also occur in nested combinations, and in general, additional local regularities in
a global regularity enhance the detectability of this global regularity (e.g., Nucci and Wagemans
2007). To account for this, the holographic model invokes Leeuwenberg’s (1968) structural
description approach, which specifies constraints for hierarchical combinations of global and local
regularities in descriptive codes (which are much like computer programs that produce things by
specifying the internal structure of those things). As a rule, this implies that a compatible local
regularity is one that occurs within a symmetry half of a global symmetry or within a repeat of a
global repetition. The general idea then is that the just-mentioned enhancement occurs only in
case of such combinations. More specifically, however, it implies that local regularity in symmetry
halves adds only once to the detectability of the symmetry, and that local regularity in the repeats
of an m-fold repetition adds m times to the detectability of the repetition (van der Helm and
Leeuwenberg 1996). In other words, repetition is predicted to benefit more from compatible local
regularities than symmetry does—as supported by Corballis and Roldan (1974).
A special case of nested regularities is given by multiple symmetry (see Figure 6.1d). According
to the transformational approach, the detectability of multiple symmetry is predicted to increase
monotonically as a function of the number of symmetry axes—which seems to agree with empirical
data (e.g., Palmer and Hemenway 1978; Wagemans et al. 1991). Notice, however, that these studies
considered 1-fold, 2-fold, and 4-fold symmetries, but not 3-fold symmetries which seem to be odd
ones out: they tend to be less detectable than 2-fold symmetries (Wenderoth and Welsh 1998).
According to the holographic approach, hierarchical-compatibility constraints indeed imply
that 3-fold symmetries—and, likewise, 5-fold symmetries—are not as detectable as might be
expected on the basis of the number of symmetry axes alone. For instance, in a 2-fold symme-
try, each global symmetry half is itself a 1-fold symmetry which, in a descriptive code, can be
described as being nested in that global symmetry half. In 3-fold symmetry, however, each global
symmetry half exhibits two overlapping 1-fold symmetries, and because they overlap, only one
of them can be described as being nested in that global symmetry half. In other words, those
hierarchical-compatibility constraints imply that all symmetry can be captured in 2-fold symme-
tries but not in 3-fold symmetries—and, likewise, in 4-fold symmetries but not in 5-fold symmetries.
This suggests not only that 3-fold and 5-fold symmetries can be said to contain perceptually hidden
regularity—which may increase their aesthetic appeal (cf. Boselie and Leeuwenberg 1985)—but
also that they are less detectable than 2-fold and 4-fold symmetries, respectively.
A study by Treder et al. (2011) into imperfect 2-fold symmetries composed of two superim-
posed perfect 1-fold symmetries (which allows for variation in their relative orientation) showed
that the relative orientation of symmetry axes can indeed have this effect. That is, though equal
in all other respects and controlling for absolute orientation, orthogonal symmetries (as in 2-fold
symmetry) were found to be more detectable than non-orthogonal ones (as in 3-fold symmetry).
This suggests that the constituent single symmetries in a multiple symmetry first are detected
separately and then engage in an orientation-dependent interaction. Notice that this would be a
fine example of the Gestalt motto that the whole is something else than the sum of its parts.
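For concreteness, the orthogonality contrast above can be tied to the geometry of multiple symmetry: in an m-fold symmetry, adjacent axes meet at 180/m degrees. The following minimal sketch is my illustration, not part of the cited studies:

```python
# Angle between adjacent symmetry axes in an m-fold symmetry.
# Axes lie at multiples of 180/m degrees (axis orientations repeat mod 180).
def adjacent_axis_angle(m):
    """Angle in degrees between neighbouring axes of an m-fold symmetry."""
    return 180.0 / m

# 2-fold axes are orthogonal (90 degrees); 3-fold axes meet at 60 degrees,
# i.e., non-orthogonally, matching the contrast tested by Treder et al. (2011).
for m in (2, 3, 4, 5):
    print(f"{m}-fold: adjacent axes at {adjacent_axis_angle(m)} degrees")
```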
From an evolutionary perspective, it is interesting that 3-fold and 5-fold symmetries are overrepresented in flowers (Heywood
1993). Furthermore, in human designs, they are virtually absent in decorative motifs (Hardonk
1999) but not in mystical motifs (think of triquetas and pentagrams; Forstner 1961; Labat 1988).
This might well be due to a special status subconsciously attributed to them—caused by their
special perceptual status. In flowers, this may have given them a procreation advantage (Giurfa
et al. 1999). In this respect, notice that insect vision evolved 200–275 million years earlier than
flowering plants (Sun et al. 2011), so that such a perceptual effect may have influenced the distri-
bution of flowers from the start. Furthermore, throughout human history, the special perceptual
status of 3-fold and 5-fold symmetries may have made humans feel that they are more appropriate
for mystical motifs than for decorative motifs (van der Helm 2011). Such considerations are of
course more speculative than those based on psychophysical data, but they do suggest a plausible
two-way interaction between vision and the world: the world determines if a visual system as a
whole has sufficient evolutionary survival value, but subsequently, visual systems also influence
how the world is shaped (see also van der Helm, this volume).
procedure is applied to measure how well the centroids of the blobs align along a putative symme-
try axis. In the brain, something like spatial filtering occurs in the lateral geniculate nucleus, that
is, before symmetry perception takes place. Spatial filtering is more than just a modulating factor, however. In
Dakin and Watt’s (1994) model, for instance, the chosen spatial filtering scale in fact determines
the elements that are correlated to establish symmetry in a stimulus.
The latter can be exemplified further by considering anti-symmetry, that is, symmetry in which
otherwise perfectly corresponding elements have opposite properties in some dimension. For
instance, in stimuli consisting of monochromatic surfaces, angles may be convex in one contour
but concave in the corresponding contour (this can also be used to define anti-repetition in such
stimuli; Csathó et al. 2003). Such corresponding contours have opposite contrast signs, and detec-
tion seems possible only post-perceptually (van der Helm and Treder 2009). This also holds, in
otherwise symmetrical checkerboard stimuli, for corresponding squares with opposite contrasts
(Mancini et al. 2005). In both cases, contrast interacts with other grouping factors (grouping by
color in particular). It can, however, also be considered in isolation, namely, in dot patterns in
which symmetrically positioned dots can have opposite contrast polarities with respect to the
background (this can also be used to define anti-repetition and anti-Glass patterns in such stim-
uli). This does not seem to have much effect on symmetry detection (Saarinen and Levi 2000;
Tyler and Hardage 1996; Wenderoth 1996b; Zhang and Gerbino 1992). Representational models
cannot account for that, because they rely on precise correspondences. In contrast, there are spa-
tial filters (and maybe neural analogs) that filter out positional information only, thereby canceling
the difference between symmetry and anti-symmetry in such stimuli (Mancini et al. 2005).
In Glass patterns, spatial filtering may also be responsible for identifying the constituent dot
dipoles which, after all, may blur into coherently oriented blobs at coarser scales. A potential prob-
lem here, however, is that this might not work for Glass patterns in the presence of noise given by
randomly added single dots. For instance, in Maloney et al.'s (1987) experiment, each dipole dot had
6–10 noise dots nearer to it than its own mate. Further research is needed to assess how spatial filtering
might agree with the psychophysical law discussed in Section “Representation models of symmetry
detection”, which is based on precise correspondences and holds for Glass patterns and symmetry.
The foregoing indicates a tension between process models that rely on fairly crude spatial filter-
ing and representation models that rely on fairly precise correlations between stimulus elements.
Neither type of model alone seems able to account for all aspects of symmetry detection. Yet, uni-
fication might be possible starting from Dakin and Watt’s (1994) conclusion that their human data
match the performance of a fairly fine-scale filter. This empirical finding suggests that symmetry
does not benefit from the presence of relatively large blobs. As elaborated in the remainder of this
section, such an effect is in fact predicted by a process model that allows for effects of spatial filtering
even though it relies on fairly precise structural relationships between elements (van der Helm and
Leeuwenberg 1999). This model fits in the holographic approach discussed above, but it also builds
on processing ideas by Jenkins (1983, 1985) and Wagemans et al. (1993). In this respect, it is a nice
example of a stepwise development of ideas—each previous step as important as the next one.
Bootstrapping
Jenkins (1983, 1985) subjected symmetry and repetition to various experimental manipulations
(e.g., jitter) to investigate which properties characterize these regularities perceptually.
He concluded that symmetry and repetition are characterized by properties of what he called vir-
tual lines between corresponding elements. That is, for orthofrontally viewed perfect regularities,
symmetry is characterized by parallel orientation and midpoint collinearity of virtual lines between
Fig. 6.3 (a) Symmetry is characterized by parallel orientation and midpoint collinearity of virtual lines
(indicated in bold in top panel) between corresponding elements in symmetry halves; two such
virtual lines can be combined to form a virtual trapezoid (middle panel), from which detection can
propagate in an exponential fashion (bottom panel). (b) In the original bootstrap model, the same
applies to repetition, which is characterized by parallel orientation and constant length of virtual
lines between corresponding elements in repeats. (c) In the holographic bootstrap model, repetition
involves an intermediate stepwise grouping of elements into blocks, which implies that detection
propagates in a linear fashion.
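Jenkins' virtual-line properties lend themselves to a direct computational check. The sketch below is purely illustrative (the dot coordinates are hypothetical, not taken from the studies cited):

```python
import math

def virtual_lines(pairs):
    """For each pair of corresponding points, return the virtual line's
    (orientation mod pi, midpoint, length)."""
    result = []
    for (x1, y1), (x2, y2) in pairs:
        dx, dy = x2 - x1, y2 - y1
        result.append((math.atan2(dy, dx) % math.pi,        # orientation
                       ((x1 + x2) / 2, (y1 + y2) / 2),      # midpoint
                       math.hypot(dx, dy)))                 # length
    return result

# Vertical symmetry about the axis x = 0: virtual lines are parallel
# (all horizontal) and their midpoints are collinear (all on x = 0).
sym = virtual_lines([((-1, 0), (1, 0)), ((-2, 1), (2, 1)), ((-3, 2), (3, 2))])
assert len({round(o, 9) for o, _, _ in sym}) == 1       # parallel orientation
assert all(abs(mx) < 1e-9 for _, (mx, _), _ in sym)     # midpoint collinearity

# Repetition with shift (4, 0): virtual lines are parallel and of constant
# length, but their midpoints are generally not collinear.
rep = virtual_lines([((0, 0), (4, 0)), ((1, 2), (5, 2)), ((2, 1), (6, 1))])
assert len({round(o, 9) for o, _, _ in rep}) == 1       # parallel orientation
assert len({round(L, 9) for _, _, L in rep}) == 1       # constant length
```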
quadrangles are indeed the detection anchors for both regularities. The detection process can then
be modeled as exploiting these anchors in a bootstrap procedure which starts from correlation
quadrangles to search for additional correlation quadrangles in order to build a representation of
a complete regularity (Wagemans et al. 1993; see Figure 6.3ab, middle and bottom panels).
This bootstrap idea is plausible, but it still seems to be missing something. Like Jenkins's idea,
it is not underpinned by a mathematical formalism (cf. Bruce and Morgan 1975), and, like the
transformational approach, it does not yet explain the detectability differences
between symmetry and repetition. To the latter end, one might resort to modulating factors—in
particular, to proximity. As discussed in Section “Modulating factors in symmetry detection”, such
factors do play a role, but as discussed next, those detectability differences can also be explained
without resorting to such factors.
Holographic bootstrapping
In reaction to Wagemans (1999) and consistent with the holographic approach, van der Helm
and Leeuwenberg (1999) proposed that symmetry is indeed detected as proposed by Wagemans
et al. (1993) but that repetition detection involves an additional step. That is, according to the
holographic approach, symmetry pairs are indeed the constituents of symmetry, but repeats—
rather than single element pairs—are the constituents of repetition. This suggests that repetition
detection involves an intermediate step, namely, the grouping of elements into blocks that, even-
tually, correspond to complete repeats (see Figure 6.3c).
This holographic procedure implies that symmetry detection propagates exponentially, but that
repetition detection propagates linearly. For Glass patterns, whose dot dipoles it takes as
constituents, it also implies that detection propagates exponentially. Thus, it again accounts for
the key phenomenon that symmetry and Glass patterns are about equally detectable, and both
more detectable than repetition. In addition, it predicts the following.
Suppose that, for some odd reason, a restricted part of a stimulus is processed before the rest
of the stimulus is processed. Then, exponentially propagating symmetry detection is hampered,
whereas linearly propagating repetition detection is hardly hampered, if at all (see Figure 6.4). By
way of analogy, one may think of a slow car for which it hardly matters whether or not there is
much traffic on the road, versus a fast car for which it matters a lot. Such a split-stimulus situation
seems to occur if the restricted part contains relatively large and therefore salient blobs. Such blobs
can plausibly be assumed to be processed first, due to the spatial filtering difference, in the
lateral geniculate nucleus, between the magnocellular pathway (which mediates relatively coarse
structures relatively fast) and the parvocellular pathway (which mediates relatively fine structures
relatively slowly). Hence, the holographic bootstrap model predicts that symmetry detec-
tion is hampered by such blobs. Furthermore, due to the number effect in repetition (see Section
“Representation models of symmetry detection”), repetition detection is actually predicted to
benefit from such blobs. Both predictions were confirmed empirically by Csathó et al. (2003).
They are also relevant to the evolutionary biology discussion on whether symmetry or size—of
sexual ornaments and other morphological traits—is the more relevant factor in mate selection
(e.g., Breuker and Brakefield 2002; Goddard and Lawes 2000; Morris 1998). That is, a global sym-
metry may be salient as such but its salience is reduced by salient local traits.
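The contrast between exponentially and linearly propagating detection can be made concrete with a toy calculation. The doubling-versus-incrementing scheme below is my simplification of the bootstrap idea, not the authors' implementation:

```python
import math

def steps_exponential(n):
    """Steps to cover n constituent pairs when the detected structure
    doubles its reach each step (symmetry, Glass patterns): 1, 2, 4, 8, ..."""
    return math.ceil(math.log2(n)) if n > 1 else 0

def steps_linear(n):
    """Steps to cover n repeats when one block is appended per step."""
    return n - 1

# Exponential propagation finishes in far fewer steps, so it is also far
# more sensitive to being cut off early, as in the split-stimulus situation.
for n in (16, 64, 256):
    print(f"n={n}: exponential {steps_exponential(n)} steps, "
          f"linear {steps_linear(n)} steps")
```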
Conclusion
Visual symmetry will probably remain an inexhaustible topic in many research domains. It is
instrumental in ordering processes that counter natural tendencies towards chaos. Thereby, it is
Fig. 6.4 Holographic bootstrapping in case of split stimuli, for symmetry (top) and repetition
(bottom). Going from left to right, suppose that, at a first stage, only the grey areas in the stimuli are
available to the regularity detection process. Then, at first, the propagation proceeds as usual (the
structure detected so far is indicated by black dots). The restriction to the grey areas, however, stops
the exponentially spreading propagation in symmetry sooner than the linearly spreading propagation
in repetition—hence symmetry is hindered more by the split situation than repetition is. When, later,
the rest of the stimulus becomes available, the propagation again proceeds as usual and symmetry
restores its advantage over repetition.
probably also the most important regularity in the interaction between vision and the world. In
vision, its exact role in perceptual organization (which depends on interactions between various
grouping factors) is still unclear, but its detectability is extraordinary. The perceptual sensitivity
to symmetry seems part of an evolutionary package deal; that is, evolution seems to have yielded
a detection mechanism that includes a lower sensitivity to repetition (which is also evolutionarily
less relevant) but an equally high sensitivity to Glass patterns (even though these are even less
relevant evolutionarily). Therefore, rather than focusing on the relevance of
individual regularities in the external world, it seems expedient to focus on internal percep-
tual mechanisms to explain these sensitivities in a unified fashion. As discussed on the basis
of empirical evidence, these mechanisms seem to rely not only on fairly precise correlations
between stimulus elements, but also on spatial filtering to establish what the to-be-correlated
elements might be.
Acknowledgment
Preparation of this chapter was supported by Methusalem grant METH/08/02 awarded to Johan
Wagemans (www.gestaltrevision.be).
References
Allen, G. (1879). ‘The origin of the sense of symmetry’. Mind 4: 301–316.
Attneave, F. (1954). ‘Some informational aspects of visual perception’. Psychological Review 61: 183–193.
Bahnsen, P. (1928). 'Eine Untersuchung über Symmetrie und Asymmetrie bei visuellen Wahrnehmungen'
[An investigation of symmetry and asymmetry in visual perception]. Zeitschrift für Psychologie 108: 355–361.
Barlow, H. B., and B. C. Reeves (1979). ‘The versatility and absolute efficiency of detecting mirror
symmetry in random dot displays’. Vision Research 19: 783–793.
Baylis, G. C., and J. Driver (1994). ‘Parallel computation of symmetry but not repetition within single
visual shapes’. Visual Cognition 1: 377–400.
Baylis, G. C., and J. Driver (1995). ‘Obligatory edge assignment in vision: The role of figure and part
segmentation in symmetry detection’. Journal of Experimental Psychology: Human Perception and
Performance 21: 1323–1342.
Beck, D. M., M. A. Pinsk, and S. Kastner (2005). ‘Symmetry perception in humans and macaques’. Trends
in Cognitive Sciences 9: 405–406.
Beh, H. C., and C. R. Latimer (1997). ‘Symmetry detection and orientation perception: Electrocortical
responses to stimuli with real and implicit axes of orientation’. Australian Journal of Psychology
49: 128–133.
Biederman, I. (1987). ‘Recognition-by-components: A theory of human image understanding’. Psychological
Review 94: 115–147.
Binford, T. (1981). ‘Inferring surfaces from images’. Artificial Intelligence 17: 205–244.
Boselie, F., and E. L. J. Leeuwenberg (1985). ‘Birkhoff revisited: Beauty as a function of effect and means’.
American Journal of Psychology 98: 1–39.
Breuker, C. J., and P. M. Brakefield (2002). ‘Female choice depends on size but not symmetry of dorsal
eyespots in the butterfly Bicyclus anynana’. Proceedings of the Royal Society of London B 269: 1233–1239.
Bruce, V. G., and M. J. Morgan (1975). ‘Violations of symmetry and repetition in visual patterns’. Perception
4: 239–249.
Chipman, S. F. (1977). ‘Complexity and structure in visual patterns’. Journal of Experimental
Psychology: General 106: 269–301.
Corballis, M. C., and C. E. Roldan (1974). 'On the perception of symmetrical and repeated patterns'.
Perception and Psychophysics 16: 136–142.
Corballis, M. C., and C. E. Roldan (1975). ‘Detection of symmetry as a function of angular orientation’.
Journal of Experimental Psychology: Human Perception and Performance 1: 221–230.
Corballis, M. C., G. A. Miller, and M. J. Morgan (1971). ‘The role of left-right orientation in
interhemispheric matching of visual information’. Perception and Psychophysics 10: 385–388.
Csathó, Á., G. van der Vloed, and P. A. van der Helm (2003). ‘Blobs strengthen repetition but weaken
symmetry’. Vision Research 43: 993–1007.
Csathó, Á., G. van der Vloed, and P. A. van der Helm (2004). ‘The force of symmetry
revisited: Symmetry-to-noise ratios regulate (a)symmetry effects'. Acta Psychologica 117: 233–250.
Dakin, S. C., and A. M. Herbert (1998). ‘The spatial region of integration for visual symmetry detection’.
Proceedings of the Royal Society London B 265: 659–664.
Dakin, S. C., and R. F. Hess (1997). ‘The spatial mechanisms mediating symmetry perception’. Vision
Research 37: 2915–2930.
Dakin, S. C., and R. J. Watt (1994). ‘Detection of bilateral symmetry using spatial filters’. Spatial Vision 8:
393–413.
Driver, J., G. C. Baylis, and R. D. Rafal (1992). ‘Preserved figure-ground segregation and symmetry
perception in visual neglect’. Nature 360: 73–75.
Dry, M. (2008). ‘Using relational structure to detect symmetry: A Voronoi tessellation based model of
symmetry perception’. Acta Psychologica 128: 75–90.
Enquist, M., and A. Arak (1994). ‘Symmetry, beauty and evolution’. Nature 372: 169–172.
Fechner, G. T. (1860). Elemente der Psychophysik. (Leipzig: Breitkopf und Härtel).
Feldman, J. (this volume). Probabilistic models of perceptual features. In Oxford Handbook of Perceptual
Organization, edited by J. Wagemans. (Oxford: Oxford University Press).
Fisher, C. B., and M. H. Bornstein (1982). ‘Identification of symmetry: Effects of stimulus orientation and
head position’. Perception and Psychophysics 32: 443–448.
Forstner, D. (1961). Die Welt der Symbole [The world of symbols]. (Innsbruck: Tyrolia Verlag).
Freyd, J., and B. Tversky (1984). ‘Force of symmetry in form perception’. American Journal of Psychology
97: 109–126.
Giurfa, M., B. Eichmann, and R. Menzel (1996). ‘Symmetry perception in an insect’. Nature, 382: 458–461.
Giurfa, M., A. Dafni, and P. R. Neal (1999). ‘Floral symmetry and its role in plant-pollinator systems’.
International Journal of Plant Sciences 160: S41–S50.
Leeuwenberg, E. L. J., and H. F. J. M. Buffart (1984). ‘The perception of foreground and background as
derived from structural information theory’. Acta Psychologica 55: 249–272.
Leeuwenberg, E. L. J., and P. A. van der Helm (2013). Structural information theory: The simplicity of visual
form. (Cambridge, UK: Cambridge University Press).
Leeuwenberg, E. L. J., P. A. van der Helm, and R. J. van Lier (1994). ‘From geons to structure: A note on
object classification’. Perception 23: 505–515.
Locher, P., and G. Smets (1992). ‘The influence of stimulus dimensionality and viewing orientation on
detection of symmetry in dot patterns’. Bulletin of the Psychonomic Society 30: 43–46.
Mach, E. (1886). Beiträge zur Analyse der Empfindungen [Contributions to the analysis of sensations]. (Jena,
Germany: Gustav Fischer).
Machilsen, B., M. Pauwels, and J. Wagemans (2009). ‘The role of vertical mirror symmetry in visual shape
detection’. Journal of Vision 9: 1–11.
MacKay, D. (1969). Information, mechanism and meaning. (Boston: MIT Press).
Malach, R., J. B. Reppas, R. R. Benson, K. K. Kwong, H. Jiang, W. A. Kennedy, P. J. Ledden, T. J. Brady,
B. R. Rosen, and R. B. H. Tootell (1995). ‘Object-related activity revealed by functional magnetic
resonance imaging in human occipital cortex’. Proceedings of the National Academy of Sciences USA
92: 8135–8139.
Maloney, R. K., G. J. Mitchison, and H. B. Barlow (1987). ‘Limit to the detection of Glass patterns in the
presence of noise’. Journal of the Optical Society of America A 4: 2336–2341.
Mancini, S., S. L. Sally, and R. Gurnsey (2005). ‘Detection of symmetry and anti-symmetry’. Vision
Research 45: 2145–2160.
Masame, K. (1986). ‘Rating of symmetry as continuum’. Tohoku Psychologica Folia 45: 17–27.
Masame, K. (1987). ‘Judgment of degree of symmetry in block patterns’. Tohoku Psychologica Folia
46: 43–50.
Møller, A. P. (1990). ‘Fluctuating asymmetry in male sexual ornaments may reliably reveal male quality’.
Animal Behaviour 40: 1185–1187.
Møller, A. P. (1992). ‘Female swallow preference for symmetrical male sexual ornaments’. Nature
357: 238–240.
Møller, A. P. (1995). 'Bumblebee preference for symmetrical flowers'. Proceedings of the National Academy of
Sciences USA 92: 2288–2292.
Morales, D., and H. Pashler (1999). ‘No role for colour in symmetry perception’. Nature 399: 115–116.
Morris, M. R. (1998). ‘Female preference for trait symmetry in addition to trait size in swordtail fish’.
Proceedings of the Royal Society of London B 265: 907–911.
Nucci, M., and J. Wagemans (2007). ‘Goodness of regularity in dot patterns: global symmetry, local
symmetry, and their interactions’. Perception 36: 1305–1319.
Olivers, C. N. L., and P. A. van der Helm (1998). ‘Symmetry and selective attention: A dissociation between
effortless perception and serial search’. Perception and Psychophysics 60: 1101–1116.
Olivers, C. N. L., N. Chater, and D. G. Watson (2004). ‘Holography does not account for
goodness: A critique of van der Helm and Leeuwenberg (1996)’. Psychological Review 111: 261–273.
Osorio, D. (1996). ‘Symmetry detection by categorization of spatial phase, a model’. Proceedings of the Royal
Society of London B 263: 105–110.
Osorio, D., and I. C. Cuthill (this volume). Camouflage and perceptual organization in the animal
kingdom. In Oxford Handbook of Perceptual Organization, edited by J. Wagemans. (Oxford: Oxford
University Press).
Palmer, S. E. (1983). ‘The psychology of perceptual organization: A transformational approach’. In Human and
machine vision, edited by J. Beck, B. Hope, and A. Rosenfeld, pp. 269–339. New York: Academic Press.
Palmer, S. E., J. L. Brooks, and R. Nelson (2003). ‘When does grouping happen?’ Acta Psychologica
114: 311–330.
Palmer, S. E., and K. Hemenway (1978). ‘Orientation and symmetry: Effects of multiple, rotational, and
near symmetries’. Journal of Experimental Psychology: Human Perception and Performance 4: 691–702.
Palmer, S. E., and I. Rock (1994). ‘Rethinking perceptual organization: The role of uniform connectedness’.
Psychonomic Bulletin and Review 1: 29–55.
Pashler, H. (1990). ‘Coordinate frame for symmetry detection and object recognition’. Journal of
Experimental Psychology: Human Perception and Performance 16: 150–163.
Poirier, F. J. A. M. and H. R. Wilson (2010). ‘A biologically plausible model of human shape symmetry
perception’. Journal of Vision 10: 1–16.
Rainville, S. J. M., and F. A. A. Kingdom (2000). ‘The functional role of oriented spatial filters in the
perception of mirror symmetry-psychophysics and modeling’. Vision Research 40: 2621–2644.
Rainville, S. J. M., and F. A. A. Kingdom (2002). ‘Scale invariance is driven by stimulus density’. Vision
Research 42: 351–367.
Rappaport, M. (1957). ‘The role of redundancy in the discrimination of visual forms’. Journal of
Experimental Psychology 53: 3–10.
Rock, I., and R. Leaman (1963). ‘An experimental analysis of visual symmetry’. Acta Psychologica
21: 171–183.
Roddy, G., and R. Gurnsey (2011). ‘Mirror symmetry is subject to crowding’. Symmetry 3: 457–471.
Saarinen, J. (1988). ‘Detection of mirror symmetry in random dot patterns at different eccentricities’. Vision
Research 28: 755–759.
Saarinen, J., and D. M. Levi (2000). 'Perception of mirror symmetry reveals long-range interactions
between orientation-selective cortical filters'. Neuroreport 11: 2133–2138.
Sally, S., and R. Gurnsey (2001). ‘Symmetry detection across the visual field’. Spatial Vision 14: 217–234.
Sasaki, Y., W. Vanduffel, T. Knutsen, C. Tyler, and R. B. H. Tootell (2005). ‘Symmetry activates extrastriate
visual cortex in human and nonhuman primates’. Proceedings of the National Academy of Sciences USA
102: 3159–3163.
Sawada, T., Y. Li, and Z. Pizlo (2011). ‘Any pair of 2D curves is consistent with a 3D symmetric
interpretation’. Symmetry 3: 365–388.
Schmidt, F., and T. Schmidt (2014). ‘Rapid processing of closure and viewpoint-invariant symmetry:
behavioral criteria for feedforward processing’. Psychological Research 78: 37–54.
Scognamillo, R., G. Rhodes, C. Morrone, and D. Burr (2003). ‘A feature-based model of symmetry
detection’. Proceedings of the Royal Society B: Biological Sciences 270: 1727–1733.
Shubnikov, A. V., and V. A. Koptsik (1974). Symmetry in science and art. (New York: Plenum).
Sun, G., D. L. Dilcher, H. Wang, and Z. Chen (2011). ‘A eudicot from the Early Cretaceous of China’.
Nature 471: 625–628.
Swaddle, J., and I. C. Cuthill (1993). ‘Preference for symmetric males by female zebra finches’. Nature
367: 165–166.
Szlyk, J. P., I. Rock, and C. B. Fisher (1995). ‘Level of processing in the perception of symmetrical forms
viewed from different angles’. Spatial Vision 9: 139–150.
Tapiovaara, M. (1990). ‘Ideal observer and absolute efficiency of detecting mirror symmetry in random
images’. Journal of the Optical Society of America A 7: 2245–2253.
Tjan, B. S., and Z. Liu (2005). ‘Symmetry impedes symmetry discrimination’. Journal of Vision 5: 888–900.
Treder, M. S. (2010). ‘Behind the looking-glass: a review on human symmetry perception’. Symmetry 2:
1510–1543.
Treder, M. S., and P. A. van der Helm (2007). ‘Symmetry versus repetition in cyclopean
vision: A microgenetic analysis’. Vision Research 47: 2956–2967.
Treder, M. S., G. van der Vloed, and P. A. van der Helm (2011). ‘Interactions between constituent single
symmetries in multiple symmetry’. Attention, Perception and Psychophysics 73: 1487–1502.
Troscianko, T. (1987). ‘Perception of random-dot symmetry and apparent movement at and near
isoluminance’. Vision Research 27: 547–554.
Tyler, C. W. (1996). ‘Human symmetry perception’. In Human symmetry perception and its computational
analysis, edited by C. W. Tyler, pp. 3–22. (Zeist, The Netherlands: VSP).
Tyler, C. W. (1999). ‘Human symmetry detection exhibits reverse eccentricity scaling’. Visual Neuroscience
16: 919–922.
Tyler, C. W., and L. Hardage (1996). ‘Mirror symmetry detection: Predominance of second-order pattern
processing throughout the visual field’. In Human symmetry perception and its computational analysis,
edited by C. W. Tyler, pp. 157–172. (Zeist, The Netherlands: VSP).
Tyler, C. W., and H. A. Baseler (1998). ‘fMRI signals from a cortical region specific for multiple pattern
symmetries’. Investigative Ophthalmology and Visual Science 39 (Suppl.): 169.
Tyler, C. W., H. A. Baseler, L. L. Kontsevich, L. T. Likova, A. R. Wade, and B. A. Wandell (2005).
‘Predominantly extra-retinotopic cortical response to pattern symmetry’. NeuroImage 24: 306–314.
van der Helm, P. A. (2010). 'Weber-Fechner behaviour in symmetry perception?' Attention, Perception and
Psychophysics 72: 1854–1864.
van der Helm, P. A. (2011). ‘The influence of perception on the distribution of multiple symmetries in
nature and art’. Symmetry 3: 54–71.
van der Helm, P. A. (2014). Simplicity in vision: A multidisciplinary account of perceptual organization.
(Cambridge, UK: Cambridge University Press).
van der Helm, P. A. (this volume). Simplicity in perceptual organization. In Oxford Handbook of Perceptual
Organization, edited by J. Wagemans. (Oxford: Oxford University Press).
van der Helm, P. A., and E. L. J. Leeuwenberg (1991). ‘Accessibility, a criterion for regularity and hierarchy
in visual pattern codes’. Journal of Mathematical Psychology 35: 151–213.
van der Helm, P. A., and E. L. J. Leeuwenberg (1996). 'Goodness of visual regularities: A nontransformational
approach'. Psychological Review 103: 429–456.
van der Helm, P. A., and E. L. J. Leeuwenberg (1999). ‘A better approach to goodness: Reply to Wagemans
(1999)’. Psychological Review 106: 622–630.
van der Helm, P. A., and E. L. J. Leeuwenberg (2004). ‘Holographic goodness is not that bad: Reply to
Olivers, Chater, and Watson (2004)’. Psychological Review 111: 261–273.
van der Helm, P. A., and M. S. Treder (2009). ‘Detection of (anti)symmetry and (anti)repetition: Perceptual
mechanisms versus cognitive strategies’. Vision Research 49: 2754–2763.
van der Vloed, G., Á. Csathó, and P. A. van der Helm (2005). ‘Symmetry and repetition in perspective’.
Acta Psychologica 120: 74–92.
van der Zwan, R., E. Leo, W. Joung, C. R. Latimer, and P. Wenderoth (1998). ‘Evidence that both area V1
and extrastriate visual cortex contribute to symmetry perception’. Current Biology 8: 889–892.
van Lier, R. J., P. A. van der Helm, and E. L. J. Leeuwenberg (1995). ‘Competing global and local
completions in visual occlusion’. Journal of Experimental Psychology: Human Perception and Performance
21: 571–583.
van Tonder, G. J., and D. Vishwanath (this volume). Design insights: Gestalt, Bauhaus and Japanese
gardens. In Oxford Handbook of Perceptual Organization, edited by J. Wagemans. (Oxford: Oxford
University Press).
Vetter, T., and T. Poggio (1994). ‘Symmetric 3D objects are an easy case for 2D object recognition’. Spatial
Vision 8: 443–453.
Wagemans, J. (1993). ‘Skewed symmetry: A nonaccidental property used to perceive visual forms’. Journal
of Experimental Psychology: Human Perception and Performance 19: 364–380.
Wagemans, J. (1997). ‘Characteristics and models of human symmetry detection’. Trends in Cognitive
Science 1: 346–352.
Wagemans, J. (1999). ‘Toward a better approach to goodness: Comments on van der Helm and
Leeuwenberg (1996)’. Psychological Review 106: 610–621.
Wagemans, J., L. van Gool, and G. d’Ydewalle (1991). ‘Detection of symmetry in tachistoscopically
presented dot patterns: Effects of multiple axes and skewing’. Perception and Psychophysics 50: 413–427.
Wagemans, J., L. van Gool, and G. d’Ydewalle (1992). ‘Orientational effects and component processes in
symmetry detection’. The Quarterly Journal of Experimental Psychology 44A: 475–508.
Wagemans, J., L. van Gool, V. Swinnen, and J. van Horebeek (1993). ‘Higher-order structure in regularity
detection’. Vision Research 33: 1067–1088.
Washburn, D. K., and D. W. Crowe (1988). Symmetries of culture: Theory and practice of plane pattern
analysis. (Washington, D.C.: University of Washington Press).
Weber, E. H. (1834). De tactu [Concerning touch]. (New York: Academic Press).
Wenderoth, P. (1994). ‘The salience of vertical symmetry’. Perception 23: 221–236.
Wenderoth, P. (1995). ‘The role of pattern outline in bilateral symmetry detection with briefly flashed dot
patterns’. Spatial Vision 9: 57–77.
Wenderoth, P. (1996a). ‘The effects of dot pattern parameters and constraints on the relative salience of
vertical bilateral symmetry’. Vision Research 36: 2311–2320.
Wenderoth, P. (1996b). ‘The effects of the contrast polarity of dot-pair partners on the detection of bilateral
symmetry’. Perception 25: 757–771.
Wenderoth, P., and S. Welsh (1998). ‘Effects of pattern orientation and number of symmetry axes on the
detection of mirror symmetry in dot and solid patterns’. Perception 27: 965–976.
Wertheimer, M. (1912). ‘Experimentelle Studien über das Sehen von Bewegung’ [Experimental study on
the perception of movement]. Zeitschrift für Psychologie 12: 161–265.
Wertheimer, M. (1923). ‘Untersuchungen zur Lehre von der Gestalt [On Gestalt theory]’. Psychologische
Forschung 4: 301–350.
Weyl, H. (1952). Symmetry. (Princeton, NJ: Princeton University Press).
Wynn, T. (2002). ‘Archaeology and cognitive evolution’. Behavioral and Brain Sciences 25: 389–402,
432–438.
Yodogawa, E. (1982). ‘Symmetropy, an entropy-like measure of visual symmetry’. Perception and
Psychophysics 32: 230–240.
Zanker, J. M. (1995). ‘Does motion perception follow Weber’s law?’ Perception 24: 363–372.
Zhang, L., and W. Gerbino (1992). ‘Symmetry in opposite-contrast dot patterns’. Perception 21
(Supp. 2): 95a.
Zimmer, A. C. (1984). ‘Foundations for the measurement of phenomenal symmetry’. Gestalt Theory
6: 118–157.
Chapter 7
Introduction
Visual objects are viewed as a prime example of hierarchical structure; they can be defined as a "multi-level
hierarchical structure of parts and wholes" (Palmer 1977). For instance, a human body is com-
posed of parts—head, legs, arms, etc., which in turn are composed of parts—eyes, nose, and so forth.
The perceptual relations between wholes and their component parts have been a controversial
issue for psychologists, and for philosophers before them. In psychology it can be traced back to
the controversy between Structuralism and Gestalt. The Structuralists, rooted firmly in British
Empiricism, claimed that perceptions are constructed from atoms of elementary, unrelated local
sensations that are unified by associations due to spatial and temporal contiguity. The Gestalt
theorists rejected both atomism and associationism. According to the doctrine of holism in tra-
ditional Gestalt psychology, a specific sensory whole is qualitatively different from the complex
that one might predict by considering only its individual parts, and the quality of a part depends
upon the whole in which this part is embedded (Köhler 1930/1971; Wertheimer 1923/1938; see
also Wagemans, this volume).
This chapter focuses on some modern attempts to grapple with the issue of part-whole relation-
ships: global precedence and the primacy of holistic properties. I begin with the presentation of
the global precedence hypothesis and the global-local paradigm, followed by a brief review of the
empirical findings concerning the boundary conditions of the global advantage effect, its source
and its brain localization. The following sections focus on the microgenesis and the ontogenesis
of the perception of hierarchical structure. I then discuss some issues concerning the interpreta-
tion of the global advantage effect, present a refinement of terminology between global proper-
ties and holistic/configural properties, and review empirical evidence for this distinction and for
the primacy of holistic properties. I close by briefly considering the implications of the empiri-
cal evidence for the understanding of the perception of hierarchical structure and part-whole
relationship.
Global precedence
The global precedence hypothesis, proposed by Navon (1977), states that perceptual processing
proceeds from the global structure towards analysis of more local details. If a visual object is viewed as represented by a hierarchical network with nested relationships (e.g., Palmer 1977), the globality of a visual property corresponds to the place it occupies in the hierarchy: properties at
the top of the hierarchy are more global than those at the bottom, which in turn are more local.
Consider, for example, a human face: The spatial relationship between the facial components (e.g.,
eyes, nose, mouth) is more global than the specific shapes of the components, and in turn, the
relationship between the subparts of a component is more global than the specific properties of
the subparts. The global precedence hypothesis claims that the processing of an object is global to
130 Kimchi
local; namely, more global properties of a visual object are processed first, followed by analysis of
more local properties.
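The nested part-whole hierarchy described above can be made concrete with a small sketch. This is an illustrative toy representation only (the dictionary structure, the `depth_of` helper, and the example part names are my own, not Palmer's 1977 formalism), in which the globality of a part corresponds to the level it occupies:

```python
# Toy part-whole hierarchy: a visual object as nested parts.
# (Illustrative sketch; not Palmer's 1977 formal network representation.)
face = {
    "face": {
        "eyes": {"iris": {}, "pupil": {}},
        "nose": {"nostrils": {}},
        "mouth": {"lips": {}},
    }
}

def depth_of(tree, part, depth=0):
    """Return the depth of `part` in the hierarchy, or None if absent."""
    for name, subparts in tree.items():
        if name == part:
            return depth
        found = depth_of(subparts, part, depth + 1)
        if found is not None:
            return found
    return None

def more_global(tree, a, b):
    """True if part `a` occupies a higher (more global) level than part `b`."""
    return depth_of(tree, a) < depth_of(tree, b)
```

On this representation, the arrangement of eyes, nose, and mouth is a property of the level "face" (depth 0), while the shape of an iris belongs to depth 2, capturing the sense in which the former is more global than the latter.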
The global precedence hypothesis has been tested by studying the perception of hierarchical
patterns in which larger figures are constructed by suitable arrangement of smaller figures (first
introduced by Asch 1962, and later by Kinchla 1974, 1977). An example is a set of large letters
constructed from the same set of smaller letters having either the same identity as the larger letter
or a different identity (see Figure 7.1). These hierarchical patterns satisfy two conditions, which
were considered by Navon (1977, 1981, 2003) to be critical for testing the hypothesis: first, the
global and local structures can be equated in familiarity, complexity, codability, and identifiability,
so they differ only in level of globality, and second, the two structures can be independent so that
one structure cannot be predicted from the other.
In one experimental paradigm, which has become very popular, observers are presented with
such stimuli and are required to identify the larger (global) or the smaller (local) letter in separate
blocks of trials. Findings of global advantage—namely, faster identification of the global letter
than the local letter and disruptive influence from irrelevant global conflicting information on
local identification (global-to-local interference)—are taken as support for the global precedence
hypothesis (e.g., Navon 1977, experiment 3).
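For illustration, the construction of such hierarchical (Navon-type) letter patterns can be sketched in a few lines. The 5×5 letter grids and the function name below are my own simplifications, not the stimuli actually used by Navon (1977); the point is only that global and local identity can be varied independently to produce consistent or conflicting patterns:

```python
# Sketch of a Navon-style hierarchical letter generator.
# The coarse 5x5 glyphs are illustrative, not Navon's (1977) originals.
GLYPHS = {
    "H": ["X...X",
          "X...X",
          "XXXXX",
          "X...X",
          "X...X"],
    "S": ["XXXXX",
          "X....",
          "XXXXX",
          "....X",
          "XXXXX"],
}

def navon(global_letter: str, local_letter: str) -> str:
    """ASCII rendering of a large letter built from small letters.

    Each filled cell of the global glyph is replaced by the local letter,
    each empty cell by a space, so global and local identity are independent.
    """
    grid = GLYPHS[global_letter]
    return "\n".join(
        "".join(local_letter if c == "X" else " " for c in row)
        for row in grid
    )

conflicting = navon("H", "S")  # global H composed of local S's
consistent = navon("H", "H")   # global H composed of local H's
```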
Much of the research following Navon’s (1977) seminal work has concentrated on delineating the boundary conditions of the global advantage effect, examining its locus (perceptual or post-perceptual), and determining its localization in the brain (see Kimchi 1992, and Navon 2003, for reviews).
Global advantage: boundary conditions. Several studies have pointed out certain variables that
can moderate or even reverse the effect. Global advantage is not likely to occur when the overall
visual angle of the hierarchical stimulus exceeds 7°–10° (Kinchla and Wolfe 1979; Lamb and Robertson 1990), but the effect is merely modulated when the eccentricity of the two levels is equated (e.g.,
Amirkhiabani and Lovegrove 1999; Navon and Norman 1983). Global advantage is also less likely
to occur with spatial certainty than spatial uncertainty (e.g., Lamb and Robertson 1988), with
[Figure 7.1: consistent and conflicting hierarchical letter patterns.]
Fig. 7.1 An example of Navon’s hierarchical letters: large H’s and S’s are composed of small H’s and S’s.
Reprinted from Cognitive Psychology, 9(3), David Navon, Forest before trees: The precedence of global features in visual perception, pp. 353–83, Copyright (1977), with permission from Elsevier.
The Perception of Hierarchical Structure 131
central than peripheral presentation (e.g., Grice et al. 1983; Pomerantz 1983; but see, e.g., Luna
et al. 1990; Navon and Norman 1983), with sparse than dense elements (e.g., Martin 1979), with
few relatively large elements than many relatively small elements (Kimchi 1988; Kimchi and
Palmer 1982, 1985; Yovel et al. 2001), with long than short exposure duration (e.g., Luna 1993;
Paquet and Merikle 1984), and when the goodness or meaningfulness of the local forms is superior to that of the global form (e.g., LaGasse 1994; Poirel et al. 2006; Sebrechts and Fragala
1985). The global advantage effect can also be modulated by direct and indirect attentional manipulations (e.g., Han and Humphreys 2002; Kinchla et al. 1983; Lamb et al. 2000; Robertson 1996;
Ward 1982). For example, Han and Humphreys (2002, experiment 1) showed that when attention
was divided between the local and global levels, the presence of a salient local element, which pre-
sumably captured attention, speeded responses to local targets while slowing responses to global
targets.
The source of global advantage. The source (or the locus) of the global advantage effect is still
disputed. Several investigators concluded that the source of global advantage is perceptual (e.g.,
Andres and Fernandes 2006; Broadbent 1977; Han et al. 1997; Han and Humphreys 1999; Koivisto
and Revonsuo 2004; Miller and Navon 2002; Navon 1977, 1991; Paquet 1999; Paquet and Merikle
1988), possibly as a result of early perceptual-organizational processes (Han and Humphreys 2002;
Kimchi 1998, 2000, 2003b). The involvement of organizational processes in global advantage is
discussed in detail later in the chapter. It has also been suggested that global advantage arises from
a sensory mechanism—faster processing of low spatial frequencies than high spatial frequencies
(e.g., Badcock et al. 1990; Han et al. 2002; Hughes et al. 1990; Shulman et al. 1986; Shulman and
Wilson 1987). Although the differential processing rate of low and high spatial frequencies may
play a role in global and local perception, it cannot account for several findings (e.g., Behrmann
and Kimchi 2003; Kimchi 2000; Navon 2003). For example, it cannot handle the effects of mean-
ingfulness and goodness of form on global/local advantage (e.g., Poirel et al. 2006; Sebrechts and
Fragala 1985). Also, Behrmann and Kimchi (2003) reported that two individuals with acquired
integrative visual object agnosia exhibited normal spatial frequency thresholds in both the high-
and low-frequency range, yet both were impaired, and differentially so, at deriving the global
shape of multi-element hierarchical stimuli. Other investigators suggested that global advantage
arises in some post-perceptual process (e.g., Boer and Keuss 1982; Miller 1981a, 1981b; Ward
1982). This view is supported by the findings demonstrating that attention typically modulates
the global advantage effect (e.g., Kinchla et al. 1983; Lamb et al. 2000; Robertson 1996), but, as
noted by Navon (2003), attention can magnify biases that originate prior to the focusing of atten-
tion. Similarly, an effect that arises at the perceptual level can be magnified by post-perceptual
processes, such as response-related processes (Miller and Navon 2002).
Global advantage: brain localization. Data from behavioral and functional neuroimaging studies
are seen to suggest functional hemispheric asymmetry in global versus local perception, with
the right hemisphere biased toward global processing and the left hemisphere biased toward
local processing (e.g., Delis et al. 1986; Fink et al. 1997; Kimchi and Merhav 1991; Robertson
et al. 1993; Weissman and Woldorff 2005). One view suggests that this asymmetry is related
to the relation between spatial frequency processing and global and local perception. Ivry and
Robertson (1998; Robertson and Ivry 2000), proponents of this view, proposed that there are two
stages of spatial frequency filtering, and the two hemispheres differ in the secondary stage that is
sensitive to the relative rather than absolute spatial frequencies. The left hemisphere emphasizes
information from the higher spatial frequencies within the initially selected range, and the right
hemisphere emphasizes the lower spatial frequencies, with the result that the right hemisphere
is preferentially biased to process global information and the left hemisphere local information.
Alternative accounts for the hemispheric asymmetry in global/local processing include the
proposal of hemispheric differences in sensitivity to the saliency of the stimulus, with the right
hemisphere biased toward more salient objects and the left hemisphere biased toward less salient
objects (Mevorach et al. 2006a, 2006b), and the integration hypothesis, which suggests that the
hemispheres are equivalent with respect to shape identification but differ in their capacities for
integrating shape and level information, with the right hemisphere involved in binding shapes to
the global level and the left hemisphere involved in binding shapes to the local level (Hubner and
Volberg 2005).
[Figure: priming effects (msec) of element similarity and configuration similarity for few-element and many-element patterns.]
few-element patterns, search for local elements was fast and efficient, whereas the global configu-
ration was searched less efficiently (see also, Enns and Kingstone 1995).
The results of the microgenetic analysis show that the relative dominance of the global configu-
ration and the local elements varies during the evolution of the percept, presumably as a result of
grouping and individuation processes that operate in early perceptual processing. Many, relatively
small elements are grouped into global configuration rapidly and effortlessly, providing an early
representation of global structure; the individuation of the elements occurs later and appears to be
time consuming and attention demanding. Few, relatively large elements, on the other hand, are
individuated rapidly and effortlessly and their grouping into a global configuration consumes time
and requires attention. Kimchi (1998) suggested that early and rapid grouping of many small ele-
ments on the one hand, and early and rapid individuation of a few large elements on the other hand,
are desirable characteristics for a system one of whose goals is object identification and recognition, because many small elements close to one another are likely to be texture elements of a single
object, whereas a few large elements are likely to be several discrete objects or several distinctive
parts of a complex object.1
Notwithstanding the critical role of number and relative size of the elements in the micro-
genesis of the perception of hierarchical patterns, additional research has suggested that the
“nature” of the elements also plays an important role (Han et al. 1999; Kimchi 1994, 2000),
further demonstrating the involvement of organizational processes in global advantage. Thus,
when the few, relatively large elements are open-ended line segments as opposed to closed
shapes (Figure 7.3), their configuration, rather than the elements, is available at brief exposure
duration, provided the presence of collinearity and/or closure (Kimchi 2000). Furthermore
the advantage of the global level of many-element patterns can be modulated and even van-
ish, depending on how strongly the local elements group and on the presence of strong cues
to segment the local elements, as when closure is present at the local level (Han et al. 1999;
Kimchi 1994).
1 Note that in these hierarchical patterns the number of elements is correlated with their relative size for strictly
geometrical reasons: increasing the number of elements necessarily results in decreasing their relative size as
long as the overall size of the pattern is kept constant. The effect of relative size can be separated from that of
number by constructing patterns in which there are only a few elements that are relatively small or large, but if the global size is to be kept constant, other factors, such as relative spacing, may be involved. Furthermore, it is
impossible to completely isolate the effect of number from the effect of size because the complete orthogonal
design combining number and relative size would require a geometrically problematic figure—a pattern com-
posed of many relatively large elements (see Kimchi and Palmer 1982, for discussion).
[Figure 7.4 data panel: reaction time slope (ms/item) plotted against age (5, 10, 14, and 23 years) for few-global, many-global, few-local, and many-local conditions.]
Fig. 7.4 (a) Examples of displays in the visual search task used by Kimchi et al. (2005). An example is
shown for each combination of pattern (many-elements or few-elements) and target (global or local).
The target (T) and distractors (D) for each example are indicated. All the examples presented illustrate
display size of 6. (b) Search slopes for global and local targets as a function of pattern and age.
Reproduced from Ruth Kimchi, Batsheva Hadad, Marlene Behrmann, and Stephen E. Palmer, Psychological
Science, 16(4), Microgenesis and Ontogenesis of Perceptual Organization: Evidence From Global and Local
Processing of Hierarchical Patterns, pp. 282–90, doi:10.1111/j.0956-7976.2005.01529.x Copyright © 2005 by
SAGE Publications. Reprinted by Permission of SAGE Publications.
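As background to the search slopes plotted in Figure 7.4, a slope in ms/item is simply the least-squares regression slope of mean reaction time on display size: a near-flat slope indicates efficient ("pop-out") search, whereas a steep slope indicates inefficient, attention-demanding search. A minimal sketch, using made-up illustrative numbers rather than Kimchi et al.'s (2005) data:

```python
# How a visual-search slope (ms/item) is computed: the ordinary least-squares
# slope of mean reaction time (ms) against display size (number of items).
# The data below are invented for illustration, not from Kimchi et al. (2005).

def search_slope(display_sizes, mean_rts):
    """OLS slope of RT (ms) over display size (items)."""
    n = len(display_sizes)
    mx = sum(display_sizes) / n
    my = sum(mean_rts) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(display_sizes, mean_rts))
    sxx = sum((x - mx) ** 2 for x in display_sizes)
    return sxy / sxx

sizes = [3, 6, 9]                    # hypothetical display sizes
efficient = [520.0, 524.0, 528.0]    # near-flat slope: efficient search
inefficient = [540.0, 690.0, 840.0]  # steep slope: inefficient search
```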
These findings may help resolve some of the apparent contradictions in the developmental literature
mentioned earlier. Burack et al. (2000) used few-element patterns and found age-related
improvements in search rates for globally-defined but not for locally-defined targets. Mondloch et al.
(2003), on the other hand, used many-element patterns and found age-related improvements for local
but not for global processing. Thus, depending on the nature of the stimuli used, the different studies
tapped into different processes that emerge along different developmental trajectories.
[Figure: percentage error as a function of age (5, 10, 14, and 22 years) for few-global, many-global, few-local, and many-local conditions.]
Importantly, however, the adult-like grouping of many small elements observed with the
younger children in the visual search and classification tasks (Kimchi et al. 2005) may not
reflect the same level of functioning as the fast and early grouping observed in adults in the
primed matching task (Kimchi 1998), as suggested by the findings of Scherf et al. (2009). Using
the primed matching task, Scherf et al. (2009) found age-related improvement in the ability to
derive the global shape of the many-element patterns at the short prime durations that contin-
ued through adolescence. It is possible then, that different tasks tap into different levels of the
organizational abilities. Children are capable of grouping elements into global configuration to a
certain degree, which may suffice to support performance in the visual search and classification
tasks, but when confronted with a more challenging task, such as primed matching under brief
exposures, adult-like performance emerged only in adolescence, indicating that the full pro-
cess of integrating local elements into coherent shapes to the extent of facilitating global shape
identification develops late into adolescence. This long developmental trajectory coincides with
what is known about the structural and functional development of the ventral visual pathway
(Bachevalier et al. 1991; Gogtay et al. 2004).
The findings concerning the development of the perception of hierarchical structure converge
with other findings reported in the literature, suggesting that there is a protracted developmental
trajectory for some perceptual organization abilities, even those that appear to emerge during
infancy (see Kimchi 2012, for a review and discussion).
argued that globality is inherently confounded with relative size, that it is a fact of nature that rela-
tive size is “an inherent concomitant of part–whole relationship.” This is indeed the case if global
properties are properties of a higher level unit. For example, the shape of a face is larger than the
shape of its nose. Yet, if global properties are meant to be properties that depend on the relation-
ship between the components, as the theoretical motivation for the global precedence hypothesis
seems to imply (e.g., Navon 1977, 2003), then the essential difference between global proper-
ties and component properties is not captured by their relative size. To distinguish, for example,
squareness from the component vertical and horizontal lines of a square, or faceness from the
facial components of a face, based only on their relative sizes would miss the point.
Thus, a refinement of terminology is called for between global properties, which are defined
by the level they occupy within the hierarchical structure of the stimulus, and holistic/configural
properties that arise from the interrelations between the component properties of the stimulus
(Kimchi 1992, 1994). Evidence concerning the primacy of holistic properties and the distinction
between holistic properties and global properties is presented in the next sections.
(a) (b)
(c) (d)
Fig. 7.6 Examples of the stimulus sets for the discrimination and classification tasks used by Kimchi
(1994) and Kimchi and Bloch (1998). Four simple lines that vary in orientation (a) are grouped into the
stimuli in (b). Four simple lines that vary in curvature (c) are grouped into the stimuli in (d). Note that for
the stimuli in (d), configurations that share holistic properties (e.g., closure) are not, unlike those in (b),
simple rotation of one another.
Parts (a) and (b) are reproduced from Ruth Kimchi, The role of wholistic/configural properties versus global
properties in visual form perception, Perception, 23(5), pp. 489–504, doi:10.1068/p230489 © 1994, Pion. With
permission from Pion Ltd, London www.pion.co.uk and www.envplan.com. Parts (c) and (d) are reproduced from
Psychonomic Bulletin & Review, 5(1), pp. 135–139, Dominance of configural properties in visual form perception,
Ruth Kimchi and Benny Bloch, DOI: 10.3758/BF03209469 Copyright © 1998, Springer-Verlag. With kind
permission from Springer Science and Business Media.
property (e.g., oblique lines) but differed in holistic property (closed vs. open). The pattern of per-
formance with the configurations was not predicted by the discriminability of their components;
rather it confirmed the prediction of the hypothesis about the primacy of holistic properties: the
two most difficult discriminations were between stimuli with dissimilar components but similar
holistic properties (square vs. diamond and plus vs. X). Moreover, the discrimination between a
pair of stimuli that differ in a holistic property was equally easy, regardless of whether they dif-
fered in component properties (e.g., the discrimination between square and plus was as easy as the
discrimination between square and X). Also, the easiest classification was the one that was based
on holistic properties, namely the classification that involved grouping of the square and diamond
together and the plus and X together (Kimchi 1994, see also Lasaga 1989). Similar results were
also observed with stimulus sets in which stimuli that shared a holistic property were not a simple
rotation of each other (Figure 7.6c,d; Kimchi and Bloch 1998).
Thus, when both holistic and component properties are present in the stimuli and can be
used for the task at hand, performance is dominated by holistic properties, regardless of the
discriminability of the component properties. When holistic properties are not effective for the
task at hand, discrimination and classification can be based on component properties, but there is
a significant cost relative to performance based on holistic properties.
The primacy of holistic properties is also manifested in the configural superiority effect
(Pomerantz et al. 1977; see also Pomerantz and Cragin, this volume): the discrimination of two
simple oblique lines can be significantly improved by the addition of a context that creates a tri-
angle and an arrow configuration.
Other studies have provided converging evidence for the early representation of holistic proper-
ties. Thus, Kimchi (2000; Hadad and Kimchi 2008), using primed matching, showed that shapes
grouped by closure were primed at very short exposure durations, suggesting that closure was
effective already early in the perceptual process. Holistic properties were also found to be acces-
sible to rapid search (e.g., Rensink and Enns 1995).
Holistic primacy in faces. The case of faces is an interesting one. The “first-order spatial relations” between facial components, namely the basic arrangement of the components (i.e., the eyes above the nose and the mouth below the nose), are distinguished from the “second-order spatial relations”—the spacing of the facial components relative to each other. Facial configuration, or faceness, is the consequence of the former, differentiating faces from other object classes. The configural properties that arise from the latter (e.g., elongation, roundedness) differentiate individual faces (e.g., Diamond
and Carey 1986; Maurer et al. 2002). The dominance of the facial configuration (i.e., faceness) over
the components is easily demonstrated: replacing the components but keeping their spatial arrange-
ment the same does not change the perception of faceness. An example is the “fruit face” painting
by the Renaissance artist Arcimboldo. On the other hand, the relative contribution of configural
properties and component properties to face perception and recognition has been a controversial
issue (e.g., Maurer et al. 2002). Some studies demonstrated that configural properties dominate face
processing (e.g., Bartlett and Searcy 1993; Freire et al. 2000; Leder and Bruce 2000; Murray et al.
2000), and other studies provided evidence that facial features themselves play an important role in
face processing (e.g., Cabeza and Kato 2000; Harris and Nakayama 2008; Schwarzer and Massaro
2001). However, Amishav and Kimchi (2010) demonstrated, using Garner’s (1974) speeded classi-
fication paradigm with proper control of the relative discriminability of the two types of properties,
that perceptual integrality of configural and component properties, rather than relative dominance
of either, is the hallmark of upright face perception (see also Behrmann et al. this volume).
[Figure 7.7 design: type of property (closure vs. line orientation) crossed with level of structure (global vs. local).]
Fig. 7.7 Four sets of four stimuli each, produced by the orthogonal combination of type of property
and level of structure.
Reproduced from Ruth Kimchi, The role of wholistic/configural properties versus global properties in visual form
perception, Perception, 23(5), pp. 489–504, doi:10.1068/p230489 © 1994, Pion. With permission from Pion Ltd,
London www.pion.co.uk and www.envplan.com.
properties at the global or the local levels. The orthogonal combination of type of property and
level of structure produced four sets of four stimuli each (see Figure 7.7). Participants classified
each set of four stimuli on the basis of the variation at either the global or the local level of the
stimuli (global or local classification task). Depending on the stimulus set, classification (global
or local) was based on closure or on line orientation. The results showed that global classification
was faster than local classification only when the local classification was based on line orientation;
no global classification advantage was observed when local classification was based on closure.
Han et al. (1999) used different stimuli (arrows and triangles) and the typical global-local
task. They found a global advantage (i.e., faster RTs for global than for local identification and
global-to-local interference) for both orientation discrimination and closure discrimination, but
the global advantage was much weaker for the closure discrimination task than for the orientation
discrimination task. Under divided-attention conditions, there was a global advantage for orienta-
tion but not for closure discrimination tasks.
Thus, both Kimchi’s (1994) and Han et al.’s (1999) results indicate that relative global or local
advantage for many-element hierarchical patterns depends on whether discrimination at each
level involves configural or nonconfigural properties. When local discrimination involves a con-
figural property like closure, the global advantage markedly decreases or even disappears relative
to the case in which discrimination at that level involves a nonconfigural property like orientation.
These findings converge with the findings reviewed earlier that show a relative perceptual
dominance of configural properties. They also suggest that configural properties are not neces-
sarily global or larger. Leeuwenberg and van der Helm (1991, 2013), using a different approach,
also claim that holistic properties that dominate classification and discrimination of visual forms
are not always global. According to the descriptive minimum principle approach proposed by
Leeuwenberg and van der Helm (see also van der Helm’s chapter on simplicity, this volume), the
specification of dominant properties can be derived from the simplest pattern representations,
and it is the highest hierarchical level in the simplest pattern-representation, the “superstructure,”
that dominates classification and discrimination of visual forms. The “superstructure” is not nec-
essarily global or larger.
Concluding remarks
The vast majority of the findings reviewed in this chapter support the view of holistic dominance.
This dominance can arise from temporal precedence of the global level of structure, as when the
global configuration of a many-element pattern is represented before the elements are individu-
ated (global precedence), or from dominance in information processing, as when holistic properties, such as closure, dominate component properties in discrimination and classification of visual
forms (holistic primacy).
In light of this evidence, a view that holds that the whole is perceived just by assembling compo-
nents is hardly tenable. However, several findings suggest that positing holistic dominance as a rigid
perceptual law is hardly tenable either. Early relative dominance of either the global structure or the
components has been found, depending on certain stimulus factors (e.g., Kimchi 1998, 2000), con-
figural dominance has been found with certain configurations but not with others (e.g., Pomerantz
1981; see also Pomerantz and Cragin, this volume), and the relative dominance of configural proper-
ties versus component properties has been found to depend on their relevance to the task at hand (e.g.,
Han et al., 1999; Pomerantz and Pristach 1989). It is also important to note that there are different
kinds of wholes with different kinds of parts and part-whole relationships. Consider, for example, a face with its eyes, nose, and mouth, and a wall of bricks. Both are visual objects—wholes—but the eyes, nose, and mouth of a face are its component parts, whereas the bricks in the wall are mere constituents. Furthermore, there are weak and strong wholes: mere aggregations of elements, or configurations that preempt their components (see Rock 1986). To complicate things even further (or rather, shed
some light), a distinction has been made between global versus local in terms of relative size and
levels of representation in a hierarchical structure and between holistic/configural versus simple/
component properties (Kimchi 1992, 1994). It is likely, therefore, that global precedence charac-
terizes the course of processing of some wholes but not of others, and that the processing of some
wholes but not of others is dominated by holistic properties; it is also the case that the processing of
some wholes (e.g., faces) is characterized by the integrality of configural and component properties.
On a final note, it is appropriate to comment on holistic dominance and the logical relations
between parts and wholes, or between components and configurations. Components can exist with-
out a global configuration, but a configuration cannot exist without components. Therefore, compo-
nents are logically prior to the configuration of which they are part. Similarly, if holistic/configural
properties do not reside in the component properties but rather emerge from the interrelations
among components, then logic dictates the priority of the components. Holistic dominance is also
not easily reconciled with the classical view of visual hierarchy in the spirit of Hubel and Wiesel
(1968; Maunsell and Newsome 1987). However, the logical structure of the stimulus does not neces-
sarily predict processing consequences at all levels of processing (Garner 1983; Kimchi 1992; Kimchi
and Palmer 1985), and the anatomical, structural aspects of the hierarchy of the visual system can be
distinguished from the temporal, functional aspects of it, taking into account the extensive connections within cortical areas and the massive feedback pathways (e.g., Maunsell and Van Essen 1983). It is
possible, for example, as suggested by Hochstein and Ahissar’s (2002) reverse hierarchy theory, that
implicit, nonconscious, fast perceptual processing proceeds from components to configurations,
whereas, conscious, top-down, task-driven attentional processing begins with configurations and
then descends to components/local details if required by the task.
Acknowledgments
Preparation of this chapter was supported by the Max Wertheimer Minerva Center for Cognitive
Processes and Human Performance, University of Haifa.
Correspondence should be sent to Ruth Kimchi, Department of Psychology, University of
Haifa, Haifa 3498838, Israel; email: rkimchi@research.haifa.ac.il.
References
Amirkhiabani, G. and Lovegrove, W. J. (1999). Do the global advantage and interference effects covary?
Perception and Psychophysics 61(7): 1308–19.
Amishav, R. and Kimchi, R. (2010). Perceptual integrality of componential and configural information in
face processing. Psychonomic Bulletin & Review 17(5): 743–48.
Andres, A. J. D. and Fernandes, M. A. (2006). Effect of short and long exposure duration and dual-tasking
on a global-local task. Acta Psychologica 122(3): 247–66.
Asch, S. E. (1962). A problem in the theory of associations. Psychologische Beiträge 6: 553–63.
Bachevalier, J., Hagger, C., and Mishkin, M. (1991). In N. A. Lassen, D. H. Ingvar, M. E. Raichle, and
L. Friberg (eds.), Brain work and mental activity, Vol. 31, pp. 231–40. Copenhagen: Munksgaard.
Badcock, C. J., Whitworth, F. A., Badcock, D. R., and Lovegrove, W. J. (1990). Low-frequency filtering and
processing of local-global stimuli. Perception 19: 617–29.
Bartlett, J. C. and Searcy, J. (1993). Inversion and configuration of faces. Cognitive Psychology 25(3): 281–316.
Behrmann, M. and Kimchi, R. (2003). What does visual agnosia tell us about perceptual organization
and its relationship to object perception? Journal of Experimental Psychology-Human Perception and
Performance 29(1): 19–42.
Beller, H. K. (1971). Priming: effects of advance information on matching. Journal of Experimental
Psychology 87: 176–82.
Boer, L. C. and Keuss, P. J. G. (1982). Global precedence as a postperceptual effect: An analysis of
speed-accuracy tradeoff functions. Perception & Psychophysics 13: 358–66.
Broadbent, D. E. (1977). The hidden preattentive process. American Psychologist 32(2): 109–18.
Burack, J. A., Enns, J. T., Iarocci, G., and Randolph, B. (2000). Age differences in visual search for
compound patterns: Long-versus short-range grouping. Developmental Psychology 36(6): 731–40.
Cabeza, R. and Kato, T. (2000). Features are also important: Contributions of featural and configural
processing to face recognition. Psychological Science 11(5): 429–33.
Delis, D. C., Robertson, L. C., and Efron, R. (1986). Hemispheric specialization of memory for visual
hierarchical stimuli. Neuropsychologia 24(2): 205–14.
Diamond, R. and Carey, S. (1986). Why faces are and are not special: An effect of expertise. Journal of
Experimental Psychology: General 115(2): 107–17.
Dukette, D. and Stiles, J. (1996). Children’s analysis of hierarchical patterns: Evidence from a similarity
judgment task. Journal of Experimental Child Psychology 63: 103–40.
Dukette, D. and Stiles, J. (2001). The effects of stimulus density on children’s analysis of hierarchical
patterns. Developmental Science 4(2): 233–51.
Enns, J. T. and Kingstone, A. (1995). Access to global and local properties in visual search for compound
stimuli. Psychological Science 6(5): 283–91.
Enns, J. T., Burack, J. A., Iarocci, G., and Randolph, B. (2000). The orthogenetic principle in the perception
of “forests” and “trees”? Journal of Adult Development 7(1): 41–8.
The Perception of Hierarchical Structure 145
Fink, G. R., Halligan, P. W., Marshall, J. C., Frith, C. D., Frackowiak, R. S. J., and Dolan, R. J. (1997).
Neural mechanisms involved in the processing of global and local aspects of hierarchically organized
visual stimuli. Brain 120: 1779–91.
Freeseman, L. J., Colombo, J., and Coldren, J. T. (1993). Individual differences in infant visual
attention: Four-month-olds’ discrimination and generalization of global and local stimulus properties.
Child Development 64(4): 1191–203.
Freire, A., Lee, K., and Symons, L. A. (2000). The face-inversion effect as a deficit in the encoding of
configural information: direct evidence. Perception 29(2): 159–70.
Frick, J. E., Colombo, J., and Allen, J. R. (2000). Temporal sequence of global-local processing in
3-month-old infants. Infancy 1(3): 375–86.
Garner, W. R. (1974). The processing of information and structure. Potomac, MD: Erlbaum.
Garner, W. R. (1978). Aspects of a stimulus: Features, dimensions, and configurations. In E. Rosch and
B. B. Lloyd (eds.), Cognition and categorization, pp. 99–133. Hillsdale, NJ: Erlbaum.
Garner, W. R. (1983). Asymmetric interactions of stimulus dimensions in perceptual information
processing. In T. J. Tighe and B. E. Shepp (eds.), Perception, cognition, and development: Interactional
analysis (pp. 1–37). Hillsdale, NJ: Erlbaum.
Ghim, H.-R. and Eimas, P. D. (1988). Global and local processing by 3- and 4-month-old infants. Perception
& Psychophysics 43(2): 165–71.
Gogtay, N., Giedd, J. N., Lusk, L., Hayashi, K. M., Greenstein, D., Vaituzis, A. C. et al. (2004). Dynamic
mapping of human cortical development during childhood through early adulthood. Proceedings of the
National Academy of Sciences of the United States of America 101(21): 8174–9.
Grice, G. R., Canham, L., and Boroughs, J. M. (1983). Forest before trees? It depends where you look.
Perception & Psychophysics 33(2): 121–8.
Hadad, B. and Kimchi, R. (2008). Time course of grouping of shape by perceptual closure: Effects of spatial
proximity and collinearity. Perception & Psychophysics 70: 818–27.
Han, S. and Humphreys, G. W. (1999). Interactions between perceptual organization based on Gestalt laws
and those based on hierarchical processing. Perception & Psychophysics 61(7): 1287–98.
Han, S. and Humphreys, G. W. (2002). Segmentation and selection contribute to local processing in hierarchical
analysis. The Quarterly Journal of Experimental Psychology: A, Human Experimental Psychology 55(1): 5–21.
Han, S., Fan, S., Chen, L., and Zhuo, Y. (1997). On the different processing of wholes and
parts: A psychophysiological analysis. Journal of Cognitive Neuroscience 9: 687–98.
Han, S., Humphreys, G. W., and Chen, L. (1999). Parallel and competitive processes in hierarchical
analysis: Perceptual grouping and encoding of closure. Journal of Experimental Psychology: Human
Perception and Performance 25(5): 1411–32.
Han, S., Weaver, J. A., Murray, S. O., Kang, X., Yund, E. W., and Woods, D. L. (2002). Hemispheric
asymmetry in global/local processing: effects of stimulus position and spatial frequency. Neuroimage
17(3): 1290–9.
Harris, A. and Nakayama, K. (2008). Rapid adaptation of the M170 response: importance of face parts.
Cerebral Cortex 18(2): 467–76.
Harrison, T. B. and Stiles, J. (2009). Hierarchical forms processing in adults and children. Journal of
Experimental Child Psychology 103(2): 222–40.
Hochstein, S. and Ahissar, M. (2002). View from the top: hierarchies and reverse hierarchies in the visual
system. Neuron 36(5): 791–804.
Hubel, D. H. and Wiesel, T. N. (1968). Receptive fields and functional architecture of monkey striate
cortex. Journal of Physiology 195: 215–43.
Hübner, R. and Volberg, G. (2005). The integration of object levels and their content: a theory of global/
local processing and related hemispheric differences. Journal of Experimental Psychology: Human
Perception and Performance 31(3): 520–41.
Hughes, H. C., Fendrich, R., and Reuter-Lorenz, P. (1990). Global versus local processing in the absence of
low spatial frequencies. Journal of Cognitive Neuroscience 2: 272–82.
Ivry, R. and Robertson, L. C. (1998). The two sides of perception. Cambridge, MA: MIT Press.
Kimchi, R. (1988). Selective attention to global and local levels in the comparison of hierarchical patterns.
Perception & Psychophysics 43(2): 189–98.
Kimchi, R. (1990). Children’s perceptual organisation of hierarchical visual patterns. European Journal of
Cognitive Psychology 2(2): 133–49.
Kimchi, R. (1992). Primacy of wholistic processing and global/local paradigm: A critical review.
Psychological Bulletin 112(1): 24–38.
Kimchi, R. (1994). The role of wholistic/configural properties versus global properties in visual form
perception. Perception 23(5): 489–504.
Kimchi, R. (1998). Uniform connectedness and grouping in the perceptual organization of hierarchical
patterns. Journal of Experimental Psychology: Human Perception and Performance 24(4): 1105–18.
Kimchi, R. (2000). The perceptual organization of visual objects: a microgenetic analysis. Vision Research
40(10–12): 1333–47.
Kimchi, R. (2003a). Relative dominance of holistic and component properties in the perceptual
organization of visual objects. In M. A. Peterson and G. Rhodes (eds.), Perception of faces, objects, and
scenes: Analytic and holistic processes, pp. 235–63. New York, NY: Oxford University Press.
Kimchi, R. (2003b). Visual perceptual organization: A microgenetic analysis. In R. Kimchi, M. Behrmann,
and C. R. Olson (eds.), Perceptual organization in vision: Behavioral and neural perspectives, pp. 117–54.
Mahwah, NJ: Lawrence Erlbaum Associates Publishers.
Kimchi, R. (2012). Ontogenesis and microgenesis of visual perceptual organization. In J. A. Burack, J. T.
Enns, and N. A. Fox (eds.), Cognitive Neuroscience, Development, and Psychopathology, pp. 101–31.
New York: Oxford University Press.
Kimchi, R. and Bloch, B. (1998). Dominance of configural properties in visual form perception.
Psychonomic Bulletin & Review 5(1): 135–9.
Kimchi, R. and Merhav, I. (1991). Hemispheric processing of global form, local form, and texture. Acta
Psychologica 76(2): 133–47.
Kimchi, R. and Palmer, S. E. (1982). Form and texture in hierarchically constructed patterns. Journal of
Experimental Psychology: Human Perception and Performance 8(4): 521–35.
Kimchi, R. and Palmer, S. E. (1985). Separability and integrality of global and local levels of hierarchical
patterns. Journal of Experimental Psychology: Human Perception and Performance 11(6): 673–88.
Kimchi, R., Hadad, B., Behrmann, M., and Palmer, S. E. (2005). Microgenesis and ontogenesis
of perceptual organization: Evidence from global and local processing of hierarchical patterns.
Psychological Science 16(4): 282–90.
Kinchla, R. A. (1974). Detecting target elements in multi-element arrays: A confusability model. Perception
& Psychophysics 15: 149–158.
Kinchla, R. A. (1977). The role of structural redundancy in the perception of visual targets. Perception &
Psychophysics 22: 19–30.
Kinchla, R. A., Macias, S.-V., and Hoffman, J. E. (1983). Attending to different levels of structure in a visual
image. Perception & Psychophysics 33: 1–10.
Kinchla, R. A. and Wolfe, J. M. (1979). The order of visual processing: “Top-down,” “bottom-up,” or
“middle-out.” Perception & Psychophysics 25(3): 225–31.
Köhler, W. (1930/1971). Human perception (M. Henle, trans.). In M. Henle (ed.), The selected papers of
Wolfgang Köhler, pp. 142–67. New York: Liveright.
Koivisto, M. and Revonsuo, A. (2004). Preconscious analysis of global structure: Evidence from masked
priming. Visual Cognition 11(1): 105–27.
LaGasse, L. L. (1994). Effects of good form and spatial frequency on global precedence. Perception &
Psychophysics 53: 89–105.
Lamb, M. R. and Robertson, L. (1988). The processing of hierarchical stimuli: Effects of retinal locus,
location uncertainty, and stimulus identity. Perception & Psychophysics 44: 172–81.
Lamb, M. R. and Robertson, L. C. (1990). The effect of visual angle on global and local reaction times
depends on the set of visual angles presented. Perception & Psychophysics 47(5): 489–96.
Lamb, M. R., Pond, H. M., and Zahir, G. (2000). Contributions of automatic and controlled processes
to the analysis of hierarchical structure. Journal of Experimental Psychology: Human Perception and
Performance 26(1): 234–45.
Lasaga, M. I. (1989). Gestalts and their components: Nature of information-precedence. In
B. S. S. Ballesteros (ed.), Object perception: Structure & Process, pp. 165–202. Hillsdale, NJ: Erlbaum.
Lasaga, M. I. and Garner, W. R. (1983). Effect of line orientation on various information-processing tasks.
Journal of Experimental Psychology: Human Perception and Performance 9(2): 215–25.
Leder, H. and Bruce, V. (2000). When inverted faces are recognized: The role of configural information
in face recognition. Quarterly Journal of Experimental Psychology: Human Experimental Psychology
53A(2): 513–36.
Leeuwenberg, E. and Van der Helm, P. (1991). Unity and variety in visual form. Perception
20(5): 595–622.
Leeuwenberg, E. and Van der Helm, P. A. (2013). Structural Information Theory. Cambridge: Cambridge
University Press.
Luna, D. (1993). Effects of exposure duration and eccentricity of global and local information on processing
dominance. European Journal of Cognitive Psychology 5(2): 183–200.
Luna, D., Merino, J. M., and Marcos-Ruiz, R. (1990). Processing dominance of global and local information
in visual patterns. Acta Psychologica 73(2): 131–43.
Martin, M. (1979). Local and global processing: the role of sparsity. Memory and Cognition 7: 476–84.
Maunsell, J. H. R. and Van Essen, D. C. (1983). The connections of the middle temporal visual area and their
relationship to a cortical hierarchy in macaque monkey. Journal of Neuroscience 3: 2563–86.
Maunsell, J. H. R. and Newsome, W. T. (1987). Visual processing in monkey extrastriate cortex. Annual
Review of Neuroscience 10: 363–401.
Maurer, D., Le Grand, R., and Mondloch, C. J. (2002). The many faces of configural processing. Trends in
Cognitive Sciences 6(6): 255–60.
Mevorach, C., Humphreys, G. W., and Shalev, L. (2006a). Effects of saliency, not global dominance, in
patients with left parietal damage. Neuropsychologia 44(2): 307–319.
Mevorach, C., Humphreys, G. W., and Shalev, L. (2006b). Opposite biases in salience-based selection for
the left and right posterior parietal cortex. Nature Neuroscience 9(6): 740–2.
Miller, J. (1981a). Global precedence in attention and decision. Journal of Experimental Psychology: Human
Perception and Performance 7: 1161–74.
Miller, J. (1981b). Global precedence: Information availability or use? Reply to Navon. Journal of
Experimental Psychology: Human Perception and Performance 7: 1183–5.
Miller, J. and Navon, D. (2002). Global precedence and response activation: evidence from LRPs. The
Quarterly Journal of Experimental Psychology: A, Human Experimental Psychology 55(1): 289–310.
Mondloch, C. J., Geldart, S., Maurer, D., and de Schonen, S. (2003). Developmental changes in the
processing of hierarchical shapes continue into adolescence. Journal of Experimental Child Psychology
84: 20–40.
Murray, J. E., Yong, E., and Rhodes, G. (2000). Revisiting the perception of upside-down faces.
Psychological Science 11(6): 492–6.
Navon, D. (1977). Forest before trees: The precedence of global features in visual perception. Cognitive
Psychology 9: 353–83.
Navon, D. (1981). The forest revisited: More on global precedence. Psychological Research 43: 1–32.
Navon, D. (1991). Testing a queue hypothesis for the processing of global and local information. Journal of
Experimental Psychology: General 120: 173–89.
Navon, D. (2003). What does a compound letter tell the psychologist’s mind? Acta Psychologica 114(3): 273–309.
Navon, D. and Norman, J. (1983). Does global precedence really depend on visual angle? Journal of
Experimental Psychology: Human Perception and Performance 9: 955–65.
Palmer, S. E. (1977). Hierarchical structure in perceptual representation. Cognitive Psychology 9: 441–74.
Paquet, L. (1999). Global dominance outside the focus of attention. Quarterly Journal of Experimental
Psychology: Human Experimental Psychology 52(2): 465–85.
Paquet, L. and Merikle, P. (1984). Global precedence: The effect of exposure duration. Canadian Journal of
Psychology 38: 45–53.
Paquet, L. and Merikle, P. (1988). Global precedence in attended and nonattended objects. Journal of
Experimental Psychology: Human Perception and Performance 14(1): 89–100.
Poirel, N., Pineau, A., and Mellet, E. (2006). Implicit identification of irrelevant local objects interacts with
global/local processing of hierarchical stimuli. Acta Psychologica 122(3): 321–36.
Poirel, N., Mellet, E., Houde, O., and Pineau, A. (2008). First came the trees, then the forest: developmental
changes during childhood in the processing of visual local-global patterns according to the
meaningfulness of the stimuli. Developmental Psychology 44(1): 245–53.
Pomerantz, J. R. (1981). Perceptual organization in information processing. In J. R. Pomerantz and
M. Kubovy (eds.), Perceptual Organization, pp. 141–80. Hillsdale, NJ: Lawrence Erlbaum Associates.
Pomerantz, J. R. (1983). Global and local precedence: Selective attention in form and motion perception.
Journal of Experimental Psychology: General 112(4): 516–40.
Pomerantz, J. R. and Pristach, E. A. (1989). Emergent features, attention, and perceptual glue in visual
form perception. Journal of Experimental Psychology: Human Perception and Performance 15: 635–49.
Pomerantz, J. R., Sager, L. C., and Stoever, R. J. (1977). Perception of wholes and of their component
parts: Some configural superiority effects. Journal of Experimental Psychology: Human Perception and
Performance 3(3): 422–35.
Porporino, M., Shore, D. I., Iarocci, G., and Burack, J. A. (2004). A developmental change in selective
attention and global form perception. International Journal of Behavioral Development 28: 358–64.
Quinn, P. C. and Eimas, P. D. (1986). Pattern-line effects and units of visual processing in infants. Infant
Behavior and Development 9(1): 57–70.
Quinn, P. C., Burke, S., and Rush, A. (1993). Part-whole perception in early infancy: Evidence for
perceptual grouping produced by lightness similarity. Infant Behavior and Development 16(1): 19–42.
Razpurker-Apfeld, I. and Kimchi, R. (2007). The time course of perceptual grouping: The role of
segregation and shape formation. Perception & Psychophysics 69(5): 732–43.
Rensink, R. A. and Enns, J. T. (1995). Preemption effects in visual search: evidence for low-level grouping.
Psychological Review 102: 101–30.
Robertson, L. C. (1996). Attentional persistence for features of hierarchical patterns. Journal of
Experimental Psychology: General 125(3): 227–49.
Robertson, L. C. and Ivry, R. (2000). Hemispheric asymmetries: Attention to visual and auditory primitives.
Current Directions in Psychological Science 9(2): 59–64.
Robertson, L. C., Lamb, M. R., and Zaidel, E. (1993). Interhemispheric relations in processing hierarchical
patterns: Evidence from normal and commissurotomized subjects. Neuropsychology 7(3): 325–42.
Rock, I. (1986). The description and analysis of object and event perception. In K. R. Boff, L. Kaufman and
J. P. Thomas (eds.), Handbook of perception and human performance, Vol. 33, pp. 1–71. New York: Wiley.
Scherf, K. S., Behrmann, M., Kimchi, R., and Luna, B. (2009). Emergence of global shape processing
continues through adolescence. Child Development 80(1): 162–77.
Schwarzer, G. and Massaro, D. W. (2001). Modeling face identification processing in children and adults.
Journal of Experimental Child Psychology 79(2): 139–61.
Sebrechts, M. M. and Fragala, J. J. (1985). Variation on parts and wholes: Information precedence vs. global
precedence. Proceedings of the Seventh Annual Conference of the Cognitive Science Society, pp. 11–18.
Sekuler, A. B. and Palmer, S. E. (1992). Perception of partly occluded objects: A microgenetic analysis.
Journal of Experimental Psychology: General 121(1): 95–111.
Shulman, G. L., Sullivan, M. A., Gish, K., and Sakoda, W. J. (1986). The role of spatial-frequency channels
in the perception of local and global structure. Perception 15: 259–73.
Shulman, G. L. and Wilson, J. (1987). Spatial frequency and selective attention to local and global
information. Neuropsychologia 18: 89–101.
Wagemans, J. (1995). Detection of visual symmetries. Spatial Vision 9(1): 9–32.
Wagemans, J. (1997). Characteristics and models of human symmetry detection. Trends in Cognitive
Sciences 1(9): 346–52.
Ward, L. M. (1982). Determinants of attention to local and global features of visual forms. Journal of
Experimental Psychology: Human Perception and Performance 8: 562–81.
Weissman, D. H. and Woldorff, M. G. (2005). Hemispheric asymmetries for different components of
global/local attention occur in distinct temporo-parietal loci. Cerebral Cortex 15(6): 870–6.
Wertheimer, M. (1923/1938). Laws of organization in perceptual forms. In W. D. Ellis (ed.), A source book of
Gestalt psychology, pp. 71–88. London: Routledge and Kegan Paul.
Yovel, G., Yovel, I., and Levy, J. (2001). Hemispheric asymmetries for global and local visual
perception: Effects of stimulus and task factors. Journal of Experimental Psychology: Human Perception
and Performance 27(6): 1369–85.
Chapter 8
Introduction: seeing statistics
The human visual system has evolved to guide behaviour effectively within complex natural visual
environments. To achieve this goal, the brain must rapidly distil a massive amount of sensory
data into a compact representation that captures important image structure (Marr 1982). Natural
images are particularly rich, in part because the surfaces that populate them are often covered in
markings or texture. This texture can be richly informative, for example about material composi-
tion (Kass and Witkin 1985), but is intrinsically complex since textures are by their nature com-
posed of a large number of individual features. One way the visual system produces a compact
description of complex textures is to exploit redundancy (i.e. the fact that patches of the same image
tend to resemble one another) by characterizing attributes of the features making up the
texture (such as orientation) in terms of local statistical properties (e.g. mean orientation). Indeed,
a useful operational definition of ‘visual texture’ is any image for which a statistical representation
is appropriate. To put it another way, texture is less about the image and more about the quality of
the statistic that can be computed from it (in the context of the task at hand).
Statistics are a sufficient representation of natural texture in the sense that one can synthesize
realistic texture based on statistical descriptions of image features derived from histograms of, for
example, grey levels, local orientation, and spatial frequency structure (Figure 8.1a; Portilla and
Simoncelli 1999). Since they exploit redundancy, these schemes work well on uniform regions
of texture. However, changes in statistics over space also inform our interpretation of natural
scenes. Figure 8.1b is defined by a continuous variation in the average orientation/size and in
the range of orientation/sizes present in the texture. The vivid impression of surface tilt and slant
generated by this image is consistent with the visual system assuming that surface texture is iso-
tropic (i.e. all orientations are equally likely) so that changes in the mean and variance of orien-
tation structure must arise from underlying changes in surface tilt and slant respectively (Malik
and Rosenholtz 1994; Witkin 1981). Furthermore, there is evidence that these statistics drive
a general and active reconstruction process that is used to resolve uncertainty about the local
structure of complex scenes. Texture statistics influence the appearance of elements rendered
uncertain either by visual crowding (Parkes et al. 2001) or by recall within a visual memory task
(Brady and Alvarez 2011).
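The idea that statistical properties can be imposed on noise (as in the lower half of Figure 8.1a) can be illustrated with a rank-order histogram-matching step. This is only a minimal sketch of the principle for first-order grey-level statistics — the Portilla–Simoncelli procedure additionally matches joint wavelet statistics — and the function name is my own:

```python
import numpy as np

def match_histogram(noise, target):
    """Impose the grey-level histogram of `target` onto `noise`.

    Rank-order matching: the darkest noise pixel receives the darkest
    target grey level, and so on. Only first-order (single-pixel)
    statistics are reproduced; spatial structure stays that of the noise.
    """
    noise = np.asarray(noise, dtype=float)
    target = np.asarray(target, dtype=float)
    order = np.argsort(noise, axis=None)         # rank of each noise pixel
    matched = np.empty(noise.size)
    matched[order] = np.sort(target, axis=None)  # assign sorted target values
    return matched.reshape(noise.shape)

rng = np.random.default_rng(0)
noise = rng.normal(size=(64, 64))                 # random pixel-noise seed
target = rng.gamma(shape=2.0, size=(64, 64))      # skewed 'texture' histogram
result = match_histogram(noise, target)
```

The output has exactly the target's grey-level histogram while inheriting the spatial arrangement of the noise, which is why such images look statistically, but not structurally, like the original.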
For the visual system to make accurate statistical descriptions it must combine information
across space and/or time, and in this chapter I focus exclusively on this integration process. This
contrasts with the traditional view of texture perception that emphasizes its role in the segmen-
tation (Rosenholtz chapter) of the distinct surfaces that populate scenes, i.e. in the signalling of
discontinuity—rather than continuity—of feature properties across space.
Note that there is some confusion in the literature over ‘order’ of texture statistics. Bela Julesz
proposed that humans use so-called first- and second-order statistics to capture differences in
texture, i.e. to achieve texture segmentation. According to this terminology, ‘first-order’ refers to
Seeing Statistical Regularities 151
Fig. 8.1 Statistics convey the (a) appearance and (b) shape of texture. (a) Although this image
appears to be entirely natural, with scrutiny one can see that only the top half shows real leaves. The
lower half started its life as random pixel-noise that had statistical properties of the leaves imposed
upon it (Portilla and Simoncelli 1999). While statistical representations capture important properties
of texture, changes in those statistics are also informative. For example, (b) shows a gradient
defined by simultaneous changes in the mean and variance of both the size of elements and their
orientation. Notice how changes in these statistics convey a vivid sense of surface shape.
all grey-level (i.e. measured from single pixels) statistics and ‘second-order’ refers to all statis-
tics of dipoles (pixel-pairs; Julesz 1981; Julesz et al. 1973). In this chapter, I use ‘order’ in the
more conventional sense, i.e. the order of a histogram statistic where variance (for example) is
a second-order statistic because it is computed on the square of the raw data. Thus, statistics
of varying order can be computed on different image features such as ‘pixel luminance’ or ‘disc
size’, and here I will consider statistical representations on a ‘feature-by-feature’ basis. Such an
approach makes the implicit assumption that these features are appropriate ‘basis functions’ for
further visual processing (see Feldman chapter on probabilistic features). For example, consider
Figure 8.2b showing a texture composed of a ramp controlling the range of grey levels present.
While this information is captured by second-order luminance statistics, it is also captured by
the first-order contrast statistics. Indeed, this is a more meaningful characterization of the struc-
ture in that it is contrast and not luminance that is the currency of visually driven responses in
the primate cortex. More specifically, such a texture will lead to a change in the mean response
(a first-order statistic) of a bank of Gabor filters, which (like V1 neurons) are tuned for contrast
and not luminance. This point is made by Kingdom, Hayes, and Field (2001) who argue that a
basis set of spatial-frequency/orientation band-pass Gabor filters (Daugman 1985) is appropri-
ate because Gabors are not only a reasonable model of receptive field organization in V1 but can
also generate an efficient/sparse code for natural image structure (Olshausen and Field 2005). I
will follow this approach and comment on the appropriateness of a basis function (size, orienta-
tion, etc.) with respect to either specific neural mechanism or the standard Gabor model of V1
receptive fields. Finally note that discrimination of the spatial structure of the pattern in Figure
8.2b cannot be achieved by pooling filter-responses across the whole pattern (which, for example,
could not distinguish a horizontal from a vertical gradient). Instead what is required is integration
across space by mechanisms tuned to (confusingly) the ‘second-order’ (here contrast-defined) spa-
tial structure. Such mechanisms are linked to texture segmentation and are considered in depth
elsewhere (Rosenholtz chapter).
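The sense of statistical 'order' used in this chapter — each statistic computed on a successively higher power of the data — can be made concrete for any feature histogram. A sketch in Python/NumPy, here applied to grey levels (the helper name is mine):

```python
import numpy as np

def histogram_moments(x):
    """First- to fourth-order statistics of a sample, in this chapter's
    sense of 'order': each is computed on a higher power of the
    (mean-centred) data."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()                            # 1st order
    var = ((x - mean) ** 2).mean()             # 2nd order
    sd = np.sqrt(var)                          # for grey levels, the
                                               # RMS-contrast predictor
    skew = ((x - mean) ** 3).mean() / sd ** 3  # 3rd order
    kurt = ((x - mean) ** 4).mean() / sd ** 4  # 4th order (3 for a
                                               # normal distribution)
    return mean, var, skew, kurt
```

Applied to pixel grey levels these give mean luminance, contrast (as standard deviation), skew, and kurtosis; applied instead to local orientation estimates, the same recipe gives the orientation statistics discussed later in the chapter — the 'feature-by-feature' approach.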
152 Dakin
Fig. 8.2 Noise textures made up of vertical ‘slices’ varying in (a) first-, (b) second-, (c) third-, and
(d) fourth-order grey-level statistics. Probability density functions for three ‘slices’ through the
image are given to the right of each texture, with curve-colour coding the slice they correspond to.
Probability density functions are Pearson type VII distributions, which allow one to independently
manipulate these statistical moments (http://en.wikipedia.org/wiki/Kurtosis#The_Pearson_type_
VII_family). Note that the normal distribution (a, b, and green curves in c, d) is a special case of this
distribution.
Luminance statistics
Figure 8.2 shows four textures containing left-to-right variation in their (a) first- to (d) fourth-
order luminance (L) statistics. Bauer (2009) reports that elements contribute to average perceived
luminance (or brightness) in proportion to their own perceived brightness, i.e. a power law L^0.33
(Stevens 1961). However, Nam and Chubb (2000) have reported that humans are near veridical
at judging the brightness of textures containing variation in luminance, with elements (broadly)
contributing in proportion to their luminance. Furthermore, Nam and Chubb (2000) acknowl-
edge that while much of their data are well fit by a power function, this tends to over- and under-
emphasize the role of the highest and lowest luminance respectively.
Different image statistics have been proposed to capture our sensitivity to the range of lumi-
nances present (contrast; Figure 8.2b), but a good predictor of perceived contrast in complex
images remains the standard deviation of grey levels (Bex and Makous 2002; Moulden, Kingdom,
and Gatley 1990). It should be evident from Figure 8.2 that the most salient changes in these
noise textures are carried by the first- and second-order luminance statistics. However, Chubb
et al. (2007) showed that observers’ sensitivity to modulation of grey levels is determined by ‘tex-
ture filters’ with sensitivity not only to mean grey level and contrast, but also to a specific type of
grey-level skewness: the presence of dark elements embedded in light backgrounds which they call
‘blackshot’ (Chubb, Econopouly, and Landy 1994). Sensitivity to such skewness cannot be medi-
ated by simple contrast-gain control1 since the response of neurons in lateral geniculate nucleus
(LGN) of the cat are wholly determined by first- and second-order statistics and ignore manipulation
of luminance skew and kurtosis (Figure 8.2c, d; Bonin, Mante, and Carandini 2006). Motoyoshi
1 Processes regulating neural responsivity (gain) as a function of prevailing local contrast and thought to max-
imise information transmission in the visual pathway.
et al. (2007) have suggested that grey-level skewness yields information about surface gloss, with
positive skew (left part of Figure 8.2c) being associated with darker and more glossy surfaces
than skew in the opposite direction (right part of Figure 8.2c). However, it has been argued that
specular reflections (that are largely responsible for skew differences in natural scenes) have
to be appropriately located with respect to underlying surface structure in order for a percept of
gloss to arise (Anderson and Kim 2009; Kim and Anderson 2010). This suggests that perception
of material properties cannot be achieved in the absence of a structural scene analysis. The lack of
any perceptible gloss in Figure 8.2c is consistent with the latter view.
Kingdom et al. (2001) studied sensitivity to changes in contrast histogram statistics (variance,
skew, and kurtosis) by manipulating the contrast, phase, and density of Gabor elements mak-
ing up their textures. They report that a model observer using the distribution of wavelet/filter
responses does a better job of accounting for human discrimination than raw pixel distributions.
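A model observer of this kind can be caricatured in two steps: build an oriented Gabor filter and examine the distribution of its responses over image patches, rather than the raw pixel histogram. The sketch below makes simplifying assumptions (a single even-symmetric filter at one scale and orientation, responses sampled on a coarse grid) and all names are mine:

```python
import numpy as np

def gabor(size, wavelength, orientation_deg, sigma):
    """Even-symmetric Gabor kernel: a sinusoidal carrier under a Gaussian
    envelope, the standard caricature of a V1 receptive field."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    t = np.deg2rad(orientation_deg)
    carrier = np.cos(2 * np.pi * (x * np.cos(t) + y * np.sin(t)) / wavelength)
    g = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2)) * carrier
    return g - g.mean()   # zero DC: responds to contrast, not mean luminance

def filter_responses(image, kernel, step=4):
    """Linear response of `kernel` at a grid of patch positions; the
    histogram of these responses stands in for the wavelet-response
    distribution used by a Kingdom-style model observer."""
    k = kernel.shape[0]
    H, W = image.shape
    return np.array([(image[i:i + k, j:j + k] * kernel).sum()
                     for i in range(0, H - k + 1, step)
                     for j in range(0, W - k + 1, step)])
```

A grating matched to the filter's orientation and wavelength yields a broad, high-variance response distribution, whereas an orthogonal grating barely modulates the filter at all — it is statistics of these responses, not of raw pixels, that best predict discrimination.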
Orientation statistics
In terms of spatial vision, orientation is a critical visual attribute that is made explicit at the earli-
est stages of representation in V1, the primary visual cortex (Hubel and Wiesel 1962). That orien-
tation is a property of a Gabor filter supports it being considered a reasonable basis function for
studying human perception of texture statistics (Kingdom et al. 2001). Furthermore, orientation
is known to be encoded in cortex using a distributed or population code, so that there are natu-
ral comparisons to be made between human coding of orientation statistics and computational
models of orientation coding across neural populations (e.g. Deneve, Latham, and Pouget 1999).
Miller and Sheldon (1969) used magnitude estimation to show that observers could accurately
and precisely judge the average orientation of six lines spanning 20°, with each element con-
tributing in proportion to its physical orientation. Dakin and Watt (1997) had observers clas-
sify if the mean orientation of a spatially unstructured field of elements with orientations drawn
from a Gaussian distribution (e.g. Figure 8.3a, b) was clockwise or anti-clockwise of vertical. For elements
with a standard deviation of 6° observers could judge if the mean orientation was clockwise or
anti-clockwise of vertical as precisely as they could for a sine-wave grating (which contains neg-
ligible variation in orientation2). Using textures composed of two populations of elements with
different means, Dakin and Watt (1997) also showed that observers rely on the mean, and not on,
for example, the mode, to represent global orientation, and that observers can discern changes
in the second-order statistics (orientation variance or standard deviation—s.d.) of a texture but
not in a third-order statistic (orientation skew). Morgan, Chubb, and Solomon (2008) went on to
show that discrimination of changes in orientation s.d. as a function of baseline (‘pedestal’) ori-
entation s.d. follows a dipper-shaped function, i.e. best discrimination arises around a low—but
demonstrably non-zero—level of orientation s.d. Such a pattern of results arises naturally from
an observer basing their judgements on a second-order statistic computed over orientation estimates corrupted by internal noise. However, Morgan et al. found that two-thirds of their observers showed more facilitation³ than predicted by the intrinsic noise model. They speculate that this could arise from the presence of a threshold non-linear transduction of orientation variability
² The range of orientations present in a sine-wave grating (its orientation bandwidth) depends only on the size of the aperture the grating is presented within. In the limit, a grating of infinite size contains only one orientation. For the multi-element textures used in the averaging experiment, orientation bandwidth results from a complex interaction of element size, element orientation, and arrangement.
³ The extent to which performance improves in the presence of a low-variance pedestal.
154 Dakin
(e.g. as it does for blur), which would serve to reduce the visibility of intrinsic noise/uncertainty
and ‘regularize’ the appearance of arrays of oriented elements.
Such orientation statistics provide information that may support other visual tasks. Orientation
variance provides an index of organization that predicts human performance on structure-vs-
noise tasks (Dakin 1999) and can be used as a criterion for selecting filter size for texture process-
ing (Dakin 1997). Baldassi and Burr (2000) presented evidence that texture-orientation statistics
support orientation ‘pop-out’. They showed that observers presented with an array of noisy ori-
ented elements containing a single ‘orientation outlier’ could identify the tilt of the target ele-
ment even when they couldn’t say which element was the target. Furthermore, target orientation
thresholds show a square-root dependency on the number of distractors present, suggesting that
the cue used was the result of averaging target and distractor information. Observers’ ability to
report the orientation of a single element presented in the periphery, and surrounded by distrac-
tors, depends on feature spacing. When target and flanker are too closely spaced visual crowd-
ing arises—a phenomenon whereby observers can see that a target is present but lose detailed
information about its identity (Levi 2008). Using orientation-pop-out stimuli Parkes et al. (2001)
showed that under crowded conditions observers were still able to report the average orientation
(suggesting that target information was not lost but had been combined with the flankers) and
that orientation averaging does not require resolution of the individual components of the texture.
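The square-root dependency on distractor number described above falls out of simple averaging: if the observer's cue is the mean of target plus distractor orientations, the target's signal is diluted by 1/N while orientation noise falls only as 1/√N, so tilt thresholds grow as √N. A minimal Monte Carlo sketch of this argument (all parameter values are illustrative assumptions, not those of Baldassi and Burr):

```python
import math
import random

random.seed(7)

def average_cue_threshold(n_distractors, sigma=8.0, trials=4000, criterion=0.75):
    """Tilt (deg) a lone target needs for `criterion` correct left/right
    responses, if the observer responds with the sign of the average of
    all element orientations (target diluted by vertical distractors)."""
    def prop_correct(tilt):
        correct = 0
        for _ in range(trials):
            items = [random.gauss(0.0, sigma) for _ in range(n_distractors)]
            items.append(random.gauss(tilt, sigma))  # the tilted target
            correct += sum(items) > 0.0              # mean and sum share sign
        return correct / trials
    tilt = 0.5
    while prop_correct(tilt) < criterion:            # coarse geometric search
        tilt *= 1.25
    return tilt

# quadrupling the element count doubles sqrt(N): thresholds roughly double
t_small = average_cue_threshold(3)    # 4 elements in total
t_large = average_cue_threshold(15)   # 16 elements in total
print(t_small, t_large)
```

The analytic prediction is a threshold proportional to σ√N, so the ratio of the two thresholds should sit near √(16/4) = 2.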
Collectively, these findings suggest that some simple global statistics computed from a pool of
local orientation estimates support the detection of salient orientation structure across the visual
field. But how does that process work: does pooling operate in parallel, is it spatially restricted,
and is it local estimation or global pooling that limits human performance? A qualitative compari-
son of orientation discrimination thresholds across conditions will not answer these questions;
rather, one needs to compare performance to an ideal observer. An equivalent noise paradigm
(Figure 8.3a–e) involves measuring the smallest discernible change in mean orientation in the
presence of different levels of orientation variability (Figure 8.3a–c). Averaging performance—
the threshold mean orientation offset (θ)—can then be predicted using:
θ = √((σint² + σext²) / n)  (1)
where σint is the internal noise (i.e. the observer’s effective uncertainty about the orientation of any
one element), σext the external noise (i.e. the orientation variability imposed on the stimulus), and
n the effective number of samples averaged. By fitting this model to our data we can read off the
global limits on performance (the effective number of samples being averaged by observers) and
the local limits on performance (the precision of each estimate). This model provides an excel-
lent account of observers’ ability to average orientation and has allowed us to show that experi-
enced observers, confronted with N elements, judge mean orientation using a global pool of ~√N
elements irrespective of spatial arrangement, indicating no areal limit on orientation averaging
(Dakin 2001). Precision of local samples tends to fall as the number of elements increases, at least
in part due to increases in crowding (Dakin 2001; Dakin et al. 2009; Solomon 2010), although
it persists with widely spaced elements (Dakin 2001). Solomon (2010) showed that the number
of estimates pooled for orientation variance discrimination was actually higher than for mean
orientation, a finding that could perhaps arise from a strategy that weighted the contribution of
elements with ‘outlying’ orientations more heavily.
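The logic of the equivalent noise fit can be illustrated with a short simulation: generate thresholds from Equation 1 with known parameters, then recover them by fitting the two-parameter model. This is a sketch with made-up parameter values, using a brute-force grid search rather than any particular fitting package:

```python
import math

def predicted_threshold(sigma_ext, sigma_int, n):
    # Equation 1: theta = sqrt((sigma_int^2 + sigma_ext^2) / n)
    return math.sqrt(sigma_int**2 + sigma_ext**2) / math.sqrt(n)

# hypothetical thresholds from an observer with sigma_int = 4 deg
# who effectively averages n = 8 samples
ext_levels = [0.0, 2.0, 4.0, 8.0, 16.0]
data = [predicted_threshold(s, 4.0, 8) for s in ext_levels]

# fit the two-parameter model by grid search on log-threshold error
# (thresholds span a wide range, so log error weights levels evenly)
best = None
for sigma_int in [0.5 * i for i in range(1, 41)]:        # 0.5 .. 20 deg
    for n in range(1, 65):                                # 1 .. 64 samples
        err = sum((math.log(predicted_threshold(s, sigma_int, n))
                   - math.log(t))**2
                  for s, t in zip(ext_levels, data))
        if best is None or err < best[0]:
            best = (err, sigma_int, n)

_, fit_sigma_int, fit_n = best
print(fit_sigma_int, fit_n)  # recovers the generating values: 4.0 and 8
```

In practice the fit is made to thresholds measured at several external-noise levels, and the two recovered parameters separate the local limit (σint) from the global limit (n) exactly as described in the text.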
This approach assumes that observers’ averaging strategy does not change with the amount of exter-
nal noise added to the stimulus. Recently, Allard and Cavanagh (2012) questioned this notion, reporting
Fig. 8.3 Probing the statistical representation of orientation. (a, b) Stimuli from a discrimination experiment containing differing ranges of orientation (here (a) σ = 6° or (b) σ = 16°), with (c) the corresponding probability density functions. (d) Observers judge if the average orientation of the elements is clockwise or anti-clockwise of a reference orientation (here, vertical) and one experimentally determines the minimum offset of the mean (the mean-orientation threshold) supporting some criterion level of performance. (e) For an equivalent noise paradigm one measures mean-orientation thresholds at differing levels of orientation variability and fits the results with a model that yields estimates of how many samples are being averaged and how noisy each sample is. (f, g) Stimuli from a detection experiment where observers detect the presence of a subset of elements at a single orientation (here vertical). (h) In coherence paradigms one establishes the minimum proportion of elements required, here (f) 50% or (g) 12.5%, to support discrimination from randomly oriented elements.
that the effective sample size (n) for orientation averaging changed with noise level, which they specu-
late could result from a strategy change whereby observers are less prone to pool orientations that
look the same. These authors estimated sampling by taking ratios of mean-orientation-discrimination
thresholds collected with two different numbers of elements at the same noise level.
Combining Equation 1 with the assumption that internal noise does not change with the num-
ber of elements present, they predicted that threshold ratios should be inversely proportional
to the ratio of sampling rates. However, data from various averaging tasks (Dakin 2001; Dakin,
Mareschal, and Bex 2005a) violate this assumption; estimates of internal/additive noise derived
using Equation 1 change with the number of elements present. For this reason, estimation of sam-
pling efficiency by computing threshold ratios is not reasonable and Allard and Cavanagh’s (2012)
results are equally consistent with rises in additive noise (which Equation 1 attributes to local-
orientation uncertainty) offsetting the benefits of more elements being present. What this study
does do is to highlight the interesting issue of why additive noise should rise with the number of
elements present on screen, especially when crowding is minimized.
Girshick, Landy, and Simoncelli (2011) examined observers’ judgement of mean orientation in
terms of their precision (i.e. threshold, variability of observers’ estimate) and accuracy (i.e. bias,
a systematic tendency to misreport the average). Observers compared the means of texture pairs
composed of orientations where (a) both textures had high variability, (b) both textures had low
variability, or (c) one texture had high and one low variability (this ingenious condition being
designed to reveal intrinsic bias which would be matched—and so cancel—when variability lev-
els were matched across comparisons). The authors not only measured the well-known oblique
effect (lower thresholds for cardinal orientations; Appelle 1972) in low-noise conditions but also
a relative bias effect consistent with observers generally over-reporting cardinal orientations. The
idea is then that (within a Bayesian framework; see Feldman’s chapter on Bayesian models, this volume) observers report the most likely mean orientation using not only the data to hand but also their prior
experience of orientation structure (i.e. from natural scenes). Observers’ performance is limited
both by the noise on their readout (the likelihood term) and their prior expectation. Using an
encoder–decoder approach Girshick et al. (2011) then used variability/bias estimates to infer each
observer’s prior and showed that it closely matched the orientation structure of natural scenes.
Consistent with this view, as their uncertainty rises observers become increasingly reliant on prior expectations based on natural scene statistics, and so are less likely to report oblique orientations (Tomassini, Morgan, and Solomon 2010).
Using a coherence paradigm (Figure 8.3f–h; Newsome and Pare 1988), Husk, Huang, and Hess
(2012) examined orientation processing by measuring observers’ tolerance to the presence of
random-oriented elements when judging overall orientation. They report that coherence thresh-
olds were largely invariant to the contrast, spatial frequency, and number of elements present
(like motion coherence tasks), but that the task showed more dependency on eccentricity than
motion-processing. They further showed that their data could not solely reflect a ‘pure’ integration mechanism (e.g. one computing a vector average of all signal orientations), but must also reflect the limits set by our ability to segment the signal orientation from the noise (a process they model using overlapping spatial filters tuned to the two orientations, i.e. the signal alternatives).
(Watamaniuk, Sekuler, and Williams 1989). Such directional pooling is flexible over a range of
directions (Watamaniuk and Sekuler 1992; Watamaniuk et al. 1989), operates over a large (up to
63 deg2) spatial range (consistent with large MT receptive fields) and over intervals of around 0.5 s
(Watamaniuk and Sekuler 1992).
Interestingly, direction judgements are biased by the luminance content, with brighter elements
contributing more strongly to the perceived direction (Watamaniuk, Sekuler, and McKee 2011).
This is interesting as it suggests that the direction estimates themselves may not reflect the output
of motion-tuned areas like MT which (unlike LGN or V1) exhibit little or no tuning for contrast
once the stimulus is visible (Sclar, Maunsell, and Lennie 1990). This in turn speaks to the appro-
priateness of element direction as a basis function for studying motion averaging. Although it is
widely accepted that percept of global motion in such dot displays does reflect genuine pooling
of local motion and not the operation of a motion-signalling mechanism operating at a coarse
spatial scale, this is based on evidence that, for example, high-pass filtering of stimuli does not reduce integration (Smith, Snowden, and Milne 1994). A more sophisticated motion channel that pooled
coarsely across space but across a range of spatial frequencies (Bex and Dakin 2002) might explain
motion pooling without recourse to explicit representation of individual elements. Motion coherence paradigms (analogous to Figure 8.3f–h) not only assume that local motion is an appropriate level of abstraction of the stimulus but also that a motion coherence threshold can be meaningfully mapped onto a mechanism in the absence of an ideal observer. Barlow and Tripathy’s (1997) comprehensive effort to model motion coherence tasks suggests the limiting factor tends not to be a
limited sampling capacity (of perfectly registered local motion) but correspondence noise (i.e. on
registration of local motion). This is problematic for the studies that use poor performance on
motion coherence tasks as an indicator of an ‘integration deficit’ in a range of neuropsychiatric
and neurodevelopmental disorders (see also de-Wit & Wagemans chapter).
Adapting the equivalent noise approach described for orientation we have also shown that the
oblique effect for motion (poor discrimination around directions other than horizontal and verti-
cal) is a consequence of poor processing of local motion (not reduced global pooling) and that
the pattern of performance mirrors the statistical properties of motion energy in dynamic natural
scenes (Dakin, Mareschal, and Bex 2005b). Furthermore—like orientation—pooling of direction
is flexible and can operate over large areas with little or no effect on the global sampling or on
local uncertainty.
The standard model of motion averaging (Eqn 1) is vector summation—essentially averaging
of individual (noisy) motions. However, such a model fails badly on motion coherence stimuli
(where it is in the observer’s interest to ignore a subset of ‘noise’ directions; Dakin et al. 2005a).
This flexibility—both to average over estimates and to exclude noise where appropriate—can be
captured by a maximum likelihood estimator (MLE). In this context MLEs work by fitting a series
of Gaussian templates (with profiles matched to a series of channels tuned to different direc-
tions) to simulated neural responses (subject to Poisson noise) evoked by the stimulus (Dakin
et al. 2005a). The preferred direction of the best-fitting channel is the MLE direction estimate.
This model can also explain observers’ ability to judge the mean direction of asymmetrical direction distributions better than simple vector averaging of stimulus directions can (Webb, Ledgeway, and McGraw 2007). Furthermore, the presence of multiplicative noise⁴ explains why sampling rate changes, for example, with the number of elements
⁴ Random variability of the response of neurons in the visual pathway often rises in proportion to their mean.
Fig. 8.4 Even though these stimuli contain elements with either (a) low or (b) high levels of size
variability, one can tell that elements are on average (a) bigger or (b) smaller than the reference.
present. The MLE is a population decoder operating on combined neural responses to all of the
elements present. As for any system, the more elements we add, the more information we add
and so we expect the quality of our estimate of direction to improve. However, as the number
of elements rises so do the overall levels of neural activity and with them the multiplicative noise.
The trade-off between gains (arising from the larger sample size) and losses (because of increased noise) is captured by a power-law dependence of the effective number of elements pooled on the number of elements present (Dakin et al. 2005a).
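The decoding scheme just described — channels tuned to different directions, Poisson response noise, and a template match to read out direction — can be sketched as follows. This is a simplified stand-in (a matched-filter readout rather than the full maximum-likelihood channel fit of Dakin et al. 2005a), and every tuning parameter is invented for illustration:

```python
import math
import random

random.seed(3)

CHANNELS = [i * 15 for i in range(24)]    # preferred directions (deg)
TUNING_SD = 30.0                          # assumed tuning width (deg)

def circ_diff(a, b):
    """Signed circular difference in degrees, in (-180, 180]."""
    d = (a - b) % 360.0
    return d - 360.0 if d > 180.0 else d

def tuning(pref, direction):
    return math.exp(-0.5 * (circ_diff(pref, direction) / TUNING_SD) ** 2)

def poisson(lam):
    """Knuth's algorithm: Poisson sample with mean lam."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while p > limit:
        k += 1
        p *= random.random()
    return k - 1

def decode(directions, gain=5.0):
    """Template-match a Poisson-noisy population response to the
    stimulus; returns the best-matching direction (deg)."""
    rates = [gain * sum(tuning(c, d) for d in directions) for c in CHANNELS]
    resp = [poisson(r) for r in rates]
    return max(range(360),
               key=lambda cand: sum(r * tuning(c, cand)
                                    for r, c in zip(resp, CHANNELS)))

# a 30%-coherence stimulus: 15 signal dots at 90 deg amid 35 random dots
dots = [90.0] * 15 + [random.uniform(0.0, 360.0) for _ in range(35)]
est = decode(dots)
print(est)  # typically lands close to 90 despite the random directions
```

Because the random directions spread their activity thinly across all channels while the signal dots pile up on a few, the template readout effectively segments signal from noise — the property that a plain vector average of element directions lacks.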
With respect to the speed of motion, observers can make an estimate of mean (rather than
modal) speed over multiple elements for displays containing asymmetrical distributions of ele-
ment speed (Watamaniuk and Duchon 1992). Speed discrimination thresholds are not greatly
affected by the addition of substantial speed variation (µ = 7.6, σ = 1.7 deg/sec), consistent with
observers’ having a high level of uncertainty about the speed of any one element of the display
(Watamaniuk and Duchon 1992). Observers can make perceptual discriminations based on the
mean and variance of speed information but not on its skewness or kurtosis (Atchley and Andersen
1995). Anecdotally, displays composed of a broad range of speeds often produce a percept not of
coherent movement but of two transparent surfaces composed of either fast or slow elements.
Thus, performance of a mean speed task could be based on which display contains more fast ele-
ments. This strategy could be supported by the standard model of speed perception (where per-
ceived speed depends on the ratio of outputs from two channels tuned to high and low temporal
frequencies; e.g. Tolhurst, Sharpe, and Hart 1973). Simple temporally tuned channels necessarily
operate on a crude spatial stimulus representation and would predict, for example, that observers
would be unable to individuate elements within moving-dot stimuli (Allik 1992).
Size statistics
Looking at Figure 8.4 one is able to tell that the average element size on the left and right is
respectively greater or less than the size of the reference disk in the centre. However, demonstrat-
ing that such a judgement really involves averaging has taken some time. Like orientation, early
work relied on magnitude estimation to show that observers could estimate average line length
(Miller and Sheldon 1969). Ariely (2001) showed that we are better at judging the mean area of
a set of disks than we are at judging the size of any member of the set. Importantly, Chong and
Treisman (2003) determined what visual attribute of the disk was getting averaged by having
observers adjust the size of a single disk to match the mean of two disks. They found (following
Teghtsoonian 1965) that observers pooled a size estimate about halfway between area (A) and diameter (D), i.e. A^0.76. Chong and Treisman (2003) went on to show that observers’ mean-size
estimates for displays containing 12 discs were little affected by size heterogeneity (over a ±0.5
octave range), exposure duration, memory delays, or even the shape of the probability density
function for element size. Note that when discriminating stimuli composed of disks with different
mean size there are potential confounds in terms of either overall luminance or contrast of the
display (for disk or Gabor elements, respectively) as well as the density of elements (if the two sets occupy similarly sized regions). Chong and Treisman (2005) showed that judgements of mean
element size were unlikely to be based on such artefacts; neither mismatching density nor inter-
mingling the two sets to be discriminated greatly impacted performance.
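Chong and Treisman's halfway-between-area-and-diameter result can be made concrete: average disk sizes on the A^0.76 scale and convert back to an equivalent single-disk area. Only the 0.76 exponent comes from the source; the disk areas below are illustrative:

```python
def perceived_size(area):
    # Chong & Treisman (2003): the averaged attribute is roughly A^0.76,
    # midway between area (A^1.0) and diameter (proportional to A^0.5)
    return area ** 0.76

def matched_mean_area(areas):
    """Area of the single disk matching the mean perceived size."""
    mean_p = sum(perceived_size(a) for a in areas) / len(areas)
    return mean_p ** (1.0 / 0.76)

# for disks of area 100 and 400, the match lands between the
# mean-diameter match (area 225) and the mean-area match (250)
match = matched_mean_area([100.0, 400.0])
print(round(match, 1))
```

Averaging in the compressed A^0.76 space weights large disks less than a straight mean of areas would, which is why the matched area falls below the arithmetic mean of 250.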
Although they were carefully conducted, it is difficult to draw definitive conclusions about
the mechanism for size averaging based on these early studies because of the qualitative nature
of their data analyses. Quantitative comparison of human data to the performance of an ideal
observer (that averages a series of noiseless size estimates from a subset of the elements present)
led Myczek and Simons (2008) to conclude that the evidence for size averaging was equivocal.
Performance was frequently consistent with observers not averaging but rather, for example,
reporting the largest element in a display. In response Chong, Joo, Emmanouil, and Treisman
(2008) presented results which are intuitively difficult to reconcile with a lack of averaging
(e.g. superior performance with more elements) but what hampered resolution of this debate
was a consistent failure to apply a single plausible ideal observer model to a complete psy-
chophysical data set. The ideal observer used by Myczek and Simons (2008) limited sample
size but not uncertainty about individual disk sizes, and varied its decision rules based on the
condition. To resolve this debate, Solomon, Morgan, and Chubb (2011) used an equivalent
noise approach, measuring mean size and size-variance discrimination in the presence of dif-
ferent levels of size variability, and modelled results using a variant on Equation 1. Their results
indicate that observers can average 62–75% of elements present to judge size variance and that
(most) observers could use at least three elements when judging mean size. Although Solomon
et al. note that performance was not substantially better than that of an ideal observer using the
largest size present, more recent estimates of sampling for size averaging are closer to an effective sample size of five elements⁵ (Im and Halberda 2013). This suggests that size averaging does
involve some form of pooling. Note that it is a unique benefit of equivalent noise analysis that—
provided one accepts the assumptions of the ideal observer—one can remain agnostic as to the
underlying psychological/neural reality of how averaging works but still definitively establish
that observers perform in a manner that effectively involves averaging across multiple elements.
Recently, however, Allik et al. (2013) have presented compelling evidence that observers not
only use mean size but that this size averaging is compulsory (i.e. taking place without awareness
of individual sizes).
There has been considerable debate in this field as to whether the number of elements pre-
sent influences the observers’ ability to average size. The majority of studies (Allik et al. 2013;
Alvarez 2011; Ariely 2001; Chong and Treisman 2005) report little gain from the addition of
⁵ This is a corrected value based on a reported value of 7, which Allik et al. (2013) point out is an over-estimate (by a factor of √2). This is because the equivalent noise model fit by Im and Halberda (2013) does not allow for a two-interval/two-alternative forced-choice task.
extra elements, which has led some to conclude that this is evidence for a high-capacity parallel
processor of mean size (Alvarez 2011; Ariely 2001). From the point of view of averaging, Allik
et al. (2013) point out that near-constant performance indicates a consistent drop in efficiency
(i.e. sample size divided by number of elements), and propose a variant on the equivalent noise
approach that can account for this pattern of performance.
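Allik et al.'s efficiency argument follows directly from inverting Equation 1: if thresholds stay flat as elements are added, the implied effective sample size is fixed, so efficiency (sample size divided by number of elements) falls as 1/N. A sketch with illustrative numbers:

```python
def implied_samples(theta, sigma_int, sigma_ext):
    # invert Equation 1: n = (sigma_int^2 + sigma_ext^2) / theta^2
    return (sigma_int ** 2 + sigma_ext ** 2) / theta ** 2

# a hypothetical observer whose averaging threshold stays at 5 units
# however many elements are shown (sigma_int = 4, sigma_ext = 8)
n_eff = implied_samples(5.0, 4.0, 8.0)   # = 3.2, independent of display size
for n_elements in [4, 16, 64]:
    print(n_elements, n_eff, n_eff / n_elements)  # efficiency falls as 1/N
```

Constant performance therefore does not imply a high-capacity parallel averager; on this analysis it implies the opposite, a sampling efficiency that collapses as the display grows.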
The development of models of size averaging that link behaviour to neural mechanisms has
been limited by a general lack of knowledge about the neural code for size. As a candidate basis
function for texture averaging, let us once again consider the Gabor model of V1 receptive fields.
Gabors code for spatial frequency (SF), not size. Although SF is likely a central component of
the neural code for size it cannot suffice in isolation (since it confounds size with SF content).
A further complication arises from the finding that the codes for size, number, and density are
intimately interconnected. Randomizing the size or density of elements makes it hard to judge
their number and we have suggested that this is consistent with estimates of magnitude from
texture (element size, density, or number) sharing a common mechanism possibly based on the
relative response of filters tuned to different SFs (Dakin et al. 2011). I note that such a model—
like the notion that a ratio of high to low temporal-frequency-tuned filters could explain speed
averaging—predicts no requirement for individuation of element sizes for successful size averag-
ing (Allik et al. 2013).
Attention
Attneave (1954) argued that statistical characterization of images could provide a compact rep-
resentation of complex visual structure that can distil useful information and so reduce task
demands. In this chapter I have reviewed evidence that the computation of texture statistics pro-
vides one means to achieve this goal. It has been proposed that attention serves essentially the
same purpose, filtering relevant from irrelevant information: ‘it implies withdrawal from some
things in order to deal effectively with others’ (James 1890: 256). How then do attention and
averaging interact? Alvarez and Oliva (2009) used a change-detection task to show that simul-
taneous changes in local and global structure were more detectable, under conditions of high
attentional load, than changes to local features alone. They argue that this is consistent with a
reduction in attention to the background increasing noise in local (but less so in global) representations. However, to perform this task one had only to notice any change in the image, so that
observers could use whatever cue reaches threshold first. Consequently, another interpretation of
these findings is that global judgements are easier so that observers use them when they can. In
order to determine the role of attention in averaging one must have a task where one can quantify
the extent to which observers are relying on local or global information. To this end, an equiva-
lent noise paradigm (see above) has been used to assess the role of attention in averaging and, in
particular, to separate its influence from that of crowding (Dakin et al. 2009). Attentional load and
crowding in an orientation-averaging task have quite distinct effects on observers’ performance.
While crowding effectively made observers uncertain about the orientation of each local element,
attentional restrictions limited global processing, specifically how many elements they could effec-
tively average.
Discussion
My review suggests several commonalities between averaging of various features. Coding seems
to be predominantly limited to first- and second-order statistics (sensitivity to third-order sta-
tistics in the luminance domain likely arises from the cortical basis filters being tuned for con-
trast, itself a second-order statistic). Computation of texture statistics generally exhibits flexibility
about the spatial distribution of elements, and does not require individuation of elements. Many
experimental manipulations of averaging end up influencing the local representation of direction
and orientation (e.g. crowding, eccentricity, absolute direction/orientation) with global pooling/
sampling being influenced only by attention or by the number of elements actually present. The fact that size averaging benefits only modestly, if at all, from the addition of more elements is odd—and has been used to call into question whether size averaging is possible at all. However,
recent equivalent noise experiments suggest that size averaging is possible. Further application of
this technique to determine the influence of number of elements on size averaging would allow us
to determine if the lack of effect of element number represents, for example, a trade-off between
sampling improvements and loss of local information that accompanies an increase in the num-
ber of elements.
I would sound a note of caution about the use of equivalent noise paradigms to study the human
estimation of visual ensemble statistics. The two-parameter model (Equation 1) is a straightfor-
ward means of interpreting discrimination performance in terms of local/global limits on visual
processing. However, this is psychophysics and the parameters such a model yields cannot guar-
antee that the underlying neural mechanism operates in the same manner as the ideal observer.
For example, if your performance on a size-averaging task is best fit by an EN model averaging
three elements, this means you are behaving as though you are averaging a sample of three ele-
ments. In other words, you could not achieve this performance using fewer than three elements.
What it does not say is that you are necessarily averaging a series of estimates at all. As described
above, you could average using all the elements (corrupted by noise) or (if the sampling rate
were low) just a few outlying sizes (i.e. very large or very small). Similarly, estimated internal
noise—which I have termed local noise—reflects the sum of all additive noise to which the system
is prone. Consequently, extra noise terms can be added to the two-parameter model to capture
the influence of late or decisional noise (Solomon 2010). However, wherever noise originates, the
two-parameter form of this expression is still a legitimate means of estimating how much perfor-
mance is being limited by an effective precision on judgements about individual elements and an
effective ability to pool across estimates. I contend that this, like the psychometric function, can be
treated as a compact characterization of performance that is useful for constraining biologically
plausible models of visual processing of texture statistics.
I further submit that current psychophysical data on averaging of luminance, motion, orienta-
tion, speed, and perhaps size suggest a rather simple ‘back-pocket’ model of ensemble statistical encoding. Specifically, it comprises a bank of mechanisms, each pooling a set of input units (with V1-like
properties) distributed over a wide range of spatial locations and spatial frequencies and with
input sensitivities distributed over a Gaussian range of the attribute of interest. Activity of each of these channels is limited by (a) effective noise on each input unit and (b) multiplicative noise
on the pool, and is decoded using a maximum-likelihood/template-matching procedure to con-
fer levels of resistance to uncorrelated noise (of the sort used in coherence paradigms) that a
vector-averaging procedure would be unable to produce.
The cortical locus for the computation of these statistics is unknown. However, it may be ear-
lier than one might think. As well as the unexpected dependence of motion pooling on signal
luminance (indicating pooling of signals generated pre-MT), note also that while observers can
average orientation signals defined by either luminance or contrast, they are unable to average
across stimulus types. This indicates that averaging happens before assignment of an abstract (i.e.
cue-invariant) orientation label (Allen et al. 2003). As well as the issue of neural locus, there
are several other open questions around visual computation of summary statistics. First, what is
actually getting averaged? We have seen some effort in this regard for size averaging—something
between diameter and area (a ‘one-and-a-half-dimensional’ representation?) gets averaged—but
no effort has been made to separate out size from (say) spatial frequency. Building better models
requires an understanding of their input. In this vein, can spatially coarse channels of the kind
described above really provide a sufficient description of images? Such a representation would pre-
dict an almost complete loss of information about individual elements under averaging. Although
that does seem to happen in some circumstances, the limits on the local representation have yet to
be firmly established. And finally, how important are natural scenes in driving our representation
of ensemble statistics other than orientation or motion?
References
Allard, R. and P. Cavanagh (2012). ‘Different Processing Strategies Underlie Voluntary Averaging in Low
and High Noise’. Journal of Vision 12(11): 6. doi: 10.1167/12.11.6
Allen, H. A., R. F. Hess, B. Mansouri, and S. C. Dakin (2003). ‘Integration of First- and Second-Order
Orientation’. Journal of the Optical Society of America. A: Optics, Image Science, and Vision
20(6): 974–986.
Allik, J. (1992). ‘Competing Motion Paths in Sequence of Random Dot Patterns’. Vision Research
32(1): 157–165.
Allik, J., M. Toom, A. Raidvee, K. Averin, and K. Kreegipuu (2013). ‘An Almost General Theory of Mean
Size Perception’. Vision Research 83: 25–39. doi: 10.1016/j.visres.2013.02.018
Alvarez, G. A. and A. Oliva (2009). ‘Spatial Ensemble Statistics are Efficient Codes that Can Be Represented
with Reduced Attention’. Proceedings of the National Academy of Sciences of the United States of America
106(18): 7345–7350. doi: 10.1073/pnas.0808981106
Alvarez, G. A. (2011). ‘Representing Multiple Objects as an Ensemble Enhances Visual Cognition’. Trends
Cogn. Sci. 15(3): 122–131. doi: 10.1016/j.tics.2011.01.003
Seeing Statistical Regularities 163
Anderson, B. L. and J. Kim (2009). ‘Image Statistics Do Not Explain the Perception of Gloss and Lightness’.
Journal of Vision 9(11): 10, 1–17. doi: 10.1167/9.11.10
Appelle, S. (1972). ‘Perception and Discrimination as a Function Of Stimulus Orientation: The “Oblique
Effect” in Man and Animals’. Psychol. Bull. 78(4): 266–278.
Ariely, D. (2001). ‘Seeing Sets: Representation by Statistical Properties’. Psychological Science 12(2): 157–162.
Atchley, P. and G. J. Andersen (1995). ‘Discrimination of Speed Distributions: Sensitivity to Statistical
Properties’. Vision Research 35(22): 3131–3144.
Attneave, F. (1954). ‘Some Informational Aspects of Visual Perception’. Psychol. Rev. 61(3): 183–193.
Baldassi, S. and D. C. Burr (2000). ‘Feature-Based Integration of Orientation Signals in Visual Search’.
Vision Research 40(10–12): 1293–1300.
Barlow, H. and S. P. Tripathy (1997). ‘Correspondence Noise and Signal Pooling in the Detection of
Coherent Visual Motion’. Journal of Neuroscience 17(20): 7954–7966.
Bauer, B. (2009). ‘Does Stevens’s Power Law for Brightness Extend to Perceptual Brightness Averaging?’.
Psychological Record 59: 171–186.
Bex, P. J. and S. C. Dakin (2002). ‘Comparison of the Spatial-Frequency Selectivity of Local and Global
Motion Detectors’. Journal of the Optical Society of America. A: Optics, Image Science, and Vision
19(4): 670–677.
Bex, P. J. and W. Makous (2002). ‘Spatial Frequency, Phase, and the Contrast of Natural Images’. Journal of
the Optical Society of America. A: Optics, Image Science, and Vision 19(6): 1096–1106.
Bonin, V., V. Mante, and M. Carandini (2006). ‘The Statistical Computation Underlying Contrast Gain
Control’. Journal of Neuroscience 26(23): 6346–6353. doi: 10.1523/JNEUROSCI.0284-06.2006
Brady, T. F. and G. A. Alvarez (2011). ‘Hierarchical Encoding in Visual Working Memory: Ensemble
Statistics Bias Memory for Individual Items’. Psychological Science 22(3): 384–392.
doi: 10.1177/0956797610397956
Chong, S. C. and A. Treisman (2003). ‘Representation of Statistical Properties’. Vision Research 43(4): 393–404.
Chong, S. C. and A. Treisman (2005). ‘Statistical Processing: Computing the Average Size in Perceptual
Groups’. Vision Research 45(7): 891–900. doi: 10.1016/j.visres.2004.10.004
Chong, S. C., S. J. Joo, T. A. Emmanouil, and A. Treisman (2008). ‘Statistical Processing: Not so
Implausible After All’. Perception and Psychophysics 70(7): 1327–1334; discussion 1335–1336.
doi: 10.3758/PP.70.7.1327
Chubb, C., J. Econopouly, and M. S. Landy (1994). ‘Histogram Contrast Analysis and the Visual
Segregation of IID Textures’. Journal of the Optical Society of America. A: Optics, Image Science, and
Vision 11(9): 2350–2374.
Chubb, C., J. H. Nam, D. R. Bindman, and G. Sperling (2007). ‘The Three Dimensions of Human Visual
Sensitivity to First-Order Contrast Statistics’. Vision Research 47(17): 2237–2248. doi: 10.1016/j.
visres.2007.03.025
Dakin, S. C. (1997). ‘The Detection of Structure in Glass Patterns: Psychophysics and Computational
Models’. Vision Research 37(16): 2227–2246.
Dakin, S. C. and R. J. Watt (1997). ‘The Computation of Orientation Statistics from Visual Texture’. Vision
Research 37(22): 3181–3192.
Dakin, S. C. (1999). ‘Orientation Variance as a Quantifier of Structure in Texture’. Spatial Vision 12(1): 1–30.
Dakin, S. C. (2001). ‘Information Limit on the Spatial Integration of Local Orientation Signals’. Journal of
the Optical Society of America. A: Optics, Image Science, and Vision 18(5): 1016–1026.
Dakin, S. C., I. Mareschal, and P. J. Bex (2005a). ‘Local and Global Limitations on Direction Integration
Assessed Using Equivalent Noise Analysis’. Vision Research 45(24): 3027–3049. doi: 10.1016/j.
visres.2005.07.037
Dakin, S. C., I. Mareschal, and P. J. Bex (2005b). ‘An Oblique Effect for Local Motion: Psychophysics and
Natural Movie Statistics’. Journal of Vision 5(10): 878–887. doi: 10.1167/5.10.9
164 Dakin
Dakin, S. C., P. J. Bex, J. R. Cass, and R. J. Watt (2009). ‘Dissociable Effects of Attention and Crowding on
Orientation Averaging’. Journal of Vision 9(11): 28, 1–16. doi: 10.1167/9.11.28
Dakin, S. C., M. S. Tibber, J. A. Greenwood, F. A. Kingdom, and M. J. Morgan (2011). ‘A Common Visual
Metric for Approximate Number and Density’. Proceedings of the National Academy of Sciences of the
United States of America 108(49): 19552–19557. doi: 10.1073/pnas.1113195108
Daugman, J. G. (1985). ‘Uncertainty Relation for Resolution in Space, Spatial-Frequency, and Orientation
Optimized by Two Dimensional Cortical Filters’. Journal of the Optical Society of America. A: Optics,
Image Science, and Vision 2: 1160–1169.
Dean, A. F. (1981). ‘The Variability of Discharge of Simple Cells in the Cat Striate Cortex’. Exp. Brain Res.
44(4): 437–440.
Deneve, S., P. E. Latham, and A. Pouget (1999). ‘Reading Population Codes: A Neural Implementation of
Ideal Observers’. Nat. Neurosci. 2(8): 740–745. doi: 10.1038/11205
de Fockert, J. and C. Wolfenstein (2009). ‘Rapid Extraction of Mean Identity from Sets of Faces’. Q. J. Exp.
Psychol. (Hove) 62(9): 1716–1722. doi: 10.1080/17470210902811249
de Gardelle, V. and C. Summerfield (2011). ‘Robust Averaging during Perceptual Judgment’. Proceedings
of the National Academy of Sciences of the United States of America 108(32): 13341–13346. doi: 10.1073/
pnas.1104517108
Girshick, A. R., M. S. Landy, and E. P. Simoncelli (2011). ‘Cardinal Rules: Visual Orientation
Perception Reflects Knowledge of Environmental Statistics’. Nat. Neurosci. 14(7): 926–932.
doi: 10.1038/nn.2831
Greenwood, J. A., P. J. Bex, and S. C. Dakin (2009). ‘Positional Averaging Explains Crowding with
Letter-Like Stimuli’. Proceedings of the National Academy of Sciences of the United States of America
106(31): 13130–13135. doi: 10.1073/pnas.0901352106
Haberman, J. and D. Whitney (2007). ‘Rapid Extraction of Mean Emotion and Gender from Sets of Faces’.
Curr. Biol. 17(17): R751–753. doi: 10.1016/j.cub.2007.06.039
Hubel, D. H. and T. N. Wiesel (1962). ‘Receptive Fields, Binocular Interaction and Functional Architecture in
the Cat’s Visual Cortex’. Journal of Physiology 160: 106–154.
Husk, J. S., P. C. Huang, and R. F. Hess (2012). ‘Orientation Coherence Sensitivity’. Journal of Vision
12(6): 18. doi: 10.1167/12.6.18
Im, H. Y. and J. Halberda (2013). ‘The Effects of Sampling and Internal Noise on the Representation of
Ensemble Average Size’. Atten. Percept. Psychophys. 75(2): 278–286. doi: 10.3758/s13414-012-0399-4
James, W. (1890). The Principles of Psychology. New York: Henry Holt and Co.
Julesz, B., E. N. Gilbert, L. A. Shepp, and H. L. Frisch (1973). ‘Inability of Humans to Discriminate
between Visual Textures that Agree in Second-Order Statistics—Revisited’. Perception 2(4): 391–405.
Julesz, B. (1981). ‘Textons, the Elements of Texture Perception, and their Interactions’. Nature
290(5802): 91–97.
Kass, M. and A. Witkin (1985). ‘Analyzing Oriented Patterns’. Paper presented at the Ninth International
Joint Conference on Artificial Intelligence.
Kim, J. and B. L. Anderson (2010). ‘Image Statistics and the Perception of Surface Gloss and Lightness’.
Journal of Vision 10(9): 3. doi: 10.1167/10.9.3
Kingdom, F. A., A. Hayes, and D. J. Field (2001). ‘Sensitivity to Contrast Histogram Differences in
Synthetic Wavelet-Textures’. Vision Research 41(5): 585–598.
Levi, D. M. (2008). ‘Crowding—an Essential Bottleneck for Object Recognition: A Mini-Review’. Vision
Research 48(5): 635–654. doi: 10.1016/j.visres.2007.12.009
Malik, J. and R. Rosenholtz (1994). ‘A Computational Model for Shape from Texture’. Ciba Foundation
Symposium 184: 272–283; discussion 283–286, 330–338.
Mareschal, I., P. J. Bex, and S. C. Dakin (2008). ‘Local Motion Processing Limits Fine Direction
Discrimination in the Periphery’. Vision Research 48(16): 1719–1725. doi: 10.1016/j.visres.2008.05.003
Watamaniuk, S. N. and A. Duchon (1992). ‘The Human Visual System Averages Speed Information’. Vision
Research 32(5): 931–941.
Watamaniuk, S. N. and R. Sekuler (1992). ‘Temporal and Spatial Integration in Dynamic Random-Dot
Stimuli’. Vision Research 32(12): 2341–2347.
Watamaniuk, S. N., R. Sekuler, and S. P. McKee (2011). ‘Perceived Global Flow Direction Reveals Local
Vector Weighting by Luminance’. Vision Research 51(10): 1129–1136. doi: 10.1016/j.visres.2011.03.003
Webb, B. S., T. Ledgeway, and P. V. McGraw (2007). ‘Cortical Pooling Algorithms for Judging Global
Motion Direction’. Proceedings of the National Academy of Sciences of the United States of America
104(9): 3532–3537. doi: 10.1073/pnas.0611288104
Williams, D. W. and R. Sekuler (1984). ‘Coherent Global Motion Percepts from Stochastic Local Motions’.
Vision Research 24(1): 55–62.
Witkin, A. (1981). ‘Recovering Surface Shape and Orientation from Texture’. Artificial Intelligence 17: 17–47.
Chapter 9
Texture perception
Ruth Rosenholtz
Introduction: What is texture?
The structure of a surface, say of a rock, leads to a pattern of bumps and dips that we can feel with
our fingers. This applies equally well to the surface of skin, the paint on the wall, the surface of a car-
rot, or the bark of a tree. Similarly, the pattern of blades of grass in a lawn, pebbles on the ground,
or fibers in woven material, all lead to a tactile ‘texture’. The surface variations that lead to texture
we can feel also tend to lead to variations in the intensity of light reaching our eyes, producing what
is known as ‘visual texture’ (or here, simply ‘texture’). Visual texture can also come from variations
that do not lend themselves to tactile texture, such as the variation in composition of a rock (quartz
looks different from mica), waves in water, or patterns of surface color such as paint.
Texture is useful for a variety of tasks. It provides a cue to the shape and orientation of a surface
(Gibson 1950). It aids in identifying the material of which an object or surface is made (Gibson
1986). Most obviously relevant for this Handbook, texture similarity provides one cue to perceiv-
ing coherent groups and regions in an image.
Understanding human texture processing requires the ability to synthesize textures with desired
properties. By and large this was intractable before the wide availability of computers. Gibson
(1950) studied shape-from-texture by photographing wallpaper from different angles. Our under-
standing of texture perception would be quite limited if we were restricted to the small set of
textures found in wallpaper. Attneave (1954) gained significant insight into visual representation
by thinking about perception of a random noise texture, though he had to generate that texture
by hand, filling in each cell according to a table of random numbers. Beck (1966; 1967) formed
micropattern textures out of black tape affixed to white cardboard, restricting the micropatterns
to those made of line segments. Olson and Attneave (1970) had more flexibility, as their micropat-
terns were drawn in india ink. Julesz (1962, 1965) was in the enviable position of having access
to computers and algorithms for generating random textures. More recently, texture synthesis
techniques have gotten far more powerful, allowing us to gain new insights into human vision.
It is elucidating to ask why we label the surface variations of tree bark ‘texture’, and the sur-
face variations of the eyes, nose, and mouth ‘parts’ of a face object, or objects in their own right.
One reason for the distinction may be that textures have different identity-preserving transfor-
mations than objects. Shifting around regions within a texture does not fundamentally change
most textures, whereas swapping the nose and mouth on a face turns it into a new object (see also
Behrmann et al., this volume). Two pieces of the same tree bark will not look exactly the same,
but will seem to be the same ‘stuff’, and therefore swapping regions has minimal effect on our
perception of the texture. Textures are relatively homogeneous, in a statistical sense, or at least
slowly varying. Fundamentally, texture is statistical in nature, and one could argue that texture
is stuff that is more compactly represented by its statistics—its aggregate properties—than by the
configuration of its parts (Rosenholtz 1999).
Fig. 9.1 Texture segmentation pairs. (a)–(d): Micropattern textures. (a) Easily segments, and the two
textures have different 2nd-order pixel statistics; (b) also segments fairly easily, yet the textures have the
same 2nd-order statistics; (c) different 2nd-order statistics, does not easily segment, yet it is easy to tell
apart the two textures; (d) neither segments nor is it easy to tell apart the textures. (e,f) Pairs of natural
textures. The pair in (f) is easier to segment, but all four textures are clearly different in appearance.
That texture and objects have different identity-preserving transformations suggests that one
might want to perform different processing on objects than on texture. In the late 1990s, that
was certainly the case in computer vision and image processing. Object recognition algorithms
differed greatly from texture classification algorithms. Algorithms for determining object shape
and pose were very different from those that found the shape of textured surfaces. In image cod-
ing, regions containing texture might be compressed differently than those dominated by objects
(Popat and Picard 1993). The notion of different processing for textures vs. objects was preva-
lent enough that several researchers developed algorithms to find regions of texture in an image,
though this was hardly a popular idea (Karu et al. 1996; Rosenholtz 1999).
However, exciting recent work (Section “Texture perception is not just for textures”) suggests that
human vision employs texture processing mechanisms even when performing object recognition
tasks in image regions not containing obvious ‘texture’. The phenomenon of visual crowding provided
the initial evidence for this hypothesis. However, if true, such mechanisms would influence the
information available for object recognition, scene perception, and diverse tasks in visual cognition.
This chapter reviews texture segmentation, texture classification/appearance, and visual crowd-
ing. It is obviously impossible to fully cover such a diversity of topics in a short chapter. The
material covered will focus on computational issues, on the representation of texture by the visual
system, and on connections between the different topics.
Texture segmentation
Phenomena
An important facet of vision is the ability to perform ‘perceptual organization’, in which the visual
system quickly and seemingly effortlessly transforms individual feature estimates into perception
of coherent regions, structures, and objects. One cue to perceptual organization is texture similar-
ity. The visual system uses this cue in addition to and in conjunction with (Giora and Casco 2007;
Machilsen and Wagemans 2011) grouping by proximity, feature similarity, and good continuation
(see also Brooks, this volume; Elder, this volume).
The dual of grouping by similar texture is important in its own right, and has, in fact, received
more attention. In ‘preattentive’ or ‘effortless’ texture segmentation two texture regions quickly
and easily segregate—in less than 200 milliseconds. Observers may perceive a boundary between
the two. Figure 9.1 shows several examples. Like contour integration and perception of illusory
contours, texture segmentation is a classic Gestalt phenomenon. The whole is different from the
sum of its parts (see also Wagemans, this volume), and we perceive region boundaries which are
not literally present in the image (Figure 9.1a,b).
Researchers have taken performance under rapid presentation, often followed by a mask,
as meaning that texture segmentation is preattentive and occurs in early vision (Julesz 1981;
Treisman 1985). However, the evidence for both claims is somewhat questionable. We do not
really understand in what way rapid presentation limits visual processing. Can higher-level pro-
cessing not continue once the stimulus is removed? Does fast presentation mean preattentive?
(See also Gillebert & Humphreys, this volume.) Empirical results have given conflicting answers.
Mack et al. (1992) showed that texture segmentation was impaired under conditions of inatten-
tion due to the unexpected appearance of a segmentation display during another task. However,
the segmentation boundaries in their stimuli aligned almost completely with the stimulus for
the main task: two lines making up a large ‘+’ sign. This may have made the segmentation task
more difficult. Perhaps judging whether a texture edge occurs at the same location as an actual
line requires attention. Mack et al. (1992) demonstrated good performance at texture segmenta-
tion in a dual-task paradigm. Others (Braun and Sagi 1991; Ben-Av and Sagi 1995) show similar
results for a singleton-detection task they refer to as texture segregation. Certainly performance
with rapid presentation would seem to preclude mechanisms which require serial processing of
the individual micropatterns which make up textures like those in Figure 9.1a–d.
Some pairs of textures segment easily (Figure 9.1a), others with more difficulty (Figure 9.1b).
Some texture pairs are obviously different, even if they do not lead to a clearly perceived segmen-
tation boundary (Figure 9.1c), whereas other texture pairs require a great deal of inspection to tell
the difference (Figure 9.1d). Predicting the difficulty of segmenting any given pair of textures pro-
vides an important benchmark for understanding texture segmentation. Researchers have hoped
that such understanding would provide insight more generally into early vision mechanisms, such
as what features are available preattentively.
Statistics of pixels
When two textures differ sufficiently in their mean luminance, segmentation occurs (Boring 1945;
Julesz 1962). The same seems true for other differences in the luminance histogram (Julesz 1962;
Julesz 1965; Chubb et al. 2007). In other words, a sufficiently large difference between two textures
in their 1st-order luminance statistics leads to effortless segmentation.¹ Differences in 1st-order
chrominance statistics also support segmentation (e.g. Julesz 1965).
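The idea that a large 1st-order difference predicts segmentation can be made concrete with a minimal sketch (Python with NumPy; the toy ‘textures’, the bin count, and the use of total-variation distance are illustrative assumptions, not part of any specific published model):

```python
import numpy as np

def luminance_histogram(texture, bins=8):
    """1st-order statistics: the luminance histogram, which ignores all
    spatial arrangement of the pixels."""
    hist, _ = np.histogram(texture, bins=bins, range=(0.0, 1.0))
    return hist / hist.sum()  # normalize to a probability distribution

def histogram_distance(tex_a, tex_b, bins=8):
    """Total-variation distance between two luminance histograms; a large
    value predicts easy, effortless segmentation."""
    ha = luminance_histogram(tex_a, bins)
    hb = luminance_histogram(tex_b, bins)
    return 0.5 * float(np.abs(ha - hb).sum())

rng = np.random.default_rng(0)
dark = rng.uniform(0.0, 0.5, size=(64, 64))    # lower mean luminance
light = rng.uniform(0.5, 1.0, size=(64, 64))   # higher mean luminance
shuffled = dark.ravel().copy()
rng.shuffle(shuffled)                          # same histogram, scrambled layout
shuffled = shuffled.reshape(dark.shape)

print(histogram_distance(dark, light))     # large: predicted to segment
print(histogram_distance(dark, shuffled))  # zero: identical 1st-order statistics
```

Note that the shuffled texture is indistinguishable from the original by any 1st-order measure, which is exactly why 1st-order statistics alone cannot be the whole story.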
However, differences in 1st-order pixel statistics are not necessary for texture segmentation to
occur. Differences in line orientation between two textures are as effective as differences in bright-
ness (Beck 1966; Beck 1967; Olson and Attneave 1970). Consider micropattern textures formed
of line segments (e.g. Figures 9.1a–c). Differences in the orientations of the line segments predict
segmentation better than either the orientation of the micropatterns, or their rated similarity. An
array of upright Ts segments poorly from an array rotated by 90 degrees; the line orientations are
the same in the two patterns. A T appears more similar to a tilted (45°) T than to an L, but Ts segment from tilted-Ts more readily than they do from Ls.
Julesz (1965) generated textures defined by Markov processes, in which each pixel depends
probabilistically on its predecessors. He observed that one could often see within these textures
clusters of similar brightness values. For example, such clusters might form horizontal stripes, or
dark triangles. Julesz suggested that early perceptual grouping mechanisms might extract these
clusters, and that: ‘As long as the brightness value, the spatial extent, the orientation and the den-
sity of clusters are kept similar in two patterns, they will be perceived as one.’
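A Markov texture of this general kind is easy to generate. The following is an illustrative construction only (the row-wise binary chain and the `p_same` parameter are assumptions for demonstration, not Julesz’s actual generator): each pixel copies its left neighbour with probability `p_same`, so high values yield the horizontal runs of similar brightness described above.

```python
import numpy as np

def markov_texture(height, width, p_same=0.9, seed=0):
    """Binary texture in which each pixel depends probabilistically on its
    left neighbour (a 1st-order Markov chain along each row).  High p_same
    gives long horizontal runs -- the brightness 'clusters' Julesz observed."""
    rng = np.random.default_rng(seed)
    tex = np.zeros((height, width), dtype=int)
    for r in range(height):
        tex[r, 0] = rng.integers(2)              # random starting value
        for c in range(1, width):
            if rng.random() < p_same:
                tex[r, c] = tex[r, c - 1]        # extend the current run
            else:
                tex[r, c] = 1 - tex[r, c - 1]    # flip: start a new run
    return tex

clustered = markov_texture(16, 64, p_same=0.95)  # long runs: visible streaks
noisy = markov_texture(16, 64, p_same=0.5)       # independent pixels: no clusters
```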
It is tempting to observe clusters in Julesz’ examples and conclude that extraction of ‘texture
elements’ (aka texels), underlies texture perception. However, texture perception might also be
mediated by measurement of image statistics, with no intermediate step of identifying clusters.
The stripes and clusters in Julesz’ examples were, after all, produced by random processes. As
Julesz (1975) put it:
[10 years ago], I was skeptical of statistical considerations in texture discrimination because I did not
see how clusters of similar adjacent dots, which are basic for texture perception, could be controlled
¹ Terminology in the field of texture perception stands in a confused state. ‘1st- and 2nd-order’ can refer to
(a) 1st-order histograms of features vs. 2nd-order correlations of those features; (b) statistics involving a
measurement to the first power (e.g. the mean) vs. a measurement to the power of 2 (e.g. the variance)—i.e.
the 1st- and 2nd-moments from mathematics; or (c) a model with only one filtering stage, vs. a model with a
filtering stage, a non-linearity, and then a 2nd filtering stage. This chapter uses the first definition.
and analyzed by known statistical methods . . . In the intervening decade much work went into finding
statistical methods that would influence cluster formation in desirable ways. The investigation led to
some mathematical insights and to the generation of some interesting textures.
The key, for Julesz, was to figure out how to generate textures with desired clusters of dark
and light dots, while controlling their image statistics. With the help of collaborators Gilbert,
Shepp, and Frisch (acknowledged in Julesz 1975), Julesz proposed simple algorithms for gen-
erating pairs of micropattern textures with the same 1st- and 2nd-order pixel statistics. For
Julesz’ black and white textures, 1st-order statistics reduce to the fraction of black dots making
up the texture. 2nd-order or dipole statistics can be measured by dropping ‘needles’ onto a
texture, and observing the frequency with which both ends of the needle land on a black dot,
as a function of needle length and orientation. Such 2nd-order statistics are equivalent to the
power spectrum.
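The needle-dropping procedure can be sketched directly (Python; an exhaustive placement over all valid positions rather than literal random dropping, which estimates the same frequency):

```python
import numpy as np

def dipole_statistic(texture, dx, dy):
    """2nd-order ('dipole') statistic: place a needle with offset (dx, dy)
    at every valid position and record how often BOTH ends land on a black
    (value 1) pixel.  Collected over all offsets, these statistics form the
    texture's autocorrelation, hence their equivalence to the power spectrum."""
    h, w = texture.shape
    a = texture[:h - dy, :w - dx]   # one end of each needle
    b = texture[dy:, dx:]           # the other end, shifted by (dx, dy)
    return float(np.mean(a * b))

rng = np.random.default_rng(1)
tex = (rng.random((128, 128)) < 0.3).astype(float)  # i.i.d. dots, ~30% black

p = dipole_statistic(tex, 0, 0)   # zero-length needle: the black-dot fraction
q = dipole_statistic(tex, 5, 0)   # long needle on i.i.d. dots: close to p**2
```

On an i.i.d. dot texture the long-needle statistic factorizes into the product of the 1st-order statistics; departures from that product are precisely what 2nd-order structure measures.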
Examination of texture pairs sharing 1st- and 2nd-order pixel statistics led to the now-famous
‘Julesz conjecture’: ‘Whereas textures that differ in their first- and second-order statistics can be
discriminated from each other, those that differ in their third- or higher-order statistics usu-
ally cannot’ (Julesz 1975). This theory predicted a number of results, for both random noise and
micropattern-based textures. For instance, the textures in Figure 9.1a differ in their 2nd-order
statistics, and readily segment, whereas the textures in Figure 9.1d share 2nd-order statistics, and
do not easily segment.
Statistics of textons
However, researchers soon found counterexamples to the Julesz conjecture (Caelli and Julesz
1978; Caelli et al. 1978; Julesz et al. 1978; Victor and Brodie 1978). For example, the Δ ➔ texture
pair (Figure 9.1b) is relatively easy to segment, yet the two textures have the same 2nd-order
statistics. A difference in 2nd-order pixel statistics appeared neither necessary nor sufficient for
texture segmentation.
Based on the importance of line orientation in texture segmentation (Beck 1966, 1967; Olson
and Attneave 1970), two new classes of theories emerged. The first suggested that texture segmen-
tation was mediated not by 2nd-order pixel statistics, but rather by 1st-order statistics of basic
stimulus features such as orientation and size (Beck et al. 1983). Here ‘1st-order’ refers to histo-
grams of, e.g., orientation, instead of pixel values.
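A 1st-order feature histogram of this kind can be sketched as follows (Python; image gradients serve here as a crude stand-in for oriented-filter measurements, an assumption for illustration, as are the toy stripe stimuli):

```python
import numpy as np

def orientation_histogram(img, bins=8):
    """1st-order statistics of a basic feature: a histogram of local
    orientation, weighted by edge strength.  Gradients are a crude stand-in
    for the oriented-filter measurements in the actual models."""
    gy, gx = np.gradient(img)
    mag = np.hypot(gx, gy)                    # local edge strength
    ang = np.mod(np.arctan2(gy, gx), np.pi)   # orientation is defined mod 180 deg
    hist, _ = np.histogram(ang, bins=bins, range=(0.0, np.pi), weights=mag)
    return hist / max(hist.sum(), 1e-12)

x = np.linspace(0.0, 4.0 * np.pi, 32)
vertical_stripes = np.tile(np.sin(x), (32, 1))  # luminance varies left-to-right
horizontal_stripes = vertical_stripes.T         # the same texture, rotated 90 deg
```

The two stripe patterns have identical luminance histograms but concentrate their orientation histograms in different bins, so a difference in this feature histogram predicts segmentation where pixel statistics do not.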
But what of the Δ ➔ texture pair? By construction, it contained no difference in the 1st-order
statistics of line orientation. However, notably triangles are closed shapes, whereas arrows are not.
Perhaps emergent features (Pomerantz & Cragin, this volume), like closure, also matter in texture
segmentation. Other iso-2nd order pairs hinted at the relevance of additional higher-level fea-
tures, dubbed textons. Texton theory proposes that segmentation depends upon 1st-order statis-
tics not only of basic features like orientation, but also of textons such as curvature, line endpoints,
and junctions (Julesz 1981; Bergen and Julesz 1983).
While intuitive on the surface, this explanation was somewhat unsatisfying. Proponents were
vague about the set of textons, making the theory difficult to test or falsify. In addition, it was
not obvious how to extract textons, particularly for natural images (Figure 9.1e,f). (Though see
Barth et al. (1998), for both a principled definition of a class of textons, and a way to measure
them in arbitrary images.) Texton theories have typically been based on verbal descriptions of
image features rather than actual measurements (Bergen and Adelson 1988). These ‘word models’
effectively operate on ‘things’ like ‘closure’ and ‘arrow junctions’ which a human experimenter has
labeled (Adelson 2001).
alignment, and sign of contrast (Graham et al. 1992; Beck et al. 1987), for which word models
inherently have trouble making predictions.
Fig. 9.2 Comparison of the information encoded in different texture descriptors. (a) Original
peas image; (b) texture synthesized to have the same power spectrum as (a), but random phase.
This representation cannot capture the structures visible in many natural and artificial textures,
though it performs adequately for some textures such as the left side of Figure 9.1e. (c) Marginal
statistics of multiscale, oriented and non-oriented filter banks better capture the nature of edges
in natural images. (d) Joint statistics work even better at capturing structure.
Data from D.J. Heeger and J.R. Bergen, Pyramid-based texture analysis/synthesis, Proceedings of the 22nd
annual conference on Computer graphics and interactive techniques (SIGGRAPH ‘95), IEEE Computer Society
Press, Silver Spring, MD, 1995. Data from E.P. Simoncelli and B.A. Olshausen, Natural image statistics and neural
representation, Annual Review of Neuroscience, 24, pp. 1193–1216, 2001.
The visual system may do something intelligent, like a statistical test (Voorhees and Poggio 1988;
Puzicha et al. 1997; Rosenholtz 2000), or Bayesian inference (Lee 1995; Feldman, on Bayesian
models, this volume), when detecting texture boundaries within an image. These decisions can
be implemented using biologically plausible image processing operations, thus bringing together
image processing-based and statistical models of texture segmentation.
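As an illustration of the statistical-test idea (a generic sketch, not any particular published model), a Pearson chi-square statistic comparing feature histograms from two patches gives a simple measure of boundary evidence:

```python
import numpy as np

def chi2_statistic(counts_a, counts_b):
    """Pearson chi-square statistic comparing two feature histograms, e.g.
    orientation histograms from patches on either side of a candidate
    boundary.  Larger values mean stronger evidence that the patches were
    drawn from different textures."""
    counts_a = np.asarray(counts_a, dtype=float)
    counts_b = np.asarray(counts_b, dtype=float)
    na, nb = counts_a.sum(), counts_b.sum()
    total = counts_a + counts_b
    keep = total > 0                   # skip empty bins
    exp_a = total * na / (na + nb)     # expected counts if 'same texture'
    exp_b = total * nb / (na + nb)
    return float(np.sum((counts_a[keep] - exp_a[keep]) ** 2 / exp_a[keep])
                 + np.sum((counts_b[keep] - exp_b[keep]) ** 2 / exp_b[keep]))

same = chi2_statistic([40, 10, 10, 40], [38, 12, 11, 39])  # similar histograms
diff = chi2_statistic([40, 10, 10, 40], [10, 40, 40, 10])  # very different ones
```

Thresholding such a statistic at each candidate boundary location is one way the ‘statistical test’ view can be reduced to simple, biologically plausible comparisons of pooled feature counts.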
The parallels to texture segmentation should be obvious: researchers rightly skeptical about
the power of simple statistical models abandoned them in favor of models operating on discrete
‘things’. As with texture segmentation, the lack of faith in statistical models proved unfounded.
Sufficiently rich statistical models can capture a lot of structure. Demonstrating this requires
more complex texture synthesis methodologies to find samples of texture with the same statis-
tics. A number of texture synthesis techniques have been developed, with a range of proposed
descriptors.
Heeger and Bergen’s (1995) descriptor, motivated by the success of the LNL segmentation mod-
els, consists of marginal (i.e. 1st-order) statistics of the outputs of multiscale filters, both oriented
and unoriented. Their algorithm synthesizes new samples of texture by beginning with an arbi-
trary image ‘seed’—often a sample of random noise, though this is not required—and iteratively
applying constraints derived from the measured statistics. After a number of iterations, the result
is a new image with (approximately) the same 1st-order statistics as the original. Figure 9.2c shows
an example. Their descriptor captures significantly more structure than the power spectrum;
enough to reproduce the general size of the peas and their dimples. It still does not quite get the
edges right, and misrepresents larger-scale structures.
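The core constraint step of such a synthesis can be sketched in isolation (Python; this imposes only the pixel histogram, whereas Heeger and Bergen apply the analogous step to every subband of a multiscale filter bank and then iterate):

```python
import numpy as np

def match_histogram(seed, target):
    """One constraint step of Heeger/Bergen-style synthesis: impose the
    target's 1st-order (marginal) statistics on the seed image while keeping
    the seed's spatial rank order.  The full algorithm applies this step to
    every subband of a multiscale filter bank, then iterates."""
    order = np.argsort(seed, axis=None)          # rank order of seed pixels
    matched = np.empty(seed.size)
    matched[order] = np.sort(target, axis=None)  # hand out target values by rank
    return matched.reshape(seed.shape)

rng = np.random.default_rng(0)
target = rng.gamma(2.0, size=(32, 32))  # stand-in for a measured texture
seed = rng.normal(size=(32, 32))        # random-noise starting image
synth = match_histogram(seed, target)   # now has exactly the target's histogram
```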
Portilla and Simoncelli (2000) extended the Heeger/Bergen methodology, and included in their
texture descriptor the joint (2nd-order) statistics of responses of multiscale V1-like simple and
complex ‘cells’. Figure 9.2d shows an example synthesis. This representation captures much of
the perceived structure, even in micropattern textures (Portilla and Simoncelli 2000; Balas 2006),
though it is not perfect. Some non-parametric synthesis techniques have performed better at
producing new textures that look like the original (e.g. Efros and Leung 1999). However, these
techniques use a texture descriptor that is essentially the entire original image. It is unclear how
biologically plausible such a representation might be, or what the success of such techniques teaches
us about human texture perception.
Portilla and Simoncelli (2000), then, remains a state-of-the-art parametric texture model.
This does not imply that its measurements are literally those made by the visual system, though
they are certainly biologically plausible. A ‘rotation’ of the texture space would maintain the
same information while changing the representation dramatically. Furthermore, a sufficiently
rich set of 1st-order statistics can encode the same information as higher-order statistics (Zhu
et al. 1996). However, the success of Portilla and Simoncelli’s model demonstrates that a rich and
high-dimensional set of image statistics comes close to capturing the information preserved and
lost in visual representation of a texture.
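The flavour of such joint statistics can be illustrated with a deliberately crude two-channel ‘filter bank’ (Python; finite differences standing in for oriented filters is an assumption for illustration, whereas the actual model uses a multiscale steerable pyramid):

```python
import numpy as np

def oriented_responses(img):
    """Crude two-channel 'filter bank': horizontal and vertical finite
    differences standing in for oriented V1-like simple cells."""
    horiz = np.diff(img, axis=1)[:-1, :]   # responds to vertical structure
    vert = np.diff(img, axis=0)[:, :-1]    # responds to horizontal structure
    return horiz, vert

def joint_statistics(img):
    """Channel variances (marginal, within each channel) plus the
    cross-channel correlation -- a toy version of the joint measurements
    layered on top of 1st-order statistics."""
    h, v = oriented_responses(img)
    h, v = h.ravel(), v.ravel()
    return float(h.var()), float(v.var()), float(np.corrcoef(h, v)[0, 1])

stripes = np.tile((np.arange(16) % 2).astype(float), (16, 1))  # vertical stripes
h, v = oriented_responses(stripes)
# h carries all the energy (the structure is vertical); v is silent
```

The marginal variances already distinguish the stripes from an isotropic texture; the cross-channel correlations are the kind of extra joint measurement that lets the descriptor capture extended structure such as contours and corners.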
Peripheral crowding
Texture processing mechanisms have been associated with visual search (Treisman 1985) and set
perception (Chong and Treisman 2003). One can argue that texture statistics naturally inform
these tasks. Evidence of more general texture processing in vision has come from the study of
peripheral vision, in particular visual crowding.
Peripheral vision is substantially worse than foveal vision. For instance, the eye trades off sparse
sampling over a wide area in the periphery for sharp, high-resolution vision over a narrow fovea.
If we need finer detail, we move our eyes to bring the fovea to the desired location.
The phenomenon of visual crowding² illustrates that loss of information in the periphery is not
merely due to reduced acuity. A target such as the letter ‘A’ is easily identified when presented in
the periphery on its own, but becomes difficult to recognize when flanked too closely by other
stimuli, as in the string of letters, ‘BOARD’. An observer might see these crowded letters in the
wrong order, perhaps confusing the word with ‘BORAD’. They might not see an ‘A’ at all, or might
see strange letter-like shapes made up of a mixture of parts from several letters (Lettvin 1976).
Crowding occurs with a broad range of stimuli (see Pelli and Tillman 2008, for a review).
However, not all flankers are equal. When the target and flankers are dissimilar or less grouped
together, target recognition is easier (Andriessen and Bouma 1976; Kooi et al. 1994; Saarela et al.
2009). Strong grouping among the flankers can also make recognition easier (Livne and Sagi 2007;
Sayim et al. 2010; Manassi et al. 2012). Furthermore, crowding need not involve discrete ‘target’
and ‘flankers’; Martelli et al. (2005) argue that ‘self-crowding’ occurs in peripheral perception of
complex objects and scenes.
² ‘Crowding’ is used inconsistently and confusingly in the field, sometimes as a transitive verb (‘the flankers
crowd the target’), sometimes as a mechanism, and sometimes as the experimental outcome in which
recognizing a target is impaired in the presence of nearby flankers. This chapter predominantly follows the
last definition, though in describing stimuli sometimes refers to the lay sense of ‘a lot of stuff in a small space’.
Fig. 9.3 Original images (a,c) and images synthesized to have approximately the same local summary
statistics (b,d). Intended (and model) fixation on the ‘+’. The cat can clearly be recognized while
fixating, even though much of the object falls outside the fovea. The summary statistics contain
sufficient information to capture much of its appearance (b). Similarly, the summary statistics contain
sufficient information to recognize the gist of the scene (d), though perhaps not to correctly assess its
details. (e) A patch of search display, containing a tilted target and vertical distractors. (f) The summary
statistics (here, in a single pooling region) are sufficient to decipher the approximate number of items,
much about their appearance, and the presence of the target. (g) A target-absent patch from search for a
white vertical among black vertical and white horizontal bars. (h) The summary statistics are ambiguous
about the presence of a white vertical, perhaps leading to perception of illusory conjunctions.
Parts c-h are reproduced from Ruth Rosenholtz, Jie Huang, and Krista A. Ehinger, Rethinking the role of top-
down attention in vision: effects attributable to a lossy representation in peripheral vision, Frontiers in Psychology,
3, p. 13, DOI: 10.3389/fpsyg.2012.00013 © 2012, Frontiers Media S.A. This work is licensed under a Creative
Commons Attribution 3.0 License.
Texture Perception 179
produce forms’ (Lettvin 1976)? This seems antithetical to ideas of different processing for textures
and objects. Prior to 2000, it would have seemed surprising to use a texture-like representation
for more general visual tasks.
However, several state-of-the-art computer vision techniques operate upon local texture-like
image descriptors, even when performing object and scene recognition. The image descriptors
include local histograms of gradient directions, and local mean response to oriented multi-scale
filters, among others (Bosch et al. 2006, 2007; Dalal and Triggs 2005; Oliva and Torralba 2006;
Tola et al. 2010; Fei-Fei and Perona 2005). Such texture descriptors have proven effective for
detection of humans in natural environments (Dalal and Triggs 2005), object recognition in natu-
ral scenes (Bosch et al. 2007; Mutch and Lowe 2008; Zhu et al. 2011), scene classification (Oliva
and Torralba 2001; Renninger and Malik 2004; Fei-Fei and Perona 2005), wide-baseline stereo
(Tola et al. 2010), gender discrimination (Wang et al. 2010), and face recognition (Velardo and
Dugelay 2010). These results represent only a handful of hundreds of recent computer vision
papers utilizing similar methods.
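Descriptors of this kind are straightforward to sketch. The following is a minimal, illustrative histogram-of-gradient-orientations computation in the spirit of Dalal and Triggs (2005); the cell size and number of orientation bins are arbitrary choices for the example, not the published parameters.

```python
import numpy as np

def gradient_orientation_histograms(image, cell=8, n_bins=9):
    """Pool gradient orientations into per-cell histograms,
    weighted by gradient magnitude (a HOG-like texture descriptor)."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    # Unsigned orientation, folded into [0, pi)
    ori = np.mod(np.arctan2(gy, gx), np.pi)
    h, w = image.shape
    n_cy, n_cx = h // cell, w // cell
    bin_idx = np.minimum((ori / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.zeros((n_cy, n_cx, n_bins))
    for cy in range(n_cy):
        for cx in range(n_cx):
            sl = (slice(cy * cell, (cy + 1) * cell),
                  slice(cx * cell, (cx + 1) * cell))
            for b in range(n_bins):
                # Magnitude-weighted count of this orientation in this cell
                hist[cy, cx, b] = mag[sl][bin_idx[sl] == b].sum()
    return hist

rng = np.random.default_rng(0)
img = rng.random((32, 32))
descriptor = gradient_orientation_histograms(img)
print(descriptor.shape)
```

The point is the representation: each cell is summarized by a small histogram, a local texture statistic, rather than by the pixels themselves.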
Suppose we take literally the idea that peripheral vision involves early local texture processing.
The key questions are whether, on the one hand, humans make the sorts of errors one would
expect, and, on the other hand, whether texture processing preserves enough information to
explain the successes of vision, such as object and scene recognition.
A local texture representation predicts that vision would be locally ambiguous about the phase
and location of features, because texture statistics contain such ambiguities. Do we see such
evidence in human vision? In fact, we do. Observers have difficulty distinguishing 180 degree phase differences
in compound sine wave gratings in the periphery (Bennett and Banks 1991; Rentschler and
Treutwein 1985) and show marked position uncertainty in a bisection task (Levi and Klein 1986).
Furthermore, such ambiguities appear to exist during object and scene processing, though we
rarely have the opportunity to be aware of them. Peripheral vision tolerates considerable image
variation without giving us much sense that something is wrong (Freeman and Simoncelli 2011;
Koenderink et al. 2012). Koenderink et al. (2012) apply a spatial warping to an ordinary image.
It is surprisingly difficult to tell that anything is wrong, unless one fixates near the disarray. (See
<http://i-perception.perceptionweb.com/fulltext/i03/i0490sas>.)
To go beyond qualitative evidence, we need a concrete proposal for what ‘texture process-
ing’ means. This chapter has reviewed much of the relevant work. Texture appearance models
aim to understand texture processing in general, whereas segmentation models attempt only to
predict grouping. Our current best guess as to a model of texture appearance is that of Portilla
and Simoncelli (2000). Perhaps the visual system computes something like 2nd-order statistics
of the responses of V1-like cells, over each local pooling region. We call this the Texture Tiling
Model. This proposal (Balas et al. 2009; Freeman and Simoncelli 2011) is not so different from
standard object recognition models, in which later stages compute more complex features by
measuring co-occurrences of features from the previous layer (Fukushima 1980; Riesenhuber
and Poggio 1999). Second-order correlations are essentially co-occurrences pooled over a sub-
stantially larger area.
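To make the 'second-order statistics as pooled co-occurrences' idea concrete, here is a toy numpy sketch. It is not the actual Texture Tiling Model, nor the Portilla and Simoncelli (2000) implementation; the filter construction, sizes, and frequencies are all illustrative assumptions. It computes one entry of such a statistic vector: the average product of two oriented-filter responses at a fixed spatial offset, pooled over a single region.

```python
import numpy as np

def oriented_filter(size=15, theta=0.0, freq=0.25):
    """A simple odd-symmetric, Gabor-like oriented filter (a crude
    stand-in for a V1-like receptive field)."""
    r = np.arange(size) - size // 2
    y, x = np.meshgrid(r, r, indexing='ij')
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * (size / 4.0)**2))
    return envelope * np.sin(2 * np.pi * freq * xr)

def filter_response(image, kernel):
    """Same-size circular filtering via the FFT."""
    padded = np.zeros_like(image, dtype=float)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))

def second_order_stat(image, theta1, theta2, offset=(4, 0)):
    """Mean co-occurrence (product) of two oriented responses at a fixed
    spatial offset, pooled over the whole patch: one entry of a
    second-order summary-statistic vector."""
    r1 = filter_response(image, oriented_filter(theta=theta1))
    r2 = filter_response(image, oriented_filter(theta=theta2))
    return float(np.mean(r1 * np.roll(r2, offset, axis=(0, 1))))

# Vertical stripes: strong vertical-vertical co-occurrence, essentially
# no vertical-horizontal co-occurrence.
stripes = np.tile(np.sin(2 * np.pi * 0.25 * np.arange(64)), (64, 1))
s_same = second_order_stat(stripes, 0.0, 0.0)
s_orth = second_order_stat(stripes, 0.0, np.pi / 2)
print(s_same, s_orth)
```

A full model would compute many such statistics (across orientations, scales, and offsets) in each of many overlapping pooling regions; the sketch shows only the basic operation.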
Can this representation predict crowded object recognition? Balas et al. (2009) demonstrate that
its inherent confusions and ambiguities predict difficulty recognizing crowded peripheral letters.
Rosenholtz et al. (2012a) further show that this model predicts crowding of other simple symbols.
Visual search employs wide field-of-view, crowded displays. Is the difference between easy and
difficult search due to local texture processing? We can utilize texture synthesis techniques to
visualize the local information available (Figure 9.3). When target and distractor bars differ sig-
nificantly in orientation, the statistics are sufficient to identify a crowded peripheral target. The
model predicts easy ‘popout’ search (Figure 9.3e,f). The model also predicts the phenomenon of
180 Rosenholtz
illusory conjunctions (Figure 9.3g,h), and other classic search results (Rosenholtz et al. 2012b;
Rosenholtz et al. 2012a). Characterizing visual search as limited by peripheral processing rep-
resents a significant departure from earlier interpretations which attributed performance to the
limits of processing in the absence of covert attention (Treisman 1985).
Under the Default Processing assumption, we must also ask whether texture processing might
underlie normal object and scene recognition. We synthesized an image to have the same local
summary statistics as the original (Rosenholtz 2011; Rosenholtz et al. 2012b; see also Freeman
and Simoncelli 2011). A fixated object (Figure 9.3b) is clearly recognizable; it is quite well encoded
by this representation. Glancing at a scene (Figure 9.3d), much information is available to deduce
the gist and guide eye movements; however, precise details are lost, perhaps leading to change
blindness (Oliva and Torralba 2006; Freeman and Simoncelli 2011; Rosenholtz et al. 2012b).
These results and demos indicate the power of the Texture Tiling Model. It is image-computa-
ble, and can make testable predictions for arbitrary stimuli. It predicts on the one hand difficulties
of vision, such as crowded object recognition and hard visual search, while plausibly supporting
normal object and scene recognition.
The Portilla and Simoncelli (2000) model computes 700–1000 image statistics per texture (depending upon choice of parameters).
(The Texture Tiling Model computes this many statistics per local pooling region.) The ‘forced
texture perception’ presumed to underlie crowding must also be high dimensional—after all, it
must at the very least support perception of actual textures.
Unfortunately it is difficult in general to get intuitions about behavior of high-dimensional
models. Low-dimensional models do not simply scale up to higher dimensions. A single mean
feature value captures little information about a stimulus. Additional statistics provide an increas-
ingly good representation of the original patch. Stuff-models, if sufficiently rich, can in fact cap-
ture a great deal of information about the visual input.
How well a stimulus can be encoded depends upon its complexity relative to the representation.
Flanker grouping can theoretically simplify the stimulus, leading to better representation and
perhaps better performance. In some cases the information preserved is insufficient to perform
a given task, and in common parlance the stimulus is ‘crowded’. In other cases, the information
is sufficient for the task, predicting the ‘relief from crowding’ accompanying, for example, a dis-
similar target and flankers (e.g. Rosenholtz et al. 2012a and Figure 9.3e,f).
A high-dimensional representation can also preserve the information necessary to individu-
ate ‘things’. For instance, it can capture the approximate number of discrete objects in Figure
9.3e,g. In fact, one can represent an arbitrary amount of structure in the input by varying the
size of the regions over which statistics are computed (Koenderink and van Doorn 2000),
and the set of statistics. The structural/statistical distinction is not a dichotomy, but rather a
continuum.
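This continuum can be illustrated with the simplest possible summary statistic, the local mean. The toy sketch below is illustrative only (real models pool hundreds of richer statistics per region): as the pooling regions shrink, the summary-statistic representation recovers ever more of the image's structure, until at single-pixel regions the 'statistics' are the pixels themselves.

```python
import numpy as np

def pooled_means(image, region):
    """Represent the image only by the mean within each pooling region."""
    h, w = image.shape
    out = np.empty((h, w))
    for i in range(0, h, region):
        for j in range(0, w, region):
            out[i:i + region, j:j + region] = \
                image[i:i + region, j:j + region].mean()
    return out

rng = np.random.default_rng(2)
img = rng.random((32, 32))
# Reconstruction error of the summary representation for ever-smaller
# pooling regions: from one global statistic down to per-pixel values.
errors = [float(np.mean((img - pooled_means(img, r))**2))
          for r in (32, 8, 2, 1)]
print(errors)
```

The error is non-increasing as the regions shrink and reaches zero at region size 1: the statistical-to-structural transition is gradual, not a dichotomy.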
The mechanisms underlying crowding may be ‘later’ than texture perception mechanisms, and
operate on precomputed groups or ‘things’. However, just because we often recognize ‘things’
in our stimuli, as a result of the full visual-cognitive machinery, does not mean that our visual
systems operate upon those things to perform a given task. One should not underestimate the
power of high-dimensional models that operate on continuous ‘stuff’. In texture perception,
such models have explained results for a wider variety of stimuli, and with arguably simpler
mechanisms.
Conclusions
In the last several decades, much progress has been made toward better understanding the mecha-
nisms underlying texture segmentation, classification, and appearance. There exists a rich body
of work on texture segmentation, both behavioral experiments and modeling. Many results can
be explained by intelligent decisions based on some fairly simple image statistics. Researchers
have also developed powerful models of texture appearance. More recent work demonstrates that
similar texture-processing mechanisms may account for the phenomena of visual crowding. The
details remain to be worked out, but if true, the visual system may employ local texture processing
throughout the visual field. This predicts that, rather than being relegated to a narrow set of tasks
and stimuli, texture processing underlies visual processing in general, supporting such diverse
tasks as visual search, object and scene recognition.
References
Adelson, E. H. (2001). ‘On seeing stuff: The perception of materials by humans and machines’. In
Proceedings of the SPIE: HVEI VI, edited by B. E. Rogowitz and T. N. Pappas, Vol. 4299: 1–12.
Andriessen, J. J., and Bouma, H. (1976). ‘Eccentric vision: Adverse interactions between line segments’.
Vision Research 16: 71–8.
Attneave, F. (1954). ‘Some informational aspects of visual perception’. Psychological Review 61(3): 183–93.
Bajcsy, R. (1973). ‘Computer identification of visual surfaces’. Computer Graphics and Image Processing
2(2): 118–30.
Balas, B. J. (2006). ‘Texture synthesis and perception: using computational models to study texture
representations in the human visual system’. Vision Research 46(3): 299–309.
Balas, B., Nakano, L., and Rosenholtz, R. (2009). ‘A summary-statistic representation in peripheral vision
explains visual crowding’. Journal of Vision 9(12): 1–18.
Barth, E., Zetzsche, C., and Rentschler, I. (1998). ‘Intrinsic two-dimensional features as textons’. Journal of
the Optical Society of America A 15(7): 1723–32.
Beck, J. (1966). ‘Effect of orientation and of shape similarity on perceptual grouping’. Perception &
Psychophysics 1(1): 300–2.
Beck, J. (1967). ‘Perceptual grouping produced by line figures’. Perception & Psychophysics 2(11): 491–5.
Beck, J., Prazdny, K., and Rosenfeld, A. (1983). ‘A theory of textural segmentation’. In Human and machine
vision, edited by J. Beck, B. Hope, and A. Rosenfeld, pp. 1–38. (New York: Academic Press).
Beck, J., Sutter, A., and Ivry, R. (1987). ‘Spatial frequency channels and perceptual grouping in texture
segregation’. Computer Vision, Graphics, and Image Processing 37(2): 299–325.
Behrmann et al. (this volume). Holistic face perception. In Oxford Handbook of Perceptual Organization,
edited by J. Wagemans. (Oxford: Oxford University Press).
Ben-av, M. B. and Sagi, D. (1995). ‘Perceptual grouping by similarity and proximity: Experimental results
can be predicted by intensity autocorrelations’. Vision Research 35(6): 853–66.
Bennett, P. J. and Banks, M. S. (1991). ‘The effects of contrast, spatial scale, and orientation on foveal and
peripheral phase discrimination’. Vision Research 31(10): 1759–86.
Bergen, J. R. and Adelson, E. H. (1988). ‘Early vision and texture perception’. Nature 333(6171): 363–4.
Bergen, J. R. and Julesz, B. (1983). ‘Parallel versus serial processing in rapid pattern discrimination’. Nature
303(5919): 696–8.
Bergen, J. R. and Landy, M. S. (1991). ‘Computational modeling of visual texture segregation’. In
Computational models of visual perception, edited by M. S. Landy and J. A. Movshon, pp. 253–71.
(Cambridge, MA: MIT Press).
Boring, E. G. (1945). ‘Color and camouflage’. In Psychology for the armed services, edited by E. G. Boring,
pp. 63–96. (Washington, D.C: The Infantry Journal).
Bosch, A., Zisserman, A., and Munoz, X. (2006). ‘Scene classification via pLSA’. In Proceedings of the 9th
European Conference on Computer Vision (ECCV’06), Springer Lecture Notes in Computer Science
3954: 517–30.
Bosch, A., Zisserman, A., and Munoz, X. (2007). ‘Image classification using random forests and ferns’.
In Proceedings of the 11th International Conference on Computer Vision (ICCV’07) (Rio de Janeiro,
Brazil): 1–8.
Bouma, H. (1970). ‘Interaction effects in parafoveal letter recognition’. Nature 226: 177–8.
Bovik, A. C., Clark, M., and Geisler, W. S. (1990). ‘Multichannel Texture Analysis Using Localized Spatial
Filters’. IEEE Transactions on Pattern Analysis and Machine Intelligence 12(1): 55–73.
Braun, J. and Sagi, D. (1991). ‘Texture-based tasks are little affected by second tasks requiring peripheral or
central attentive fixation’. Perception 20: 483–500.
Brooks (this volume). Traditional and new principles of perceptual grouping. In Oxford Handbook of
Perceptual Organization, edited by J. Wagemans. (Oxford: Oxford University Press).
Caelli, T. (1985). ‘Three processing characteristics of visual texture segmentation’. Spatial Vision 1(1): 19–30.
Caelli, T. M. and Julesz, B. (1978). ‘On perceptual analyzers underlying visual texture discrimination: Part
I’. Biological Cybernetics 28: 167–75.
Caelli, T. M., Julesz, B., and Gilbert, E. N. (1978). ‘On perceptual analyzers underlying visual texture
discrimination: Part II’. Biological Cybernetics 29: 201–14.
Cant, J. S. and Goodale, M. A. (2007). ‘Attention to form or surface properties modulates different regions
of human occipitotemporal cortex’. Cerebral Cortex 17: 713–31.
Chong, S. C. and Treisman, A. (2003). ‘Representation of statistical properties’. Vision Research 43: 393–404.
Chubb, C. and Landy, M. S. (1991). ‘Orthogonal distribution analysis: A new approach to the study
of texture perception’. In Computational Models of Visual Processing, edited by M. S. Landy and
J. A. Movshon, pp. 291–301. (Cambridge, MA: MIT Press).
Chubb, C., Nam, J.-H., Bindman, D. R., and Sperling, G. (2007). ‘The three dimensions of human visual
sensitivity to first-order contrast statistics’. Vision Research 47(17): 2237–48.
Dakin (this volume). In Oxford Handbook of Perceptual Organization, edited by J. Wagemans. (Oxford:
Oxford University Press).
Dakin, S. C., Williams, C. B., and Hess, R. F. (1999). ‘The interaction of first- and second-order cues to
orientation’. Vision Research 39(17): 2867–84.
Dakin, S. C., Cass, J., Greenwood, J. A., and Bex, P. J. (2010). ‘Probabilistic, positional averaging predicts
object-level crowding effects with letter-like stimuli’. Journal of Vision 10(10): 1–16.
Dalal, N., and Triggs, B. (2005). ‘Histograms of oriented gradients for human detection’. In 2005 IEEE
Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ‘05): 886–93.
Efros, A. A., and Leung, T. K. (1999). ‘Texture synthesis by non-parametric sampling’. In Proceedings of the
Seventh IEEE International Conference on Computer Vision 2: 1033–8.
Elder (this volume). Bridging the dimensional gap: Perceptual organization of contour in two-dimensional
shape. In Oxford Handbook of Perceptual Organization, edited by J. Wagemans. (Oxford: Oxford
University Press).
Fei-Fei, L. and Perona, P. (2005). ‘A Bayesian Hierarchical Model for Learning Natural Scene Categories’.
2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05)
2: 524–31.
Feldman (this volume). Bayesian models of perceptual organization. In Oxford Handbook of Perceptual
Organization, edited by J. Wagemans. (Oxford: Oxford University Press).
Fogel, I. and Sagi, D. (1989). ‘Gabor filters as texture discriminator’. Biological Cybernetics 61: 103–13.
Freeman, J. and Simoncelli, E. P. (2011). ‘Metamers of the ventral stream’. Nature Neuroscience
14(9): 1195–201.
Fukushima, K. (1980). ‘Neocognitron: a self-organizing neural network model for a mechanism of pattern
recognition unaffected by shift in position’. Biological Cybernetics 36: 193–202.
Gibson, J. (1950). ‘The perception of visual surfaces’. The American Journal of Psychology 63(3): 367–84.
Gibson, J. J. (1986). The ecological approach to visual perception. (Hillsdale, NJ: Lawrence Erlbaum
Associates).
Gillebert and Humphreys (this volume). Mutual interplay between perceptual organization and attention: a
neuropsychological perspective. In Oxford Handbook of Perceptual Organization, edited by J. Wagemans.
(Oxford: Oxford University Press).
Giora, E. and Casco, C. (2007). ‘Region- and edge-based configurational effects in texture segmentation’.
Vision Research 47(7): 879–86.
Graham, N., Beck, J., and Sutter, A. (1992). ‘Nonlinear processes in spatial-frequency channel models of
perceived texture segregation: Effects of sign and amount of contrast’. Vision Research 32(4): 719–43.
Greenwood, J. A., Bex, P. J., and Dakin, S. C. (2009). ‘Positional averaging explains crowding with
letter-like stimuli’. Proceedings of the National Academy of Sciences of the United States of America
106(31): 13130–5.
Greenwood, J. A., Bex, P. J., and Dakin, S. C. (2012). ‘Crowding follows the binding of relative position and
orientation’. Journal of Vision 12(3): 1–20.
Gurnsey, R. and Browse, R. (1987). ‘Micropattern properties and presentation conditions influencing visual
texture discrimination’. Perception & Psychophysics 41: 239–52.
Haralick, R. M. (1979). ‘Statistical and Structural Approaches to Texture’. Proceedings of the IEEE
67(5): 786–804.
Heeger, D. J. and Bergen, J. R. (1995). ‘Pyramid-based texture analysis/synthesis’. In Proceedings of the 22nd
annual conference on Computer graphics and interactive techniques (SIGGRAPH ‘95), pp. 229–38. (Silver
Spring, MD: IEEE Computer Society Press).
Hess, R. F. (1982). ‘Developmental sensory impairment: Amblyopia or tarachopia?’ Human Neurobiology 1:
17–29.
Hindi Attar, C., Hamburger, K., Rosenholtz, R., Götzl, H., and Spillman, L. (2007). ‘Uniform versus
random orientation in fading and filling-in’. Vision Research 47(24): 3041–51.
Julesz, B. (1962). ‘Visual Pattern Discrimination’. IRE Transactions on Information Theory 8(2): 84–92.
Julesz, B. (1965). ‘Texture and Visual Perception’. Scientific American 212: 38–48.
Julesz, B. (1975). ‘Experiments in the visual perception of texture’. Scientific American 232(4): 34–43.
Julesz, B. (1981). ‘A theory of preattentive texture discrimination based on first-order statistics of textons’.
Biological Cybernetics 41: 131–8.
Julesz, B., Gilbert, E. N., and Victor, J. D. (1978). ‘Visual discrimination of textures with identical
third-order statistics’. Biological Cybernetics 31: 137–40.
Karu, K., Jain, A., and Bolle, R. (1996). ‘Is there any texture in the image?’ Pattern Recognition
29(9): 1437–46.
Kooi, F. L., Toet, A., Tripathy, S. P., and Levi, D. M. (1994). ‘The effect of similarity and duration on spatial
interaction in peripheral vision’. Spatial Vision 8(2): 255–79.
Knutsson, H. and Granlund, G. (1983). ‘Texture analysis using two-dimensional quadrature filters’. In
IEEE Computer Society workshop on computer architecture for pattern analysis and image database
management (CAPAIDM), pp. 206–13 (Silver Spring, MD: IEEE Computer Society Press).
Koenderink, J. J. and van Doorn, A. J. (2000). ‘Blur and disorder’. Journal of Visual Communication and
Image Representation 11(2): 237–44.
Koenderink, J. J., Richards, W., and van Doorn, A. J. (2012). ‘Space-time disarray and visual awareness’.
i-Perception 3(3): 159–62.
Kröse, B. (1986). ‘Local structure analyzers as determinants of preattentive pattern discrimination’.
Biological Cybernetics 55: 289–98.
Landy, M. S. and Graham, N. (2004). ‘Visual Perception of Texture’. In The Visual Neurosciences, edited by
L. M. Chalupa and J. S. Werner, pp. 1106–18. (Cambridge, MA: MIT Press).
Lee, T. S. (1995). ‘A Bayesian framework for understanding texture segmentation in the primary visual
cortex’. Vision Research 35(18): 2643–57.
Lettvin, J. Y. (1976). ‘On seeing sidelong’. The Sciences 16: 10–20.
Leung, T. K. and Malik, J. (1996). ‘Detecting, localizing, and grouping repeated scene elements from
an image’. In Proceedings of the 4th European Conference on Computer Vision (ECCV ’96), 1, 546–55
(London: Springer-Verlag).
Levi, D. M. and Carney, T. (2009). ‘Crowding in peripheral vision: why bigger is better’. Current Biology
19(23): 1988–93.
Levi, D. M. and Klein, S. A. (1986). ‘Sampling in spatial vision’. Nature 320: 360–2.
Livne, T. and Sagi, D. (2007). ‘Configuration influence on crowding’. Journal of Vision 7(2): 1–12.
Louie, E., Bressler, D., and Whitney, D. (2007). ‘Holistic crowding: Selective interference between
configural representations of faces in crowded scenes’. Journal of Vision 7(2): 24.1–11.
Machilsen, B. and Wagemans, J. (2011). ‘Integration of contour and surface information in shape detection’.
Vision Research 51: 179–86. doi:10.1016/j.visres.2010.11.005.
Mack, A., Tang, B., Tuma, R., Kahn, S., and Rock, I. (1992). ‘Perceptual organization and attention’.
Cognitive Psychology 24: 475–501.
Malik, J. and Perona, P. (1990). ‘Preattentive texture discrimination with early vision mechanisms’. Journal
of the Optical Society of America A 7(5): 923–32.
Manassi, M., Sayim, B., and Herzog, M. (2012). ‘Grouping, pooling, and when bigger is better in visual
crowding’. Journal of Vision 12(10): 13.1–14.
Martelli, M., Majaj, N., and Pelli, D. (2005). ‘Are faces processed like words? A diagnostic test for
recognition by parts’. Journal of Vision 5: 58–70.
Mutch, J. and Lowe, D. G. (2008). ‘Object class recognition and localization using sparse features within
limited receptive fields’. International Journal of Computer Vision 80: 45–57.
Nothdurft, H. C. (1991). ‘Texture segmentation and pop-out from orientation contrast’. Vision Research
31(6): 1073–8.
Oliva, A. and Torralba, A. (2001). ‘Modeling the shape of the scene: A holistic representation of the spatial
envelope’. International Journal of Computer Vision 42(3): 145–75.
Oliva, A. and Torralba, A. (2006). ‘Building the gist of a scene: the role of global image features in
recognition’. Progress in Brain Research 155: 23–36.
Olson, R. K. and Attneave, F. (1970). ‘What Variables Produce Similarity Grouping?’ American Journal of
Psychology 83(1): 1–21.
Parkes, L., Lund, J., Angelucci, A., Solomon, J. A., and Morgan, M. (2001). ‘Compulsory averaging of
crowded orientation signals in human vision’. Nature Neuroscience 4(7): 739–44.
Pelli, D. G. and Tillman, K. A. (2008). ‘The uncrowded window of object recognition’. Nature Neuroscience
11(10): 1129–35.
Pelli, D. G., Palomares, M., and Majaj, N. (2004). ‘Crowding is unlike ordinary masking: Distinguishing
feature integration from detection’. Journal of Vision 4: 1136–69.
Põder, E. and Wagemans, J. (2007). ‘Crowding with conjunctions of simple features’. Journal of Vision
7(2): 23.1–12.
Pomerantz & Cragin (this volume). Emergent features and feature combination. In Oxford Handbook of
Perceptual Organization, edited by J. Wagemans. (Oxford: Oxford University Press).
Popat, K. and Picard, R. W. (1993). ‘Novel cluster-based probability model for texture synthesis,
classification, and compression’. In Proceedings of the SPIE Visual Communications and Image Processing
‘93, edited by B. G. Haskell and H.-M. Hang 2094: 756–68.
Portilla, J. and Simoncelli, E. P. (2000). ‘A Parametric Texture Model Based on Joint Statistics of Complex
Wavelet Coefficients’. International Journal of Computer Vision 40(1): 49–71.
Puzicha, J., Hofmann, T., and Buhmann, J. M. (1997). ‘Non-parametric Similarity Measures for
Unsupervised Texture Segmentation and Image Retrieval’. In Proceedings of the Computer Vision and
Pattern Recognition, CVPR ’97, IEEE, 267–72.
Renninger, L. W. and Malik, J. (2004). ‘When is scene identification just texture recognition?’ Vision
Research 44(19): 2301–11.
Rentschler, I. and Treutwein, B. (1985). ‘Loss of spatial phase relationships in extrafoveal vision’. Nature
313: 308–10.
Riesenhuber, M. and Poggio, T. (1999). ‘Hierarchical models of object recognition in cortex’. Nature
Neuroscience 2(11): 1019–25.
Rosenholtz, R. (1999). ‘General-purpose localization of textured image regions’. In Proceedings of
the SPIE, Human Vision and Electronic Imaging IV, edited by M. H. Wu et al., 3644: 454–60.
doi: 10.1117/12.348465.
Rosenholtz, R. (2000). ‘Significantly different textures: A computational model of pre-attentive texture
segmentation’. In Proceedings of the European Conference on Computer Vision (ECCV ‘00), LNCS, edited
by D. Vernon 1843: 197–211.
Rosenholtz, R. (2011). ‘What your visual system sees where you are not looking’. In SPIE: Human
Vision and Electronic Imaging, XVI, edited by B. E. Rogowitz and T. N. Pappas, 7865: 786510.
doi: 10.1117/12.876659.
Rosenholtz, R., Huang, J., Raj, A., Balas, B. J., and Ilie, L. (2012a). ‘A summary statistic representation in
peripheral vision explains visual search’. Journal of Vision 12(4): 14.1–17. doi: 10.1167/12.4.14.
Rosenholtz, R., Huang, J., and Ehinger, K. A. (2012b). ‘Rethinking the role of top-down attention in
vision: Effects attributable to a lossy representation in peripheral vision’. Frontiers in Psychology 3: 13.
doi:10.3389/fpsyg.2012.00013.
Rubenstein, B. S. and Sagi, D. (1996). ‘Preattentive texture segmentation: the role of line terminations, size,
and filter wavelength’. Perception & Psychophysics 58(4): 489–509.
Saarela, T. P., Sayim, B., Westheimer, G., and Herzog, M. H. (2009). ‘Global stimulus configuration
modulates crowding’. Journal of Vision 9(2): 5.1–11.
Sayim, B., Westheimer G., and Herzog, M. H. (2010). ‘Gestalt Factors Modulate Basic Spatial Vision’.
Psychological Science 21(5): 641–4.
Simoncelli, E. P. and Olshausen, B. A. (2001). ‘Natural image statistics and neural representation’. Annual
Review of Neuroscience 24: 1193–216.
Strasburger, H. (2005). ‘Unfocused spatial attention underlies the crowding effect in indirect form vision’.
Journal of Vision 5(11): 1024–37.
Sutter, A., Beck, J., and Graham, N. (1989). ‘Contrast and spatial variables in texture segregation: Testing a
simple spatial-frequency channels model’. Perception & Psychophysics 46(4): 312–32.
Tola, E., Lepetit, V., and Fua, P. (2010). ‘DAISY: an efficient dense descriptor applied to wide-baseline
stereo’. IEEE Transactions on Pattern Analysis and Machine Intelligence 32(5): 815–30.
Tomita, F., Shirai, Y., and Tsuji, S. (1982). ‘Description of Textures by a Structural Analysis’. IEEE
Transactions on Pattern Analysis and Machine Intelligence PAMI-4(2): 183–91.
Treisman, A. (1985). ‘Preattentive processing in vision’. Computer Vision, Graphics, and Image Processing
31: 156–77.
Turner, M. R. (1986). ‘Texture discrimination by Gabor functions’. Biological Cybernetics 55: 71–82.
van den Berg, R., Johnson, A., Martinez Anton, A., Schepers, A. L., and Cornelissen, F. W. (2012).
‘Comparing crowding in human and ideal observers’. Journal of Vision 12(8): 1–15.
Velardo, C. and Dugelay, J.-L. (2010). ‘Face recognition with DAISY descriptors’. In Proceedings of the 12th
ACM Workshop on Multimedia and Security, ACM: 95–100.
Victor, J. D. and Brodie, S. (1978). ‘Discriminable textures with identical Buffon Needle statistics’.
Biological Cybernetics 31: 231–4.
Voorhees, H. and Poggio, T. (1988). ‘Computing texture boundaries from images’. Nature 333: 364–7.
Wagemans (this volume). Historical and conceptual background: Gestalt theory. In Oxford Handbook of
Perceptual Organization, edited by J. Wagemans. (Oxford: Oxford University Press).
Wang, J.-G., Li, J., Yau, W.-Y., and Sung, E. (2010). ‘Boosting dense SIFT descriptors and shape contexts
of face images for gender recognition’. In Proceedings of the Computer Vision and Pattern Recognition
Workshop (CVPRW ’10), San Francisco, CA, pp. 96–102.
Wechsler, H. (1980). ‘Texture analysis—a survey’. Signal Processing 2: 271–82.
Zetzsche, C., Barth, E., and Wegmann, B. (1993). ‘The importance of intrinsically two-dimensional
image features in biological vision and picture coding’. In Digital images and human vision, edited by
A. B. Watson, pp. 109–38. (Cambridge, MA: MIT Press).
Zhu, S., Wu, Y. N., and Mumford, D. (1996). ‘Filters, random fields and maximum entropy (FRAME):
Towards the unified theory for texture modeling’. In IEEE Conf. Computer Vision and Pattern
Recognition, pp. 693–6.
Zhu, C., Bichot, C. E., and Chen, L. (2011). ‘Visual object recognition using daisy descriptor’. In Proc. IEEE
Intl. Conf. on Multimedia and Expo (ICME 2011), Barcelona, Spain, pp. 1–6.
Zucker, S. W. (1976). ‘Toward a model of texture’. Computer Graphics and Image Processing 5(2): 190–202.
Section 3
Contour integration: Psychophysical,
neurophysiological, and computational
perspectives
Robert F. Hess, Keith A. May, and Serge O. Dumoulin
A psychophysical perspective
Natural scenes and the visual system
The mammalian visual system has evolved to extract relevant information from natural images that
in turn have specific characteristics, one being edge alignments that define image features. Natural
scenes exhibit consistent statistical properties that distinguish them from random luminance distri-
butions over a large range of global and local image statistics. Edge co-occurrence statistics in natural
images are dominated by aligned structure (Geisler et al. 2001; Sigman et al. 2001; Elder and Goldberg
2002) and parallel structure (Geisler et al. 2001). The aligned edge structure follows from the fact
that pairs of separated local edge segments are most likely to be aligned along a linear or co-circular
path. This pattern occurs at different spatial scales (Sigman et al. 2001). The co-aligned information
represents contour structure in natural images. The parallel information, on the other hand, is most
frequently derived from regions of the same object and arises from surface texture. Edges are an
important and highly informative part of our environment. Edges that trace out a smooth path show
correspondence of position over a wide range of different spatial scales. As edges become more jag-
ged, and indeed more like edges of the kind common in natural images (i.e. fractal), correspondence
in position becomes limited to a smaller band of spatial scales. Although jagged edges have continu-
ous representation over spatial scale, the exact position and orientation of the edge changes from scale
to scale (Field et al. 1993). The contour information is therefore quite different at different spatial
scales so, to capture the full richness of the available information, it is necessary to make use of a range
of contour integration operations that are each selective for a narrow band of scales.
Since local edge alignment in fractal images depends on scale, Field et al. (1993) addressed
this question using spatial frequency narrowband elements (i.e. Gabors) and ensured that local
density cues could not play a role. We thought there might be specific rules for how the responses
of orientation-selective V1 cells are combined to encode contours in images. A typical stimulus is
seen in Figure 10.1a; it is an array of oriented Gabor micropatterns, a subset of which (frame on
the left) are aligned to make a contour (indicated by arrow).
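Gabor micropatterns of this kind are a sinusoidal carrier windowed by a Gaussian envelope, which makes each element local in space yet narrowband in spatial frequency. A minimal sketch of a single element (parameter values are illustrative, not those of the original experiments; here `orientation` is taken as the direction of carrier modulation):

```python
import numpy as np

def gabor(size, wavelength, orientation, sigma, phase=0.0):
    """Return a (size+1) x (size+1) Gabor micropattern: a cosine carrier
    of the given wavelength, modulated along `orientation`, under an
    isotropic Gaussian envelope of scale `sigma`."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(orientation) + y * np.sin(orientation)
    carrier = np.cos(2 * np.pi * xr / wavelength + phase)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return carrier * envelope

patch = gabor(size=64, wavelength=8, orientation=0.0, sigma=8)
assert patch.shape == (65, 65)
assert abs(patch[32, 32] - 1.0) < 1e-9  # peak at the centre for phase 0
```

A stimulus array is then a field of such patches at random orientations, with a subset rotated to follow the contour path.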
In the left frame of Figure 10.1a, the contour in the middle of the field running from the
bottom right to the top left is clearly visible, suggesting that elements group together either
because they are aligned or because they share the same orientation. The figure in the right frame of Figure 10.1a on first inspection
Fig. 10.1 Contours defined by orientation-linking. In (a), a comparison of a straight contour defined
by elements that are aligned with the contour (left) or orthogonal to it (right). In (b), the visual system’s
performance on detecting orientationally-linked contours of different curvature, compared with that
of a single elongated filter (solid line). In (c), the proposed mechanism, a network interaction called an
‘Association Field’.
Reprinted from Vision Research, 33 (2), David J. Field, Anthony Hayes, and Robert F. Hess, Contour integration by
the human visual system: Evidence for a local “association field”, pp. 173–93, Copyright © 1993, with permission
from Elsevier and Robert F. Hess and Steven C. Dakin, Absence of contour linking in peripheral vision, Nature, 390
(6660), pp. 602–4, DOI: 10.1038/37593 Copyright (c) 1997, Nature Publishing Group.
Contour Integration 191
does not contain an obvious contour, yet there is a similar subset of the elements of the same
orientation and in the same spatial arrangement as in the left frame of Figure 10.1a. These ele-
ments are however not aligned with the contour path, but orthogonal to it, and one of our initial
observations was that although this arrangement did produce visible contours, the contours were
far less detectable than those with elements aligned with the path. This suggested rules imposed
by the visual grouping analysis relating to the alignment of micropatterns, which may reflect the
interactions of adjacent cells with similar orientation preference exploiting the occurrence of
co-oriented structure in natural images.
Contours remain detectable when the contrast polarity of alternate contour
elements (and half the background elements) is reversed (Field et al. 1997). This manipulation
would defeat any elongated receptive field that linearly summated across space. This suggests that
even the detection of straight contours may be via the linking of responses of a number of cells
aligned across space but with similar orientation preferences.
On the basis of the above observations Field et al. (1993) suggested that these interactions
could be described in terms of an Association Field, a network of cellular interactions specifically
designed to capitalize on the edge-alignment properties of contours in natural images. Figure
10.1c illustrates the idea and summarizes the properties of the Association Field. The facilitatory
interactions are shown by continuous lines and the inhibitory interactions by dashed lines. The
closer the adjacent cell is in its position and preferred orientation, the stronger the facilitation.
This psychophysically defined ‘Association Field’ matches the joint-statistical relationship that
edge-alignment structure has in natural images (Geisler et al. 2001; Sigman et al. 2001; Elder and
Goldberg 2002; Kruger 1998; for more detail, see Elder, this volume).
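The qualitative properties of the Association Field can be captured by a connection weight that falls off with both separation and orientation difference. The sketch below uses Gaussian falloffs whose scale constants are our own assumptions; Field et al. (1993) characterized the field psychophysically rather than with an explicit parametric formula:

```python
import math

def association_strength(dist, d_theta, sigma_d=2.0, sigma_t=math.pi / 6):
    """Toy facilitation weight between two oriented units.

    `dist`    separation between receptive-field centres (element spacings)
    `d_theta` difference in preferred orientation (radians)
    The Gaussian scales sigma_d and sigma_t are illustrative assumptions."""
    d_theta = abs(d_theta) % math.pi           # orientation is axial
    d_theta = min(d_theta, math.pi - d_theta)  # wrap into [0, pi/2]
    return math.exp(-dist**2 / (2 * sigma_d**2)) * \
           math.exp(-d_theta**2 / (2 * sigma_t**2))

# facilitation is strongest for nearby, similarly oriented units
assert association_strength(1.0, 0.0) > association_strength(3.0, 0.0)
assert association_strength(1.0, 0.0) > association_strength(1.0, math.pi / 4)
```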
So far we have assumed that the detection of contours defined by the alignment of spatial fre-
quency bandpass elements embedded within an array of similar elements of random orientation
is accomplished by a low-level mechanism operating within spatial scale (i.e. V1–V3 receptive
fields) rather than by a high-level mechanism operating across scale. This latter idea would be
more in line with what the Gestalt psychologists envisaged. The question then becomes, are con-
tours integrated within or across spatial scale? Figure 10.2 shows results obtained when the spatial
frequency of alternate micropatterns is varied (Dakin and Hess 1998). The top frames show exam-
ples of curved contours made up of elements of the same spatial scale (b) as opposed to elements
from two spatial scales (a and c). The results in the bottom frames show how the psychophysical
contour detection performance depends on the spatial frequency difference between alternate
contour elements. Contour integration exhibits spatial frequency tuning, more so for curved than
for straight contours, suggesting it is primarily a within-scale operation, providing support for
orientation linking as described by the Association Field operating at a low level in the cortical
hierarchy.
Fig. 10.2 Orientational linking occurs within spatial scale. Frames at the top left and right, (a) and (c),
show examples of contours defined by the orientation of elements that alternate in spatial scale. The
frame at the top center (b) illustrates a contour defined by the orientation of elements within a single scale.
In the bottom frames, the detectability of contours, be they straight (bottom left) or curved (bottom
right), shows spatial scale tuning (adapted from Dakin and Hess 1998). In this experiment, one set of
Gabors had a carrier spatial frequency of 3.2 cpd, and the other set had a spatial frequency indicated by
the horizontal axis of the graphs.
Adapted from S.C. Dakin and R.F. Hess, Spatial-frequency tuning of visual contour integration, Journal of the
Optical Society of America A: Optics, Image Science, and Vision, 15(6), pp. 1486–99 © 1998, The Optical Society.
curved contours. Third, this does not depend on absolute contrast of elements (Hess et al. 2001).
These dynamics are not what one would expect if either synchrony of cellular firing, which
operates on a 1–2 ms timescale (Singer and Gray 1995; Beaudot 2002; Dakin and Bex 2002), or
contrast facilitation (Polat 1999; Polat and Sagi 1993, 1994) were involved in the linking process. The sluggish temporal
properties of the linking process may point to the code being carried by the later sustained part
of the spike train (Lamme 1995; Lamme et al. 1998; Zipser et al. 1996).
Contour integration is not a cue-invariant process (Zhou and Baker 1993) in that not all ori-
ented features result in perceptual contours: contours composed of elements alternately defined
by chromaticity and luminance do not link into perceptual contours (McIlhagga and Mullen
1996) and elements defined by texture-orientation do not link together either (Hess et al. 2000).
The rules that define linkable contours provide a psychophysical cue as to the probable site of
these elementary operations. McIlhagga and Mullen (1996) and Mullen et al. (2000) showed that
contours defined purely by chromaticity obey the same linking rules, but that elements alternately
defined by luminance and chromaticity do not link together. This suggests that, at the cortical
stage at which linking occurs, luminance and chromatic information are processed separately,
pointing to a site later than V1, since in V1 cells tuned for orientation process both chromatic and
achromatic information (Johnson et al. 2001). Hess and Field (1995) showed that contour integration
must occur at a level in the cortex where the cells process disparity. They devised a dichoptic
stimulus in which the embedded contour could not be detected monocularly because it oscillated
between two depth planes—it could be detected only if disparity had been computed first. These
contours were easily detected and their detectability did not critically depend on the disparity
range, suggesting the process operated at a cortical stage at or after where relative disparity was
computed. This is believed to be V2 (Parker and Cumming 2001).
A neurophysiological perspective
Cellular physiology
Neurons in primary visual cortex (V1 or striate cortex) respond to a relatively narrow range of
orientations within small (local) regions of the visual field (Hubel and Wiesel 1968). As such, V1
can be thought of as representing the outside world using a bank of oriented filters (De Valois
and De Valois 1990). These filters form the first stage of contour integration. In line with this
filter notion, the V1 response to visual stimulation is well predicted by the contrast-energy of
the stimulus for synthetic (Boynton et al. 1999; Mante and Carandini 2005) and natural images
(Dumoulin et al. 2008; Kay et al. 2008; Olman et al. 2004).
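Contrast energy for a single orientation channel is conventionally computed as the summed squared responses of a quadrature (even/odd) pair of oriented filters, which makes the measure insensitive to local phase. A sketch under assumed filter parameters, using FFT-based (circular) convolution:

```python
import numpy as np

def contrast_energy(image, wavelength=8.0, orientation=0.0, sigma=4.0):
    """Local contrast energy for one orientation channel: squared responses
    of an even (cosine) and odd (sine) Gabor filter, summed pointwise.
    Filter parameters are illustrative assumptions."""
    h, w = image.shape
    y, x = np.mgrid[-h // 2:h - h // 2, -w // 2:w - w // 2].astype(float)
    xr = x * np.cos(orientation) + y * np.sin(orientation)
    env = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    even = env * np.cos(2 * np.pi * xr / wavelength)
    odd = env * np.sin(2 * np.pi * xr / wavelength)

    def conv(kernel):
        # circular convolution; ifftshift moves the kernel centre to (0, 0)
        return np.real(np.fft.ifft2(np.fft.fft2(image) *
                                    np.fft.fft2(np.fft.ifftshift(kernel))))

    return conv(even)**2 + conv(odd)**2

img = np.zeros((32, 32))
img[16, :] = 1.0  # a line stimulus
e = contrast_energy(img)
assert e.shape == (32, 32)
assert np.all(e >= 0)  # energy is non-negative by construction
```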
Even though V1 responses are broadly consistent with the contrast-energy within the images,
a significant contribution of neuronal interactions is present that modulate the neural responses
independent of the overall contrast-energy (Allman et al. 1985; Fitzpatrick 2000). These neuronal
interactions can enhance or suppress neural responses and may also support mechanisms such
as contour integration. The Association Field might be implemented by facilitatory interactions
between cells whose preferred stimuli lie close together on a smooth curve, and inhibitory inter-
actions between cells whose preferred stimuli would be unlikely to coexist on the same physi-
cal edge. There is anatomical evidence for such a hard-wired arrangement within the long-range
intrinsic cortical connections in V1 (Gilbert and Wiesel 1979; Gilbert and Wiesel 1989). Neurons
in different orientation columns preferentially link with neurons with co-oriented, co-axially
aligned receptive fields (Bosking et al. 1997; Kisvárday et al. 1997; Malach et al. 1993; Stettler et al.
2002; Weliky et al. 1995; Schmidt 1997; Pooresmaeili 2010).
Neurophysiological recordings further support these anatomical observations (Gilbert et al.
1996; Kapadia et al. 1995; Li et al. 2006; Nelson and Frost 1985; Polat et al. 1998). Neuronal
responses to local oriented bars within the classical receptive field are modulated by the pres-
ence of flanking bars outside the classical receptive field, i.e. in the extra-classical receptive field.
Importantly, the elements in the extra-classical receptive field are not able to stimulate the neu-
ron alone, so the response modulation critically depends on an interaction between the elements
placed within the classical receptive field and those placed outside it. Furthermore, the amount of
response modulation is greatly affected by the relative positions and orientations of the stimulus
elements. Co-axial alignment usually increases neural responses whereas orthogonal orientations
usually decrease neural responses (Blakemore and Tobin 1972; Jones et al. 2002; Kastner et al.
1997; Knierim and Van Essen 1992; Nelson and Frost 1978; Nothdurft et al. 1999; Sillito et al.
1995). These neural modulations may partly be explained by the hard-wired intrinsic connectivity
in V1 but may also be supported by feedback or top-down influences from later visual cortex
(Li et al. 2008).
The evidence suggests that the extra-classical receptive field modulations resemble the
proposed contour Association Field. For example, recording in V1, Kapadia and col-
leagues (Kapadia et al. 1995) presented flanking bars in many different configurations in the
extra-classical receptive field while presenting a target bar in the classical receptive field at the
neuron’s preferred orientation. Kapadia and colleagues found that facilitation was generally
highest for small separations and small or zero lateral offsets between the flanker and target
bar. They also varied the orientation of the flanking bar while maintaining good continuation
with the target bar. The distribution of preferred flanker orientations was strongly peaked at
the cell’s preferred orientation, indicating co-axial facilitation. Yet some cells did not have an
obvious preferred flanker orientation or appeared to prefer non-co-axial flanker orientations.
Kapadia and colleagues suggested that the latter neurons might play a part in integrating curved
contours. Tuning to curvature is also highly prevalent in V2 and V4 (Anzai et al. 2007; Hegde
and Van Essen 2000; Ito and Komatsu 2004; Pasupathy and Connor 1999) suggesting a role
for these sites in co-circular integration along curved contours. V4 neurons are also tuned to
simple geometric shapes, further highlighting its role in intermediate shape perception (Gallant
et al. 1993; Gallant et al. 1996).
Functional imaging
Functional MRI studies further highlight the involvement of human extra-striate cortex in con-
tour integration. For example, Dumoulin et al. (2008) contrasted the responses to several natural
and synthetic image categories (Figure 10.3). They found distinct response profiles in V1 and
extra-striate cortex. Contrast-energy captured most of the variance in V1, though some evidence
for increased responses to contour information was found as well. In extra-striate cortex, on the
other hand, the presence of sparse contours captured most of the response variance despite large
variations in contrast-energy. These results provide evidence for an initial representation of natu-
ral images in V1 based on local oriented filters. Later visual cortex (and to a modest degree V1)
incorporates a facilitation of contour-based structure and suppressive interactions that effectively
amplify sparse-contour information within natural images.
Similarly, Kourtzi and colleagues implicated both early and late visual cortex in the process of
contour integration (Altmann et al. 2003; Altmann et al. 2004; Kourtzi and Huberle 2005; Kourtzi
et al. 2003). Using a variety of fMRI paradigms they demonstrated involvement of both V1 and
later visual areas. However, the stimuli in all these fMRI studies contain closed contours. Contour
closure creates simple concentric shapes that may be easier to detect (Kovacs and Julesz 1993)
and may involve specialized mechanisms in extra-striate cortex (Altmann et al. 2004; Dumoulin
and Hess 2007; Tanskanen et al. 2008). Furthermore, contour closure may introduce symmetry
for which specialized detection mechanisms exist (Wagemans 1995). Therefore these fMRI results
may reflect a combination of contour integration and shape processing, and may not uniquely
identify the site of the contour integration.
Beyond V2 and V4 lies ventral cortex, which processes shapes. In humans, the cortical region
where intact objects elicit stronger responses than their scrambled counterparts is known as the
lateral occipital complex (LOC) (Malach et al. 1995). It extends from lateral to ventral occipital
cortex. The term ‘complex’ acknowledges that this region consists of several visual areas. Early vis-
ual cortex (V1) is often also modulated by the contrast between intact and scrambled objects but
in an opposite fashion, i.e. fMRI signal amplitudes are higher for scrambled images (Dumoulin
Fig. 10.3 fMRI responses elicited by viewing pseudo-natural (a, c) and synthetic (b, d) images. The
fMRI responses are shown on an inflated cortical surface of the left hemisphere (c,d). The responses
are an average of five subjects and the average visual area borders are identified. Both pseudo-natural
and synthetic images yield similar results. In V1 strongest responses are elicited by viewing of the
‘full images’ (d, bottom inset). This supports the notion that V1 responses are dominated by the
contrast-energy within images. In extra-striate cortex, on the other hand, strongest responses are
elicited by viewing ‘contour’ images (d, top inset). These results suggest that facilitative and suppressive
neural interactions within and beyond V1 highlight contour information in extra-striate visual cortex.
Reproduced from Serge O. Dumoulin, Steven C. Dakin, and Robert F. Hess, Sparsely distributed contours
dominate extra-striate responses to complex scenes, NeuroImage, 42(2), pp. 890–901, DOI: 10.1016/j.
neuroimage.2008.04.266 (c) 2008, The Wellcome Trust. This work is licensed under a Creative Commons
Attribution 3.0 License.
and Hess 2006; Fang et al. 2008; Grill-Spector et al. 1998; Lerner et al. 2001; Murray et al. 2002;
Rainer et al. 2002). Stronger responses to scrambled objects have been interpreted as feedback
from predictive coding mechanisms (Fang et al. 2008; Murray et al. 2002) or incomplete match
of low-level image statistics including the breakup of contours (Dumoulin and Hess 2006; Rainer
et al. 2002). These results highlight the interaction between early and late visual areas in the pro-
cessing of contour and shape.
A computational perspective
Two main classes of contour integration model
Models of contour integration generally fall into one of two categories: Association Field models
or filter-overlap models (although see Watt et al. (2008) for consideration of other models). In
contrast to Association Field models, in filter-overlap models grouping occurs purely because the
filter responses to adjacent elements overlap.
Association Field models. Field et al. (1993) did not explicitly implement an Association Field
model, but several researchers have done so since. Yen and Finkel (1998) set up a model that had
two sets of facilitatory connections: co-axial excitatory connections between units whose pre-
ferred stimulus elements lay on co-circular paths (for detecting snakes, as in Figure 10.1a, left),
and trans-axial excitatory connections between units whose preferred stimulus elements were
parallel (for detecting ladders, as in Figure 10.1a, right). The two sets of connections competed
with each other, so the set of connections carrying the weaker facilitatory signals was suppressed.
Their model did a fairly good job of quantitatively accounting for a range of data from Field et al.
(1993) and Kovács and Julesz (1993).
Another Association Field model was set up by Li (1998), who took the view that contour
integration is part of the wider task of computing visual saliency. Li’s saliency model was based
firmly on the properties of V1 cells. The same model was able to account for contour integra-
tion phenomena, as well as many other phenomena related to visual search and segmentation in
multi-element arrays (Li 1999; Li 2000; Li 2002; Zhaoping and May 2007). However, Li provided
only qualitative demonstrations of the model’s outputs, rather than quantitative simulations of
psychophysical performance like those of Yen and Finkel.
The models of Li and of Yen and Finkel were recurrent neural networks, which exhibit temporal
oscillations. Both models showed synchrony in oscillations between units responding to elements
within the same contour, but a lack of synchrony between units responding to elements in dif-
ferent contours. Both sets of authors suggested that this might form the basis of segmentation of
one contour from others or from the background. In addition, the units responding to contour
elements responded more strongly than those responding to distractor elements.
The Association Field models described so far used ad hoc weightings on the facilitatory con-
nections. A different approach is to assume that the connection weights reflect the image sta-
tistics that the observer is using to do the task. In this view, the Association Field is a statistical
distribution that allows the observer to make a principled decision about whether two edge ele-
ments should be grouped into the same contour. Geisler et al. (2001) used this approach and found
that Association Fields derived from edge co-occurrence statistics in natural images accurately
accounted for human data on a contour detection task. Elder and Goldberg (2002) followed with
a similar approach.
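In this statistical framing, the grouping decision reduces to a likelihood ratio: group two elements if the probability of their relative geometry under the 'same contour' hypothesis sufficiently exceeds its probability under the 'different contours' hypothesis. The sketch below substitutes toy parametric densities for the histograms that Geisler et al. and Elder and Goldberg measured from natural images; every density and constant here is a stand-in:

```python
import math

def log_likelihood_ratio(dist, d_theta,
                         sigma_same_d=2.0, sigma_same_t=0.3,
                         p_dist_bg=0.1, p_theta_bg=1.0 / math.pi):
    """Toy log-likelihood ratio for grouping two edge elements.

    'Same contour': separation `dist` and orientation difference `d_theta`
    follow narrow zero-mean Gaussians. 'Different contours': both are
    uniform over the display (background densities p_dist_bg, p_theta_bg).
    All densities are illustrative stand-ins for measured statistics."""
    log_p_same = (-dist**2 / (2 * sigma_same_d**2)
                  - d_theta**2 / (2 * sigma_same_t**2)
                  - math.log(2 * math.pi * sigma_same_d * sigma_same_t))
    log_p_diff = math.log(p_dist_bg) + math.log(p_theta_bg)
    return log_p_same - log_p_diff

# nearby, well-aligned elements are grouped; distant, misaligned ones not
assert log_likelihood_ratio(1.0, 0.05) > 0
assert log_likelihood_ratio(8.0, 1.2) < 0
```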
Watt et al. (2008) have pointed out that many of the patterns of performance found in con-
tour integration experiments may reflect the difficulty of the task, rather than the properties of
the visual mechanism that the observer is using. Traditionally, task difficulty is factored out by
expressing the participant’s performance relative to the performance of the ideal observer for the
task (Banks et al. 1987; Geisler 1984; Geisler 1989). For many simple visual tasks, it is straight-
forward to derive the ideal algorithm, but this is not the case for most contour integration tasks
because of the complexity of the algorithms used for generating the contours. Recently, Ernst et al.
(2012) tackled this problem in an elegant way: they turned the idea of the Association Field on its
head and used it to generate the contours in the first place. The Association Field used to generate
the contours is then the correct, i.e. optimal, statistical distribution for calculating the likelihood
that the stimulus contains the contour. Using this approach, the properties of the contour, such as
curvature, element separation, etc., are determined by the parameters of the Association Field; the
ideal observer, who always uses the Association Field that generated the contour in the first place,
would therefore have an advantage over the human observer in knowing which sort of contour
was being presented on each trial. Not surprisingly, Ernst et al. found that, although the ideal
observer’s pattern of performance, as a function of contour properties, was qualitatively similar to
human performance, the ideal observer performed much better. They investigated the possibility
that the human observer was using the same Association Field on each trial. This strategy would
be optimal for contours generated using that Association Field, but suboptimal in all other cases.
They generated the single Association Field that fitted best to all the data, but even this subopti-
mal model outperformed the human observers. Ernst et al. ruled out the effect of noise because
the model’s correlation with the human data was the same as the correlations between individual
subjects, so it would seem that their model was simply using a better Association Field for the task
than the human observers.
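The generative use of the Association Field can be sketched as a directed random walk: each successive element is placed a fixed step ahead of the last, with the change in orientation drawn from the field's curvature distribution (assumed Gaussian here; the distribution and its width are our own stand-ins, not Ernst et al.'s parameters):

```python
import math
import random

def generate_contour(n_elements, step=1.0, curvature_sd=0.2, seed=0):
    """Sample contour element positions and orientations: at each step the
    path advances by `step` and the orientation changes by a Gaussian
    sample of standard deviation `curvature_sd` (an assumed parameter)."""
    rng = random.Random(seed)
    x, y, theta = 0.0, 0.0, 0.0
    path = [(x, y, theta)]
    for _ in range(n_elements - 1):
        theta += rng.gauss(0.0, curvature_sd)
        x += step * math.cos(theta)
        y += step * math.sin(theta)
        path.append((x, y, theta))
    return path

path = generate_contour(10)
assert len(path) == 10
x0, y0, _ = path[0]
x1, y1, _ = path[1]
assert abs(math.hypot(x1 - x0, y1 - y0) - 1.0) < 1e-9  # unit element spacing
```

An observer that knows the generating distribution can score any candidate path by the likelihood of its orientation changes under that same Gaussian, which is what gives the ideal observer its built-in advantage.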
Although the ideal observer’s performance can provide a useful benchmark against which to
compare human performance, it may be over-optimistic to assume that human observers will be
able to implement a strategy that is optimal for whichever psychophysical task they are set: it is
more likely that the human observer possesses mechanisms that are optimal for solving real-world
tasks, and recruits them to carry out the artificial psychophysical task at hand (McIlhagga and
May 2012). The natural-image-based approach to deriving the Association Field taken by Geisler
et al. and Elder and Goldberg may therefore be more fruitful than a pure ideal-observer approach.
Filter-overlap models. As an alternative to Association Field models, Hess and Dakin (1997)
implemented a model in which the contour linking occurred due to spatial overlap of filter
responses to different elements. Applying a V1-style filter to the image has the effect of blurring
the elements so that they join up. Thresholding the filter output to black and white generates a set
of blobs, or zero-bounded response distributions (ZBRs), and a straight contour will generate a
long ZBR in the orientation channel aligned with the contour. In Hess and Dakin’s model, the for-
mation of ZBRs took place only within orientation channels, and this severely limited its ability to
integrate curved contours. The model’s performance, as a function of contour curvature, is plotted
in Figure 10.1b, which shows that, while the model could successfully detect straight contours, its
performance deteriorated rapidly as the contour became more curved. Hess and Dakin suggested
that this kind of model may reflect contour integration in the periphery, while the Association
Field may reflect processing in the fovea.
The poor performance of Hess and Dakin’s filter-overlap model on detection of highly curved
contours was not a result of the filter-overlap process itself, but a result of the fact that formation
of ZBRs took place within a single orientation channel. May and Hess (2008) lifted this restriction,
and implemented a model that could extend ZBRs across orientation channel as well as space.
Unlike Hess and Dakin’s model, May and Hess’s model can easily integrate curved contours, and
we have recently found that it provides an excellent fit to a large psychophysical data set (Hansen
et al. in submission). May and Hess’s model forms ZBRs within a 3-dimensional space, (x, y, θ),
consisting of the two dimensions of the image (x, y), and a third dimension representing filter
orientation (θ). A straight contour would lie within a plane of constant orientation in this space,
whereas a curved contour would move gradually along the orientation dimension as well as across
the spatial dimensions. This 3-D space is formally known as the tangent bundle, and subsequently
other researchers have confirmed its usefulness in contour-completion tasks (Ben-Yosef and
Ben-Shahar 2012).
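The core of this scheme can be illustrated with a connected-component labelling of thresholded responses in the (x, y, θ) volume; a curved contour then forms a single component even though it drifts across orientation channels. The toy volume below is hand-built, and the sketch ignores the circularity of the orientation axis, which a full implementation must respect:

```python
import numpy as np
from scipy import ndimage

# toy (theta, y, x) response volume: 4 orientation channels, 8 x 8 image
resp = np.zeros((4, 8, 8))
for i in range(4):
    # a 'curved contour': active cells drift across x and across channels
    resp[i, 3, i + 2] = 1.0
resp[0, 7, 7] = 1.0  # an isolated distractor element elsewhere

# threshold, then label contiguous regions of the full 3-D space using
# 26-connectivity, so diagonal neighbours across channels still join up
labels, n = ndimage.label(resp > 0.5, structure=np.ones((3, 3, 3)))

assert n == 2                              # contour vs distractor
assert labels[0, 3, 2] == labels[3, 3, 5]  # contour cells share one label
assert labels[0, 7, 7] != labels[0, 3, 2]  # distractor stays separate
```

A long component that spans many positions (and, for curves, many orientation channels) then signals the presence of a contour.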
Around the same time that May and Hess (2008) were developing their model of contour inte-
gration, Rosenholtz and colleagues independently had the same idea, but applied it to a much
broader set of grouping tasks (Rosenholtz et al. 2009). To perform grouping on the basis of some
feature dimension, f, you can create a multidimensional space (x, y, f), and then plot the image in
this space. Then image elements with similar feature values and spatial positions will be nearby
and, if you blur the representation, they join up.
presentation to a much greater extent than contour integration, suggesting that contour integra-
tion has a more central cortical site than flanker facilitation. The results from Williams and Hess
(1998) and Huang et al. (2006) showed that flanker facilitation occurs in a much more limited
range of conditions than contour integration, so it seems unlikely that contour integration could
be achieved by the mechanisms responsible for psychophysical flanker facilitation. Williams and
Hess argued that the latter effect might arise through a reduction in positional uncertainty due to
the flanking elements, a view subsequently supported by Petrov et al. (2006).
Conclusion
The visual system groups local edge information into contours that are segmented from the back-
ground clutter in a visual scene. We have outlined two ways that this might be achieved. One is
an Association Field, which explicitly links neurons with different preferred locations and orien-
tations in a way that closely matches edge co-occurrence statistics in natural images. The other
is a simple filter-rectify-filter mechanism that, in the first stage, obtains a response to the con-
tour elements and, in the second stage, blurs this filter response along the contour; contours are
then defined by thresholding the filter output and identifying regions of contiguous response
across filter orientation and 2D image space. Both proposed mechanisms are consistent with
much of the available evidence, and it may be that either or both of these mechanisms play a part in human contour integration.
Acknowledgements
This work was supported by CIHR (#mop 53346 & mop10818) and NSERC (#46528-110) grants
to RFH. NWO (#452-08-008 & #433-09-223) grants supported SOD. KAM was supported by
EPSRC grant EP/H033955/1 to Joshua Solomon.
References
Allman, J., Miezin, F., and McGuinness, E. (1985). Stimulus specific responses from beyond the classical
receptive field: neurophysiological mechanisms for local-global comparisons in visual neurons. Ann.
Rev. Neurosci. 8: 407–30.
Altmann, C.F., Bulthoff, H.H., and Kourtzi, Z. (2003). Perceptual organization of local elements into global
shapes in the human visual cortex. Curr. Biol. 13(4): 342–9.
Altmann, C.F., Deubelius, A., and Kourtzi, Z. (2004). Shape saliency modulates contextual processing in
the human lateral occipital complex. J. Cogn. Neurosci. 16(5): 794–804.
Anzai, A., Peng, X., and Van Essen, D.C. (2007). Neurons in monkey visual area V2 encode combinations
of orientations. Nat. Neurosci. 10(10): 1313–21.
Banks, M.S., Geisler, W.S., and Bennett, P.J. (1987). The physical limits of grating visibility. Vision Research
27: 1915–24.
Beaudot, W.H.A. (2002). Role of onset asynchrony in contour integration. Vision Research 42: 1–9.
Beck, J., Rosenfeld, A., and Ivry, R. (1989). Line segregation. Spatial Vision 4: 75–101.
Ben-Yosef, G. and Ben-Shahar, O. (2012). A tangent bundle theory for visual curve completion. IEEE
Transactions on Pattern Analysis and Machine Intelligence 34: 1263–80.
Bex, P.J., Simmers, A.J., and Dakin, S.C. (2001). Snakes and ladders: the role of temporal modulation in
visual contour integration. Vision Research 41: 3775–82.
Blakemore, C. and Tobin, E.A. (1972). Lateral inhibition between orientation detectors in the cat’s visual
cortex. Experimental Brain Research 15: 439–40.
Bosking, W.H., Zhang, Y., Schofield, B., and Fitzpatrick, D. (1997). Orientation selectivity and the
arrangement of horizontal connections in the tree shrew striate cortex. J. Neurosci. 17: 2112–27.
Boynton, G.M., Demb, J.B., Glover, G.H., and Heeger, D.J. (1999). Neuronal basis of contrast
discrimination. Vision Research 39(2): 257–69.
Chakravarthi, R. and Pelli, D.G. (2011). The same binding in contour integration and crowding. Journal of
Vision 11(8), 10: 1–12.
Dakin, S.C. and Bex, P.J. (2002). Role of synchrony in contour binding: some transient doubts sustained.
J. Opt. Soc. Am. A, Opt. Image Sci. Vis. 19(4): 678–86.
Dakin, S.C. and Hess, R.F. (1998). Spatial-frequency tuning of visual contour integration. J. Opt. Soc. Am. A
15(6): 1486–99.
De Valois, R.L. and De Valois, K.K. (1990). Spatial Vision. Oxford: Oxford University Press.
Dumoulin, S.O. and Hess, R.F. (2006). Modulation of V1 activity by shape: image-statistics or shape-based
perception? J. Neurophysiol. 95(6): 3654–64.
Dumoulin, S.O. and Hess, R.F. (2007). Cortical specialization for concentric shape processing. Vision
Research 47(12): 1608–13.
Dumoulin, S.O., Dakin, S.C., and Hess, R.F. (2008). Sparsely distributed contours dominate extra-striate
responses to complex scenes. Neuroimage 42(2): 890–901.
Elder, J.H. and Goldberg, R.M. (2002). Ecological statistics of Gestalt laws for the perceptual organization
of contours. Journal of Vision 2(4), 5: 324–53.
Ernst, U.A., Mandon, S., Schinkel-Bielefeld, N., Neitzel, S.D., Kreiter, A.K., and Pawelzik, K.R. (2012).
Optimality of Human Contour Integration. PLoS Computational Biology 8(5): e1002520
Fang, F., Kersten, D., and Murray, S.O. (2008). Perceptual grouping and inverse fMRI activity patterns in
human visual cortex. J. Vis., 8(7), 2: 1–9.
Field, D.J., Hayes, A., and Hess, R.F. (1993). Contour integration by the human visual system: evidence for
a local ‘association field’. Vision Research 33(2): 173–93.
Field, D.J., Hayes, A., and Hess, R.F. (1997). The role of phase and contrast polarity in contour integration.
Investigative Ophthalmology and Visual Science 38: S999.
Fitzpatrick, D. (2000). Seeing beyond the receptive field in primary visual cortex. Curr. Opin. Neurobiol.
10(4): 438–43.
Gallant, J.L., Braun, J., and Van Essen, D.C. (1993). Selectivity for polar, hyperbolic, and Cartesian gratings
in macaque visual cortex. Science 259(5091): 100–3.
Gallant, J.L., Connor, C.E., Rakshit, S., Lewis, J.W., and Van Essen, D.C. (1996). Neural responses to polar,
hyperbolic, and Cartesian gratings in area V4 of the macaque monkey. J. Neurophysiol. 76(4): 2718–39.
Geisler, W.S. (1984). Physical limits of acuity and hyperacuity. J. Op. Soc. Am., A 1: 775–82.
Geisler, W.S. (1989). Sequential ideal-observer analysis of visual discriminations. Psychological Review
96: 267–314.
Geisler, W.S., Perry, J.S., Super, B.J., and Gallogly, D.P. (2001). Edge co-occurrence in natural images
predicts contour grouping performance. Vision Research 41(6): 711–24.
Gilbert, C.D. and Wiesel, T.N. (1979). Morphology and intracortical connections of functionally
characterised neurones in the cat visual cortex. Nature 280: 120–5.
Gilbert, C.D. and Wiesel, T.N. (1989). Columnar specificity of intrinsic horizontal and corticocortical
connections in cat visual cortex. J. Neurosci. 9(7): 2432–42.
Gilbert, C.D., Das, A., Ito, M., Kapadia, M., and Westheimer, G. (1996). Spatial integration and
cortical dynamics. Proceedings of the National Academy of Sciences of the United States of America
93: 615–22.
Grill-Spector, K., Kushnir, T., Hendler, T., Edelman, S., Itzchak, Y., and Malach, R. (1998). A sequence of
object-processing stages revealed by fMRI in the human occipital lobe. Hum Brain Mapp, 6(4): 316–28.
Hegde, J. and Van Essen, D.C. (2000). Selectivity for complex shapes in primate visual area V2. J. Neurosci.
20(5): RC61.
Hess, R.F., and Field, D.J. (1995). Contour integration across depth. Vision Research 35(12): 1699–711.
Hansen, B.C., May, K.A., and Hess, R.F. (2014). One ‘shape’ fits all: the orientation bandwidth of
contour integration. J. Vis. (in submission).
Hess, R.F. and Dakin, S.C. (1997). Absence of contour linking in peripheral vision. Nature 390: 602–4.
Hess, R.F., Dakin, S.C., and Field, D.J. (1998). The role of ‘contrast enhancement’ in the detection and
appearance of visual contours. Vision Research 38 (6): 783–7.
Hess, R.F., Beaudot, W.H.A., and Mullen, K.T. (2001). Dynamics of contour integration. Vision Research
41: 1023–37.
Contour Integration 203
Hess, R.F., Ledgeway, T., and Dakin, S.C. (2000). Impoverished second-order input to global linking in
human vision. Vision Research 40: 3309–18.
Hess, R.F., Hayes, A., and Field, D.J. (2003). Contour integration and cortical processing. J. Physiol. Paris
97(2–3): 105–19.
Huang, P.-C., Hess, R.F., and Dakin, S.C. (2006). Flank facilitation and contour integration: Different sites.
Vision Research 46: 3699–706.
Hubel, D.H. and Wiesel, T.N. (1968). Receptive fields and functional architecture of monkey striate cortex.
J. Physiol. 195(1): 215–43.
Ito, M. and Komatsu, H. (2004). Representation of angles embedded within contour stimuli in area V2 of
macaque monkeys. J. Neurosci. 24(13): 3313–24.
Johnson, E.N., Hawken, M.J., and Shapley, R. (2001). The spatial transformation of color in the primary
visual cortex of the macaque monkey. Nat. Neurosci. 4(4): 409–16.
Jones, H.E., Wang, W., and Sillito, A.M. (2002). Spatial organization and magnitude of orientation contrast
interactions in primate V1. J. Neurophysiol. 88: 2796–808.
Kapadia, M.K., Ito, M., Gilbert, C.D., and Westheimer, G. (1995). Improvement in visual sensitivity by
changes in local context: parallel studies in human observers and in V1 of alert monkeys. Neuron
15(4): 843–56.
Kastner, S., Nothdurft, H.C., and Pigarev, I.N. (1997). Neuronal correlates of pop-out in cat striate cortex.
Vision Research 37: 371–76.
Kay, K.N., Naselaris, T., Prenger, R.J., and Gallant, J.L. (2008). Identifying natural images from human
brain activity. Nature 452(7185): 352–5.
Kisvárday, Z.F., Tóth, E., Rausch, M., and Eysel, U.T. (1997). Orientation-specific relationship between
populations of excitatory and inhibitory lateral connections in the visual cortex of the cat. Cerebral
Cortex 7: 605–18.
Knierim, J.J. and Van Essen, D.C. (1992). Neuronal responses to static texture patterns in area V1 of the
alert macaque monkey. J. Neurophysiol. 67: 961–80.
Koffka, K. (1935). Principles of Gestalt Psychology. New York: Harcourt, Brace and World.
Kourtzi, Z. and Huberle, E. (2005). Spatiotemporal characteristics of form analysis in the human visual
cortex revealed by rapid event-related fMRI adaptation. Neuroimage 28(2): 440–52.
Kourtzi, Z., Tolias, A.S., Altmann, C.F., Augath, M., and Logothetis, N.K. (2003). Integration of local
features into global shapes: monkey and human FMRI studies. Neuron 37(2): 333–46.
Kovacs, I. and Julesz, B. (1993). A closed curve is much more than an incomplete one: effect of closure
in figure-ground segmentation. Proceedings of the National Academy of Sciences of the United States of
America 90: 7495–7.
Kruger, N. (1998). Colinearity and parallelism are statistically significant second order relations of complex
cell responses. Neural Processing Letters. 8: 117–29.
Lamme, V.A.F. (1995). The neurophysiology of figure-ground segregation in primary visual cortex.
J. Neurosci. 15(2): 1605–15.
Lamme, V.A.F., Super, H., and Spekreijse, H. (1998). Feedforward, horizontal and feedback processing in
the visual cortex. Curr. Op. Neurobiol. 8: 529–35.
Ledgeway, T., Hess, R.F., and Geisler, W.S. (2005). Grouping local orientation and direction signals to
extract spatial contours: Empirical tests of ‘association field’ models of contour integration. Vision
Research 45: 2511–22.
Lerner, Y., Hendler, T., Ben-Bashat, D., Harel, M., and Malach, R. (2001). A hierarchical axis of object
processing stages in the human visual cortex. Cereb. Cortex 11(4): 287–97.
Li, Z. (1996). A neural model of visual contour integration. Advances in Neural Information Processing
Systems, 9, pp. 69–75. Cambridge, MA: MIT Press.
204 Hess, May, and Dumoulin
Li, Z. (1998). A neural model of contour integration in the primary visual cortex. Neural Computation
10(4): 903–40.
Li, Z. (1999). Contextual influences in V1 as a basis for pop out and asymmetry in visual search. Proceedings
of the National Academy of Sciences of the United States of America 96: 10530–5.
Li, Z. (2000). Pre-attentive segmentation in the primary visual cortex. Spatial Vision 13: 25–50.
Li, Z. (2002). A saliency map in primary visual cortex. Trends in Cognitive Sciences 6: 9–16.
Li, W., Piech, V., and Gilbert, C.D. (2006). Contour saliency in primary visual cortex. Neuron
50(6): 951–62.
Li, W., Piech, V., and Gilbert, C.D. (2008). Learning to link visual contours. Neuron 57(3): 442–51.
Malach, R., Amir, Y., Harel, H., and Grinvald, A. (1993). Relationship between intrinsic connections and
functional architecture revealed by optical imaging and in vivo targeted biocytin injections in primary
striate cortex. Proc. Natl. Acad. Sci. USA 90: 10469–73.
Malach, R., Reppas, J.B., Benson, R.R., Kwong, K.K., Jiang, H., Kennedy, W.A., Ledden, P.J., Brady, T.J.,
Rosen, B.R., and Tootell, R.B. (1995). Object-related activity revealed by functional magnetic resonance
imaging in human occipital cortex. Proc. Natl. Acad. Sci. USA 92(18): 8135–9.
Mante, V. and Carandini, M. (2005). Mapping of stimulus energy in primary visual cortex. J. Neurophysiol.
94(1): 788–98.
May, K.A. and Hess, R.F. (2007a). Dynamics of snakes and ladders. J. Vis. 7(12) 13: 1–9.
May, K.A. and Hess, R.F. (2007b). Ladder contours are undetectable in the periphery: a crowding effect?
J. Vis. 7 (13) 9: 1–15.
May, K.A. and Hess, R.F. (2008). Effects of element separation and carrier wavelength on detection of
snakes and ladders: Implications for models of contour integration. J. Vis. 8(13), 4: 1–23.
McIlhagga, W.H. and May, K.A. (2012). Optimal edge filters explain human blur detection. J. Vis. 12(10),
9: 1–13.
McIlhagga, W.H. and Mullen, K.T. (1996). Contour integration with colour and luminance contrast. Vision
Research 36(9): 1265–79.
Moulden, B. (1994). Collator units: second-stage orientational filters. In: M.J. Morgan (ed.) Higher-order
processing in the visual system: CIBA Foundation Symposium 184, pp. 170–84. Chichester: John Wiley
and Sons.
Mullen, K.T., Beaudot, W.H.A., and McIlhagga, W.H. (2000). Contour integration in color vision: a
common process for blue-yellow, red-green and luminance mechanisms? Vision Research 40: 639–55.
Murray, S.O., Kersten, D., Olshausen, B.A., Schrater, P., and Woods, D.L. (2002). Shape perception
reduces activity in human primary visual cortex. Proc. Natl. Acad. Sci. USA, 99(23): 15164–9.
Nelson, J.I., and Frost, B.J. (1978). Orientation-selective inhibition from beyond the classic visual receptive
field. Brain Res. 139(2): 359–65.
Nelson, J.I., and Frost, B.J. (1985). Intracortical facilitation among co-oriented, co-axially aligned simple
cells in cat striate cortex. Exp. Brain Res. 61(1): 54–61.
Nothdurft, H.C., Gallant, J.L., and Van Essen, D.C. (1999). Response modulation by texture surround in
primate area V1: correlates of ‘popout’ under anesthesia. Vis. Neurosci. 16 (1): 15–34.
Olman, C.A., Ugurbil, K., Schrater, P., and Kersten, D. (2004). BOLD fMRI and psychophysical
measurements of contrast response to broadband images. Vision Research 44(7): 669–83.
Parker, A.J. and Cumming, B.G. (2001). Cortical mechanisms of binocular stereoscopic vision. Prog. Brain
Res. 134: 205–16.
Pasupathy, A. and Connor, C.E. (1999). Responses to contour features in macaque area V4. J. Neurophysiol.
82(5): 2490–502.
Pelli, D.G., Palomares, M., and Majaj, N.J. (2004). Crowding is unlike ordinary masking: distinguishing
feature integration from detection. J. Vis. 4(12): 1136–69.
Petrov, Y., Verghese, P., and McKee, S.P. (2006). Collinear facilitation is largely uncertainty reduction. J.Vis.
6(2): 170–8.
Pettet, M.W., McKee, S.P., and Grzywacz, N.M. (1996). Smoothness constrains long-range interactions
mediating contour-detection. Investigative Ophthalmology and Visual Science 37: 4368.
Pettet, M.W., McKee, S.P., and Grzywacz, N.M. (1998). Constraints on long-range interactions mediating
contour-detection. Vision Research 38(6): 865–79.
Polat, U. (1999). Functional architecture of long-range perceptual interactions. Spatial Vision 12: 143–62.
Polat, U. and Bonneh, Y. (2000). Collinear interactions and contour integration. Spatial Vision
13(4): 393–401.
Polat, U. and Sagi, D. (1993). Lateral interactions between spatial channels: suppression and facilitation
revealed by lateral masking experiments. Vision Research 33(7): 993–9.
Polat, U. and Sagi, D. (1994). The architecture of perceptual spatial interactions. Vision Research
34(1): 73–8.
Polat, U., Mizobe, K., Pettet, M.W., Kasamatsu, T., and Norcia, A.M. (1998). Collinear stimuli regulate
visual responses depending on cell’s contrast threshold. Nature 391(6667): 580–4.
Pooresmaeili, A., Herrero, J.L., Self, M.W., Roelfsema, P.R., and Thiele, A. (2010). Suppressive lateral
interactions at parafoveal representations in primary visual cortex. J. Neurosci. 30(38): 12745–58.
Rainer, G., Augath, M., Trinath, T., and Logothetis, N.K. (2002). The effect of image scrambling on visual
cortical BOLD activity in the anesthetized monkey. Neuroimage 16 (3 Pt 1): 607–16.
Rosenholtz, R., Twarog, N.R., Schinkel-Bielefeld, N., and Wattenberg, M. (2009). An intuitive model of
perceptual grouping for HCI design. Proceedings of the 27th international conference on Human factors
in computing systems, pp. 1331–40.
Schmidt, K.E., Goebel, R., Lowel, S., and Singer, W. (1997). The perceptual grouping criterion of
collinearity is reflected by anisotropies of connections in the primary visual cortex. Eur. J. Neurosci.
9: 1083–1089.
Sigman, M., Cecchi, G.A., Gilbert, C.D., and Magnasco, M.O. (2001). On a common circle: natural scenes
and gestalt rules. Proc. Nat. Acad. Sci. USA 98(4): 1935–40.
Sillito, A.M., Grieve, K.L., Jones, H.E., Cudeiro, J., and Davis, J. (1995). Visual cortical mechanisms
detecting focal orientation discontinuities. Nature 378: 492–6.
Singer, W., and Gray, C.M. (1995). Visual feature integration and the temporal correlation hypothesis. Ann.
Rev. Neurosci. 18: 555–86.
Smits, J.T. and Vos, P.G. (1987). The perception of continuous curves in dot stimuli. Perception
16(1): 121–31.
Stemmler, M., Usher, M., and Niebur, E. (1995). Lateral interactions in primary visual cortex: A model
bridging physiology and psychophysics. Science 269: 1877–80.
Stettler, D.D., Das, A., Bennett, J., and Gilbert, C.D. (2002). Lateral connectivity and contextual
interactions in macaque primary visual cortex. Neuron 36: 739–50.
Tanskanen, T., Saarinen, J., Parkkonen, L., and Hari, R. (2008). From local to global: Cortical dynamics of
contour integration. J. Vis. 8(7), 15: 1–12.
Uttal, W.R. (1983). Visual form detection in 3-dimensional space. Hillsdale: Lawrence Erlbaum.
van den Berg, R., Roerdink, J.B.T.M., and Cornelissen, F.W. (2010). A neurophysiologically plausible
population code model for feature integration explains visual crowding. PLoS Computational Biology
6 (1): e1000646.
Wagemans, J. (1995). Detection of visual symmetries. Spat. Vis. 9(1): 9–32.
Watt, R., Ledgeway, T., and Dakin, S.C. (2008). Families of models for gabor paths demonstrate the
importance of spatial adjacency. J. Vis. 8(7): 1–19.
Weliky, M., Kandler, K., Fitzpatrick, D., and Katz, L.C. (1995). Patterns of excitation and inhibition
evoked by horizontal connections in visual cortex share a common relationship to orientation columns.
Neuron 15: 541–52.
Williams, C.B., and Hess, R.F. (1998). The relationship between facilitation at threshold and suprathreshold
contour integration. J. Op. Soc. Am., A 15(8): 2046–51.
Yen, S.-C. and Finkel, L.H. (1998). Extraction of perceptually salient contours by striate cortical networks.
Vision Research 38: 719–41.
Zhaoping, L. and May, K.A. (2007). Psychophysical tests of the hypothesis of a bottom-up saliency map in
primary visual cortex. PLoS Computational Biology, 3(4). doi: 10.1371/journal.pcbi.0030062
Zhou, Y.X. and Baker, C.L., Jr. (1993). A processing stream in mammalian visual cortex neurons for
non-Fourier responses. Science 261(5117): 98–101.
Zipser, K., Lamme, V.A.F., and Schiller, P.H. (1996). Contextual modulation in primary visual cortex.
J. Neurosci. 16: 7376–89.
Chapter 11
Introduction
The visible surface of a 3D object in the world projects to a 2D region of the retinal image. The rim
of the object, defined to be the set of surface points on the object grazed by the manifold of rays
passing through the optical centre of the eye (Koenderink 1984), projects to the image as a 1D
bounding contour. For a simply connected, unoccluded object, the rim projects as a simple closed
curve in the image, and such contours are sufficient to yield compelling percepts of 2D and even
3D shape (Figure 11.1a).
In the general case, however, even for a smooth object the bounding contour can be fragmented
due to occlusions, including self-occlusions, and the representation of the bounding contour is
further fragmented by the pointillist representations of the early visual system. From the photo-
receptors of the retina through the retinal ganglia, midbrain, and spatiotopic areas of the object
pathway in visual cortex, the image, and hence its contours, are represented piecemeal. A fun-
damental question is how the visual system assembles these pieces into the coherent percepts of
whole objects we experience.
An alternative to grouping the contour fragments of the boundary is to group the points inte-
rior to this contour based on their apparent similarity, a process known as region segmentation
(see Self and Roelfsema, this volume). By the Jordan Curve Theorem (Jordan 1887), for a simple
closed boundary curve the region and its boundary are formally dual (i.e. one can be derived from
the other), so in theory either method should suffice. In addition, an advantage of region grouping
is that one can initialize the solution with the correct topology (e.g. a simply connected region)
and easily maintain this topology as the solution evolves. The downside is the dependence of these
methods upon the homogeneous appearance of the object, which may not apply (Figure 11.1b). In
such cases, the geometric regularity of the boundary may be the only basis for perceptual organi-
zation. This is consistent with psychophysical studies using simple fragmented shapes that reveal
specialized mechanisms for contour grouping, distinct from processes for region grouping (Elder
and Zucker 1994).
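The region/boundary duality is easy to make concrete in one direction: given only a simple closed boundary curve, the interior region can be recovered mechanically, for instance by the even-odd ray-casting rule. The sketch below is purely illustrative (the square contour and the function name are ours, not from the chapter):

```python
# Sketch: deriving the region from its closed boundary (one direction of
# the duality). Even-odd ray casting decides interior membership for a
# simple closed polygonal contour.

def inside(pt, contour):
    """Even-odd rule: count boundary crossings of a ray cast to the right."""
    x, y = pt
    crossings = 0
    n = len(contour)
    for i in range(n):
        (x1, y1), (x2, y2) = contour[i], contour[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                crossings += 1
    return crossings % 2 == 1

# A simple closed curve: the unit square traversed counter-clockwise.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(inside((0.5, 0.5), square))  # interior point -> True
print(inside((1.5, 0.5), square))  # exterior point -> False
```

The converse direction, tracing the boundary of a given region, is equally mechanical, which is why either grouping strategy should in principle suffice.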
One valid concern is that the contour grouping mechanisms revealed with simple artificial
stimuli may not generalize to complex natural scenes. However, a recent study by Elder and
Velisavljević (2009) suggests otherwise. This study used the Berkeley Segmentation Dataset
(BSD, Martin, Fowlkes, and Malik 2004) to explore the dynamics of animal detection in natural
scenes. For each image in the dataset, the BSD provides hand segmentations created by human
subjects, each of which carves up the image into meaningful regions. Elder and Velisavljević
208 Elder
Fig. 11.1 (a) Shape from contour. (b) When surface textures are heterogeneous, geometric
regularities of the object boundaries are the only cues for object segmentation. From Iverson (2012).
Reprinted with permission.
used this dataset to create new images in which luminance, colour, texture, and contour shape
cues were selectively turned on or off (Figure 11.2(a)). They then measured performance for
animal detection using these various modified images over a range of stimulus durations (Figure
11.2(b)). While each condition generally involved multiple cues, assuming additive cue combi-
nation, the contribution of each cue can be estimated using standard regression methods (Figure
11.2(c)).
The results show that humans do not use simple luminance or colour cues for animal detection,
but instead rely on contour shape and texture cues. Interestingly, the contour shape cues appear to
be the first available, influencing performance for stimulus durations as short as 10 msec, within a
backward masking paradigm. A control study found only a modest performance decrement when
the hand-drawn outlines were replaced by computer-generated edge maps (Elder and Zucker
1998b). Thus, contour grouping mechanisms appear to underlie rapid object perception for both
simple artificial images and complex natural scenes. (One can speculate on whether animal cam-
ouflage may make colour and texture cues less reliable than shape cues for animal detection in
particular—see Osorio and Cuthill, this volume.)
At the same time, we know from the fifty-year history of computer vision that contour grouping
is computationally difficult, due to fragmentation caused by occlusions as well as sections of con-
tour where figure/ground contrast is low. These two scenarios illustrate the problems of amodal
and modal completion, respectively (Figure 11.3). (A debate persists regarding whether a com-
mon mechanism underlies both amodal and modal completion—see van Lier and Gerbino, this
volume, for details. I will not address this debate here, but rather will consider the more general
problem of grouping fragmented contours, without regard for the cause of the fragmentation. It is
likely that the models discussed here could be productively refined by making this distinction, for
example by switching grouping mechanisms based upon the detection of T-junctions suggestive
of occlusion.)
To further complicate matters, natural images are often highly cluttered, so that for each contour
fragment, there are typically multiple possible fragments that might be the correct continuation
Bridging the Dimensional Gap 209
Fig. 11.2 Psychophysical animal detection experiment. (a) Example stimuli. The letters indicate the cues
available: Luminance, Color, Texture, Shape. ‘SO’ stands for ‘Shape Outline’. (b) Stimulus sequence. (c)
Estimated influence of the four individual cues to animal detection.
Reproduced from James H. Elder and Ljiljana Velisavljević, Cue Dynamics Underlying Rapid Detection of Animals
in Natural Scenes, Journal of Vision, 9(7), figure 3, doi: 10.1167/9.7.7 © 2009, Association for Research in Vision
and Ophthalmology.
of the contour. Thus to effectively exploit contours for object segmentation, the visual system must
be able to cope with uncertainty, using a relaxed form of perceptual contour closure that can work
reliably even for fragmented contours (Elder and Zucker 1993). For these reasons, computing the
correct bounding contours of objects in complex natural scenes is generally thought to be one of
the harder computer vision problems, and the state of the art is still quite far from human per-
formance (Arbelaez et al. 2011). So the question remains: how does the brain rapidly and reliably
solve this problem that computer vision algorithms fail to solve?
Computational framework
The standard computational framework for modelling contour grouping consists of three stages:
1 Local orientation coding. Detection of the local oriented elements (edges or line segments) to be
grouped.
2 Pairwise association. Computation of the strength of grouping (ideally expressed as a
probability) between each pair of local elements. This can be represented as a transition matrix.
These local probabilities are typically based on classical local Gestalt cues such as proximity,
good continuation and similarity in brightness, contrast and colour.
3 Global contour extraction. Inference of global contours based upon this transition matrix.
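The three stages can be sketched in a few lines of code. The Gaussian affinity and the greedy chaining rule below are deliberately simplistic stand-ins, and the scale constants and toy element list are invented for the example, not taken from any model in the literature:

```python
# Skeleton of the three-stage framework. Stage 1 is assumed done: each
# local element is (x, y, theta). Stage 2 scores every pair; stage 3
# greedily follows the strongest pairwise links.

import math

def pairwise_affinity(a, b, sigma_d=30.0, sigma_t=0.5):
    """Stage 2: proximity x good-continuation score for elements a, b.
    Gaussian falloffs with hypothetical scale constants sigma_d, sigma_t."""
    ax, ay, at = a
    bx, by, bt = b
    d = math.hypot(bx - ax, by - ay)
    dtheta = abs(at - bt)
    return math.exp(-d**2 / (2 * sigma_d**2)) * math.exp(-dtheta**2 / (2 * sigma_t**2))

def extract_contour(elements, start, steps=3):
    """Stage 3 (simplistic): repeatedly take the best unvisited continuation."""
    chain = [start]
    current = start
    visited = {start}
    for _ in range(steps):
        candidates = [j for j in range(len(elements)) if j not in visited]
        if not candidates:
            break
        best = max(candidates,
                   key=lambda j: pairwise_affinity(elements[current], elements[j]))
        chain.append(best)
        visited.add(best)
        current = best
    return chain

# Four roughly collinear elements plus one distractor (toy stage-1 output).
elems = [(0, 0, 0.0), (20, 2, 0.1), (40, 3, 0.0), (60, 5, 0.1), (30, 80, 1.5)]
print(extract_contour(elems, 0))  # follows the collinear chain: [0, 1, 2, 3]
```

A greedy chain like this illustrates the framework but not a credible stage 3; as discussed below, the real difficulty lies in searching the space of global contours rather than committing to the locally best link.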
I will review all three of these stages below, but will focus primarily on the last, which in my
view is the hardest. To see this, we must first more clearly articulate the exact goal of the global
contour extraction stage. There are essentially two proposals. One (e.g. Geisler et al. 2001) is to
extract the unordered set of local elements comprising each contour. The second (e.g. Elder and
Goldberg 2002) is to extract the ordered sequence of local elements forming the contour. We
Fig. 11.3 Object boundaries project to the image as fragmented contours, due to occlusions (cyan) and
low figure/ground contrast (red).
Reproduced from Wagemans, J., Elder, J., Kubovy, M., Palmer, S., Peterson, M., Singh, M., & von der Heydt, R.,
A century of Gestalt psychology in visual perception: I. Perceptual grouping and figure-ground organization.
Psychological Bulletin, 138(6), pp. 1172–1217 (c) 2012, American Psychological Association.
will analyse these two objectives in more detail below, but for now note that in either case the
solution space is exponential in the number of elements comprising each contour. In particular,
given n oriented elements in the image and k elements comprising a particular contour, there are
n!/(k!(n – k)!) possible set solutions and n!/(n – k)! sequence solutions. Thus a key problem is to
identify effective algorithms that only need to explore a small part of this search space to find the
correct contours.
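These counts can be checked directly; even a modest image (the values n = 100 and k = 10 below are chosen purely for illustration) makes exhaustive enumeration hopeless:

```python
# Size of the two solution spaces for n oriented elements and contours of
# k elements: unordered sets, n!/(k!(n-k)!), vs ordered sequences, n!/(n-k)!.

from math import comb, perm  # Python 3.8+

n, k = 100, 10
set_solutions = comb(n, k)       # n!/(k!(n-k)!)
sequence_solutions = perm(n, k)  # n!/(n-k)!

print(set_solutions)       # 17310309456440  (~1.7e13)
print(sequence_solutions)  # 62815650955529472000  (~6.3e19)
```

The sequence space is larger than the set space by exactly a factor of k! (the number of orderings of each set), which is why algorithms that must recover element order face the harder search problem.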
Fig. 11.4 The Gestalt cue of proximity can be expressed as a function of the distance ρ between each
pair of local elements. The cue of good continuation for oriented edges in an image can be expressed to
first order as a function of two angles θ1 and θ2. The cue of similarity can be expressed as a function of
photometric measurements αi, βi on either side of each edge.
Reproduced from James H. Elder and Richard M. Goldberg, Ecological statistics of Gestalt laws for the perceptual
organization of contours, Journal of Vision, 2(4), figure 4, doi: 10.1167/2.4.5 © 2002, Association for Research in
Vision and Ophthalmology.
using grating stimuli (Blakemore and Nachmias 1971; Campbell and Kulikowski 1966; Phillips
and Wilson 1984; Snowden 1992) and orientation fields (e.g. Glass patterns, Maloney, Mitchison,
and Barlow 1987; Dakin 1997, 2001; Or and Elder 2011) to be between 7 and 15 deg (half-width
at half-height), and this corresponds fairly well to the physiology (Hawken and Parker 1991;
Ringach 2002).
Beyond issues of scale and contrast is the problem that for natural scenes, not all contours are
created equal. Contours corresponding to object boundaries may in fact be in the minority, lost in
a sea of contours produced by reflectance changes, shading, and shadows. Computationally, colour
and texture information has been found useful in estimating the relative importance of local edges
(e.g. Martin et al. 2004), but the mapping of these mechanisms to visual cortex remains unclear.
Pairwise association
The study of the strength of association between pairs of local elements is rooted in the early work
of Gestalt psychologists (Wertheimer 1938), who identified three central cues that are relevant
here: proximity, good continuation, and similarity (Figure 11.4). We consider each in turn below.
(See also Feldman, this volume.)
Proximity
The principle of proximity states that the strength of grouping between two elements increases
as these elements are brought nearer to each other. But how exactly does grouping strength vary
as a function of their separation? In an early attempt to answer this question, Oyama (1961)
manipulated the horizontal and vertical spacing of dots arranged in a rectangular array, measur-
ing the duration of time subjects perceived the arrays organized as vertical lines vs horizontal lines
(Figure 11.5a). He found that the ratio of durations t_h/t_v could be accurately related to the ratio of
dot spacing d_h/d_v through a power law: t_h/t_v = (d_h/d_v)^(−α), with α ≈ 2.89.
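Oyama's fitted law is easy to explore numerically. The helper below simply evaluates it (α ≈ 2.89 is from the text; the function name is ours):

```python
# Oyama's power law: t_h / t_v = (d_h / d_v) ** (-alpha), alpha ~ 2.89.

alpha = 2.89

def duration_ratio(dh, dv):
    """Predicted ratio of time the array is seen as horizontal vs vertical lines."""
    return (dh / dv) ** (-alpha)

# Halving the horizontal spacing makes the horizontal organization
# dominate by roughly a factor of 7.4.
print(round(duration_ratio(1.0, 2.0), 2))  # 7.41

# The law depends only on the ratio of spacings, i.e. it is scale invariant.
print(duration_ratio(2.0, 4.0) == duration_ratio(1.0, 2.0))  # True
```

The last line anticipates the scale-invariance result discussed next: because only the ratio d_h/d_v enters, rescaling all distances by a common factor leaves the prediction unchanged.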
Using an elaboration of this psychophysical technique, Kubovy and colleagues (Kubovy and
Wagemans 1995; Kubovy, Holcombe, and Wagemans 1998) modelled the proximity cue as an
exponential decay, which is consistent with random-walk models of contour formation (Mumford
1992; Williams and Jacobs 1997). However, they also noted that a power law model would fit
their data equally well. Further, they found that the proximity cue was approximately scale invari-
ant: scaling all distances by the same factor did not affect results. Since the power law is the only
Fig. 11.5 (a) Psychophysical stimulus used to measure the proximity cue (Oyama 1961). See text for
details. (b) Ecological statistics of the proximity cue for contour grouping. The data follow a power law
for distances greater than 2 image pixels. For smaller distances, measurement noise dominates.
Adapted from James H. Elder and Richard M. Goldberg, Ecological statistics of Gestalt laws for the perceptual
organization of contours, Journal of Vision, 2(4), figure 7a, doi: 10.1167/2.4.5 © 2002, Association for Research
in Vision and Ophthalmology.
perfectly scale-invariant distribution, this last result adds strength to the power-law model of
proximity.
Perceptual scale invariance is rational if in fact the proximity of elements along real contours in
natural images is scale invariant, i.e. if the ecological distribution follows a power law. In support
of this idea, Sigman et al. (2001) reported that the spatial correlation in the response of collinearly
oriented filters to natural images does indeed follow a power law, suggesting a correspondence
between perception and the ecological statistics of the proximity cue. Quantitatively, however, the
correspondence is poor: while Oyama estimated the perceptual exponent to be α ≈ 2.89, Sigman
et al. estimated an ecological exponent of only 0.6, reflective of a much weaker cue to grouping.
This discrepancy can be accounted for if we consider that Sigman et al. did not restrict their
measurements to pairs of neighbouring elements on the same contour of the image. In fact, the
measurements were not constrained to be on the same contour, or even on a contour at all. Thus
the estimate mixes measurements made between strongly related and only weakly related image
features. This mixing of measurements on, off, and between contours can be expected to weaken
estimates of the conditional statistical distributions that generate natural images.
Elder and Goldberg (2002) estimated these distributions more directly, using human subjects
to label the sequence of elements forming the contours of natural images, with the aid of an inter-
active image editing tool (Elder and Goldberg 2001). This technique allowed the measurements
to be restricted to successive elements along the same contour, and yielded a clear power law
(Figure 11.5b) with exponent α = 2.92, very close to the perceptual estimate of Oyama.
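A sketch of how such an exponent can be recovered from gap statistics: fit a line to log probability versus log gap, since for p ∝ gap^(−α) the slope is −α. The synthetic, noiseless data below are illustrative only, not the Elder and Goldberg measurements:

```python
# Estimating the proximity exponent from (gap, probability) data by a
# least-squares line fit in log-log coordinates.

import math

def fit_alpha(gaps, probs):
    """Slope of log p vs log gap; for p ~ gap**(-alpha) the slope is -alpha."""
    xs = [math.log(g) for g in gaps]
    ys = [math.log(p) for p in probs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope

alpha = 2.92
gaps = [2, 4, 8, 16, 32]
probs = [g ** -alpha for g in gaps]  # noiseless synthetic power-law data
print(round(fit_alpha(gaps, probs), 2))  # recovers 2.92
```

On real labelled data the fit would of course be noisy, and (as Figure 11.5b shows) restricted to gaps large enough that measurement noise does not dominate.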
In summary, the convergence between psychophysics and ecological statistics is compelling.
Ecologically, proximity follows a power law and exhibits scale invariance, and these properties are
mirrored by the psychophysical results. Thus we have a strong indication that the human percep-
tual system for grouping contours is optimally tuned for the ecological statistics of the proximity
cue in natural scenes.
Good continuation
The principle of good continuation refers to the tendency for elements to be grouped to form
smooth contours (Wertheimer 1938). An elegant method for studying the principle of good
continuation in isolation was developed by Field, Hayes, and Hess (1993) (see also Hess et al., this
volume). In this method, a contour formed from localized oriented elements is embedded in a
random field of distractor elements, in such a way that the cue of proximity is roughly eliminated.
Aligning the contour elements to be tangent to the contour makes the contour easily detected,
whereas randomizing the orientation of the elements renders the contour invisible. This clearly
demonstrates the role of good continuation in isolation from other cues.
These findings led Field et al. to propose the notion of an ‘association field’ that determines the
linking of oriented elements within a local visual neighbourhood (Figure 11.6), a construct that is
closely related to the machinery of cocircularity support neighbourhoods, developed somewhat
earlier for the purpose of contour refinement in computer vision (Parent and Zucker 1989).
Ecological data on good continuation have also begun to emerge. Kruger (1998) and later
Sigman et al. (2001) found evidence for colinearity, cocircularity and parallelism in the statistics
of natural images. Geisler et al. (2001) found similar results using both labelled and unlabelled
natural image data. Crucially, Geisler et al. also conducted a companion psychophysics experi-
ment that revealed a fairly close correspondence between the tuning of human perception to the
good continuation cue, and the statistics of this cue in natural images.
To be optimal, the decision to group two elements should be based on the likelihood ratio: in
this case, the ratio of the probability that two elements from the same contour would generate
the observed geometric configuration, to the probability that a random pair of elements would
generate this configuration. To compute this ratio, Geisler et al. treated contours as unordered
sets of oriented elements, measuring the statistics for pairs of contour elements on a common
object boundary, regardless of whether these element pairs were close together or far apart on the
object contour. In contrast, Elder and Goldberg (2002) modelled contours as ordered sequences
of oriented elements, restricting measurements to adjacent pairs of oriented elements along the
contours. Figure 11.7 shows maps of the likelihood ratios determined using the two methods.
Note that the likelihood ratios are much larger for the sequential statistics, reflecting a stronger
statistical association between neighbouring contour elements.
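The likelihood-ratio rule itself is simple to express. Here it is for a single cue (the gap between elements), with made-up Gaussian densities standing in for the empirical distributions, purely to show the form of the computation:

```python
# Grouping decision as a likelihood ratio over one cue: the gap between
# two elements. Both densities are hypothetical placeholders.

import math

def gauss(x, mu, sigma):
    """Gaussian probability density."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def likelihood_ratio(gap):
    # p(gap | same contour) / p(gap | random pair): same-contour gaps are
    # small; random pairs are spread over the whole image.
    return gauss(gap, 2.0, 1.5) / gauss(gap, 40.0, 30.0)

print(likelihood_ratio(3.0) > 1.0)   # small gap favours grouping: True
print(likelihood_ratio(10.0) > 1.0)  # larger gap favours non-grouping: False
```

In practice the ratio is taken over the joint geometric configuration (distance plus the two interpolation angles), and the two ways of defining "same contour" (unordered set vs adjacent elements of a sequence) give the two maps of Figure 11.7.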
Fig. 11.6 Models of good continuation. (a) Cocircularity support neighbourhood. (b) Association field.
(a) © 1998 IEEE. Adapted, with permission, from Parent, P.; Zucker, S.W., Trace inference, curvature
consistency, and curve detection, IEEE Transactions on Pattern Analysis and Machine Intelligence. (b)
Adapted from Vision Research, 33(2), David J. Field, Anthony Hayes, and Robert F. Hess, Contour
integration by the human visual system: Evidence for a local “association field”, pp. 173–93, Copyright
(1993), with permission from Elsevier.
Fig. 11.7 Association fields derived from the ecological statistics of contours. (a) Likelihood ratio for two
oriented elements to be on the same object boundary, adapted from Geisler et al. (2001). (b) Likelihood
ratio for two oriented elements to be neighbouring elements on the same object boundary.
Adapted from James H. Elder and Richard M. Goldberg, Ecological statistics of Gestalt laws for the perceptual
organization of contours, Journal of Vision, 2(4), figure 18 b and e, doi: 10.1167/2.4.5 © 2002, Association for
Research in Vision and Ophthalmology.
When defined over pairs of oriented elements, there are various ways to encode the principle
of good continuation. A straight-line interpolation between the elements, either between their
centres or their end-points, induces two interpolation angles (Figure 11.4): small values for these
angles indicate good continuation. However, Elder and Goldberg (2002) observed that these
angles are highly correlated for contours in natural scenes (Figure 11.8a), suggesting a recoding
into the difference and sum of these angles, which are approximately uncorrelated and represent
the cues of cocircularity and parallelism, respectively (Figure 11.8b). Kellman and Shipley (1991)
have used the term ‘relatability’ to refer to a particular constraint on these two angles found to be
predictive of contour completion phenomena.
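The decorrelating effect of this recoding can be illustrated numerically. In the sketch below the angle pairs are synthetic, constructed from independent 'sum' and 'difference' components so that the raw angles come out strongly negatively correlated, as in Figure 11.8a; the distributions are illustrative, not the measured ecological statistics:

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

random.seed(0)
# Synthetic angle pairs built from independent latent components:
s = [random.gauss(0, 10) for _ in range(2000)]   # 'sum' (parallelism) part
d = [random.gauss(0, 60) for _ in range(2000)]   # 'difference' (cocircularity) part
theta_ij = [(si - di) / 2 for si, di in zip(s, d)]
theta_ji = [(si + di) / 2 for si, di in zip(s, d)]

diffs = [j - i for i, j in zip(theta_ij, theta_ji)]  # cocircularity cue
sums = [j + i for i, j in zip(theta_ij, theta_ji)]   # parallelism cue

print(pearson(theta_ij, theta_ji))   # strongly negative, as in Fig. 11.8a
print(pearson(diffs, sums))          # near zero: recoded cues decorrelated
```

The raw angles are highly redundant; the difference/sum recoding yields an approximately independent code.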
Similarity
In the context of contour grouping, the principle of similarity suggests that elements with similar
photometric properties—brightness, contrast, colour, texture—are more likely to group than ele-
ments that differ on these dimensions. Psychophysically, the principle has been demonstrated in
a number of ways with dot patterns. Hochberg and Hardy (1960) showed that proximity ratios of
up to two can be overcome by intensity similarity cues, and contrast similarity is known to affect
the perception of Glass patterns (Earle 1999).
Elder and Goldberg (2002) explored the ecological statistics of similarity in edge grouping,
coding similarity in terms of the difference in brightness (α1 + β1) − (α2 + β2) and the difference in
contrast (α1 − β1) − (α2 − β2) between the edges (see Figure 11.4). They found that while the bright-
ness cue carries useful information for grouping, the contrast cue is relatively weak.
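A minimal sketch of these two similarity cues, assuming α and β denote luminance estimates on the two sides of each edge (this reading of the symbols is an assumption, since Figure 11.4 is not reproduced here):

```python
def similarity_cues(a1, b1, a2, b2):
    """Brightness and contrast similarity cues between two edges.
    a, b: luminance estimates on the two sides of each edge
    (interpretation assumed; Figure 11.4 not shown)."""
    brightness_diff = (a1 + b1) - (a2 + b2)   # difference in mean luminance
    contrast_diff = (a1 - b1) - (a2 - b2)     # difference in edge contrast
    return brightness_diff, contrast_diff

# Two edges with equal mean luminance but different contrast: only the
# (statistically weaker) contrast cue distinguishes them.
print(similarity_cues(0.8, 0.2, 0.9, 0.1))
```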
The edges shown in Figure 11.4 are consistent in contrast polarity: light matches light and dark
matches dark. However, it has been argued that grouping mechanisms should be insensitive to
contrast polarity (Grossberg and Mingolla 1985; Kellman and Shipley 1991), since polarity can
easily reverse along an object boundary due to variations in the background. On the other hand,
Bridging the Dimensional Gap 215
[Panel (a) plots θji against θij (deg); panel (b) plots the cocircularity cue against the parallelism cue θji + θij (deg).]
Fig. 11.8 (a) The two angles formed when interpolating between two oriented elements are negatively
correlated. (b) Linear recoding into parallelism and cocircularity cues results in a more independent code.
Adapted from James H. Elder and Richard M. Goldberg, Ecological statistics of Gestalt laws for the perceptual
organization of contours, Journal of Vision, 2(4), figure 8 a and b, doi: 10.1167/2.4.5 © 2002, Association for
Research in Vision and Ophthalmology.
while Elder and Goldberg (2002) restricted their statistical study to pairs of elements of the same
contrast polarity, they observed that fewer than 13% of the associations in their original ground
truth dataset involved a reversal in contrast polarity. This suggests that contrast polarity could in
fact be an important cue for contour grouping. Is there behavioural evidence that humans take
advantage of this cue?
Although the psychophysical record is a bit complex, the simple answer to this question is
yes. For example, contrast reversals are known to essentially eliminate the perception of Glass
patterns (Glass and Switkes 1976), consistent with the use of polarity to disambiguate grouping.
Similarly, Elder and Zucker (1993) found that contrast reversal eliminated the benefit of bound-
ary grouping cues in fragmented contour stimuli, and Field, Hayes, and Hess (2000) found that
contrast reversals reduced the detectability of contours embedded in random-oriented element
distractors. Further, while Rensink and Enns (1995) found that polarity reversal did not appear
to weaken the contour grouping required to elicit the Müller-Lyer illusion, Chan and Hayward
(2009) found that careful control of junction effects does reveal a sensitivity to contrast polarity
in this illusion.
On the other hand, Gilchrist et al. (1997) found that the effect of contrast on pairwise ele-
ment grouping depends on the shape of the elements, and, using modified forms of the Elder
and Zucker stimuli, Spehar (2002) found that the effect of contrast reversal was greatly reduced
if the reversal does not coincide with an orientation discontinuity. Together, these results suggest
an interesting perceptual interaction between geometric relationships such as good continuation
and similarity cues.
While these behavioural results all involve simple synthetic stimuli, Geisler and Perry (2009)
have more recently reported a joint study of the ecological statistics of contours with a compan-
ion psychophysical investigation modelled on these statistics. This study not only confirmed and
quantified the contrast polarity cue for natural scenes, but showed that humans do in fact take
advantage of this cue, in a way that is consistent with the underlying statistics.
Cue combination
One of the central questions in perceptual organization concerns how the brain combines mul-
tiple cues to determine the association between pairs of local elements. Historically this problem
has often been posed in terms of competitive interactions. In natural scenes, however, disparate
weak cues can often combine synergistically to yield strong evidence for a particular grouping.
It is perhaps this aspect of perceptual organization research that has benefited the most from the
modern probabilistic approach (see also both chapters by Feldman, this volume).
Geisler et al. (2001) used a non-parametric statistical approach, jointly modelling the ecological
statistics of proximity and good continuation cues as a 3D histogram. They showed that human
psychophysical performance on a contour detection task parallels these statistics, suggesting that
the brain combines these two classical Gestalt cues in a near-optimal way. Elder and Goldberg
(2002) demonstrated that the ecological statistics of proximity, good continuation, and similarity
cues can be coded in such a way as to be roughly uncorrelated, so that to a first approximation
the Gestalt laws can be factored: the likelihood of a particular grouping can be computed as the
product of the likelihoods for each individual grouping cue.
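Under this factorization, cue combination reduces to a product of per-cue likelihood ratios, or equivalently a sum in the log domain. A minimal sketch with illustrative numbers:

```python
import math

def combined_likelihood_ratio(cue_ratios):
    """Combine approximately independent Gestalt cues by multiplying
    their individual likelihood ratios (the factorization above)."""
    product = 1.0
    for r in cue_ratios:
        product *= r
    return product

def combined_log_lr(cue_ratios):
    """Equivalent additive form in the log domain."""
    return sum(math.log(r) for r in cue_ratios)

# Three individually weak cues (each only modestly above 1) combine
# synergistically into strong evidence for grouping:
ratios = [2.0, 1.5, 1.8]
print(combined_likelihood_ratio(ratios))
print(combined_log_lr(ratios) > 0)   # net evidence favours grouping
```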
Elder and Goldberg’s approach also allowed quantification of the statistical power of each
Gestalt cue, which they quantified as the reduction in the entropy of the grouping decision deriv-
ing from observation of the cue. They found that the cue of proximity was by far the most power-
ful, reducing the entropy by roughly 75%, whereas good continuation and similarity cues, while
important, reduced entropy by roughly 10% each. They further demonstrated that the most accu-
rate grouping decisions are made by combining all of these cues optimally according to the proba-
bilistic model, trained on the ecological statistics of natural images.
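The entropy-reduction measure of cue power is the mutual information between the cue and the grouping decision, expressed as a fraction of the prior entropy. A toy sketch with a binary decision and a binary cue (the numbers are illustrative, not Elder and Goldberg's measurements):

```python
import math

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def fractional_entropy_reduction(prior, cue_probs, posteriors):
    """Expected reduction in the entropy of the grouping decision after
    observing the cue, as a fraction of the prior entropy."""
    h_prior = entropy(prior)
    h_post = sum(pc * entropy(post) for pc, post in zip(cue_probs, posteriors))
    return (h_prior - h_post) / h_prior

# Toy binary decision and binary cue (illustrative numbers only):
prior = [0.5, 0.5]                      # p(group), p(no group)
cue_probs = [0.5, 0.5]                  # p(cue = near), p(cue = far)
posteriors = [[0.9, 0.1], [0.1, 0.9]]   # p(decision | cue value)
print(fractional_entropy_reduction(prior, cue_probs, posteriors))
```

A highly diagnostic cue, as here, removes roughly half the decision entropy; an uninformative cue removes none.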
Fig. 11.9 Common topological errors resulting from feed-forward grouping algorithms. (a) Bifurcations
that can result from a transitivity rule. (b–c) Self-intersections that can also be produced by shortest-path
algorithms. The intersections in (b) have non-unit rotation indices and can thus be weeded out easily;
however the contour in (c) has the correct rotation index and therefore is more difficult to detect.
(a) Reprinted from Vision Research, 41(6), W.S. Geisler, J.S. Perry, B.J. Super, and D.P. Gallogly, Edge co-occurrence
in natural images predicts contour grouping performance, pp. 711–24, Copyright (2001), with permission from
Elsevier. Adapted from James H. Elder and Stephen W. Zucker, ‘Computer Contour Closure’. In Bernard Buxton
and Roberto Cipolla (eds), Proceedings of the 4th European Conference on Computer Vision, pp. 399–412,
DOI: 10.1007/BFb0015553 Copyright © 1996, Springer-Verlag. With kind permission from Springer Science and
Business Media.
Geisler et al. (2001), which do not discriminate the sequencing of elements along the contour.
However, as a consequence, this transitivity principle does not discriminate between simple (i.e.
non-intersecting) curves and more complex topologies, including contours with bifurcations
and intersections (Figure 11.9), and generally yields ‘textures’ of oriented elements as opposed
to bounding contours. For this reason, we will focus here on a common probabilistic approach,
which is to model contours as first-order Markov chains.
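Under the first-order Markov assumption, the evidence for an ordered contour hypothesis factors over successive element pairs, so pairwise log-likelihood ratios simply sum along the chain. A sketch with a hypothetical proximity-based transition ratio (the functional form and scale are illustrative):

```python
import math

def contour_log_likelihood(transition_lr, elements):
    """Log-likelihood ratio of an ordered contour hypothesis under a
    first-order Markov model: each link depends only on the previous
    element, so pairwise log-ratios sum along the chain."""
    return sum(math.log(transition_lr(a, b))
               for a, b in zip(elements, elements[1:]))

def proximity_lr(p, q, scale=5.0):
    """Hypothetical transition likelihood ratio favouring short gaps
    between successive elements (points in the plane)."""
    return math.exp((scale - math.dist(p, q)) / scale)   # >1 when gap < scale

smooth = [(0, 0), (3, 0), (6, 0), (9, 0)]        # evenly spaced chain
jumpy = [(0, 0), (12, 0), (3, 5), (20, 1)]       # erratic ordering
print(contour_log_likelihood(proximity_lr, smooth) >
      contour_log_likelihood(proximity_lr, jumpy))   # → True
```

This additive structure is what allows shortest-path algorithms to search for the best contour hypothesis.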
Fig. 11.10 Contour grouping algorithms. Right column: single scale. Left three columns: multi-scale,
with coarse-to-fine feedback.
© 2006 IEEE. Reprinted, with permission, from Estrada, F.J., Elder, J.H., Multi-Scale Contour Extraction Based on
Natural Image Statistics, IEEE Conference on Computer Vision and Pattern Recognition Workshop.
Fig. 11.11 Using the first-order Markov model with a strong prior for skin hue.
© 2006 IEEE. Reprinted, with permission, from Johnston, L., & Elder, J. H., Efficient Computation of Closed
Contours using Modified Baum-Welch Updating. IEEE Workshop on Perceptual Organization in Computer Vision.
Closure
The classical Gestalt demonstration shown in Figure 11.12 is often taken to demonstrate a princi-
ple of closure overcoming the principle of proximity to determine the perceptual organization of
contours (Koffka 1935). Note, however, that the percept here can potentially be explained as the
result of a principle of good continuation, without requiring the invention of a separate factor of
closure. This close relationship between good continuation and closure has continued to confound
interpretation in more recent work. Using the methodology of Field et al. (1993), Kovacs and Julesz (1993) found
superior detection performance for closed, roughly circular contours, compared to open curvilin-
ear controls. However, the good continuation cues between the open and closed stimuli were not
perfectly equated in these experiments. For example, the open controls contained many inflections
in curvature, whereas the closed contours were nearly circular. These differences are important, as
it has been shown that changes in curvature sign can greatly reduce the detectability of contours
(Pettet 1999).
Tversky, Geisler, and Perry (2004) addressed this question directly, using the Field et al. (1993)
methodology to compare detection for circular contours and S-shaped contours matching the
circular contours exactly in curvature, save for a single inflection point. They found a small advan-
tage for closed contours, but argued that this advantage could potentially be due to probabil-
ity summation over smaller groups of elements. Thus, despite its long history in the perceptual
organization literature, recent findings suggest that closure may play at most a minor role in the
detection of contours.
Does this mean that the Gestaltists were wrong? Not necessarily. Koffka’s observations were not
that closure is a grouping cue per se, but rather that closure somehow profoundly determines the
final percept of form:
Ordinary lines, whether straight or curved, appear as lines and not as areas. They have shape, but they
lack the difference between an inside and an outside . . . If a line forms a closed, or almost closed, figure,
we see no longer merely a line on a homogeneous background, but a surface figure bounded by the line.
(Koffka 1935, p. 150)
The Gestaltists thus believed that closure, above and beyond the cue of good continuation,
determines the percept of solid form. In this spirit, Elder and Zucker (1993, 1994, 1998a)
argued for closure as a perceptual bridge from 1D contour to 2D shape, i.e. as a perceptual
form of the Jordan Curve Theorem (see ‘Introduction’). They investigated this idea through
a series of 2D shape discrimination experiments in which they manipulated the degree of
Fig. 11.12 The role of closure in perceptual organization. One perceives four large rectangles even
though this requires grouping together more distant pairs of contour fragments.
Reproduced from Kurt Koffka, Principles of Gestalt Psychology, Harcourt, Brace, and World, New York, Copyright
© 1935, Harcourt, Brace, and World.
closure, but held the shape information constant. They showed that small changes in good
continuation and closure could yield large changes in shape discriminability (Figures 11.13a–
b). Moreover, the task seems to remain fairly difficult when good continuation is restored
without closure (Figure 11.13c), suggesting that the property of closure contributes something
above and beyond good continuation cues. In support of this, Garrigan (2012) has recently
shown that contour shape is more effectively encoded in memory when the contour is closed
than when it is open.
Some models for global contour extraction based on the first-order Markov assumption incor-
porate closure by explicitly searching for closed cycles of local elements (Elder and Zucker 1996;
Elder et al. 2003), but these first-order Markov models still suffer from the problems discussed
above. Moreover, the statistical structure of a cycle is profoundly different from that of a Markov
chain, as closure induces more global statistical dependencies between local elements. In this
sense there is a mismatch between the first-order Markov model used by these methods and the
goal of recovering closed contours. Future work will hopefully reveal more principled ways to
incorporate closure into models of global contour extraction: in ‘Generative Models of Shape’ we
discuss one promising direction.
Convexity
Convexity has long been known as a figure/ground cue (Rubin 1927) (see also the chapters by
Peterson, by Fowlkes and Malik, and by Kogo and van Ee in this volume). In the computer vision
literature, Jacobs (1996) demonstrated its utility for grouping contour fragments that can then be
used as features for object recognition, and Liu, Jacobs, and Basri (1999) subsequently developed
a novel psychophysical method to demonstrate that the human visual system also uses a convex-
ity cue for grouping contours. Their method relies on the finding of Mitchison and Westheimer
(1984) that judging the relative stereoscopic depth of two contour fragments becomes more dif-
ficult when the fragments are arranged to form a configuration with good continuation and
closure. Using an elaboration of this method, they showed that stereoscopic thresholds are sub-
stantially higher for occluded contour fragments that can be completed to form a convex shape,
relative to fragments whose completion induces one or more concavities. This suggests that the
visual system is using convexity as a grouping cue. A more recent computer vision algorithm
Fig. 11.13 Closure as a bridge from 1D to 2D shape. (a) Shape discrimination is easy when good
continuation and closure are strong. (b) Discrimination becomes hard when good continuation and
closure are weak. (c) Discrimination is of intermediate difficulty when good continuation is strong but
closure is weak.
Reprinted from Vision Research, 33 (7), James Elder and Steven Zucker, The effect of contour closure on the rapid
discrimination of two-dimensional shapes, pp. 981–91, Copyright © 1993, with permission from Elsevier.
that uses convexity as a soft cue, allowing contours that are highly but not perfectly convex, has
been shown to outperform Jacobs' original algorithm on a standard dataset (Corcoran, Mooney,
and Tilton 2011).
Feedback
We have seen the importance of both local cues and global cues in the perceptual organization
of contours. How could these most effectively be brought together, given what is known of the
functional architecture of primate visual cortex?
In contrast to V1, many neurons in extrastriate visual area V2 of macaque are selective for
both real and illusory contours (von der Heydt, Peterhans, and Baumgartner 1984; see also van
Lier and Gerbino, and Kogo and van Ee, this volume). Illusory contours are the result of modal
completion processes (see ‘Introduction’) that generate percepts of contours in the absence of
local contrast, by extrapolating from nearby, geometrically aligned inducers—see Figure 11.15
(bottom right) for an example. Illusory contours are thus a direct manifestation of contour
grouping processes, in this case the result of grouping together contour fragments on spatially
separated inducers. The selectivity of neurons in V2 for illusory contours suggests that the
transformation of the visual input from V1 to V2 involves the grouping of contour fragments
based upon Gestalt principles of proximity and good continuation. This computation may be
supported by long-range horizontal connections that, at least in areas 17 and 18 of cat, are
known to run between cortical columns with similar orientation specificity (Gilbert and Wiesel
1989), although input from later visual areas may be equally or even more important in this
computation.
Indeed, while physiological models for contour integration based upon good continuation
principles have been based primarily upon these cortical networks in area V1 and V2 (Li 1998;
Yen and Finkel 1998), fMRI data in both human and macaque implicate not only V1 and V2 but
other extrastriate visual areas (VP, V4, LOC) in contour grouping. Although sketches of a more
complete physiological model for contour grouping have begun to emerge (e.g. Roelfsema 2006),
the overall computational architecture is still largely unknown.
One possibility is that the computation is feedforward. For example, progressively more global
and selective representations may be computed in V1, V2, V4, culminating in a neurally local-
ized representation of entire objects in TE/TEO (Thorpe 2002; see also Joo et al, this volume).
However, the functional architecture of visual cortex suggests that recurrent feedback might also
be involved. Figure 11.14(b) shows the known connectivity of visual areas in the object pathway
of primate brain. In addition to the feedforward sequence V 1 → V 2 → V 4 → TE/TEO emphasized
in prior work (Thorpe 2002), there are feedback connections from each of the later areas to each
of the earlier areas, as well as additional feedforward connections. How can we determine empiri-
cally if these feedback connections play a role in the perceptual organization of contours into
representations of global shape?
Timing
One way to test the plausibility of computational architectures for perceptual organization is to
examine the timing of stimulus-driven perceptual and neural events relative to the stimulus onset
and to each other. Here I will review a range of results using varied methodological paradigms
that together suggest a strong role for feedback in the perceptual organization of contours.
Animal detection
Some models of contour formation have been based upon recurrent interactions within and
between areas V1 and V2 (e.g. Neumann and Sepp 1999; Gintautas et al. 2011). However, psycho-
physical results on the animal detection task (Figure 11.2) show that humans can perform above
chance using contour shape alone for stimulus presentations as short as 10 msec, even with strong
Fig. 11.14 Feedback in the human object pathway. (a) Feedback of global shape hypotheses may be
used to condition grouping in earlier visual areas. (b) Connectivity in primate object pathway. Solid
arrowheads indicate feedforward connections, open arrowheads indicate feedback connections.
From Leslie G. Ungerleider, Functional Brain Imaging Studies of Cortical Mechanisms for Memory, Science 270
(5237), pp. 769–775, Copyright © 1995, The American Association for the Advancement of Science. Reprinted
with permission from AAAS.
backward masking (Elder and Velisavljević 2009). While inferring underlying mechanisms from
these results is complicated by the unknown degree of temporal blurring in the cortical network,
roughly speaking this result suggests that at least on some trials, recurrencies involving delays
much greater than 10 msec may not be involved, and this constrains the class of computations
that might underlie performance on these specific trials. For example, Gintautas et al. (2011) have
modelled contour detection based upon a lateral connection network in V1, estimating that each
iteration of the network should take on the order of 37.5 msec. This appears to be too long to
explain the most rapid trials in the animal detection task.
On the other hand, Elder and Velisavljević (2009) also found that performance on the animal
task improves continuously up to at least 120-msec stimulus duration, leaving open the pos-
sibility of recurrence for harder trials. Similarly, in animal detection experiments measuring
reaction time (e.g. Thorpe, Fize, and Marlot 1996), most attention has focused on the fastest
trials, where evoked potentials correlated with the stimulus emerge as soon as 150 msec after
stimulus onset, leaving little time for recurrence or feedback. Average reaction times, however,
are much longer, closer to 500 msec, and the distribution has a long positive tail with many
reaction times greater than 600 msec, leaving ample time for recurrence and/or feedback for
most trials. Further, more recent evidence suggests that visual signals may arrive in higher
areas much faster than previously thought (Foxe and Simpson 2002), allowing sufficient time
for feedback even on the faster trials (see also Self and Roelfsema, this volume, on the limits of
feed-forward processing).
Border ownership
Physiologically, it is known that selective response to higher-order contour properties depend-
ent upon contour grouping emerges later in time. For example, in V2, while edge signals emerge
within 30 msec of stimulus onset and peak roughly 100 msec post-stimulus, border-ownership
signals emerge roughly 80 msec after stimulus onset, peaking 130–180 msec post-stimulus.
Importantly, this delay does not appear to depend upon the spatial extent of the contour, arguing
against lateral recurrence and suggesting instead a role for feedback from higher visual areas with
a round-trip time delay of 30–80 msec (Craft et al. 2007; see also Kogo and van Ee, this volume).
[Bar charts: correct responses (%) as a function of TMS time window (none, 100–122, 160–182, 240–262 ms) for stimulation of V1/V2 and of LO.]
Fig. 11.15 Evidence for the role of feedback in bridging the dimensional gap. TMS was found to disrupt
illusory contour shape judgments later when applied to V1/V2 than when applied to LO – see text for
details.
Reproduced from Martijn E. Wokke, Annelinde R.E. Vandenbroucke, H. Steven Scholte, Victor A.F. Lamme,
Psychological Science, Confuse Your Illusion: Feedback to Early Visual Cortex Contributes to Perceptual
Completion, 24 (9), pp. 63–71, © 2013, SAGE Publications. Reprinted by Permission of SAGE Publications.
TMS was found to disrupt performance at both locations, but interestingly, the effect depended
critically on the timing. In LO, TMS disrupted processing when the pulse occurred 100–122
msec after stimulus onset, whereas in V1/V2, processing was disrupted when the pulse was
applied later, 160–182 msec after stimulus onset. This is strongly suggestive of a feedback pro-
cess in the grouping of inducer contour fragments to form shape percepts, with a one-way
feedback time constant (LO to V1/V2) of 40–80 msec.
In summary, numerous behavioural and physiological results suggest a role for feedback in
bridging the gap from contour to shape. One purpose of this feedback might be to allow global
features computed and available first in higher visual areas to condition the local associations
computed in V1/V2. In order to further develop this idea, a more formal computational theory
is called for.
Computational models
Using local Gestalt cues alone to drive shortest-path or approximate search algorithms based on
the first-order Markov assumption fails in the general case. However, Estrada and Elder (2006)
have demonstrated that a relatively simple elaboration of the approximate search scheme can sub-
stantially improve performance. The idea is to place the Markov model within a coarse-to-fine
scale-space framework (Figure 11.10—left three columns). Specifically, the image is represented
at multiple scales (i.e. levels of resolution) by progressive smoothing with a Gaussian filter, and
breadth-first search is first initiated at the coarsest scale. Since the number of features at this scale
is greatly reduced, the search space is much smaller and the algorithm generally finds good, coarse
blob hypotheses that code the rough location and shape of the salient objects in the scene. These
hypotheses are then fed back to the next finer level of resolution, where they serve as probabil-
istic priors, conditioning the likelihoods and effectively shrinking the search space to promising
regions of the image.
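The coarse-to-fine scheme can be sketched in one dimension: search first on a heavily smoothed copy of the data, then let the coarse hypothesis restrict the fine-scale search. In this sketch, box smoothing stands in for the Gaussian filter and the signal and parameters are illustrative:

```python
def box_smooth(signal, radius):
    """Box smoothing as a simple stand-in for Gaussian blurring."""
    return [sum(signal[max(0, i - radius): i + radius + 1]) /
            len(signal[max(0, i - radius): i + radius + 1])
            for i in range(len(signal))]

def coarse_to_fine_peak(signal, coarse_radius=4, search_radius=3):
    """Find the best hypothesis at the coarse scale, then restrict the
    fine-scale search to its neighbourhood (the 'feedback' step)."""
    coarse = box_smooth(signal, coarse_radius)
    c = max(range(len(coarse)), key=coarse.__getitem__)    # coarse blob
    lo = max(0, c - search_radius)
    hi = min(len(signal), c + search_radius + 1)
    return max(range(lo, hi), key=signal.__getitem__)      # fine refinement

# Noisy 1-D 'edge strength' profile with one salient region:
sig = [0, 1, 0, 2, 1, 0, 1, 2, 6, 9, 10, 8, 5, 1, 0, 2, 1, 0]
print(coarse_to_fine_peak(sig))   # → 10 (index of the true peak)
```

Because the coarse scale has far fewer candidate locations, the expensive fine-scale search visits only the promising neighbourhood, mirroring the shrunken search space described above.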
This is a very specific kind of feedback model that does not incorporate any sophisticated
global features or probabilistic model over shapes, and is not really recurrent, but it does dem-
onstrate the potential performance advantages of feedback. A number of more general models
for incorporating feedback into perceptual organization have been advanced (Grossberg 1976;
Cavanagh 1991; Hochstein and Ahissar 2002; Lee and Mumford 2003; Tu et al. 2005; Yuille and
Kersten 2006; also Self and Roelfsema, and van Leeuwen, this volume). Figure 11.14a sketches a
conceptual model that is broadly consistent with these prior ideas. For concreteness, let us sup-
pose that earlier areas (e.g. V1, V2) in the visual pathway compute and encode specific partial
grouping hypotheses corresponding to fragments of contours. These fragment hypotheses are
communicated to higher-order areas (e.g. V4 or TEO), which use them and more global princi-
ples to generate complete hypotheses of object shape. These global hypotheses are then fed back
to earlier visual areas to sharpen selectivity for other fragments that might support these global
hypotheses.
Neurons in higher areas of the object pathway in primate visual cortex encode shape informa-
tion using a more global representation than neurons in early visual areas (Pasupathy and Connor
1999; Connor, Brincat, and Pasupathy 2007; see also van Leeuwen, this volume). In order to feed
back useful information, the brain must be able to convert this global representation to the more
local, spatiotopic representation native to these earlier areas. Because there will always be uncer-
tainty about the shapes being represented (due to grouping ambiguity, for example), this mapping
is probabilistic. A probabilistic model capable of randomly generating observed data consistent
with an internal representation is known as a generative model. One of the great strengths of a
generative model of shape is its capacity to produce probable global shape hypotheses given even
partial shape information, thus contributing to the grouping process. In the final part of this chap-
ter we consider what form such a generative model might take.
1988). A key problem in establishing a generative model of shape is to guarantee that gener-
ated shape hypotheses have valid topology. For example, if the goal is to recover a simple closed
contour, the model should only generate simple, closed curve hypotheses. While this has been a
major limitation of prior contour-based models (e.g. Dubinskiy and Zhu 2003), a recently pro-
posed alternative approach based on spatial perturbations of planar space called formlets can
provide this guarantee (Grenander, Srivastava, and Saini 2007; Oleskiw, Elder, and Peyré 2010;
Elder et al. 2013).
The formlet approach involves the application of coordinate transformations of the planar space
in which a shape is embedded. This idea can be traced back at least to D’Arcy Thompson, who
considered specific classes of global coordinate transformations to model the relationship between
the shapes of different animal species (Thompson 1917). Coordinate transformation methods for
representing shape have been explored more recently in the field of computer vision (e.g. Jain,
Zhong, and Lakshmanan 1996; Sharon and Mumford 2006) and for developmental studies of
human shape selectivity and categorization (Ons and Wagemans 2011, 2012), but these methods
do not in general preserve the topology of embedded contours.
Formlets are based on the key insight that, while general smooth coordinate transformations
of the plane will not preserve the topology of an embedded curve, it is straightforward to design
a specific family of diffeomorphic transformations (i.e. smooth 1:1 mappings) that will. It then
follows immediately by induction that a generative model based upon arbitrary sequences of dif-
feomorphisms will preserve topology.
Specifically, a formlet is defined to be a simple, isotropic, radial deformation of planar space that
is localized within a circular region around a selected point in the plane. The formlet family
comprises formlets over all locations and spatial scales. While the gain of the deformation is also
a free parameter, it is constrained to satisfy a simple criterion that guarantees that the formlet is
a diffeomorphism. Since topological changes in an embedded figure can only occur if the defor-
mation mapping is either discontinuous or non-injective, these diffeomorphic deformations are
guaranteed to preserve the topology of embedded figures. Figure 11.16 shows some examples.
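A minimal formlet-like deformation can be written directly. The Gaussian radial profile used here is an assumption for illustration; the published formlet family uses a specific profile with a closed-form gain bound, but the structure is the same: an isotropic, localized radial map that is strictly monotone in radius, hence 1:1 and topology-preserving.

```python
import math

def formlet(point, center, sigma, gain):
    """One formlet: an isotropic radial deformation localized around
    `center`. Gaussian profile assumed for illustration only."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    r = math.hypot(dx, dy)
    if r == 0.0:
        return point
    # Radial map r -> r * (1 + gain * exp(-(r/sigma)^2)). For modest
    # |gain| this map is strictly increasing in r, hence injective, so
    # the deformation is a diffeomorphism and preserves contour topology.
    new_r = r * (1.0 + gain * math.exp(-(r / sigma) ** 2))
    return (center[0] + dx * new_r / r, center[1] + dy * new_r / r)

# Deform the unit circle; composing many random formlets generates
# shape hypotheses with guaranteed topology (cf. Figure 11.16).
circle = [(math.cos(2 * math.pi * k / 100), math.sin(2 * math.pi * k / 100))
          for k in range(100)]
blob = [formlet(p, center=(1.0, 0.0), sigma=0.8, gain=0.4) for p in circle]
```

Composing a sequence of such maps, each with its own centre, scale, and gain, then yields the induction argument in the text: every step preserves topology, so the composition does too.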
Evaluation
One way to evaluate and compare generative shape models is to take advantage of their ability to
generate complete shape hypotheses given only partial data. Specifically, one can use the models
Fig. 11.16 Shapes generated by random formlet composition over the unit circle. Top row: shapes
resulting from a sequence of five random formlets. The red dot and circle indicate formlet location and
scale, respectively. Bottom row: example shapes produced from the composition of many random
formlets.
© 2010, IEEE. Adapted with permission, from T.D. Oleskiw, J.H Elder, and G. Peyré, On growth and formlets:
Sparse multi-scale coding of planar shape, IEEE Conference on Computer Vision and Pattern Recognition.
to address the problem of contour completion (Figure 11.3), using an animal shape dataset, based
on the conceptual model illustrated in Figure 11.14.
Elder et al. (2013) used this method to compare the formlet model with a contour-based shape-
let model (Dubinskiy and Zhu 2003) that is not guaranteed to preserve topology. For each shape
in the dataset, they simulated the occlusion of a single random section of the contour, and used
each model and a variation of matching pursuit (Mallat and Zhang 1993) to approximate the
animal shapes, allowing the models to see only the visible portions of the shapes. (Note that these
models could in principle handle more than one occlusion.) They then measured the residual
error between the model and target for both the visible and occluded portions of the shapes, as a
function of the number of model basis functions (shapelets or formlets) employed. Performance
on the occluded portions, where the model is under-constrained by the data, reveals how well the
structure of the model captures properties of natural shapes.
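This evaluation logic can be sketched generically on signal vectors: fit a greedy pursuit using only the visible samples, then score the residual separately on the visible and occluded samples. The dictionary and signals below are illustrative; the actual study pursues planar contours with shapelet or formlet bases.

```python
def occlusion_pursuit(target, visible, dictionary, k):
    """Greedy pursuit fitted on the visible samples only; returns the
    residual norms on the visible and occluded samples separately."""
    residual = list(target)
    for _ in range(k):
        def score(atom):   # least-squares coefficient on visible samples
            den = sum(atom[i] ** 2 for i in visible)
            return sum(residual[i] * atom[i] for i in visible) / den if den else 0.0
        atom = max(dictionary, key=lambda a: abs(score(a)))   # best atom
        c = score(atom)
        residual = [r - c * a for r, a in zip(residual, atom)]
    occluded = [i for i in range(len(target)) if i not in visible]
    err_vis = sum(residual[i] ** 2 for i in visible) ** 0.5
    err_occ = sum(residual[i] ** 2 for i in occluded) ** 0.5
    return err_vis, err_occ

# Toy example: the target is an exact mix of two global 'atoms'; the last
# two samples are occluded. Because the atoms are global, a fit to the
# visible part also explains the occluded part.
const = [1.0] * 8
alt = [1.0, -1.0] * 4
target = [2.0 * c + a for c, a in zip(const, alt)]
visible = list(range(6))
err_vis, err_occ = occlusion_pursuit(target, visible, [const, alt], k=2)
```

Low error on the occluded samples, where the data never constrain the fit, is what reveals how well the model's structure captures the regularities of the underlying signal class.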
Figure 11.17 shows an example result for this experiment. While shapelet pursuit intro-
duces topological errors in both visible and occluded regions, formlet pursuit remains topo-
logically valid, as predicted. Figure 11.18 shows quantitative results on a database of animal
shapes. While the shapelet and formlet models achieve comparable error on the visible por-
tions of the boundaries, on the occluded portions the error is substantially lower for the
formlet representation. This suggests that the structure of the formlet model better captures
regularities in the shapes of natural objects.
Fig. 11.17 Example of 30% occlusion pursuit with shapelets (red) and formlets (blue) for k = 0, 2, 4, 8,
16, 32 basis functions. Solid lines indicate visible contour, dashed lines indicate occluded contour.
Reprinted from Image and Vision Computing, 31(1), James H. Elder, Timothy D. Oleskiw, Alex Yakubovich, and
Gabriel Peyré, On growth and formlets: Sparse multi-scale coding of planar shape, pp. 1–13, Copyright © 2013,
with permission from Elsevier.
Bridging the Dimensional Gap 229
[Two panels: normalized RMS error plotted against number of components (0–30).]
Fig. 11.18 Results of occlusion pursuit evaluation. The formlet model is substantially more accurate than the
shapelet model on the occluded portions of the shapes. Black denotes error for the initial affine-fit ellipse.
Reprinted from Image and Vision Computing, 31(1), James H. Elder, Timothy D. Oleskiw, Alex Yakubovich, and
Gabriel Peyré, On growth and formlets: Sparse multi-scale coding of planar shape, pp. 1–13, Copyright © 2013,
with permission from Elsevier.
and reliable global contour extraction in complex natural scenes. This idea is supported by recent
physiological results (Wokke et al. 2013).
While global cues such as closure, convexity, symmetry, and parallelism could potentially be
computed in higher areas of the object pathway and combined with local cues using standard cue
combination mechanisms, a more general theory identifies these higher areas with generative
shape representations capable of producing global shape ‘hallucinations’ based on contour frag-
ments computed in early visual cortex. These global shape hypotheses can then be fed back to
early visual areas to refine the segmentation.
The main problem in establishing such a generative model has been topology: prior models do
not guarantee that sampled shapes are simple closed contours. However, a recently developed framework
for shape representation provides this guarantee. The theory (Grenander et al. 2007; Oleskiw et al.
2010; Elder et al. 2013), based upon localized diffeomorphic deformations of the image called
formlets, has its roots in early investigations of biological shape transformation (Thompson 1917).
The formlet representation has been shown to yield more accurate shape completion than an alternative
contour-based generative model of shape, which should make it more effective at generating
global shape hypotheses to guide feedforward contour grouping processes.
While the nature of the computations underlying the perceptual organization of con-
tours into representations of shape is becoming clearer, there are still many unknowns. These
include: (1) What are the key statistical properties of shapes not captured by the first-order Markov
model? (2) To what degree is the human visual system tuned to these higher-order properties?
(3) How can a generative model like the formlet model be elaborated to accurately embody these
statistics? (4) How exactly do generated hypotheses condition selectivity in earlier visual areas?
We do not know exactly when these questions will be answered, but it seems certain that the
answers will come from the kind of closely coupled computational, behavioural and physiological
investigation that has led to recent progress in this field.
References
Arbelaez, P., M. Maire, C. Fowlkes, and J. Malik (2011). ‘Contour Detection and Hierarchical Image
Segmentation’. IEEE Transactions on Pattern Analysis and Machine Intelligence 33(5): 898–916.
Arnheim, R. (1967). Art and Visual Perception. Berkeley, CA: University of California Press.
Behrmann, M., R. S. Zemel, and M. C. Mozer (1998). ‘Object-Based Attention and Occlusion: Evidence
from Normal Participants and a Computational Model’. Journal of Experimental Psychology: Human
Perception and Performance 24: 1011–1036.
Blakemore, C., and J. Nachmias (1971). ‘The Orientation Specificity of Two Visual After-Effects’. Journal of
Physiology 213: 157–174.
Campbell, F., and J. Kulikowski (1966). ‘Orientation Selectivity of the Human Visual System’. Journal of
Physiology 187: 437–445.
Cavanagh, P. (1991). ‘What’s Up in Top-Down Processing?’ In Representations of Vision: Trends and Tacit
Assumptions in Vision Research, edited by A. Gorea, pp. 295–304. Cambridge: Cambridge University Press.
Chan, L. K. H. and W. G. Hayward (2009). ‘Sensitivity to Attachments, Alignment, and Contrast Polarity
Variation in Local Perceptual Grouping’. Attention, Perception and Psychophysics 71(7): 1534–1552.
Cohen, L. and T. Deschamps (2001). ‘Multiple Contour Finding and Perceptual Grouping as a Set of
Energy Minimizing Paths’. In Energy Minimization Methods in Computer Vision and Pattern Recognition
Lecture Notes in Computer Science 2134, pp. 560–575. Los Alamitos, CA: IEEE.
Connor, C., S. Brincat, and A. Pasupathy (2007). ‘Transformation of Shape Information in the Ventral
Pathway’. Current Opinion in Neurobiology 17: 140–147.
Corcoran, P., P. Mooney, and J. Tilton (2011). ‘Convexity Grouping of Salient Contours’. In Proceedings of
the International Workshop on Graph Based Representations in Pattern Recognition, Vol. 6658 of Lecture
Notes in Computer Science, edited by X. Jiang, M. Ferrer, and A. Torsello, pp. 235–244.
Corthout, E., B. Uttl, V. Walsh, M. Hallett, and A. Cowey (1999). ‘Timing of Activity in Early Visual
Cortex as Revealed by Transcranial Magnetic Stimulation’. NeuroReport 10: 2631–2634.
Craft, E., H. Schutze, E. Niebur, and R. von der Heydt (2007). ‘A Neural Model of Figure-Ground
Organization’. Journal of Neurophysiology 97: 4310–4326.
Dakin, S. (1997). ‘The Detection of Structure in Glass Patterns: Psychophysics and Computational Models’.
Vision Research 37: 2227–2246.
Dakin, S. (2001). ‘Information Limit on the Spatial Integration of Local Orientation Signals’. Journal of the
Optical Society of America A—Optics, Image Science, and Vision 18: 1016–1026.
Dubinskiy, A. and S. C. Zhu (2003). ‘A Multi-Scale Generative Model for Animate Shapes and Parts’. In
Proceedings of the 9th IEEE International Conference on Computer Vision, Vol. 1, pp. 249–256. Los
Alamitos, CA: IEEE.
Earle, D. C. (1999). ‘Glass Patterns: Grouping by Contrast Similarity’. Perception 28(11): 1373–1382.
Elder, J. H. and S. W. Zucker (1993). ‘The Effect of Contour Closure on the Rapid Discrimination of
Two-Dimensional Shapes’. Vision Research 33(7): 981–991.
Elder, J. H. and S. W. Zucker (1994). ‘A Measure of Closure’. Vision Research 34(24): 3361–3370.
Elder, J. H. and S. W. Zucker (1996). ‘Computing Contour Closure’. In Proceedings of the 4th European
Conference on Computer Vision, pp. 399–412. New York: Springer.
Elder, J. H. and S. W. Zucker (1998a). ‘Evidence for Boundary-Specific Grouping’. Vision Research
38(1): 143–152.
Elder, J. H. and S. W. Zucker (1998b). ‘Local Scale Control for Edge Detection and Blur Estimation’. IEEE
Transactions on Pattern Analysis and Machine Intelligence 20(7): 699–716.
Elder, J. H. and R. M. Goldberg (2001). ‘Image Editing in the Contour Domain’. IEEE Transactions on
Pattern Analysis and Machine Intelligence 23(3): 291–296.
Elder, J. H. and R. M. Goldberg (2002). ‘Ecological Statistics of Gestalt Laws for the Perceptual
Organization of Contours’. Journal of Vision 2(4): 324–353.
Elder, J. H., A. Krupnik, and L. A. Johnston (2003). ‘Contour Grouping with Prior Models’. IEEE
Transactions on Pattern Analysis and Machine Intelligence 25(6): 661–674.
Elder, J. H. and A. J. Sachs (2004). ‘Psychophysical Receptive Fields of Edge Detection Mechanisms’. Vision
Research 44(8): 795–813.
Elder, J. H. and L. Velisavljević (2009). ‘Cue Dynamics Underlying Rapid Detection of Animals in Natural
Scenes’. Journal of Vision 9(7): 1–20.
Elder, J. H., T. D. Oleskiw, A. Yakubovich, and G. Peyré (2013). ‘On Growth and Formlets: Sparse
Multi-Scale Coding of Planar Shape’. Image and Vision Computing 31: 1–13.
Estrada, F. and J. H. Elder (2006). ‘Multi-Scale Contour Extraction Based on Natural Image Statistics’.
In IEEE Conference on Computer Vision and Pattern Recognition Workshop. Washington, DC: IEEE.
Feldman, J. (2007). ‘Formation of Visual “Objects” in the Early Computation of Spatial Relations’. Perception
and Psychophysics 69(5): 816–827.
Field, D., A. Hayes, and R. F. Hess (1993). ‘Contour Integration by the Human Visual System: Evidence for
a Local “Association Field”’. Vision Research 33(2): 173–193.
Field, D., A. Hayes, and R. Hess (2000). ‘The Roles of Polarity and Symmetry in the Perceptual Grouping of
Contour Fragments’. Spatial Vision 13(1): 51–66.
Foxe, J. and G. Simpson (2002). ‘Flow of Activation from V1 to Frontal Cortex in Humans’. Experimental
Brain Research 142: 139–150.
Garrigan, P. (2012). ‘The Effect of Contour Closure on Shape Recognition’. Perception 41: 221–235.
Geisler, W. S., J. S. Perry, B. J. Super, and D. P. Gallogly (2001). ‘Edge Co-Occurrence in Natural Images
Predicts Contour Grouping Performance’. Vision Research 41(6): 711–724.
Geisler, W. S. and J. S. Perry (2009). ‘Contour Statistics in Natural Images: Grouping across Occlusions’.
Visual Neuroscience 26(1): 109–121.
Gilbert, C. D. and T. N. Wiesel (1989). ‘Columnar Specificity of Intrinsic Horizontal and Corticocortical
Connections in Cat Visual Cortex’. Journal of Neuroscience 9(7): 2432–2443.
Gilchrist, I., G. Humphreys, M. Riddoch, and H. Neumann (1997). ‘Luminance and Edge Information
in Grouping: A Study Using Visual Search’. Journal of Experimental Psychology: Human Perception and
Performance 23: 464–480.
Gintautas, V., M. Ham, B. Kunsberg, S. Barr, S. Brumby, C. Rasmussen, J. George, I. Nemenman,
L. Bettencourt, and G. Kenyon (2011). ‘Model Cortical Association Fields Account for the Time Course
and Dependence on Target Complexity of Human Contour Perception’. PLOS Computational Biology
7(10): 1–16.
Glass, L. and E. Switkes (1976). ‘Pattern Recognition in Humans: Correlations which Cannot Be Perceived’.
Perception 5: 67–72.
Grenander, U., A. Srivastava, and S. Saini (2007). ‘A Pattern-Theoretic Characterization of Biological
Growth’. IEEE Transactions on Medical Imaging 26(2): 648–659.
Grossberg, S. (1976). ‘Adaptive Pattern Classification and Universal Recoding: I. Parallel Development and
Coding of Neural Feature Detectors’. Biological Cybernetics 23: 121–134.
Grossberg, S. and E. Mingolla (1985). ‘Neural Dynamics of Form Perception: Boundary Completion,
Illusory Figures, and Neon Color Spreading’. Psychological Review 92: 173–211.
Halgren, E., J. Mendola, C. Chong, and A. Dale (2003). ‘Cortical Activation to Illusory Shapes as Measured
with Magnetoencephalography’. NeuroImage 18: 1001–1009.
Hawken, M. J. and A. J. Parker (1991). ‘Spatial Receptive Field Organization in Monkey V1 and its
Relationship to the Cone Mosaic’. In Computational Models of Visual Processing, edited by M. S. Landy
and J. A. Movshon, chap. 6, pp. 84–93. Cambridge, MA: MIT Press.
von der Heydt, R., E. Peterhans, and G. Baumgartner (1984). ‘Illusory Contours and Cortical Neuron
Responses’. Science 224: 1260–1262.
Hochberg, J. and D. Hardy (1960). ‘Brightness and Proximity Factors in Grouping’. Perceptual and Motor
Skills 10: 22.
Hochstein, S. and M. Ahissar (2002). ‘View from the Top: Hierarchies and Reverse Hierarchies in the Visual
System’. Neuron 36(5): 791–804.
Hubel, D. H. and T. N. Wiesel (1968). ‘Receptive Fields and Functional Architecture of Monkey Striate
Cortex’. Journal of Physiology 195: 215–243.
Jacobs, D. (1996). ‘Robust and Efficient Detection of Salient Convex Groups’. IEEE Transactions on Pattern
Analysis and Machine Intelligence 18(1): 23–37.
Jacobs, D. (2003). ‘What Makes Viewpoint-Invariant Properties Perceptually Salient?’ Journal of the Optical
Society of America A 20(7): 1304–1320.
Jain, A., Y. Zhong, and S. Lakshmanan (1996). ‘Object Matching Using Deformable Templates’. IEEE
Transactions on Pattern Analysis and Machine Intelligence 18(3): 267–278.
Jepson, A., W. Richards, and D. Knill (1996). ‘Modal Structure and Reliable Inference’. In Perception as
Bayesian Inference, edited by D. Knill and W. Richards, pp. 63–92. Cambridge: Cambridge University Press.
Johnston, L. and J. H. Elder (2004). ‘Efficient Computation of Closed Contours using Modified
Baum-Welch Updating’. In Proceedings of IEEE Workshop on Perceptual Organization in Computer
Vision, Los Alamitos, CA: IEEE Computer Society Press.
Jordan, C. (1887). Cours d’analyse, Vol. 3. Paris: Gauthier-Villars.
Kanizsa, G. (1979). Organization in Vision. New York: Praeger.
Kellman, P. and T. Shipley (1991). ‘A Theory of Visual Interpolation in Object Perception’. Cognitive
Psychology 23: 142–221.
Koenderink, J. J. (1984). ‘What Does the Occluding Contour Tell us About Solid Shape?’ Perception
13: 321–330.
Koffka, K. (1935). Principles of Gestalt Psychology. New York: Harcourt, Brace and World.
Kovacs, I. and B. Julesz (1993). ‘A Closed Curve Is Much More than an Incomplete One: Effect of
Closure in Figure-Ground Discrimination’. Proceedings of the National Academy of Sciences of the USA
90: 7495–7497.
Kruger, N. (1998). ‘Collinearity and Parallelism are Statistically Significant Second Order Relations of
Complex Cell Responses’. Neural Processing Letters 8: 117–129.
Kubovy, M. and J. Wagemans (1995). ‘Grouping by Proximity and Multistability in Dot
Lattices: A Quantitative Gestalt Theory’. Psychological Science 6(4): 225–234.
Kubovy, M., A. O. Holcombe, and J. Wagemans (1998). ‘On the Lawfulness of Grouping by Proximity’.
Cognitive Psychology 35: 71–98.
Lamme, V. A. and P. R. Roelfsema (2000). ‘The Distinct Modes of Vision Offered by Feedforward and
Recurrent Processing’. Trends in Neuroscience 23(11): 571–579.
Lee, T. and D. Mumford (2003). ‘Hierarchical Bayesian Inference in the Visual Cortex’. Journal of the
Optical Society of America A 20(7): 1434–1448.
Leyton, M. (1988). ‘A Process-Grammar for Shape’. Artificial Intelligence 34: 213–247.
Li, Z. (1998). ‘A Neural Model of Contour Integration in the Primary Visual Cortex’. Neural Computation
10(4): 903–940.
Lindeberg, T. (1998). ‘Edge Detection and Ridge Detection with Automatic Scale Selection’. International
Journal of Computer Vision 30(2): 117–154.
Liu, Z., D. W. Jacobs, and R. Basri (1999). ‘The Role of Convexity in Perceptual Completion’. Vision
Research 39(25): 4244–4257.
Lowe, D. G. (1985). Perceptual Organization and Visual Recognition. Boston: Kluwer.
Machilsen, B., M. Pauwels, and J. Wagemans (2009). ‘The Role of Vertical Mirror Symmetry in Visual
Shape Detection’. Journal of Vision 9(12).
Mahamud, S., K. K. Thornber, and L. R. Williams (1999). ‘Segmentation of Salient Closed Contours
from Real Images’. In IEEE International Conference on Computer Vision, pp. 891–897. Los Alamitos,
CA: IEEE Computer Society.
Mallat, S. and Z. Zhang (1993). ‘Matching Pursuits with Time-Frequency Dictionaries’. IEEE
Transactions on Signal Processing 41(12): 3397–3415.
Maloney, R., G. Mitchison, and H. Barlow (1987). ‘Limit to the Detection of Glass Patterns in the Presence
of Noise’. Journal of the Optical Society of America A—Optics and Image Science 4: 2336–2341.
Martin, D., C. Fowlkes, and J. Malik (2004). ‘Learning to Detect Natural Image Boundaries Using Local
Brightness, Color and Texture Cues’. IEEE Transactions on Pattern Analysis and Machine Intelligence
26(5): 530–549.
Mitchison, G. J. and G. Westheimer (1984). ‘The Perception of Depth in Simple Figures’. Vision Research
24(9): 1063–1073.
Mohan, R. and R. Nevatia (1992). ‘Perceptual Organization for Scene Segmentation and Description’. IEEE
Transactions on Pattern Analysis and Machine Intelligence 14(6): 616–635.
Mortensen, E. N. and W. A. Barrett (1995). ‘Intelligent Scissors for Image Composition’. In SIGGRAPH’95
Proceedings of the 22nd Annual Conference on Computer Graphics and Interactive Techniques, pp. 191–
198. Los Angeles, CA: SIGGRAPH.
Mortensen, E. N. and W. A. Barrett (1998). ‘Interactive Segmentation with Intelligent Scissors’. Graphical
Models and Image Processing 60(5): 349–384.
Mumford, D. (1992). ‘Elastica and Computer Vision’. In Algebraic Geometry and Applications, edited by
C. Bajaj. Heidelberg: Springer.
Murray, R. F., P. Bennett, and A. Sekuler (2002). ‘Optimal Methods for Calculating Classification
Images: Weighted Sums’. Journal of Vision 2: 79–104.
Neumann, H. and W. Sepp (1999). ‘Recurrent V1–V2 Interaction in Early Visual Boundary Processing’.
Biological Cybernetics 81(5–6): 425–444.
Oleskiw, T., J. Elder, and G. Peyré (2010). ‘On Growth and Formlets’. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition (CVPR). Los Alamitos, CA: IEEE Computer Society.
Ons, B. and J. Wagemans (2011). ‘Development of Differential Sensitivity for Shape Changes Resulting
from Linear and Nonlinear Planar Transformations’. i-Perception 2: 121–136. Doi: 10.1068/i0407.
Ons, B. and J. Wagemans (2012). ‘A Developmental Difference in Shape Processing and Word–Shape
Associations between 4 and 6.5 Year Olds’. i-Perception 3: 481–494. Doi: 10.1068/i0481.
Or, C. and J. Elder (2011). ‘Oriented Texture Detection: Ideal Observer Modeling and Classification Image
Analysis’. Journal of Vision 11(8): 1–19.
Oyama, T. (1961). ‘Perceptual Grouping as a Function of Proximity’. Perceptual and Motor Skills
13: 305–306.
Parent, P. and S. W. Zucker (1989). ‘Trace Inference, Curvature Consistency, and Curve Detection’. IEEE
Transactions on Pattern Analysis and Machine Intelligence 11: 823–839.
Pasupathy, A. and C. E. Connor (1999). ‘Responses to Contour Features in Macaque Area V4’. Journal of
Neurophysiology 82: 2490–2502.
Pettet, M. W. (1999). ‘Shape and Contour Detection’. Vision Research 39: 551–557.
Phillips, G. and H. Wilson (1984). ‘Orientation Bandwidths of Spatial Mechanisms Measured by Masking’.
Journal of the Optical Society of America A—Optics and Image Science 1: 226–232.
Ren, X., C. Fowlkes, and J. Malik (2008). ‘Learning Probabilistic Models for Contour Completion in
Natural Images’. International Journal of Computer Vision 77: 47–63.
Rensink, R. A. and J. T. Enns (1995). ‘Preemption Effects in Visual Search: Evidence for Low-Level
Grouping’. Psychological Review 102(1): 101–130.
Ringach, D. L. (2002). ‘Spatial Structure and Symmetry of Simple-Cell Receptive Fields in Macaque
Primary Visual Cortex’. Journal of Neurophysiology 88: 455–463.
Roelfsema, P. R. (2006). ‘Cortical Algorithms for Perceptual Grouping’. Annual Review of Neuroscience
29: 203–227.
Rubin, E. (1927). ‘Visuell wahrgenommene wirkliche Bewegungen’. Zeitschrift für Psychologie 103: 354–384.
Sasaki, Y. (2007). ‘Processing Local Signals into Global Patterns’. Current Opinion in Neurobiology
17(2): 132–139.
Sha’ashua, A. and S. Ullman (1988). ‘Structural Saliency: The Detection of Globally Salient Structures Using
a Locally Connected Network’. In Proceedings of the 2nd International Conference on Computer Vision,
pp. 321–327. Los Alamitos, CA: IEEE.
Sharon, E. and D. Mumford (2006). ‘2D-Shape Analysis Using Conformal Mapping’. International Journal
of Computer Vision 70(1): 55–75.
Sigman, M., G. A. Cecchi, C. D. Gilbert, and M. O. Magnasco (2001). ‘On a Common Circle: Natural
Scenes and Gestalt Rules’. Proceedings of the National Academy of Sciences 98(4): 1935–1940.
Snowden, R. (1992). ‘Orientation Bandwidth: The Effect of Spatial and Temporal Frequency’. Vision
Research 32: 1965–1974.
Spehar, B. (2002). ‘The Role of Contrast Polarity in Perceptual Closure’. Vision Research 42(3): 343–350.
Stahl, J. and S. Wang (2008). ‘Globally Optimal Grouping for Symmetric Closed Boundaries by Combining
Boundary and Region Information’. IEEE Transactions on Pattern Analysis and Machine Intelligence
30(3): 395–411.
Thompson, D. (1917). On Growth and Form. Cambridge: Cambridge University Press.
Thorpe, S. (2002). ‘Ultra-Rapid Scene Categorization with a Wave of Spikes’. In Proceedings of the
Biologically Motivated Computer Vision Conference, Vol. LNCS 2525, pp. 1–15.
Thorpe, S., D. Fize, and C. Marlot (1996). ‘Speed of Processing in the Human Visual System’. Nature
381: 520–522.
Tu, Z., X. Chen, A. Yuille, and S. Zhu (2005). ‘Image Parsing: Unifying Segmentation, Detection, and
Recognition’. International Journal of Computer Vision 63(2): 113–140.
Tversky, T., W. S. Geisler, and J. S. Perry (2004). ‘Contour Grouping: Closure Effects are Explained by Good
Continuation and Proximity’. Vision Research 44: 2769–2777.
Ungerleider, L. (1995). ‘Functional Brain Imaging Studies of Cortical Mechanisms for Memory’. Science
270(5237): 769–775.
Van Essen, D. C., B. Olshausen, C. H. Anderson, and J. L. Gallant (1991). ‘Pattern Recognition, Attention,
and Information Processing Bottlenecks in the Primate Visual System’. SPIE 1473: 17–28.
Wagemans, J., J. Elder, M. Kubovy, S. Palmer, M. Peterson, M. Singh, and R. von der Heydt (2012).
‘A Century of Gestalt Psychology in Visual Perception: I. Perceptual Grouping And Figure-Ground
Organization’. Psychological Bulletin 138(6): 1172–1217. Doi: 10.1037/a0029333.
Walsh, V. and A. Cowey (1998). ‘Magnetic Stimulation Studies of Visual Cognition’. Trends in Cognitive
Science 2: 103–110.
Wang, S. and J. M. Siskind (2003). ‘Image Segmentation with Ratio Cut’. IEEE Transactions on Pattern
Analysis and Machine Intelligence 25(6): 675–690.
Watt, R. J. and M. J. Morgan (1984). ‘Spatial Filters and the Localization of Luminance Changes in Human
Vision’. Vision Research 24(10): 1387–1397.
Wertheimer, M. (1938). ‘Laws of Organization in Perceptual Forms’. In A Sourcebook of Gestalt Psychology,
edited by W. D. Ellis, pp. 71–88. London: Routledge and Kegan Paul.
Williams, L. R. and D. W. Jacobs (1997). ‘Stochastic Completion Fields: A Neural Model of Illusory
Contour Shape and Salience’. Neural Computation 9(4): 837–858.
Wilson, H. R. and J. R. Bergen (1979). ‘A Four Mechanism Model for Threshold Spatial Vision’. Vision
Research 19: 19–32.
Wokke, M. E., A. R. E. Vandenbroucke, H. S. Scholte, and V. A. F. Lamme (2013). ‘Confuse your
Illusion: Feedback to Early Visual Cortex Contributes to Perceptual Completion’. Psychological Science
24(1): 63–71.
Yen, S. and L. Finkel (1998). ‘Extraction of Perceptually Salient Contours by Striate Cortical Networks’.
Vision Research 38(5): 719–741.
Yoshino, A., M. Kawamoto, T. Yoshida, N. Kobayashi, and J. Shigemura (2006). ‘Activation Time Course
of Responses to Illusory Contours and Salient Region: A High-Density Electrical Mapping Comparison’.
Brain Research 1071(1): 137–144.
Yuille, A. and D. Kersten (2006). ‘Vision as Bayesian Inference Analysis by Synthesis?’ Trends in Cognitive
Sciences 10(7): 301–308.
Zisserman, A., J. Mundy, D. Forsyth, J. Lui, N. Pillow, C. Rothwell, and S. Utcke (1995). ‘Class-Based
Grouping in Perspective Images’. In Proceedings of the 5th International Conference on Computer Vision,
pp. 183–188. Los Alamitos, CA: IEEE.
Chapter 12
Visual representation of
contour and shape
Manish Singh
1 A detailed report of Attneave’s original experiment was apparently never published. His 1954 article cites only
a ‘mimeographed note’.
Visual Representation of Contour and Shape 237
Fig. 12.1 (a) Generative model of open contours expressed as a probability distribution on turning angle
from the current contour orientation. The distribution is centered on 0, meaning that going ‘straight’
(i.e. zero turning) is most likely, with the probability decreasing monotonically with turning angle in
either direction. This empirically motivated generative model explains why information along a contour
increases monotonically with curvature. (b) Sample results from Norman et al.’s (2001) replication of
Attneave’s experiment. Histograms of points selected by subjects show peaks at maxima of curvature.
(a) Reproduced from Jacob Feldman and Manish Singh, Information Along Contours and Object Boundaries,
Psychological Review, 112(1), pp. 243–252, DOI: 10.1037/0033-295X.112.1.243 © 2005, American Psychological
Association. (b) Reproduced from J. Farley Norman, Flip Phillips, and Heather E. Ross, Information concentration
along the boundary contours of naturally shaped solid objects, Perception 30(11), pp. 1285–1294,
doi:10.1068/p3272, Copyright © 2001, Pion. With kind permission from Pion Ltd, London www.pion.co.uk and
www.envplan.com
of shape representation as well (Hoffman and Richards 1984; Richards, Dawson, and Whittington
1986; Leyton 1989; Hoffman and Singh 1997; Singh and Hoffman 2001; De Winter and Wagemans
2006, 2008a; Cohen and Singh 2007).
But why should curvature maxima be the most informative points along a contour? The link
between contour curvature and information content follows fairly directly from Shannon’s
theory of information (in particular, from the definition of surprisal as u = –log(p)), once one
adopts a simple and empirically motivated generative model of contours (Feldman and Singh
2005; Singh and Feldman 2012).2 Specifically, one may ask, as one moves along a contour, where
is the contour likely to go ‘next’ at any given point? A great deal of psychophysical work on
contour integration and contour detection has shown that the visual system implicitly expects
that a contour is most likely to go ‘straight’ (i.e. to continue along its current tangent direction),
and that the probability of ‘turning’ away from the current tangent direction decreases mono-
tonically with the magnitude of the turning angle (Field, Hayes, and Hess 1993; Feldman 1997;
Geisler et al. 2001; Geisler and Perry, 2009; Elder and Goldberg 2002; Yuille et al. 2004). The
visual system’s local probabilistic expectations about contours may thus be summarized as a
2 Note that the formula for the surprisal is consistent with the simple everyday intuition that improbable events,
when they occur, are cause for greater surprise—and hence are more informative—than when a highly probable,
or expected, event occurs. As they say, ‘man bites dog’ is news; ‘dog bites man’ is not.
238 Singh
von Mises (or circular normal) distribution on turning angles, centered on 0 (see Figure 12.1a;
Feldman and Singh 2005; Singh and Feldman 2012). Indeed, even the assumption of a specific
distributional form is not necessary to derive Attneave’s claim; all that is needed is that the
distribution on turning angles peak at 0 degrees, and decrease monotonically on both sides. It
then follows directly from this that the surprisal, u = –log(p), increases monotonically with the
magnitude of the turning angle. And turning angle, of course, is simply the discrete analogue
of curvature. Hence maxima of curvature are also maxima of contour information—which is
precisely Attneave’s claim.
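This chain of reasoning can be made concrete in a few lines: assume a turning-angle prior peaked at zero, compute the surprisal, and observe that it grows monotonically with the magnitude of the turn. A minimal Python sketch, assuming a von Mises prior with an arbitrary concentration (the normalizing constant is dropped because it does not depend on the angle):

```python
import math

KAPPA = 4.0  # assumed concentration: how strongly the prior favours going straight

def surprisal(turn, mu=0.0):
    """Surprisal -log p(turn) under a von Mises prior, up to the additive
    normalizing constant log(2*pi*I0(KAPPA)), which is angle-independent."""
    return -KAPPA * math.cos(turn - mu)

# Surprisal is symmetric in the sign of the turning angle and grows
# monotonically with its magnitude, so maxima of (unsigned) curvature
# are maxima of surprisal, i.e. of information.
angles = [0.0, 0.2, 0.5, 1.0, 1.5]
values = [surprisal(a) for a in angles]
```

Any prior that peaks at 0 and falls off monotonically on both sides yields the same qualitative conclusion; the von Mises form is just a convenient, empirically motivated instance.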
One can go further, however. Attneave (1954) treated curvature only as an unsigned quantity,
i.e. simply as a magnitude. For a closed contour (such as the outline of an object), however, it is not
only meaningful but also more appropriate to treat curvature as a signed quantity—specifically,
as having positive sign in convex sections of the contour, and negative sign in concave sections.
Indeed, there are principled reasons to expect that the visual system should treat convex and con-
cave portions of a shape quite differently (Koenderink and van Doorn 1982; Koenderink 1984;
Hoffman and Richards 1984). From the point of view of information content of contours, however,
the key observation is that on closed contours, the probability distribution on turning angles is
not centred on 0, but rather is biased such that positive turning angles (involving turns toward
the shape, or figural side of the contour) are more likely than negative turning angles. Indeed, this
must be the case if the contour is to eventually close in on itself. And it entails, via the –log(p) rela-
tion, an asymmetry in surprisal, such that negative curvature is more ‘surprising’—and hence more
informative—than corresponding magnitudes of positive curvature (see Feldman and Singh 2005
for details). This asymmetry in information content is supported by empirical findings showing
that changes at concavities are easier to detect visually than corresponding changes at convexities
(Barenholtz et al. 2003; Cohen et al. 2005), although there are nonlocal influences as well—based
on, for example, whether a shape change alters qualitative part structure (e.g. Bertamini and Farrant
2005; Vandekerckhove, Panis, and Wagemans 2008). (See also ‘Interactions between Contour and
Region Geometry’ for more on nonlocal influences in shape perception.)
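Two ingredients of this argument are easy to verify numerically: a simple closed contour traversed counter-clockwise must accumulate a net turn of 2π, and a turning-angle prior biased toward positive (convex) turns makes a concave turn more surprising than a convex turn of equal magnitude. A Python sketch with illustrative, assumed parameter values:

```python
import math

def turning_angles(pts):
    """Signed turning angle at each vertex of a closed polygon,
    traversed counter-clockwise (positive = turn toward the figure)."""
    n = len(pts)
    angles = []
    for i in range(n):
        ax, ay = pts[i]
        bx, by = pts[(i + 1) % n]
        cx, cy = pts[(i + 2) % n]
        ux, uy = bx - ax, by - ay          # incoming edge direction
        vx, vy = cx - bx, cy - by          # outgoing edge direction
        angles.append(math.atan2(ux * vy - uy * vx, ux * vx + uy * vy))
    return angles

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
total = sum(turning_angles(square))        # a simple CCW contour must total 2*pi

# A prior biased toward positive turns (mean MU > 0) makes a negative
# (concave) turn more surprising than a positive turn of equal magnitude.
MU, KAPPA = 2 * math.pi / len(square), 2.0  # illustrative, assumed values

def surprisal(turn):
    return -KAPPA * math.cos(turn - MU)     # -log p up to a constant
```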
In summary, Attneave’s claim about curvature and information follows from a simple and
empirically motivated generative model of contours. And, as noted above, Attneave’s theoretical
claim can also be extended to closed contours, with the result that negative curvature segments
carry more information than corresponding positive curvature segments.3 The stochastic gen-
erative model of contours may also be extended to incorporate the role of co-circularity, i.e. the
visual expectation that contours tend to maintain their curvature (Singh and Feldman 2012).
Psychophysical evidence for this expectation by the visual system comes from studies of contour
integration (Feldman 1997; Pizlo, Salach-Goyska, and Rosenfeld 1997) as well as visual extrapola-
tion of contours (Singh and Fulvio 2005, 2007).
3 It is important to note that, since the generative models of contours considered in this section were entirely
local, these claims follow simply from local expectations about contour behaviour.
fill in the missing intervening portion of the shape. Because visually completed contours are, by
definition, generated by the visual system (being absent in the retinal images themselves), detailed
measurement of their shape provides a unique window on the shape constraints embodied in the
visual processing of contours.
Contour extrapolation
Perhaps the simplest context for examining visual shape completion is that of contour extrapola-
tion: if a curved contour disappears behind an occluder, how does the visual system ‘expect’ it will
proceed behind the occluder? In other words, what shape will it take—not just in the immediate
vicinity of the point of occlusion, but also further away? A precise answer to this question would
serve to characterize the commonly (though often loosely) used notion of ‘good continuation’.4
Indeed, Wertheimer (1923) originally proposed the principle of good continuation as a way of
choosing between different possible extensions of a contour segment (e.g. see his Figures 16–19).
However, a mathematically precise characterization has been elusive. Some formal questions con-
cerning the meaning of good continuation include:
1 Which geometric variables of the contour does the visual system use in extrapolating its shape,
e.g. its tangent direction, curvature, rate of change of curvature, higher derivatives?
2 How does the visual system combine the contributions of these variables to actually generate
the extended shape of the extrapolated contour?
In addition, contour extrapolation is also a critical component of the general problem of shape
completion—since a visually interpolated contour must both smoothly extend each inducing contour
and smoothly connect the two individual extrapolants (e.g. Ullman 1976; Fantoni and
Gerbino 2003). Therefore, a full understanding of visual shape completion requires an under-
standing of how the visual system extrapolates each curved inducing contour.
Singh and Fulvio (2005, 2007) used an experimental method they called location-and-gradient
mapping to measure the shape of visually extrapolated contours. This method obtains paired
measurements of extrapolation position and orientation at multiple distances from the point of
occlusion in order to build up an extended representation of a visually extrapolated contour. In
their stimuli, a curved contour disappears behind the straight edge of a half-disk occluder (see
Figure 12.2a). Observers iteratively adjust the (angular) position of a short line probe on the oppo-
site (curved) side of the occluder, and its orientation, in order to optimize the percept of smooth
continuation. Measurements are taken at multiple distances from the point of occlusion by using
half-disk occluders of different sizes (see Figure 12.2b).
In their first study, Singh and Fulvio (2005) used arcs of circles and parabolas as inducing con-
tours. By fitting various shape models to the extrapolation data, they found that:
1 The visual system makes systematic use of contour curvature in extrapolating contours—in
other words, extrapolation curvature increases systematically with the curvature of the inducing
contour. Although this result makes perfect intuitive sense, it is noteworthy that current models
of shape completion (in both human and computer vision) do not use the curvature of the
inducer—only its position and tangent direction at the point of occlusion. This empirical result
thus underscores the need for models of shape completion to incorporate the role of inducer
curvature as well.
4 This question is of course intimately related to the generative models of contours considered in 'Contours and Information'. The main difference is that the previously considered models focused on where a contour is likely to go 'next'—i.e. in the immediate vicinity of the current location—whereas the question we are now posing includes the extended behaviour of the contour.
240 Singh
Fig. 12.2 (a) Stimulus used by Singh and Fulvio (2005, 2007) to study the visual extrapolation of
contours behind an occluder. A curved inducing contour disappears behind the straight edge of a half-
disk occluder. Observers adjust the angular position as well as the orientation of a line probe around
the curved edge of the occluder to optimize the percept of smooth continuation. (b) Measurements
are obtained at multiple distances from the point of occlusion to build a detailed representation of an
observer’s visually extrapolated contour.
Reproduced from Manish Singh and Jacqueline M. Fulvio, Visual Extrapolation of Contour Geometry, Proceedings
of the National Academy of Sciences, USA 102(3), pp. 939–944, doi: 10.1073/pnas.0408444102, Copyright
(2005) National Academy of Sciences, U.S.A.
2 Visually extrapolated contours are characterized by decaying curvature with increasing distance
from the point of occlusion. Specifically, fits of spiral shape models (i.e. models that include
both a curvature term and a rate of change of curvature term) to extrapolation data consistently
yielded negative values for the rate of change of curvature.5
3 The precision of subjects’ visually extrapolated contours decreases systematically with the
curvature of the inducing contour: the higher the inducing curvature, the less precisely the
visually extrapolated contour is localized. This result is consistent with findings from contour
interpolation studies using dot-sampled contours, which have also found a ‘cost of curvature’
in human performance (Warren, Maloney, and Landy 2002).
In a subsequent study, Singh and Fulvio (2007) tested whether observers make use of the rate of
change of curvature of an inducing contour in visually extrapolating its shape. This study used
arcs of Euler spirals as inducing contours—characterized by linearly increasing or decreasing cur-
vature as a function of arc length (i.e. length measured along the contour)—and manipulated
their rate of change of curvature (both in the positive and negative directions). In fitting a two-
parameter Euler-spiral model to the extrapolation settings, they found no systematic relationship
between the rate of change of curvature of the inducing contour and the rate of change of cur-
vature of the fitted Euler spiral to the extrapolation data. Thus observers appear not to take into
account rate of change of curvature in visually extrapolating contours behind occluders. Indeed,
visually extrapolated contours continued to exhibit a decaying-curvature behaviour even when
5 The decaying curvature behaviour explains the (initially surprising) finding that a parabolic shape model bet-
ter explained observers’ extrapolation data than a circular shape model—irrespective of whether the inducing
contour itself was a circular or parabolic arc (see Singh and Fulvio 2005 for details).
Visual Representation of Contour and Shape 241
the inducing contours had monotonically increasing curvature as they approached the occluder.
Importantly, this failure to use inducer rate of change of curvature was not simply due to a fail-
ure to detect it. A control experiment confirmed that observers could indeed reliably distinguish
between inducing contours with monotonically increasing vs decreasing curvature.
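Such inducers are easy to sketch numerically: since curvature is linear in arc length, the tangent angle is its integral, and positions follow by integrating the tangent. A minimal sketch with illustrative parameters (not the stimulus code used in the study):

```python
import numpy as np

def euler_spiral(k0, gamma, length, n=200):
    """Sample an Euler-spiral arc whose curvature varies linearly with
    arc length: k(s) = k0 + gamma * s. The tangent angle is the integral
    of curvature; positions come from integrating the tangent direction.
    Illustrative sketch only."""
    s = np.linspace(0.0, length, n)
    theta = k0 * s + 0.5 * gamma * s**2      # integral of k(s)
    ds = s[1] - s[0]
    x = np.cumsum(np.cos(theta)) * ds        # crude Euler integration
    y = np.cumsum(np.sin(theta)) * ds
    return np.column_stack([x, y])

# gamma < 0 gives an inducer with monotonically decreasing curvature
arc = euler_spiral(k0=1.0, gamma=-0.3, length=2.0)
```

A circular arc is the special case gamma = 0; the manipulation in the study corresponds to varying gamma in both the positive and negative directions.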
Taken together, these results may be viewed as providing a formal characterization of ‘good
continuation’. Specifically, they show that the visual system uses tangent direction as well as curva-
ture—but not rate of change of curvature—in visually extrapolating a curved contour. Moreover,
the influence of inducer curvature on visually extrapolated contours decays with distance from the
point of occlusion. Singh and Fulvio (2005, 2007) modelled these characteristics using a Bayesian
model involving two probabilistically expressed constraints: a likelihood constraint to maintain
the curvature of the inducing contour (i.e. a bias toward ‘co-circularity’; Parent and Zucker 1989),
and a prior constraint to minimize curvature (i.e. a bias toward ‘straightness’; e.g. Field et al.
1993; Feldman 1997, 2001; Geisler et al. 2001; Elder and Goldberg 2002). Both constraints were
expressed as probability distributions on curvature. The prior was expressed as a Gaussian dis-
tribution centred on 0 curvature with fixed variance, whereas the likelihood was centred on the
estimated inducer curvature at the point of occlusion, with a (Weber-like) linearly increasing
standard deviation with distance from the point of occlusion. Near the point of occlusion, the like-
lihood is very precise (low variance) and thus tends to dominate the prior.6 With increasing dis-
tance from the point of occlusion, however, the likelihood becomes less reliable (larger variance),
and so gradually the prior comes to dominate the likelihood. This shift in relative reliabilities leads
to the decaying curvature behaviour (see Singh and Fulvio 2007 for details).
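The qualitative behaviour of this model can be reproduced with a precision-weighted average of the two Gaussian constraints. The parameter values below are illustrative, not the fitted values from Singh and Fulvio (2007):

```python
import numpy as np

def posterior_curvature(k_inducer, distances, sigma_prior=0.2,
                        sigma0=0.05, slope=0.1):
    """Posterior extrapolation curvature as a precision-weighted average
    of a straightness prior (mean 0, fixed spread) and a co-circularity
    likelihood (mean = inducer curvature, spread growing linearly with
    distance from the point of occlusion). Values are illustrative."""
    sigma_like = sigma0 + slope * distances       # Weber-like growth
    w_prior = 1.0 / sigma_prior**2                # prior precision
    w_like = 1.0 / sigma_like**2                  # likelihood precision
    # prior mean is 0, so its term drops out of the weighted average
    return (w_like * k_inducer) / (w_like + w_prior)

d = np.linspace(0.0, 5.0, 6)      # distances from the point of occlusion
k = posterior_curvature(k_inducer=1.0, distances=d)
# extrapolation curvature starts near the inducer's value and decays
```

Near the occluder the likelihood dominates (high precision), so the posterior tracks the inducer's curvature; far from it the prior dominates, pulling the curvature toward zero.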
Contour interpolation
Fulvio, Singh, and Maloney (2008) extended the location-and-gradient mapping method to study
contour interpolation. Their stimulus displays contained a contour whose middle portion was
occluded by a rectangular surface. On each trial, a vertical interpolation window was opened at
one of six possible locations through which a short linear probe was visible (see Figure 12.3a).
Observers iteratively adjusted the location (height) and orientation of the line probe in order to
optimize the percept of smooth continuation of a single contour behind the occluder. The per-
ceived interpolated contours were thus mapped out by taking measurements at six evenly spaced
locations along the width of the occlusion region. The experiments manipulated the geometry of
the two inducing segments—specifically, the turning angle between them (Figure 12.3b) and their
relative vertical offset (Figure 12.3c).
A basic question was: for a given pair of inducing contours, are observers’ settings of position and
orientation through the six interpolation windows globally consistent—i.e. consistent with a single,
stable, smooth interpolating contour? Using two measures of global consistency—a parametric one
and a non-parametric one—Fulvio et al. (2008) found that although increasing the turning angle
between inducers adversely affected the precision of interpolation settings, it did not adversely
affect their internal consistency. By contrast, increasing the relative offset between the two inducing
contours did disrupt the internal consistency of observers’ interpolation settings. In other words,
observers made their settings using simple heuristics (they were largely influenced by the closest
inducing contour), and their local settings of height and orientation at various locations no longer
‘hung together’ into any actual extended contour. A natural way to understand this difference is
6 Under the assumption of Gaussian distributions for the prior and likelihood, the Bayesian posterior is also
a Gaussian distribution whose mean is a weighted average of the prior mean and likelihood mean, with the
relative weights inversely proportional to their respective variances (see e.g. Box and Tiao 1992).
Fig. 12.3 (a) Stimulus used by Fulvio, Singh, and Maloney (2008, 2009) to study contour interpolation.
For a given pair of inducing edges, an interpolation window is opened at one of six possible locations
along the width of the occluder. Observers adjust the height as well as the orientation of a line probe
visible through the interpolation window in order to optimize the percept of smooth interpolation. The
inducer geometry was manipulated by varying the turning angle (shown in (b)) and the relative offset
(shown in (c)) between the two inducers.
Reprinted from Vision Research, 48(6), Jacqueline M. Fulvio, Manish Singh, and Laurence T. Maloney, Precision and
consistency of contour interpolation, pp. 831–49, Copyright (2008), with permission from Elsevier.
that increasing the relative offset between inducer pairs leads eventually to a geometric context
where the interpolating contour must be inflected—i.e. contain a point of inflection (or change in
the sign of curvature) somewhere along its path—which is a factor that is known to disrupt visual
completion (Takeichi et al. 1995; Singh and Hoffman 1999). On the other hand, simply increasing
the turning angle between the two inducers does not necessitate inflected interpolating contours—
it only requires interpolating contours with greater curvature in a single direction.
These two factors—turning angle and relative offset between inducers—are often combined
conjunctively to define the strength of grouping between pairs of inducing edges. For example,
Kellman and Shipley’s (1991) definition of edge relatability requires that both the relative offset
between inducers, as well as the turning angle between them, be within specific ranges in order for
them to be considered ‘relatable’. This conjunctive combination, however, ignores the qualitatively
different effects that these two factors have on contour interpolation. Specifically, although both
factors lead to an increase in imprecision, only relative offset leads to a failure of internal consist-
ency. In a subsequent study, Fulvio, Singh, and Maloney (2009) developed a purely experimental
criterion to test for internal consistency of interpolation measurements—one that relied solely on
observers’ own interpolation performance rather than on any experimenter-defined measures.
The results independently verified and extended their earlier findings.
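The parametric notion of global consistency can be caricatured in a few lines: fit one smooth curve to the height settings across windows and ask whether its derivative matches the orientation settings. This is only a sketch of the idea, not the measure actually used by Fulvio et al. (2008); all values below are made up.

```python
import numpy as np

def consistency_residual(x, heights, slopes, degree=3):
    """Fit a single cubic to the height settings across interpolation
    windows, then measure how far its derivative departs from the
    adjusted orientations (slopes). Large residuals mean the settings
    do not 'hang together' as one smooth contour. Toy sketch only."""
    coeffs = np.polyfit(x, heights, degree)
    fitted_slope = np.polyval(np.polyder(coeffs), x)
    return np.sqrt(np.mean((fitted_slope - slopes) ** 2))

x = np.linspace(0, 5, 6)                 # six window locations
h = 0.1 * x**2                           # heights on a single parabola
consistent = consistency_residual(x, h, 0.2 * x)        # matching slopes
inconsistent = consistency_residual(x, h, 0.2 * x + 1)  # offset slopes
```

Settings that lie on one smooth curve give a near-zero residual; orientation settings that systematically disagree with the fitted heights do not.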
piecewise manner. In other words, it segments contours and shapes into simpler ‘parts’ and organ-
izes shape representation using these parts and their spatial relationships. Far from being arbitrary
subsets, these perceptual parts are highly systematic, and segmented using predictable geometric
‘rules’. Moreover, these segmented parts tend to correspond, in high-level vision, to psychologi-
cally meaningful subunits of objects (such as head, leg, branch, etc.) that are highly relevant to a
number of cognitive processes, including categorization, naming, and object recognition.
Although in Attneave’s (1954) usage, the phrase ‘maxima of curvature’ along a contour does
not distinguish between positive (convex) and negative (concave) curvature, the sign of curva-
ture actually plays a fundamental role in modern theories of shape representation—and especially
in theories of part segmentation. Once one treats curvature as a signed quantity (which can be
done whenever the distinction between convex and concave is well defined), one can differentiate
between positive maxima of curvature (marked by M+ in Figure 12.4a) and negative minima of
curvature (marked by m– in Figure 12.4a). Both of these extrema types have locally maximal mag-
nitude of curvature, and are hence ‘maxima of curvature’ by Attneave’s nomenclature. However, by
definition, positive maxima lie in convex segments of a shape’s bounding contour, whereas negative
minima lie in concave segments. Apart from these two extrema types, another important class of
points is defined by inflections, which are zero crossings of curvature—i.e. points where curvature
crosses from positive (convex) to negative (concave), or vice versa (marked by o in Figure 12.4a).
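On a densely sampled contour, signed curvature and these extrema types can be computed directly; a sketch assuming a counter-clockwise orientation (so convex segments come out positive):

```python
import numpy as np

def signed_curvature(x, y):
    """Signed curvature k = (x'y'' - y'x'') / (x'^2 + y'^2)^(3/2) on a
    densely sampled contour, via finite differences. Positive values are
    convex (for a counter-clockwise contour), negative values concave;
    zero crossings are inflections."""
    dx, dy = np.gradient(x), np.gradient(y)
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    return (dx * ddy - dy * ddx) / (dx**2 + dy**2) ** 1.5

# A counter-clockwise ellipse is everywhere convex: k > 0 throughout,
# with positive maxima (M+) at the ends of the major axis and no
# negative minima or inflections
t = np.linspace(0, 2 * np.pi, 400, endpoint=False)
k = signed_curvature(2 * np.cos(t), np.sin(t))
```

A shape with a concavity (e.g. a dumbbell) would instead show negative dips in this profile, flanked by inflections where k crosses zero.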
The distinction between positive maxima and negative minima of curvature is critical for part
segmentation—where negative minima of curvature play a special role. According to Hoffman
and Richards’ (1984) ‘minima rule’, the visual system uses negative minima of curvature to seg-
ment shapes into parts. This rule is motivated by the principle of transversality, according to which
when two smooth objects are joined to form a composite object, their intersection generically
[Fig. 12.4, panels (a)–(c): shape contours marked with positive maxima of curvature (M+), negative minima of curvature (m–), and inflections (o).]
produces a concave crease (i.e. a discontinuity in the tangent plane of the composite surface; see
Figure 12.4b). And a concave crease is simply an extreme—i.e. ‘sharp’—form of a negative mini-
mum of curvature. (More precisely, a generic application of smoothing to a concave crease yields
a smooth negative minimum.) Similarly, when a new branch grows out of a trunk (or a limb out
of an embryo), negative minima of curvature are created between the sprouting branch and the
trunk (see Figure 12.4c; Leyton 1989). Hence, when faced with a complex object with unknown
part structure, it is a reasonable strategy for the visual system to use the presence of negative
minima of curvature as a cue to identifying separate parts.
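Given a signed-curvature profile around a closed contour, the minima rule's candidate part boundaries can be picked out mechanically. The two-lobed curvature profile below is a made-up illustration of a shape with a concave 'neck':

```python
import numpy as np

def negative_minima(k):
    """Indices of negative local minima of a signed-curvature array k
    sampled around a closed contour (circular topology): the minima
    rule's candidate part boundaries (Hoffman and Richards 1984)."""
    is_local_min = (k < np.roll(k, 1)) & (k < np.roll(k, -1))
    return np.flatnonzero(is_local_min & (k < 0))

# Curvature profile of a two-part shape: convex overall, with two
# concave dips (the neck between the parts) at s = 0 and s = pi
s = np.linspace(0, 2 * np.pi, 360, endpoint=False)
k = 1.0 - 1.8 * np.cos(2 * s)            # dips to -0.8 at the neck
boundaries = negative_minima(k)          # indices 0 and 180
```

Note that this yields only candidate boundary points; as discussed below, pairing them into actual part cuts requires region-based information that the contour alone does not supply.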
A great deal of psychophysical evidence indicates that negative minima of curvature do indeed
play an important role in visually segmenting shapes into parts. For example, when subjects are
asked to draw cuts on line drawings of various objects to demarcate their natural parts, a large
proportion of their cuts pass through or near negative minima of curvature (Siddiqi, Tresness,
and Kimia 1996; De Winter and Wagemans 2006). Similar results have also been obtained with
3D models of objects (Chen, Golovinskiy, and Funkhouser 2009). Furthermore, even when unfa-
miliar, randomly generated shapes are used (hence lacking any high-level cues from recognition
or category knowledge), and subjects are simply asked to indicate whether or not a given contour
segment belongs to a particular shape (i.e. in a performance-based task where the instructions to
participants involve no mention of ‘parts’), their identification performance is substantially bet-
ter for segments delineated by negative minima of curvature than for those delineated by other
extrema types (Cohen and Singh 2007). This result indicates that part segmentation is a relatively
low-level geometry-driven process that operates automatically without relying on familiarity with
the shape, or any task requirement involving naming or recognition.7
Part segmentation using negative minima of curvature has been shown to explain a number
of visual phenomena, including the perception of figure and ground (Baylis and Driver 1994,
1995; Hoffman and Singh 1997); the perception of shape similarity (Hoffman and Richards 1984;
Bertamini and Farrant 2005; Vandekerckhove et al. 2008); object recognition in contour-deleted
images (Biederman 1987; Biederman and Cooper 1991); perception of transparency (Singh
and Hoffman 1998); visual search for shapes (Wolfe and Bennett 1997; Hulleman, te Winkel
and Boselie 2000; Xu and Singh 2002); the visual estimation of the ‘centre’ of a two-part shape
(Denisova, Singh, and Kowler 2006); the visual estimation of the orientation of a two-part shape
(Cohen and Singh 2006); and the allocation of visual attention to multi-part objects (Vecera,
Behrmann, and Filapek 2001; Barenholtz and Feldman 2003).
Although the minima rule provides an important cue for part segmentation, it is not suf-
ficient to divide a shape into parts—which of course requires segmenting the interior region
of a shape, not simply its bounding contour. Specifically, although the minima rule provides
a number of candidate part boundaries (namely, the negative minima of curvature), it does
not indicate how these boundaries should be paired to form part cuts that segment the shape.
Furthermore, even in shapes containing exactly two negative minima, simply connecting these
two minima does not necessarily yield intuitive part segmentations (see e.g. Singh, Seyranian,
and Hoffman 1999; Singh and Hoffman 2001 for examples). The basic limitation of the minima
rule stems from the fact that localizing negative minima of curvature involves only the local
geometry of the bounding contour of the shape, but not the nonlocal geometry of its interior
region (see ‘Interactions between Contour and Region Geometry’ for more on this important
7 This does not mean, of course, that high-level cognitive factors do not also exert an influence when present;
they clearly do (see e.g. De Winter and Wagemans 2006). The point is simply that cognitive factors are not
necessary for part segmentation; low-level geometry-driven mechanisms of part segmentation can and do
operate in their absence.
Fig. 12.5 Two examples of failure of the minima rule: (a) A negative minimum that does not correspond
to a part boundary; and (b) a part boundary that does not correspond to a negative minimum. These
failures arise because the minima rule uses only local contour geometry, not region-based geometry.
(c) A different approach to part segmentation involves establishing a one-to-one correspondence between axial branches and parts. Such a correspondence is achieved by a Bayesian approach to skeleton computation.
Data from Jacob Feldman and Manish Singh, Bayesian estimation of the shape skeleton, Proceedings of the
National Academy of Sciences of the United States of America 103(47), pp. 18014–18019, doi: 10.1073/pnas.0608811103, 2006.
and plant morphology (e.g. Blum 1973).8 However, as recognized subsequently by Blum and Nagel
(1978; see their Figure 2), the MAT does not achieve this one-to-one correspondence. Although
modern techniques for computing the medial axis and related transforms have become increas-
ingly sophisticated, they nevertheless largely inherit the intrinsic limitations of the MAT—which
stem from the basic conception of skeleton computation as a deterministic process involving the
application of a fixed geometric ‘transform’ to any given shape. Specifically, a geometric-transform
approach does not attempt to separate the shape ‘signal’ from any contributions of noise. Every
feature along the contour is effectively treated as being ‘intrinsic’ to the shape. One consequence of
this is a high degree of sensitivity of the skeleton to noise, such that the smallest perturbation to the
contour can dramatically alter the branching topology of the shape skeleton.
In order to address these concerns, Feldman and Singh (2006) used an inverse-probability
approach to estimate the skeleton that ‘best explains’ a given shape. The key idea in this approach
is to treat object shapes as resulting from a combination of generative factors and noise. The skel-
etal shape representation must then model the generative (or ‘intrinsic’) factors, while factoring
out the noise. Specifically, shapes are assumed to ‘grow’ from a skeleton via a stochastic generative
process. The estimated skeleton of a given shape is then one’s best inference of the skeleton that
generated it. Skeletons with more branches, and more highly curved branches, can of course pro-
vide a better fit to the shape (i.e. lead to a higher likelihood), but they are also penalized for their
added complexity (i.e. they have a lower prior). Thus one’s ‘best’ estimate of the skeleton involves
a Bayesian trade-off between fit to the shape and the complexity of the skeleton.
This trade-off leads to a pruning criterion for ‘spurious’ branches of the shape skeleton: a candi-
date axial branch is included in the final shape skeleton only if it improves the fit to the shape suf-
ficiently to warrant the increase in skeletal complexity that it entails. More precisely, the posterior
of the skeleton that includes the test branch must be larger than the posterior of the skeleton that
excludes it (recall that the posterior includes both the contribution of the fit to the shape, via the
likelihood term, as well as of skeleton complexity, via the prior). Axial branches that do not meet
this criterion are effectively treated as ‘noise’ and pruned. As a result, this probabilistic computa-
tion is able to establish a one-to-one correspondence between axial branches and perceptual parts
(see Figure 12.5c for an example). Importantly, it can predict both the successes of the minima
rule (cases where negative minima are perceived as part boundaries) and its failures (cases where
negative minima are not perceived as part boundaries, or where part boundaries do not corre-
spond to negative minima; recall Figures 12.5a and 12.5b)—despite the fact that in this approach
contour curvature is never explicitly computed. Thus, it yields a single axial branch for the curved
shape in Figure 12.5a; but a skeleton with two axial branches for the shape in Figure 12.5b. Indeed,
the contributions of other known factors influencing part segmentation can all be understood in
terms of this more fundamental process of probabilistic estimation of the shape skeleton, indicat-
ing that this may provide a unifying theory of part segmentation. See Singh, Feldman, and Froyen
(in preparation) and Feldman et al. (2013) for more on this probabilistic approach to skeletons and
parts, and its application to various visual problems.
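The pruning decision reduces to comparing posteriors with and without the candidate branch. The toy sketch below caricatures this with a linear complexity penalty, a hypothetical stand-in for the actual prior used by Feldman and Singh (2006):

```python
def keep_branch(log_lik_with, log_lik_without, branch_complexity,
                beta=1.0):
    """Keep a candidate axial branch only if its improvement in fit to
    the shape (log-likelihood gain) exceeds the complexity cost it adds
    (log-prior loss), i.e. only if it raises the log posterior. The
    penalty beta * branch_complexity is an illustrative stand-in for
    the actual skeletal prior."""
    fit_gain = log_lik_with - log_lik_without
    complexity_cost = beta * branch_complexity
    return fit_gain > complexity_cost

# A branch explaining a real part improves the fit substantially; a
# branch chasing contour noise barely improves it and is pruned
real_part = keep_branch(-90.0, -120.0, branch_complexity=10.0)
noise = keep_branch(-99.0, -100.0, branch_complexity=10.0)
```

The numbers here are invented; the point is only the structure of the trade-off, in which small contour perturbations cannot justify the cost of an extra axial branch.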
8 In the MAT conception, a shape is viewed as the union of maximally inscribed circles, and its skeleton—the
MAT—is taken to be the locus of the centres of these circles.
(e.g. Elder and Zucker 1993; Kovacs and Julesz 1993; Garrigan 2012). However, because closed
contours automatically define an enclosed region, it is less clear whether this advantage of
closure obtains at the level of contour geometry (see Tversky, Geisler, and Perry 2004), or at
the level of region-based geometry, i.e. the geometry of the region enclosed by the contour.
We have seen in the context of part segmentation that there is more to the representation of
a shape than simply the geometry of its bounding contour. To motivate the distinction between
contour geometry and region (or surface) geometry further, consider the simple shape shown in
Figure 12.6a. This shape may be conceptualized in two different ways:
1 It could be viewed as a rubber band lying on a table (the ‘rubber-band representation’).
Mathematically, we would define it as a closed one-dimensional contour embedded in
two-dimensional space. In this case, a natural way to represent its geometry would be in terms
of some contour property—say, curvature—expressed as a function of arc length (resulting in
a curvature plot such as in Figure 12.6b). The relevant notions of distance and neighbourhood
relations would then also be defined along the contour. As a result, although points A and B
on the shape are close to each other in the Euclidean plane, they would not be considered
‘neighbouring’ points because they are quite far from each other when distances are measured
along the contour.
2 Alternatively, it could be viewed as a piece of cardboard cut out into a particular shape (the
‘cardboard-cutout representation’). Mathematically, we may define it as a connected and compact
two-dimensional subset of the Euclidean plane (namely, the region enclosed by the contour).
Under this conceptualization, points A and B on the shape would indeed be considered quite
close to each other (because the intervening region is now also part of the shape).
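The two conceptualizations disagree about distance itself. A sketch, using a hypothetical 'U'-shaped polygon whose inner arm tips (analogous to points A and B) are near in the plane but far along the contour:

```python
import numpy as np

def contour_and_euclidean_distance(points, i, j):
    """Distance between vertices i and j of a closed polygon, measured
    (1) along the contour (the rubber-band view) and (2) straight
    through the plane (the cardboard-cutout view)."""
    wrap = np.diff(points, axis=0, append=points[:1])
    seg = np.linalg.norm(wrap, axis=1)            # edge lengths
    along = seg[min(i, j):max(i, j)].sum()
    along = min(along, seg.sum() - along)         # shorter way around
    euclid = np.linalg.norm(points[i] - points[j])
    return along, euclid

# A 'U': the inner arm tips (vertices 3 and 6) are 1 unit apart in the
# plane but 7 units apart along the bounding contour
U = np.array([[0, 0], [3, 0], [3, 4], [2, 4],
              [2, 1], [1, 1], [1, 4], [0, 4]], dtype=float)
along, euclid = contour_and_euclidean_distance(U, 3, 6)
```

Under the cardboard-cutout view, by contrast, the relevant distance between the two tips would run through the (non-shape) gap only if the region between the arms were part of the shape, which it is not for a 'U'.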
Fig. 12.6 Illustrating the limitations of a contour-based representation of shape. (a) Although the
two points A and B are very close to each other on the shape, they are very distant on the curvature
plot of its bounding contour, as shown in (b). (c) Similarly, although the two highlighted sections of
the contour belong to the same ‘bend’ in the shape, this fact is not reflected in any obvious way in
the curvature plot in (d).
The distinction between region-based and contour-based notions of shape has a number of
other implications as well. In Figure 12.6c, for example, the two highlighted sections of the con-
tour belong to the same ‘bend’ in the shape. A purely contour-based representation, however,
would have difficulty in explicitly representing this fact. In the curvature plot in Figure 12.6d,
for instance, the two contour sections do not appear to be related in any obvious way. What a
contour-based representation misses here is the locally parallel structure of the two highlighted
contour segments. It is clear that such structure can be extracted only by examining relationships
across (i.e. on ‘opposite’ sides of) the shape—not just along the contour. For the same reason,
bilateral symmetry or local symmetry in shapes is relatively easy to capture using region-based
representations, but difficult using purely contour-based representations. As an example, even
though the two shapes shown in Figure 12.7 have very similar curvature profiles, their global
region-based geometries are entirely different (Sebastian and Kimia 2005).
We should note that, in the examples above, we assumed that the 'material' surface was on the inside of the closed contour—not an unreasonable assumption if we know we are viewing solid, bounded objects (the alternative would be an extended surface containing a shaped
hole). In the general case, however, the visual system faces the problem of border-ownership or
figure-ground assignment—determining whether the material object or surface lies on one side
of the contour or the other—a problem that is particularly acute when only a small portion of
an object’s outline is visible. An interesting interaction occurs between contour geometry and
region-based geometry in solving this problem, such that the side with the ‘simpler’ region-based
description tends to be assigned figural status. In more formal terms, the relevant geometric fac-
tors have been characterized in terms of part salience (Hoffman and Singh 1997) and stronger
axiality (Froyen, Feldman, and Singh 2010).
A natural way to capture region-based geometry is in terms of skeletal, or axial, representations
(introduced briefly in 'Part-Based Representations of Shape')—compact 'stick-figure' representations that capture essential aspects of a shape's morphology (see, e.g., Kimia 2003). A well-known figure by Marr and Nishihara (1978) shows 3D models of various animals made out of pipe cleaners.
A striking aspect of these models is how easily they are recognized as specific animals, despite the
absence of surface geometry—or indeed any surface characteristics. The demonstration suggests
that the axial information preserved in these pipe-cleaner models is an important component of
human shape representation. It should be borne in mind, however, that a skeletal representation
actually includes not just an estimate of the shape’s axes (which are shown in Marr and Nishihara’s
pipe-cleaner models), but also an estimate of the shape’s ‘width’ at each point on each axis (which
is not). In Blum’s MAT, for instance, this local ‘width’ is captured by the size of the maximally
Fig. 12.7 Although the two shapes have similar curvature profiles—differing only in the presence of a
zero-curvature segment in the shape on the right—their region-based geometries are entirely different.
Example based on Sebastian and Kimia (2005).
Adapted from Signal Processing, 85(2), Thomas B. Sebastian and Benjamin B. Kimia, Curves vs. skeletons in object
recognition, pp. 247–63, Copyright © 2005, with permission from Elsevier.
Fig. 12.8 Illustrating the distinction between contour and region (or surface) geometry. The same
contour segment, visible through an aperture in (a), could belong to surfaces with very different
geometries. First, the contour segment could correspond to a protuberance on the shape, as in (b),
or to an indentation, as in (c). Second, the curvature of the contour could arise due to variation in
the width of the shape about a straight axis (as in (b) and (c)), or due to curvature of the axis itself,
with the local width function being constant (as in (d) and (e)).
inscribed circle at any given point. In Feldman and Singh’s (2006) Bayesian skeleton model, it
is approximately twice the length of the ‘ribs’ along which the shape is assumed to have ‘grown’
from the axis. Each such measure of local width of the shape implicitly defines a point-to-point
correspondence across the shape. In other words, it specifies for any given point on the shape’s
bounding contour which point on the ‘opposite’ side of the shape is locally symmetric to it.9
What are the perceptual implications of the difference between contour-based geometry and
region-based geometry? Consider the local contour segment in Figure 12.8a, shown through an
aperture. The same contour segment could belong to shapes with very different region-based geom-
etries. First, the contour segment could correspond either to a convex protuberance on the shape, or
to a concave indentation (Figures 12.8b vs. 12.8c). This distinction is based simply on a figure-ground
reversal (or change in border ownership)—whether the shape lies on one side of the contour or the other. This has been shown to be an important factor in predicting perceptual grouping in
the context of both amodal (Liu, Jacobs, and Basri 1999) and modal (Kogo et al. 2010) completion.
The second distinction we consider, however, does not depend on a figure-ground rever-
sal: assuming a locally convex region (say), the curvature on the contour could arise either from
variation in the width of the shape about a straight axis (as in Figures 12.8b and 12.8c), or from
curvature of the axis itself, with the local width of the shape being constant (Figures 12.8d and
12.8e). It is clear that these two cases actually represent two extremes of a continuum—where all
of the contour curvature can be attributed entirely to either the width function alone, or to axis
curvature alone. A continuous family of intermediate cases is of course possible—where the con-
tour’s curvature arises partly due to the curvature of the shape’s axis, and partly due to variations
in the shape’s width (Siddiqi et al. 2001; Fulvio and Singh 2006).
In order to examine the perceptual consequences of such region-based differences in shape, Fulvio
and Singh (2006) examined visual shape interpolation in stereoscopic illusory-contour displays. Their
displays varied systematically in their region-based geometry, while preserving the contour-based
geometry of the inducing edges (see Figure 12.9). Using two different experimental methods, they
probed the perceived shape of the illusory contours in the ‘missing’ region. The results exhibited large
influences of region-based geometry on perceived illusory-contour shape.

9 One way to think about local symmetry is as follows: imagine placing a mirror at a point along the shape’s axis, with its orientation matching the local orientation of the axis. If the axis is defined appropriately, this mirror will reflect the tangent of the contour on one side of the shape to the tangent of the contour on the opposite side of the shape (Leyton 1989).

250 Singh

Fig. 12.9 (a) Stereoscopic stimuli used by Fulvio and Singh (2006) to study the influence of region-based geometry on illusory-contour shape. In these stimuli, region-based geometry was manipulated while keeping local contour geometry fixed (as in Figure 12.8). A schematic of the binocular percept is shown in (b). The results showed significant differences in perceived illusory-contour shape as a function of region-based geometry.
Reprinted from Acta Psychologica, 123 (1–2), Jacqueline M. Fulvio and Manish Singh, Surface geometry influences the shape of illusory contours, pp. 20–40, Copyright © 2006 with permission from Elsevier.

First, illusory contours enclosing locally concave shapes were found to be systematically more angular (closer to the intersection point of the linear extrapolations of the two inducers) than those enclosing locally convex
shapes. This influence of local convexity is consistent with results obtained with partly occluded
shapes (Fantoni, Bertamini, and Gerbino 2005). Beyond the influence of local sign of curvature,
however, this influence of local convexity also exhibited an interaction with two skeleton-based vari-
ables: shape width and axis curvature. Specifically, the influence of local convexity on illusory-contour
shape was found to be: (1) greater for narrower shapes than for wider ones; and (2) greater for shapes
with a straight axis and symmetric contours (‘diamonds’ and ‘bowties’; Figures 12.8b and 12.8c) than
for shapes with a curved axis and locally parallel contours (‘bending tubes’; see Figures 12.8d and
12.8e). These results indicate that, even at the level of illusory ‘contours’, an important role is played by
nonlocal region-based geometry involving skeleton-based parameters.
The influence of region-based geometry manifests itself in object recognition and classification
as well. Sebastian and Kimia (2005) compared the shape-matching performance of two algorithms—one based on
matching their bounding contours, the other based on matching axis-based graphs derived from
them. They found that when small variations were introduced on the shapes (e.g. involving partial
occlusion, rearrangement of parts, or addition or deletion of a part), the contour-based matching
scheme produced many spurious matches, leading to a substantial deterioration in performance.
By contrast, the axis-based matching scheme was highly robust to such variations. They concluded that, although axis-based representations are more complex and take longer to compute, this additional cost is justified by their robustness.
Do human observers make use of parameters of the shape skeleton in classifying shapes?
Different classes of shape—e.g. animals and leaves—differ not only in their means along various
skeleton-based parameters (e.g. number of branches, axis curvature, etc.), but also in their dis-
tributional forms. For example, the distribution of the number of branches tends to be Gaussian for
animals with a mean of around 5 (reflecting the typical number of body parts in an animal body
plan), whereas the distribution tends to be exponential for leaves (consistent with a recursively
branching process); see Figure 12.10a.

[Figure 12.10: (a) plot of probability against number of branches (0–25) for animals (n = 424) and leaves (n = 341); (b) morphed shapes at mixing levels 70, 50, and 30.]

Fig. 12.10 Different categories of shape, such as animals and leaves, differ in the statistics of various skeleton-based parameters. (a) shows the distribution of the number of axial branches computed from databases of animal and leaf shapes. Note that the two categories differ both in the mean and in the distributional form of this variable. (b) To address the question of whether human observers rely on skeleton-based statistics to classify shapes, Wilder, Feldman, and Singh (2011) created morphed shapes by mixing animals and leaves in different proportions. Subjects were asked whether each morphed shape looked ‘more like’ an animal or a leaf. The results showed that a naive Bayesian classifier based on the distribution of a small number of axis-based parameters provided an excellent predictor of human shape classification.
Reprinted from Cognition 119(3), John Wilder, Jacob Feldman, and Manish Singh, Superordinate shape classification using natural shape statistics, pp. 325–40, Copyright © 2011 with permission from Elsevier.

Do human subjects rely on such statistical differences in
skeletal parameters when performing shape classification? Wilder, Feldman, and Singh (2011)
used morphed shapes created by combining animal and leaf shapes in different proportions (e.g.
60% animal and 40% leaf; see Figure 12.10b). Subjects indicated whether each shape looked more
like an animal or more like a leaf. (The morphing proportions ranged between 30% and 70% so
the shapes were typically not recognizable as any particular animal or leaf.) They then compared
subjects’ performance with that of a naive Bayesian classifier based on a small number of skeletal
parameters, and found a close match between the two. By contrast, classification based only on
contour-based variables (such as contour curvature) and other traditional shape measures (such
as compactness and aspect ratio) did not provide good predictors of human classification perfor-
mance. These comparisons provide strong evidence for the use of a skeleton-based representation
of shape by the human visual system. More recent work also provides evidence for the role of
region-based representation of shape in contour-detection tasks, i.e. detecting a closed contour in
background noise (Wilder, Singh, and Feldman 2013).
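A toy version of such a classifier can be sketched as follows. This is not Wilder, Feldman, and Singh’s (2011) actual model: it uses a single skeletal parameter (branch count) and invented distribution parameters, purely to illustrate how a Gaussian likelihood for one category and an exponential likelihood for the other combine in a naive Bayes decision:

```python
# Toy naive-Bayes classifier over one skeletal parameter (branch count),
# in the spirit of Wilder, Feldman, and Singh (2011). The distribution
# parameters below are invented for illustration; the actual model used
# several axis-based parameters fitted to shape databases.
import math

GAUSS_MEAN, GAUSS_SD = 5.0, 2.0   # assumed animal branch-count distribution
EXP_RATE = 0.5                    # assumed leaf branch-count distribution

def log_lik_animal(n_branches):
    # Gaussian log-density, reflecting the roughly fixed animal body plan.
    z = (n_branches - GAUSS_MEAN) / GAUSS_SD
    return -0.5 * z * z - math.log(GAUSS_SD * math.sqrt(2 * math.pi))

def log_lik_leaf(n_branches):
    # Exponential log-density, reflecting a recursive branching process.
    return math.log(EXP_RATE) - EXP_RATE * n_branches

def classify(n_branches, prior_animal=0.5):
    """Return (label, posterior probability of that label)."""
    la = log_lik_animal(n_branches) + math.log(prior_animal)
    ll = log_lik_leaf(n_branches) + math.log(1.0 - prior_animal)
    p_animal = 1.0 / (1.0 + math.exp(ll - la))
    return ('animal', p_animal) if p_animal >= 0.5 else ('leaf', 1 - p_animal)
```

Under these assumed parameters, a shape with about five branches is classified as animal-like, while a shape with a single branch is classified as leaf-like.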
Conclusions
Contours constitute an essential source of information about shape, and along contours points
with the greatest magnitude of curvature tend to be most informative. This concentration of
information is closely tied to generative models of contours assumed by the visual system—i.e. its
internal models about how contours tend to be generated (and hence its expectations about how
contours tend to behave locally). Therefore, visual expectations about contour continuity (‘good
continuation’) and the information content of contours are naturally viewed as two sides of the
same coin. In going from open to closed contours—such as the outlines of objects—the influence
of sign of curvature (convex vs concave) becomes critical, with concave sections of a contour
carrying more information, and playing a special role in part segmentation. The visual system
represents complex shapes by automatically segmenting them into simpler parts—‘simpler’
because these parts are closer to being convex (they contain less negative curvature). One type
of curvature extrema—negative minima of curvature—provides a particularly important cue
for part segmentation. However, sign of curvature (local convexity) and curvature extrema are
entirely contour-based notions, and this fact likely explains why the minima rule cannot fully
predict part segmentation. The visual system employs not only a contour-based representation of
shape, but also a region-based one—namely, a representation of the interior region enclosed by
the contour—making explicit properties such as the local width of the shape, the curvature of its
axis, and more generally, locally parallel and locally symmetric structure. Psychophysical results
from a variety of domains—shape classification, amodal and modal grouping, visual shape com-
pletion—provide clear evidence for the representation of region geometry based on skeleton or
axis models. Even at the level of so-called ‘illusory contours’, nonlocal region-based geometry
exerts a strong influence.
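The contour-based notions summarized above (signed curvature and its negative minima) can be illustrated with a short sketch. This is a standard discrete approximation, not a procedure from the chapter, and it assumes a densely sampled, counter-clockwise closed contour:

```python
# Sketch: discrete signed curvature of a sampled closed contour, with
# candidate part boundaries at negative minima of curvature (the 'minima
# rule'). Assumes counter-clockwise orientation and dense sampling.
import numpy as np

def signed_curvature(pts):
    """Discrete signed curvature k = (x'y'' - y'x'') / (x'^2 + y'^2)^1.5."""
    x, y = pts[:, 0], pts[:, 1]
    dx, dy = np.gradient(x), np.gradient(y)
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    return (dx * ddy - dy * ddx) / (dx * dx + dy * dy) ** 1.5

def negative_minima(pts):
    """Indices where curvature is concave (negative) and locally minimal."""
    k = signed_curvature(pts)
    prev_k, next_k = np.roll(k, 1), np.roll(k, -1)
    return np.where((k < 0) & (k <= prev_k) & (k <= next_k))[0]

# Example: a 'dumbbell', two lobes joined by a neck. The neck produces
# concave boundary points, which surface as negative curvature minima.
t = np.linspace(0, 2 * np.pi, 400, endpoint=False)
r = 1.0 + 0.5 * np.cos(2 * t)          # radius pinched at t = pi/2, 3pi/2
contour = np.stack([r * np.cos(t), r * np.sin(t)], axis=1)
boundaries = negative_minima(contour)
```

For this shape the detected boundaries sit at the two neck points, which is where observers also tend to place part cuts; as the chapter notes, such contour-based cues alone do not fully predict human part segmentation.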
We conclude that, as far as the human visual representation of shape is concerned, contour
geometry cannot ultimately be viewed in isolation, but must be considered in tandem with
region-based geometry.
References
Attneave, F. (1954). ‘Some Informational Aspects of Visual Perception’. Psychological Review 61: 183–193.
Barenholtz, E., E. H. Cohen, J. Feldman, and M. Singh (2003). ‘Detection of Change in Shape: An
Advantage for Concavities’. Cognition 89(1): 1–9.
Visual Representation of Contour and Shape 253
Barenholtz, E. and J. Feldman (2003). ‘Visual Comparisons within and between Object Parts: Evidence for
a Single-part Superiority Effect’. Vision Research 43(15): 1655–1666.
Baylis, G. C. and J. Driver (1994). ‘Parallel Computation of Symmetry but not Repetition in Single Visual
Objects’. Visual Cognition 1: 377–400.
Baylis, G. C. and J. Driver (1995). ‘Obligatory Edge Assignment in Vision: The Role of Figure and Part
Segmentation in Symmetry Detection’. Journal of Experimental Psychology: Human Perception and
Performance 21(6): 1323–1342.
Bertamini, M. and T. Farrant (2005). ‘Detection of Change in Shape and its Relation to Part Structure’. Acta
Psychologica 120: 35–54.
Biederman, I. (1987). ‘Recognition by Components: A Theory of Human Image Understanding’.
Psychological Review 94: 115–147.
Biederman, I. and G. Ju (1988). ‘Surface vs. Edge-Based Determinants of Visual Recognition’. Cognitive
Psychology 20: 38–64.
Biederman, I. and E. E. Cooper (1991). ‘Priming Contour-Deleted Images: Evidence for Intermediate
Representations in Visual Object Recognition’. Cognitive Psychology 23: 393–419.
Blum, H. (1973). ‘Biological Shape and Visual Science (Part I)’. Journal of Theoretical Biology 38: 205–287.
Blum, H. and R. N. Nagel (1978). ‘Shape Description Using Weighted Symmetric Axis Features’. Pattern
Recognition 10: 167–180.
Box, G. E. P. and G. C. Tiao (1992). Bayesian Inference in Statistical Analysis. New York: Wiley.
Chen, X., A. Golovinskiy, and T. A. Funkhouser (2009). ‘A Benchmark for 3D Mesh Segmentation’. ACM
Transactions on Graphics 28(3): 1–12.
Clottes, J. (2003). Chauvet Cave: The Art of Earliest Times. Translated by Paul G. Bahn. Salt Lake
City: University of Utah Press.
Cohen, E. H., E. Barenholtz, M. Singh, and J. Feldman (2005). ‘What Change Detection Tells Us about the
Visual Representation of Shape’. Journal of Vision 5(4): 313–321.
Cohen, E. H. and M. Singh (2006). ‘Perceived Orientation of Complex Shape Reflects Graded Part
Decomposition’. Journal of Vision 6(8): 805–821.
Cohen, E. H. and M. Singh (2007). ‘Geometric Determinants of Shape Segmentation: Tests Using Segment
Identification’. Vision Research 47: 2825–2840.
Cole, F., K. Sanik, D. DeCarlo, A. Finkelstein, T. Funkhouser, S. Rusinkiewicz, and M. Singh (2009).
‘How Well Do Line Drawings Depict Shape?’ In ACM Transactions on Graphics (Proc. SIGGRAPH)
28: 2009.
De Winter, J. and J. Wagemans (2006). ‘Segmentation of Object Outlines into Parts: A Large-scale
Integrative Study’. Cognition 25: 275–325.
De Winter, J. and J. Wagemans (2008a). ‘The Awakening of Attneave’s Sleeping Cat: Identification of
Everyday Objects on the Basis of Straight-line Versions of Outlines’. Perception 37: 245–270.
De Winter, J. and J. Wagemans (2008b). ‘Perceptual Saliency of Points along the Contour of Everyday
Objects: A Large-scale Study’. Perception and Psychophysics 70(1): 50–64.
Denisova, K., M. Singh, and E. Kowler (2006). ‘The Role of Part Structure in the Perceptual Localization of
a Shape’. Perception 35: 1073–1087.
Elder, J. H. and S. W. Zucker (1993). ‘Contour Closure and the Perception of Shape’. Vision Research
33(7): 981–991.
Elder, J. H. and R. M. Goldberg (2002). ‘Ecological Statistics of Gestalt Laws for the Perceptual
Organization of Contours’. Journal of Vision 2(4): 324–353.
Fantoni, C. and W. Gerbino (2003). ‘Contour Interpolation by Vector-field Combination’. Journal of Vision
3(4): 281–303.
Fantoni, C., M. Bertamini, and W. Gerbino (2005). ‘Contour Curvature Polarity and Surface Interpolation’.
Vision Research 45: 1047–1062.
Feldman, J. (1997). ‘Curvilinearity, Covariance, and Regularity in Perceptual Groups’. Vision Research
37(20): 2835–2848.
Feldman, J. (2001). ‘Bayesian Contour Integration’. Perception and Psychophysics 63(7): 1171–1182.
Feldman, J. and M. Singh (2005). ‘Information along Contours and Object Boundaries’. Psychological
Review 112(1): 243–252.
Feldman, J. and M. Singh (2006). ‘Bayesian Estimation of the Shape Skeleton’. Proceedings of the National
Academy of Sciences 103(47): 18014–18019.
Feldman, J., M. Singh, E. Briscoe, V. Froyen, S. Kim, and J. Wilder (2013). ‘An Integrated Bayesian
Approach to Shape Representation and Perceptual Organization’. In Shape Perception in Human and
Computer Vision: An Interdisciplinary Perspective, edited by S. Dickinson and Z. Pizlo, pp. 55–70.
London: Springer.
Field, D. J., A. Hayes, and R. F. Hess (1993). ‘Contour Integration by the Human Visual System: Evidence
for a Local “Association Field”’. Vision Research 33(2): 173–193.
Froyen, V., J. Feldman, and M. Singh (2010). ‘A Bayesian Framework for Figure-ground Interpretation’.
In Advances in Neural Information Processing Systems, edited by J. Lafferty, C. K. I. Williams,
J. Shawe-Taylor, R. Zemel, and A. Culotta, pp. 631–639. La Jolla, CA: The NIPS Foundation.
Fulvio, J. M. and M. Singh (2006). ‘Surface Geometry Influences the Shape of Illusory Contours’. Acta
Psychologica 123: 20–40.
Fulvio, J. M., M Singh, and L. T. Maloney (2008). ‘Precision and Consistency of Contour Interpolation’.
Vision Research 48: 831–849.
Fulvio, J. M., M. Singh, and L. T. Maloney (2009). ‘An Experimental Criterion for Consistency in
Interpolation of Partially-occluded Contours’. Journal of Vision 9(4): 5: 1–19.
Garrigan, P. (2012). ‘The Effect of Contour Closure on Shape Recognition’. Perception 41(2): 221–235.
Geisler, W. S., J. S. Perry, B. J. Super, and D. P. Gallogly (2001). ‘Edge Co-occurrence in Natural Images
Predicts Contour Grouping Performance’. Vision Research 41: 711–724.
Geisler, W. S. and J. S. Perry (2009). ‘Contour Statistics in Natural Images: Grouping across Occlusions’.
Visual Neuroscience 26: 109–121.
Hoffman, D. D. and W. A. Richards (1984). ‘Parts of Recognition’. Cognition 18: 65–96.
Hoffman, D. D. and M. Singh (1997). ‘Salience of Visual Parts’. Cognition 63: 29–78.
Hulleman, J., W. te Winkel, and F. Boselie (2000). ‘Concavities as Basic Features in Visual Search: Evidence
from Search Asymmetries’. Perception and Psychophysics 62: 162–174.
Hume, D. (1748/1993). An Enquiry concerning Human Understanding. Indianapolis, IN: Hackett.
Kellman, P. and T. Shipley (1991). ‘A Theory of Visual Interpolation in Object Perception’. Cognitive
Psychology 23: 141–221.
Kennedy, J. M. and R. Domander (1985). ‘Shape and Contour: The Points of Maximum Change Are Least
Useful for Recognition’. Perception 14: 367–370.
Kimia, B. (2003). ‘On the Role of Medial Geometry in Human Vision’. Journal of Physiology 97: 155–190.
Koenderink, J. J. and A. van Doorn (1982). ‘The Shape of Smooth Objects and the Way Contours End’.
Perception 11: 129–137.
Koenderink, J. J. (1984). ‘What Does the Occluding Contour Tell us about Solid Shape?’ Perception
13: 321–330.
Koffka, K. (1935). Principles of Gestalt Psychology. New York: Harcourt, Brace and World.
Kogo, N., C. Strecha, L. Van Gool, and J. Wagemans (2010). ‘Surface Construction by a 2-D
Differentiation-Integration Process: A Neurocomputational Model for Perceived Border Ownership,
Depth, and Lightness in Kanizsa Figures’. Psychological Review 117(2), 406–439.
Kovacs, I. and B. Julesz (1993). ‘A Closed Curve Is Much More than an Incomplete One: Effect
of Closure in Figure-ground Segmentation’. Proceedings of the National Academy of Sciences
90: 7495–7497.
Latecki, L. and R. Lakamper (1999). ‘Convexity Rule for Shape Decomposition Based on Discrete Contour
Evolution’. Computer Vision and Image Understanding 73: 441–454.
Leyton, M. (1989). ‘Inferring Causal History from Shape’. Cognitive Science 13: 357–387.
Liu, Z., D. Jacobs, and R. Basri (1999). ‘The Role of Convexity in Perceptual Completion: Beyond Good
Continuation’. Vision Research 39: 4244–4257.
Marr, D. and H. K. Nishihara (1978). ‘Representation and Recognition of the Spatial Organization of
Three-dimensional Shapes’. Proceedings of the Royal Society of London B 200: 269–294.
Norman, J. F., F. Phillips, and H. E. Ross (2001). ‘Information Concentration along the Boundary Contours
of Naturally Shaped Solid Objects’. Perception 30: 1285–1294.
Panis, S., J. de Winter, J. Vandekerckhove, and J. Wagemans (2008). ‘Identification of Everyday Objects on
the Basis of Fragmented Versions of Outlines’. Perception 37: 271–289.
Parent, P. and S. W. Zucker (1989). ‘Trace Inference, Curvature Consistency, and Curve Detection’. IEEE
Transactions on Pattern Analysis and Machine Intelligence 2(8): 823–839.
Pasupathy, A. and C. E. Connor (2002). ‘Population Coding of Shape in Area V4’. Nature Neuroscience
5(12): 1332–1338.
Pizlo, Z., M. Salach-Goyska, and A. Rosenfeld (1997). ‘Curve Detection in a Noisy Image’. Vision Research
37(9): 1217–1241.
Richards, W., B. Dawson, and D. Whittington (1986). ‘Encoding Contour Shape by Curvature Extrema’.
Journal of the Optical Society of America A 3: 1483–1491.
Rosin, P. L. (2000). ‘Shape Partitioning by Convexity’. IEEE Transactions on Systems, Man, and Cybernetics,
Part A 30: 202–210.
Sebastian, T. and B. Kimia (2005). ‘Curves vs. Skeletons in Object Recognition’. Signal Processing 85 (2): 247–263.
Siddiqi, K., B. Kimia, A. Tannenbaum, and S. Zucker (2001). ‘On the Psychophysics of the Shape Triangle’.
Vision Research 41(9): 1153–1178.
Siddiqi, K., K. Tresness, and B. Kimia (1996). ‘Parts of Visual Form: Psychophysical Aspects’. Perception
25: 399–424.
Singh, M. and D. D. Hoffman (1998). ‘Part Boundaries Alter the Perception of Transparency’. Psychological
Science 9: 370–378.
Singh, M. and D. D. Hoffman (1999). ‘Completing Visual Contours: The Relationship between Relatability
and Minimizing Inflections’. Perception and Psychophysics 61: 636–660.
Singh, M., G. D. Seyranian, and D. D. Hoffman (1999). ‘Parsing Silhouettes: The Short-cut Rule’. Perception
and Psychophysics 61(4): 636–660.
Singh, M. and D. D. Hoffman (2001). ‘Part-based Representations of Visual Shape and Implications for
Visual Cognition’. In From Fragments to Objects: Segmentation and Grouping in Vision: Advances in
Psychology, edited by T. Shipley and P. Kellman, vol. 130, pp. 401–459. New York: Elsevier.
Singh, M. and J. M. Fulvio (2005). ‘Visual Extrapolation of Contour Geometry’. Proceedings of the National
Academy of Sciences, USA 102(3): 939–944.
Singh, M. and J. M. Fulvio (2007). ‘Bayesian Contour Extrapolation: Geometric Determinants of Good
Continuation’. Vision Research 47: 783–798.
Singh, M. and J. Feldman (2012). ‘Principles of Contour Information: A Response to Lim and Leek (2012)’.
Psychological Review 119(3): 678–683.
Singh, M., J. Feldman, and V. Froyen (in preparation). ‘Unifying Parts and Skeletons: A Bayesian Approach
to Part Segmentation’. In Handbook of Computational Perceptual Organization, edited by S. Gepshtein, L.
T. Maloney & M. Singh. Oxford: Oxford University Press.
Takeichi, H., H. Nakazawa, I. Murakami, and S. Shimojo (1995). ‘The Theory of the Curvature-constraint
Line for Amodal Completion’. Perception 24: 373–389.
Tversky, T., W. Geisler, and J. Perry (2004). ‘Contour Grouping: Closure Effects are Explained by Good
Continuation and Proximity’. Vision Research 44(24): 2769–2777.
Ullman, S. (1976). ‘Filling-in the Gaps: The Shape of Subjective Contours and a Model for their Generation’.
Biological Cybernetics 25: 1–6.
Vandekerckhove, J., S. Panis, and J. Wagemans (2008). ‘The Concavity Effect is a Compound of Local and
Global Effects’. Perception and Psychophysics 69: 1253–1260.
Vecera, S. P., M. Behrmann, and J. C. Filapek (2001). ‘Attending to the Parts of a Single Object: Part-based
Selection Limitations’. Perception and Psychophysics 63: 308–321.
Walther, D., B. Chai, E. Caddigan, D. Beck, and Li Fei-Fei (2011). ‘Simple Line Drawings Suffice for
Functional MRI Decoding of Natural Scene Categories’. Proceedings of the National Academy of Sciences
of the USA, 108(23): 9661–9666.
Warren, P. A., L. T. Maloney, and M. S. Landy (2002). ‘Interpolating Sampled Contours in 3D: Analyses of
Variability and Bias’. Vision Research 42: 2431–2446.
Wertheimer, M. (1923). ‘Untersuchungen zur Lehre von der Gestalt II’. Psychologische Forschung 4: 301–350.
Translation published in W. Ellis (1938). A Source Book of Gestalt Psychology. London: Routledge and
Kegan Paul, pp. 71–88.
Wilder, J., J. Feldman, and M. Singh (2011). ‘Superordinate Shape Classification Using Natural Shape
Statistics’. Cognition 119: 325–340.
Wilder, J., M. Singh, and J. Feldman (2013). ‘Detecting Shapes in Noise: The Role of Contour-based and
Region-based Representations’. Poster presented at the Annual Meeting of the Vision Sciences Society
(VSS 2013).
Wolfe, J. M. and S. C. Bennett (1997). ‘Preattentive Object Files: Shapeless Bundles Of Basic Features’.
Vision Research 37: 25–43.
Xu, Y. and M. Singh (2002). ‘Early Computation of Part Structure: Evidence from visual Search’. Perception
and Psychophysics 64: 1039–1054.
Yuille, A. L., F. Fang, P. Schrater, and D. Kersten (2004). ‘Human and Ideal Observers for Detecting
Image Curves’. In Advances in Neural Information Processing Systems, edited by S. Thrun, L. Saul, and
B. Schoelkopf, vol. 16, pp. 59–70. Cambridge, MA: MIT Press.
Section 4
Figure-ground organization
Chapter 13
Low-level and High-level Contributions to Figure–Ground Organization
Mary A. Peterson
Background
Investigators of visual perception have yet to find a completely satisfactory answer to the funda-
mental question, ‘How do we segregate a complex scene into individual objects?’. For the most
part we seem to accomplish this task readily, but the apparent ease of object perception can lead
us astray as we try to understand how it is done.
At one level we can describe the segregation of a scene into objects (or ‘figures’) as follows.
When two regions of the visual input share a border, visual processes determine whether one of
them has a definite shape bounded by the shared border. In this case, the shaped region is per-
ceived as the figure (the object) and the border is perceived as its bounding contour. The region on
the opposite side of the border appears to simply continue behind the figure/object; it is perceived
as a shapeless ground to the figure/object at their shared border. This figure–ground interpretation
is a local one; regions can be perceived as grounds along one portion of their border and as figures
along other portions (Hochberg 1980; Peterson 2003a; Kim and Feldman 2009). Note that the
figure appears to be closer to the viewer than the ground at their shared border; thus the border is
perceived as a depth edge. Figure 13.1(A) illustrates the distinction between figures and grounds.
Our understanding of the processes involved in arriving at these percepts has progressed over the
last 100 years, but it remains far from complete.
In attempting to understand how object perception occurs, many theorists have taken figure–
ground assignment to occur at an early stage of processing, one that happens at a low level in the
visual hierarchy before object memories stored at higher levels are accessed and before attention
operates. The assumption is that figures must be defined at this low/early stage in order to provide
a substrate for those higher-level processes. This is the classic view of figure-ground assignment,
and is discussed in the next section ‘The Traditional View of Figure–Ground Perception’. On the
classic view of figure–ground assignment, only properties that can be computed on the image can
influence the first figure assignment; properties that require access to memory may affect later
interpretations but not the first one (Wertheimer 1923/1938). A number of such image-based fac-
tors have been identified; those factors are reviewed in ‘The Traditional View of Figure–Ground
Perception’. Modern research suggests that the classic low-level stage view of figure assignment is
not correct. Instead, research shows that high-level representations of object structure and seman-
tics and subjective factors like attention and intention influence figure assignment. This research
is reviewed in ‘Challenges to the Classic View: High-level Influences on Figure Assignment’. In the
modern approach figure assignment is viewed as resulting from interactions between high and
low levels of the visual hierarchy. In ‘Modern Theoretical Views of Figure–Ground Perception’, we
260 Peterson
Fig. 13.1 (a) A black region shares borders with three white regions. It shares borders with two of
these white regions on the bottom and right side. There, the white regions are the near, shaped
entities (the figures)—they depict a cat and a tree—and the black region is perceived as a locally
shapeless ground. The black region shares borders with a third white region on the left and top.
There, the black region is perceived as the shaped entity—a woman—and the white side is perceived
as a locally shapeless ground. (b), (c) Displays with eight alternating black and white regions of equal
area. The black regions are critical regions in that they possess Gestalt configural properties of (local)
convexity (b) and symmetry (c). Participants tend to report that they perceive the critical regions as
figures under conditions where the critical regions are black and white equally often. (d) The black
region is smaller than, and enclosed by, the white region.
This material has been reprinted from Mary A. Peterson, ‘Overlapping partial configuration in object memory: an
alternative solution to classic problems in perception and recognition’, in Mary A. Peterson and Gillian Rhodes
(eds), Perception of Faces, Objects, and Scenes: Analytic and Holistic Processes, p. 270, figure 10.1a © 2003,
Oxford University Press and has been reproduced by permission of Oxford University Press http://ukcatalogue.
oup.com/product/9780195313659.do For permission to reuse this material, please visit http://www.oup.co.uk/
academic/rights/permissions.
discuss these models and review recent evidence consistent with this highly interactive alternative
to the classic view. Finally, we present our conclusions.
small area, and enclosure. In principle, the configural properties identified by the Gestalt psy-
chologists can be calculated on the image without calling upon memory.1 The Gestalt psycholo-
gists and others demonstrated that observers were likely to perceive regions with these classic
properties as figures more often than abutting regions that were concave, asymmetric, larger in
area, and enclosing (e.g., Bahnsen 1928; Rubin 1915/1958; Kanizsa and Gerbino 1976; for review,
see Hochberg 1971; Pomerantz and Kubovy 1986; Peterson 2001).
Results demonstrating the effectiveness of many of the configural properties were obtained in
experiments in which observers viewed stimuli with abutting black and white regions sharing
borders, and reported whether the black region(s) or the white region(s) appeared to be figures.
The regions of one color possessed the property under consideration whereas the regions of the
other color did not, and no other properties known to be relevant to figure-ground perception2
distinguished the two regions. Many sample displays were presented so that the property being
tested was paired with the black and white regions equally often. Figures 13.1(B)–(D) shows sam-
ple displays used to test the role of convexity, symmetry, enclosure, and small area. Observers
tended to report perceiving regions with the tested properties as figures on a large proportion of
trials, as much as 90 per cent for convexity (Kanizsa and Gerbino 1976).
The Gestalt psychologists demonstrated that properties such as convexity, symmetry, closure,
and small area—properties that could be calculated on the input image and did not seem to
demand past experience—can account for figure assignment, and hence that past experience is not necessary. These results contradicted the Structuralists’ claim that past experience alone segregates
objects from one another, at least on the assumption that there is an inborn tendency to use the
Gestalt configural properties for figure assignment. The Gestalt view that figure–ground segre-
gation preceded access to object memories took hold. Many theorists still hold the classic view
today (e.g., see Craft et al. 2007 for a recent statement of this view), and it remains quite common
for theorists to conceive of figure–ground segregation as an early process or stage of processing
(e.g., Zhou et al. 2000). But note that evidence indicating that the Gestalt configural properties
are relevant to figure assignment does not entail that past experience is not also relevant. We
discuss evidence showing that past experience plays a role in figure assignment in ‘Challenges
to the Classic View: High-level Influences on Figure Assignment’. First, we review other recently
identified configural properties that can in principle be calculated on the image.
1 They might instead be extracted during an individual’s lifetime from statistical regularities of the environment.
2 At the time, investigators did not know that using displays with multiple regions inflated estimates of the effec-
tiveness of the properties of convexity and symmetry (see Peterson and Salvagio 2008; Mojica & Peterson 2014).
response time (RT) data from these other tasks to infer how observers had organized the test
displays.
One benefit of indirect methods is that they don’t require instructions regarding figure assign-
ment; hence, according to their proponents, they may be less likely to induce certain types of
response biases based on hypotheses about what the experimenter expects (Driver and Baylis
1996; Hulleman and Humphreys 2004; Vecera et al. 2002; for review, see Wagemans et al. 2012;
Peterson and Kimchi 2013). Note, however, that in all cases where indirect measures have been
employed they supported the same conclusions as direct reports. Thus, where indirect meas-
ures have been used they have not uncovered evidence that direct reports were contaminated by
response bias, an important contribution.
Another benefit of indirect measures is that whereas an individual’s reports regarding what
he or she perceives as figure cannot be scored as ‘correct’ or ‘incorrect’, there is a correct answer
on the indirect tasks that are employed; RTs on correct trials can be compared across various
conditions, and the RT differences may provide insight into various aspects of figure–ground
perception. For instance, indirect methods have been enormously useful in attempts to learn
about figure–ground-relevant processing taking place outside of awareness (see ‘Challenges to the
Classic View: High-level Influences on Figure Assignment’).
Despite the benefits of indirect methods, direct measures remain important. To date, only
direct reports allow one to measure the probability that a region with a certain property will be
perceived as figure in a briefly exposed display. Given that configural properties operate proba-
bilistically and their effectiveness is influenced by context (Zhou et al. 2000; Jehee et al. 2007;
Peterson and Salvagio 2008; Goldreich and Peterson 2012), probability measures have been
very useful in elucidating the mechanisms of figure assignment. Moreover, although indirect
methods sometimes assay perceived organization, at other times they convey information
about the process of arriving at a percept rather than the percept itself. For instance, rather
than using response times to index which region is perceived as the figure, Peterson and
Lampignano (2003) and Peterson and Enns (2005) used them to assay competition for figural
status between cues/properties that favor assigning the figure on opposite sides of a border.
Observers were aware of the figures they perceived, but they were unaware of the competition
that led to their percepts. Thus, in this case, indirect methods informed about process rather
than about the percept. In-depth discussions of the methods can be found elsewhere (e.g.,
Wagemans et al. 2012; Peterson and Kimchi 2013). In the remainder of this section we simply
indicate whether direct or indirect methods were used in experiments supporting a role for var-
ious properties in figure assignment. In ‘Challenges to the Classic View: High-level Influences
on Figure Assignment’ and ‘Modern Theoretical Views of Figure–Ground Perception’ we also
point out how indirect measures have been useful in attempts to understand the mechanisms
of figure assignment.
Part salience
Using direct reports, Hoffman and Singh (1997) showed that the figure is more likely to be per-
ceived on the side of a border where the parts are more ‘salient’. Part salience (Figure 13.2A) is
determined by a number of geometric factors, including the curvature (‘sharpness’) of the part’s
Low-level and High-level Contributions to Figure–Ground Organization 263
Fig. 13.2 (a) The black region with a salient part tends to be perceived as the figure. (b) An
extremal edge (EE) cues the left side of the central border as the figure. (This illustration
was originally published as Figure 13.1(b) on p. 78 of ‘Extremal edges: a powerful cue to
depth perception and figure-ground organization’ by Stephen E. Palmer and Tandra Ghose,
Psychological Science, 19(1): 77–84. Copyright © 2008 Association for Psychological Science.
Reprinted by Permission of SAGE Publications.) (c) The lower, black region tends to be perceived
as the figure. (d) The black regions are wider at the base than at the top, and tend to be
perceived as figures. (e) When the white dots on the black region and the border between the
black and white regions move synchronously in the same direction (say to the right as indicated
by the arrows above and below the display) and the black dots on the white region remain
stationary, the black region is perceived as the figure. (f) Two frames side by side indicate two
sequential frames. The dashed lines are overlaid on the figures to help the reader understand
how the displays transformed from frame 1 to frame 2. Observers perceived the black region
as the deforming figure because the convex parts delimited from the black side of the border
were perceived to move hinged on the concave cusps between them. (g) Two frames side by
side indicate two sequential frames. The black region is perceived as the moving figure, as if it is
advancing on the white region. The dashed vertical lines are added to aid the appreciation of the
advancing movement in the static display.
Reproduced from Stephen E. Palmer and Joseph L. Brooks, Edge-region grouping in figure-ground organization and
depth perception, Journal of Experimental Psychology: Human Perception and Performance, 34 (6), p. 1356,
figure 13.1a © 2008, American Psychological Association.
boundaries and the degree to which it ‘sticks out’, measured as perimeter/cut length. Part salience
is related to convexity, but it allows quantification of other geometric factors.
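Because part salience is defined geometrically, the 'sticks out' component can be computed directly from image contours. The sketch below is a minimal illustration of the perimeter/cut-length ratio, assuming a polygonal approximation of the part boundary; it is not Hoffman and Singh's implementation, and the function names are invented for this example.

```python
import math

def polyline_length(points):
    """Sum of Euclidean distances between consecutive 2-D points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def part_salience(boundary_points, cut_endpoints):
    """Toy 'sticks out' measure: part perimeter divided by cut length.

    boundary_points traces the part's outer contour from one concave
    cusp to the other; cut_endpoints are the two cusps whose straight
    segment cuts the part off from the rest of the region.
    """
    return polyline_length(boundary_points) / math.dist(*cut_endpoints)

# A semicircular bump: arc length ~pi, cut (chord) length 2, so the
# salience ratio approaches pi/2; a shallower bump would score lower.
bump = [(math.cos(math.pi * k / 32), math.sin(math.pi * k / 32))
        for k in range(33)]
print(part_salience(bump, ((1.0, 0.0), (-1.0, 0.0))))
```

On this toy measure a part that protrudes more (longer contour over the same cut) receives a higher salience score, in line with the ratio described in the text.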
Extremal edges
An extremal edge (EE) is an image gradient produced where a surface curves smoothly away from view (Figure 13.2B). Palmer and Ghose (2008) showed that observers tend to perceive the figure on the side of a border with an EE gradient; this is true even when the EE is placed in conflict with other factors (Ghose and Palmer 2010).
Lower region
Using both direct and indirect measures, Vecera et al. (2002) showed that regions below a hori-
zontally oriented border are more likely than regions above the border to be perceived as figure
(Figure 13.2C). In principle, the lower region can be calculated on the input image, so we list it
here, although we note that this cue could be derived from past experience. Vecera and Palmer
(2006) proposed that the configural property of the lower region derives from the ecological sta-
tistics of objects in the earth’s gravitational field. Note that ecological statistics can in principle
underlie many of the image-based configural cues; hence, these properties may have become rel-
evant over the course of evolution, as assumed by the Gestalt psychologists, or during an indi-
vidual’s lifetime.
Top–bottom polarity
Using both direct and indirect measures, Hulleman and Humphreys (2004) showed that regions
that are wider at the bottom and narrower at the top are more likely to be perceived as figures than
regions that are wider at the top and narrower at the bottom (Figure 13.2D). Like the lower region
property, top–bottom polarity can be calculated on the input image. Inasmuch as it accords with
gravitational stability, it might have evolved as a figure cue or it might be extracted from ecological
statistics during an individual’s lifetime.
Edge-region grouping
Palmer and Brooks (2008) showed that properties that group a border with the region on one
side but not the other can affect figure assignment (Figure 13.2E). Six different grouping fac-
tors (common fate, proximity, flicker synchrony, and three varieties of similarity—blur similar-
ity, color similarity, and orientation similarity) affected figure assignment, as assessed by direct
reports and confidence estimates, albeit to widely varying degrees. Figure 13.2(E) is a static
display illustrating the effect of common fate in a bipartite display comprising two equal-area
regions, one black and one white, covered with dots of the opposite contrast. When the dots
on one region and the border between the two regions move synchronously in the same direc-
tion, the region on which the dots lie is perceived as the figure. For instance, in Figure 13.2(E),
if the white dots on the black region move to the right at the same time as the central border
moves to the right (as indicated by the arrow below the display) and the black dots on the white
region remain stationary, the common fate of the white dots on the black region and the border
increases the probability that the black region will be perceived as the figure. Similar effects were
found for flicker (Weisstein and Wong 1987), blur similarity (Marshall et al. 1996; Mather and
Smith 2002), and a different common fate display (Yonas et al. 1987). Some of the properties
that group borders with regions involve dynamic changes (common fate and flicker synchrony),
whereas others are static (e.g., proximity and similarity). We next discuss two new configural
properties that involve dynamic changes.
Articulating motion
Barenholtz and Feldman (2006) showed that when a contour deforms dynamically, observers
tend to assign figure and ground in such a way that the articulating vertex is concave rather
than convex (Figure 13.2F). They used bipartite displays in which a central border separated the
display into two equal-area regions. One region had convex parts delimited by concave cusps
whereas the other region had concave parts. They deformed the central border between succes-
sive frames (‘Frame 1’ and ‘Frame 2’ in Figure 13.2F) and asked observers to report which side of
the display appeared to be the deforming figure. Observers perceived the convex parts as moving
as if they were hinged on the concave cusps between them, an effect that depended on the con-
cavity of the cusps separating the convex parts (Barenholtz and Feldman 2006), consistent with
the hypothesis that a concave vertex is the joint between the convex parts of a figure (Hoffman
and Richards 1984). Later, Kim and Feldman (2009) asked observers to report which side of the
border appeared to be moving rather than which side appeared to be the figure, thereby using
reports about motion to assay figure assignment indirectly. This is a valuable indirect measure
because few assumptions are required to translate observers’ moving side reports into figure side
reports, although stimuli must be exposed for relatively long durations so that the motion can
be perceived.
3 This ground cue operates only in the presence of figure cues (Peterson and Salvagio 2008; Goldreich and
Peterson 2012).
266 Peterson
Gillam and Grove (2011) pointed out that near surfaces are not necessarily located in front of
a single surface; rather they are often interposed in front of multiple objects at different distances
from the viewer. In the latter case, the contours of the occluded far objects abut the contour of the
near object in the visual field, but they are otherwise unrelated. Gillam and Grove hypothesized
that the presence of unrelated contour alignments near a border serves as a ground cue because
the unrelated contours are improbable except under conditions of occlusion. Their results sup-
ported their hypothesis, providing additional evidence that properties of grounds, as well as prop-
erties of figures, are critical to figure assignment.
Summary
Dating back to the early twentieth century and continuing to the present day myriad image-based
configural properties have been shown to affect figure assignment. Recently, ground properties
have been discovered as well. Given that object perception, which entails figure assignment, is a
critical function of vision, it is not surprising that many factors exert an influence. An analogy can
be made to depth perception, where numerous cues signal depth, including monocular, binocular,
and movement-based cues.4
4 Note that the functions served by depth cues and configural cues overlap somewhat but not completely.
Configural cues determine where objects lie with respect to a border; they signal border assignment. In contrast
many depth cues are irrelevant to border assignment, and hence, to object perception (binocular disparity,
accretion/deletion, and motion parallax excepted). Some research has begun to investigate how configural
cues and depth cues combine (Peterson and Gibson 1993; Peterson 2003b; Burge et al. 2005; Qiu et al. 2005;
Burge et al. 2010; but see Gillam et al. 2009). Further research on this topic is needed.
region had been perceived as the figure and that endogenously (volitionally) allocated attention
can affect figure assignment.
Attention can also be allocated exogenously in that it can be drawn to a region by a flash of
light. Baylis and Driver failed to find evidence that exogenously allocated attention affected figure
assignment, but their failure was probably due to the use of an insensitive test. In 2004 Vecera
et al. performed a more sensitive test and, using the same indirect measure as Baylis and Driver,
showed that exogenous attention can also affect figure assignment. Moreover, Vecera et al. found
that attention effects added to those of convexity, complementing the similar additive effect
Peterson and Gibson observed for fixation. Thus, there is now ample evidence that high-level
factors like intention, fixation, and attention (both endogenously and exogenously oriented) can
affect figure assignment. Moreover, neurophysiological evidence shows that attention enhances
neural responses to figures (Qiu et al. 2007; Poort et al. 2012).
Past experience
The Gestalt psychologists did not conduct systematic tests of whether, in addition to the low-level
factors they identified, high-level representations of previously seen objects can affect figure
assignment. There were a few demonstrations that past experience could exert an influence on
figure assignment (e.g., Rubin, 1958/1915; Schafer and Murphy 1943) but these demonstrations
were not above criticism and were dismissed because they were inconsistent with the Zeitgeist (see
Peterson 1999a for review and discussion).
In 1991, Peterson, Harvey, and Weidenbacher obtained results that strongly suggested that past
experience with particular objects influences figure assignment (Peterson et al. 1991). They exam-
ined reversals of figure–ground perception using center-surround displays modeled on the Rubin
vase-faces display. In their displays the factors of symmetry, small area, enclosure, fixation, and
sometimes the depth cue of overlap favored the interpretation that the center region was the
figure. However, past experience favored the interpretation that the surrounding regions were
the figures in that a portion of a familiar object was sketched on the outside of the border shared
by the center and surrounding regions. They showed these displays to observers such that the
familiar object was depicted in its upright orientation on some trials and in an inverted orienta-
tion on other trials, and asked observers to report figure–ground reversals over the course of
30-second trials viewing both upright and inverted displays (for samples see Figure 13.4A and B,
respectively).
Peterson et al. (1991) found that when the familiar object suggested in the surround was pre-
sented in its upright orientation rather than an inverted orientation, observers both maintained
the surround as figure longer and obtained it as figure faster by reversal out of the center-as-
figure percept. The latter finding—that surrounds were obtained as figure by reversal out of the
center-as-figure interpretation faster when they depicted upright rather than inverted familiar
objects—led Peterson et al. to hypothesize that, contrary to the traditional view, access to memo-
ries of previously seen objects occurred outside of awareness prior to figure assignment. (Peterson
and Gibson (1994a) replicated this pattern of results with a set of stimuli designed to isolate effects
of object familiarity.)5
5 Top-down set can amplify effects of a familiar configuration (Peterson et al. 1991; Peterson and Gibson 1994a).
Fig. 13.4 (a) Two portions of standing women are suggested on the left and right sides in the
white regions surrounding the small, symmetric black central region. (b) An upside down (inverted)
version of (a). (c) The same parts are suggested on the left and right sides in the white regions as
in (a), but here the parts have been spatially rearranged such that the configuration is no longer
familiar. (d) A bipartite display with equal-area regions to the right and left of the central border.
The black region depicts a portion of a familiar object. These displays were viewed both upright
and inverted. (e) A bipartite display with equal-area regions to the right and left of the central
border. The black region depicts a portion of a familiar object—a seahorse. The white region is a
novel symmetric shape. Hence, past experience and symmetry compete for figural status in this
stimulus.
(a) Reproduced from Mary A. Peterson, Erin H. Harvey, and Hollis L. Weidenbacher, Shape recognition inputs to
figure-ground organization: which route counts?, Journal of Experimental Psychology: Human Perception and
Performance, 17 (4), p. 1356, figure 13.2a © 1991, American Psychological Association. (c) Reproduced from Mary
A. Peterson, Erin H. Harvey, and Hollis L. Weidenbacher, Shape recognition inputs to figure-ground organization:
which route counts?, Journal of Experimental Psychology: Human Perception and Performance, 17 (4), p. 1356,
figure 13.2c © 1991, American Psychological Association. (d) This material has been reprinted from Mary A.
Peterson and Emily Skow-Grant, ‘Memory and learning in figure-ground perception’, in B. Ross and D. Irwin
(eds), Cognitive Vision. Psychology of Learning and Motivation Vol. 42, p. 5, figure 13.4a Copyright © 2003,
Elsevier. (e) Reproduced from Mary A. Peterson and Bradley S. Gibson, Must Figure-Ground Organization Precede
Object Recognition? An Assumption in Peril, Psychological Science 5(5), p. 254, Figure 13.1 Copyright © 1994 by
Association for Psychological Science. Reprinted by Permission of SAGE Publications.
Peterson et al. (1991) observed the effects of past experience on figure assignment only when the
parts were arranged into familiar configurations; when the same parts were rearranged into novel
configurations, as in Figure 13.4(C), no such effects were observed. Thus, these were effects of
familiar configuration and not familiar parts. Moreover, instruction-delivered knowledge that the
inverted displays depicted inverted familiar objects or that the part-rearranged displays were con-
structed by rearranging the parts of well-known, familiar objects was not sufficient to allow past
experience to affect figure assignment with those stimuli; upright displays were necessary. That
instruction-delivered knowledge was insufficient to change the pattern of results obtained with
inverted and part-rearranged displays indicated that fast, bottom-up, access to the relevant object
representations afforded only by upright displays was necessary for effects of past experience on
figure assignment. These results led Peterson and colleagues to hypothesize that high-level memo-
ries of familiar objects can influence figure assignment, provided that they are accessed quickly.
Inverting the displays slowed access to memories of familiar objects, and therefore removed their
influence on figure assignment.
Peterson and her colleagues then created a set of displays designed to isolate effects of famil-
iar configuration in order to investigate whether past experience exerts an influence on the
first perceived figure assignment. In these displays, vertically elongated rectangles were divided
into two equal-area black and white regions by an articulated central border. The region on
one side of the central border depicted a portion of a familiar object, whereas the region on
the other side did not (an example is shown in Figure 13.4D). The right/left location and black/
white color of the familiar regions was balanced across the set of displays. The displays were
exposed for brief durations (e.g., 86 ms) and masked; each display was viewed twice only, once
in an upright orientation and once in an inverted orientation. Observers reported whether they
perceived the region on the right or the left of the central border as figure. Observers’ reports
regarding the first perceived figure–ground organization indicated that the figure was more
likely to be perceived on the side of the border where the familiar configuration lay when the
displays were upright rather than inverted (Gibson and Peterson 1994). Peterson and Gibson
(1994b) also pitted a familiar configuration against the image-based configural cue of symme-
try (e.g., Figure 13.4E) and found that effects of both cues were evident in observers’ reports
regarding the first-perceived figure–ground organization in displays exposed for as little as
28 ms. Moreover, these results showed that past experience does not always dominate other
cues; instead past experience operates as one of many cues to figural status (cf., Peterson 1994).
Furthermore, these results suggested that the cues of symmetry and past experience compete
to determine the percept.
The results discussed above were obtained with direct reports regarding figural status. Some
scientists expressed concern that these direct reports might not indicate the first perceived figure
assignment: participants might have reversed the displays in search of familiar objects before
they reported figure assignment. A variety of findings argued against that alternative view. First,
familiar configuration did not always determine where the figure was perceived. Second, the
same conclusions were supported by reversal data as well as by reports of the first perceived
figure assignment (Peterson et al. 1991; Peterson and Gibson 1994a). Third, Vecera and Farah
(1997) reported converging evidence using indirect measures, as did Peterson and Lampignano
(2003), Peterson and Enns (2005), Peterson and Skow (2008), and Navon (2011). For instance,
Peterson and Enns (2005) showed participants a novel border twice, first as the border of a prime
object, on its left, say, as in Figure 13.5(A) and later as the border of a test object on either the
same or the opposite side (Figure 13.5B, left and right columns, respectively). In the test the
participants’ task was to report whether two test objects were the same as or different from each
Fig. 13.5 Displays used by Peterson and Enns (2005). (A) The prime display showing a figure on the
left of a stepped border. (B), (C) Four pairs of same/different test displays. All four samples show
trials on which the correct response was ‘different’. (B) In experimental test displays the prime
border was repeated in one or both of the two test displays (one on ‘different’ trials, as illustrated;
both on ‘same’ trials). When repeated, the prime border was either shown as the boundary of a
figure on the same side as in the prime (left column, top stimulus), or on the opposite side, the
side that was perceived as the ground in the prime (right column, top stimulus). (C) Control test
displays that did not share a border with the prime. Half the control test displays faced in the same
direction as the prime figure, half faced in the opposite direction (as in the left and right columns,
respectively), to serve as controls for the experimental same direction and opposite direction
displays.
Reproduced from Perception and Psychophysics, 67(4), The edge complex: Implicit memory for figure assignment
in shape perception, Mary A. Peterson, p. 731, Figure 13.3, DOI: 10.3758/BF03193528 Copyright © 2005,
Springer-Verlag. With kind permission from Springer Science and Business Media.
other, with no reference back to the prime object. (This is a variant of Driver and Baylis’ (1996)
indirect measure.) When the border repeated from the prime was assigned to an object on the
opposite side at test, participants’ response times were longer than they were either when it was
assigned to an object on the same side, or when the test objects were control objects with novel
borders, as in Figure 13.5(C). These results showed that a memory of the side to which a border
was previously assigned enters into the determination of where a figure lies when the border is
encountered again, slowing the decision when cues in the current display favor assigning the
border to a different side.6
The results of Peterson and Enns (2005) (and other results using indirect measures) can best be
understood within a competitive architecture in which candidate objects on both sides of borders
compete for figure assignment outside of awareness. On this view, response times were longer
when the border was assigned to an object on the opposite side at test because a memory that the
object was previously located on the prime side competes with the properties that favor perceiving
the object on the opposite side of the border in the test display.7 Recall that Kienker et al. (1986)
(see also McClelland and Rumelhart 1987; Vecera and O’Reilly 1998; Vecera and O’Reilly 2000)
had introduced the idea that figure assignment entails competition. Modern views of competi-
tion are discussed in more detail in the section ‘Modern Theoretical Views of Figure–Ground
Perception’.
Summary
Research in the late twentieth and early twenty-first centuries has firmly established that, in addi-
tion to image-based factors, high-level factors like attention, intention, and past experience influ-
ence figure assignment. This research also suggested that competition is a mechanism of figure
assignment. Accordingly, modern theoretical views of figure assignment involve competition and
take into consideration influences from both high- and low-level factors, as we will now discuss.
6 Driver and Baylis (1996) had initially used displays like these to argue against the idea that past experience
exerts an influence on figure assignment. They obtained the same pattern of results on experimental trials as
Peterson and Enns (2005) did. However, their research design lacked a critical control condition. Peterson
and Enns (2005) included a control condition and were able to demonstrate that the longer reaction times
obtained on probes with the figure assigned on the opposite side at test were due to effects of past experience
on figure assignment.
7 Treisman and DeSchepper (1996) interpreted similar results in terms of negative priming. Peterson and
Lampignano (2003) and Peterson (2012) argue that competition is a better explanation.
alone (e.g., Moran and Desimone 1985; Miller et al. 1993; Rolls and Tovee 1995). This com-
petition has become known as biased competition because it can be biased or overcome by
contrast or attention. For instance, if an animal attends to one of two stimuli within a neuron’s
receptive field, the neuron’s response pattern changes to resemble the pattern obtained when
only the attended stimulus is present. Critically, if the attended stimulus is the poor stimulus,
the response to the good stimulus is suppressed (Chelazzi et al. 1993; Duncan et al. 1997;
Reynolds et al. 1999; see Reynolds and Chelazzi 2004 for a review). Likewise, if one shape
is higher in contrast than the other, the neuron’s response pattern resembles the response to
the high-contrast stimulus alone, and the response to the other stimulus is suppressed. Thus,
the biased competition model entails competition at high levels between objects that might
be perceived, and it predicts suppression of objects that lose the competition. Note that the
biased competition model does not rule out competition between border assignment/edge
units as well. Competition has been shown to occur at many levels in the visual hierarchy
(e.g., Craft et al. 2007).
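The biased competition dynamics described above can be caricatured with two mutually inhibitory units whose responses share a receptive field. This is a hedged toy sketch, not any published model; the parameter values and function name are arbitrary choices for illustration.

```python
def biased_competition(drive_a, drive_b, bias_a=0.0, bias_b=0.0,
                       inhibition=0.8, steps=200, rate=0.05):
    """Toy biased competition: two units inhibit each other, and an
    attentional bias term added to one unit's drive can tip the outcome."""
    a = b = 0.0
    for _ in range(steps):
        new_a = a + rate * (drive_a + bias_a - inhibition * b - a)
        new_b = b + rate * (drive_b + bias_b - inhibition * a - b)
        a, b = max(new_a, 0.0), max(new_b, 0.0)  # rates are non-negative
    return a, b

# With equal drive and no bias the two units settle at the same level.
print(biased_competition(1.0, 1.0))
# Attending to (biasing) the weaker 'poor' stimulus lets it dominate,
# and the response to the stronger 'good' stimulus is suppressed.
print(biased_competition(1.2, 1.0, bias_b=0.5))
```

In the second call the bias shifts the fixed point of the mutual-inhibition dynamics so the attended stimulus wins, mirroring the suppression of the good stimulus reported by Chelazzi et al. (1993) and others.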
Peterson and Skow (2008) noted that the two objects that might be perceived on opposite sides
of a border necessarily fall within the same receptive field, and reasoned that the biased competi-
tion model might account for figure–ground perception, with the winner perceived as the object/
figure and the loser perceived as the shapeless ground (see Peterson et al. 2000 for a similar pro-
posal). They reasoned that if the region perceived as ground lost the cross-border competition
for figure assignment, then responses to an object that was potentially present there would be
suppressed. To test this hypothesis they used displays in which many properties favored the inter-
pretation that the object/figure lay on the inside of a closed silhouette border, whereas familiar
configuration favored the interpretation that the object/figure lay on the outside of the silhouette’s
border (e.g., Figure 13.6). In other words, the silhouettes were designed so that the inside would
win the competition and be perceived as the figure, whereas the outside would lose the competi-
tion and be perceived as a shapeless ground. Indeed, subjects perceived the figure on the inside
and were unaware of the familiar configuration suggested on the outside of the silhouettes, as
predicted if it lost the competition for figural status. (The familiar configuration suggested on the
outside of the left and right contours of the silhouettes in Figure 13.6 is a portion of a house with
a pitched roof and a chimney.)
To assess whether responses to the loser were suppressed, Peterson and Skow (2008) showed
a line drawing of either a real-world object or a novel object shortly after a brief exposure of
one of these silhouettes. Participants made a speeded object decision regarding the line draw-
ing (i.e., they reported whether the line drawing depicted a real-world object or a novel object).
Half the objects were of each type. The real-world objects were mostly from the Snodgrass and
Vanderwart (1980) set; the novel objects were drawn from the Kroll and Potter (1984) set. The
critical manipulation concerned the line drawings of real-world objects:8 they depicted objects
that were either from the same basic-level category as, or a different category from, the familiar
configuration that was suggested on the groundside of the silhouette border (Figure 13.6A and 13.6B,
respectively). Peterson and Skow predicted that if assigning the figure on the inside of the bor-
der entailed suppression of a competitor on the outside, participants’ response times should be
longer to correctly classify a real-world object from the same rather than a different basic-level
8 The line drawings of novel objects were included because the task required participants to decide whether they
were viewing a line drawing of a real-world object or a novel object. To observe effects of competition-induced
suppression, some sort of discrimination at test was necessary.
Fig. 13.6 Trial sequence used by Peterson and Skow (2008). Time is shown vertically. A silhouette
with a house suggested on the ground side of its left and right borders was shown centered on
fixation for 50 ms. The silhouette disappeared and 33 ms later a line drawing was displayed, also
centered on fixation. The line drawing depicted either a real-world object or a novel object. When it
was a real-world object, it was either from the same basic-level category (A) or a different category
(B) as the object suggested on the groundside of the preceding silhouette. (Novel objects are not
shown.)
Reproduced from Mary A. Peterson and Emily Skow, Suppression of shape properties on the ground side of an edge:
evidence for a competitive model of figure assignment, Journal of Experimental Psychology: Human Perception
and Performance, 34(2), p. 255, figure 13.3 © 2008, American Psychological Association.
category from the familiar object suggested on the outside of the silhouette borders. (Note that
this is the opposite of what would be expected if the familiar configuration in the prime was on
the figure side of the border, and that is because the inhibitory competition account predicts
that a competing object on the losing side, i.e., the groundside, is suppressed.) Peterson and
Skow observed the predicted pattern of results. Importantly, the borders of the line drawings
were not the same as those of the silhouettes, ruling out an interpretation in terms of border
units alone. Thus, Peterson and Skow’s results implied that competition occurs between objects
that might be perceived on opposite sides of borders. Note that evidence for high-level com-
petition does not rule out the existence of competition at lower levels, e.g., between border
assignment units.
The evidence for high-level influences on figure assignment and for competition between
objects that might be perceived on opposite sides of a border raises questions regarding how
high the processing of objects competing for figure assignment goes, both functionally and struc-
turally. The answers to these questions favor interpreting figure assignment within a dynami-
cal interactive model in which a fast non-selective feedforward sweep of activation occurs first,
competition occurs at many levels, and feedback integrates the outcome of the competition across
all levels, as discussed next.
cortex of the medial temporal lobe (long thought to be a declarative memory structure only), was
involved in effects of familiar configuration on figure assignment. These data are consistent with
the hypothesis that before figure assignment occurs, a non-selective first pass of processing pro-
ceeds to the highest levels of processing, as per the hypotheses of Lamme and Roelfsema (2000)
and Bullier (2001).
Barense et al.’s (2012) behavioral data led them to hypothesize that the perirhinal cortex of the
medial temporal lobe sends modulatory feedback to the visual cortex. Peterson et al. (2012b)
found evidence of the predicted feedback for regions perceived as figures, consistent with the
hypothesis that perceptual awareness requires additional interactive processing beyond the first
feedforward pass, as predicted by Lamme and Roelfsema (2000) and Bullier (2001). In addition,
Salvagio et al. (2012) showed that suppression applied to one side of a border, as a result of com-
petition for figural status taking place at high levels where receptive fields are large, is relayed to
levels as low as V1, where receptive fields are much smaller. Likova and Tyler (2008) also found
that activity is suppressed in V1 on the groundside of a border in conditions where a figure is
differentiated from the ground only at a global scale. These recent results are consistent with the
hypothesis that competition for figural status occurring at high structural levels generates feed-
back to lower-level visual areas. As such, they are consistent with current dynamical interactive
views of figure assignment involving (a) a first fast pass of non-selective feedforward process-
ing that identifies both low-level and high-level attributes of objects that might be perceived on
opposite sides of borders, (b) competition between those object candidates, and (c) feedback that
integrates the signals across the hierarchy of brain regions (Peterson and Cacciamani, 2013; for
related discussion see van Leeuwen, this volume).
Conclusion
One hundred years after Gestalt views first took hold, our understanding of scene segmentation
has progressed substantially. We now know that in addition to the configural properties iden-
tified by the Gestalt psychologists, figure assignment is affected by past experience, attention,
and intentions, as well as by other image-based factors identified during the twentieth century.
Figure assignment is also affected by ground properties. Recent use of indirect measures and brain
imaging techniques has revealed that there is much more processing of the regions ultimately
perceived as grounds than was supposed in traditional approaches, and that competition and
feedback are involved in figure assignment. These new methods offer the promise of uncovering
the mechanisms that organize the visual field into figures and grounds.
Acknowledgements
Much of the research reported in this chapter was conducted while the author was supported by
grants from the NSF, most recently by NSF BCS 0960529. Thanks to Laura Cacciamani for help
with the figures.
References
Bahnsen, P. (1928). Eine Untersuchung über Symmetrie und Asymmetrie bei visuellen Wahrnehmungen.
Z Psychol 108: 129–154.
Baylis, G.C. and Driver, J. (1995). One-sided edge assignment in vision: 1. Figure-ground segmentation
and attention to objects. Curr Direct Psychol Sci 4: 140–146.
Barenholtz, E. and Feldman, J. (2006). Determination of visual figure and ground in dynamically
deforming shapes. Cognition 101(3): 530–544.
Barenholtz, E. and Tarr, M. J. (2009). Figure–ground assignment to a translating contour: a preference for
advancing vs. receding motion. J Vision 9(5): 27, doi: 10.1167/9.5.27
Barense, M. D., Ngo, J., Hung, L., and Peterson, M. A. (2012). Interactions of memory and perception in
amnesia: the figure–ground perspective. Cereb Cortex 22(11): 2680–2691.
Bullier, J. (2001). Integrated model of visual processing. Brain Res Rev 36: 96–107.
Burge, J., Peterson, M. A., and Palmer, S. E. (2005). Ordinal configural cues combine with metric disparity
in depth perception. J Vision 5(6): 534–542.
Burge, J., Fowlkes, C., and Banks, M. S. (2010). Natural-scene statistics predict how the figure–ground cue
of convexity affects human depth perception. J Neurosci 30(21): 7269–7280.
Cacciamani, L., Mojica, A. J., Sanguinetti, J. L., and Peterson, M. A. (2014). Semantic access occurs outside
of awareness for the groundside of a figure. Unpublished manuscript.
Chelazzi, L., Miller, E. K., Duncan, J., and Desimone, R. (1993). A neural basis for visual search in inferior
temporal cortex. Nature 363: 345–347.
Craft, E., Schütze, H., Niebur, E., and von der Heydt, R. (2007). A neural model of figure-ground
organization. J Neurophysiol 97(6): 4310–4326.
Crouzet, S. M., Kirchner, H., and Thorpe, S. J. (2010). Fast saccades towards faces: face detection in just
100 ms. J Vision 10(4): 16, doi: 10.1167/10.4.16.
Dehaene, S., Changeux, J.-P., Naccache, L., Sackur, J., and Sergent, C. (2006). Conscious, preconscious,
and subliminal processing: a testable taxonomy. Trends Cogn Sci 10: 204–211.
Desimone, R. and Duncan, J. (1995). Neural mechanisms of selective visual attention. Ann Rev Neurosci
18(1): 193–222.
Duncan, J., Humphreys, G. W., and Ward, R. (1997). Competitive brain activity in visual attention. Curr
Opin Neurobiol 7: 255–261.
Driver, J. and Baylis, G. C. (1996). Figure-ground segmentation and edge assignment in short-term visual
matching. Cogn Psychol 31: 248–306.
Fahrenfort, J. J., Snijders, T. M., Heinen, K., van Gaal, S., Scholte, H. S., and Lamme, V. A. (2012).
Neuronal integration in visual cortex elevates face category tuning to conscious face perception. Proc
Natl Acad Sci USA 109(52): 21504–21509.
Ghose, T. and Palmer, S. E. (2010). Extremal edges versus other principles of figure-ground organization.
J Vision 10(8): 3, doi: 10.1167/10.8.3
Gibson, B. S. and Peterson, M. A. (1994). Does orientation-independent object recognition precede
orientation-dependent recognition? Evidence from a cueing paradigm. J Exp Psychol: Hum Percept
Perform 20: 299–316.
Gillam, B. J., Anderson, B. L., and Rizwi, F. (2009). Failure of facial configural cues to alter metric
stereoscopic depth. J Vision 9(1): 3, doi: 10.1167/9.1.3
Gillam, B. J. and Grove, P. M. (2011). Contour entropy: a new determinant of perceiving ground or a hole.
J Exp Psychol: Hum Percept Perform 37(3): 750–757.
Goldreich, D. and Peterson, M. A. (2012). A Bayesian observer replicates convexity context effects. Seeing
Perceiving 25: 365–395.
Hochberg, J. (1971). Perception 1. Color and shape. In: Woodworth and Schlosberg’s Experimental Psychology,
3rd edn, edited by J. W. Kling and L. A. Riggs, pp. 395–474 (New York: Holt, Rinehart and Winston).
Hochberg, J. (1980). Pictorial functions and perceptual structures. In: The Perception of Pictures, Vol. 2,
edited by M. A. Hagen, pp. 47–93 (New York: Academic Press).
Hoffman, D. D. and Richards, W. (1984). Parts of recognition. Cognition 18(1–3): 65–96.
Hoffman, D. D. and Singh, M. (1997). Salience of visual parts. Cognition 63: 29–78.
278 Peterson
Hulleman, J. and Humphreys, G. W. (2004). A new cue to figure–ground coding: top–bottom polarity. Vis
Res 44(24): 2779–2791.
Jehee, J. F. M., Lamme, V. A. F, and Roelfsema, P. R. (2007). Boundary assignment in a recurrent network
architecture. Vis Res 47: 1153–1165.
Joubert, O. R., Fize, D., Rousselet, G. A., and Fabre-Thorpe, M. (2008). Early interference of context
congruence on object processing in rapid visual categorization of natural scenes. J Vision 8(13): 11,
doi: 10.1167/8.13.11.
Kanizsa, G. and Gerbino, W. (1976). Convexity and symmetry in figure-ground organization. In: Vision
and Artifact, edited by M. Henle, pp. 25–32 (New York: Springer).
Kienker, P. K., Sejnowski, T. J., Hinton, G. E., and Schumacher, L. E. (1986). Separating figure from
ground with a parallel network. Perception 15: 197–216.
Kim, S.-H. and Feldman, J. (2009). Globally inconsistent figure/ground relations induced by a negative
part. J Vision 9(10): 8, doi:10.1167/9.10.8.
Kroll, J. F. and Potter, M. C. (1984). Recognizing words, pictures, and concepts: a comparison of lexical,
object, and reality decisions. J Verbal Learn Verbal Behav 23: 39–66.
Lamme, V. A. F. and Roelfsema, P. R. (2000). The distinct modes of vision offered by feedforward and
recurrent processing. Trends Neurosci 23(11): 571–579.
Likova, L. T. and Tyler, C. W. (2008). Occipital network for figure/ ground organization. Exp Brain Res
189: 257–267.
McClelland, J. L. and Rumelhart, D. E. (1987). Parallel Distributed Processing, Volume 2. Explorations in the
Microstructure of Cognition: Psychological and Biological Models. (Cambridge, MA: MIT Press).
Marshall, J. A., Burbeck, C. A., Ariely, D., Rolland, J. P., and Martin, K. E. (1996). Occlusion edge blur:
a cue to relative visual depth. J Opt Soc Am A 13: 681–688.
Mather, G. and Smith, D. R. R. (2002). Blur discrimination and its relation to blur-mediated depth
perception. Perception 31(10): 1211–1219.
Miller, E. K., Gochin, P. M., and Gross, C. G. (1993). Suppression of visual responses of neurons in inferior
temporal cortex of the awake macaque by addition of a second stimulus. Brain Res 616: 25–29.
Mojica, A. J. and Peterson, M. A. (2014). Display-wide influences on figure-ground perception: the case
of symmetry. Atten Percept Psychophys, doi: 10.3758/s13414-014-0646-y.
Moran, J. and Desimone, R. (1985). Selective attention gates visual processing in the extrastriate cortex.
Science 229: 782–784.
Navon, D. (2011). The effect of recognizability on figure-ground processing: does it affect parsing or only
figure selection? Q J Exp Psychol 64(3): 608–624.
Palmer, S. E. and Brooks, J. L. (2008). Edge-region grouping in figure-ground organization and depth
perception. J Exp Psychol: Hum Percept Perform 34(6): 1353–1371.
Palmer S. E. and Ghose T. (2008). Extremal edges: a powerful cue to depth perception and figure-ground
organization. Psychol Sci 19(1): 77–84.
Peterson, M. A. (1994). The proper placement of uniform connectedness. Psychonom Bull Rev 1: 509–514.
Peterson, M. A. (1999a). Organization, segregation, and recognition. Intellectica 28: 37–51.
Peterson, M. A. (1999b). What’s in a stage name? J Exp Psychol: Hum Percept Perform 25: 276–286.
Peterson, M. A. (2001). Object perception. In: Blackwell Handbook of Perception, edited by E. B. Goldstein,
pp. 168–203 (Oxford: Blackwell).
Peterson, M. A. (2003a). Overlapping partial configurations in object memory: an alternative solution to classic
problems in perception and recognition. In: Perception of Faces, Objects, and Scenes: Analytic and Holistic
Processes, edited by M. A. Peterson and G. Rhodes, pp. 269–294 (New York: Oxford University Press).
Peterson, M. A. (2003b). On figures, grounds, and varieties of amodal surface completion. In: Perceptual
Organization in Vision: Behavioral and Neural Perspectives, edited by R. Kimchi, M. Behrmann, and
C. Olson, pp. 87–116 (Mahwah, NJ: LEA).
Peterson, M. A. (2012). Plasticity, competition, and task effects in object perception. In: From Perception to
Consciousness: Searching with Anne Treisman, Ch. 11, edited by J. M. Wolfe and L. Robertson, pp. 253–262.
Peterson, M. A. and Cacciamani, L. (2013). Toward a dynamical view of object perception. In: Shape
Perception in Human and Computer Vision: an Interdisciplinary Perspective, edited by S. Dickinson and
Z. Pizlo, pp. 445–459 (Berlin: Springer).
Peterson, M. A. and Enns, J. T. (2005). The edge complex: Implicit perceptual memory for cross-edge
competition leading to figure assignment. Percept Psychophys 4: 727–740.
Peterson, M. A. and Gibson, B. S. (1993). Shape recognition contributions to figure-ground organization in
three-dimensional displays. Cogn Psychol 25: 383–429.
Peterson, M. A. and Gibson, B. S. (1994a). Object recognition contributions to figure-ground
organization: operations on outlines and subjective contours. Percept Psychophys 56: 551–564.
Peterson, M. A. and Gibson, B. S. (1994b). Must figure-ground organization precede object recognition?
An assumption in peril. Psychol Sci 5: 253–259.
Peterson, M. A. and Kimchi, R. (2013). Perceptual organization. In: Handbook of Cognitive Psychology,
edited by D. Reisberg, pp. 9–31 (Oxford: Oxford University Press).
Peterson, M. A. and Lampignano, D. L. (2003). Implicit memory for novel figure–ground displays includes
a history of border competition. J Exp Psychol: Hum Percept Perform 29: 808–822.
Peterson, M. A. and Salvagio, E. (2008). Inhibitory competition in figure-ground perception: context and
convexity. J Vision 8(16): 4, doi:10.1167/8.16.4.
Peterson, M. A. and Skow, E. (2008). Suppression of shape properties on the ground side of an
edge: evidence for a competitive model of figure assignment. J Exp Psychol: Hum Percept Perform
34(2): 251–267.
Peterson, M. A., Harvey, E. H., and Weidenbacher, H. L. (1991). Shape recognition inputs to figure-ground
organization: which route counts? J Exp Psychol: Hum Percept Perform 17: 1075–1089.
Peterson, M. A., de Gelder, B., Rapcsak, S. Z., Gerhardstein, P. C., and Bachoud-Lévi, A.-C. (2000).
Object memory effects on figure assignment: conscious object recognition is not necessary or sufficient.
Vision Res 40: 1549–1567.
Peterson, M. A., Cacciamani, L., Mojica, A. J., and Sanguinetti, J. L. (2012a). The ground side of a
figure: shapeless but not meaningless. Gestalt Theory 34(3/4): 297–314.
Peterson, M. A., Cacciamani, L., Barense, M. D., and Scalf, P. E. (2012b). The perirhinal cortex modulates
V2 activity in response to the agreement between part familiarity and configuration familiarity.
Hippocampus 22: 1965–1977.
Pomerantz, J. R. and Kubovy, M. (1986). Theoretical approaches to perceptual organization. In: Handbook
of Perception and Human Performance, Vol. II, edited by K. R. Boff, L. Kaufman, and J. P. Thomas, pp.
36:1–46 (New York: John Wiley and Sons).
Poort, J., Raudies, F., Wannig, A., Lamme, V. A., Neumann, H., and Roelfsema, P. R. (2012). The role of
attention in figure-ground segregation in areas V1 and V4 of the visual cortex. Neuron 75(1): 143–156.
Qiu, F. T. and von der Heydt, R. (2005). Figure and ground in the visual cortex: V2 combines stereoscopic
cues with Gestalt rules. Neuron 47: 155–166.
Qiu, F. T., Sugihara, T., and von der Heydt, R. (2007). Figure-ground mechanisms provide structure for
selective attention. Nat Neurosci 10(11): 1492–1499.
Reynolds, J. H. and Chelazzi, L. (2004). Attentional modulation of visual processing. Ann Rev Neurosci
27: 611–647.
Reynolds, J. H., Chelazzi, L., and Desimone, R. (1999). Competitive mechanisms subserve attention in
macaque areas V2 and V4. J Neurosci 19: 1736–1753.
Rolls, E. T. and Tovee, M. J. (1995). The responses of single neurons in the temporal visual cortical areas
of the macaque when more than one stimulus is present in the receptive-field. Exp Brain Res
103: 409–420.
Rubin, E. (1958/1915). Figure and ground. In: Readings in Perception, edited by D. C. Beardslee and
M. Wertheimer, pp. 194–203 (Princeton, NJ: Van Nostrand) (original work published 1915).
Salvagio, E. M., Cacciamani, L., and Peterson, M. A. (2012). Competition-strength-dependent ground
suppression in figure-ground perception. Atten Percept Psychophys 74(5): 964–978.
Sanguinetti, J. L., Allen, J. J. B., and Peterson, M. A. (2014). The ground side of an object: perceived as
shapeless yet processed for semantics. Psychol Sci, 25(1), 256–264.
Schafer, R. and Murphy, G. (1943). The role of autism in a visual figure–ground relationship. J Exp
Psychol 32: 335–343.
Sejnowski, T. J. and Hinton, G. E. (1987). Separating figure from ground with a Boltzmann machine.
In: Vision, brain, and cooperative computation, edited by M. A. Arbib and A. Hanson, pp. 703–724
(Cambridge, MA: MIT Press).
Serre, T., Oliva, A. and Poggio, T. A. (2007). A feedforward architecture accounts for rapid categorization.
Proc Natl Acad Sci USA 104(15): 6424–6429.
Snodgrass, J. G. and Vanderwart, M. (1980). A standardized set of 260 pictures: norms for name
agreement, image agreement, familiarity, and visual complexity. J Exp Psychol: Hum Learning Memory
6(2): 174–215.
Thorpe, S., Fize, D., and Marlot, C. (1996). Speed of processing in the human visual system. Nature
381: 520–522.
Treisman, A. and DeSchepper, B. (1996). Object tokens, attention, and visual memory. In: Attention and
Performance XVI: Information Integration in Perception and Communication, edited by T. Inui and
J. McClelland, pp. 15–46 (Cambridge, MA: MIT Press).
Vecera, S. P. and Farah, M. J. (1997). Is visual image segmentation a bottom-up or an interactive process?
Percept Psychophys 59: 1280–1296.
Vecera, S. P., Flevaris, A. V., and Filapek, J. C. (2004). Exogenous spatial attention influences figure–ground
assignment. Psychol Sci 15: 20–26.
Vecera, S. P. and O’Reilly, R. C. (1998). Figure–ground organization and object recognition processes: an
interactive account. J Exp Psychol: Hum Percept Perform 24: 441–462.
Vecera, S. P. and O’Reilly, R. C. (2000). Graded effects in hierarchical figure–ground organization: a reply
to Peterson (1999). J Exp Psychol: Hum Percept Perform 26: 1221–1231.
Vecera, S. P. and Palmer, S. E. (2006). Grounding the figure: contextual effects of depth planes on
figure-ground organization. Psychonom Bull Rev 13: 563–569.
Vecera, S. P., Vogel, E. K., and Woodman, G. F. (2002). Lower-region: a new cue for figure–ground
assignment. J Exp Psychol: Gen 131: 194–205.
Wagemans, J., Elder, J. H., Kubovy, M., Palmer, S. E., Peterson, M. A., Singh, M., and von der Heydt, R.
(2012). A century of Gestalt psychology in visual perception I. Perceptual grouping and figure–ground
organization. Psychol Bull 138(6): 1172–1217.
Weisstein, N. and Wong, E. (1987). Figure-ground organization affects the early processing of information.
In: Vision, Brain, and Cooperative Computation, edited by M. A. Arbib and A. R. Hanson, pp. 209–230
(Cambridge, MA: MIT Press).
Wertheimer, M. (1923/1938). Laws of organization in perceptual forms. In: A Source Book of Gestalt
Psychology, edited by W. D. Ellis, pp. 71–94 (London: Routledge and Kegan Paul) (original work
published 1923).
Yonas, A., Craton, L. G., and Thompson, W. B. (1987). Relative motion: kinetic information for the order
of depth at an edge. Percept Psychophys 41(1): 53–59.
Zhou, H., Friedman, H. S., and von der Heydt, R. (2000). Coding of border ownership in monkey visual
cortex. J Neurosci 20: 6594–6611.
Chapter 14
Figures and Holes
Holes have special ontological, topological, and visual properties. Perhaps because of these, they
have attracted great interest from many scholars. In this chapter, we discuss these properties
and highlight their interactions. For instance, holes are not concrete objects; their existence in
perception is therefore an exception to the general principle, grounded in evolution, that the
visual system parses a scene into regions corresponding to concrete objects. In 1948, Rudolf
Arnheim discussed the role of holes in the sculptures of Henry Moore. Arnheim’s analysis was
informed by Gestalt principles of figure-ground. In the case of holes within sculptures, given
their relative closure and compactness, Arnheim detected a sense of presence. It is worth report-
ing his words here as this ambiguity is precisely the issue that has been central to much later
work: ‘Psychologically speaking, these statues […] do not consist entirely of bulging convexi-
ties, which would invade space aggressively, but reserve an important role to dells and caves and
pocket-shaped holes. Whenever convexity is handed over to space, partial “figure”-character is
assumed by the enclosed air-bodies, which consequently appear semi-substantial’ (Arnheim,
1948, p. 33).
This chapter starts with a discussion of the ontology and topology of holes. In the last part of
the chapter, the focus will be on the role of holes in the study of figure-ground organization and
perception of shape.
Ontology
In philosophy, ontology is the study of the nature of being, and of the basic categories of being and
their relationships. The ontology of holes moves from the prima facie linguistic evidence that we
make statements about holes, thus presupposing their extra-mental existence. At the same time,
holes appear to be absences, thus non-existing items. Therefore, if they exist, they are sui generis
objects. Within the debate on the nature of holes, materialism maintains that nothing exists in the
world, but concrete material objects, thus holes should be explained away by reference to proper-
ties of objects (Lewis & Lewis, 1983). Others, by contrast, maintain that holes exist, even though
they are not material (Casati & Varzi, 1994; 1996). If we accept that holes exist, further problems
must be addressed. For example, whether holes exist independently of the object in which they
find themselves, whether they should be equated with the hole linings (and thus be considered as
material parts of material objects), and whether one can destroy a hole by filling it up (as opposed
to ending up with a filled hole).
To consider holes as existing extra-mentally is no trivial assumption. There are some advan-
tages, such as the possibility of describing the shape of a holed object by referring to the shape of
the hole in it. For example, we can describe a star-shaped hole in a square-shaped object. If holes
282 Bertamini and Casati
could not be referred to directly, the description of the same configuration would be awkward
(Figure 14.1a).
However, if holes exist, they are not material objects. Yet they possess geometric properties,
and therefore there are some entities with geometric properties that are not objects. This would
entail that Gestalt rules can fail in parsing the visual scene into objects. However, if holes have
shape, like figures, this does not prevent the visual area corresponding to their shape from being
seen as ground. Therefore, the same area can behave as figure and ground at the same time, which
is, prima facie, problematic for theories of figure-ground segmentation and for the principle of
unidirectional contour ownership (Koffka, 1935). Border ownership is covered in detail in Kogo
and van Ee, this volume.
Various solutions exist. Some may wonder whether ontology is relevant for the study of visual
perception. There may exist a property such that anything that is a hole has that property, but this
does not entail that to have the impression of seeing a hole one must visually represent that very
property—holes can be immaterial bodies or negative parts of objects (Hoffman and Richards,
Fig. 14.1 (a) The cognitive advantage of holes: the object is easily described as a blue square with a
star-shaped hole. A description of the shape of the object that does not mention the shape of the
hole would be more difficult. (b) Evidence for naïve topology: two solids that mathematical topology
cannot distinguish, but that appear quite different to common-sense classifications.
Reproduced from Casati, Roberto, and Achille C. Varzi., Holes and Other Superficialities, figure: “Cognitive
advantage of holes”, © 1994 Massachusetts Institute of Technology, by permission of The MIT Press.
Figures and Holes 283
1984), or portions of object boundaries, and perception may be blind to their real nature, although
still delivering the impression of perceiving a hole (Siegel, 2009). Alternatively, one may suggest
that the process of figure-ground organization misfires in the case of holes, whose Gestalt proper-
ties erroneously trigger the ‘figure’ response. That is, holes are (rare) exceptions. Another solution
is to say that holes have a special ‘tag’ as the missing part of an object (Nelson et al., 2009). The solu-
tion that requires fewer changes to Gestalt principles, however, is to say that the shape properties of
the hole are a property of the object-with-hole, just like the large concavity in a letter C. These prop-
erties do not make the hole or the concavity of the letter C into a figure in the sense of foreground.
What is meant by figure in figure-ground organization is not just something that has shape, but
something that is more specific and is closely linked to surface stratification. In all these cases,
the visual system makes important decisions about whether holes exist, and about their nature as
objects or quasi-objects. Some developmental findings support this hypothesis. Giralt and Bloom
(2000) found that 3-year-old children can already classify, track, and count holes. Therefore, there
is good evidence that the human perceptual system takes holes into account.
Topology
Holes play an important part in topology, a branch of mathematics dealing with spatial prop-
erties. Topological shape-invariance is intuitively understood by imagining that objects are made
of rubber sheets. In particular, the concept of homotopy classification is used to describe the difference
between shapes. Two objects are topologically equivalent if it is possible to transform one
of them into the other by just stretching it, without cutting or gluing at any place. Thus, a cube is
topologically equivalent to a sphere, but neither is equivalent to a doughnut. This classification,
in non-technical terms, measures the number of holes in an object. For instance, all letters of the
alphabet used in this chapter belong to one of three classes, with zero holes (the capital L),
one hole (capital A), or two holes (capital B). Capital L is topologically equivalent to capital I, Y,
and V. This explains the joke that says that a topologist cannot distinguish a mug from a doughnut
(assuming the mug has a handle, they both have just one hole).
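The hole-counting classification described above can be sketched computationally. The following is a minimal illustration of our own (not from the chapter), assuming letters are rendered as tiny binary grids (1 = ink, 0 = background): a hole is a background component that cannot reach the grid border.

```python
from collections import deque

def count_holes(grid):
    """Count enclosed background regions (holes) in a binary grid.
    1 = ink (object), 0 = background; 4-connectivity."""
    rows, cols = len(grid), len(grid[0])
    seen = set()

    def flood(start):
        # Flood-fill one background component; report whether it
        # touches the border (border components are not holes).
        touches_border = False
        queue = deque([start])
        seen.add(start)
        while queue:
            r, c = queue.popleft()
            if r in (0, rows - 1) or c in (0, cols - 1):
                touches_border = True
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and grid[nr][nc] == 0 and (nr, nc) not in seen):
                    seen.add((nr, nc))
                    queue.append((nr, nc))
        return touches_border

    return sum(
        1
        for r in range(rows)
        for c in range(cols)
        if grid[r][c] == 0 and (r, c) not in seen and not flood((r, c))
    )

# An 'O'-like ring has one hole, an 'L' none, an '8'-like figure two.
O = [[1, 1, 1],
     [1, 0, 1],
     [1, 1, 1]]
L = [[1, 0, 0],
     [1, 0, 0],
     [1, 1, 1]]
EIGHT = [[1, 1, 1],
         [1, 0, 1],
         [1, 1, 1],
         [1, 0, 1],
         [1, 1, 1]]
```

On this measure the mug and the doughnut are indeed indistinguishable: each contributes exactly one enclosed region.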
The joke about topologists hints at a psychologically interesting distinction. Intuitive topologi-
cal classifications of objects are not well aligned with mathematical topological classifications. As there is a naïve
physics that departs from standard physics, there appears to be a naïve topology that does not
coincide with mathematical topology. For instance, a cube perforated with a Y-shaped hole is
topologically equivalent to a cube perforated with two parallel I-shaped holes, surprising as this
may appear (Figure 14.1b). Moreover, a knot in a hole is invisible to mathematical topology. Naïve
topology uses both objects and holes to classify shapes.
Within vision science, Chen has argued that extraction of topological properties is a fundamen-
tal function of the visual system, and that topological perception is prior to the perception of other
featural properties (for a review, see Chen, 2005; see Casati, 2009, for a criticism). There is some
empirical evidence in support of this claim. In particular, Chen has shown that human observers
are better at discriminating pairs of shapes that are topologically different than pairs that are topo-
logically the same (Chen, 1982) and Todd et al. (1998) have found that in a match-to-sample task
performance was highest for topological properties, intermediate for affine properties, and lowest
for Euclidean properties. More recently, Wang et al. (2007) reported that sensitivity to topological
properties is greater in the left hemisphere, and Zhou et al. (2010) have found that topological
changes disrupt multiple-object tracking.
Holes play an important role in studies of topology, and topology is useful in explaining some
perceptual phenomena. However, in this context, holes are defined as an image property. In other
words, the letter O is an example of a hole whether or not this is perceived as a black object in
front of a white background. The depth order of the white and black regions is irrelevant, and the
experiments cited above did not try to establish whether observers perceived the region inside the
hole as showing a surface at greater depth than the object itself.
Let us take the phenomenon of configural superiority (Figure 14.2) studied by Pomerantz
(2003; Pomerantz, Sager, & Stoever, 1977; see also Pomerantz chapter, this volume) and
discussed also in Chen (2005). This effect may be taken to demonstrate the salience of perception
of a hole over individual sloped lines. However, ‘closure’ may be a better term for this configu-
ral property. That is, because depth order is not important, this concept of hole is closer to the
concept of closure. This is consistent with the literature, because closure is a factor that enhances
shape detection (Elder & Zucker, 1993) and modulates shape adaptation (Bell et al., 2010). Note
that closure is on a continuum: even contours that are not closed in a strict image sense can be
more or less closed perceptually (Elder & Zucker, 1994). This quantitative aspect of closure is
important for the concept of hole, because it makes a hole simply the extreme of a continuum
of enclosed regions and not something unique. Moreover, if closure is sufficient to define holes
then any closed contour creates a hole, which makes holes very common, whereas true holes (i.e.
apertures) are relatively rare.
background (depth order), but also a property of the foreground (contour ownership) then they
are not useful in the study of general figure-ground effects, as these would not generalize to other
ground regions. We will return to this problem after the discussion of the empirical evidence.
It is informative to attempt to draw on a piece of paper something that will be perceived imme-
diately as a visual hole. In so doing, one discovers that this is a difficult task, and for good reasons.
A finite and enclosed region of an image, such as a circle, tends to be perceived as foreground
because of factors such as closure and relative size (the closed contour is smaller relative to the
page). Therefore, other factors must be present to reverse this interpretation.
Fig. 14.3 Figural factors affecting the perception of holes: the hole percept is stronger in the top
element of each pair. (a) Arnheim (1954) claimed that globally concave shapes tend to be seen as holes.
This figure shows an extreme version of his demonstration in which the set of smooth contour segments
are identical in both cases (they are just arranged differently) and have, therefore, the same curvature
and the same total length. For a version with equal area see Bertamini (2006). Most observers, when
forced to choose, select the shape on the top as a better candidate for being a hole. (b) Bozzi (1975)
used the example of a square within a square to show the role of the relationship between contours: a
hole is perceived when edges are parallel. (c) Effect of grouping factors, such as similarity of texture or
color (Nelson and Palmer, 2001). (d) Effect of high entropy (lines with random orientation).
Reproduced from Barbara Gillam and Philip M. Grove, Contour entropy: A new determinant of perceiving ground or a
hole, Journal of Experimental Psychology: Human Perception and Performance, 37(3), 750–757 © 2011, American
Psychological Association.
However, neither of the two is unambiguously perceived as a hole, so the key to the demonstra-
tion is to ask a relative judgment: which one of the two appears more like a hole. Bertamini (2006)
found that when asked this question most observers chose the concave shape, as predicted by
Arnheim.
Bozzi (1975) made phenomenological observations on the conditions necessary for the per-
ception of holes. The figure that contains the hole should have a visible outer boundary (unlike
the Arnheim examples), there should be evidence that the background visible inside the hole
is the same as the background outside, and the boundary of the hole should be related to the
outer boundary of the object, for instance when contours are parallel as in the frame of a window
(Figure 14.3b).
An early empirical study on the conditions necessary for perception of holes was conducted by
Cavedon (1980). She found that observers did not report seeing a hole even when a physical hole was
present if there were no detectable depth cues. In a more recent list of factors that affect the perception
of a hole, Nelson and Palmer (2001) reported that in addition to depth information grouping factors
are also important because they make the region visible inside a hole appear as a continuation of the
larger background (for instance because both have the same texture, Figure 14.3c). Another impor-
tant contribution to the perception of a hole is information that makes the relationship between the
shape of the hole and the shape of the object appear non-accidental. The evidence from Nelson and
Palmer (2001) confirmed the observation by Bozzi (1975). If a white region is centred inside a black
region it is more likely to be perceived as a hole than if it is slightly crooked.
Gillam and Grove (2011) have shown that properties of the ground itself may be important
to generate the percept of a hole. Specifically, they found that a simple rectangle appears more
hole-like when the entropy of the enclosed contours is greater. This can be seen by comparing a
region with multiple lines of different orientations (high entropy) and a region with parallel lines
(low entropy) (Figure 14.3d). A final factor that strongly affects figure-ground stratification is
shading. For instance, Bertamini and Helmy (2012) used shading to create the perception of holes
(described later, see also Figure 14.6).
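The entropy factor can be made concrete with a small computation. The sketch below is only an illustration of the general idea, not the measure used by Gillam and Grove: line orientations are binned over the 0–180° range and the Shannon entropy of the resulting histogram is computed, so parallel lines score zero while randomly oriented lines score high.

```python
import math
import random

def orientation_entropy(angles_deg, n_bins=12):
    """Shannon entropy (bits) of a set of line orientations,
    binned over 0-180 degrees (orientation is axial, so 180 wraps to 0)."""
    counts = [0] * n_bins
    for a in angles_deg:
        counts[int((a % 180.0) / (180.0 / n_bins)) % n_bins] += 1
    total = len(angles_deg)
    probs = [c / total for c in counts if c > 0]
    if len(probs) <= 1:
        return 0.0  # all lines share one orientation bin
    return -sum(p * math.log2(p) for p in probs)

random.seed(0)
random_lines = [random.uniform(0.0, 180.0) for _ in range(60)]  # high entropy
parallel_lines = [45.0] * 60                                    # low entropy

print(orientation_entropy(parallel_lines))  # 0.0
print(orientation_entropy(random_lines))    # high (upper bound log2(12) ≈ 3.58 bits)
```

On this toy measure, the high-entropy region of Figure 14.3d corresponds to `random_lines` and the low-entropy region to `parallel_lines`.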
Bertamini and Hulleman (2006) explored the appearance of surfaces seen through holes. In par-
ticular, they tested whether the surface seen under multiple holes is a single amodally-completed
surface or whether the background takes on the shape of the complement of the hole (i.e. the
contour of the hole itself). Observers found it difficult to judge the extension of these amodal
surfaces, and were affected by the context (flanking objects). It is interesting that a hole can show
a surface without any information about the bounding contours of that surface. Therefore, the
shape of this object is not specified by any form of contour extrapolation (see chapter on percep-
tual completions). The shape of the hole may still constrain what is hidden in terms of probabili-
ties (Figure 14.4). For example, given a few basic assumptions, underneath a vertically-orientated
hole the value of the posterior probability is greater for a vertically-orientated rectangle than a
horizontal one (Bertamini & Hulleman, 2006).
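This probabilistic constraint can be illustrated with a toy Bayesian computation. The sketch below is hypothetical, not Bertamini and Hulleman's actual analysis: on an integer grid, a vertical or a horizontal rectangle (equal priors) is placed uniformly at random, and the likelihood of each hypothesis is the probability that it completely covers a vertical slot-shaped aperture.

```python
def covering_placements(rect_w, rect_h, ap_w, ap_h):
    """Number of integer-grid placements of a rect_w x rect_h rectangle
    that completely cover a fixed ap_w x ap_h aperture."""
    if rect_w < ap_w or rect_h < ap_h:
        return 0
    return (rect_w - ap_w + 1) * (rect_h - ap_h + 1)

def posterior_vertical(field=20, aperture=(2, 4), vert=(4, 8), horiz=(8, 4)):
    """Posterior probability that the hidden surface is the vertical
    rectangle, given that the aperture is completely filled.
    Assumes equal priors and uniform placement within the field."""
    def likelihood(w, h):
        total = (field - w + 1) * (field - h + 1)  # all placements in field
        return covering_placements(w, h, *aperture) / total
    lv = likelihood(*vert)
    lh = likelihood(*horiz)
    return lv / (lv + lh)

print(posterior_vertical())  # 15/22 ≈ 0.68: the vertical rectangle wins
```

With these (arbitrary) sizes, more placements of the vertical rectangle cover the vertical aperture, so the posterior favours the vertically-orientated hypothesis, in line with the intuition in the text.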
In another set of observations, Bertamini and Hulleman (2006) used stereograms to test holes
that were moving. If a visual hole has an existence independent of the object-with-hole, perhaps it
can move independently from that very object. However, a substantial proportion of participants
perceived a lens in the aperture of the hole. Also, for objects in which texture changed as they
moved (as it would within a hole), the percept was that of detachment of the contour from the
texture inside the contour. In all cases where there was accretion/deletion of texture on the figural
side, this resulted in detachment of texture, and introduction of a lens-like/spotlight-like appear-
ance. With respect to visual holes, the most important finding was that there was strong resistance
to perceiving holes as moving independently of the object-with-hole.
Figures and Holes 287
Fig. 14.4 Assuming that the three grey regions are perceived as holes, what is the shape of the
underlying grey surface? Unlike other completion phenomena there is no contour continuation. One
solution is a single grey object underneath all three holes, a second solution is three shapeless blobs,
and finally, as shown by the dashed lines, the contour of the holes, albeit perceived on a different
depth plane, can constrain the possible hidden objects.
As suggested by Palmer, holes may have ‘a quasi-figural status, as far as shape analysis is concerned’
(Feldman & Singh, 2005, p. 248).
Fig. 14.5 Colour and shading are powerful ways to affect figure-ground. On the left we perceive
surfaces on top of other surfaces but on the right we perceive holes. The convexity (+) and concavity
(–) of the vertices is labeled to highlight the complete reversal that takes place with a figure-ground
reversal. The hexagon on the top row has only one type of vertex: either convex (figure) or
concave (hole). The hexagon on the bottom row has both types, and they all reverse as we move
from figure to hole.
Baylis and Driver (1993) have shown that closure of the shape improves performance, i.e. there is
a within-object advantage. However, as pointed out by Gibson (1994), one has to be careful when
comparing vertices that can be perceived as convex or concave. In particular, the object on top has
convex vertices and the one at the bottom has concave vertices.
To manipulate the coding of convexity while retaining the same hexagonal shapes, Bertamini and
Croucher (2003) compared figures and holes. This is the manipulation illustrated in Figure 14.5,
although color and texture were used as figural factors rather than shading. Note that this can be
seen as a 2×2 design in which the convexity of the critical vertices varies independently of the
overall shape of the hexagon. Results confirmed that figure-ground reversal had an effect on task
difficulty: performance was better when the vertices were perceived as convex. In other words,
the coding of the vertices as convex or concave was more important than the overall shape of the
hexagon. The reason it is easier to judge the position of convex vertices is likely to be that there is
an explicit representation of position for visual parts, and convexities specify parts (Koenderink,
1990; Hoffman & Richards, 1984). Therefore, the different convexity coding for figures and holes
implies a different part structure in the two cases.
The advantage for judging the position of convex vertices (as opposed to concave) is supported
by evidence that does not rely on holes (Bertamini, 2001), but holes do provide the most direct
test of the role of convexity. Holes have been used in subsequent studies by Bertamini and Mosca
(2004), and Bertamini and Farrant (2006). Using random dot stereograms Bertamini and
Mosca (2004) could ensure that there was no ambiguity in figure-ground relations. In a random
dot stereogram, no shape information is available until images have been binocularly fused and,
therefore, depth order is established at the same time as shape information. In this sense, unlike
texture, shading, and other factors that can create a hole percept, random dot stereograms create
holes that cannot be perceived any other way. Bertamini and Mosca’s (2004) experiments con-
firmed that the critical factor in affecting relative speed on this task was whether the region was
seen as foreground or background, thus changing contour ownership.
The explanation of the effect relies on the assumption that the contour of a silhouette is perceived
as the rim of an opaque object. To test this, Bertamini and Farrant (2006) compared objects
and holes to a third case, that of thin (wire-like) objects. As a thin line tends to be perceived as
the contour of a surface, these thin objects, which are both objects and holes, can only be cre-
ated within random dot stereograms. Bertamini and Farrant confirmed that holes created by thin
objects are different in terms of performance from both objects and holes. They concluded that
thin wire-like objects have a different perceived part structure, which is intermediate between that
of objects and that of holes.
Albrecht et al. (2008) studied holes with a cueing paradigm. It is known that responses to uncued
locations are faster for probes that are located on the cued surface compared with the uncued sur-
face (Egly et al., 1994). This is taken as evidence of object-based attention. Albrecht et al. (2008)
compared surfaces with identical rectangular regions perceived as holes. Stereograms were used
to ensure that holes were perceived as such. The object-based advantage was not found for holes
when the background surface visible through the holes was shared by the two holes, but the effect
was present when this background was split, so that different objects were visible through different
holes. The findings show clearly that the important factor in deployment of attention is not just the
closure of the contours, as this was the same for the rectangles perceived as objects and as holes,
but the perceptual organization of the regions as different surfaces in depth. The region cued inside
a hole is the background surface, consistent with the idea that a hole is a ground region. That is,
what is seen inside the hole belongs to a surface that extends beyond the contour of the aperture.
Another paradigm that has been used to study attention is multiple object tracking, in
which observers track moving items among identical moving distractors (Pylyshyn & Storm, 1988;
Scholl, 2009). Horowitz and Kuzmova (2011) compared performance when tracking figures and when
tracking holes. Holes were as easy to track as figures. Therefore, Horowitz and Kuzmova concluded
that holes are proto-objects, that is, bundles that serve as tokens to which attention can be deployed.
The results from multiple object tracking are consistent with the results from visual search tasks.
Observers can find and attend to locations where a hole is present. How far can we go in perceiv-
ing holes and their shape as if they were the same as objects? To answer that question Bertamini
and Helmy (2012) used a shape interference task. Observers were presented with simple shapes
and had to discriminate a circle from a square (see Figure 14.6). However, there was also an irrel-
evant surrounding contour that could be either a circle or a square. Different (incongruent) inside
and outside contours produced interference, but the effect was stronger when they formed an
object-with-hole, as compared with a hierarchical set of surfaces or a single hole separating differ-
ent surfaces (a trench). This result supports the hypothesis that the interference is constrained by
which surface owns the contour, and that the shape of a hole cannot be processed independently
of the shape of the object-with-hole.
Fig. 14.6 In the top row there is a square contour surrounded by another square contour. This is
true for both the object and the hole. In the bottom row there is a square contour surrounded by a
circular contour. Therefore, these are examples in which the two contours are congruent (same) or
incongruent (different). What is different between objects and holes is that in the case of holes the
surrounding contour is part of the same surface that also defines the hole.
Conclusions
This chapter has shown the surprisingly large range and diversity of the studies of holes. Some
authors have focused on the nature of holes. We have seen the implications of this characteriza-
tion for accounts of the perception of holes. Can they act as objects or at least as proto-objects?
Other authors have used holes because they are convenient stimuli to manipulate key variables, in
particular figure-ground and contour ownership.
We can confidently say that humans are not blind to holes. Observers can remember the shape
of holes, search among holes, and perform multiple object tracking of holes. For some
tasks there is little difference between holes and objects. Therefore, the more difficult question to
answer is to what extent holes are treated by vision on a par with objects, and conversely to what
extent they are different from other ground regions.
In terms of local coding of convexity, it appears that holes are not similar to objects and that
convexity is assigned relative to the foreground surface (Bertamini & Mosca, 2004). In terms of
global shape analysis, here also the shape of a hole cannot be treated independently of the shape of
the foreground surface that is the object-with-hole (Bertamini & Helmy, 2012). On the one hand,
this makes holes less of a curiosity in the sense that they are not an exception to the principles
of figure-ground, and in particular they are not an exception to the principle of unidirectional
contour ownership (Bertamini, 2006). On the other hand, holes as ground regions provide the
ideal comparison for their complements. We can compare congruent contours perceived as either
objects (foreground) or holes (background) to test the role of a change in figure-ground relation-
ships while at the same time factors such as shape, size, and closure are fixed.
References
Albrecht, A. R., List, A., & Robertson, L. C. (2008). Attentional selection and the representation of holes
and objects. J Vision 8(13): 1–10.
Arnheim, R. (1948). The holes of Henry Moore: on the function of space in sculpture. J Aesthet Art
Criticism 7(1): 29–38.
Arnheim, R. (1954). Art and Visual Perception: A Psychology of the Creative Eye (Berkeley: University of
California Press).
Baylis, G. C., & Driver, J. (1993). Visual attention and objects: evidence for hierarchical coding of location.
J Exp Psychol Hum Percept Perform 19(3): 451–470.
Bell, J., Hancock, S., Kingdom, F. A. A., & Peirce, J. W. (2010). Global shape processing: which parts form
the whole? J Vision 10(6): 16.
Bertamini, M. (2001). The importance of being convex: an advantage for convexity when judging position.
Perception 30: 1295–1310.
Bertamini, M. (2006). Who owns the contour of a hole? Perception 35: 883–894.
Bertamini, M., & Croucher, C. J. (2003). The shape of holes. Cognition 87(1): 33–54.
Bertamini, M., & Farrant, T. (2006). The perceived structural shape of thin (wire-like) objects is different
from that of silhouettes. Perception 35: 1265–1288.
Bertamini, M., & Helmy, M. S. (2012). The shape of a hole and that of the surface-with-hole cannot be
analysed separately. Psychonom Bull Rev 19: 608–616.
Bertamini, M., & Hulleman, J. (2006). Amodal completion and visual holes (static and moving). Acta
Psychol 123: 55–72.
Bertamini, M., & Lawson, R. (2006). Visual search for a figure among holes and for a hole among figures.
Percept Psychophys 58: 776–791.
Bertamini, M., & Mosca, F. (2004). Early computation of contour curvature and part structure: Evidence
from holes. Perception 33: 35–48.
Bertamini, M., & Wagemans, J. (2012). Processing convexity and concavity along a 2D
contour: figure-ground, structural shape, and attention. Psychonom Bull Rev 20(2): 197–207.
Bozzi, P. (1975). Osservazione su alcuni casi di trasparenza fenomenica realizzabili con figure a tratto.
In Studies in Perception: Festschrift for Fabio Metelli, edited by G. d’Arcais, pp. 88–110.
Milan/Florence: Martelli-Giunti.
Casati, R. (2009) Does topological perception rest on a misconception about topology? Philosoph Psychol
22(1): 77–81.
Casati, R., & Varzi, A. C. (1994). Holes and Other Superficialities. Cambridge, MA: MIT Press.
Casati, R., & Varzi, A. C. (1996). Holes. In The Stanford Encyclopedia of Philosophy, edited by
Edward N. Zalta. Available at: http://plato.stanford.edu/
Cavedon, A. (1980). Contorno e disparazione retinica come determinanti della localizzazione in profondità: le
condizioni della percezione di un foro. Università di Padova Istituto di Psicologia Report 12.
Chen, L. (1982). Topological structure in visual perception. Science 218: 699–700.
Chen, L. (2005). The topological approach to perceptual organization. Visual Cogn 12(4): 553–637.
Egly R., Driver, J., & Rafal, R. D. (1994). Shifting visual attention between objects and locations: evidence
from normal and parietal lesion subjects. J Exp Psychol Gen 123: 161–177.
Elder, J. H., & Zucker, S. W. (1993). The effect of contour closure on the rapid discrimination of
two-dimensional shapes. Vision Research 33(7): 981–991.
Elder, J. H., & Zucker, S. W. (1994). A measure of closure. Vision Res 34(24): 3361–3369.
Feldman, J., & Singh, M. (2005). Information along contours and object boundaries. Psychol Rev
112: 243–252.
Gibson, B. S. (1994). Visual attention and objects: one versus two or convex versus concave? J Exp Psychol
Hum Percept Perform 20(1): 203–207.
Gillam, B. J., & Grove, P. M. (2011). Contour entropy: a new determinant of perceiving ground or a hole.
J Exp Psychol Hum Percept Perform 37(3): 750–757.
Giralt, N., & Bloom, P. (2000). How special are objects? Children’s reasoning about objects, parts, and
holes. Psychol Sci 11(6): 497–501.
Hoffman, D. D., & Richards, W. (1984) Parts of recognition. Cognition 18: 65–96.
Horowitz, T. S., & Kuzmova, Y. (2011). Can we track holes? Vision Res 51(9): 1013–1021.
Hulleman, J., & Humphreys, G. W. (2005). The difference between searching amongst objects and searching
amongst holes. Percept Psychophys 67: 469–482.
Kanizsa G., & Gerbino W. (1976). Convexity and symmetry in figure-ground organization. In Vision and
Artifact, edited by M. Henle, pp. 25–32. New York: Springer.
Koenderink, J. J. (1990). Solid Shape. Cambridge, MA: MIT Press.
Koffka, K. (1935). Principles of Gestalt Psychology. New York: Harcourt.
Lewis, D., & Lewis, S. (1983). Holes. In Philosophical Papers, edited by D. Lewis, Vol. 1, pp. 3–9.
New York: Oxford University Press.
Nelson, R., & Palmer, S. E. (2001). Of holes and wholes: the perception of surrounded regions. Perception
30: 1213–1226.
Nelson, R., Thierman, J., & Palmer, S. E. (2009). Shape memory for intrinsic versus accidental holes.
Atten Percept Psychophys 71: 200–206.
O’Toole, A. J., & Walker, C. L. (1997). On the preattentive accessibility of stereoscopic disparity: Evidence
from visual search. Percept Psychophys 59: 202–218.
Palmer, S. E. (1999). Vision Science: Photons to Phenomenology. Cambridge, MA: MIT Press.
Palmer, S. E., Davis, J., Nelson, R., & Rock, I. (2008). Figure-ground effects on shape memory for objects
versus holes. Perception 37: 1569–1586.
Pomerantz, J. R. (2003). Wholes, holes, and basic features in vision. Trends Cogn Sci 7(11): 471–473.
Pomerantz, J. R., Sager, L. C., & Stoever, R. J. (1977). Perception of wholes and of their component
parts: some configural superiority effects. J Exp Psychol Hum Percept Perform 3(3): 422–435.
Pylyshyn, Z. W., & Storm, R. W. (1988). Tracking multiple independent targets: evidence for a parallel
tracking mechanism. Spatial Vision 3(3): 1–19.
Rubin E., (1921). Visuell wahrgenommene Figuren. Copenhagen: Gyldendals.
Scholl, B. J. (2009). What have we learned about attention from multiple object tracking (and vice versa)?
in Computation, Cognition, and Pylyshyn, edited by D. Dedrick & L. Trick, pp. 49–78. Cambridge,
MA: MIT Press.
Siegel, S. (2009). The visual experience of causation. Philosoph Q 59(236): 519–540.
Todd, J., Chen, L., & Norman, F. (1998). On the relative salience of Euclidean, affine, and topological
structure for 3-D form discrimination. Perception 27: 273–282.
Wang B., Zhou T. G., Zhuo Y., and Chen L. (2007). Global topological dominance in the left hemisphere.
Proc Nat Acad Sci USA 104: 21014–21019.
Zhou, K., Luo, H., Zhou, T., Zhuo, Y., and Chen, L. (2010). Topological change disturbs object continuity
in attentive tracking. Proc Nat Acad Sci USA 107(50): 21920–21924.
Chapter 15
Perceptual completions
Rob van Lier and Walter Gerbino
1 Our chapter covers completions of fragmentary proximal stimuli, like those observed during the free viewing
of “incomplete” images. It does not cover the filling-in of sensory holes like the blind spot and scotomas (for
such cases see Pessoa et al. 1998; Pessoa and De Weerd 2003).
2 The French expression “compléments amodaux” (which appears for instance in the title of Michotte et al.
1964) has been occasionally translated into English as “amodal complements” (Jackendoff 1992, pp. 163–164),
but the prevalent contemporary usage is “amodal completion.” The difference between complement and completion
points to the contrast between the phenomenological notion discussed by Michotte and Burke (1951)
and the idea that amodal complements are the product of an active process of completion, already present for
instance in Glynn (1954), who worked on the Rosenbach phenomenon under Michotte’s guidance.
Fig. 15.1 Demonstrations from Kanizsa (1955). (a) Illusory triangle induced by line endings and black
sectors with a 1/3 support ratio. (b) Scission of a black region into a foreground cross with modal fuzzy
margins over an amodally completed square with sharp margins. (c) An illusory rectangle induced by
truncated octagons with concave notches. (d) Four crosses holding the same collinear contours available
in the truncated octagons.
Reproduced from ‘Quasi-Perceptual Margins in Homogeneously Stimulated Fields’, Gaetano Kanizsa, in Susan Petry
and Glenn E. Meyer (eds) The Perception of Illusory Contours, pp. 40–49, DOI: 10.1007/978-1-4612-4760-9_4
Copyright © 1987, Springer-Verlag New York. With kind permission from Springer Science and Business Media.
was analyzed by Petter (1956), who examined several determining factors, including relative
length.
The Michotte school credited Helmholtz for the definition of the amodal vs. modal dichotomy
(Burke 1952, p. 405). Amodal data are experienced without the modal property of the sense that
conveys the information on which they depend (typically, color in the case of vision). Koffka
(1935) used the expression “representation without color” (p. 178) to qualify the amodal presence
of the ground portion behind the figure, and discussed the one-sided function of borders (p. 183)
introduced by Rubin (1915/1921) as a key aspect of perceptual organization, connected with the
“double representation” (p. 178) of image regions that split into a foreground modal surface and
an amodal background.3
3 Amodal completion has much in common with the so-called “interposition cue to depth” (Helmholtz 1867;
English translation, 1924, 3rd volume, pp. 283–284), a notion that, despite having been strongly criticized
(Ratoosh 1949; Chapanis and McCleary 1953; Dinnerstein and Wertheimer 1957), often appears in the contemporary
depth literature without any proper reference to unification and stratification factors, which are at
the core of completion phenomena.
The contrast between configurations c and d in Figure 15.1 (Kanizsa 1955, figures 20 and 21)
demonstrates the role of figural incompleteness in co-determining amodal and, consequently,
modal completions. Kanizsa (1987) criticized the tendency to maximize structural regularity as
an explanatory factor, but this organizational principle remains at the heart of perceptual comple-
tion theories.
Amodal and modal completions are linked by (i) the causal hypothesis (the first causes the
second); and (ii) the identity hypothesis (they share common geometric constraints, as suggested
by Kellman and Shipley 1991; Shipley and Kellman 1992). Much research has been devoted to
clarifying such issues.
Amodal completion
Let us distinguish local vs. global completions. Local completions depend on features at or near the
occlusion boundary, whereas global completions depend on properties of the whole visual pattern.
Local completions
According to local completion models the shape completed behind the occluder depends on the
properties of the incoming, partly occluded, contours. The local features par excellence signaling
occlusion and triggering amodal completion are T-junctions; they arise at intersections where one
contour continues while another contour terminates. The continuing contour most
of the time belongs to the occluding object (closer to the observer), whereas the terminated contour
belongs to the partly occluded object (farther away from the observer; Helmholtz 1867/1924;
Ratoosh 1949). The issue of border ownership has been elaborated further in various studies
(Nakayama et al. 1989; see also Singh, this volume).
While T-junctions constitute a powerful local cue for occlusion (although there are exceptions;
Buffart et al. 1981; Chapanis and McCleary 1953), the form of the occluded shape is a matter of
considerable debate, with proposals varying from linear continuations (Kanizsa 1979, 1985; Wouterlood and
Boselie 1992) to inflected curved contours (Takeichi et al. 1995). In an influential paper Kellman
and Shipley (1991) advocated the so-called relatability criterion. This criterion predicts comple-
tions by a smooth curve when linear extensions would meet behind the occluding surface at
angles of 90 degrees or larger. When linear extensions would meet at smaller angles no amodal
completion is predicted. In response, Wouterlood and Boselie (1992) argued that edges could
be relatable without triggering amodal completion, and also that edges could be nonrelatable,
but still trigger amodal completion. Subsequently, Tse (1999a,b), Singh (2004), and Anderson
(2007a) also questioned the effectiveness of the relatability criterion. There is no doubt, however, that the ideas
of Kellman and Shipley had a great impact on thinking about perceptual completions. Fantoni and
Gerbino (2003), for example, modeled contour completion by a so-called vector field combina-
tion. Here, interpolated trajectories result from an algorithm that computes the vectors represent-
ing good continuation and minimal path. The field model is sensitive to both the local geometry
of contour fragments and shape characteristics such as symmetry. The latter can be implemented
by weighting the relative influence of good continuation versus minimal path. Besides these
geometrical properties, retinal distances are also taken into account.
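The geometric core of the relatability criterion, as summarized above, lends itself to a small sketch. The code below is an illustrative simplification, not the published criterion (which imposes additional conditions on the relative position of the edges): two edge fragments count as relatable when their forward linear extensions meet behind the occluder at an angle of 90 degrees or more (180 degrees for collinear edges).

```python
import math

def _unit(v):
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n)

def meeting_angle(p1, d1, p2, d2):
    """Angle (degrees) at which the forward linear extensions of two
    edges meet behind an occluder, or None if they never meet.
    p: edge endpoint at the occluding contour; d: extension direction."""
    d1, d2 = _unit(d1), _unit(d2)
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    det = d2[0] * d1[1] - d1[0] * d2[1]
    if abs(det) < 1e-12:
        # Parallel: extensions meet only if collinear and facing each other.
        collinear = abs(d1[0] * ry - d1[1] * rx) < 1e-9
        facing = d1[0] * d2[0] + d1[1] * d2[1] < 0
        ahead = rx * d1[0] + ry * d1[1] > 0
        return 180.0 if (collinear and facing and ahead) else None
    t = (d2[0] * ry - d2[1] * rx) / det  # distance along extension 1
    s = (d1[0] * ry - d1[1] * rx) / det  # distance along extension 2
    if t < 0 or s < 0:
        return None  # extensions diverge rather than meet
    cos_meet = d1[0] * d2[0] + d1[1] * d2[1]  # 180 deg for collinear edges
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_meet))))

def relatable(p1, d1, p2, d2, min_angle=90.0):
    """Simplified relatability test: extensions must meet at >= min_angle."""
    ang = meeting_angle(p1, d1, p2, d2)
    return ang is not None and ang >= min_angle

print(relatable((0, 0), (1, 0), (4, 0), (-1, 0)))     # True: collinear edges
print(relatable((0, 0), (1, 0), (0, 2), (1, -0.2)))   # False: acute meeting
```

On this sketch, edges whose extensions diverge or meet at an acute angle are not relatable, matching the prediction of no amodal completion for such configurations.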
Global completions
Global completions depend on shape regularities like symmetry (Buffart et al. 1981; Sekuler
1994; Sekuler et al. 1994; van Lier et al. 1994, 1995a, 1995b). The preferred completion can be
the result of converging local and global completion tendencies, as in Figure 15.2a, where the
(a) (b)
(c) (d)
Fig. 15.2 (a) An occlusion pattern for which local and global completion tendencies converge to the
same shape. (b) Occlusion pattern with diverging local (left) and (right) global completions; (c) Local and
global completions of self-occluding parts; given the perceived indented cube on the left, the upper right
preserves most symmetry and can be regarded as the global completion. (d) The two blobs at both sides
of the pillar are readily perceived as connected with each other.
(c) Reproduced from Rob van Lier and Johan Wagemans, From images to objects: Global and local completions of self-
occluded parts, Journal of Experimental Psychology: Human Perception and Performance, 25 (6), pp. 1721–1741,
http://dx.doi.org/10.1037/0096-1523.25.6.1721 © 1999, American Psychological Association. (d) Reprinted from
Cognitive Psychology, 39(1), Peter Ulric Tse, Volume Completion, pp. 37–68, Copyright © 1999, with permission
from Elsevier.
preferred completion results from good continuation of the partly occluded contours and also
reveals a highly regular shape. The local and global tendencies may also diverge into different
shapes (Figure 15.2b).
The Structural Information Theory (SIT) initiated by Leeuwenberg (1969, 1971) and further
developed since then (van der Helm and Leeuwenberg 1991, 1996; see also van der Helm, this
volume) provides an account of global regularities by means of regularity-based coding rules and
combines it with the minimum principle (Hochberg and McAlister 1953). Buffart et al. (1981)
applied SIT to occlusion patterns and demonstrated that preferred completions yielded the
simplest codes. However, other studies showed that observers do not always perceive the most
regular shapes (Boselie 1988; Wouterlood and Boselie 1992; Kanizsa 1985; Rock 1983).
Sekuler (1994; Sekuler et al. 1994) investigated the tendencies toward local and global comple-
tions and showed that for partly occluded shapes with abundant regularity (e.g., comprising both
vertical and horizontal axes of symmetry after completion), global completions prevailed. Sekuler
(1994) proposed a completion model in which local and global strategies act independently and
are weighted against each other (e.g., depending on the occurrence of symmetry axes). The diverg-
ing completion tendencies were also investigated by van Lier et al. (1994, 1995a, 1995b). They
provided an integrative account within SIT in which the perceptual complexity of an interpreta-
tion is not only determined by the regularity of the perceived shapes but also by the positional
regularities between the shapes, and by the degree of occlusion (van Lier et al. 1994; van Lier
2001). Crucially, the shape regularities increase the plausibility of an interpretation, whereas the
positional regularities (i.e., coincidental regularities; Rock 1983) decrease an interpretation’s plau-
sibility. van der Helm (2000) additionally argued that, within a Bayesian framework, the shape
and positional complexities can be related to priors and conditionals, respectively.
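A schematic way to express this correspondence, as a sketch of the general complexity-to-probability mapping $p = 2^{-I}$ rather than van der Helm's full treatment, is:

```latex
% Shape complexity -> prior; positional complexity -> conditional.
p(H) \;\propto\; 2^{-I_{\text{shape}}(H)}, \qquad
p(D \mid H) \;\propto\; 2^{-I_{\text{pos}}(D \mid H)}
\;\Longrightarrow\;
p(H \mid D) \;\propto\; 2^{-\left[ I_{\text{shape}}(H) + I_{\text{pos}}(D \mid H) \right]}
```

so the interpretation with the lowest total complexity (shape plus positional) comes out as the most plausible one, in agreement with the minimum principle.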
The influence of regularities on amodal completion is a frequently discussed issue in the literature
(Anderson 2007a,b; Kanizsa 1979; Kellman et al. 2007; Sekuler 1994; van Lier 1999, 2001;
van der Helm 2011; Wagemans et al. 2012) and has prompted various pragmatic and theoretical
stances intended to more or less rule out their effects. For example, to avoid influences of global
regularities, Wouterlood and Boselie’s (1992) local completion model was set up only for irregular
patterns (implicitly acknowledging the influence of regularities), whereas Kellman and Shipley
(1991) excluded the effect of global regularities on amodal completion by asserting that global
completions result from cognitive inferences. In the general discussion we will briefly come
back to this issue.
of those volumes (see Figure 15.2c). Amodal 3D completions were also studied by Tse (1998,
1999a,b), who introduced the concept of “complete mergeability,” stating that completion is triggered
not by contour relatability but by intermediate representations such as volumes. Roughly,
the principle of complete mergeability entails that separated volumes are amodally connected
behind an occluder along a trajectory defined by their visible surfaces such that they completely
merge (Figure 15.2d). In a follow-up, Tse (2002) proposed a contour propagation approach to
surface filling-in for projections of 3D objects. These ideas connect strongly with various 3D
shape perception notions (Koenderink 1990) that already had great impact on our general
understanding of the relation between 2D projections and 3D shape perception.
With the inclusion of 3D objects, and even more complex scenes, the domain of amodal
completion has further expanded. One may question whether these completions are all part of one
and the same underlying completion process or whether the generation of the amodal parts is
distributed across different stages between retinal input and object/scene representation. Answering
such questions obviously requires further experimental research.
Fig. 15.3 (a) A display comprising a few stimulus combinations in the study of Gerbino and Salmaso
(1987). In a simultaneous matching task, matches could be topographical (T), phenomenal (P),
categorical (C), or different (D; i.e., a nonmatch). The phenomenal matches always involved amodal
completions. Matching times involving amodal completions (PC) were similar to matching times on
topographical matches (TPC), and both were faster than categorical matches. (b) A few prime/test
pair combinations in the primed matching task. When prime durations were larger than 200 ms, the
matching times following the occluded disks (third row) were similar to the matching times of the
complete disks (first row) and differed from matching times following the truncated disks (second row),
suggesting that the occluded disk in the prime has been amodally completed to a full disk.
(a) Reprinted from Acta Psychologica, 65 (1), W. Gerbino and D. Salmaso, The effect of amodal completion on
visual matching, pp. 22–25, Copyright © 1987, with permission from Elsevier. (b) Adapted from Allison B. Sekuler
and Stephen E. Palmer, Perception of partly occluded objects: A microgenetic analysis, Journal of Experimental
Psychology: General, 121(1), pp. 95–111, http://dx.doi.org/10.1037/0096-3445.121.1.95 © 1992, American
Psychological Association.
depending on particular shape properties, global completions often lead to larger facilitation
effects than local completions.
Brain imaging studies have investigated which cortical areas are involved in the process of amodal completion, typically using a sequential presentation paradigm to measure the so-called repetition suppression effect: repetition of similar items leads to a reduction in BOLD activation. Kourtzi and Kanwisher (2001)
found such a suppression in the Lateral Occipital Complex (LOC) for the subsequent presenta-
tion of two patterns with reversed depth orders. In the latter patterns the physical contours were
different, due to occlusion, while the perceived shapes were identical. In a second experiment the
authors additionally showed that depth order reversal revealing the same contours but different
shapes did not induce repetition suppression. The suppression effect for the depth order reversal
when the same shapes are perceived shows that the LOC comprises representations of occluded
parts, exceeding the actual retinal input (see also Weigelt et al. 2007). Note that this does not imply
that these interpretations are actually established within the LOC.
Rauschenberger et al. (2006) also applied the repetition suppression paradigm and focused on
the time course of completion: they showed BOLD response modulation driven by the literal shape after
100 ms exposures of an occlusion prime (a notched disk adjacent to a square), and modulation
driven by the amodally completed shape (a full disk) after 250 ms exposures, and even
found such modulations in primary visual areas V1 and V2. Further support for an initial mosaic
stage has been shown by Plomp et al. (2006) using MEG measurements. Also using MEG, de Wit
et al. (2006) found support for the prevalence of global as compared to local completions for a
set of highly regular shapes. Besides brain imaging research, single-cell recordings in primates
have also revealed the impact of occlusion. For example, Sugita (1999) showed that neurons as early as
V1 and V2 responded to amodally completed bars under disparity conditions in which the central
part of a bar was perceived to be behind a partly occluding patch. In a more recent study Bushnell
et al. (2011) found that single neurons in V4 responded differently to real object contours as com-
pared to accidental contours caused by interposition of two partially overlapping surfaces.
Although there are still a number of open questions, it is clear by now that amodal completion is
triggered relatively early in the visual process. It also appears to be early in an ontogenetic sense,
to be discussed next.
about the rear of relatively complex multi-object scenes such as Tse’s wrapped ghost figures (see
Figure 15.2d) in which the two blobs at each side are preferably perceived as connected. Apparently, the
results highly depend on the specific stimulus that is presented but also on the specific abilities
of the infant; age is but one of the crucial factors, as the developmental stage of perceptual-motor
abilities matters as well. A highly active baby has a more integrated view of her surrounding
world, including the ability to amodally complete hidden parts of objects (Soska et al. 2010). All
in all, care has to be taken not to over-generalize the experimental results.
Modal completion
Modal completions like the triangle in Figure 15.1a are often called illusory surfaces (or surfaces
bounded by illusory contours) to stress that—contrary to real surfaces—their boundaries cross
a broad region of homogeneous luminance. When the background is white they appear even
whiter, which is taken as the signature of modal completion.
Several types of illusory contours exist. Some fit in the category of perceptual completions eas-
ily, since they are conceivable as extrapolations or interpolations of image contours; others do not.
Configurations in Figure 15.4 involve lines and dots that act as inducers or modifiers of illusory
contours not aligned with explicit image contours. Ehrenstein (1941/1987) devised the pattern
in Figure 15.4a to demonstrate that brightness contrast does not explain blobs induced by line
endings. Blobs of increased brightness are clearly visible when line inducers are thin (four upper
rows), but disappear when the inducers are so thick that the central blob is totally or almost totally
enclosed (two bottom rows), contrary to the expectation that contrast should increase with the
amount of black surrounding the target region. In panels b and c of Figure 15.4 the so-called Koffka
cross (used to discuss completion in the blind spot by Koffka 1935, p. 145, figure 20) induces a
rounded square when the arms are broad (b) but a circle when the arms are narrow (c).4
Even more intriguing is the way dots gracefully modify the illusory shape (Figure 15.4d),
becoming part of it instead of acting as partially occluded elements (like conventional inducers
do), and turning the illusory boundaries concave against the preference for convexity observed
in several figure/ground phenomena (Barenholtz 2010; Bertamini 2001; Bertamini and Lawson
2008; Fantoni et al. 2005; Kanizsa and Gerbino 1976). The incorporation of dots in blobs induced
by line endings of the Ehrenstein grid, the Koffka cross, and similar patterns has been discussed
4 The effect of line-ending separation on the illusory shape may be informative for computational theories of
Fig. 15.4 Illusory figures induced by line patterns. (a) The Ehrenstein illusion in a variant of a
demonstration devised by Ehrenstein (1941, Figure 3; see also 1987); bright illusory blobs appear at
line endings in the four upper rows but not in the two lower rows, where the target white region is
totally or almost totally surrounded by black. (b) A broad-arm Koffka cross induces an illusory square
with rounded corners. (c) A narrow-arm Koffka cross induces an illusory disk. (d) Adding four dots to
the narrow-arm Koffka cross makes the illusory blob concave. (e) Past experience with the capital letter
E supports the illusory brightening of the letter body, consistent with top-left illumination; rotating the
page by 90 or 180 degrees impairs recognition of the letter E and destroys the illusory brightening.
Reproduced from ‘Can We See Constructs?’, Walter Gerbino and Gaetano Kanizsa, in Susan Petry and Glenn
E. Meyer (eds) The Perception of Illusory Contours, pp. 246–252, DOI: 10.1007/978-1-4612-4760-9_4 Copyright
© 1987, Springer-Verlag New York. With kind permission from Springer Science and Business Media.
by several authors (Day 1987; Day and Jory 1980; Gerbino and Kanizsa 1987; Kennedy 1987;
Minguzzi 1987; Sambin 1974) but still awaits a satisfactory explanation (Fantoni and Gerbino
2013; Vezzani 1999).
Figure 15.4e illustrates a category of illusory effects occurring when some two-tone images are
perceived as 3D objects under directional illumination, with sharp cast and attached shadows
(Ishikawa and Mogi 2011; Moore and Cavanagh 1998). Often, the emergence of the 3D struc-
ture takes the character of a visual discovery, involving a complex and irreversible figure/ground
switch favoured by past experience, as in pictures of the Gestalt completion test (Street 1931),
Mooney faces (Mooney 1957), and the dalmatian dog (for a discussion see Rock 1984). After the
reorganization that allows observers to overcome the initial camouflage, the discovered object
typically includes illusory surfaces classifiable as modal completions, similar to those used by Tse
(1998, 1999a; Tse and Albert 1998) to claim that illusory volumes can occur without the tangent
discontinuities that play such a crucial role in Kanizsa-like displays (Figure 15.5).
Illusory surfaces are perceived in a variety of conditions (broader than illustrated in our fig-
ures, which depict only some members of the family), against the idea that extrapolation and
interpolation of image contour fragments are the only mechanisms involved in their formation.
Therefore, the expression “modal completion” cannot be taken as denoting a hypothetical process
of joining input fragments by means of illusory additions, according to what Kogo and Wagemans
Fig. 15.5 Illusory volumes constrained by global geometry. (a) The visible parts of the “sea monster”
are bounded by contours without tangent discontinuities; nevertheless they support an illusory surface
oriented in depth, occluding the amodal parts of the monster. (b) The amodally completed black
“worm” supports an illusory pole. (c) Partially occluded black rings surround a cylindrical illusory pole.
Reproduced from Peter U. Tse, Illusory volumes from conformation, Perception 27(8) pp. 977–92, doi:10.1068/
p270977, Copyright © 1998, Pion. With kind permission from Pion Ltd, London www.pion.co.uk and www.
envplan.com.
(2013) consider a common misinterpretation found in the literature on mid-level vision. Rather,
it should be taken as denoting the phenomenal presence of parts devoid of an obvious local coun-
terpart (a luminance difference, in the case of surface contours) but supported by global stimulus
information and functional to the overall organization of the perceptual world.
Halko et al. (2008), who evaluated different theories of modal completion, pointed out that
extrapolation and interpolation mechanisms are insufficient to account for all aspects of illusory
contours, claimed that surface/figural processes are necessary, and experimentally supported the
general view that several mechanisms cooperate in the formation of illusory contours and (more
importantly) in the modulation of their vividness. Their conclusions agree with the central role of
illusory contours in vision science. Converging evidence indicates that they can be conceived as
a powerful effect of mid-level mechanisms constrained by image properties but oriented towards
scene analysis; i.e., they provide an ideal domain for testing propositions that link low-level repre-
sentations anchored to retinotopic properties and representations at the level of occluding objects
and 3D surfaces, available for recognition.
Fig. 15.6 Non-trivial effects of inducers. (a) Perceived incompleteness of inducers is unnecessary: aligned
contour fragments suffice to elicit an illusory triangle. (b,c) Regularly arranged rectilinear segments
lead to illusory contours that are much weaker than those induced by segments randomly varying in
orientation and length. (d,e,f) Convex inducers can support an illusory square, whose vividness is much
higher when they are irregular than when they are regular.
(a) Reproduced from I. Rock and R. Anson, Illusory contours as the solution to a problem, Perception 8(6)
pp. 665–681, doi:10.1068/p080665, Copyright © 1979, Pion. With kind permission from Pion Ltd, London
www.pion.co.uk and www.envplan.com. (b and c) Reproduced from ‘Perceptual Grouping and Subjective
Contours’, Barbara Gillam, in Susan Petry and Glenn E. Meyer (eds) The Perception of Illusory Contours, pp. 268–
273, DOI: 10.1007/978-1-4612-4760-9_30 Copyright © 1987, Springer-Verlag New York. With kind permission
from Springer Science and Business Media. (d,e,f) Reproduced from M.K. Albert, Parallelism and the perception
of illusory contours, Perception 22(5) pp. 589–595, doi:10.1068/p220589, Copyright © 1993, Pion. With kind
permission from Pion Ltd, London www.pion.co.uk and www.envplan.com).
Evidence that completeness/incompleteness matters was provided by van Lier et al. (2006), who discovered that
background contours are misaligned by an illusory square induced by pacmen but not by an
equivalent hole between crosses (following the same logic as panels c–d of Figure 15.1).
Rock (1983, p. 107; 1987, p. 64; Rock and Anson 1979) criticized perceived incompleteness as a
necessary condition on the basis of demonstrations like the one in Figure 15.6a. Each of the three
black regions looks like an irregular shape with a boundary that includes convexities and concavities
but does not convey a specific sense of incompleteness. Nevertheless, alignment of contour fragments
along a closed and regular boundary suffices for most observers to perceive an illusory surface. The
crucial role of alignment is confirmed by the reduced proportion of naïve observers who perceive an
illusory shape when the three relevant concavities cover a narrower angle, so that the interpolation of
distant contour fragments must be curvilinear and concave (not shown in Figure 15.6).
As emphasized by Rock (1987, p. 63), suboptimal patterns can support the perception of an
illusory shape after a figure/ground reorganization that entails the reversal of the occlusion polarity of some contour fragments (in Figure 15.6a, those corresponding to the concave corners unified by the illusory triangle). This process can be influenced by set and knowledge, consistent
with the idea that inducer incompleteness cannot be taken as a pre-existing determinant of the
formation of an illusory occluder. Nevertheless, when an illusory surface emerges in Figure 15.6a,
amodal completion—or at least, amodal continuation (Anderson 2007b; Gillam 2003; Minguzzi
1987)—becomes possible. In such cases amodal continuation follows, rather than precedes, the
reorganization that brings to modal life the illusory occluding surface. The causal relationship
between amodal and modal parts does not always hold.
Compare now Figure 15.6b and Figure 15.6c (Gillam 1987; Gillam and Chan 2002; Gillam and
Grove 2011). The illusory surface is more vivid when the inducing lines vary in orientation and
length (c) than when they group together in a regular array (b). Configural order acts as a global
factor affecting modal completion, which suggests that the degree of modal presence could be
taken as a measure of the amount of structural improvement involved in the mapping of a given
input into an organized pattern.
Another case in which the vividness of the illusory shape seems to be inversely related to inducers’ regularity is illustrated in panels d–f of Figure 15.6 (Albert 1993). Parallelism of sides or,
more accurately, Ebenbreite (constant width; Morinaga 1941; Metzger 1953) is a powerful fac-
tor of figure/ground organization. When inducers are convex regions bounded by parallel sides
(rectangles in Figure 15.6d) the illusory square is barely visible, if it exists at all; one can easily
perceive only an orderly arrangement of rectangles sitting along a square perimeter. The illu-
sory square becomes visible (thanks to the pathognomonic lightness enhancement) when each
inducer is trapezoidal and can be locally improved by amodal continuation in the direction of a
parallelogram (Figure 15.6e), or triangular and can easily look like a small visible protrusion of an
indeterminate but clearly occluded shape (Figure 15.6f).
Kanizsa-type completions induced by pacmen and other extended regions depend on their ten-
dency towards amodal completion or, at least, amodal continuation. Processes activated by local
concavities and asymmetries are constrained by alignment and distance (to mention only the main
factors) and achieve a stable state by generating amodal parts that complete the input regions, but
also require the formation of an occluding surface partially bounded by illusory contours. Kanizsa
(1979) admitted that such aspects of perceptual organization are almost indistinguishable from
the generation and acceptance of object hypotheses, postulated by Gregory (1972) to account
for input gaps. The crucial difference regards “gaps”. According to Kanizsa the object-hypothesis
explanation fails to recognize that the very notion of “gap” is problematic. Rather than taking its
meaning for granted one should use illusory figures as an operational way of defining gaps and—
more generally—partial occlusions.
Petter-type completions occur when a single homogeneous region splits into two figures whose
stratification is, in optimal conditions, fully predictable on the basis of figural parameters. Petter
(1956) described several factors supporting a perceptual preference for a specific stratification
order in self-splitting figures. One is movement (if the black region deforms in a way consistent
with movement of one figure while the other remains stationary, the moving figure appears in
front). But static regions split as well, according to two figural factors: a preference for the order
that minimizes modal contours (a vs. b in Figure 15.7); and a preference for the modal completion
Fig. 15.7 Minimization of modal contours. (a) The bar is preferentially perceived in front of the disk
because such ordering, rather than the opposite, requires a modal contour shorter than the amodal
contour. (b) The disk is preferentially perceived in front of the bar because the two modal arcs are
shorter than the amodal rectilinear segments. (c) Petter (1956, p. 219) also hypothesized that the
preference for perceiving the larger shape in front depends on the higher support ratio (i.e., the modal
contour should be proportionally shorter), when modal and amodal contours have the same absolute
length. (d) Phenomenal undulation depends on the dominance of Petter’s rule over interposition, which
does not propagate from the unambiguous T-junctions joining the thin frame and the grey horizontal
bar towards the ambiguous L-junctions joining the thin frame and the black vertical bar.
(a) Reproduced from G. Petter, Nuove ricerche sperimentali sulla totalizzazione percettiva, Rivista di Psicologia, 50,
pp. 213–27, figure 9. (d) Reprinted from Acta Psychologica, 59(1), G. Kanizsa, Seeing and thinking, pp. 23–33,
Copyright © 1985, with permission from Elsevier.
of contours with a higher support ratio (those in which the modal extrapolation is proportionally
shorter, relative to the length of the image-specified contour; Figure 15.7c). In static self-splitting
figures the tendency towards the minimization of modal contours agrees with the assumption that
representation costs are higher for modal than amodal contours of the same length, given that
modal contours are phenomenally visible though unsupported by local input evidence.
Kanizsa (1968/1979) referred to the first static factor (known as Petter’s rule) to explain strik-
ing demonstrations in which the perceived stratification order violates cognitive expectations.
Figure 15.7d displays a pattern modified from Kanizsa (1985) that illustrates a remarkable failure
of unambiguous T-junction information to propagate the stratification order over the whole thin
frame, because of the local dominance of Petter’s rule.
Tommasi et al. (1995) confirmed that the minimization of modal contours acts independently
of the empirical depth cue of relative size. Singh et al. (1999) established that Petter’s rule actu-
ally overcomes support ratio as a determinant of stratification of self-splitting figures, when the
two principles come into conflict, but also confirmed Petter’s intuition that support ratio matters,
when modal and amodal contour lengths are equal.
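The interplay of these two figural factors lends itself to a compact illustration. The following sketch is our own toy formalization (not a model from the cited studies); the class, function names, and numeric values are hypothetical, chosen only to mirror Petter's rule with the support-ratio tie-break confirmed by Singh et al. (1999):

```python
from dataclasses import dataclass

@dataclass
class Order:
    """One candidate stratification of a self-splitting figure."""
    name: str
    modal_len: float   # contour length that must be modally completed (in front)
    amodal_len: float  # contour length completed amodally (behind)
    image_len: float   # length of the image-specified (luminance-defined) contour

    def support_ratio(self) -> float:
        # Proportion of the completed boundary directly specified by the image.
        return self.image_len / (self.image_len + self.modal_len)

def preferred(a: Order, b: Order) -> Order:
    """Petter's rule: prefer the order that minimizes modal contour length;
    when modal lengths are equal, prefer the higher support ratio."""
    if a.modal_len != b.modal_len:
        return a if a.modal_len < b.modal_len else b
    return a if a.support_ratio() > b.support_ratio() else b

# As in Figure 15.7a: the bar in front needs only a short modal contour
# (its width), whereas the disk in front would need long modal arcs.
bar_front = Order("bar in front", modal_len=2.0, amodal_len=5.0, image_len=20.0)
disk_front = Order("disk in front", modal_len=5.0, amodal_len=2.0, image_len=20.0)
print(preferred(bar_front, disk_front).name)  # -> bar in front

# Equal modal lengths: support ratio decides (cf. Figure 15.7c).
big = Order("larger shape in front", modal_len=3.0, amodal_len=3.0, image_len=30.0)
small = Order("smaller shape in front", modal_len=3.0, amodal_len=3.0, image_len=10.0)
print(preferred(big, small).name)  # -> larger shape in front
```

The ordering of the two tests encodes the empirical finding that Petter's rule overcomes support ratio when the two conflict, while support ratio matters only when modal lengths tie.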
Kinetic illusory figures depend on relative motion between their implicit boundary and
an appropriate set of inducers (appearing/disappearing texture elements, lines changing in
length, deforming shapes). However, Bruno and Gerbino (1991) showed that their shape is
modulated by factors beyond relative motion. When radial lines rotate behind an implicit
triangle, the illusory figure is triangular and rigid; when the radial lines keep their absolute
orientation constant and change their length consistently with the occlusion of a rotating tri-
angle, the illusory figure appears as a deforming blob with a specific shape heavily dependent
on the number of inducing lines. Orientation affects the connectability of line endings and,
consequently, the modally completed shape (Fantoni and Gerbino 2013). A theory of illusory
object formation in dynamic displays, consistent with the identity hypothesis, has been formu-
lated by Palmer et al. (2006).
General discussion
Modal and amodal completions both deal with percepts that go beyond the retinal input. Kellman
and Shipley (1991) coined the identity hypothesis, which states that modal and amodal completions share the same underlying mechanisms and identical representations at some processing
stage. This elegant idea has been much debated in recent years (Albert 2007; Anderson
et al. 2002; Anderson 2007a, 2007b; Kellman et al. 2007; Singh 2004). According to the identity
hypothesis, one of the predictions is that modally and amodally completed contours should be
the same when the geometric properties of the shapes are the same. Anderson et al. (2002) and
Singh (2004) argued that this prediction is incorrect. In particular, they argued that modal
and amodal completions generate different percepts and that neurophysiological data are not in
line with the identity hypothesis (see also Anderson 2007a). Differential percepts also occur when
shape regularities like symmetry are involved. Such regularities seem to affect amodal completion
more than modal completion. Kellman et al. (2005a) argued that in such cases the amodal presence
is due to a process they referred to as Recognition from Partial Information (RPI) which would
then overrule the completion processes. Anderson (2007a) responded that splitting up the amodal
completion process into two different processes (one identical to modal completion, based on
relatability criteria, and one sensitive to global regularities) lacks experimental support. So far, the
controversy continues; further investigations may shed more light on this issue.
A fruitful direction to push research forward lies in the development of neurally plau-
sible, computational models of perceptual grouping. Here we refer to the so-called DISC
(Differentiation-Integration for Surface Completion) model by Kogo et al. (2010) that accounts
for depth ordering of surfaces in 2D patterns. Their model is built on the notion of border owner-
ship; by means of appropriate feedback mechanisms image borders are assigned to surfaces and,
with that, more or less stable interpretations of an ambiguous pattern can be reached. The percep-
tion of modal completion, for example, is (re)produced when such border ownership signals arise
at the location of illusory contours. The DISC model appears sensitive to certain global stimulus
properties and bridges between amodal and modal completions (see also Kogo and van Ee, this
volume).
The role of shape regularities also touches upon the seeing-thinking issue in amodal completion, as raised by Kanizsa (1979, 1985; Kanizsa and Gerbino 1982; but see also Michotte et al.
1964), who demonstrated different completion tendencies due to perception versus knowledge.
According to Kanizsa, perception runs its own course even if knowledge would predict a different
outcome. The influence of knowledge on amodal completion is an issue that deserves more atten-
tion in future research (see also Gerbino and Zabai 2003; Vrins et al. 2009; Hazenberg et al., 2013).
For example, Vrins et al. (2009) have shown that object-related knowledge, such as the hardness of
materials (after Gerbino and Zabai 2003), may influence the perceptual outcome relatively early
in the perceptual process. Obviously, interpretations of occlusion scenes depend on both bottom-up
and top-down streams, revealing a complex interplay between sensory input and world
knowledge. A clearer picture of the processes involved in amodal completion is still needed.
In the end, however, it might turn out to be a hazardous enterprise to draw a firm line
between perception and cognition, certainly so at the cortical level.
Finally, we would like to remark that the scope of this chapter was restricted to a selection of
completion issues within the visual modality that we consider relevant. There are filling-in effects in other
sensory modalities as well, such as the auditory domain (Bregman 1990; Riecke et al. 2012) or the
tactile domain (Flach and Haggard 2006; Geldard and Sherrick 1972). In all sensory modalities
the study of processes overcoming the interruption of ongoing input opens up a window to the
underlying representational processes. Given the outcomes of behavioral and neurocognitive
research in adults, infants, and animals, it has become clear that completion processes are funda-
mental for the perception of the surrounding world.
References
Albert, M. K. (1993). Parallelism and the perception of illusory contours. Perception 22: 589–95.
Albert, M. K. (2007). Mechanisms of amodal completion. Psychological Review 114: 455–69.
312 van Lier and Gerbino
Albert, M. K. and Hoffman, D. D. (2000). The generic-viewpoint assumption and illusory contours.
Perception 29: 303–12.
Anderson, B. L. (2007a). The demise of the identity hypothesis and the insufficiency and nonnecessity of
contour relatability in predicting object interpolation: Comment on Kellman, Garrigan, and Shipley
(2005). Psychological Review 114: 470–87.
Anderson, B. L. (2007b). Filling-in models of completion: Rejoinder to Kellman, Garrigan, Shipley, and
Keane (2007) and Albert (2007). Psychological Review 114: 509–27.
Anderson, B. L. (2009) Revisiting the relationship between transparency, subjective contours, luminance,
and color spreading. Perception 38: 869–71.
Anderson, B. L. and Julesz, B. (1995). A theoretical analysis of illusory contour formation in stereopsis.
Psychological Review 102: 705–43.
Anderson, B. L., Singh, M. and Fleming, R. W. (2002). The interpolation of object and surface structure.
Cognitive Psychology 44: 148–90.
Andersen, G. J. and Cortese, J. M. (1989). 2-D contour perception resulting from kinetic occlusion.
Perception and Psychophysics 46: 49–55.
Barenholtz, E. (2010). Convexities move because they contain matter. Journal of Vision 10: 1–12.
Barraza, J. F. and Chen, V. J. (2006). Vernier acuity of illusory contours defined by motion. Journal of Vision
6: 923–32.
Bertamini, M. (2001). The importance of being convex: An advantage for convexity when judging position.
Perception 30: 1295–310.
Bertamini, M. and Lawson, R. (2008). Rapid figure-ground responses to stereograms reveal an advantage
for a convex foreground. Perception 37: 483–94.
Bertenthal, B. I., Campos, J. J., and Haith, M. M. (1980) Development of visual organization: The
perception of subjective contours. Child Development 51: 1072–80.
Boselie, F. (1988). Local versus global minima in visual pattern completion. Perception and Psychophysics
43: 431–45.
Bravo, M., Blake, R., and Morrison, S. (1988). Cats see subjective contours. Vision Research 28: 861–5.
Bregman, A. S. (1990). Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge: MIT
Press.
Bremner, J. G., Slater, A. M., Johnson, S. P., Mason, U. C., and Spring, J. (2012). Illusory contour figures
are perceived as occluding contours by 4-month-old infants. Developmental Psychology 48: 398–405.
Bruno, N. (2001). Breathing illusions and boundary formation in space-time. In: T. Shipley and P.
J. Kellman (eds.). From Fragment to Objects. Segmentation and Grouping in Vision, pp. 402–27.
New York: Elsevier.
Bruno, N. and Gerbino, W. (1991). Illusory figures based on local kinematics. Perception 20: 259–73.
Bruno, N., Bertamini, M., and Domini, F. (1997). Amodal completion of partly occluded surfaces: Is there
a mosaic stage? Journal of Experimental Psychology: Human Perception and Performance 23: 1412–26.
Buffart, H., Leeuwenberg, E., and Restle, F. (1981). Coding theory of visual pattern completion. Journal of
Experimental Psychology: Human Perception and Performance 7: 241–74.
Bulf, H., Valenza, E., and Simion, F. (2009). The visual search of an illusory figure: A comparison between
6-month-old infants and adults. Perception 38: 1313–27.
Burke, L. (1952). On the tunnel effect. The Quarterly Journal of Experimental Psychology, 4: 121–38.
Reprinted in A. Michotte et collaborateurs (eds.) (1962), Causalité, permanence et réalité phénoménales,
pp. 374–406. Louvain: Publications Universitaires.
Bushnell, B., Harding, P., Kosai, Y., and Pasupathy, A. (2011). Partial occlusion modulates contour-based
shape encoding in primate area V4. Journal of Neuroscience 31: 4012–24.
Chapanis, A. and McCleary, R. A. (1953). Interposition as a cue for the perception of relative distance.
Journal of General Psychology 48: 113–32.
Perceptual completions 313
Condry, K. F., Smith, W. C., and Spelke, E. S. (2000). Development of perceptual organization. In:
F. Lacerda and M. Heiman (eds.), Emerging Cognitive Abilities in Early Infancy, pp. 1–28. Hillsdale,
NJ: Erlbaum.
Csibra, G. (2001). Illusory contour figures are perceived as occluding surfaces by 8-month-old infants.
Developmental Science 4: F7–F11.
Csibra, G., Davis, G., Spratling, M. W., and Johnson, M. H. (2000). Gamma oscillations and object
processing in the infant brain. Science 290: 1582–5.
Day, R. H. (1987). Cues for edge and the origin of illusory contours: an alternative approach. In: S. Petry
and G. E. Meyer (eds.). The Perception of Illusory Contours, pp. 53–61. New York: Springer.
Day, R. H. and Jory, M. K. (1980). A note on a second stage in the formation of illusory contours.
Perception and Psychophysics 27: 89–91.
de Wit, T. and van Lier, R. (2002). Global visual completion of quasi-regular shapes. Perception
31: 969–84.
de Wit, T. C. J., Mol, K. R., and van Lier, R. (2005). Investigating metrical and structural aspects of visual
completion: Priming versus searching. Visual Cognition 12: 409–28.
de Wit, T., Bauer, M., Oostenveld, R., Fries, P., and van Lier, R. (2006). Cortical responses to contextual
influences in amodal completion. Neuroimage 32: 1815–25.
de Wit, T. C. J., Vrins, S., DeJonckheere, P. J. N., and van Lier, R. (2008). Form perception of partly
occluded objects in 4-month-old infants. Infancy 13: 660–74.
Chapter 16
The Neural Mechanisms of Figure-ground Segregation
Introduction
Vision appears to be simple. We open our eyes and perceive a well-organized world full of recognizable objects without any feeling of effort. The apparent ease with which we perceive the world disguises the immense computational effort necessary to segregate, localize, and recognize objects. The difficulty of this task stems from the fact that (daytime) vision is based on the distributed pattern of activity across the millions of cones in the retina. This point-like representation must be transformed by the neural circuitry of the visual system to produce our coherent percept. The ultimate goal of this circuitry is to localize and recognize objects and to guide visually driven behavior. To achieve this goal it is necessary to group together the activity patterns that are produced by one object (or figure) and to segregate these from patterns produced by other objects or background regions.
The neuronal mechanisms by which the visual system segregates a figure from its background and groups together the elements belonging to the figure have been studied using a
texture-segmentation task. In the original version of this paradigm (Lamme 1995) a macaque
monkey was required to fixate on a central fixation dot. Then a full-screen texture composed of
thousands of oriented lines was presented. The texture contained a small square region made from
lines of the orthogonal orientation (Figure 16.1a) (a version using motion-defined textures was
also used and produced similar results). This region is perceived as a figure in front of, and therefore occluding, the background. The monkey’s task was to make an eye movement towards the
figure after the presentation of a go-cue. In some experiments (Self et al. 2012; Supèr et al. 2001)
there were also catch-trials with a uniform texture without a figure. On these trials the monkeys
were rewarded for maintaining fixation at the center after the presentation of the go-cue. Monkeys
generally perform very well on this task with performance levels of greater than ninety per cent
correct. The virtue of this paradigm is that it is possible to vary the position of the figure relative
to the receptive field(s) (RF) of the neuron(s) under study, while keeping the bottom-up activation of
the neurons constant (Figure 16.1b). If the figure is placed in the receptive field then the response
of the neuron to the figure can be tested (red condition in Figure 16.1b). If the figure is moved
elsewhere then the response to the background can be measured (blue condition). Importantly,
the orientation of the textures is always counterbalanced so that on average exactly the same line
elements fall into the RF in both the figure and ground conditions. This creates conditions in
which the visual information present in the RF is identical but the visual context is different. On
figure trials the RF falls on the behaviorally relevant texture, whereas on ground trials it falls on
the irrelevant background region.
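The counterbalancing logic of this paradigm can be sketched in a few lines (a schematic illustration, not the actual experimental code: orientations are stored as a numeric grid, and the function name, grid size, and figure size are invented for the example):

```python
import numpy as np

def make_texture(figure_center, grid=16, fig_size=4, swap=False):
    """One trial's orientation map: a background of 45-deg line elements
    containing a fig_size x fig_size square of orthogonal (135-deg) elements.
    `swap` exchanges the two orientations to counterbalance across trials."""
    bg, fg = (135, 45) if swap else (45, 135)
    tex = np.full((grid, grid), bg)
    r, c = figure_center
    h = fig_size // 2
    tex[r - h:r + h, c - h:c + h] = fg
    return tex

# Figure condition: the square is centred on the RF. Ground condition: the
# square is moved away AND the orientations are swapped, so the element
# inside the RF is identical in both conditions -- only the context differs.
rf = (8, 8)
fig_trial = make_texture(figure_center=rf)
gnd_trial = make_texture(figure_center=(3, 3), swap=True)
assert fig_trial[rf] == gnd_trial[rf]
```

Any figure-versus-ground difference in firing measured under this scheme can then be attributed to the surrounding context rather than to the stimulus inside the RF.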
Fig. 16.1 (a) An example stimulus used in the texture-segmentation task. The background texture
covers the entire screen and the monkey’s task is to make a saccade towards the small square
figure region. (b) In the figure condition the figure is centered on the RF of the recorded cell (red
condition). In the ground condition the figure is moved so that the RF falls on the background (blue
condition). Note that the orientation is also reversed so that identical line elements are present inside
the RF. The graph to the right illustrates the typical response of V1 cells. The early response (<100ms
after stimulus onset) is the same regardless of whether the RF was on the figure or background. In
the later time-period (>100ms) the responses to the figure (red line) are significantly higher than
those to the ground (blue line). The shaded grey region represents the modulation in firing and is
referred to as figure-ground modulation (FGM). (c) Boundaries can be detected through mutual
inhibition between cells tuned for the same orientation. Here cells on either side of the boundary
(the pink dashed line) have stronger responses than cells in the middle of the texture as they only
receive inhibition (the black bars) from one side. (d) Models of region growing suggest that the
figure-region becomes perceptually grouped through excitatory feedback from neurons in higher
visual areas tuned to the figural orientation (red cone). This leads to enhanced firing-rates across the
entire figure region.
The Neural Mechanisms of Figure-ground Segregation 323
The responses of neurons in V1 are modulated by the visual context (Figure 16.1b). In the previous studies, responses for the large majority of neurons were stronger when the RF fell on a figure
compared to the background, on average by around forty per cent of the activity produced by the
background. We will refer to this modulation in firing-rate as figure-ground modulation (FGM).
Most notably, this modulation did not begin until around 100ms after the onset of the texture
(40-50ms after the initial visual response in V1). The initial response was identical regardless of
the visual context showing that the input into V1 from the thalamus did not discriminate between
figure and ground. A follow-up study showed that figures defined by other cues (color, motion,
luminance, depth) produced similar levels of FGM in V1 (Zipser et al. 1996).
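The late-window rate difference can be quantified with a simple modulation index (a hedged sketch: the window boundaries, the normalization by the ground response, and the toy firing rates are illustrative assumptions, not the analysis pipeline of the cited studies):

```python
import numpy as np

def figure_ground_modulation(fig_rate, gnd_rate, t, late=(0.1, 0.3)):
    """FGM index: mean figure-minus-ground rate difference in the late
    window, expressed as a fraction of the ground response (cf. the ~40%
    enhancement reported for V1).
    fig_rate, gnd_rate: firing rates (Hz) sampled at times t (s from onset)."""
    m = (t >= late[0]) & (t < late[1])
    fig, gnd = fig_rate[m].mean(), gnd_rate[m].mean()
    return (fig - gnd) / gnd

# Toy rates: identical transient response before 100ms, figure enhanced after.
t = np.arange(0.0, 0.3, 0.01)
gnd = np.where(t < 0.05, 5.0, 40.0)
fig = gnd + np.where(t >= 0.1, 16.0, 0.0)   # +40% of 40 Hz after 100ms
print(figure_ground_modulation(fig, gnd, t))  # → 0.4
```

The early response is identical by construction, mirroring the finding that the feedforward input does not discriminate figure from ground.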
How does the visual system segregate such a texture? Psychophysical studies (Mumford et al.
1987; Wolfson and Landy 1998) have suggested that there are two complementary mechanisms
at work to segment the scene. The first is boundary detection, the enhancement of the borders
of the object (Figure 16.1c). We will propose that boundary detection is achieved through a
mixture of center-surround interactions mediated by feedforward anatomical connections and
mutual inhibition between neurons tuned for similar features mediated by horizontal connections within visual cortex. These processes rapidly enhance neural firing-rates at locations in
the visual scene where there are local changes in feature values. The second process is region
growing, which groups together regions of the scene with similar features (Figure 16.1d). We will
discuss evidence for a region growing process in which a surface label (also enhanced neuronal
activity) simultaneously arises across regions of similar feature values. We hypothesize that both
processes exist in visual cortex and work together to rapidly and accurately segment the visual
scene. The neural connection schemes for these processes are, however, quite different, and their
timing differs too.
Boundary detection
Theory of boundary detection
A fundamental processing strategy in the visual system is to contrast feature information from
nearby regions of space. This strategy has the dual effect of making the visual system relatively
insensitive to uniform regions of the scene and enhancing the responses to regions in which
feature-values change. A well-known example of the neural implementation of this strategy is the
retinal ganglion cell. These cells have a center-surround receptive field organization; they respond
strongly to an increase or decrease in luminance restricted in size so that it selectively activates
the center mechanism. They are driven less, however, by uniform regions of luminance which simultaneously activate the center and surround mechanisms. This organization makes these cells more responsive to luminance-defined edges if the edge is correctly aligned with the receptive field. A retinal ganglion cell would not, however, be able to signal the presence of the boundaries
in Figure 16.1a. These boundaries are defined by orientation and the luminance on each side of
the boundary is the same. Such orientation defined edges cannot be detected in the retina or
thalamus of primates because these structures lack cells that are selective for orientation; a cortical
mechanism is required.
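The center-surround logic described above can be caricatured with a one-dimensional difference-of-Gaussians filter. This is an illustrative sketch with arbitrary parameters, not a model of any specific retinal ganglion cell:

```python
import numpy as np

def dog_kernel(size=21, sigma_c=1.0, sigma_s=3.0):
    """1-D difference-of-Gaussians: excitatory center minus inhibitory
    surround, each normalized to unit area so that a uniform input
    yields (numerically) zero response."""
    x = np.arange(size) - size // 2
    center = np.exp(-x**2 / (2 * sigma_c**2))
    surround = np.exp(-x**2 / (2 * sigma_s**2))
    return center / center.sum() - surround / surround.sum()

k = dog_kernel()
uniform = np.ones(100)                               # uniform luminance
edge = np.concatenate([np.zeros(50), np.ones(50)])   # luminance step

r_uniform = np.convolve(uniform, k, mode='valid')
r_edge = np.convolve(edge, k, mode='valid')

print(np.abs(r_uniform).max() < 1e-10)  # → True: no response to uniform field
print(np.abs(r_edge).max() > 0.1)       # → True: strong response at the edge
```

The filter illustrates the dual effect named in the text: insensitivity to uniform regions and enhancement where feature values change.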
In theory, orientation-defined texture boundaries could be detected by “orientation-opponent”
cells driven by one orientation in their center and the orthogonal orientation in their surround. Such cells have, however, yet to be found in visual cortex. Instead, it has been proposed
that these edges are detected through mutual inhibition between neurons tuned for the same
324 Self and Roelfsema
orientation (Grossberg and Mingolla 1985; Knierim and Van Essen 1992; Li 1999; Marr and
Hildreth 1980; Sillito et al. 1995). In such an iso-orientation inhibition scheme, the activity of
neurons that code image regions with a homogeneous orientation is suppressed, whereas the
amount of inhibition is smaller for neurons with RFs near a boundary so that their firing rate
is higher (Figure 16.1c). There is a good deal of evidence that iso-orientation suppression exists
in visual cortex. Cells in V1 that are well-driven by a line element of their preferred orienta-
tion are suppressed by placing line elements with a similar orientation in the nearby surround
(Knierim and Van Essen 1992). These surrounding elements do not themselves drive the cell to fire and are therefore demonstrably outside the classical receptive field of the V1 cells, yet they strongly suppress the response of the cell to the center element. Importantly, this suppression is greatly reduced if the line elements outside the RF are rotated so that they are orthogonal
to the preferred orientation of the cell. This result supports the idea that V1 neurons receive an
orientation-tuned form of suppression coming from regions surrounding the RF (Allman et al.
1985; Jones et al. 2001; Kastner et al. 1999; Levitt and Lund 1997; Nelson and Frost 1978; Sillito
et al. 1995). The time-course of this suppression is very rapid. Studies using grating stimuli have
determined that iso-orientation suppression can be observed within 25ms of the onset of the
visual response (Li et al. 2001; Nothdurft et al. 1999). One study which examined the latency
of this effect at the level of individual cells found even shorter latencies of around 7-10ms (Bair
et al. 2003). Thus, representations of the boundaries of objects in natural scenes are enhanced
and projected forwards to higher visual areas as part of (for luminance-defined boundaries), or
closely following (for texture-defined boundaries), the initial feedforward sweep of visual activ-
ity. Indeed, studies of the neuronal responses to the boundaries of texture defined figures in V1
(Lamme et al. 1999) and also in higher visual area V4 (Poort et al. 2012) find enhanced activity
at around 70ms after stimulus onset.
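A minimal sketch of iso-orientation inhibition (with an invented one-dimensional texture and invented weights; none of the cited models reduce to exactly this) shows why units near an orientation boundary fire more: they receive like-tuned inhibition from only one side:

```python
import numpy as np

# Toy 1-D texture: 20 columns of one orientation, then 20 of the orthogonal one.
orientation = np.array([0] * 20 + [90] * 20)
drive = np.ones(len(orientation))   # equal feedforward drive everywhere

# Iso-orientation inhibition: each unit is suppressed by nearby units that
# share its preferred orientation (weight w, neighborhood radius r).
w, r = 0.08, 3
rates = drive.copy()
for i in range(len(rates)):
    for j in range(max(0, i - r), min(len(rates), i + r + 1)):
        if j != i and orientation[j] == orientation[i]:
            rates[i] -= w * drive[j]

# A unit at the boundary (index 19) has like-oriented neighbors on one side
# only (rate 0.76), whereas an interior unit (index 10) is suppressed from
# both sides (rate 0.52).
print(rates[19] > rates[10])  # → True
```

The same one-pass computation applied to a homogeneous texture leaves all interior rates equally suppressed, which is the insensitivity to uniform regions described above.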
higher visual areas did so using anesthetized preparations and usually presented one object on the
screen at any one time. Studies in awake-behaving animals using multi-object scenes have revealed
that there are very strong inhibitory interactions which control the flow of information through
this feedforward network (Miller et al. 1993; Sheinberg & Logothetis 2001). Stimulus representa-
tions compete with one another so that at the level of IT there may only be active representations
for one or a few objects at a time (Desimone and Duncan 1995). This competition is strongly biased
by behavioral relevance so that relevant objects tend to win the representational battle (Luck et al.
1997; Reynolds et al. 1999). In natural images, which typically contain many overlapping objects, this may mean that very few objects are represented at high levels of the visual system, placing a severe limit on the number of objects that can be grouped by fast feedforward processes.
In summary, feedforward grouping of elements using complex receptive fields has many advantages, such as its speed. It is unlikely, however, that feedforward processing would be able to
correctly group scenes containing novel objects and determine their location with high spatial
resolution. Furthermore, the inhibitory interactions that curtail the flow of information towards
higher visual areas imply that feedforward processes are not sufficient to group scenes contain-
ing multiple, overlapping or ambiguous objects. In these situations extra grouping processes are
required which are more flexible, but this additional flexibility may come at the cost of taking
more time.
Region growing
What is region growing?
How is the rest of the object grouped together once its boundaries have been detected? One mecha-
nism that has been used in computational models is region growing. Region growing is the coun-
terpart to the boundary detection process described above. Whereas boundary detection enhances
responses at the borders of an object, region growing has been proposed to begin in regions of
uniform feature-value and to spread outwards until encountering a feature-boundary (Grossberg
and Mingolla 1985), although we will later suggest that region growing proceeds simultaneously
across large regions of uniform texture. Region growing relies on statistical similarities between
features (Grossberg and Mingolla 1985; Mumford et al. 1987; Wolfson and Landy 1998). Regions
with similar features are grouped together and thereby segregated from regions with different fea-
ture values. Psychophysical studies have demonstrated that the performance of human observers
on shape discrimination tasks is best explained by models which use mechanisms for boundary
detection as well as for region growing (Mumford et al. 1987). Indeed, humans can discriminate
between textures which are physically separated from one another so that the boundary detection
process cannot be used (Wolfson and Landy 1998). Computational models of texture segmentation
have stipulated that region growing requires an entirely different connection scheme than boundary detection (Bhatt et al. 2007; Grossberg and Mingolla 1985; Poort et al. 2012; Roelfsema et al. 2002).
Whereas boundary detection requires iso-orientation inhibition, i.e. cells encoding the same feature
should inhibit one another (as was discussed above), region growing requires iso-orientation excita-
tion, which means that cells that represent similar features enhance each other’s activity.
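The contrast between the two connection schemes can be caricatured as follows. In this toy sketch (invented stimulus and update rule, not the cited models themselves), a figure label seeded inside the figure spreads through iso-orientation excitation between like-tuned neighbors and halts at the feature boundary:

```python
import numpy as np

# Toy 1-D scene: a figure of one orientation embedded in a background.
orientation = np.array([0] * 10 + [90] * 8 + [0] * 10)  # figure = indices 10..17
label = np.zeros(len(orientation))
label[13] = 1.0   # a seed of enhanced activity inside the figure

# Region growing: the label spreads between neighbors that share the same
# feature value (iso-orientation excitation) and stops at the boundary,
# where the feature value changes.
for _ in range(len(orientation)):
    for i in range(len(label)):
        for j in (i - 1, i + 1):
            if 0 <= j < len(label) and orientation[j] == orientation[i]:
                label[i] = max(label[i], label[j])

print(label[10:18].tolist())  # → [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
print(label[:10].sum() + label[18:].sum())  # → 0.0 (background stays unlabeled)
```

Note the sign flip relative to the boundary-detection sketch: here like-tuned units reinforce rather than suppress one another, which is exactly the difference in connection scheme the text describes.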
¹ It should be noted that this latency only applies to texture-defined boundaries. Luminance-defined boundaries can be detected through center-surround processes such as the receptive field of the retinal ganglion cell described above. Enhanced activity at luminance-defined boundaries can be seen in the feedforward input into V1 (Sugihara et al. 2011) and does not require the kinds of interactions that we discuss here.
Fig. 16.2 (a) A model of figure-ground segmentation (Roelfsema et al. 2002; Roelfsema and
Houtkamp 2011). Neurons encoding the edges of the figure have enhanced activity as they receive
less horizontal inhibition (orange arrows) from their neighbors. (b) The input stimulus produces
increased activity throughout the visual hierarchy (averaged across orientation maps), the edges of
the figure merge together in the large RFs of high-level areas such as TEO. (c) Neurons in higher
visual areas send feedback back to neurons in lower areas. This feedback is gated by the activity
of neurons in lower visual areas and enhances responses throughout the figure (region growing).
(d) The result of the model is that early responses are enhanced at the boundaries of the figure
whereas at later time-points the response enhancement also spreads to the center of the figure.
Reproduced from Attention, Perception, and Psychophysics, 73(8), pp. 2542–2572, Incremental grouping of image
elements in vision, Pieter R. Roelfsema, Copyright © 2011, Springer-Verlag. With kind permission from Springer
Science and Business Media.
the N/U or the hole, this is later overruled by feedback from higher areas which do not extract the
interior of these figures due to the poor spatial resolution of their RFs.
could only be observed with very small figures (up to 2° in diameter) and did not observe FGM
in the center of larger figures. They suggested that FGM is in fact a boundary detection signal and
becomes greatly reduced as one moves away from the boundary. Both of these viewpoints sug-
gest that there is no region growing signal present in V1 and that neural activity in V1 does not
reflect surface perception, but rather the presence of nearby boundaries. Poort et al. (2012) recon-
ciled these apparently conflicting findings by showing that region growing is only pronounced for
behaviorally relevant objects (see below).
spreading of boundary detection signals from the borders of the object (Li 1999; Rossi et al.
2001; Zhaoping 2005).
We have carried out two recent studies which directly investigated the contribution of feedfor-
ward, lateral, and feedback connections to boundary detection and region growing. In the first
(Poort et al. 2012) we studied the effect of task-relevance on the enhanced firing at the bounda-
ries and the center of a figure. We found that FGM at the center of the figure (region filling)
depends strongly on the task that the monkey is doing, whereas boundary detection has only a
weak dependence. This result indicates that the processes that underlie boundary detection are
largely stimulus-driven, in accordance with a strong contribution from lateral and feedforward
inhibition, and that region-filling indeed depends more strongly on feedback connections from
higher visual areas.
In the second study (Self et al. 2013) we made laminar recordings of activity in V1 while mon-
keys performed a figure-ground task. Importantly, these laminar recordings provide unique
information about the neural circuitry underlying FGM as they allow us to examine the synaptic
currents and spiking changes that are produced at the borders and center of a perceptual figure.
We found that boundary detection engages different laminar circuits than region-filling. Taken
together these studies suggest that FGM observed at the center of the figure is not an extension of
a boundary detection signal at the edges.
Fig. 16.3 (a) The paradigm used to study the effect of attention on FGM. The monkeys were always
presented with two curves in the upper-half of the screen and a texture-defined figure in the bottom
half (shown in plain colors here for simplicity). On different days the monkey performed different
tasks. On curve-tracing days the monkey had to make an eye-movement towards the target circle
that was connected to the fixation-point by a curve. On figure-detection days he had to make a
saccade towards the figure. (b) The position of the figure relative to the RF was varied on each trial
to map out responses to the background, edge, and center of the figure. (c) The 3D color-plot shows
the amount of FGM according to position of the figure during the figure-detection task. The plot on
the left-hand side shows the response at the edge of the figure (red) vs. the center (blue). (d) FGM
during the curve-tracing task. When attention is directed to the curve-tracing task the level of FGM
is reduced in the center of the figure. The response at the edges was relatively unaffected.
Reprinted from Neuron, 75(1), Jasper Poort, Florian Raudies, Aurel Wannig, Victor A.F. Lamme, Heiko Neumann,
and Pieter R. Roelfsema, The Role of Attention in Figure-Ground Segregation in Areas V1 and V4 of the Visual
Cortex, pp. 143–56, Copyright © 2012, with permission from Elsevier.
demonstrated enhanced edge-responses when animals ignore a stimulus (Marcus and Van Essen
2002) or even when animals are anesthetized (Kastner et al. 1997; Nothdurft et al. 1999; Nothdurft
et al. 2000). In contrast, our results show that the responses at the figure center depend on the
task-relevance of the figure. When the figure is behaviorally relevant, responses at the center of the figure are similar to those at the edge, but when attention is directed to the other task the responses fall to approximately halfway between the edge responses and the response to the
background. This result leads us to draw two conclusions. Firstly, that the process responsible for
boundary-enhancement is different to the process responsible for FGM at the center of the figure.
Secondly, while FGM at the figure-center is influenced by attention, it still arises in the absence
of attention. These results are in good agreement with a study that examined the effect of attention on border-ownership cells (Qiu et al. 2007), which found that the border-ownership signal can
also be observed outside the focus of attention, but that attention can amplify coding of border
ownership. These results are consistent with our hypothesis that boundary detection, which is
thought to rely on iso-orientation inhibition, depends on an early process that may rely on feed-
forward or lateral connections (Figure 16.2a), whereas the FGM at the figure center depends on
iso-orientation excitation, which is mediated by feedback from higher visual areas (Figure 16.2c).
A process that depends on the activity in higher visual areas is expected to depend more strongly
on the task-relevance of the figure.
What then is the advantage of enhancing neural activity on figures compared to background?
One possibility is that by increasing the responses of neurons in early visual areas, which have
small RFs providing excellent spatial resolution, the visual system can more accurately localize
the figure to guide behavior. The neuronal processes that are responsible for making a saccade
to the center of the figure might take advantage of the FGM, because it selectively labels all the
image elements of the figure. The spatial profile of FGM can therefore be read out by the saccadic
system to determine the center of gravity of the image elements that belong to the figure. We
assessed this possibility by examining the relationship between the level of FGM in V1 and the
spatial accuracy of the saccade. The animals in this study were required to make very accurate
saccades to a 2.5° window centered within the 4° figure. We found that the spatial profile of FGM
in V1 indeed predicted the landing-point of the saccade on the figure. On trials where FGM was
strongest on the left-hand side of the figure the animal tended to make saccades that landed to
the left of center. The opposite was observed on trials with strong FGM on the right-hand side.
Trials with modulation spread evenly through the figure were associated with the most accurate
saccades. This result suggests that the FGM signal in V1 is used by the motor-system to plan sac-
cades to the center of gravity of the image elements that belong to the figure, possibly through the
direct projections from V1 to the superior colliculus (Fries and Distel 1983; Wurtz and Albano
1980). These and previous results, taken together, show that the activity in V1 is closely associ-
ated with both the perception of the animal (Supèr et al. 2001) and the spatial accuracy of the
behavioral output.
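The center-of-gravity readout suggested here can be written down directly. The FGM profile below is invented (a profile skewed to the left of the figure); the computation simply takes the modulation-weighted mean position:

```python
import numpy as np

# Hypothetical FGM profile across a figure, sampled at positions relative to
# the figure center (degrees). Modulation is stronger on the left-hand side.
positions = np.linspace(-2.0, 2.0, 9)
fgm = np.array([0.0, 0.3, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.0])

# Center-of-gravity readout: the saccade target is the FGM-weighted mean
# position, as proposed for the V1 -> superior colliculus pathway.
landing = (positions * fgm).sum() / fgm.sum()

print(round(landing, 2))  # → -0.47: the saccade lands left of the figure center
```

A profile with modulation spread evenly over the figure yields a landing point at the center, matching the observation that evenly modulated trials produced the most accurate saccades.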
spaced contacts. The advantage of these electrodes is that they also allow the application of current source density (CSD) analysis to the local field potential (Mitzdorf 1985; Schroeder et al. 1991; Schroeder et al. 1998). This analysis reveals the laminar locations of current sinks (currents
flowing into neurons) and current sources (mostly passive current return to the extracellular
space). We recorded MUA and CSD responses evoked by the center and edge of the figure, as well
as to the background texture.
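In its simplest one-dimensional form, the CSD estimate is the negative second spatial derivative of the LFP across cortical depth. The laminar potential profile below is invented for illustration, and the conductivity value is an arbitrary placeholder:

```python
import numpy as np

# Invented laminar LFP profile phi (mV) at equally spaced depths: a localized
# negativity mimicking synaptic input to one layer.
phi = np.array([0.0, 0.0, -0.1, -0.4, -0.1, 0.0, 0.0])
dz = 0.1      # contact spacing (mm)
sigma = 0.3   # assumed extracellular conductivity, illustrative only

# Standard CSD estimate: CSD = -sigma * d^2(phi)/dz^2, discretized as a
# second difference. Negative CSD values indicate current sinks (current
# flowing into neurons); positive values indicate sources (return currents).
csd = -sigma * (phi[:-2] - 2 * phi[1:-1] + phi[2:]) / dz**2

print(int(np.argmin(csd)) + 1)  # → 3: the sink sits at the LFP negativity
```

Real CSD analyses add smoothing and boundary handling, but the sign convention is the one used in Figure 16.4: sinks mark where current enters neurons.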
The results of this study were very revealing. Firstly, we found strong laminar variations in the
strength of FGM at the center of the figure (Figure 16.4a). FGM was strongest in the superficial
and deep layers and significantly weaker in layer 4. The latency of modulation was relatively con-
stant across the layers, beginning at around 100ms after stimulus onset, so from latency analyses
Fig. 16.4 FGM in the center of the figure (a) and at the edge (c) averaged across a number of
penetrations. The color-plots show the laminar profile of FGM—the difference in MUA evoked
by figure and background. The edge specifically causes early FGM in the superficial layers (white
arrow in c). The panels above show the MUA-response averaged across all laminae; panels to the
right show MUA response averaged across time. (b) Difference in the CSD evoked by the figure
center and background. Warm colors show stronger sinks in the figure condition (and/or stronger
sources in the ground condition) and cooler colors stronger sources. The black arrows indicate the
first sinks that differentiate between figure and background at a latency of ~100ms in layer 5 and
layer 1. (d) The difference in CSD between the figure edge and the background. The earliest sinks
occur in upper layer 4/layer 3 and then in layer 2 (black arrows).
it was difficult to determine the source of this increase in spiking. Even more revealing was the
difference in current flow between the figure and ground conditions. In the figure condition we
observed extra current sinks flowing very superficially in layer 1 and/or upper layer 2 as well as
in layer 5 (Figure 16.4b). These layers are well-known to be the targets of feedback projections
from V2 to V1 (Anderson and Martin 2009; Rockland and Pandya 1979). These results therefore
support the idea that feedback projections, targeting layers 1 and 5, are the source of the increased
spiking in V1 for the center of the figure.
When we placed the boundary of the figure in the RF we observed an extra component to the
FGM signal that started at approximately 70ms after stimulus onset (arrow in Figure 16.4c). This
early boundary-FGM has also been observed in previous studies of texture-segregation (Lamme
et al. 1999; Nothdurft et al. 2000; Poort et al. 2012), but interestingly in our study the modula-
tion was confined entirely to the superficial layers of cortex. At later time-points (>100ms) this
modulation was followed by a pattern of spiking activity very similar to that observed at the figure
center. CSD analysis revealed an extra current sink in the edge condition compared to the center
at around 70ms beginning in upper layer 4 and extending into the superficial layers at the same
time as the increase in spiking in these layers (arrows in Figure 16.4d). It is clear from both the
pattern of MUA and CSD that the mechanisms underlying early FGM at the edge of the figure
differ from the mechanisms responsible for the FGM at the center. On the other hand, at later
time-points (>100ms) the MUA and CSD modulation at the edge resembled quite closely the
FGM at the center. We therefore suggest that the early edge FGM is the result of horizontal pro-
jections which are densest in upper layer 4 and superficial layers, whereas the later FGM at the
edge might reflect a feedback-signal targeting the entire figure-region. This study therefore pro-
vides good evidence that both boundary detection processes (mediated by local connections) and
region-filling processes (mediated by feedback connections) play a role in segregating textures
and that these processes occur in different layers of cortex, and at different times.
of a magnesium ion in the channel pore. At the more depolarized levels that occur if a cell receives
other sources of input, the magnesium block is removed and the channel begins to pass current.
This mechanism implies that NMDA-Rs can act as coincidence detectors that are only active if
the neuron is depolarized by AMPA-R activation (Daw et al. 1993). NMDA-Rs would therefore
be well-placed to mediate the gating of a feedback-based modulatory signal, as these receptors
are unable to activate neurons that are not receiving synaptic input from other sources. There is
some evidence to suggest that NMDA-Rs may be more strongly involved in feedback process-
ing than in feedforward transmission. For example responses in thalamo-cortical recipient layers
are unaffected by APV, a drug that blocks all NMDA-Rs (Fox et al. 1990; Hagihara et al. 1988).
Furthermore, NMDA has been found to produce multiplicative effects on firing in the superficial and
deep layers of visual cortex (Fox et al. 1990) and NMDA-Rs therefore provide a possible mech-
anism for the gating of feedback by feedforward activity. It is unlikely however that feedback
connections target synapses that only possess NMDA-Rs as synapses without AMPA-Rs are not
functional. It is possible however that feedback connections target synapses that are particularly
rich in NMDA-Rs. An alternative possibility has been raised through the work of Matthew
Larkum who has shown that NMDA-Rs are required to integrate the inputs to the apical dendrites
of layer 5 neurons (Larkum et al. 2009). These dendrites are found in layer 1, the layer which is
the predominant target of feedback connections. It may therefore be that feedback connections target layer 1, but cannot effectively modulate the firing-rate of cells unless NMDA-Rs are activated.
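The voltage dependence of the magnesium block, and hence the coincidence-detector behavior of the NMDA-R, is often summarized with a widely used empirical fit (commonly attributed to Jahr and Stevens, 1990); the sketch below uses that standard expression with 1 mM Mg²⁺:

```python
import math

def nmda_unblock(v_mv, mg_mm=1.0):
    """Fraction of NMDA-R conductance not blocked by Mg2+ at membrane
    potential v_mv (mV), using the standard empirical fit
    1 / (1 + [Mg] * exp(-0.062 * V) / 3.57)."""
    return 1.0 / (1.0 + mg_mm * math.exp(-0.062 * v_mv) / 3.57)

# Near rest the channel is largely blocked; with AMPA-driven depolarization
# the block is relieved, so NMDA current flows only when inputs coincide.
print(round(nmda_unblock(-70.0), 2))  # → 0.04 (mostly blocked near rest)
print(round(nmda_unblock(-20.0), 2))  # → 0.51 (largely unblocked when depolarized)
```

This steep voltage dependence is what makes the receptor unable to drive a neuron that receives no other synaptic input, as argued in the text.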
Fig. 16.5 (a) An example of the effect of an injection of CNQX (an AMPA receptor antagonist). The
blue curves show the pre-drug response, the red curves show the response recorded immediately
after the pressure injection of CNQX. The drug strongly reduced the initial response but had no
significant effect on the level of FGM. The right-hand graph shows a pre- and post-drug modulation
index score which is independent of the overall activity level (calculated as (Fig-Gnd)/(Fig+Gnd) using
the average activity from 0-200ms post-stimulus). (b) An example of the effect of APV, a broadband
NMDA-R antagonist. The drug has a minor effect on the initial activity level, but strongly reduces
FGM. (c) Ifenprodil blocks NMDA-Rs containing the NR2B subunit. This drug paradoxically increases
responses in general, but also causes a strong reduction in the level of FGM.
The effect of ifenprodil in this experiment was particularly interesting. Ifenprodil blocks
NMDA-Rs which contain the NR2B subunit (Williams 1993). This drug would therefore be
expected to generally reduce neural activity. In contrast we found that ifenprodil increases neu-
ral activity, while at the same time reducing figure-ground modulation. This combination of
effects suggests that NMDA-Rs containing the NR2B-subunit may be situated predominantly on
interneurons involved in inhibiting neural responses. It is not possible to determine from this
data whether the general effect of ifenprodil on excitability involves the same mechanisms that
produce the reduction in FGM. It may be possible to determine more precisely the role of the
different receptor subtypes by examining the distribution of different NMDA subunits on the dif-
ferent cell-types of V1 in future studies.
Acknowledgements
The research leading to these results has received funding from the European Union Sixth and
Seventh Framework Programmes (EU IST Cognitive Systems, project 027198 ‘‘Decisions in
Motion’’ and project 269921 ‘‘BrainScaleS’’) and a NWO-VICI grant awarded to P.R.R.
References
Allman, J., Miezin, F., and McGuinness, E. (1985). Stimulus specific responses from beyond the classical
receptive field: neurophysiological mechanisms for local-global comparisons in visual neurons. Annu
Rev Neurosci 8: 407–30.
Anderson, J.C. and Martin, K.A. (2009). The synaptic connections between cortical areas V1 and V2 in
macaque monkey. J Neurosci 29: 11283–93.
Angelucci, A. and Bullier, J. (2003). Reaching beyond the classical receptive field of V1 neurons: horizontal
or feedback axons? J Physiol Paris 97: 141–54.
Angelucci, A., Levitt, J.B., Walton, E.J., Hupe, J.M., Bullier, J., and Lund, J.S. (2002). Circuits for local and
global signal integration in primary visual cortex. J Neurosci 22: 8633–46.
Bair, W., Cavanaugh, J.R., and Movshon, J.A. (2003). Time course and time-distance relationships for
surround suppression in macaque V1 neurons. J Neurosci 23: 7690–701.
Bhatt, R., Carpenter, G.A., and Grossberg, S. (2007). Texture segregation by visual cortex: perceptual
grouping, attention, and learning. Vision Res 47: 3173–211.
Brincat, S.L. and Connor, C.E. (2004). Underlying principles of visual shape selectivity in posterior
inferotemporal cortex. Nat Neurosci 7: 880–6.
Craft, E., Schutze, H., Niebur, E., and von der Heydt, R. (2007). A neural model of figure-ground
organization. J Neurophysiol 97: 4310–26.
Crick, F. and Koch, C. (1998). Constraints on cortical and thalamic projections: the no-strong-loops
hypothesis. Nature 391: 245–50.
Daw, N.W., Stein, P.S., and Fox, K. (1993). The role of NMDA receptors in information processing. Annu
Rev Neurosci 16: 207–22.
Dehaene, S., Sergent, C., and Changeux, J.P. (2003). A neuronal network model linking subjective reports
and objective physiological data during conscious perception. Proc Natl Acad Sci USA 100: 8520–5.
Desimone, R. and Duncan, J. (1995). Neural mechanisms of selective visual attention. Annu Rev Neurosci
18: 193–222.
Douglas, R.J. and Martin, K.A. (2004). Neuronal circuits of the neocortex. Annu Rev Neurosci 27: 419–51.
Ekstrom, L.B., Roelfsema, P.R., Arsenault, J.T., Bonmassar, G., and Vanduffel, W. (2008). Bottom-up
dependent gating of frontal signals in early visual cortex. Science 321: 414–17.
Felleman, D.J. and Van Essen, D.C. (1991). Distributed hierarchical processing in the primate cerebral
cortex. Cereb Cortex 1: 1–47.
Fox, K., Sato, H., and Daw, N. (1990). The effect of varying stimulus intensity on NMDA-receptor activity
in cat visual cortex. J Neurophysiol 64: 1413–28.
Fries, W. and Distel, H. (1983). Large layer VI neurons of monkey striate cortex (Meynert cells) project to
the superior colliculus. Proc R Soc Lond B Biol Sci 219: 53–9.
Gilbert, C.D. and Wiesel, T.N. (1983). Clustered intrinsic connections in cat visual cortex. J Neurosci 3:
1116–33.
Grossberg, S. and Mingolla, E. (1985). Neural dynamics of form perception: boundary completion, illusory
figures, and neon color spreading. Psychol Rev 92: 173–211.
Hagihara, K., Tsumoto, T., Sato, H., and Hata, Y. (1988). Actions of excitatory amino acid antagonists on
geniculo-cortical transmission in the cat’s visual cortex. Exp Brain Res 69: 407–16.
Jehee, J.F., Lamme, V.A., and Roelfsema, P.R. (2007). Boundary assignment in a recurrent network
architecture. Vision Res 47: 1153–65.
Johnson, R.R. and Burkhalter, A. (1994). Evidence for excitatory amino acid neurotransmitters in forward
and feedback corticocortical pathways within rat visual cortex. Eur J Neurosci 6: 272–86.
Jones, H.E., Grieve, K.L., Wang, W., and Sillito, A.M. (2001). Surround suppression in primate V1.
J Neurophysiol 86: 2011–28.
Kastner, S., Nothdurft, H.C., and Pigarev, I.N. (1997). Neuronal correlates of pop-out in cat striate cortex.
Vision Res 37: 371–6.
Kastner, S., Nothdurft, H.C., and Pigarev, I.N. (1999). Neuronal responses to orientation and motion
contrast in cat striate cortex. Vis Neurosci 16: 587–600.
Kayaert, G., Biederman, I., Op de Beeck, H.P., and Vogels, R. (2005). Tuning for shape dimensions in
macaque inferior temporal cortex. Eur J Neurosci 22: 212–24.
Knierim, J.J. and Van Essen, D.C. (1992). Neuronal responses to static texture patterns in area V1 of the
alert macaque monkey. J Neurophysiol 67: 961–80.
Kogo, N., Strecha, C., Van Gool, L., and Wagemans, J. (2010). Surface construction by a 2-D
differentiation-integration process: a neurocomputational model for perceived border ownership, depth,
and lightness in Kanizsa figures. Psychol Rev 117: 406–39.
Lamme, V.A. (1995). The neurophysiology of figure-ground segregation in primary visual cortex. J Neurosci
15: 1605–15.
Lamme, V.A., Rodriguez-Rodriguez, V., and Spekreijse, H. (1999). Separate processing dynamics for
texture elements, boundaries and surfaces in primary visual cortex of the macaque monkey. Cereb
Cortex 9: 406–13.
Larkum, M.E., Nevian, T., Sandler, M., Polsky, A., and Schiller, J. (2009). Synaptic integration in tuft
dendrites of layer 5 pyramidal neurons: a new unifying principle. Science 325: 756–60.
Levitt, J.B. and Lund, J.S. (1997). Contrast dependence of contextual effects in primate visual cortex. Nature
387: 73–6.
Li, W., Thier, P., and Wehrhahn, C. (2001). Neuronal responses from beyond the classic receptive field in
V1 of alert monkeys. Exp Brain Res 139: 359–71.
Li, Z. (1999). Visual segmentation by contextual influences via intra-cortical interactions in the primary
visual cortex. Network 10: 187–212.
Luck, S.J., Chelazzi, L., Hillyard, S.A., and Desimone, R. (1997). Neural mechanisms of spatial selective
attention in areas V1, V2, and V4 of macaque visual cortex. J Neurophysiol 77: 24–42.
Lumer, E.D., Edelman, G.M., and Tononi, G. (1997). Neural dynamics in a model of the thalamocortical
system. I. Layers, loops and the emergence of fast synchronous rhythms. Cereb Cortex 7: 207–27.
Marcus, D.S. and Van Essen, D.C. (2002). Scene segmentation and attention in primate cortical areas
V1 and V2. J Neurophysiol 88: 2648–58.
Marr, D. and Hildreth, E. (1980). Theory of edge detection. Proc R Soc Lond B Biol Sci 207: 187–217.
Martinez-Trujillo, J.C. and Treue, S. (2004). Feature-based attention increases the selectivity of population
responses in primate visual cortex. Curr Biol 14: 744–51.
Maunsell, J.H. and Van Essen, D.C. (1983). The connections of the middle temporal visual area (MT) and
their relationship to a cortical hierarchy in the macaque monkey. J Neurosci 3: 2563–86.
Miller, E.K., Gochin, P.M., and Gross, C.G. (1993). Suppression of visual responses of neurons in inferior
temporal cortex of the awake macaque by addition of a second stimulus. Brain Res 616: 25–9.
Mitzdorf, U. (1985). Current source-density method and application in cat cerebral cortex: investigation of
evoked potentials and EEG phenomena. Physiol Rev 65: 37–100.
Mumford, D., Kosslyn, S.M., Hillger, L.A., and Herrnstein, R.J. (1987). Discriminating figure from
ground: the role of edge detection and region growing. Proc Natl Acad Sci USA 84: 7354–8.
Nassi, J.J. and Callaway, E.M. (2009). Parallel processing strategies of the primate visual system. Nat Rev
Neurosci 10: 360–72.
Nelson, J.I. and Frost, B.J. (1978). Orientation-selective inhibition from beyond the classic visual receptive
field. Brain Res 139: 359–65.
Nothdurft, H.C., Gallant, J.L., and Van Essen, D.C. (1999). Response modulation by texture surround in
primate area V1: correlates of “popout” under anesthesia. Vis Neurosci 16: 15–34.
340 Self and Roelfsema
Nothdurft, H.C., Gallant, J.L., and Van Essen, D.C. (2000). Response profiles to texture border patterns in
area V1. Vis Neurosci 17: 421–36.
Perkel, D.J., Bullier, J., and Kennedy, H. (1986). Topography of the afferent connectivity of area 17 in the
macaque monkey: a double-labelling study. J Comp Neurol 253: 374–402.
Poort, J., Raudies, F., Wannig, A., Lamme, V.A., Neumann, H., and Roelfsema, P.R. (2012). The role of
attention in figure-ground segregation in areas V1 and V4 of the visual cortex. Neuron 75: 143–56.
Qiu, F.T., Sugihara, T., and von der Heydt, R. (2007). Figure-ground mechanisms provide structure for selective
attention. Nat Neurosci 10: 1492–9.
Reynolds, J.H., Chelazzi, L., and Desimone, R. (1999). Competitive mechanisms subserve attention in
macaque areas V2 and V4. J Neurosci 19: 1736–53.
Rockland, K.S. and Pandya, D.N. (1979). Laminar origins and terminations of cortical connections of the
occipital lobe in the rhesus monkey. Brain Res 179: 3–20.
Rockland, K.S. and Van Hoesen, G.W. (1994). Direct temporal-occipital feedback connections to striate
cortex (V1) in the macaque monkey. Cereb Cortex 4: 300–13.
Rockland, K.S. and Virga, A. (1989). Terminal arbors of individual “feedback” axons projecting from area
V2 to V1 in the macaque monkey: a study using immunohistochemistry of anterogradely transported
Phaseolus vulgaris-leucoagglutinin. J Comp Neurol 285: 54–72.
Roelfsema, P.R. (2006). Cortical algorithms for perceptual grouping. Annu Rev Neurosci 29: 203–27.
Roelfsema, P.R. and Houtkamp, R. (2011). Incremental grouping of image elements in vision. Atten Percept
Psychophys 73: 2542–72.
Roelfsema, P.R., Lamme, V.A., Spekreijse, H., and Bosch, H. (2002). Figure-ground segregation in a
recurrent network architecture. J Cogn Neurosci 14: 525–37.
Roelfsema, P.R., Khayat, P.S., and Spekreijse, H. (2003). Subtask sequencing in the primary visual cortex.
Proc Natl Acad Sci USA 100: 5467–72.
Rossi, A.F., Desimone, R., and Ungerleider, L.G. (2001). Contextual modulation in primary visual cortex
of macaques. J Neurosci 21: 1698–709.
Salin, P.A. and Bullier, J. (1995). Corticocortical connections in the visual system: structure and function.
Physiol Rev 75: 107–54.
Salin, P.A., Bullier, J., and Kennedy, H. (1989). Convergence and divergence in the afferent projections to
cat area 17. J Comp Neurol 283: 486–512.
Salin, P.A., Kennedy, H., and Bullier, J. (1995). Spatial reciprocity of connections between areas 17 and 18
in the cat. Can J Physiol Pharmacol 73: 1339–47.
Schroeder, C.E., Tenke, C.E., Givre, S.J., Arezzo, J.C., and Vaughan, H.G., Jr. (1991). Striate cortical
contribution to the surface-recorded pattern-reversal VEP in the alert monkey. Vision Res 31: 1143–57.
Schroeder, C.E., Mehta, A.D., and Givre, S.J. (1998). A spatiotemporal profile of visual system activation
revealed by current source density analysis in the awake macaque. Cereb Cortex 8: 575–92.
Self, M.W., Kooijmans, R.N., Super, H., Lamme, V.A., and Roelfsema, P.R. (2012). Different glutamate
receptors convey feedforward and recurrent processing in macaque V1. Proc Natl Acad Sci USA
109: 11031–6.
Self, M.W., van Kerkoerle, T., Supèr, H., and Roelfsema, P.R. (2013). Distinct roles of the cortical layers of
area V1 in figure-ground segregation. Curr Biol 23: 2121–9.
Sheinberg, D.L. and Logothetis, N.K. (2001). Noticing familiar objects in real world scenes: the role of
temporal cortical neurons in natural vision. J Neurosci 21: 1340–50.
Sherman, S.M. and Guillery, R.W. (1998). On the actions that one nerve cell can have on
another: distinguishing ‘drivers’ from ‘modulators’. Proc Natl Acad Sci USA 95: 7121–6.
Shmuel, A., Korman, M., Sterkin, A., Harel, M., Ullman, S., Malach, R., and Grinvald, A. (2005).
Retinotopic axis specificity and selective clustering of feedback projections from V2 to V1 in the owl
monkey. J Neurosci 25: 2117–31.
The Neural Mechanisms of Figure-ground Segregation 341
Sillito, A.M., Grieve, K.L., Jones, H.E., Cudeiro, J., and Davis, J. (1995). Visual cortical mechanisms
detecting focal orientation discontinuities. Nature 378: 492–6.
Stettler, D.D., Das, A., Bennett, J., and Gilbert, C.D. (2002). Lateral connectivity and contextual
interactions in macaque primary visual cortex. Neuron 36: 739–50.
Sugihara, T., Qiu, F.T., and von der Heydt, R. (2011). The speed of context integration in the visual cortex.
J Neurophysiol 106: 374–85.
Supèr, H., Spekreijse, H., and Lamme, V.A. (2001). Two distinct modes of sensory processing observed in
monkey primary visual cortex (V1). Nat Neurosci 4: 304–10.
Tanaka, K. (1993). Neuronal mechanisms of object recognition. Science 262: 685–8.
Thorpe, S., Fize, D., and Marlot, C. (1996). Speed of processing in the human visual system. Nature
381: 520–2.
Treisman, A.M. and Gelade, G. (1980). A feature-integration theory of attention. Cogn Psychol 12: 97–136.
Treue, S. and Martinez-Trujillo, J.C. (1999). Feature-based attention influences motion processing gain in
macaque visual cortex. Nature 399: 575–9.
Wannig, A., Stanisor, L., and Roelfsema, P.R. (2011). Automatic spread of attentional response modulation
along Gestalt criteria in primary visual cortex. Nat Neurosci 14: 1243–4.
Williams, K. (1993). Ifenprodil discriminates subtypes of the N-methyl-D-aspartate receptor: selectivity
and mechanisms at recombinant heteromeric receptors. Mol Pharmacol 44: 851–9.
Wolfe, J.M., Cave, K.R., and Franzel, S.L. (1989). Guided search: an alternative to the feature integration
model for visual search. J Exp Psychol Hum Percept Perform 15: 419–33.
Wolfson, S.S. and Landy, M.S. (1998). Examining edge- and region-based texture analysis mechanisms.
Vision Res 38: 439–46.
Wurtz, R.H. and Albano, J.E. (1980). Visual-motor function of the primate superior colliculus.
Annu Rev Neurosci 3: 189–226.
Zhaoping, L. (2005). Border ownership from intracortical interactions in visual area V2. Neuron 47: 143–53.
Zhou, H., Friedman, H.S., and von der Heydt, R. (2000). Coding of border ownership in monkey visual cortex.
J Neurosci 20: 6594–611.
Zipser, K., Lamme, V.A., and Schiller, P.H. (1996). Contextual modulation in primary visual cortex.
J Neurosci 16: 7376–89.
Chapter 17
Neural mechanisms of figure-ground organization
Introduction
Perception of depth order in a natural visual scene, with multiple overlapping surfaces, is a highly
non-trivial task for our visual system. To interpret the visual input—in fact a 2D image containing
a collection of borders between abutting image regions—the visual system must determine how
the borders are created: which of two overlapping surfaces is closer (‘figure’) and which
continues behind (‘ground’). This so-called ‘figure-ground’ determination involves integration
of contextual visual signals. In this chapter, we review the neural mechanisms of figure-ground
organization.
BOWN (border-ownership) is computed in a context-sensitive manner. The image in Figure 17.1J is perceived as a green disk on top of an orange rectangle, meaning that the part of the border within the black circle is owned by the left side, the green disk. When the image is modified as in Figure 17.1K, it is perceived as an orange object on top of the large green rectangle, and the same part of the border within the circle is now the edge of the orange object. The reversal of BOWN also occurs in Figures 17.1L and 17.1M, even though the local properties within the circle are exactly the same. This clearly indicates that BOWN cannot be determined by local properties alone.
[Fig. 17.1 panels (a)–(m): figure-ground interpretation, owner side, and example stimuli; see caption below.]
Fig. 17.1 Continued.
Neural Mechanisms of Figure-ground Organization 345
(Craft et al. 2007; Sugihara et al. 2011; Zhang and von der Heydt 2010; Zhou et al. 2000). In macaques, the horizontal connections extend over a range of 2–4 mm in V2 (Amir, Harel, and Malach 1993; Levitt, Kiper, and Movshon 1994); note that one degree of visual angle corresponds to 4–6 mm of cortex in macaques (see, for example, Polimeni, Balasubramanian, and Schwartz 2006). Reaching distal parts of cortical space through horizontal connections would therefore require polysynaptic chains, at the cost of increased processing time. Furthermore, the unmyelinated axons of these horizontal connections have low conduction velocities (0.3 m/s; Girard, Hupe, and Bullier 2001). Based on this analysis, as well as on the fact that the response latencies were relatively invariant under different figure sizes, Zhou et al. (2000) suggested that the global interactions in the BOWN computation are achieved by feedforward-feedback loops. Such loops are physiologically realistic: feedforward and feedback connections involve myelinated axons with a conduction velocity of about 3.5 m/s (Girard et al. 2001), roughly ten times faster than the horizontal connections. In addition, if the signals are conducted ‘vertically’ between layers, the size of the figural surfaces has less influence on the conduction distances. They proposed that the collective BOWN signals activate a ‘grouping cell’ at a higher processing level, and that the grouping cell’s output is fed back to the BOWN-sensitive neurons (Figure 17.2C; Craft et al. 2007).
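For intuition about the timing argument: a 4 mm horizontal path at 0.3 m/s takes roughly 13 ms, whereas the same distance over myelinated feedforward-feedback axons at 3.5 m/s takes just over 1 ms. The grouping-cell scheme itself can be caricatured in a few lines of code. The sketch below is a toy illustration with invented parameters (it is not the Craft et al. (2007) model itself): local BOWN units vote for the two candidate owner sides of a border, a ‘grouping cell’ pools the votes for each interpretation, and feedback boosts the units consistent with the globally stronger interpretation.

```python
import numpy as np

def grouping_feedback(votes_left, votes_right, gain=0.5, n_iter=5):
    """Toy feedforward-feedback loop for border ownership.

    `votes_left`/`votes_right`: initial local evidence at several border
    locations for the two candidate owner sides. A 'grouping cell' pools
    the evidence for each interpretation (feedforward) and broadcasts it
    back (feedback), boosting the BOWN units consistent with the globally
    stronger interpretation.
    """
    left = np.array(votes_left, dtype=float)
    right = np.array(votes_right, dtype=float)
    for _ in range(n_iter):
        g_left, g_right = left.sum(), right.sum()   # grouping-cell activities
        left += gain * g_left                       # feedback enhancement
        right += gain * g_right
        total = left + right                        # the two sides compete
        left, right = left / total, right / total
    return left, right

# Weak, locally inconsistent evidence (location 2 favours the right side)...
left, right = grouping_feedback([0.55, 0.48, 0.60], [0.45, 0.52, 0.40])
# ...is made globally coherent: after feedback, every location favours the left.
```

Because the grouping cell sums over all border locations, the locally inconsistent location is overruled by the global majority, mimicking the context sensitivity described above.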
Fig. 17.1 The concept of border-ownership (BOWN) and its properties. (a) When an image on the left
is presented, it is interpreted as an orange rectangle on top of a green surface (right). (b) A symbol
of BOWN signal used in this chapter. The straight line is aligned to the boundary and the side arrow
indicates the owner side. (c) At each location of boundaries, there are two possible ownerships that
compete. (d) After establishing the interpretation of an image, one of the competing sides becomes
the owner: inside of the rectangle in this example. (e) There could be multiple surfaces overlapping.
BOWN has to be determined for individual boundary sections between different pairs of surfaces.
Here, the orange oval owns the boundary with the brown square (asterisk), but the boundary
between the orange oval and the blue square is owned by the blue square (double asterisks).
(f) In some cases, such as this example, BOWN cannot be determined. There are no cues to favour
one of the two owner sides of the middle boundary. (g) BOWN can be reversed along a single
boundary section. The vertical boundary is perceived to be owned by the orange rectangle near the
bottom but owned by the green surface near the top. (h) The convexity preference of BOWN. The
white regions are associated with more convex shapes than the black regions, and hence subjects
often report the white regions as lying on top of the black background. (https://dl.dropboxusercontent.
com/u/47189685/Convexity%20Context%20Stimuli.zip). (i) The convexity is not a deterministic
factor. On the left, the central disk may be perceived as on top of the oval, but on the right, with
texture consistent with the background, the enclosed area is perceived as a hole with a part of the
background seen through it. (j) and (k) In (j), the ownership of the boundary between the orange disk
and the green rectangle belongs to the left while, in (k), it belongs to the right. The local properties
around the boundary are exactly the same in the two images (compare the local properties within the
black circles). Only the rest of the image, i.e. the global configuration, is different. (l) and (m) The owner
side is reversed without changing the local properties within the black circles.
[Fig. 17.2 panels: (A) stimuli 1–6 in configurations a and b (10°); (B) responses (spikes/sec, 0–20) in V1 (n=7) and V4 (n=17) for preferred vs non-preferred owner sides; (C) grouping-cell scheme.]
Fig. 17.2 Continued.
Neural Mechanisms of Figure-ground Organization 347
While the competition within a BOWN pair concerns the assignment of local depth order, there is also
competition between global interpretations. A stimulus such as that shown in Figure 17.3C—the famous
face-vase illusion of Rubin (1921)—evokes two competing perceptual interpretations (two faces vs one
vase). When the two faces are perceived as ‘figures’, the vase is perceived as part of the ‘background’. When
perception switches, this relationship is reversed. Hence, this is a bistable figure-ground stimulus. The
perceptual switch corresponds to a reversal of the ownership of the borders. In Figure 17.3D, the
BOWN signal associated with the face side, B1, indicates that the face is closer to the viewer, and the
competing BOWN signal, B2, indicates that the vase is closer. The associated depth map for each of the
interpretations specifies either the face or the vase as the figural surface, while the locally assigned BOWN
signals coherently indicate the owner side (Figures 17.3E and 17.3F). Bistable figure-ground perception
is a key phenomenon for investigating how global aspects of figure-ground organization and local
competitive BOWN computations are integrated. Moreover, it reveals the temporal dynamics of the
underlying mechanisms (see ‘Computation of bistable figure-ground perception’).
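Such alternations can be captured by a standard rivalry-style dynamical sketch: two units (‘face’ vs ‘vase’) with mutual inhibition and slow self-adaptation. The parameters below are invented for illustration and are not fitted to any data discussed in this chapter.

```python
def simulate_rivalry(steps=20000, dt=0.01, drive=1.0, w_inh=2.0,
                     g_adapt=3.0, tau_a=30.0):
    """Two units ('face' vs 'vase') with mutual inhibition and slow
    self-adaptation. The dominant unit's adaptation builds up until the
    suppressed unit escapes, producing spontaneous alternations."""
    relu = lambda x: max(x, 0.0)
    r1, r2 = 0.6, 0.4          # slightly asymmetric start breaks the tie
    a1 = a2 = 0.0              # slow adaptation variables
    dominant = []
    for _ in range(steps):
        r1 += dt * (-r1 + relu(drive - w_inh * r2 - g_adapt * a1))
        r2 += dt * (-r2 + relu(drive - w_inh * r1 - g_adapt * a2))
        a1 += dt * (r1 - a1) / tau_a
        a2 += dt * (r2 - a2) / tau_a
        dominant.append(1 if r1 > r2 else 2)
    # a 'perceptual switch' is any reversal of which unit dominates
    return sum(d != e for d, e in zip(dominant, dominant[1:]))

print(simulate_rivalry())   # several switches over the simulated period
```

The mechanism mirrors the text: the locally competing signals (here, two units) reverse coherently, and the slow adaptation variable supplies the temporal dynamics that make the reversals spontaneous.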
Fig. 17.2 BOWN-sensitive neurons in macaque visual cortex. (a) The images shown here are presented so
that the boundary between the surfaces matches the orientation and the position of the classic receptive
field (black oval) of the recorded neuron. The perceived owner side is reversed between the six figures on
the top (a1–6) and the ones on the bottom (b1–6). In columns 1, 2, 5, and 6, the figures on the top
row create BOWN on the left side, while those on the bottom row create BOWN on the right side. In columns 3 and
4, BOWN is on the right on the top and on the left on the bottom. As shown in c, the neural responses
reflect the reversal of the ownership, showing, in this example, a preference for the right side. (b) The
time course of the neural response to BOWN. The BOWN-sensitive component (the difference between
the responses to the preferred and non-preferred owner sides) emerges quickly after the stimulus onset.
(c) Because of the short onset latency of the BOWN-sensitive component and its minimal dependence on
figure size, Craft et al. (2007) hypothesized that BOWN is computed by feedback connections. A ‘grouping cell’
at a higher level collects the BOWN signals through feedforward connections and quickly distributes
the signal back to the congruent BOWN signals through feedback connections.
(a) Reproduced from Hong Zhou, Howard S. Friedman, and Rüdiger von der Heydt, Coding of Border Ownership
in Monkey Visual Cortex, The Journal of Neuroscience, 20(17), pp. 6594–6611 Copyright © 2000, The Society
for Neuroscience. (b) Reproduced from Hong Zhou, Howard S. Friedman, and Rüdiger von der Heydt, Coding of
Border Ownership in Monkey Visual Cortex, The Journal of Neuroscience, 20(17), pp. 6594–6611 Copyright ©
2000, The Society for Neuroscience. (c) Data from Edward Craft, Hartmut Schütze, Ernst Niebur, and Rüdiger von
der Heydt, A Neural Model of Figure–Ground Organization, Journal of Neurophysiology, 97(6), pp. 4310–4326
DOI: 10.1152/jn.00203.2007, 2007.
348 Kogo and van Ee
Fig. 17.3 (a) BOWN-sensitive neurons may be distributed to cover the whole visual field (grey square)
and, at each location (e.g. black dot), there is a bank of neurons assigned for different orientations and
for opposite ownership sides. (b) At the end of the computation, one of the competing signals may
become more dominant than the other. (c–f) When a ‘face or vase’ image (c) is presented, bistable
figure-ground perception is created. The perceptual switch of figure-ground corresponds to the coherent
reversal of BOWN at each location. For example, at the boundary on the ‘nose’ (d), the ownerships are
constantly reversing (B1 and B2), corresponding to the perception of ‘face’ (e) or ‘vase’ (f).
2002; see also Self and Roelfsema, this volume). In this model, multiple layers were hierarchically
organized through feedforward and feedback connections, and the increase in receptive field size at
higher levels of processing accounted for the filling-in of segmented areas.
Qiu, Sugihara, and von der Heydt (2007) demonstrated the effect of attention on BOWN-sensitive
activity, and argued that grouping cells (integrating the BOWN signals)
constitute an efficient platform to implement selective attention (Craft et al. 2007; Mihalas
et al. 2011). fMRI results by Fang, Boyaci, and Kersten (2009) demonstrated that area V2 in
humans is sensitive to BOWN and that this BOWN sensitivity can be modified by attention. A
recent study by Poort et al. (2012) reported that a characteristic late component in the neural
responses—reflecting the perception of figure-ground—can also be modified by attention.
Neural correlates of figure-ground organization have also been investigated using other experi-
mental paradigms. Appelbaum et al. (2006, 2008) exposed observers to a homogeneous texture in
which figure and background differed only in their flicker frequencies. Using steady-state EEG in
combination with fMRI, they reported that the ‘frequency tagged’ signals from the figure resided
in the lateral cortex, while those for the background resided in the dorsal cortex. Likova and
Tyler (2008), using a different random-dot refresh rate for figure and background, reported that
fMRI signals in V1 and V2 were associated with a suppression of the background. They suggested
that the suppression reflected feedback from higher processing levels.
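The frequency-tagging logic used in these studies is easy to illustrate synthetically: each region flickers at its own frequency, and the power spectrum of the recorded signal shows a separate peak at each tag. Below is a minimal sketch with invented tag frequencies, amplitudes, and noise level (not the frequencies used in the studies cited above).

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 250.0                          # sampling rate (Hz)
t = np.arange(0, 10, 1 / fs)        # 10 s of simulated recording

f_figure, f_ground = 3.0, 3.6       # invented flicker tags for the two regions
signal = (1.0 * np.sin(2 * np.pi * f_figure * t)      # figure-driven response
          + 0.6 * np.sin(2 * np.pi * f_ground * t)    # ground-driven response
          + 0.5 * rng.standard_normal(t.size))        # measurement noise

spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(t.size, 1 / fs)

# each region's contribution can be read out at its own tag frequency
print(freqs[np.argmax(spectrum)])   # → 3.0 (the figure tag dominates)
```

Because the figure and ground responses live at distinct spectral lines, their cortical sources can be separated even though they overlap in space and time, which is what makes the tagging paradigm attractive for figure-ground studies.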
Using MEG, Parkkonen et al. (2008) investigated neural activity corresponding to a perceptual
switch during bistable figure-ground perception. They used a modified face or vase image in which
noise was superimposed. The noise was updated with distinct frequency tags for the face region and
the vase region. They reported activity modulations in the early visual cortex including primary
visual cortex corresponding to the perceptual switches. Because the perceptual switches are linked
to the way the image is interpreted at a higher level (by coherently integrating the lower-level sig-
nals), they suggested that top-down influences modify low-level neural activity. Other studies using
face or vase images also reported the involvement of top-down feedback in perceptual switching:
patients with lesions in the prefrontal cortex were less able to exert voluntary control over percep-
tual switching than normal subjects (Windmann et al. 2006), suggesting that the prefrontal cortex is
capable of controlling perceptual switching by sending feedback signals to the lower level. In addition,
variation of the fMRI activity in the fusiform face area correlates with the subsequent perception
of a face, indicating that the ongoing level of face-sensitive neural activity influences the lower-level
activity involved in the switching (Hesselmann and Malach 2011). Pitts et al. (2007, 2011) reported
that the P1 and N1 components in EEG signals correlated with a perceptual face-vase switch, and
they suggested that the perceptual switch was modulated by attention. These empirical data suggest
dynamic interactions between lower-level and higher-level processing.
[Fig. 17.4 diagram labels: boundary units; figure units (∗); object representations; figure-ground/depth segregation; edge-based, configuration, binocular, and monocular cues; image (panels A–E).]
Fig. 17.4 (a) The familiarity of shape influences figure-ground perception. When an image with
the silhouette of a girl on both sides is presented, subjects tend to choose the ‘girl’ areas as
figures. When the same image is presented upside down (right), this bias disappears. Note that the
geometrical properties of the boundaries are the same in both images; only on the left is the familiar
shape recognized. (Reproduced from Mary A. Peterson, Erin M. Harvey, and Hollis J. Weidenbacher,
Shape recognition contributions to figure-ground reversal: Which route counts? Journal of
Experimental Psychology: Human Perception and Performance, 17(4), pp. 1075–1089. http://dx.doi.
org/10.1037/0096-1523.17.4.1075, Copyright © 1991, American Psychological Association) (b)
A model proposed by Vecera and O’Reilly. The ‘boundary’ unit (corresponding to BOWN signals),
‘figure’ unit (for figure-ground organization, red asterisk), and ‘object’ unit (shape/object detection)
are hierarchically organized with mutual connections between layers. (Reproduced from Shaun
P. Vecera and Randall C. O'Reilly, Figure-ground organization and object recognition processes:
In behavioural studies, Peterson (Peterson, Harvey, and Weidenbacher 1991) reported that
when an image is segmented into several competing shapes, the one that has a familiar shape
tends to be chosen as figure. In Figure 17.4A left, the two black areas are perceived as a silhouette
of a woman. Subjects selected these black areas as figure more often than the white area. This
is not due to local properties, such as the curvature of the borders, because when the image is
shown upside down (Figure 17.4A right), the subjects’ preference for choosing the black areas as
figure was significantly reduced. This result suggests that information about the competing areas is
analysed at a higher level, and that the familiarity of the shapes can influence, through feedback
projections, which area becomes the figure (see also Peterson, this volume).
Using hierarchical layers interconnected by feedforward-feedback connections,
Kienker et al. (1986) incorporated the effect of attention on figure-ground organization. Vecera
and O’Reilly (1998) further elaborated on this work (Figure 17.4B). Their model, with hierarchical
layers that are mutually connected, includes a figure-ground layer (‘figure unit’) and an
object-detection layer (‘object unit’). The figure-ground layer is situated before the object-detection
process, but the two interact through mutual connections. Vecera and O’Reilly
noted that the results by Peterson et al. on the influence of familiarity on figure-ground organization
could be explained this way, but Peterson pointed out that the model can reproduce the
effect of familiarity only when the low-level figure-ground cues are ambiguous (Peterson 1999;
but see the counter-argument by Vecera and O’Reilly 2000). Using examples in which unambiguous
low-level cues can be superseded by familiarity cues (Peterson et al. 1991; Peterson
and Gibson 1993), Peterson argued that the figure-ground-first approach is limited and offered
a different model (Figure 17.4C). Note that, in Vecera’s model, each layer is connected only to the
immediately higher and immediately lower layers: connections do not skip intermediate layers to
reach layers two or more steps away (Figure 17.4D left). In contrast,
Peterson’s model has a bypass that connects the sensory signals (low-level properties, before
figure-ground) directly to the object-detection layer (Figure 17.4C). In other words, the key element
in Peterson’s model involves mutual connections between multiple layers (Figure 17.4D
right; see Felleman and Van Essen 1991 for multi-level mutual connections).
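The architectural difference between the two proposals can be made concrete in a toy network: three mutually connected layers updated iteratively, plus an optional skip (‘bypass’) connection from the boundary signals straight to the object layer. Everything below (weights, update rule, the `run_network` helper) is invented for illustration; it is not Vecera and O’Reilly’s or Peterson’s actual implementation.

```python
import numpy as np

def run_network(edge_input, bypass=False, n_iter=30):
    """Boundary -> figure -> object layers with mutual (feedforward and
    feedback) connections. `bypass=True` adds a Peterson-style skip route
    from the boundary signals straight to the object layer."""
    edges = np.array(edge_input, dtype=float)
    boundary = edges.copy()
    figure = np.zeros_like(edges)
    obj = np.zeros_like(edges)
    for _ in range(n_iter):
        boundary = np.tanh(edges + 0.5 * figure)      # input + feedback
        figure = np.tanh(boundary + 0.5 * obj)        # feedforward + feedback
        direct = boundary if bypass else 0.0          # optional skip connection
        obj = np.tanh(figure + direct)
    return obj

# With weak, ambiguous edge evidence, the skip route lets object-level
# activity build up more strongly than the strictly layer-by-layer route.
weak_edges = [0.1, 0.1]
print(run_network(weak_edges, bypass=True).sum()
      > run_network(weak_edges, bypass=False).sum())   # → True
```

The sketch only shows the structural point at issue: with a bypass, object-level (familiarity) information is driven directly by the sensory signals rather than having to wait for the figure-ground layer to settle.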
Several neurophysiological studies have investigated the relationship between depth-order perception
and neural activity in the lateral occipital complex (LOC) in humans, and the inferior temporal
region (IT) in monkeys. When a surface is presented repeatedly, the brain areas that are activated
in response to the shape of the surface adapt, and neural activity declines. Using fMRI, Kourtzi
and Kanwisher (2001) found the same amount of adaptation in area LOC both when a surface is
presented behind bars and in front of bars (Figure 17.5A). Note that when the surface is behind
the bars, the surface is segmented into several subregions. If depth order had not been computed,
these subregions would not have been recognized as parts of a single surface. This result suggests
that the shape of the object is established after the depth computation, causing adaptation in object
area LOC. Furthermore, they showed that when an image is divided into two areas, and stereo
[Fig. 17.5 data panels: (A) fMRI time courses (% signal change from fixation baseline vs time in sec) for the conditions identical, same shape, contrast reversal, figure-ground reversal, completely different, and same depth; (B) single-cell responses (spikes/s) for shapes 1–4.]
Fig. 17.5 Neurophysiological studies showing the relationship between the depth order of surfaces
and the neural activity reflecting their shapes. A. From Kourtzi and Kanwisher (2001). a. The ‘same
shape’ condition with reversed depth order. An object is perceived to be behind the bars (left) or
in front of the bars (right). b. The ‘same contour’ condition with reversed depth order. Using a
stereoscope, the depth order of the two halves in the image can be reversed: the figure (F) could
be the left half (left) or the right half (right). c. fMRI recording from LOC (lateral occipital complex
in humans) showing an equivalent amount of adaptation when the same shapes are presented
in sequence, irrespective of the reversal of the depth order (orange: same shape with reversed
depth order; red: same shape without the reversal). (Reprinted with permission from Kourtzi and
Kanwisher, 2001.) B. From Baylis and Driver (2001). a. Stimuli used. Note that in the contrast reversal
and the mirror reversal, the shape of the surface that is perceived as figure is the same. Only
in the figure-ground reversal does the other side of the central boundary become the figure (hence the
shape of the perceived figure changes). b. A representative pattern of responses from a single cell in
IT (inferior temporal cortex in macaque). The numbers 1–4 correspond to the different shapes, and
the letters a–h correspond to the figural surfaces indicated inside the figure in a. The overall pattern
of the plot does not change significantly with contrast reversal or mirror reversal, but it does with
figure-ground reversal.
Reprinted by permission from Macmillan Publishers Ltd: Nature Neuroscience, 4(9), Gordon C. Baylis and Jon
Driver, Shape-coding in IT cells generalizes over contrast and mirror reversal, but not figure-ground reversal,
pp. 937–942, doi:10.1038/nn0901-937, Copyright © 2001, Nature Publishing Group.
disparity specifies that one of the two regions is figure (Figure 17.5Ab), adaptation is observed only
when the same region is presented as a figure in the second presentation (Figure 17.5Ad). Based
on these results, Kourtzi and Kanwisher suggested that figure-ground processing occurs prior to
shape perception. Baylis and Driver (2001) used elaborate images (Figure 17.5Ba) in combination
with single-unit recordings from monkeys. In these images, the central border was either kept
constant or mirror-reversed, and the contrast polarity was reversed. In addition, by creating borders
to enclose one of the two divided regions, they created eight different images. In these images,
the ‘mirror-reversal’ condition and the ‘contrast-reversal’ condition create the perception that the
figures have the same shape. In the ‘figure-reversal’ condition, on the other hand (the opposite side
is enclosed and perceived as the figure), the shape of the figure is changed. The neural responses
of IT neurons showed clear correlations in the mirror-reversal and contrast-reversal conditions
but not in the figure-reversal condition. Because the shape of the figure was kept constant in the
former two conditions while in the latter condition it changed, Baylis and Driver suggested that
figure-ground organization influences the shape-detection process in IT.
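The logic of this analysis (correlating a cell's across-shape response pattern between conditions) can be sketched directly. The firing rates below are invented to mirror the qualitative result; they are not taken from the recordings.

```python
import numpy as np

# Hypothetical spikes/s for one IT cell across shapes 1-4 (invented numbers
# chosen to mirror the qualitative result, not the recorded data).
original          = np.array([30.0, 12.0, 25.0, 8.0])
contrast_reversed = np.array([31.0, 10.0, 27.0, 7.0])   # same perceived shape
mirror_reversed   = np.array([28.0, 13.0, 24.0, 10.0])  # same perceived shape
figure_reversed   = np.array([10.0, 28.0, 9.0, 26.0])   # perceived shape changes

def pattern_corr(a, b):
    """Pearson correlation between two across-shape response patterns."""
    return float(np.corrcoef(a, b)[0, 1])

# Shape tuning survives contrast and mirror reversal...
print(pattern_corr(original, contrast_reversed) > 0.9)   # → True
print(pattern_corr(original, mirror_reversed) > 0.9)     # → True
# ...but not figure-ground reversal, where the tuning pattern decorrelates.
print(pattern_corr(original, figure_reversed) < 0.0)     # → True
```

A high correlation across conditions is read as evidence that the cell codes the shape of the perceived figure, which is exactly the contrast the figure-reversal condition breaks.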
Although these neurophysiological data suggest an apparent sequence of signal processing,
with the figure-ground analysis first and the shape analysis later, they do not exclude the possibility
that the information from the two areas competing for depth order is analysed at the higher
level. It is possible that the two competing BOWN signals for opposite owner sides are sent to the
higher level to analyse the shape information on both sides, which then, in turn, influences the BOWN
computation. It is also possible that the borders between the competing areas are ‘parsed’ and
sent to the higher level via a bypass route, as suggested by Peterson (1999; Figure 17.4C). This
transient phase of signal processing may not be reflected in the long time-scale fMRI recordings of
Kourtzi and Kanwisher, and it may not be detected in the correlation analysis of Baylis and Driver.
However, it should be noted that, so far, there is no evidence for an influence of the neural activity
in IT (or LOC) on the lower-level BOWN signals. Moreover, even if this feedback occurs, the
shape-detection mechanism has to overcome the longer latency of the computation: the latency
of IT responses is much longer than that of the BOWN-sensitive responses, and additional conduction
time is required for the feedback (see Brincat and Connor 2006; Bullier 2001). Therefore, two
possibilities remain: either a dynamic mutual interaction between the BOWN-sensitive
area and the shape-sensitive area indeed occurs, or there is a dissociation between the low-level
‘BOWN-sensitive’ neural activity and the cognitive level of figure-ground organization. In a
dynamically organized visual system with multi-level mutual connections (Figure 17.4D right),
the apparent sequence of signal processing may depend on the context of each given image as
well as on the state of the brain. Future research is needed to provide clearer descriptions of the
mechanisms underlying such a dynamic system.
Computational models
The early figure-ground computational modelling work of Kienker et al. (1986) implemented
an ‘edge unit’ that was excited when a surface was present at its preferred side, and inhib-
ited when it was not. Such edge-assignment computation is in fact equivalent to BOWN
computation. Ever since this pioneering work, several computational models have been
developed for figure-ground organization (Domijan and Setic 2008; Finkel and Sajda 1992;
Grossberg 1993; Kelly and Grossberg 2000; Kumaran, Geiger, and Gurvits 1996; Peterhans and
Heitger 2001; Roelfsema et al. 2002; Sajda and Finkel 1995; Thielscher and Neumann 2008;
Vecera and O’Reilly 1998; Williams and Hanson 1996). More relevantly, after the discovery of
BOWN-sensitive neurons (Zhou et al. 2000, see ‘Discovery of border-ownership-sensitive neu-
rons’), recent computational models particularly focus on modelling the responses of these
BOWN-sensitive neurons (Baek and Sajda 2005; Craft et al. 2007; Froyen, Feldman, and Singh
2010; Jehee, Lamme, and Roelfsema 2007; Kikuchi and Akashi 2001; Kikuchi and Fukushima 2003; Kogo et al. 2010; Layton, Mingolla, and Yazdanbakhsh 2012; Mihalas et al. 2011; Sakai and Nishimura 2006; Sakai et al. 2012; Zhaoping 2005). As described above, one of the prominent properties of figure-ground perception is its context sensitivity. While BOWN signals are assigned locally, their activity reflects the global configuration. How does the brain process such global information?
Neural Mechanisms of Figure-ground Organization 355
Fig. 17.6 (a) and (b) To reproduce the opposite perceived depth order of the images in Figures 17.1J and 17.1K, the global relationships between the BOWN signals need to be reflected. Computational models have to implement an algorithm for the global interaction so that the ownership at the location indicated by the black dot, for example, is on the left in (a) and on the right in (b). Note that the dashed lines here indicate the interactions and do not indicate direct axonal connections. (c) To create the convexity preference, an algorithm must enhance the BOWN signals that are ‘facing’ each other, as shown on the left. In this way, BOWN signals with an inward preference would be the winners, making the interior of the enclosed boundary the figure (right). (d) If the algorithm works in favour of the BOWN pairs directed outward, the outside of the boundary would be the figure (foreground), and the interior would become a hole (concavity preference). (e)–(g) BOWN computation and complex shapes. (e) When a surface with a complex shape is presented, a rule of ‘consistency’ in BOWN signals based on detecting the convexity relationship may be violated. In the algorithm, the pair of BOWN signals B0 and B1 are considered to be ‘in agreement’ while B0 and B2 are not. (f) The grouping cells group coherent BOWN signals within the relatively compact parts of the complex shape but may not group distal but consistent pairs (e.g. B0 and B2) in a complex shape. (g) The model that implemented the dynamic interaction between the skeleton signals and BOWN signals detects the ‘consistency’ of BOWN signals such as B0, B1, and B2, based on their association with the same skeleton.
356 Kogo and van Ee
The pair of BOWN signals shown in Figure 17.6C indicates the relationship between the BOWN signals of a convex region (inside being the figure). The pair in Figure 17.6D, on the other hand, indicates the relationship of BOWN signals for a concave surface (outside being the
figure, inside being a hole). To reproduce the convexity preference, the BOWN pairs for convexity
have to gain stronger mutual excitation than the BOWN pairs for concavity. The mutual excita-
tion and inhibition rules in Zhaoping’s model, the inner side preference in Jehee’s model, as well
as the geometric definition of agreement in Kogo’s model, all work in favour of the BOWN pairs
in the convex configurations. In Craft’s model, the BOWN signals’ vector components matching
the inward direction of the annulus enable grouping of BOWN signals that point to one another.
Hence, it also favours convex configurations. The convexity preference of the visual system, and its possible origin in ecological factors, echoes a central theme of Gestalt psychology (Kanizsa and Gerbino 1976; Koffka 1935; Rubin 1958). It is possible that the enclosure of the contours of individual objects
and the general tendency of finding convex shapes in the environment may have caused the visual
system to develop such biased processing.
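The shared idea behind these models (mutual excitation between BOWN signals that ‘face’ each other) can be sketched in a few lines. This is an illustrative toy, not any of the published models: the `facing` test and the normalized update rule are simplifying assumptions.

```python
def facing(p, q):
    """True if two BOWN signals point toward each other (a convex pair).
    Each signal is ((x, y), (nx, ny)): a border location plus a unit
    vector toward the side it claims as figure."""
    (px, py), (pnx, pny) = p
    (qx, qy), (qnx, qny) = q
    dx, dy = qx - px, qy - py
    # q must lie on p's owned side AND p on q's owned side
    return (dx * pnx + dy * pny) > 0 and (-dx * qnx - dy * qny) > 0

def settle(signals, steps=20, gain=0.05):
    """Iteratively strengthen BOWN signals supported by facing partners;
    total activity is renormalized, so convex pairs win in relative terms."""
    act = [1.0] * len(signals)
    for _ in range(steps):
        new = [act[i] + gain * sum(act[j] for j, q in enumerate(signals)
                                   if j != i and facing(p, q))
               for i, p in enumerate(signals)]
        total = sum(new)
        act = [a * len(new) / total for a in new]
    return act

# A unit square: each edge midpoint carries two candidate owners,
# inward-pointing (even indices) and outward-pointing (odd indices).
pts = [((0.5, 0.0), (0.0, 1.0)), ((0.5, 0.0), (0.0, -1.0)),
       ((0.5, 1.0), (0.0, -1.0)), ((0.5, 1.0), (0.0, 1.0)),
       ((0.0, 0.5), (1.0, 0.0)), ((0.0, 0.5), (-1.0, 0.0)),
       ((1.0, 0.5), (-1.0, 0.0)), ((1.0, 0.5), (1.0, 0.0))]
act = settle(pts)
assert all(act[i] > act[i + 1] for i in range(0, 8, 2))  # inside wins
```

Flipping the inequality in `facing` would instead favour outward-directed pairs, turning the enclosed region into a hole, as in Figure 17.6D.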
BOWN is not just about the computation of figure-ground organization with only one figural
surface present in the image. The model should be able to assign depth order for multiple surfaces
(Figure 17.1E). For this, the local configuration of a T-junction plays a key role. A T-junction
is created when three surfaces with different surface properties overlap. The existence of a
T-junction strongly suggests that the surface above the top of the T is the occluder and the stem of
the T belongs to one of the surfaces that are occluded. Depth order can be modelled by process-
ing the consistency of the occluder side according to this rule (Thielscher and Neumann 2008).
Zhaoping, Craft, Kogo, and Froyen’s models, mentioned above, implemented an algorithm to
reflect the configuration of T-junctions and are capable of computing depth order for overlapping
surfaces. A different model developed by Roelfsema et al. (2002) computes filling in of textured
surfaces by reflecting the increasing size of receptive fields in the hierarchy of the visual cortex,
but it is unknown how this model incorporates depth order implied by T-junctions (note that the
configuration of T-junctions is independent of surface size).
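The T-junction rule lends itself to a compact sketch: each junction contributes an ‘in-front-of’ constraint, and a topological sort over those constraints yields a depth order. Encoding junctions as (occluder, occluded) pairs is an assumption for illustration, not the representation used by any of the cited models.

```python
from collections import defaultdict

def depth_order(junctions):
    """Each T-junction says the surface on top of the T occludes the
    surface the stem belongs to. A topological sort turns consistent
    constraints into depth ranks (0 = nearest); cyclic (inconsistent)
    occlusion evidence raises ValueError."""
    above = defaultdict(set)          # occluder -> surfaces behind it
    nodes = set()
    for occluder, occluded in junctions:
        above[occluder].add(occluded)
        nodes |= {occluder, occluded}
    rank, visiting = {}, set()

    def visit(n):
        if n in rank:
            return rank[n]
        if n in visiting:
            raise ValueError("inconsistent occlusion cues")
        visiting.add(n)
        # a surface sits one layer nearer than the nearest surface it occludes
        r = min((visit(m) for m in above[n]), default=len(nodes)) - 1
        visiting.discard(n)
        rank[n] = r
        return r

    for n in nodes:
        visit(n)
    lo = min(rank.values())
    return {n: r - lo for n, r in rank.items()}

# A Figure 17.1E-style stack: A occludes B, B occludes C
assert depth_order([("A", "B"), ("B", "C")]) == {"A": 0, "B": 1, "C": 2}
```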
One of the challenges of the current theories of BOWN computation is how to create BOWN
signals properly in complex shapes. This demands further elaboration of current computa-
tional models. When an object such as shown in Figure 17.6E is presented, the figure-ground
organization is immediately clear. However, the consistency-detection algorithm implemented in, for example, Kogo’s DISC model does not yield coherent BOWN signals along the border of complex shapes. The BOWN signal at the black dot (B0) is in agreement with the one that points to it, e.g. B1. On the other hand, the BOWN signals far from it, e.g. B2, violate the ‘consistency’ rule, even though it is perceptually evident that they are in agreement. In Craft’s model, the grouping cells
with the annulus-shaped receptive field may detect the consistency of BOWN signals at close
distances within a complex shape (e.g. B0 and B1); nevertheless, the BOWN signals far apart
such as B0 and B2 would not be grouped by the grouping cells (Figure 17.6F). To detect con-
sistency of BOWN signals it may be necessary to group the grouping cells along the surface.
Although iterative computation of current models exhibits robustness to a certain extent, it
is unknown if their responses fully match human perception. The approach taken by Froyen
using the dynamic interactions of the BOWN signals and the skeleton signals may give a hint
as to how to solve this problem. As shown in Figure 17.6G, if BOWN signals belong to the
same skeleton, they are considered to be consistent (B0, B1, and B2 are all in agreement with the
skeleton of the surface).
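The skeleton-based idea can be illustrated with a toy association rule: a BOWN signal is linked to the skeleton it points at, and two signals count as consistent when they point at the same skeleton. The cone test (`min_cos`) and the point-list representation of skeletons are illustrative assumptions, not details of the published model.

```python
import math

def associate(signal, skeletons, min_cos=0.7):
    """Assign a BOWN signal ((x, y), (nx, ny)) to the skeleton containing
    the nearest point inside a cone around its owned-side normal;
    None if the signal points away from every skeleton."""
    (x, y), (nx, ny) = signal
    best, best_d = None, float("inf")
    for k, pts in enumerate(skeletons):
        for sx, sy in pts:
            dx, dy = sx - x, sy - y
            d = math.hypot(dx, dy)
            if d > 0 and (dx * nx + dy * ny) / d >= min_cos and d < best_d:
                best, best_d = k, d
    return best

def consistent(a, b, skeletons):
    """Two BOWN signals 'agree' when they point at the same skeleton."""
    ka, kb = associate(a, skeletons), associate(b, skeletons)
    return ka is not None and ka == kb

# one elongated bar whose medial axis runs along y = 0
bar = [[(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]]
b0 = ((0.0, -1.0), (0.0, 1.0))      # bottom border, owns upward
b2 = ((2.0, 1.0), (0.0, -1.0))      # distal top border, owns downward
b_out = ((0.0, -1.0), (0.0, -1.0))  # outward-pointing competitor
assert consistent(b0, b2, bar)
assert not consistent(b0, b_out, bar)
```

Even distal pairs (B0 and B2 in Figure 17.6G) come out as consistent here, because consistency is mediated by the shared skeleton rather than by direct pairwise geometry.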
The analysis of the onset latencies of BOWN-sensitive neural activity led von der Heydt’s
group to conclude that the BOWN signals are being grouped at a higher level with ‘grouping
cells’. Coincidentally, the research on shape recognition led to the development of the concept of
skeleton. Note that grouping cells are activated along the medial axis of the surface. This means
that the requirement of the BOWN signal grouping and the requirement of the shape representa-
tion have in fact merged into identical concepts. It is interesting to investigate whether the neural
activity that corresponds to the grouping and medial axis signals actually exists in the visual neu-
ral system. Lee et al. (1998) reported that the late modulation of neural activity in V1 (see ‘Brain
activity correlated to figure-ground organization and involvement of feedback’) shows a peak,
possibly reflecting the increased neural activity at the higher level associated with the centre of
the surface. They suggested that this corresponds to the medial axis computation. In more recent
work, Hung, Carlson, and Connor (2012) reported that neurons in macaque inferior temporal
cortex (IT) are tuned to the medial axis of a given object and Lescroart and Biederman (2013)
reported that fMRI signals become more and more tuned to the medial axis starting from V3
to higher processing levels in the visual cortex. These insights suggest that we are approaching an increasingly integrated view of the underlying neural mechanisms.
Discussion
This chapter commenced by describing the importance of assigning depth order at borders to
establish figure-ground organization. We then described that neurons in visual cortex show
responses corresponding to the perceived depth order at borders. Thus, the concept of edge
assignment, developed by behavioural studies, has a neural counterpart: the BOWN-sensitive
neurons. Insight into the underpinning neural activity and how this activity leads to figure-ground
perception is still developing.
BOWN signals may be considered to be binary signals in the sense that occlusion cues only
indicate depth order but not quantitative depth (unlike stereo disparity). Nevertheless, consider
configurations such as in Figure 17.1E and 17.1G. In Figure 17.1E, multiple surfaces overlap. The
perceived depth between the blue rectangle and the orange oval is smaller than the perceived
depth between the blue rectangle and the green rectangle. Furthermore, Figure 17.1G indicates
that, when there are inconsistent occlusion cues along a border, the depth difference along the border gradually changes. Whether the BOWN-sensitive signals in visual cortex reflect these quantitative differences, or whether these differences emerge after the BOWN signals have been integrated into the depth map, needs to be answered by future research.
Fig. 17.7 A computational model of bistable figure-ground perception. (a) It is assumed that the BOWN signal at each location is computed through global interaction. (b) The BOWN signals are sent, through the feedforward connections (FF), to the higher level, and are integrated to create the depth map. The result is then sent back, through the feedback connections (FB), to the BOWN computation layer. (c) The response of the model plotted as the depth difference between the face area and the vase area. Positive values indicate that the face perception is dominant and negative values indicate the vase perception. In the model, noise is added to the BOWN signals and hence the depth values fluctuate. Furthermore, an adaptation process and its recovery are implemented in the feedback signals. The iteration of the feedback system creates a strong ‘face’ response at first in this example. Due to adaptation, the response gradually weakens and the fluctuating response eventually reverses to the ‘vase’ response. Adaptation of the vase response then weakens that response while the face signals recover from adaptation, causing the perceptual switch again. Over a long time course, the model shows stochastic perceptual switching between the face and the vase responses.
Reprinted from Vision Research, 51(18), Naoki Kogo, Alessandra Galli, and Johan Wagemans, Switching dynamics of border ownership: A stochastic model for bi-stable perception, pp. 2085–98, Copyright (2011), with permission from Elsevier.
As described above, current computational models reflect the convexity bias that is also present
in perception. However, as shown in Figure 17.1I, this convexity preference can be overcome
by the consistency of the surface properties such as textures. Does the BOWN-sensitive neural
activity reflect this reversal of ownership to create the perception of holes? In more general terms,
the fact that some BOWN-sensitive neurons are also sensitive to luminance contrast (Zhou et al.
2000) suggests that they are capable of reflecting surface properties. For future research, it would
be important to study the role of the surface properties in the BOWN computation.
Neurons tuned as T-junction detectors have not been found in the visual cortex. It has been
suggested that end-stopped cells play a key role (Craft et al. 2007). Yazdanbakhsh and Livingstone
(2006) reported that end-stopped cells in V1 (macaque) are sensitive to the contrast of abut-
ting surfaces that create junctions. Whether these contrast-sensitive end-stopped cells act as
T-junction detectors that are connected to the depth-order computation process should be
answered by future research.
Although electrophysiological studies have shown that lower-level visual cortex is involved in face-vase perceptual bistability, no direct recordings of neural activity have been reported
that can be correlated to the perceptual switch. While the input signals are kept constant for the
face-vase stimulus, the ownership keeps changing. It is known that higher-level functions, such as
attention and familiarity of shape, can influence the switch. Examining the role of feedback modi-
fication of BOWN signals in perceptual bistability would give important insight into mechanistic
organization (see also Alais and Blake, this volume, for more discussion on bistable perception).
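The ingredients described for the model in Figure 17.7 (noise on the BOWN signals plus slow adaptation with recovery in a feedback loop) are enough to produce stochastic switching. The sketch below uses generic rivalry dynamics with two mutually inhibiting responses; the equations and constants are illustrative, not those of the published model.

```python
import random

def simulate(steps=6000, seed=3):
    """Two mutually inhibiting responses ('face' and 'vase'), each with
    additive noise and a slow adaptation variable. Adaptation of the
    dominant response eventually hands dominance to its rival."""
    random.seed(seed)
    r = {"face": 1.0, "vase": 0.9}   # fast responses
    a = {"face": 0.0, "vase": 0.0}   # slow adaptation states
    dominant = []
    for _ in range(steps):
        for k, other in (("face", "vase"), ("vase", "face")):
            drive = 1.0 - 2.0 * r[other] - a[k] + random.gauss(0.0, 0.05)
            r[k] = max(0.0, r[k] + 0.1 * (drive - r[k]))  # fast relaxation
            a[k] += 0.002 * (2.0 * r[k] - a[k])           # slow build-up/decay
        dominant.append("face" if r["face"] > r["vase"] else "vase")
    return dominant

dom = simulate()
switches = sum(dom[i] != dom[i - 1] for i in range(1, len(dom)))
# both percepts take their turn as the dominant one
assert set(dom) == {"face", "vase"} and switches >= 1
```

Because adaptation deterministically undermines whichever response is dominant, switching occurs even with weak noise; the noise mainly randomizes the moment of each switch, giving the stochastic dominance durations the model reproduces.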
To explain the short latency of the BOWN-sensitive components in neural responses, it has
been argued that BOWN signals must be grouped at a higher level. This opens up a new possibil-
ity in which the higher-level functions dynamically influence the BOWN signals. Whether such
grouping can be found, and where grouping is accomplished, remains to be answered. It is crucial
now, more than ever, to investigate how border detection, BOWN, depth order, shape detection, and other higher-level functions are organized through a dynamic feedback system.
The context sensitivity of figure-ground organization is the hallmark of Gestalt psychology.
We discussed how figure-ground perception emerges from the global configuration of the image.
This possibility invites future investigation of the neural mechanisms underlying the BOWN
computations.
References
Amir, Y., M. Harel, and R. Malach (1993). ‘Cortical Hierarchy Reflected in the Organization of Intrinsic
Connections in Macaque Monkey Visual Cortex’. Journal of Comparative Neurology 334(1): 19–46.
Appelbaum, L. G., A. Wade, V. Vildavski, M. Pettet, and A. Norcia (2006). ‘Cue-Invariant Networks
for Figure and Background Processing in Human Visual Cortex’. Journal of Neuroscience
26(45): 11695–11708.
Appelbaum, L. G., A. Wade, V. Vildavski, M. Pettet, and A. Norcia (2008). ‘Figure-Ground Interaction in
the Human Visual Cortex’. Journal of Vision 8(9).
Baek, K. and P. Sajda (2005). ‘Inferring Figure-Ground Using a Recurrent Integrate-and-Fire Neural
Circuit’. IEEE Transactions on Neural Systems and Rehabilitation Engineering 13(2): 125–130.
Baylis, G. C. and J. Driver (2001). ‘Shape-Coding in IT Cells Generalizes over Contrast and Mirror
Reversal, but not Figure-Ground Reversal’. Nature Neuroscience 4(9): 937–942.
Blum, H. (1973). ‘Biological Shape and Visual Science. I’. Journal of Theoretical Biology 38(2): 205–287.
Brincat, S. L. and C. E. Connor (2006). ‘Dynamic Shape Synthesis in Posterior Inferotemporal Cortex’.
Neuron 49(1): 17–24.
Bullier, J. (2001). ‘Integrated Model of Visual Processing’. Brain Research Reviews 36(2–3): 96–107.
Craft, E., H. Schutze, E. Niebur, and R. von der Heydt (2007). ‘A Neural Model of Figure-Ground
Organization’. Journal of Neurophysiology 97(6): 4310–4326.
Domijan, D. and M. Setic (2008). ‘A Feedback Model of Figure-Ground Assignment’. Journal of Vision
8(7): 1–27.
Fang, F., H. Boyaci, and D. Kersten (2009). ‘Border Ownership Selectivity in Human Early Visual Cortex
and its Modulation by Attention’. Journal of Neuroscience 29(2): 460–465.
Feldman, J. and M. Singh (2006). ‘Bayesian Estimation of the Shape Skeleton’. Proceedings of the National
Academy of Sciences 103(47): 18014–18019.
Felleman, D. J. and D. C. Van Essen (1991). ‘Distributed Hierarchical Processing in the Primate Cerebral
Cortex’. Cerebral Cortex 1(1): 1–47.
Finkel, L. H. and P. Sajda (1992). ‘Object Discrimination Based on Depth-from-Occlusion’. Neural
Computation 4(6): 901–921.
Froyen, V., J. Feldman, and M. Singh (2010). ‘A Bayesian Framework for Figure-Ground Interpretation’.
Advances in Neural Information Processing Systems 23: 631–639.
Girard, P., J. M. Hupé, and J. Bullier (2001). ‘Feedforward and Feedback Connections between Areas
V1 and V2 of the Monkey Have Similar Rapid Conduction Velocities’. Journal of Neurophysiology
85(3): 1328–1331.
Grossberg, S. (1993). ‘A Solution of the Figure-Ground Problem for Biological Vision’. Neural Networks
6(4): 463–483.
Hesselmann, G. and R. Malach (2011). ‘The Link between fMRI-BOLD Activation and Perceptual
Awareness is “Stream-Invariant” in the Human Visual System’. Cerebral Cortex 21(12): 2829–2837.
Hung, C.-C., E. T. Carlson, and C. E. Connor (2012). ‘Medial Axis Shape Coding in Macaque
Inferotemporal Cortex’. Neuron 74(6): 1099–1113.
Jehee, J. F., V. A. Lamme, and P. R. Roelfsema (2007). ‘Boundary Assignment in a Recurrent Network
Architecture’. Vision Research 47(9): 1153–1165.
Kanizsa, G. and W. Gerbino (1976). ‘Convexity and Symmetry in Figure-Ground Organization’. In Vision
and Artifact, edited by M. Henle, pp. 25–32. New York: Springer.
Kelly, F. and S. Grossberg (2000). ‘Neural Dynamics of 3-D Surface Perception: Figure-Ground Separation
and Lightness Perception’. Perception & Psychophysics 62(8): 1596–1618.
Kienker, P. K., T. J. Sejnowski, G. E. Hinton, and L. E. Schumacher (1986). ‘Separating Figure from
Ground with a Parallel Network’. Perception 15(2): 197–216.
Kikuchi, M. and Y. Akashi (2001). ‘A Model of Border-Ownership Coding in Early Vision’. In Artificial
Neural Networks—ICANN 2001, 2130, edited by G. Dorffner, H. Bischof, and K. Hornik, pp. 1069–1074.
Berlin, Heidelberg: Springer.
Kikuchi, M. and K. Fukushima (2003). ‘Assignment of Figural Side to Contours Based on Symmetry,
Parallelism, and Convexity’. In Knowledge-Based Intelligent Information and Engineering Systems, 2774,
edited by V. Palade, R. J. Howlett, and L. Jain, pp. 123–130. Berlin, Heidelberg: Springer.
Koffka, K. (1935). Principles of Gestalt Psychology. New York: Harcourt Brace & World.
Kogo, N., C. Strecha, L. van Gool, and J. Wagemans (2010). ‘Surface Construction by a 2-D
Differentiation-Integration Process: A Neurocomputational Model for Perceived Border Ownership,
Depth, and Lightness in Kanizsa Figures’. Psychological Review 117(2): 406–439.
Kogo, N., A. Galli, and J. Wagemans (2011). ‘Switching Dynamics of Border Ownership: A Stochastic
Model for Bi-Stable Perception’. Vision Research 51(18): 2085–2098.
Kourtzi, Z. and N. Kanwisher (2001). ‘Representation of Perceived Object Shape by the Human Lateral
Occipital Complex’. Science 293(5534): 1506–1509.
Kumaran, K., D. Geiger, and L. Gurvits (1996). ‘Illusory Surface Perception and Visual Organization’.
Network-Computation in Neural Systems 7(1): 33–60.
Lamme, V. A. (1995). ‘The Neurophysiology of Figure-Ground Segregation In Primary Visual Cortex’.
Journal of Neuroscience 15(2): 1605–1615.
Lamme, V. A., K. Zipser, and H. Spekreijse (1998). ‘Figure-Ground Activity in Primary Visual Cortex
is Suppressed by Anesthesia’. Proceedings of the National Academy of Sciences of the United States of
America 95(6): 3263–3268.
Lamme, V. A., V. Rodriguez-Rodriguez, and H. Spekreijse (1999). ‘Separate Processing Dynamics for
Texture Elements, Boundaries and Surfaces In Primary Visual Cortex of the Macaque Monkey’. Cerebral
Cortex 9(4): 406–413.
Lamme, V. A., H. Super, R. Landman, P. R. Roelfsema, and H. Spekreijse (2000). ‘The Role of Primary
Visual Cortex (V1) in Visual Awareness’. Vision Research 40(10–12): 1507–1521.
Layton, O. W., E. Mingolla, and A. Yazdanbakhsh (2012). ‘Dynamic Coding of Border-Ownership in
Visual Cortex’. Journal of Vision 12(13): 8, 1–21.
Lee, T. S., D. Mumford, R. Romero, and V. A. Lamme (1998). ‘The Role of the Primary Visual Cortex in
Higher Level Vision’. Vision Research 38(15–16): 2429–2454.
Lescroart, M. D. and I. Biederman (2013). ‘Cortical Representation of Medial Axis Structure’. Cerebral
Cortex 23(3): 629–637.
Levitt, J. B., D. C. Kiper, and J. A. Movshon (1994). ‘Receptive Fields and Functional Architecture of
Macaque V2’. Journal of Neurophysiology 71(6): 2517–2542.
Likova, L. T. and C. W. Tyler (2008). ‘Occipital Network for Figure/Ground Organization’. Experimental
Brain Research 189(3): 257–267.
Mihalas, S., Y. Dong, R. von der Heydt, and E. Niebur (2011). ‘Mechanisms of Perceptual Organization
Provide Auto-Zoom and Auto-Localization for Attention to Objects’. Proceedings of the National
Academy of Sciences of the United States of America 108(18): 7583–7588.
Nakayama, K., S. Shimojo, and G. H. Silverman (1989). ‘Stereoscopic Depth: Its Relation to Image
Segmentation, Grouping, and the Recognition of Occluded Objects’. Perception 18(1): 55–68.
Parkkonen, L., J. Andersson, M. Hämäläinen, and R. Hari (2008). ‘Early Visual Brain Areas Reflect the
Percept of an Ambiguous Scene’. Proceedings of the National Academy of Sciences of the United States of
America 105(51): 20500–20504.
Peterhans, E. and F. Heitger (2001). ‘Simulation of Neuronal Responses Defining Depth Order and
Contrast Polarity at Illusory Contours in Monkey Area V2’. Journal of Computational Neuroscience
10(2): 195–211.
Peterson, M. A., E. M. Harvey, and H. J. Weidenbacher (1991). ‘Shape Recognition Contributions to
Figure-Ground Reversal: Which Route Counts?’ Journal of Experimental Psychology: Human Perception
and Performance 17(4): 1075–1089.
Peterson, M. A. and B. S. Gibson (1993). ‘Shape Recognition Inputs to Figure-Ground Organization in
Three-Dimensional Displays’. Cognitive Psychology 25(3): 383–429.
Peterson, M. A. (1999). ‘What’s in a Stage Name? Comment on Vecera and O’Reilly (1998)’. Journal of
Experimental Psychology: Human Perception and Performance 25(1): 276–286.
Peterson, M. A. and E. Salvagio (2008). ‘Inhibitory Competition in Figure-Ground Perception: Context and
Convexity’. Journal of Vision 8(16): 1–13.
Pitts, M. A., A. Martínez, J. B. Brewer, and S. A. Hillyard (2011). ‘Early Stages of Figure-Ground
Segregation during Perception of the Face-Vase’. Journal of Cognitive Neuroscience 23(4): 880–895.
Pitts, M. A., J. L. Nerger, and T. J. R. Davis (2007). ‘Electrophysiological Correlates of Perceptual
Reversals for Three Different Types of Multistable Images’. Journal of Vision 7(1): 6, 1–14.
Polimeni, J. R., M. Balasubramanian, and E. L. Schwartz (2006). ‘Multi-Area Visuotopic Map Complexes
in Macaque Striate and Extra-Striate Cortex’. Vision Research 46(20): 3336–3359.
Poort, J., F. Raudies, A. Wannig, V. A. Lamme, H. Neumann, and P. R. Roelfsema (2012). ‘The Role
of Attention in Figure-Ground Segregation in Areas V1 and V4 of the Visual Cortex’. Neuron
75(1): 143–156.
Qiu, F. T. and R. von der Heydt (2005). ‘Figure and Ground in the Visual Cortex: V2 Combines Stereoscopic Cues with Gestalt Rules’. Neuron 47(1): 155–166.
Qiu, F. T., T. Sugihara, and R. von der Heydt (2007). ‘Figure-Ground Mechanisms Provide Structure for
Selective Attention’. Nature Neuroscience 10(11): 1492–1499.
Roelfsema, P. R., V. A. Lamme, H. Spekreijse, and H. Bosch (2002). ‘Figure-Ground Segregation in a
Recurrent Network Architecture’. Journal of Cognitive Neuroscience 14(4): 525–537.
Rubin, E. (1921). Visuell wahrgenommene Figuren. Copenhagen: Gyldendalske Boghandel.
Rubin, E. (1958). ‘Figure and Ground’. In Readings in Perception, edited by D. Beardslee, pp. 35–101.
Princeton: Van Nostrand.
Sajda, P. and L. H. Finkel (1995). ‘Intermediate-Level Visual Representations and the Construction of
Surface Perception’. Journal of Cognitive Neuroscience 7(2): 267–291.
Sakai, K. and H. Nishimura (2006). ‘Surrounding Suppression and Facilitation in the Determination of
Border Ownership’. Journal of Cognitive Neuroscience 18(4): 562–579.
Sakai, K., H. Nishimura, R. Shimizu, and K. Kondo (2012). ‘Consistent and Robust Determination of
Border Ownership Based on Asymmetric Surrounding Contrast’. Neural Networks 33: 257–274.
Scholte, S., J. Jolij, J. Fahrenfort, and V. Lamme (2008). ‘Feedforward and Recurrent Processing in Scene
Segmentation: Electroencephalography and Functional Magnetic Resonance Imaging’. Journal of
Cognitive Neuroscience 20(11): 2097–2109.
Sugihara, T., F. T. Qiu, and R. von der Heydt (2011). ‘The Speed of Context Integration in the Visual
Cortex’. Journal of Neurophysiology 106(1): 374–385.
Supèr, H., H. Spekreijse, and V. A. Lamme (2001). ‘Two Distinct Modes of Sensory Processing Observed in
Monkey Primary Visual Cortex (V1)’. Nature Neuroscience 4(3): 304–310.
Supèr, H., C. van der Togt, H. Spekreijse, and V. A. Lamme (2003). ‘Internal State of Monkey Primary
Visual Cortex (V1) Predicts Figure-Ground Perception’. Journal of Neuroscience 23(8): 3407–3414.
Supèr, H. and V. A. Lamme (2007). ‘Altered Figure-Ground Perception in Monkeys with an Extra-Striate
Lesion’. Neuropsychologia 45(14): 3329–3334.
Thielscher, A. and H. Neumann (2008). ‘Globally Consistent Depth Sorting of Overlapping 2D Surfaces in
a Model Using Local Recurrent Interactions’. Biological Cybernetics 98(4): 305–337.
Vecera, S. P. and R. C. O’Reilly (1998). ‘Figure-Ground Organization and Object Recognition Processes: An
Interactive Account’. Journal of Experimental Psychology: Human Perception and Performance
24(2): 441–462.
Vecera, S. P. and R. C. O’Reilly (2000). ‘Graded Effects in Hierarchical Figure-Ground Organization: Reply
to Peterson (1999)’. Journal of Experimental Psychology: Human Perception and Performance
26(3): 1221–1231.
Williams, L. R. and A. R. Hanson (1996). ‘Perceptual Completion of Occluded Surfaces’. Computer Vision
and Image Understanding 64(1): 1–20.
Windmann, S., M. Wehrmann, P. Calabrese, and O. Gunturkun (2006). ‘Role of the Prefrontal Cortex in
Attentional Control over Bistable Vision’. Journal of Cognitive Neuroscience 18(3): 456–471.
Yazdanbakhsh, A. and M. S. Livingstone (2006). ‘End Stopping in V1 is Sensitive to Contrast’. Nature
Neuroscience 9(5): 697–702.
Zhang, N. and R. von der Heydt (2010). ‘Analysis of the Context Integration Mechanisms Underlying
Figure-Ground Organization in the Visual Cortex’. Journal of Neuroscience 30(19): 6482–6496.
Zhaoping, L. (2005). ‘Border Ownership from Intracortical Interactions in Visual Area V2’. Neuron
47(1): 143–153.
Zhou, H., H. S. Friedman, and R. von der Heydt (2000). ‘Coding of Border Ownership in Monkey Visual
Cortex’. Journal of Neuroscience 20(17): 6594–6611.
Chapter 18
Introduction
A little over a century ago Sherrington (1906) established the concept of the receptive field in neu-
rophysiology. This was taken into the visual system by Hartline (1938) and Kuffler (1953), elabo-
rated into simple, complex and other classes of neurons by Hubel and Wiesel (1977), and elevated
into a neural doctrine by Barlow (1972). Central among the properties that emerged by study-
ing receptive fields is orientation selectivity. This became an organizing principle for explaining
boundary perception (among other visual features) (Hubel and Wiesel 1979), and much of mod-
ern visual neurophysiology is built on these foundations. So are substantial parts of computational neuroscience: networks of neurons whose properties are defined by receptive fields are taken to define the machinery that supports boundary inference.
A little less than a century ago Gestalt psychologists discovered a very different aspect of bound-
ary perception. Rubin (1915) produced a striking example of a reversible figure (Figure 18.1a). It
consists of black and white regions: in one organization the goblet becomes the figure and the dark
regions the background; in the other organization the dark faces become figure(s) and the white
region background. Figure and ground provided one part of the foundation for the Gestalt laws
of perceptual organization.
Rubin’s figure opened the door into a subtle property of boundaries: border ownership (Koffka
1935). In words, boundaries belong to the figure and not the ground. As the Rubin figure alternates,
so do the regions perceived as figure and ground, and so does the property of border ownership.
The entire process seems automatic, fast, and effortless. Paradoxically, while the figure/ground
and border ownership are alternating, the boundary remains fixed in retinal position: regardless
of which figural organization is perceived, the boundary contour passes through the same image
locations. It may, however, vary in apparent depth.
Understanding border ownership is important for understanding vision. At the top level is
the integration of the phenomenology with neural computation. But looking deeper reveals a
kind of catch-22 inherent in these computations: while borders define the figures they enclose,
border ownership depends on the figure. Cells with orientation-selective receptive fields signal
local information; border ownership requires global (figural) information. This observation has
enormous implications for the definition of a visual receptive field and for understanding visual
computations more generally.
The challenge for understanding border ownership is to break this mutual dependence.
Figure 18.1b illustrates how subtle this can be. The concept of figure is a difficult one to pin down,
and often it is related to surfaces and the many different facets of objects (Nakayama and Shimojo 1992). This example (Gordon and Shapley 1985) shows how adept we are at perceiving smooth surfaces (and their shading) even when none is present! In a related observation, the apparent position of the light source shifts in Figure 18.1a (Langer and Zucker 1997).
364 Zucker
Fig. 18.1 Different “sides” of border and figural phenomena in perceptual organization. (a) Rubin’s vase: the fixed border is perceived as belonging to the figure, not the background. Border ownership switches with the figure/ground reversal, as does the position of an apparent light source.
(Reprinted from Computer Vision and Image Understanding, 85(2), Michael S. Langer and Steven W. Zucker, Casting Light on Illumination: A Computational Model and Dimensional Analysis of Sources, pp. 322–35. Copyright © 1997 with permission from Elsevier).
(b) Borders can induce apparent shape from shading, although the disc is constant in brightness. I thank R. Shapley for this figure.
(Reproduced from Perception and Psychophysics, 37(1), pp 84–88, Nonlinearity in the perception of form, Robert Shapley and James Gordon, Copyright © 1985, Springer-Verlag. With kind permission from Springer Science and Business Media).
(c,d) In some cases borders can be too complicated to induce global figures.
algorithm to solve it nor on its implementation. Although there may be many algorithms that
solve a given problem, and many ways to implement a particular algorithm, it may be precisely
the details of “implementation” (Figure 18.2) that provide the clue to understanding the problem.
Intuition from one level can inform modeling at another.
The challenge for understanding border ownership, in particular, is that any explanation must
in principle span all of these levels. The question is how to use them to help define the problem.
To make these general claims concrete, this chapter contrasts two lines of investigation. The first
abstracts neural computation in geometric terms. We start with finding those contours that com-
prise borders, and build the ideas into surface inference via stereo and shading analysis. Although
the circuit models (and mathematics) become more complex, the path through these different
inference tasks displays a common thread. In effect (all) different possibilities are present in a kind
of distributed code, and local conditions select from among them. The principle of good continu-
ation dominates, and global configurations are built from local ones. This defines one of the major
aspects of visual processing.
Border ownership, we argue, is different. Whether a figure is indicated (at a boundary position)
or not is a choice driven not by geometrical good continuation but rather by whether a border
exists that could enclose something. The details do not matter (very much), and global considerations
drive local ones. Instead of geometry the question is more one of topology, although in a softer
sense than the notion carries in mathematics. This can be thought of as a different aspect of
visual processing.
While distinct, these two aspects of visual modeling are not uncoupled, and therein, I believe,
lies the real challenge of border ownership. It is not just a question of integrating top-down with
bottom-up (Ullman et al. 2002); it is a question of how to do this without getting lost in the myriad
combinatorial possibilities that arise.
Our goal in this chapter is to help the reader find a path through these different possibilities. In
the end we develop a conjecture about border ownership, neural networks, and local fields that
Fig. 18.2 Biological levels of explanation for perception vary with scale. (a) At the most macroscopic
scale, the visual system involves nearly half of the primate cortex plus sub-cortical and retinal
structures. (b) The first two cortical visual areas, V1 and V2, are shown. The existence of feedforward
and feedback connections between them establishes the networked nature of visual processing.
(c) Within each visual area are layers of neural networks, with neural projections between cells
in a layer and between layers. We shall abstract such networks into a columnar organization.
(d) Networks among neurons are established at synapses. Rarely considered in neural modeling is the
presence of glia (a portion of one of which, an astrocyte, is shown). These non-neuronal cells will be
important when we consider models for border ownership. (e) Finally, there are neurotransmitters,
modulators and other mechanisms at the biophysical level. The tradition in modeling is to
concentrate at (c), the neural networks level, but thinking about all levels can inspire theories.
366 Zucker
could provide a principled approach to doing this. But it is only one way of putting the different
ingredients together. As we hope becomes clear, border ownership is a challenge and a goal that
drives one to consider: What are the general themes that guide perceptual organization, and at
what level should they be described? We start with a review of the border ownership problem.
Fig. 18.3 The combinatorial complexity relating receptive fields and border ownership. (a) A dark figure
on a white background and (b) a white figure on a dark background present identical local patterns
to a neuron (small ellipse denotes receptive field). The border ownership response (Zhou et al. 2000):
those neurons preferring a dark figure, for example, would respond more vigorously to pattern (a) than
to (b); others might prefer light figures; and still others might not be border-ownership selective at
all. The light-dark pattern within the receptive field does not change, only the global arrangement of
which it is a part. (c,d) Other variations should elicit similar responses. The difficulty is to develop a circuit that
not only provides a border ownership response, but does so in a manner that is invariant to the global
completion.
Data from Hong Zhou, Howard S. Friedman, and Rüdiger von der Heydt, Coding of Border Ownership in Monkey
Visual Cortex, The Journal of Neuroscience, 20(17), pp. 6594–6611, 2000.
beginning. Viewing good continuation geometrically provides very powerful tools for analysis,
which can be extended onto surfaces, thus opening the door to areas such as stereo correspondence
and even shape-from-shading. Thinking of these tasks from the perspective of perceptual organiza-
tion provides a refreshing relationship among them. We review briefly three steps along this path.
Boundary detection seems straightforward. It is known that visual cortex contains neurons selec-
tive for different orientations, with each position covered by cells tuned to each orientation (Figure
18.4a,b). This suggests a classical approach: simply convolve an operator modeling an orientation-
selective receptive field against the image, simulating the neurons’ responses, and choose those with
high values. Unfortunately these purely local approaches simply do not work. Noise, additional
microstructure in the image, and the properties of object reflectance conspire to alter the responses
from the ideal. Some additional interactions are required, and this becomes our first view of local
and global interactions in boundary inference. (Later, when considering the units comprising a bor-
der-ownership model, we shall be forced to question this filtering view of receptive fields as well.)
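The classical filtering step just described can be sketched directly. The Gabor parameters, image size, and noise level below are illustrative assumptions, not values from the text:

```python
# A sketch of the "classical" local approach: convolve oriented Gabor-like
# receptive-field models with the image and keep, at each pixel, the
# orientation giving the strongest response. All parameters are illustrative.
import numpy as np
from scipy.signal import fftconvolve

def gabor(theta, size=15, freq=0.2, sigma=3.0):
    """Odd-symmetric Gabor: a simple model of an oriented receptive field."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    yr = -x * np.sin(theta) + y * np.cos(theta)   # coordinate across the stripes
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.sin(2 * np.pi * freq * yr)

# A noisy vertical luminance edge.
rng = np.random.default_rng(0)
img = np.zeros((64, 64))
img[:, 32:] = 1.0
img += 0.3 * rng.standard_normal(img.shape)

thetas = np.linspace(0, np.pi, 8, endpoint=False)
responses = np.stack([fftconvolve(img, gabor(t), mode='same') for t in thetas])
winner = thetas[np.argmax(np.abs(responses), axis=0)]  # winner-take-all orientation
```

Along the edge the winning orientation is the edge's own (thetas[4], i.e. pi/2 in this parameterization), but in the blank regions the "winners" are dictated entirely by noise: exactly the failure of purely local operators that motivates the contextual interactions discussed next.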
Exploiting the functional organization of visual cortex, grouping those neurons whose classical receptive
field centers overlap yields a columnar model for the superficial (upper) layers of visual cortex, V1
Fig. 18.4 Detection of local boundary signals. (a) Individual neurons in visual cortex are selective to dark/
bright pattern differences in the visual field; this is depicted by the (b) Gabor model of a receptive field.
Since such local measurements are noisy, contextual consistency along a boundary can be developed
geometrically. This involves circuits of neurons (c) that possess both local and long-range horizontal
connections. (d) Orientation columns abstract the superficial layers of V1. Rearranging the anatomy
yields groups of neurons (a column) selective for every possible orientation at each position in the visual
array. These columns are denoted by the vertical lines, indicating that at each retinotopic (x, y)-position
all (θ)-orientations are represented. Long-range horizontal connections define circuits among these
neurons, enforcing consistent firing among those (e) representing the orientations along a putative
contour. Geometry enters when we interpret an orientationally-selective cell’s response as signaling
the tangent to a curve. This tangent can in effect be transported along an approximation to the curve
(indicated as the osculating circle) to a nearby position. Compatible tangents agree in position and
orientation. (f) The transport operation can be “hardwired” in the long range connections, shown as
the “lift” of an arc of (osculating) circle in the (x, y)-plane into a length of helix in (x, y, θ) coordinates.
The result is a model for connection patterns in visual cortex indicating (g) straight, (h) small curvature,
or (i) high curvature.
Reproduced from Steven Zucker and Ohad Ben-Shahar, Geometrical computations explain projection patterns
of long-range horizontal connections in visual cortex Neural Computation, 16:3 (March , 2004), pp. 445–476
© 2004 Massachusetts Institute of Technology.
Border Inference and Border Ownership 369
(Hubel and Wiesel 1977). Although a mathematical simplification, it is useful for organizing com-
putations. In Figure 18.4d such orientation columns are denoted by vertical lines, indicating that
at each (x,y)-position in the retinotopic array (a discrete sampling of) all (θ) orientations are
represented.
We concentrate on these upper layers, and sketch several of the anatomical projections to and
from them. This, of course, is only a rough sampling (Casagrande and Kaas 1994, Douglas and
Martin 2004) of the many layers of visual processing (Felleman and Essen 1991).
1 Feedforward projections from layer 4 to layers 2/3 build up the local response properties.
These are likely supported by local circuits within layers 4 and layers 2/3 as well (Miller 2003;
Sompolinsky and Shapley 1997). Superficial V1 also has an organization into cytochrome
oxidase blobs and interblob areas, a distinction we shall not pursue in this chapter.
2 Long-range horizontal connections (LRHCs) (Rockland and Lund 1982; Bosking et al. 1997;
Angelucci et al. 2002; Figure 18.4c) define circuits among layer 2/3 neurons. Anatomical
studies reveal that these intrinsic connections are clustered (Gilbert and Wiesel 1983) and
orientation-dependent (Bosking et al. 1997), leading many to believe that consistent firing
among neurons in such circuits specifies the orientations along a putative contour (Kapadia
et al. 1995; Zucker et al. 1989; Field et al. 1993). This, in effect, uses context (along the contour)
to remove noisy responses that are inconsistent with their neighbors’ responses. It could also
reinforce weak or missing responses blocked by image structure.
3 Feedforward projections from layers 2/3 in V1 to higher visual areas (Salin and Bullier 1995;
Angelucci et al. 2002). V2, for example, has an elaborate organization into subzones as well,
including the thin, thick, and pale stripe areas (Roe and Ts’o 1997).
4 Feedback projections from higher visual areas to earlier visual areas (Rockland and Virga
1989; Angelucci et al. 2002). The structure of these feedback signals will be a significant
feature of models for border ownership, and is discussed in more detail later. For now we
emphasize that these feedback connections are patchy rather than targeted (Shmuel et al.
2005; Muir et al. 2011).
We now discuss the LRHCs, because these are so naturally associated with boundary processing
(Adini et al. 1997). We concentrate on geometric properties to emphasize the connection
to good continuation. For a discussion of psychophysical properties, see Elder and Singh, this
volume. A model is sketched for V1 (Ben-Shahar and Zucker 2003) that predicts the first and
second order statistics of LRHC’s (Bosking et al. 1997). It could also subserve contrast integration
(Bonneh and Sagi 1998) and, over a larger scale, model (some of) the projections to V2 (Zucker
et al. 1989). As we show, however, these are insufficient for the border ownership problem, which
will require us to think more carefully about feedback projections.
Differential geometry provides a formalization of good continuation over short distance scales.
It specifies how orientations align along a contour. Interpreting the orientationally-selective cell’s
response as signaling the tangent to a curve, this tangent can be transported along an approxima-
tion to the curve (indicated as the osculating circle) to a nearby position.
Compatible tangents are those that agree with sufficient accuracy in position and orientation
following transport; this is co-circularity. The transport operation can be embedded in the long
range connections, and realized both geometrically (Figure 18.4f or in the retinotopic plane
(Fig. 18.4g,h,i. As we shall describe, many models of border ownership are based on similar ideas,
although it is the topological orientation (toward inside or outside of the figure) that is communi-
cated via the long-range horizontal projections.
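The transport/compatibility test has a compact closed form under the osculating-circle (co-circularity) construction: the transported orientation at q is the reflection of theta_p about the chord joining p to q. A minimal sketch; the 10-degree tolerance is an illustrative assumption:

```python
import numpy as np

def cocircular_orientation(p, theta_p, q):
    """Transport the tangent at p to q along the circle through p and q that is
    tangent to theta_p at p: the co-circular orientation at q is the reflection
    of theta_p about the chord direction."""
    phi = np.arctan2(q[1] - p[1], q[0] - p[0])   # chord direction p -> q
    return (2 * phi - theta_p) % np.pi           # orientations live mod pi

def compatible(p, theta_p, q, theta_q, tol=np.deg2rad(10)):
    """Two oriented responses are compatible (good continuation) if theta_q
    agrees with the transported orientation to within tol."""
    d = abs(cocircular_orientation(p, theta_p, q) - theta_q) % np.pi
    return min(d, np.pi - d) < tol
```

A circuit would then excite pairs of cells satisfying compatible(...) via the long-range horizontal connections and leave incompatible pairs unsupported; for example, a vertical tangent at (1, 0) and a horizontal tangent at (0, 1) lie on a common circle and pass the test.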
Sometimes complexity can reveal simplicity, and by lifting contours from the image into cor-
tical coordinates we show how Wertheimer’s (1923) original demonstration of the Principle of
Good Continuation simplifies. Crossing curves become simple in cortical coordinates (Figure
18.5). The intuition is that, like inertial motion of an object, things tend to keep going in the direc-
tion they were going. Only now it is in a geometric space (Parent and Zucker 1989; Sarti et al.
2008). At a discontinuity there are multiple orientations at the same position. They signal what
often amounts to a monocular occlusion event (Zucker et al. 1989); a contour ending can signal a
cusp (Lawlor et al. 2009).
It is important to note that not all discontinuities are visible, especially when individual con-
tours combine into a texture. Figure 18.5d shows what appears as a wavy surface behind occlud-
ers. Classical amodal completion (Kanizsa 1979) works to suggest a smooth surface even when
there are different numbers of stripes in each zone. (Such dense patterns will be relevant for shad-
ing analysis, shortly.)
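The lift is easy to demonstrate numerically. Assuming a lemniscate as a stand-in for Wertheimer's "figure 8" (an illustrative choice), the two passes through the crossing point carry different tangent orientations, so they occupy distinct theta-levels in (x, y, theta):

```python
import numpy as np

# Parametrize a figure-8 (lemniscate of Gerono); it crosses itself at the origin.
t = np.linspace(0, 2 * np.pi, 2000, endpoint=False)
x = np.cos(t)
y = 0.5 * np.sin(2 * t)

# The lift assigns each point its tangent orientation theta (mod pi),
# producing a curve in (x, y, theta) "cortical" coordinates.
dx = np.gradient(x, t)
dy = np.gradient(y, t)
theta = np.arctan2(dy, dx) % np.pi

# The two passes through the crossing point carry different orientations,
# so in (x, y, theta) the branches sit at different heights and never touch.
i1 = np.argmin(np.abs(t - np.pi / 2))       # first pass through the origin
i2 = np.argmin(np.abs(t - 3 * np.pi / 2))   # second pass
print(theta[i1], theta[i2])                  # approx pi/4 and 3*pi/4
```

Both indices correspond to (x, y) near (0, 0), yet the lifted orientations differ by pi/2, which is why the crossing separates cleanly in cortical coordinates (Figure 18.5c).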
Fig. 18.5 Good continuation in (x, y, θ)-space explains why the “figure 8” in (a) is not seen as (b)
two “just touching” closed contours. The lift separates the crossing point into two distinct levels
(c), one corresponding to the lower orientation and the other to the higher value of orientation.
The lift further provides an early representation of corners and junctions, for example at points
of monocular occlusion. (d) For textures there is completion across occluders, even though there
are different numbers of contours in each segment; this is relevant to texture and shading flow
continuations.
Fig. 18.6 The stereo problem for space curves. (a, b) Tree branches meander through depth and
may appear in different ordering when projected into the left and right eyes (highlighted box).
(c) Color-coded depth along the branches. In early visual areas the boundaries of these branches
are complicated arrangements of short line segments (tangents) inferred from the left and right
images. Notice the smooth variation of depth along the branches, even though they occasionally
cross one another. (d) Geometry of stereo correspondence: pairs of projected image tangents need
to be coupled to reveal a tangent in space. Good continuation (in space) then amounts to good
continuation among pairs of (left, right) tangents. (e) The stereo problem for surfaces can be posed in
similar terms, except now the surface normal drives the computation.
Reproduced from International Journal of Computer Vision, 69(1), pp 59–75, Contextual Inference in Contour-
Based Stereo Correspondence, Gang Li and Steven W. Zucker, Copyright ©2006, Kluwer Academic Publishers.
With kind permission from Springer Science and Business Media.
The curvature of the body is the betrayer, light and shadow are
its accomplices.
(Metzger 2006, p. 107).
Although the Gestalt psychologists realized intuitively that the inference of shape from shad-
ing information involved some of the same ideas as good continuation, to our knowledge it is
rarely approached in that fashion. Instead the stage was set initially by Ernst Mach in the 1860s
(see Ratliff 1965) and taken up with enthusiasm in computer vision (Horn and Brooks 1989).
However, none of these approaches involved perceptual organization; they were based either
on a first-order differential equation or on regularization techniques. We now sketch a percep-
tual organization approach to inferring shape from shading information, based on the model in
Kunsberg and Zucker (2014) and Kunsberg and Zucker (2013), to provide a flavor of how general
geometric good continuation can be.
In each of the previous problems good continuation was used to provide constraints between
nearby possible interpretations—e.g., how nearby orientations behave along a curve with each
interpretation deriving from an image measurement. For the inference of shape from shading
information, we start with the cortical representation of the shading (Figure 18.7a). Ideally, cells
tuned to low spatial frequencies will respond maximally when, e.g. the excitatory receptive field
domain is aligned with the brighter pixels; the inhibitory domain of an oriented receptive field will
then align with the darker regions. These maximal-responding cells define the shading flow field
in cortical space (Breton and Zucker 1996).
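A hedged numerical sketch: the shading flow field is the field of isophote orientations, i.e. the direction perpendicular to the brightness gradient at each pixel. The Lambertian bump and overhead light source below are assumptions chosen so that the isophotes are concentric circles:

```python
import numpy as np

# Compute a shading flow field as the isophote orientation at each pixel:
# perpendicular to the image gradient. The surface and lighting are
# illustrative assumptions, not the chapter's stimuli.
y, x = np.mgrid[-1:1:129j, -1:1:129j]
z = np.exp(-(x**2 + y**2) / 0.3)                  # a smooth bump surface
zy, zx = np.gradient(z, y[:, 0], x[0, :])         # surface slopes
I = 1.0 / np.sqrt(1.0 + zx**2 + zy**2)            # Lambertian, light overhead

Iy, Ix = np.gradient(I, y[:, 0], x[0, :])
flow = np.arctan2(Ix, -Iy) % np.pi                # isophote orientation (mod pi)

# On the positive x-axis the circular isophote runs vertically:
r, c = 64, 100                                    # the pixel at y = 0, x = 0.5625
print(flow[r, c])                                 # approx pi/2
```

A cortical implementation would read the same orientations off the maximally responding low-spatial-frequency cells rather than off explicit gradients, but the resulting flow field is the same object.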
Corresponding to this shading flow is an illuminated surface, and therein lies the heart of the
difficulty: the surface is situated in 3D space, the light source is situated in 3D space (relative to the
surface and the viewer) but the image is only 2D. Solving this inverse problem requires assumptions
both about how images are formed and about what types of surfaces exist in the world.
The trick is to think about what happens on the surface when you move through the shading
flow field. Taking a step in the direction signaled by a cell amounts to taking a step along an iso-
phote on the surface. For Lambertian reflectance, this implies that the tangent plane (to the sur-
face) has to rotate precisely so the brightness remains constant. Or, moving normal to the shading
flow implies the brightness gradient must be changing in another measurable fashion (contrast).
Together these constraints on the flow changes correspond to changes in the surface curvatures,
revealing a family of possible surface patches for each patch of shading flow (Figure 18.7). This
provides the “column” of possible local surface patches, analogous to the column of possible ori-
entations at a position for contours. Boundary and interior conditions could then select from
among these, just as the induced boundary contrast yielded a shape percept in Figure 18.1b.
Fascinatingly, understanding shape-from-shading also illuminates other aspects of boundaries
that we enjoy in art and drawings (see DeCarlo et al. 2003).
Fig. 18.7 (a) The shading flow field: the tangents to the isophotes in the image. (b) Orientation
hypercolumns represent the shading flow; each patch of flow is consistent with a family of
possible local surfaces.
Good continuation could be thought of as selecting from among these possibilities according to linking
constraints. For contours it was co-circularity; for stereo it was pairs of (left, right) oriented
binocular responses; and finally the shading flow and surface patches. Curvature provided
the constraint in each case, dictating how the pieces could be glued together. The whole, in effect,
is built up by assembling the pieces in concert with their neighbors. Things fit together like a jig-
saw puzzle; and the different puzzles fit together at a higher level; it is all beautifully coupled into
one large network.
Border ownership, we assert, is different. It requires feedback from beyond geometric neigh-
bors and includes whole assemblies of cells. Neural action-at-a-distance affects local decisions,
and this action has to do with the global arrangement of boundary fragments; that is, with figural
properties.
We now speculate on which aspects of neural systems could play a fundamental role in the
solution of the border ownership computation. We discuss two main classes of models: those
in which the global information is obtained by a propagation process, and those in which
global information is conveyed back to local decisions by downward propagation of information
from higher visual areas to lower ones. Both classes raise interesting theoretical questions that can
be related to topology. The first class deals with the question of whether a contour is orientable; the
second with whether a surface is contained. For reasons developed below, we believe the second
class is more appropriate to border ownership computations.
A combinatorial problem arises at the heart of these “topological” computations, and this
demands special consideration. It was already hinted at in Figure 18.3: how can the feedback
connections be “wired up” so that the many possible completions all support the same border-
ownership neuron consistently? Trying to learn all possible connections seems wasteful, if not
infeasible; that level of detail seems inappropriate. Rather, some type of generalized shape feed-
back seems more suitable, one that provides a figure signal without details.
A conjecture about this general figure problem is the final topic covered. It involves a local field
potential whose value signals certain key properties of distant boundaries. While this breaks the
central paradox of border ownership, it is highly speculative. It is included in the spirit of trying to
start a discussion about whether “standard” approaches to neural computation, such as those just
discussed for good continuation, suffice. Among the questions raised are the following: how are
feedforward, feedback, and lateral connections coordinated? Does neural computation involve
only neurons, or should the surrounding substrate be included as well? And finally, given this
larger picture, should the classical—or even the extra classical—version of receptive field give
way to more general computational structures? This is where we confront the levels issue raised
in Figure 18.2.
Fig. 18.8 Neural models for computing border ownership. (a) Topological indicators or their proxy (e.g.,
the bright side of a boundary) could be propagated along a contour by utilizing long-range horizontal
connections (b) within an area. To establish closure it is necessary to go “all the way around” the
figure, however, which takes too long in neural terms. (c) Feedback integrating boundary information
from higher areas (d) could provide information about the existence of a figure, for example when
a circular arrangement of edge detectors feeds back to a single integrating “grouping” neuron G to
approximately signal the square figure (Craft et al. 2007). (e) To specify the correct grouping neurons
is combinatorially difficult for complex shapes; there are many interior “balls” that could
provide feedback. (f) The distance map (here shown in the negative) is the foundation for such shape
descriptions. Peaks (or valleys in this case) are the points most distant from the boundary; their locations
define the skeleton of the shape.
Data from Edward Craft, Hartmut Schütze, Ernst Niebur, and Rüdiger von der Heydt, A Neural Model of Figure–
Ground Organization, Journal of Neurophysiology, 97(6), pp. 4310–4326, DOI: 10.1152/jn.00203.2007, 2007.
pointing in the y-direction. Now, holding tight, after walking around the circle completely the
orientation of the arrow would be the same. But doing this on a Möbius strip is different: after
walking around once the arrow is pointing in the opposite direction; a second rotation is required
to align them. Topological consistency formally is the question of whether the local bases for each
fiber can be glued together so that the arrow does not reverse. Clearly, for general boundaries, to
guarantee consistency it is necessary to propagate information all the way around; the circle in the
image is orientable; the Möbius strip is not (Arnold 1962).
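The orientability test reduces to bookkeeping of sign flips. A toy illustration, not a model of any specific circuit: propagate a local orientation choice edge by edge around the loop; a globally consistent assignment exists iff the product of the flips is +1:

```python
# Propagate a local orientation choice around a closed boundary. Each edge
# either preserves (+1) or reverses (-1) the choice; a globally consistent
# assignment exists iff the accumulated sign returns to +1 after one circuit.
def consistent(flips):
    sign = 1
    for f in flips:
        sign *= f
    return sign == 1

print(consistent([1] * 8))          # True: circle, orientable
print(consistent([1] * 7 + [-1]))   # False: Mobius-style gluing, not orientable
```

The point of the toy is the serial dependence: the verdict is available only after visiting every edge, which is exactly the timing problem raised next.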
Although this approach is beautiful in its mathematical simplicity, the global requirement for
orientability makes timing an issue for this class of models. For large figures it could take a long
time for information to propagate all the way around, but the evidence is that there is simply not
enough time for the signal to propagate that far (Craft et al. 2007).
A more plausible class of models involves feedback from higher visual areas (Felleman and
Essen 1991). Prominent projections exist from V1 to V2, V2 to V4 and V4 to inferotemporal (IT)
cortex, where much of high-level visual shape analysis is thought to reside (Hung et al. 2012).
There is a corresponding feedback projection for each of these forward projections. Since this
carries the integrated, higher-level information about shape back to lower areas it seems a natural
component to border ownership models. After all, it is this global, shape-based feedback that
could support border ownership (Section 1.2); supporting physiological evidence exists (e.g.,
Super and Lamme 2007; Self and Roelfsema, this volume), and a number of models have been
developed (Craft et al. 2007; Sajda and Finkel 1995; Super and Romeo 2011).
Feedback is important because a 2D shape is an area surrounded by boundary and it is this fea-
ture of boundaries that could be fed back (Figure 18.8). The logic for accomplishing this is shown
in Figure 18.8c,d and is based on the idea that, briefly, shapes can be approximated by circular
arrangements of border-selective cells at the right positions. For certain simple shapes it is this
arrangement of boundary responses that could be fed back and integrated into a border-ownership
response. One way to do this is by a putative “grouping neuron” (Craft et al. 2007), but therein lies
the problem: Since there are many different circles contained in a general figure (e.g., Figure 18.8e)
how should these be integrated together into a single entity? When is a shape simple enough for
this to work? Does the distant completion matter (Figure 18.3c,d)?
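To make the grouping-neuron idea concrete, here is a deliberately crude caricature, not Craft et al.'s actual circuit; the stimulus, radii, and tolerances are all assumptions. A grouping cell sums support from edge responses that lie near a circle around its center and are tangent to that circle:

```python
import numpy as np

# A caricature of a "grouping neuron" G: it integrates edge responses lying
# near a circle around its center whose orientations are tangent to that
# circle, then (in a full model) feeds back to bias exactly those cells'
# border-ownership signals. All parameters below are illustrative.
def grouping_response(edges, center, radius, tol=0.3):
    """edges: list of (x, y, theta) active edge responses, theta mod pi."""
    g = 0.0
    for (x, y, th) in edges:
        dx, dy = x - center[0], y - center[1]
        r = np.hypot(dx, dy)
        tangent = (np.arctan2(dy, dx) + np.pi / 2) % np.pi  # circle's tangent
        dth = (th - tangent) % np.pi
        dth = min(dth, np.pi - dth)
        if abs(r - radius) < 1.0 and dth < tol:
            g += 1.0
    return g

# A square figure: its sides are roughly tangent to an inscribed circle,
# so a grouping cell centered on the square beats one centered outside it.
square = [(x, 0.0, 0.0) for x in np.linspace(-5, 5, 21)] \
       + [(x, 10.0, 0.0) for x in np.linspace(-5, 5, 21)] \
       + [(-5.0, y, np.pi / 2) for y in np.linspace(0, 10, 21)] \
       + [(5.0, y, np.pi / 2) for y in np.linspace(0, 10, 21)]

inside = grouping_response(square, center=(0.0, 5.0), radius=5.0)
outside = grouping_response(square, center=(20.0, 5.0), radius=5.0)
print(inside > outside)   # True: the centered grouping cell wins
```

The caricature also exposes the combinatorial worry in the text: for a complex figure one would need grouping cells at many centers and radii, and some principle for integrating them.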
This is the first part of the combinatorial problem faced by early border ownership models
and is related to certain figural representations. It suggests how shape models could inform the
border ownership computation. To build up a construct that we shall need shortly, imagine that
the shape was made of paper, and that it was ignited at every boundary point simultaneously.
The fire would burn inward and extinguish itself at distinguished points—the skeleton of the
shape (Blum 1973; Kimia et al. 1995). At the root of such algorithms is the distance map, or a
plot of the (shortest) distance to the boundary from any interior point (the negative of the
distance map is shown in Figure 18.8f); it gives the time for the fire to reach that point. Maximal
values are the locus of maximal enclosed circles that touch the shape in (at least) two points
and are singularities of its gradient (Siddiqi et al. 2002). The Blum fire propagation solves the
issue of selecting the maximal enclosed circles by physics; we shall shortly suggest how a brain
might do this.
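The distance map itself is a standard computation; a minimal sketch on a rectangular figure (the shape is illustrative):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# The distance map: for every interior pixel, the shortest distance to the
# boundary, i.e. the arrival time of the Blum grass-fire lit on the boundary.
shape = np.zeros((40, 80), dtype=bool)
shape[5:35, 5:75] = True                  # a 30 x 70 rectangular figure

dist = distance_transform_edt(shape)      # Euclidean distance to the background
peaks = dist == dist.max()                # centers of maximal inscribed circles

print(dist.max())                         # 15.0: half the short side
```

The peaks lie along the rectangle's medial axis; following the ridge of dist, i.e. the singularities of its gradient, recovers the full skeleton (Siddiqi et al. 2002).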
The second difficulty faced by border ownership models is that borders need not be closed
topologically. This is illustrated by visual search tasks (Figure 18.9) in which the time to find the
target among a group of distractors is a surrogate for how similar their cognitive representations
might be. Somehow, for broken contours or occluded figures we do not require the exact distance
map but only certain of its key features.
Generative models (Hinton and Ghahramani 1997; Hinton et al. 2006; Rao et al. 2002) pro-
vide for top-down feedback motivated by the question of how neural activity in higher areas
could generate patterns of activity in earlier areas resembling those from the bottom-up stimu-
lus. But the problem with border ownership is combinatorial: many patterns should evoke the
same relevant back projection. One possibility involves a probabilistic interpretation of the skel-
eton (Froyen et al. 2010), although this provides no connection to neurophysiology. We suggest
another approach.
Fig. 18.9 In visual search one seeks an example figure among a field of distractors as rapidly as possible.
(a) Examples of two displays with a figure embedded among distractors. Notice how much easier the
task is for the closed rather than the open figures. This suggests the power of closure. (b) Response
times (ms) as a function of display size, showing that nearly closed figures are effectively the same as
closed figures, and that the arrangement of contour fragments is key to the effect.
Reprinted from Vision Research, 33(7), James Elder and Steven Zucker, The effect of contour closure on the rapid
discrimination of two-dimensional shapes, pp. 981–91, Copyright © 1993. With permission from Elsevier.
Enclosure fields
Once in a conversation, the late Karl Lashley, one of the most
important psychologists of the time, told me quietly: “Mr.
Köhler, the work done by the Gestalt psychologists is surely
most interesting. But sometimes I cannot help feeling that you
have religion up your sleeves.”
(Köhler 1969, p. 48).
Border ownership is about action-at-a-distance: how distant edges influence local boundary
decisions. Such phenomena occur not only in neuroscience but in developmental biology more
widely. In this section we build up the idea of an enclosure field, a relaxation of the topological
definition of closure, and show that it carries information about borders at a distance in a manner
that integrates over incompletions and shape variations. In the next section we develop it into a
conceptual circuit model.
To build intuition, we start with what, at first, seems like a completely different situation: a
growing plant. We ask: how are new veins signaled in a juvenile leaf? Somehow the cell furthest
from existing veins must signal them to send a new shoot in that direction. The hormone auxin
is involved in the process, a simple model for which can be developed along the following lines
(Dimitrov and Zucker 2006; see Figure 18.10a). Imagine that each cell in a rectangular areole (or
patch of tissue surrounded by existing veins) produces auxin at a constant rate, that it diffuses
across cell membranes, and that existing vasculature clears it away. Abstractly this implies a simple
reaction–diffusion equation: the change in concentration at a point is proportional to the amount
that is produced there plus the relative amount that diffuses in and away. A boundary condi-
tion—zero concentration at the veins—lets us calculate the solution. The steady state equilibrium
(Figure 18.10b) has a “hot spot” in the center and drops off to zero. Note that although it could
Fig. 18.10 Two ways to build the enclosure field concept. The left column is relevant to biology
(interior production) and the other to neuroscience (boundary feedback). The illustration shows
a rectangular figure. (a) Interior production has each “cell” (i.e. pixel) producing, with diffusion
between neighboring cells and zero concentration at the existing veins (boundary). (c) The equilibrium
concentration along the central black line shows a peak at the center, while the magnitude of the
gradient (e) shows a peak at the boundary. This peak gradient is proportional to the distance to the
concentration “hot spot.” (b) Production from existing veins has only the boundary cells (pixels)
producing. Diffusion leads to spreading and catalysis leads to destruction. (d) Notice that now there is
a concentration minimum but still a magnitude of gradient peak (f) proportional to distance.
appear that the hot spot developed from overproduction, say due to a lack of nutrient, this specialization
is not necessary. But it is even more important to look at the boundary, where the concentration
gradient (magnitude) is maximal. This is where the signal is most useful, because it is where cells
need to start differentiating from ground type to vein type. Structurally here is the main point: the
absolute value of the gradient is in proportion to the distance to the hot spot (Figure 18.10e).
While the actual biology is more complex (Dimitrov and Zucker 2009a; Dimitrov and Zucker
2009b), action-at-a-distance has been achieved: a signal is available to control vascular growth.
There is a mathematical dual to this result that amounts to letting the system run in the oppo-
site direction. Instead of having the tissue produce auxin and the veins clear it, auxin could be
produced by the existing veins and could then diffuse inwards. Adding a destruction term to the
equation (so that the change in concentration at a point is proportional to the amount that dif-
fuses in minus the amount catabolized away) prevents the concentration from increasing beyond
bound (Dimitrov and Zucker 2009a; Dimitrov and Zucker 2009b) but the logic remains the
same: the value of the auxin field contains information about the distance map. This is precisely
what is required for border ownership. See Figure 18.10 (right column).
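A one-dimensional sketch makes the scaling explicit. For interior production, the steady state of D u'' + p = 0 with u = 0 at the two "veins" is u(x) = (p/2D) x (L - x): a hot spot at the center and a boundary gradient pL/2D that grows with the distance to it. The grid sizes and rates below are illustrative; relaxing the equation numerically recovers the scaling:

```python
import numpy as np

# Interior-production model in 1-D: each interior cell produces auxin at rate
# p, it diffuses with coefficient D, and the veins at the two ends clear it
# (u = 0 there). We relax to the steady state by Jacobi iteration.
def steady_state(n, p=1.0, D=1.0, dx=1.0, iters=100000):
    u = np.zeros(n)
    for _ in range(iters):
        u[1:-1] = 0.5 * (u[2:] + u[:-2]) + 0.5 * p * dx**2 / D  # Jacobi step
        u[0] = u[-1] = 0.0                                       # clearing veins
    return u

u1 = steady_state(41)     # areole of width 40
u2 = steady_state(81)     # areole of width 80: twice as wide

g1 = u1[1] - u1[0]        # concentration gradient at the boundary
g2 = u2[1] - u2[0]
print(g2 / g1)            # approx 2: the gradient scales with the distance
                          # to the hot spot, the action-at-a-distance signal
```

Running the dual (boundary-production) variant only flips the sign conventions and adds the destruction term; the boundary gradient still encodes distance, which is the property the enclosure field borrows.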
It is this dual result that is relevant to neurobiology because there is a different way to pro-
duce it than by hormones. To appreciate it, consider the feedback from higher areas about border
segments (and possibly their arrangement) as analogous to the existing vasculature: instead of
signaling the areole’s boundary, as veins could in plants, the feedback signals information about
the figural boundary. What is relevant for border ownership is not that there is a hotspot of auxin
at the center, but rather that there exists a “center” to some figure plus the side on which it lies.
Certain properties of this enclosure field are illustrated in Figure 18.11. As we describe next, the
relevant signal could be in the form of a local field potential instead of auxin. But the mathematics
remains qualitatively the same.
Fig. 18.11 Illustrations of the enclosure field. (a,b,c) Increasing segment length shows the field as
more of the “enclosing boundary” is available. It increases with convexity and integrates over gaps.
(d) Figures like those used in the search task. (e) The enclosure field. Notice how the target emerges
in concentration whether or not the boundary is complete.
Figure 18.12 illustrates how an enclosure field model could work. The LFP is built up from cur-
rents that derive from both intrinsic neuronal activity and feedback connections. Most importantly,
there is accumulating evidence that physiological fluctuations in the LFP can control when neurons
spike (Frohlich and McCormick 2010); the composite is called a phase-of-firing code (Montemurro
et al. 2008; Panzeri et al. 2010). Although in vivo research in visual cortex is lacking, it is known that
such codes can coordinate activity in different brain areas (e.g., Brockmann et al. 2011); we assert
that they provide the coupling between the local field and the border-selective neurons.
Finally, it must be stressed that there are other cell types in the neuronal surround, primarily
glia, and we here focus on one of these, the astrocytes (Figure 18.12d). It has recently been con-
jectured that glia could play a role in neuronal function (Araque and Navarrete 2010). Although
astrocytes are non-spiking, they do have channels, glial transmitters (e.g., glutamate) and provide
a gap-junction coupled tessellation of extra-neuronal space (Nedergaard et al. 2003). And they
play a role in synaptic development (Araque et al. 1999). In summary, it seems increasingly likely
that glia could be playing a significant role in controlling the LFP and its neuronal interaction,
and in integrating it with neuronal activity. The enclosure field model suggests a concrete way in
which they could be involved.
The model is clearly radical. If correct (even in part) it suggests that neural modeling must
extend beyond neurons to include the substrate in which neurons are embedded plus other cell
types. Synaptic interaction must extend beyond classical second order: local field potentials mat-
ter as well as spike timing and synaptic arrangement.
Fig. 18.12 The enclosure field model for border ownership involves feedback from higher areas and
integration via local field potentials. (a) The LFP is shown (gray) emanating from neuronal processes; it
also derives (b) from feedback projections. The composite field controlling border ownership derives
from their superposition. (c) The LFP can control neuronal spiking activity. Shown are action potentials
on top of local field fluctuations. This particular neuron prefers to fire when the LFP is depolarized. (d)
Astrocytes tessellate the volume surrounding large numbers of neurons. Each blob in the tessellation
suggests a single astrocyte domain.
Reprinted from Trends in Neurosciences, 26(10), Maiken Nedergaard, Bruce Ransom, and Steven A. Goldman,
New roles for astrocytes: Redefining the functional architecture of the brain, pp. 523–30, Copyright © 2003, with
permission from Elsevier.
Border Inference and Border Ownership 383
The implications of ascribing an information-processing role to glia are wide ranging but can-
not be ignored. In a striking experiment human glia have been shown to greatly increase learning
and synaptic plasticity in adult mice (Han et al. 2013). Second, glia may play a role in disease. It is
known, for example, that there is an increase in glia among autistic individuals. Since this holds
even in visual cortex (Tetreault et al. 2012), perhaps it explains the perceptual organization differ-
ences that are expressed in autism (Simmons et al. 2009).
Finally, the consideration of border ownership as part of what causes a neuron’s activity greatly
complicates the notion of receptive field. As described above (Figure 18.4b), receptive fields are
normally characterized as, e.g. Gabor patches with even/odd symmetry, plus an orientation and a
scale. When the border ownership component is included, the locus of retinotopic positions that
can influence firing becomes very large. Receptive fields in early vision no longer have the crisp
interpretation of a Gabor patch and can be a very complicated function of the stimulus. Receptive
fields become a network property, in short, and not a convolution filter.
Conclusions
A science . . . gains in value and significance not by the number
of individual facts it collects but by the generality and power of
its theories . . .
(Koffka 1935, p. 9)
Border ownership in particular, and Gestalt phenomena in general, have provided a long-term
challenge to visual modelers. While the phenomena are easy to demonstrate, explaining them has
required an integration of many different theoretical constructs. Here we tried to lay out a logical
basis for this, by contrasting the geometric ideas underlying borders, stereo, and shading analysis
on the way to surface inferences against the topological ideas underlying border ownership. The
chapter took a neurogeometric tone and, in the end, we explored both traditional style models
of neuron-to-neuron computation plus extensions to them. The topological challenge of border
ownership revealed an association to field-theoretic models, which in turn broadened the scope
of modeling to include local field potentials and glia as well as neurons. The end was a model
enlarged drastically in scope. The chapter opened with a brief review of the receptive field concept
in neurophysiology and closed with a radically enlarged view from Gestalt psychology. While this
is certainly not the last word in border ownership, we hope it is indicative of the types of intel-
lectual debate that modeling must face.
Acknowledgements
Supported by AFOSR, ARO, NIH and NSF. I thank J. Wagemans, N. Kogo, and reviewers for
comments on the manuscript; and B. Kunsberg, D. Holtmann-Rice, M. Lawlor, and P. Dimitrov
for discussion.
References
Adini, Y., Sagi, D., and Tsodyks, M. (1997). Excitatory-inhibitory network in the visual
cortex: Psychophysical evidence. Proceedings of the National Academy of Sciences (USA) 94: 10426–31.
Angelucci, A., Levitt, J. B., Walton, E. J. S., Hupe, J.-M., Bullier, J., and Lund, J. S. (2002). Circuits for local
and global signal integration in primary visual cortex. The Journal of Neuroscience 22(19): 8633–46.
Araque, A. and Navarrete, M. (2010). Glial cells in neuronal network function. Philosophical Transactions
of the Royal Society, Series B 365: 2375–81.
Araque, A., Parpura, V., Sanzgiri, R., and Haydon, P. (1999). Tripartite synapses: glia, the unacknowledged
partner. Trends in Neurosciences 22: 208–15.
Arnold, B. H. (1962). Intuitive concepts in elementary topology. Englewood Cliffs: Prentice Hall.
Barlow, H. B. (1972). Single units and sensation: A neuron doctrine for perceptual psychology. Perception
1(4): 371–94.
Ben-Shahar, O. and Zucker, S. W. (2003). Geometrical computations explain projection patterns of
long-range horizontal connections in visual cortex. Neural Computation 16: 445–76.
Blum, H. (1973). Biological shape and visual science (Part I). Journal of Theoretical Biology 38: 205–87.
Bonneh, Y. and Sagi, D. (1998). Effects of spatial configuration on contrast detection. Vision Research
38: 3541–53.
Bosking, W., Zhang, Y., Schofield, B., and Fitzpatrick, D. (1997). Orientation selectivity and the
arrangement of horizontal connections in the tree shrew striate cortex. The Journal of Neuroscience
17(6): 2112–27.
Breton, P. and Zucker, S. (1996). Shadows and shading flow fields. In Proceedings of Computer Vision and
Pattern Recognition (CVPR), pp. 782–789.
Brockmann, M., Pöschel, B., Cichon, N., and Hanganu-Opatz, I. (2011). Coupled oscillations mediate
directed interactions between prefrontal cortex and hippocampus of the neonatal rat. Neuron
71(2): 332–47.
Buzsáki, G., Anastassiou, C. A., and Koch, C. (2012). The origin of extracellular fields and currents: EEG,
ECoG, LFP and spikes. Nature Reviews Neuroscience 13: 407–20.
Casagrande, V., and Kaas, J. (1994). The afferent, intrinsic, and efferent connections of primary visual
cortex in primates. In: A. Peters, and K. Rockland (eds.) Cerebral cortex: Primary visual cortex in
primates, Vol. 10, pp. 201–259. New York: Plenum Press.
Chavane, F., Monier, C., Bringuier, V., Baudot, P., Borg-Graham, L., Lorenceau, J., and Fregnac, Y. (2000).
The visual cortical association field: A Gestalt concept or a psychophysiological entity? Journal of
Physiology (Paris) 94: 333–42.
Craft, E., Schutze, H., Niebur, E., and von der Heydt, R. (2007). A neural model of figure-ground
organization. Journal of Neurophysiology 97(6): 4310–26.
DeCarlo, D., Finkelstein, A., Rusinkiewicz, S., and Santella, A. (2003). Suggestive contours for conveying
shape. ACM Transactions on Graphics 22(3): 848–55.
Dimitrov, P. and Zucker, S. W. (2006). A constant production hypothesis that predicts the dynamics of leaf
venation patterning. Proceedings of the National Academy of Sciences (USA) 13(24): 9363–8.
Dimitrov, P. and Zucker, S. W. (2009a). Distance maps and plant development #1: Uniform production and
proportional destruction. arXiv.org, arXiv:0905.4446v1 [q-bio.QM], 1–39.
Dimitrov, P. and Zucker, S. W. (2009b). Distance maps and plant development #2: Facilitated transport and
uniform gradient. arXiv.org, arXiv:0905.4662v1 [q-bio.QM](24), 1–46.
Douglas, R. J. and Martin, K. A. C. (2004). Neuronal circuits of the neocortex. Annual Review of
Neuroscience 27: 419–51.
Dubuc, B. and Zucker, S. W. (2001). Complexity, confusion, and perceptual grouping. Part II. Mapping
complexity. International Journal of Computer Vision 42(1/2): 83–115.
Elder, J. and Zucker, S. W. (1993). Contour closure and the perception of shape. Vision Research
33(7): 981–91.
Felleman, D. J. and Van Essen, D. C. (1991). Distributed hierarchical processing in the primate cerebral cortex.
Cerebral Cortex 1: 1–47.
Field, D., Hayes, A., and Hess, R. (1993). Contour integration by the human visual system: evidence for a
local association field. Vision Research 33: 173–93.
Frohlich, F. and McCormick, D. (2010). Endogenous electric fields may guide neocortical network activity.
Neuron 67: 129–43.
Froyen, V., Feldman, J., and Singh, M. (2010). A Bayesian framework for figure-ground interpretation.
In: J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta (eds.) Advances in Neural
Information Processing Systems, Vol. 23, pp. 631–9. Available online at: http://papers.nips.cc/book/
advances-in-neural-information-processing-systems-23-2010
Gilbert, C. and Wiesel, T. (1983). Clustered intrinsic connections in cat visual cortex. The Journal of
Neuroscience 3(5): 1116–33.
Gordon, J. and Shapley, R. (1985). Nonlinearity in the perception of form. Perception & Psychophysics
37: 84–8.
Han, X., Chen, M., Wang, F., Windrem, M., Wang, S., Shanz, S. et al. (2013). Forebrain engraftment by
human glial progenitor cells enhances synaptic plasticity and learning in adult mice. Cell Stem Cell
12(3): 342–53.
Hartline, H. K. (1938). The response of single optic nerve fibers of the vertebrate eye to illumination of the
retina. American Journal of Physiology 121: 400–15.
Henrie, J. and Shapley, R. (2005). LFP power spectra in V1 cortex: The graded effect of stimulus contrast.
Journal of Neurophysiology 94(1): 479–90.
Hinkle, D. A. and Connor, C. E. (2002). Three-dimensional orientation tuning in macaque area V4. Nature
Neuroscience 5(7): 665–70.
Hinton, G. and Ghahramani, Z. (1997). Generative models for discovering sparse distributed
representations. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences
352: 1177–90.
Hinton, G. E., Osindero, S., and Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural
Computation 18: 1527–54.
Horn, B. K. P. and Brooks, M. J. (eds.) (1989). Shape from shading. Cambridge, MA: MIT Press.
Huang, P.-C., Chen, C.-C., and Tyler, C. W. (2012). Collinear facilitation over space and depth. Journal of
Vision 12(2): 1–9.
Hubel, D. H. and Livingstone, M. S. (1987). Segregation of form, color, and stereopsis in primate area 18.
The Journal of Neuroscience 7(11): 3378–415.
Hubel, D. H. and Wiesel, T. N. (1977). Functional architecture of macaque monkey visual cortex.
Proceedings of the Royal Society of London, Series B 198: 1–59.
Hubel, D. H. and Wiesel, T. N. (1979). Brain mechanisms of vision. Scientific American 241: 150–62.
Hung, C.-C., Carlson, E. T., and Connor, C. E. (2012). Medial axis shape coding in macaque
inferotemporal cortex. Neuron 74(6): 1099–113.
Hunt, J. J., Mattingley, J. B., and Goodhill, G. J. (2012). Randomly oriented edge arrangements dominate
naturalistic arrangements in binocular rivalry. Vision Research 64: 49–55.
Kanizsa, G. (1979). Organization in vision: Essays on Gestalt perception. New York: Praeger.
Kapadia, M., Ito, M., Gilbert, C., and Westheimer, G. (1995). Improvement in visual sensitivity by
changes in local context: Parallel studies in human observers and in V1 of alert monkeys. Neuron
15: 843–56.
Katzner, S., Nauhaus, I., Benucci, A., Bonin, V., Ringach, D., and Carandini, M. (2009). Local origin of
field potentials in visual cortex. Neuron 61: 35–41.
Kimia, B., Tannenbaum, A., and Zucker, S. W. (1995). Shapes, shocks, and deformations. Part I. The
components of two-dimensional space and the reaction-diffusion space. International Journal of
Computer Vision 15: 189–224.
Koenderink, J. J., van Doorn, A., and Wagemans, J. (2013). SFS? Not likely! i-Perception 4:
299–302.
Koffka, K. (1935). Principles of Gestalt psychology. New York: Harcourt, Brace and World.
Köhler, W. (1969). The task of Gestalt psychology. Princeton: Princeton University Press.
Kuffler, S. W. (1953). Discharge patterns and functional organization of mammalian retina. Journal of
Neurophysiology 16(1): 37–68.
Kunsberg, B. and Zucker, S. W. (2013). Characterizing ambiguity in light source invariant shape from
shading. Available at: <http://arxiv.org/abs/1306.5480>.
Kunsberg, B. and Zucker, S. (2014). How shading constrains surface patches without knowledge of light
sources, SIAM Journal on Imaging Sciences 7(2): 641–688.
Lamme, V. (1995). The neurophysiology of figure ground segregation in primary visual cortex. The Journal
of Neuroscience 15: 1605–15.
Langer, M. and Zucker, S. W. (1997). Casting light on illumination: A computational model and
dimensional analysis of sources. Computer Vision and Image Understanding 65(2): 322–35.
Lawlor, M., Holtmann-Rice, D., Huggins, P., Ben-Shahar, O., and Zucker, S. W. (2009). Boundaries,
shading, and border ownership: A cusp at their interaction. Journal of Physiology (Paris) 103: 18–36.
Lee, T. S., Mumford, D., Romeo, R., and Lamme, V. A. F. (1998). The role of the primary visual cortex in
higher level vision. Vision Research 38: 2429–54.
Li, G. and Zucker, S. W. (2006). Contour-based binocular stereo: Inferencing coherence in stereo tangent
space. International Journal of Computer Vision 69(1): 59–75.
Li, G., and Zucker, S. W. (2010). Differential geometric inference in surface stereo. IEEE Transactions on
Pattern Analysis and Machine Intelligence 32(1): 72–86.
Marr, D. (1982). Vision. San Francisco: W.H. Freeman.
Metzger, W. (2006). Laws of seeing. Cambridge, MA: MIT Press.
Miller, K. D. (2003). Understanding layer 4 of the cortical circuit: A model based on cat V1. Cerebral Cortex
13: 73–82.
Minsky, M. and Papert, S. (1969). Perceptrons: An Introduction to Computational Geometry. Cambridge,
MA: MIT Press.
Montemurro, M. A., Rasch, M. J., Murayama, Y., Logothetis, N. K., and Panzeri, S. (2008). Phase-of-firing
coding of natural visual stimuli in primary visual cortex. Current Biology 18(5): 375–80.
Muir, D. R., Da Costa, N. M. A., Girardin, C. C., Naaman, S., Omer, D. B., Ruesch, E., Grinvald, A., and
Douglas, R. J. (2011). Embedding of cortical representations by the superficial patch system. Cerebral
Cortex 21(10): 2244–60.
Nakayama, K. and Shimojo, S. (1992). Experiencing and perceiving visual surfaces. Science
257(5075): 1357–63.
Nedergaard, M., Ransom, B., and Goldman, S. (2003). New roles for astrocytes: Redefining the functional
architecture of the brain. Trends in Neurosciences 26(10): 523–30.
Orban, G. (2008). Higher order visual processing in macaque extrastriate cortex. Physiological Reviews
88(1): 59–89.
Panzeri, S., Brunel, N., Logothetis, N. K., and Kayser, C. (2010). Sensory neural codes using multiplexed
temporal scales. Trends in Neurosciences 33(3): 111–20.
Parent, P. and Zucker, S. W. (1989). Trace inference, curvature consistency and curve detection. IEEE
Transactions on Pattern Analysis and Machine Intelligence 11(8): 823–39.
Pasupathy, A. and Connor, C. (2002). Population coding of shape in area V4. Nature Neuroscience
5(12): 1332–8.
Pind, J. L. (2012). Figure and ground at 100. The Psychologist 25(1): 90–1.
Poggio, G. F. and Fisher, B. (1977). Binocular interaction and depth sensitivity of striate and pre-striate
cortical neurons of the behaving rhesus monkey. Journal of Neurophysiology 40(1): 392–405.
Rao, R., Olshausen, B. and Lewicki, M. (Eds.) (2002). Probabilistic models of the brain: Perception and
neural function. Cambridge, MA: MIT Press.
Ratliff, F. (1965). Mach bands: Quantitative studies on neural networks in the retina. San
Francisco: Holden-Day.
Rockland, K. and Lund, J. (1982). Widespread periodic intrinsic connections in the tree shrew visual
cortex. Science 215: 1532–4.
Rockland, K. and Virga, A. (1989). Terminal arbors of individual feedback axons projecting from area
V2 to V1 in the macaque monkey: a study using immunohistochemistry of anterogradely transported
Phaseolus vulgaris-leucoagglutinin. Journal of Comparative Neurology 285: 54–72.
Roe, A. W. and Ts’o, D. Y. (1997). The functional architecture of area V2 in the macaque monkey.
In: K. Rockland, J. Kaas and A. Peters (eds.) Extrastriate cortex in primates, Vol. 12, pp. 295–333.
New York: Plenum.
Rubin, E. (1915). Synsoplevede Figurer: Studier i psykologisk Analyse. Første Del. Gyldendalske Boghandel,
Nordisk Forlag. Visually experienced figures: Studies in psychological analysis. Part one.
Sajda, P. and Finkel, L. (1995). Intermediate-level visual representations and the construction of surface
perception. Journal of Cognitive Neuroscience 7: 267–91.
Sakai, K. and Nishimura, H. (2004). Determination of border ownership based on the surround context of
contrast. Neurocomputing 58: 843–8.
Salin, P. A. and Bullier, J. (1995). Corticocortical connections in the visual system: structure and function.
Physiological Reviews 75: 107–54.
Sarti, A., Citti, G., and Petitot, J. (2008). The symplectic structure of the primary visual cortex. Biological
Cybernetics 98(1): 33–48.
Sherrington, C. S. (1906). The integrative action of the nervous system. New York: C. Scribner and Sons.
Shmuel, A., Korman, M., Sterkin, A., Harel, M., Ullman, S., Malach, R., and Grinvald, A. (2005).
Retinotopic axis specificity and selective clustering of feedback projections from v2 to v1 in the owl
monkey. The Journal of Neuroscience 25: 2117–31.
Siddiqi, K., Bouix, S., Tannenbaum, A. R., and Zucker, S. W. (2002). Hamilton-Jacobi skeletons.
International Journal of Computer Vision 48: 215–31.
Simmons, D. R., Robertson, A. E., McKay, L. S., Toal, E., McAleer, P., and Pollick, F. E. (2009). Vision in
autism spectrum disorders. Vision Research 49: 2705–39.
Sincich, L. and Horton, J. (2002). Divided by cytochrome oxidase: a map of the projections from V1 to V2
in macaques. Science 295: 1734–7.
Sompolinsky, H. and Shapley, R. (1997). New perspectives on the mechanisms for orientation selectivity.
Current Opinion in Neurobiology 7: 514–22.
Super, H. and Lamme, V. A. (2007). Altered figure-ground perception in monkeys with an extra-striate
lesion. Neuropsychologia 45(14): 3329–34.
Super, H. and Romeo, A. (2011). Feedback enhances feedforward figure-ground segmentation by changing
firing mode. PLoS ONE 6(6): e21641.
Tetreault, N. A., Hakeem, A. Y., Jiang, S., Williams, B. A., Allman, E., Wold, B. J., and Allman, J. M.
(2012). Microglia in the cerebral cortex in autism. Journal of Autism and Developmental Disorders
42(12): 2569–84.
Ullman, S., Vidal-Naquet, M., and Sali, E. (2002). Visual features of intermediate complexity and their use
in classification. Nature Neuroscience 5: 682–7.
Wagemans, J., Elder, J. H., Kubovy, M., Palmer, S. E., Peterson, M. A., Singh, M., and von der Heydt, R.
(2012). A century of Gestalt psychology in visual perception: I. Perceptual grouping and figure-ground
organization. Psychological Bulletin 138(6): 1172–217.
Wertheimer, M. (1923). Untersuchungen zur Lehre von der Gestalt (Part II). Psychologische Forschung 4: 301–50.
Zeki, S. and Shipp, S. (1988). The functional logic of cortical connections. Nature 335: 311–17.
Zhaoping, L. (2005). Border ownership from intracortical interactions in visual area V2. Neuron
47: 143–53.
Zhou, H., Friedman, H., and von der Heydt, R. (2000). Coding of border ownership in monkey visual
cortex. The Journal of Neuroscience 20: 6594–611.
Zipser, K., Lamme, V. A. F., and Schiller, P. H. (1996). Contextual modulation in primary visual cortex. The
Journal of Neuroscience 16(22): 7376–89.
Zucker, S. W. (2012). Local field potentials and border ownership: a conjecture about computation in visual
cortex. Journal of Physiology (Paris) 106: 297–315.
Zucker, S. W., Dobbins, A., and Iverson, L. (1989). Two stages of curve detection suggest two styles of
visual computation. Neural Computation 1: 68–81.
Zucker, S. W. and Hummel, R. A. (1979). Toward a low-level description of dot clusters: labeling edge,
interior, and noise points. Computer Graphics and Image Processing 9: 213–33.
Section 5
Lightness
Lightness refers to the perceived white/gray/black dimension of a surface. The physical property
that corresponds to lightness is reflectance, that is, the percentage of light a surface reflects. White
surfaces reflect about 90% of the light they receive while black surfaces reflect only about 3%.
Thus, lightness refers to the perception of a concrete property of an object. (Lightness should not
be confused with brightness, which concerns perception of the raw intensity of light reflected by
the object, which is not a property of the object itself.)
The luminance of a surface depends not only on its reflectance but also on the amount of light
striking the surface. For example, a black surface in sunlight can easily reflect more light than a
white surface in shadow. Indeed, any luminance can come from any shade of gray. This implies
that the light reflected from a surface to your eye, by itself, cannot reveal the reflectance of that
surface. In principle lightness can only be determined using the surrounding context. The exact
role of context is the focus of many theoretical disputes, but the indispensable role of perceptual
structure cannot be doubted.
The central problem of lightness is that of lightness constancy. The perceived lightness of an
object remains approximately (but not entirely) constant even when the illumination level changes.
In view of the spoiling role played by variations in illumination, von Helmholtz (1866/1924) logi-
cally suggested that lightness could be recovered by dividing the luminance of a surface by an
unconscious estimate of its incident illumination, but without a clear idea of how illumination can
be estimated, his suggestion remains little more than a promissory note.
Wallach Experiment
In 1948, Hans Wallach published an elegant experiment that soon became a classic. He presented
a disk of homogeneous luminance surrounded by a fat annulus also of homogeneous luminance.
Holding the luminance of the disk constant, he showed that it could, nevertheless, be made to
appear as any shade of gray between black and white simply by varying the luminance of the
annulus. He then presented observers with two disk/annulus displays and asked them to adjust
the luminance of one disk to make it appear as the same shade of gray as the other disk. The set-
tings made by the observers showed that the disks appear as equal shades of gray not when they
have the same luminance value, but when the disk/annulus luminance ratios are equal. This find-
ing led Wallach to propose the simple idea that the lightness of an object is a direct function of the
ratio between the luminance of the object and the luminance of its adjacent region.
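Wallach's ratio principle is simple enough to state in code. The sketch below uses hypothetical luminance values (arbitrary units) and shows why the principle predicts constancy: scaling every luminance by a common illumination factor leaves the disk/annulus ratio, and hence the predicted lightness, unchanged.

```python
def ratio(disk_lum, annulus_lum):
    """Wallach's ratio principle: the disk's perceived gray is a
    function of the disk/annulus luminance ratio (illustrative sketch)."""
    return disk_lum / annulus_lum

# Two displays with very different absolute luminances...
standard = ratio(45.0, 90.0)   # disk at 45, annulus at 90 units
dimmer = ratio(5.0, 10.0)      # same display under 1/9 the illumination
assert standard == dimmer      # ...match in lightness: constancy

# Doubling the illumination scales every luminance but leaves the
# ratio, and hence the predicted lightness, unchanged.
assert ratio(2 * 45.0, 2 * 90.0) == standard
```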
Explains constancy
Wallach’s paper was celebrated for several reasons. First, when the illumination level changes,
although the luminance of an object changes, the luminance ratio between the object and its
Perceptual Organization in Lightness 393
immediate background does not. Wallach noted that this is exactly what would be expected if
lightness were a function of the object/surround luminance ratio.
Supporting evidence
Wallach’s results were consistent with Weber’s law, and with a great deal of evidence from vari-
ous senses of a logarithmic relationship between physical energy and perceived magnitude. Later
findings from stabilized images and physiological work implied that the luminance ratio at each
edge is just what is encoded at the retina (Barlow and Levick, 1969; Troy and Enroth-Cugell, 1993;
Whittle and Challands, 1969).
student of the Gestaltists. However, this point was not essential to Wallach’s thinking; it merely
came from his empirical finding that the lightness of a disk does not change when the disk and
annulus are separated in depth, but for the contrast theorists who attributed lightness to lateral
inhibition, any finding that lightness depends on perceived depth would represent a fundamental
challenge.
Von Helmholtz’s claim that lightness depends on taking the illumination into account implies
a close depth/lightness linkage, but empirical support was scarce. Mach (1922/1959, p. 209) had
observed that if a white card is folded in half, placed on a table like a tent or roof, and illuminated
primarily from one side, both sides of the roof appear white, although one side appears shadowed.
However, when the card can be perceptually reversed so that it appears concave, as an open book,
then ‘the light and the shade stand out as if painted thereon.’ The lightness of the shadowed side
changes even though the retinal image (and with it any inhibitory effect) has remained constant.
However, attempts to capture Mach’s depth effect in the laboratory showed little or no success
(Beck, 1965; Epstein, 1961; Flock and Freedberg, 1970; Hochberg and Beck, 1954). Experiments
by Gilchrist (1977, 1980), using a greater luminance range, and a richer context that allowed the
target to form a different luminance ratio in each of two perceived spatial positions, showed that a
change in depth could cause the lightness of a target surface to change almost from one end of the
black/white scale to the other, with no essential change in the retinal image.
Once again, however, we see that these findings were anticipated by the Gestaltists, who clearly
sketched an intimate relationship between depth and lightness. Koffka (1935, p. 246) had empha-
sized the importance of coplanarity. After noting that lightness is a product of luminance ratios
between image patches that belong together, he wrote, ‘Which field parts belong together, and
how strong the degree of this belonging together is, depends upon factors of space organization.
Clearly, two parts at the same apparent distance will, ceteris paribus, belong more closely together
than field parts organized in different planes.’ Gelb (1932), Wolff (1933), and Kardos (1934) had all
demonstrated an effect of depth on lightness. Radonjić et al. (2010) replicated one of the Kardos
experiments and found that a change in perceived depth changed the perceived lightness of a tar-
get disk by 4.4 Munsell steps, with no change in the retinal image.
The idea that lightness crucially depends on the perceived 3D structure of the visual field is by
now firmly established. Empirical findings supporting a strong dependence of lightness on per-
ceived depth have been reported by Adelson (1993, 2000), Knill and Kersten (1991), Logvinenko
and Menshikova (1994), Pessoa et al (1996), Schirillo et al (1990), Spehar et al (1995), Taya et al
(1995), and others.
problem. The spatial version of the problem was, with a few exceptions, ignored, as can easily
be seen in the theories. All three of Hering’s physiological factors invoked to account for con-
stancy ignore the problem of spatial illumination edges. Pupil size may be relevant to an over-
all shift in illumination level, but is hardly helpful when viewing a complex scene with multiple
regions of light and shadow. The same can be said for adaptation of the photoreceptors. As for
‘reciprocal interaction in the somatic visual field,’ later called lateral inhibition, when two identi-
cally gray papers lie under different illuminations, they produce different neural excitations at
the retina. Hering argued that the neural exaggeration of the difference at the edge between each
gray paper and its background (a reflectance boundary) can mitigate that difference (Hering,
1874/1964, p. 141). However, he failed to recognize that if the difference in excitation on the two
sides of an illumination boundary (cast across a surface of homogeneous reflectance) is exagger-
ated, the problem of bringing neural excitation levels into line with perceived lightness levels is
made worse, not better. Hering was not stupid. We must conclude that he simply did not consider
the implications for lightness constancy of applying lateral inhibition to an illumination boundary.
Von Helmholtz (1866/1924), Hering (1874/1964), and Katz (1935, p. 279) all suggested that perceived illumination level was determined by the average luminance in the scene. This suggestion
makes sense only if you are thinking about a change of illumination (over the whole scene) from
time 1 to time 2. It makes no sense when a scene is divided into two adjacent regions of high and
low illumination. It is ironic that Katz also fell into this trap, given that the method of asymmetri-
cal matching he used so extensively in his early studies of lightness constancy featured exactly this
spatial version of the constancy problem: side-by-side regions of illumination and shadow.
In this sense, Wallach took a very traditional approach. This neglect of illumination edges is
very natural. In one study, Kardos (1934) asked his subjects to describe the entire laboratory
scene. They faithfully described the room and all its contents, but did not spontaneously mention
any of the shadows. When he asked whether they saw any shadows, they replied that yes, of
course, they saw the shadows, but they had not thought to mention them. This makes some sense.
While reflectance is an intrinsic property of a surface or object, the level of illumination on it is
not. Likewise, in spatial perception, the size of an object is an essential property, but its distance
from the observer is not. The visual system is tuned primarily to the intrinsic properties of objects,
much less to an accidental, temporary property like illumination level (see also Anderson, this
volume). The shading on a sculpture is instantly absorbed in the creation of a 3D percept such
that the luminance gradients across the object are scarcely noticed. It is natural that our percep-
tual system homes in on the essential features of the environment, not on the fleeting and fickle
variations in illumination. Ironically, however, this truth-seeking aspect of visual functioning may
have blinded both Wallach and the classic theorists to the important problem posed by spatial
illumination edges.
The long preoccupation among students of lightness constancy with the temporal version of the problem allowed relatively simplistic solutions to obscure its thornier aspects. As Arend (1994, p. 160) has clearly noted, ‘Lightness constancy over multiple-illuminants in
a single scene places much greater demands on candidate constancy models than does constancy
in single-illuminant scenes.’
To summarize, Wallach’s ratio principle works fine when applied to reflectance edges, but fails
when applied to illuminance edges. Here, we see one of several reasons why his ratio principle
cannot be reduced to lateral inhibition – that neural mechanism is blind to the kind of edge. The
visual system as a whole, however, cannot be blind to this distinction. If it were, lightness con-
stancy would fail catastrophically. The problem of edge classification, then, cannot be ignored.
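The point can be made with simple arithmetic. The sketch below is illustrative (the luminance and reflectance values are invented, not Wallach's): a ratio-only rule recovers the true reflectance ratio at a reflectance edge, but reads an illuminance edge as a spurious reflectance step.

```python
def edge_ratio(lum_a: float, lum_b: float) -> float:
    """Luminance ratio across an edge, as in Wallach's ratio principle."""
    return lum_a / lum_b

# Reflectance edge under uniform illumination (100 units):
# white paper (reflectance 0.9) beside gray paper (reflectance 0.3).
r_edge = edge_ratio(0.9 * 100, 0.3 * 100)   # -> 3.0, the true reflectance ratio

# Illuminance edge across homogeneous white paper (reflectance 0.9):
# a lit region (100 units) beside a shadow (20 units).
i_edge = edge_ratio(0.9 * 100, 0.9 * 20)    # -> 5.0

# A mechanism blind to edge type would read the 5:1 shadow border as a
# large reflectance step, so lightness constancy would fail there.
```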
396 Gilchrist
Koffka clearly recognized that luminance ratios at edges (which he called gradients) were criti-
cal to lightness, as can be seen in the first of two propositions he offered (Koffka, 1935, p. 248): ‘(a)
the qualities of perceived objects depend upon gradients of stimulation . . .’ But his appreciation
of the edge classification problem can be seen in his second proposition: ‘(b) not all gradients
are equally effective as regards the appearance of a particular field part . . .’ On the same page he
presents the problem of edge classification in concrete terms: ‘. . . given two adjoining retinal areas
of different stimulation, under what conditions will the corresponding parts of the behavioral
(perceptual) field appear of different whiteness but equal [perceived illumination], when of dif-
ferent [perceived illumination] but equal whiteness? A complete answer to this question would
probably supply the key to the complete theory of color perception in the broadest sense.’ (As
before I have substituted the modern term ‘perceived illumination’ for Koffka’s equivalent term
‘brightness.’) Although J. J. Gibson never worked substantially in lightness, Koffka’s influence on
him (presumably due to their decade of overlap at Smith College) can be seen in Gibson’s (1966,
p. 215) question, ‘Why is a change in color not regularly confused with a change in illumination?’
If the discrimination of reflectance and illumination edges is so fundamental to lightness percep-
tion, how is it done? Although a complete answer has not yet been achieved, we can cite many reveal-
ing empirical findings. The first factor often mentioned is edge sharpness. Illumination boundaries
typically contain a penumbra, while reflectance boundaries are more typically sharp, stepwise changes.
In his famous spot-shadow experiment, Hering (1874/1964, p. 8) created a cast shadow by sus-
pending an object in front of a piece of white paper. The shadow was perceived as such, presum-
ably due to its penumbra. However, when Hering painted a thick black line along the penumbra,
the shadow was perceived as a dark gray stain or a painted region. His thick black line obscured
the penumbra. The same phenomenon can be demonstrated without the black line, using a slide
projector. If a glass slide containing a small opaque disk glued to its center is placed in a slide pro-
jector and projected onto a large white wall, the disk will appear as a shadow when the projector
is somewhat out of focus, but it will appear as a darker surface color when the projector is brought
into focus. In the checker-block image by Adelson (2000), shown in Figure 19.1, however, the
edges within the two circles are equally sharp. Yet one is perceived as a reflectance edge, while the
other is perceived as an illuminance edge.
If luminance edges contain crucial information about lightness and illumination, intersec-
tions where edges cross one another are especially informative. In terms of the relative luminance
values in the four quadrants of an intersection, we find two basic patterns: ratio-invariant and
difference-invariant (Gilchrist et al., 1983). When an illumination boundary crosses a reflectance
boundary, a common pattern, the result is ratio-invariance. Although the change in illumination
changes absolute values, it does not change the luminance ratio along the reflectance edge. The
same is true along the illumination boundary; the luminance ratio is constant regardless of the
reflectance on which it is projected.
However, when two illumination edges cross each other, as when there are two or more light
sources, the intersections show difference-invariance, not ratio invariance. Difference-invariance
is also found when the boundary of a veiling luminance intersects a more distant edge, regardless
of its type.
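The two signatures follow directly from the arithmetic of image formation. A minimal sketch, with invented values (not those of Gilchrist et al., 1983): illumination multiplies luminance, so it preserves ratios; a veiling luminance adds light, so it preserves differences.

```python
# Ratio-invariance: an illumination edge crossing a reflectance edge.
r1, r2 = 0.9, 0.3              # reflectances across the reflectance edge
e_lit, e_shadow = 100.0, 20.0  # illumination across the illumination edge

quad = [[r1 * e_lit,    r2 * e_lit],      # upper quadrants: lit
        [r1 * e_shadow, r2 * e_shadow]]   # lower quadrants: shadowed
# The 3:1 reflectance ratio survives the illumination change:
assert abs(quad[0][0] / quad[0][1] - quad[1][0] / quad[1][1]) < 1e-9

# Difference-invariance: a veiling luminance adds the same amount of
# light to everything behind it, shifting luminances without scaling them.
veil = 30.0
diff_clear  = r1 * e_lit - r2 * e_lit
diff_veiled = (r1 * e_lit + veil) - (r2 * e_lit + veil)
assert abs(diff_clear - diff_veiled) < 1e-9
```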
Fig. 19.1 These two edges are locally identical, although one is perceived as a reflectance change
and the other as an illumination change.
Reproduced from Pentti I. Laurinen, Lynn A. Olzak, and Tarja L. Peromaa, Psychological Science, 8(5), pp. 386–
390, doi:10.1111/j.1467-9280.1997.tb00430.x, Copyright © 1997 by SAGE Publications. Reprinted by Permission
of SAGE Publications.
constant. Yarbus (1967) used a display similar to the simultaneous contrast pattern. Two red target
disks were placed on adjacent black and white backgrounds. As expected, the two disks appeared
slightly different in lightness. He then made the boundaries of the black and white backgrounds
disappear by retinally stabilizing them, causing the targets to appear to lie on a single homogeneous field. This made the targets appear far more different in lightness, even though the luminance
ratio at the disk border did not change. The implication is that the lightness of the disk depends
not only on the luminance ratio between the disk and its immediate background, but also upon
the luminance ratio at the edge of the background.
In the famous Gelb (1929) effect, a black paper appears white when it is suspended in midair and
illuminated by a spotlight. However, it appears black as soon as a (real) white background is placed
immediately behind the black paper within the spotlight. These phenomena seem ideally consist-
ent with Wallach’s ratio principle. However, in 1995 Cataliotti and Gilchrist published experiments
on the Gelb effect in which they broke the perceptual change into a series of steps. They started
with a black square in a spotlight. It appeared white. Then, they added a dark gray square next to
it, also in the spotlight. The new square (having a higher luminance) appeared completely white,
but caused the original square to darken to light gray. Then a middle gray square was added, and
so on, until the display contained a row of five squares, all standing in the spotlight. Each time a new
(and brighter) square was added it appeared white and caused the other squares to appear darker.
The goal was to test whether the darkening effect caused by the addition of a brighter mem-
ber was a contrast effect based on lateral inhibition, or (as they suspected) an anchoring effect.
Their test relied on the well-known fact that lateral inhibitory effects drop off precipitously with
distance across the retina. The question was thus, when each brighter square is added, does it
darken the adjacent square more than it darkens the others? In other words, as the novel brighter
square moves farther away from the original square does its darkening effect on the original
square weaken? The answer turned out to be ‘no.’ The darkening effect depended only on the
degree to which each novel square raised the highest luminance in the row, not on its location.
This implies that the darkening effect they found, in what has come to be called the staircase Gelb
effect, is an anchoring phenomenon.
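The darkening pattern follows from a highest-luminance anchoring rule of the kind Gilchrist and colleagues later formalized (Gilchrist et al., 1999). A minimal sketch; the 0.90 reflectance assigned to white and the sample luminances are illustrative, not measured values:

```python
def anchored_lightness(luminances):
    """Assign white (reflectance ~0.90) to the highest luminance in the
    framework; scale every other patch by its ratio to that anchor,
    regardless of its distance from the anchor."""
    anchor = max(luminances)
    return [0.90 * lum / anchor for lum in luminances]

# A black paper alone in the spotlight anchors itself and looks white:
alone = anchored_lightness([30.0])             # approx. [0.9]

# Each brighter square added to the row re-anchors the display and
# darkens the others; position in the row plays no role.
row = anchored_lightness([30.0, 90.0, 270.0])  # approx. [0.1, 0.3, 0.9]
```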
These results also demonstrate that luminance ratios between non-adjacent surfaces can deter-
mine lightness just as much as those between adjacent surfaces. This is intuitively reasonable.
Land and McCann (1971), and Arend (1973) suggested that, if the retina encodes luminance
ratios at edges, ratios between remote surfaces can be computed by mathematically integrating the
series of edge ratios that lie along any path between the remote surfaces. Such an edge-integration
would be consistent with the results reported by Yarbus (1967), Arend et al. (1971), Gilchrist et al.
(1983), and Cataliotti and Gilchrist (1995).
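The integration itself is simple in principle: multiplying the luminance ratios at successive edges along a path telescopes into the ratio between the endpoints, so only edge ratios need be encoded. A sketch in the spirit of Land and McCann (1971) and Arend (1973), with invented values (the real models must also contend with noise and edge classification):

```python
import math

def integrated_ratio(path_luminances):
    """Multiply the luminance ratios at each successive edge along a
    path of patches between two remote surfaces."""
    product = 1.0
    for near, far in zip(path_luminances, path_luminances[1:]):
        product *= far / near
    return product

# Any path of intervening patches between two remote surfaces:
path = [90.0, 30.0, 60.0, 15.0]
# The chained edge ratios recover the direct endpoint ratio (1/6 here),
# so absolute luminances can be discarded once edges are encoded:
assert math.isclose(integrated_ratio(path), path[-1] / path[0])
```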
Once again, an analysis by Koffka (1935, p. 248) shows his understanding of the role of remote
luminance ratios, and an experiment by Koffka and Harrower (1931) demonstrated it empirically.
In light of subsequent physiological work, it seems likely that such an integration is achieved
through spatial filtering – that is, through the integration of information from center-surround
receptive fields of varying location and scale (Blakeslee and McCourt, 1999).
Gestalt Theory
The concept of perceptual organization is intimately associated with the Gestalt theorists (see
Wagemans, this volume). They were the first to recognize the fundamental importance of this
problem. Different theories had sought to explain the perceived size of an object, but Wertheimer
(1923) realized that the very perception of an object at all is a perceptual achievement.
Long before the emergence of Gestalt theory, it had become obvious that perception could
not be explained by sensations associated with local stimulation. Hering (1874/1964, p. 23) had
written, ‘Seeing is not a matter of looking at light-waves as such, but of looking at external things
mediated by these waves; the eye has to instruct us, not about the intensity or quality of the light
coming from external objects at any one time, but about these objects themselves.’ However, that
shortcoming was conventionally addressed by assuming a cognitive modification of those sensa-
tions, typically based on prior experience. The Gestaltists forcefully rejected this duality of raw
sensations and cognitive modification, arguing that perception is the product of a unitary pro-
cess. Gelb (1929, excerpted in Ellis, 1938, p. 207) wrote: ‘Our visual world is not constructed by
‘accessory’ higher (central, psychological) processes from a stimulus-conditioned raw material of
‘primary sensations’ and sensation-complexes . . . ‘ Köhler (1947, p. 103) wrote, ‘Our view will be
that, instead of reacting to local stimuli by local and mutually independent events, the organism
responds to the pattern of stimuli to which it is exposed; and that this answer is a unitary pro-
cess, a functional whole which gives, in experience, a sensory scene rather than a mosaic of local
sensations.’
These Gestalt ideas did not fail on their own merits. Nor were they superseded by superior
ideas. Rather, they were eclipsed by external factors, specifically the tragic events surround-
ing World War II. The Gestaltists were forced to flee. The center of the scientific world shifted
to the United States, with its behaviorist hegemony. Gestalt thinking was seen as embarrassingly metaphysical, especially when compared with the promises of the new, non-mentalistic
reductionism. However, for the question of lightness perception, the decades that followed
could be called the dark ages because the experiments were done in dark rooms and very little
progress was made. It was in this context that Wallach presented his ratio theory, but while
ratio theory may have been celebrated by the reductionists, it failed to reflect the rich insights
that had been offered by the Gestaltists.
Perceptual Organization in Lightness 399
Illumination came only with the cognitive revolution of the late 1960s, which legalized discus-
sion of internal processes. Influenced by David Marr (1982), artificial intelligence, and machine
vision, lightness theorists began to think in terms of inverse optics. Perhaps the decomposition of
the retinal image by the visual system is the mirror inverse of the manner in which the image is
initially composed by the multiplication of reflectance and illumination.
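The forward model is a single multiplication, which is exactly what makes the inverse problem ill-posed. A one-line illustration with invented values:

```python
# Image formation: luminance = reflectance * illumination.
# The same luminance (90 units) can arise from different worlds:
dark_gray_brightly_lit = 0.3 * 300.0   # dark gray paper, strong light
white_dimly_lit        = 0.9 * 100.0   # white paper, weak light
assert abs(dark_gray_brightly_lit - white_dimly_lit) < 1e-9

# Inverse optics must factor this product apart; a single luminance
# cannot constrain the split, so context is indispensable.
```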
Various image decomposition models were proposed. Bergström (1977) suggested that the pat-
tern of reflected light is analyzed into common and relative components, analogous to Johansson’s
ingenious vector analysis of motion (see Giese, this volume; Herzog and Ögmen, this volume).
Thus, luminance variations in the image are attributed to changes in reflectance, illumination, and
planarity. Adelson and Pentland (1996) offered a similar approach couched in a vivid metaphor,
whereby painters, lighting designers, and metal benders cooperate to produce any given image in
the most economical way. Ekroll et al. (2004) have provided additional evidence for an analysis
into common and relative components in the chromatic domain.
Barrow and Tenenbaum (1978) suggested that the retinal image can be treated as a multiple
image composed of separate layers, which they called intrinsic images. Gilchrist proposed an
intrinsic image approach in which luminance ratios at edges are encoded, classified as due to
reflectance or illuminance, and integrated within each class to produce separate reflectance and
illuminance maps (Gilchrist, 1979; Gilchrist et al., 1983). Arend (1994) and Blake (1985) offered
similar approaches.
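A toy version of such an intrinsic-image scheme can be sketched in one dimension. Everything here is illustrative: the edge labels are supplied by hand, standing in for cues such as penumbra sharpness, whereas classifying the edges is the hard part of the real model.

```python
# Edges between successive patches along a 1-D scene, each tagged as
# 'R' (reflectance) or 'I' (illuminance), with its luminance ratio.
first_luminance = 90.0
edges = [('I', 1 / 3), ('R', 2.0), ('I', 3.0)]

# Integrate the ratios separately within each class to build the two
# intrinsic maps (relative values, anchored at 1.0 for the first patch).
refl, illum = 1.0, 1.0
refl_map, illum_map = [refl], [illum]
for kind, ratio in edges:
    if kind == 'R':
        refl *= ratio
    else:
        illum *= ratio
    refl_map.append(refl)
    illum_map.append(illum)

# Multiplying the two maps back together reproduces the image:
luminances = [first_luminance * r * i for r, i in zip(refl_map, illum_map)]
# refl_map:  [1.0, 1.0, 2.0, 2.0]        (one reflectance change)
# illum_map: [1.0, 1/3, 1/3, 1.0]        (a shadow entered, then left)
```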
has commented, the Helmholtzian approach is overkill (see also Koenderink, this volume, chapter
on Gestalts as ecological templates). Whereas the decomposition models are concerned primarily
with constancy, mid-level models give substantial attention to lightness illusions and failures of
constancy. In the same spirit, Singh and Anderson (2002) offered a mid-level account of perceived
transparency that has proven to account for the empirical data better than Metelli’s (1974) classic
inverse-optics approach.
It is debatable whether the decomposition models should be considered high-level or mid-level.
Although they are often treated as high-level, the decomposition models do not require a cogni-
tive component. There are no raw sensations and there is no appeal to past experience. On the
other hand, the decomposition models posit a very complete representation of the world.
Grouping by illumination
In fact, Koffka (1935, p. 246) hinted at just such a grouping by illumination. Using the term ‘appur-
tenance’ as a synonym for belongingness, Koffka wrote, ‘a field part x is determined in its appear-
ance by its “appurtenance” to other field parts. The more x belongs to the field part y, the more
will its whiteness be determined by the gradient xy, and the less it belongs to the part z, the less
will its whiteness depend on the gradient xz.’ When Koffka suggests that the whiteness (lightness)
of a surface depends on the luminance ratio between that surface and other surfaces to which it
belongs, he is talking about surfaces that lie in the same field of illumination.
Grouping by planarity
Gilchrist’s findings on coplanar ratios can be thought of as grouping by planarity. In a chapter
called ‘In defense of unconscious inference’ Irvin Rock (1977) sought to offer a Helmholtzian
Fig. 19.2 Grouping by illumination (A & B; C & D) and grouping by reflectance (A & C; B & D).
account of those findings, writing, ‘When regions of differing luminance are phenomenally local-
ized in one plane, the perceptual system operates on the assumption that they are receiving equal
illumination’ (Rock 1977, p. 359).
This, too, was anticipated by Koffka (1935, p. 246) who wrote, ‘Which field parts belong together,
and how strong the degree of this belonging together is, depends upon factors of space organiza-
tion. Clearly, two parts at the same apparent distance will, ceteris paribus, belong more closely
together than field parts organized in different planes.’
In the Gilchrist (1980) experiments, depth perception allowed the visual system to organize
retinal patches into perceived planes. The surfaces within each plane, as is often the case, shared a
common illumination level. However, for purposes of lightness computation, which is more fun-
damental, grouping by planarity or grouping by illumination? Radonjić and Gilchrist (2013) have
recently teased these factors apart. They replicated Gilchrist’s (1980) earlier experiments involving
dihedral planes, but with one change. One of the two planes was further divided into two fields
of illumination by an illumination boundary. In this case, the lightness of the critical target was
determined, not by the highest luminance in that plane, but by the highest luminance within the
same region of illumination (which comprised only part of that plane).
Grouping by illumination makes sense. Von Helmholtz had glibly suggested that, to com-
pute lightness, the visual system must take the illumination level into account, but specifying
how this might be done is another matter. Von Helmholtz never did. Boyaci et al. (2003) and
Ripamonti et al. (2004) have proposed that the visual system takes into account the direction
and intensity of the light source, using cues like cast shadows, attached shadows, and glossy
highlights (Boyaci et al., 2006). Such a hypothesized process, however, would be computa-
tionally very expensive and perhaps impossible in the real world. There is virtually never only
a single light source. Consider your immediate environment as you read this. How many light
sources are there? Remember that you must include any windows, and remember that every
surface reflects light onto other surfaces.
Grouping by proximity
Studies of the so-called brightness induction effect of a brighter ‘inducing field’ on a darker ‘test
field’ were reported by Cole and Diamond (1971), Dunn and Leibowitz (1961), Fry and Alpern
(1953), and Leibowitz et al. (1953). All found that, with luminances held constant, the perceived
brightness (and presumably lightness) of the darker test field decreases as the separation between
the two is reduced. Although they attributed this result to the spatial function of lateral inhibition,
it perfectly satisfies Koffka’s claim that ‘The more x belongs to the field part y, the more will its
whiteness be determined by the gradient xy . . .’ McCann and Savoy (1991) and Newson (1958)
found the same results testing lightness explicitly, but without attribution to lateral inhibition.
Gogel and Mershon (1969) showed that changes in depth proximity (rather than lateral prox-
imity) produce the same effect on lightness. Their result cannot be attributed to lateral inhibition.
It is important to note that these test and inducing fields were either floating in mid-air, or pre-
sented against a totally dark background. When the fields are connected by a continuous series of
coplanar patches (as in Cataliotti and Gilchrist, 1995), little or no such proximity effect is found,
presumably because they are already strongly organized as a group of patches.
Grouping by similarity
Laurinen et al. (1997) superimposed shallow luminance modulations on each of the four parts
of the simultaneous contrast display, as shown in Figure 19.3. They found that the contrast effect
is substantially weakened if the modulation frequency on each target is different from that of its
background. Bonato et al. (2003) also found this result by varying the type of texture, rather than
the scale. Conversely, the contrast effect can be strengthened by giving one target and its back-
ground one frequency (or texture), while giving the other target and its background a different
frequency. Color can also be used to modulate similarity among regions of the contrast display
without altering relative luminance. Olkkonen et al. (2002) found that when both targets share a
common color and the two backgrounds share a different color, the illusion is reduced. In group-
ing terms, increasing the belongingness of each target and its immediate surround by giving them
a common color, while simultaneously decreasing the belongingness between the two surrounds
by giving them different colors, tends to produce local lightness computations within each surround, thus enhancing the perceived difference between targets. However, increasing the belongingness between the two surrounds, as Olkkonen et al. did, promotes a more global computation
within the whole pattern, and this reduces the contrast effect.
Fig. 19.3 (Left side) Depending on which regions are grouped by spatial frequency similarity, the
contrast effect can be weakened (top two examples) or strengthened (bottom example). (Upper
right) Benary effect. (Lower right) White’s illusion.
Reproduced from Pentti I. Laurinen, Lynn A. Olzak, and Tarja L. Peromaa, Psychological Science, 8(5),
pp. 386–390, doi:10.1111/j.1467-9280.1997.tb00430.x, Copyright © 1997 by SAGE Publications. Reprinted by
Permission of SAGE Publications.
slightly darker, presumably because it appears to belong to the white background. The lower tri-
angle appears lighter because it appears to belong to the black cross.
In 1979, Michael White introduced an illusion that now bears his name. While the Benary effect
is weaker than the standard simultaneous contrast effect, White’s illusion is much stronger (see
Figure 19.3). Moreover, the effect is counter to that suggested by adjacency, given that the gray
bars that appear lighter actually share more boundary length with white than with black. This
asymmetry is pushed even farther in the Todorović illusion (Todorović, 1997).
Assimilation predictions
Fig. 19.5 The inequality signs show on which side the shorter target bars are predicted to appear
lighter, according to assimilation. Perceived lightness contradicts these predictions.
Adapted from B.L. Anderson, A theory of illusory lightness and transparency in monocular and binocular images:
the role of contour junctions, Perception, 26(4), pp. 419–53, doi:10.1068/p260419, Copyright © 1997, Pion.
With kind permission from Pion Ltd, London www.pion.co.uk and www.envplan.com.
is parsed into frameworks of illumination that are typically adjacent, like countries on a map.
Empirical support for both frameworks and layers exists. Although the relative merits of frame-
works and layers are debated (see Anderson and Winawer, 2008), these contending approaches
may ultimately turn out to be aspects of a single Gestalt account. But the outlines of such an inte-
gration are not obvious at present because the components into which the image is parsed, lay-
ers versus frameworks, seem mutually exclusive. Nevertheless, Bressan (2006a) has proposed the
concept of the overlay framework, in which a layer is also a framework. But this use of the term
framework departs substantially from that of Koffka or Kardos.
Conclusions
There is as yet no consensus on how surface lightness is computed by the brain. The fundamental
problem is that any luminance can come from any reflectance. Thus, the problem can be solved
only by using the surrounding context. Simply using the luminance ratio between a target surface
and its background is woefully inadequate. The lightness of a surface has been shown to depend
on many aspects of the perceptual structure of the image, including perceived 3D arrangement,
classification of edges, and long-distance luminance relationships. These problems of perceptual
organization have been confronted mainly by either parsing the image into overlapping layers
representing illumination and reflectance or into frameworks within which lightness is computed
by comparing luminances. It is hoped that further research will lead to models that incorporate
the strengths of both approaches.
References
Adelson, E. H. (1993). Perceptual organization and the judgment of brightness. Science 262: 2042–2044.
Adelson, E. H. (2000). Lightness perception and lightness illusions. In The New Cognitive Neuroscience, 2nd
edn, edited by M. Gazzaniga, pp. 339–351. Cambridge, MA: MIT Press.
Adelson, E. H., and Pentland, A. P. (1996). The perception of shading and reflectance. In Perception as
Bayesian Inference, edited by D. Knill and W. Richards, pp. 409–423. New York: Cambridge University
Press.
Agostini, T., and Galmonte, A. (2000). Contrast and assimilation: the belongingness paradox. Rev Psychol
7(1-2): 3–7.
Agostini, T., and Galmonte, A. (2002). Perceptual organization overcomes the effect of local surround in
determining simultaneous lightness contrast. Psychol Sci 13(1): 89–93.
Agostini, T., and Proffitt, D. R. (1993). Perceptual organization evokes simultaneous lightness contrast.
Perception 22(3): 263–272.
Anderson, B. (1997). A theory of illusory lightness and transparency in monocular and binocular
images: the role of contour junctions. Perception 26: 419–453.
Anderson, B., and Winawer, J. (2008). Layered image representations and the computation of surface
lightness. J Vision 8(7): 1–22.
Arend, L. (1994). Surface colors, illumination, and surface geometry: intrinsic-image models of human
color perception. In Lightness, Brightness, and Transparency, edited by A. Gilchrist, pp. 159–213.
Hillsdale: Erlbaum.
Arend, L. E. (1973). Spatial differential and integral operations in human vision: implications of stabilized
retinal image fading. Psychol Rev 80: 374–395.
Arend, L. E., Buehler, J. N., and Lockhead, G. R. (1971). Difference information in brightness perception.
Percept Psychophys 9: 367–370.
Barlow, H. B., and Levick, W. R. (1969). Three factors limiting the reliable detection of light by retinal
ganglion cells of the cat. J Physiol 200: 1–24.
Barrow, H. G., and Tenenbaum, J. (1978). Recovering intrinsic scene characteristics from images. In
Computer Vision Systems, edited by A. R. Hanson and E. M. Riseman, pp. 3–26. Orlando: Academic Press.
Beck, J. (1965). Apparent spatial position and the perception of lightness. J Exp Psychol 69: 170–179.
Beck, J. (1966). Contrast and assimilation in lightness judgements. Percept Psychophys 1: 342–344.
Benary, W. (1924). Beobachtungen zu einem Experiment über Helligkeitskontrast (Observations
concerning an experiment on brightness contrast). Psychol Forsch 5: 131–142.
Bergström, S. S. (1977). Common and relative components of reflected light as information about the
illumination, colour, and three-dimensional form of objects. Scand J Psychol 18: 180–186.
Bindman, D., and Chubb, C. (2004). Brightness assimilation in bullseye displays. Vision Res 44(3): 309–319.
Blake, A. (1985). Boundary conditions for lightness computation in Mondrian world. Comp Vision Graphics
Image 32: 314–327.
Blakeslee, B., and McCourt, M. E. (1999). A multiscale spatial filtering account of the White effect,
simultaneous brightness contrast and grating induction. Vision Res 39: 4361–4377.
Bonato, F., Cataliotti, J., Manente, M., and Delnero, K. (2003). T-junctions, apparent depth, and perceived
lightness contrast. Percept Psychophys 65(1): 20–30.
Boyaci, H., Doerschner, K., and Maloney, L. (2006). Cues to an equivalent lighting model. J Vision
6: 106–118.
Boyaci, H., Maloney, L., and Hersh, S. (2003). The effect of perceived surface orientation on perceived
surface albedo in binocularly viewed scenes. J Vision 3: 541–553.
Bressan, P. (2001). Explaining lightness illusions. Perception 30: 1031–1046.
Bressan, P. (2006a). Inhomogeneous surrounds, conflicting frameworks, and the double-anchoring theory
of lightness. Psychonom Bull Rev 13: 22–32.
Bressan, P. (2006b). The place of white in a world of grays: a double-anchoring theory of lightness
perception. Psychol Rev 113(3): 526–553.
Bressan, P. (2007). Dungeons, gratings, and black rooms: a defense of the double-anchoring theory of
lightness and a reply to Howe et al. Psychol Rev 114: 1111–1114.
Cataliotti, J., and Gilchrist, A. L. (1995). Local and global processes in lightness perception. Percept
Psychophys 57(2): 125–135.
Cole, R. E., and Diamond, A. L. (1971). Amount of surround and test inducing separation in simultaneous
brightness contrast. Percept Psychophys 9: 125–128.
Cornsweet, T. N. (1970). Visual Perception. New York: Academic Press.
Duncker, K. (1929). Über induzierte Bewegung (Ein Beitrag zur Theorie optisch wahrgenommener
Bewegung). Psychol Forsch 12: 180–259.
Dunn, B., and Leibowitz, H. (1961). The effect of separation between test and inducing fields on brightness
constancy. J Exp Psychol 61(6): 505–507.
Economou, E., Zdravkovic, S., and Gilchrist, A. (2007). Anchoring versus spatial filtering accounts of
simultaneous lightness contrast. J Vision 7(12): 1–15.
Ekroll, V., Faul, F., and Niederee, R. (2004). The peculiar nature of simultaneous colour contrast in uniform
surrounds. Vision Res 44: 1756–1786.
Ellis, W. D. (Ed.). (1938). A Source Book of Gestalt Psychology. New York: Humanities Press.
Epstein, W. (1961). Phenomenal orientation and perceived achromatic color. J Psychol 52: 51–53.
Festinger, L., Coren, S., and Rivers, G. (1970). The effect of attention on brightness contrast and
assimilation. Am J Psychol 83: 189–207.
Flock, H. R., and Freedberg, E. (1970). Perceived angle of incidence and achromatic surface color. Percept
Psychophys 8: 251–256.
Fry, G. A., and Alpern, M. (1953). The effect of a peripheral glare source upon the apparent brightness of an
object. J Opt Soc Am 43: 189–195.
Gelb, A. (1929). Die ‘Farbenkonstanz’ der Sehdinge (The color of seen things). In Handbuch der normalen
und pathologischen Physiologie, Vol. 12, edited by W. A. von Bethe, pp. 594–678. Berlin: Julius Springer.
Gelb, A. (1932). Die Erscheinungen des simultanen Kontrastes und der Eindruck der Feldbeleuchtung.
Zeitschr Psychol 127: 42–59.
Gibson, J. J. (1966). The Senses Considered as Perceptual Systems. Boston: Houghton Mifflin.
Gilchrist, A. (1979). The perception of surface blacks and whites. Scient Am 240: 112–123.
Gilchrist, A. (1988). Lightness contrast and failures of constancy: a common explanation. Percept
Psychophys 43(5): 415–424.
Gilchrist, A. (1994). Absolute versus relative theories of lightness perception. In Lightness, Brightness, and
Transparency, edited by A. Gilchrist, pp. 1–33. Hillsdale: Erlbaum.
Gilchrist, A. (2006). Seeing Black and White. New York: Oxford University Press.
Gilchrist, A., Delman, S., and Jacobsen, A. (1983). The classification and integration of edges as critical to
the perception of reflectance and illumination. Percept Psychophys 33(5): 425–436.
Gilchrist, A., and Jacobsen, A. (1984). Perception of lightness and illumination in a world of one
reflectance. Perception 13: 5–19.
Gilchrist, A., Kossyfidis, C., Bonato, F., Agostini, T., Cataliotti, J., Li, X., et al. (1999). An anchoring theory
of lightness perception. Psychol Rev 106(4): 795–834.
Gilchrist, A. L. (1977). Perceived lightness depends on perceived spatial arrangement. Science 195: 185–187.
Gilchrist, A. L. (1980). When does perceived lightness depend on perceived spatial arrangement? Percept
Psychophys 28(6): 527–538.
Gogel, W. C., and Mershon, D. H. (1969). Depth adjacency in simultaneous contrast. Percept Psychophys
5(1): 13–17.
Hartline, H., Wagner, H., and Ratliff, F. (1956). Inhibition in the eye of Limulus. J Gen Physiol
39: 651–673.
Helmholtz, H., von (1866/1924). Helmholtz’s Treatise on Physiological Optics. New York: Optical Society of
America.
Helson, H. (1964). Adaptation-Level Theory. New York: Harper & Row.
Hering, E. (1874/1964). Outlines of a Theory of the Light Sense, translated by L. M. Hurvich and D. Jameson.
Cambridge, MA: Harvard University Press.
Hochberg, J. E., and Beck, J. (1954). Apparent spatial arrangement and perceived brightness. J Exp Psychol
47: 263–266.
Jameson, D., and Hurvich, L. M. (1964). Theory of brightness and color contrast in human vision. Vision
Res 4: 135–154.
Jameson, D., and Hurvich, L. M. (1989). Essay concerning color constancy. Ann Rev Psychol 40: 1–22.
Johansson, G. (1950). Configurations in Event Perception. Uppsala: Almqvist & Wiksell.
Kardos, L. (1934). Ding und Schatten [Object and Shadow]. Zeitschr Psychol Erg bd 23.
Katz, D. (1935). The World of Colour. London: Kegan Paul, Trench, Trubner & Co.
Knill, D., and Kersten, D. (1991). Apparent surface curvature affects lightness perception. Nature
351(May): 228–230.
Koffka, K. (1935). Principles of Gestalt Psychology. New York: Harcourt, Brace, and World.
Koffka, K., and Harrower, M. R. (1931). Colour and Organization II. Psychol Forsch 15: 193–275.
Köhler, W. (1947). Gestalt Psychology. New York: Liveright.
Kozaki, A., and Noguchi, K. (1976). The relationship between perceived surface-lightness and perceived
illumination. Psychol Res 39: 1–16.
Land, E. H., and McCann, J. J. (1971). Lightness and retinex theory. J Opt Soc Am A 61: 1–11.
Laurinen, P. I., Olzak, L. A., and Peromaa, T. (1997). Early cortical influences in object segregation and the
perception of surface lightness. Psychol Sci 8(5): 386–390.
Leibowitz, H., Mote, F. A., and Thurlow, W. R. (1953). Simultaneous contrast as a function of separation
between test and inducing fields. J Exp Psychol 46: 453–456.
Logvinenko, A., and Menshikova, G. (1994). Trade-off between achromatic colour and perceived
illumination as revealed by the use of pseudoscopic inversion of apparent depth. Perception
23(9): 1007–1024.
Mach, E. (1865). Über die Wirkung der räumlichen Vertheilung des Lichtreizes auf die Netzhaut.
Sitzungsberichte der mathematisch-naturwissenschaftlichen Classe der kaiserlichen Akademie der
Wissenschaften 52(2): 303–322.
Mach, E. (1922/1959). The Analysis of Sensations (English translation of Die Analyse der
Empfindungen, 1922). New York: Dover.
Marr, D. (1982). Vision. San Francisco: Freeman.
McCann, J. J., and Savoy, R. L. (1991). Measurements of lightness: dependence on the position of a white in
the field of view. Proc SPIE 1453: 402–411.
Achromatic transparency
Walter Gerbino
1 Transparency experienced in sensory perception provides a basis for the transparency metaphor, frequently encountered in fields as diverse as philosophy of mind (Hatfield 2011), linguistics (Libben 1998), and politics.
2 Chuang et al. (2009) discuss the dominance of achromatic constraints in visualization.
Fig. 20.1 Apparent transparency. The abpq pattern in panel a is usually perceived as a dark bar on
top of a white cross (though an alternative perceptual solution is possible) and not as the mosaic
of irregular shapes shown in panel b. The pattern in panel c is a control for the effect of figural
organization on perceived color: the adjacencies are kept constant, while good continuation of
contours at junctions is eliminated. According to Metzger, transparency is not perceived in panel d
because both black and white regions have a good shape and the addition of the grey region would
not generate figures with a better shape.
Adapted from Wolfgang Metzger, Laws of Seeing, translated by Lothar Spillmann, figure 131, modified, © 2006
Massachusetts Institute of Technology, by permission of The MIT Press.
Achromatic transparency plays a special role in perceptual organization for the following
reasons:
• it provides an ideal case for the application of the tendency to Prägnanz, which may be taken as
the distinctive trait of the Gestalt theory of perception;
• under optimal conditions it appears as an organized outcome strongly constrained by geometric
and photometric information, and highly functional, being formally equivalent to the solution
of a pervasive inverse-optics problem;
• under suboptimal conditions it reveals the links between color and form (a leitmotif of Gestalt
psychology; Koffka 1935, pp. 260–264; see Section “Transparency and motion”).
Consider how Metzger (1936/2006) set up the problem in Chapter 8 of Gesetze des Sehens, discussing a demonstration from Fuchs (1923). Figure 20.1a is normally perceived as a dark transparent bar on top of a white cross, not as the mosaic in Figure 20.1b.3 The bar and the cross intersect in such a way that each ‘claims as its own’ the superposition region, requiring the scission of
3 The pattern in Figure 20.1a supports two transparency solutions. See Figure 20.7 for an analysis of bivalent 4-region patterns.
its grey substance into two components that perceptual organization makes as similar as possible to bar and cross lightnesses. The double-belongingness of the superposition region depends,
locally, on the good continuation of contours meeting at X-junctions and, more globally, on the
improvement of form regularity. Metzger (1936/2006) referred to his Fig. 27 to claim that the
strength of such factors is well established by classical demonstrations with intertwined outline
patterns (Köhler 1929; Wertheimer 1923/2012).4
Figure 20.1c (not in Metzger 1936/2006; drawn following Kanizsa 1955) is a control. All adjacencies in Figure 20.1a are maintained, but contours of neither the bar nor the cross keep a constant trajectory at X-junctions. The dark bar survives as a unit, being supported by the topological
condition (see Section “Topological and figural conditions”); but the sense of transparency is
weakened, and the color appearance of the superposition region is different from the one in
Figure 20.1a.
Figure 20.1d displays a counterexample in which the same greys of Figure 20.1a are combined in a pattern that is perceived as a mosaic of three adjacent squares, though compatible—in principle—with the overlapping of two homogeneous rectangles, with the same front/back ambiguity and alternating transparency observable in the cross/bar display of Figure 20.1a.
Much of the theoretical weight of transparency depends on the colors seen when the intersection region belongs to both the dark bar and the light cross (panel a), rather than appearing as an isolated surface (panel b). Figural belongingness modulates the scission of the sensation (Spaltung der Empfindung; Hering 1879) and impacts on perceived intensity and color appearance. Helmholtz (1910/1924, originally published in 1867) framed real transparency as a problem of recognizing the components of a light mixture, using knowledge acquired in ordinary environments in which at least the mixture of illumination and reflectance components is pervasive. In
the Helmholtzian view, the same ratiomorphic process supports the discounting of illumination
associated with the approximate constancy of opaque surface colors, the perception of shadows,
the separation of filter properties from background properties, and analogous recovery problems.
‘Just as we are accustomed and trained to form a judgment of colours of bodies by eliminating
the different brightness of illumination by which we see them, we eliminate the colour of the
illumination also. [. . .] Thus too when we view an object through a coloured mantle, we are not
embarrassed in deciding what colour belongs to the mantle and what to the object.’ (Helmholtz
1924, p. 287.)
Helmholtz’s emphasis on observers’ ability to evaluate light mixture components conflicts with the plain argument developed in Figure 20.1. The same light mixture is sometimes phenomenally split into components, sometimes not, depending on stimulus conditions. The discovery of conditions for the occurrence of phenomenal transparency (independent of its veridicality) is the goal of a long tradition of research oriented by Gestalt ideas (Fuchs 1923; Kanizsa 1955, 1979; Koffka 1935; Metelli 1970, 1974, 1975; Moore-Heider 1933; Tudor-Hart 1928), among which a special place is held by the idea that double-belongingness is a peculiar organization producing characteristic effects on perceived color (Kanizsa 1955; Musatti 1953; Wallach 1935/1996).
Since transparency can be observed in line-drawing displays (Bozzi 1975), without specific
photometric information, let us consider geometric conditions first.
4 In the Gestalt tradition the ‘apparent/real’ dichotomy is used to stress that real transparency (i.e., a layer with non-zero transmittance) is neither necessary nor sufficient to support a transparency percept; apparent transparency is perceived in mosaics of opaque surfaces. As for motion, the apparent/real dichotomy stimulates the search for the proximal conditions supporting the perception of transparency, independent of its veridicality.
Topological condition
The topological condition has been formulated as follows (Kanizsa 1955). To belong to two subunits, each candidate region must be in contact with the other (reciprocal contact constraint) and with only one of the remaining regions (Figure 20.2). At the level of regions, the condition is satisfied when contours meet at a generic 4-side junction, even without good continuation at the contour level (Figure 20.1c).
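As a rough illustration (not from the chapter), the condition can be stated as a check on a region-adjacency graph. Region names follow the abpq notation of Figure 20.1; the helper itself is hypothetical:

```python
# Sketch of Kanizsa's topological condition as an adjacency check.
# The function and the hand-built adjacency dicts are illustrative.

def satisfies_topological_condition(adjacency, layer, background):
    """adjacency: dict region -> set of adjacent regions;
    layer: the two candidate layer regions (p, q);
    background: the two background regions (a, b)."""
    p, q = layer
    # Reciprocal contact: each candidate layer region touches the other.
    if q not in adjacency[p] or p not in adjacency[q]:
        return False
    # Each candidate layer region touches exactly one background region.
    for region in (p, q):
        if sum(1 for bg in background if bg in adjacency[region]) != 1:
            return False
    return True

# Canonical display of Figure 20.2a: chain a-p-q-b, i.e. (a[p)(q]b).
canonical = {"a": {"p"}, "p": {"a", "q"}, "q": {"p", "b"}, "b": {"q"}}
print(satisfies_topological_condition(canonical, ("p", "q"), ("a", "b")))  # True

# Violation in the spirit of Figure 20.2c: each candidate layer region
# is in contact with both background regions.
violated = {"a": {"p", "q"}, "p": {"a", "b", "q"},
            "q": {"a", "b", "p"}, "b": {"p", "q"}}
print(satisfies_topological_condition(violated, ("p", "q"), ("a", "b")))  # False
```

A full implementation would derive the adjacency graph from image regions; here it is supplied by hand.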
Kanizsa (1955, 1979) and Metelli (1975, 1985b) discussed various controversial configurations
connected to the topological condition. Kanizsa (but not Metelli) concluded that the topological
condition is necessary, though not sufficient. Panels b–d in Figure 20.2 depict violations that lead
to the loss of the compelling transparency percept observed in Figure 20.2a. However, the broken layer depicted in Figure 20.2c does not completely forbid transparency, being consistent with common observations of shadows falling over a 3D step, with non-coplanar background regions. Arguing that the topological condition is necessary, Kanizsa (1979, Fig. 8.9) claimed that transparency is hardly seen in Figure 20.3a.6
Apart from being necessary or not, what is the meaning of the topological condition? Does it capture a figural constraint at the level of regions or does it relate to photometric conditions described in Section “Photometric conditions”? The second hypothesis is supported by a manipulation of borders done by Metelli (1985b). Transparency of the oblique square in Figure 20.3b disappears if one eliminates the adjacency of to-be-grouped regions by superposing a thick outline on the borders of the intersection region (Figure 20.3c). Transparency is not blocked, however, if all regions are bounded by thick outlines that can become part of the transparency solution, with the upright square perceived on top of the oblique square (Figure 20.3d). The isolation effect in Figure 20.3c is reminiscent of the loss of the film appearance in a shadow whose penumbra is suppressed by a thick outline.7
Figural conditions
Figural aspects play a major role in transparency and, when strengthened by motion, can overcome
contradictory photometric information. Kanizsa (1955, 1979) and Metelli (1974) emphasized the role
5 An extended notation for the double-belongingness of p and q regions would be (ap)(pq)(qb). In the compact notation above the subunit corresponding to the transparent layer is marked by square brackets, while the background subunits are marked by round brackets.
6 You may disagree.
7 See discussions of Hering’s shadow/spot demonstration in Metzger (1936/2006, Fig. 132) and Gilchrist (2006, p. 21).
Fig. 20.2 Topological condition. (a) Canonical 4-region display fulfilling all geometric and photometric
requirements. Panels b–d illustrate three ways in which the topological condition can be violated. (b)
Regions that should be unified into a single layer are not in reciprocal contact, while touching both
background regions. (c) The reciprocal contact constraint is fulfilled, but both candidate layer regions are
in contact also with both background regions. (d) The topological condition is violated also when the
inner contour of a unitary layer (i.e., the one that divides the two constituent regions) is not aligned with
the contour that divides the background regions.
Data from G. Kanizsa, Condizioni ed effetti della trasparenza fenomenica, Rivista di Psicologia, 49, pp. 3–19,
1955.
of good continuation at X-junctions as the critical local factor supporting vivid impressions of transparency, other things being equal (i.e., once the topological condition is fulfilled and keeping the intensity pattern constant). However, they also considered more global figural factors, like the shape of regions.
Figural conditions for the double-belongingness of regions to be grouped into a layer agree with
those that govern the segmentation of outline patterns and have been studied within a research
tradition that goes from Wertheimer (1923/2012) to the most recent developments of Structural
Information Theory (SIT; Leeuwenberg and van der Helm 2013). Wertheimer (1923/2012), com-
menting on his Figs. 33 and 34, observed that Fuchs (1923) utilized the same laws of unification/
segregation when studying transparent surfaces in the period 1911–1914 and found they strongly
affect color. Wertheimer’s Fig. 33 is an outline version of Figure 20.3b, while Wertheimer’s Fig. 34
is similar to Figure 20.1d. These and other famous outline patterns (like the pair of intertwined
hexagons) support the idea that figural segmentation crucially depends on the tendency towards
the ‘good whole Gestalt’ (Wertheimer 1923, p. 327; Wagemans, Chapter 1, Section “Wertheimer’s
“Gestalt laws” (1923)”, this volume).
Fig. 20.3 According to Kanizsa (1979) the pattern in panel a shows that the topological condition
cannot be violated without destroying perceived transparency. Adapted from G. Kanizsa, Organization
in Vision, Figure 9.6, p. 160, Praeger, Santa Barbara, USA, Copyright © 1979, Praeger. Panels b–d (from
Metelli 1985b) show the effect of thick outlines. The transparency perceived in panel b is destroyed by a
thick outline surrounding the superposition region (panel c). A thick outline surrounding all regions can
be integrated in the transparency percept (panel d).
In an early application of SIT to visual and auditory domains, Leeuwenberg (1976, 1982;
Leeuwenberg and van der Helm 2013; see also van der Helm, Chapter 50, this volume) computed
a measure of preference for pattern segmentation based on the ratio between the complexity of the
mosaic solution and the complexity of the transparency solution. Using patterns like those in Figure
20.4 and coding only figural complexity (independently of photometric conditions), he obtained a
high correlation between the theoretical preference measure and transparency judgments.
Singh and Hoffman (1998) provided a major contribution to the idea that figural conditions go
beyond the local good continuation at X-junctions. They used displays with X-junctions that preserved the local good continuation of background and layer contours, and asked observers to rate perceived transparency on a 1–7 scale. Observers were more sensitive to the size of turning angles at the extrema of curvature of the layer boundary when they were negative minima than positive maxima. Average ratings ranged from 1.5 (close to perfect mosaic) to 6 for negative minima, and from 4 to 6 for positive maxima. Furthermore, Singh and Hoffman (1998) found that the proximity of the extrema of curvature to the background boundary increased the detrimental effect
on transparency ratings. Their results show that the competition between mosaic and double-
belongingness solutions depends on properties like negative extrema, which are relevant for the
parsing of shapes into parts (Singh, Chapter 12, this volume).
All geometric factors known to affect relative depth may be effective in making the transparent layer more salient and in modulating the preference for one transparency solution when
Fig. 20.4 According to Leeuwenberg’s coding approach (1976, 1982) perceived transparency is
predicted by a preference measure, with a value of 1 for the balance between mosaic and transparency
solutions. Preference values are 11.90 in panel a and 0.56 in panel b. This preference measure takes into
account only figural (not photometric) aspects.
Reproduced from Emanuel Leeuwenberg and Peter A. van der Helm, Structural Information Theory: The Simplicity of Visual Form, Cambridge University Press, Cambridge, UK, Copyright © 2012, Cambridge University Press, with permission.
8 Based on evidence from texture segmentation in motion transparency, Glass patterns, and stereopsis, such a number has been evaluated as equal to two (Edwards and Greenwood 2005; Gerbino and Bernetti 1984; Kanai et al. 2004; Mulligan 1992; Prazdny 1986), three (Weinshall 1991), four (Hiris 2001), and dependent on the cueing of attention (Felisberti and Zanker 2005).
Fig. 20.5 Transparency in outline patterns (Bozzi 1975). In panel a thinning all lines included within
the oblique rectangle makes it appear foggy. In panel b the misalignment is perceived as the effect
of a distorting superposed layer.
condition for transparency, but phenomenal transparency should involve a characteristic color
appearance, different from the appearance of the same region when seen as part of a mosaic.
This is the case in patterns like those in Figure 20.5, devised by Bozzi (1975) to demonstrate that
the experience of an interposed layer or substance, capable of modifying the appearance of the
background, can be obtained also in the limited and artifactual world of line drawings. Taken as a
whole, Bozzi’s demonstrations suggest that the perception of an interposed layer—at least in some
conditions—amounts to the recovery of the causal history of shapes (Leyton 1992). The milky
layer perceived in panel a accounts for the thinning of vertical lines, while the distorting glass
perceived in panel b accounts for their lateral shift. Bozzi was well aware of the possibility that
line thinning (panel a) may be equivalent to an intensity change, which would make at least some
of his line drawings not less interesting, but similar to other effects involving assimilation and
filling in. The degree of connection between Bozzi’s outline displays portraying transparency and
phenomena like achromatic neon spreading and flank transparency is debatable (Wollschläger
et al. 2001, 2002; Roncato 2012). However, this objection does not apply to Figure 20.5b and other displays that depict a background transformation more complex than a simple change of intensity due to layer superposition. Line drawings are highly symbolic, and transparency mediated by the specific transformations they can afford might go beyond the domain covered in this chapter.
Photometric conditions
To support transparency, the pattern of intensities of adjacent regions must satisfy a requirement that, at an abstract level, complements the good continuation of contour trajectories. The equivalent of a discontinuity in contour trajectory is an abrupt change of surface values (apparent transmittance, lightness, or others to be defined).
Consider contour trajectories in the neighborhood of an X-junction originated by layer
superposition. In general, background regions are divided by a continuous reflectance edge
(R-edge), while the superposed layer and background regions are divided by a continuous
transmittance-reflectance-illumination edge (TRI-edge). Following Nakayama et al. (1989), the latter edge is intrinsic to layer regions (it belongs to them) but extrinsic to regions seen as unoccluded background (it does not belong to them). Topological and figural conditions tell that both edges should be smoothly continuous at the X-junction.
Consider now intensities in the neighborhood of the X-junction. Photometric conditions tell
when one of the two crossing edges can be classified as a TRI-edge; i.e., when the intensity of each
double-function region is consistent with the mixing of photometric properties of the adjacent background region and those of an ideally homogeneous layer resulting from the grouping of two adjacent double-function regions. Notions such as scission (Metelli 1970; Anderson 1997), vector analysis in the photometric domain (Bergström 1977, 1982, 1994), and atmospheric transfer function (Adelson 2000) capture the same idea. A rather general term is layer decomposition, used by Kingdom (2011) to qualify brightness, lightness, and transparency models—alternative to image filtering—that explain achromatic phenomena as a consequence of extracting components from each stimulus intensity (the invariant of alternative partitioning solutions). For historical and conceptual reasons, let us illustrate the algebraic model proposed by Metelli (1970, 1974, 1975), which—despite limitations that will be pointed out—provides an effective frame of reference for the whole discussion on photometric conditions of transparency.9
Metelli’s model
Metelli’s model is derived from a simplistic case of real transparency, the episcotister setting utilized to manipulate light mixtures (Fuchs 1923; Koffka 1935; Moore-Heider 1933; Tudor-Hart 1928). The episcotister model is representative of a broad class of ecological settings, which in principle should consider more parameters (Richards et al. 2009), but—more importantly—has the virtue of being a simple and essential decomposition-and-grouping model.
As shown in Figure 20.1, a layer appears transparent only if partially superposed on a background that includes at least two regions of different reflectance.10 Metelli’s model provides a way of evaluating the amount of photometric information carried by a generic X-junction in which an R-edge intersects a TRI-edge. The R-edge is the simple boundary between two adjacent background regions, differing in reflectance but equally illuminated, whereas the TRI-edge is a complex boundary arising from the superposition of a layer of variable transmittance and reflectance, and/or a change in illumination.
In the original model the input variables are the four reflectances that, in a cardboard display, mimic the light coming from two adjacent background surfaces a and b, and from the light mixtures p and q, obtained by rotating an episcotister (a spinning disk with apertures and opaque sectors of variable reflectance) in front of background surfaces a and b, under the critical assumption that the episcotister and background surfaces are equally illuminated.11 The fact that the situation referred to in the episcotister model does not involve physically transparent materials should not be seen as a problem. When an episcotister rotates faster than fusion speed, its effects on p and q intensities are equivalent to those generated by static layers such as a thin veil or an optical filter. Neither the temporal (episcotister) nor the spatial (veil, filter) light mixtures follow the equations known as the episcotister model if the constraint of uniform illumination is not fulfilled; both
9 Kanizsa (1955, 1979) sometimes used the label ‘chromatic conditions’ as a synonym of photometric conditions, discussing achromatic displays. To avoid confusions that would obviously arise in a chapter entitled ‘Achromatic transparency,’ conditions related to region intensities (expressed as either reflectances or luminances) will be called ‘photometric.’
10 This formulation covers transparency perceived in the 3-region display, studied for instance by Masin (1984). His observers perceived as transparent a real filter suspended in front of a background that included a square projectively enclosed by the filter. However, the objective separation in depth was large enough to provide valid disparity information.
11 In this chapter small letters are used for dimensionless numbers (reflectances abpq and other coefficients with meaningful values between 0 and 1) and capital letters for luminances (in Section “Reflectances or luminances?”). For further details see Gerbino et al. (1990) and Gerbino (1994). The transparency literature is full of different symbols for the same entities. I apologize for possible confusions.
should be described by the so-called filter model if the layer is very close to or in contact with the background, as it actually looks in the flatland of impoverished 4-region displays (Beck et al. 1984; Gerbino 1994; Richards et al. 2009).12
Basically, the episcotister model takes regions grouped as (a[p)(q]b) according to figural constraints and verifies whether p and q intensities are compatible with the constrained sum of two components described by the following equations:
p = ta + f (1)
q = tb + f (2)
Equations 1 and 2 make clear that the episcotister model is a straightforward decomposition-and-grouping model. Each intensity of a region to be grouped into the layer is reduced to the sum of a multiplicative component and an additive component (the scission aspect): the first is the constant fraction t of the corresponding background region; the second is a common component that—whatever the t value between 0 and 1—attenuates the background contrast a/b.
Equations 1 and 2 describe how a and b intensities are modified by a rotating episcotister with an open sector of size t and an effective reflectance f, equal to the product of the size of the complementary solid sector (1 − t) by its reflectance r. Since both t and r are proper fractions (t is the relative size of the opening of the episcotister and r is a reflectance), neither can be smaller than zero or larger than 1.
Equations 1 and 2 refer to direct optics. For instance, knowing background reflectance a, filter transmittance t, and filter reflectance r, one can derive the effective reflectance of the superposition area p. However, such a system of two equations becomes a useful psychophysical model if one realizes (as Metelli did) that it provides unique solutions for both t and r, constituting a plausible inverse-optics model for the recovery of layer properties (not explicit in the stimulus) from the pattern of input values (Marr 1982, pp. 89–90). Relevant solutions are as follows:
t = (p − q) / (a − b) (3)
r = (aq − bp) / (a − b − p + q) (4)
f = (aq − bp) / (a − b) (5)
Taking the episcotister as a physical model of real transparency, Metelli proposed that layer transmittance and reflectance are perceived in the same way in which the reflectance of an opaque background surface is perceived as its lightness. Layer transparency (perceived transmittance, increasing with t) and layer lightness (perceived reflectance, increasing with r) are derived from the pattern of stimulation.
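The inverse step can be sketched in a few lines of Python (the function name and the worked numbers are mine, not Metelli's; uniform illumination, a ≠ b, and t < 1 are assumed):

```python
# Sketch of the inverse step of Metelli's episcotister model.
# Given background intensities a, b and superposition intensities p, q,
# Equations 3-5 recover transmittance t, additive component f, and
# episcotister reflectance r, where f = (1 - t) * r.

def recover_layer(a, b, p, q):
    t = (p - q) / (a - b)          # Equation 3
    f = (a * q - b * p) / (a - b)  # Equation 5
    r = f / (1 - t)                # Equation 4, from f = (1 - t) * r
    # A transparency solution requires proper fractions for both t and r.
    valid = 0 <= t <= 1 and 0 <= r <= 1
    return t, f, r, valid

# Worked example: backgrounds a = 0.9, b = 0.1 under a layer with
# t = 0.5, r = 0.2 (so f = 0.1); the forward model (Equations 1-2)
# gives p = 0.55, q = 0.15, and the inverse recovers the layer.
t, f, r, valid = recover_layer(0.9, 0.1, 0.55, 0.15)
print(round(t, 3), round(f, 3), round(r, 3), valid)  # 0.5 0.1 0.2 True
```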
12 In the transparency literature, expressions like ‘episcotister model’ and ‘filter model,’ or ‘episcotister equations’ and ‘filter equations,’ should not be taken as referring to a specific device (a spinning disk with open sectors vs. a piece of smoked glass), but to two extreme types of background illumination: in the so-called episcotister model the background is illuminated exactly like the layer (a condition easily obtained if the layer is suspended in mid air, far away from the background); in the so-called filter model the background is illuminated only through the layer (a condition which quite frequently occurs when a filter is in contact with the ground).
[Panel values: (a) t = 0.27, r = 0.20; (b) t = 0.43, r = 0.40; (c) t = 0.53, r = 0.60; (d) t = 0.60, r = 0.80]
Fig. 20.6 The four panels illustrate that, keeping background intensities constant (a = 0.90;
b = 0.10), approximately the same attenuation of background contrast (p/q = 0.25 a/b) is compatible
with different pairs of t and r values (shown in each panel). Intensities of p and q regions are as
follows: (a) p = 0.12; q = 0.05; (b) p = 0.39; q = 0.17; (c) p = 0.61; q = 0.27; (d) p = 0.76; q = 0.34.
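The near-constancy of contrast attenuation across the four panels can be checked with the forward model of Equations 1 and 2; the sketch below (the tolerance is my choice) applies the four (t, r) pairs to the same background:

```python
# Verify that the (t, r) pairs of Figure 20.6, applied to a = 0.90 and
# b = 0.10, all attenuate the background contrast a/b = 9 to roughly
# p/q = 0.25 * a/b = 2.25, despite describing very different layers.

a, b = 0.90, 0.10
pairs = [(0.27, 0.20), (0.43, 0.40), (0.53, 0.60), (0.60, 0.80)]

for t, r in pairs:
    f = (1 - t) * r   # additive component
    p = t * a + f     # Equation 1
    q = t * b + f     # Equation 2
    print(f"t={t:.2f} r={r:.2f}  p/q={p / q:.3f}")
```

All four ratios come out within a few hundredths of 2.25, illustrating that contrast attenuation alone does not pin down t and r.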
The hypothesis that perceptual dimensions of transparency parallel the physical properties of
the layer is quite controversial (Albert 2006, 2008; Anderson 2008; Anderson, Chapter 22, this
volume; Anderson et al. 2006, 2008a, b; Masin 2006; Singh and Anderson 2002, 2006). According
to Kingdom (2011, Section 9) further research is needed to identify the appropriate perceptual
dimensions and the best methods for obtaining valid data from observers. However, as remarked
by Anderson et al. (2008a, p. 1150), researchers should not expect that all variables included in
generative physical models like Equations 1 and 2 have a perceptual meaning. Furthermore, they
should consider the possibility that perception is sensitive to other variables. For instance, solutions for t, r, f (Equations 3, 4, 5) are more complex than the simple intensity ratio available at
each image boundary; while attenuation of border contrast is probably the most salient physical
consequence of layer superposition.13 Note that t and r values, against intuition, are not related to
contrast attenuation in a simple way (Figure 20.6). For a theory of transparency based on contrast
attenuation see Anderson (2003).
13 The attenuation of border contrast is also behind the notion of veiling luminance, a hybrid term that combines the phenomenal transparency of a metaphorical veil with a physical measure of input intensity (Gilchrist, 2006, pp. 196–197). When spontaneously perceived as a veil, added light is experienced as the cause of the reduced visibility of otherwise well-contrasted borders (a case of real transparency without X-junctions).
Reflectances or luminances?
Clearly, the choice of reflectances as input variables is controversial and has raised several discussions (Beck 1985; Beck et al. 1984; Gerbino 1994; Metelli 1985a; Masin 2006). Reflectances are distal values, and a model should express perceptual values as a function of proximal, not distal, values. On the other hand, under homogeneous illumination reflectances can be taken as luminances in arbitrary units, making the distinction irrelevant. Another type of criticism refers, instead, to the possibility of taking lightnesses (i.e., perceived reflectances derived from a transformation of luminances) as the input for the model. This approach is theoretically consistent with the existence of a stage in which all four regions of the canonical display are represented as opaque surfaces, each with its own lightness, and of a subsequent stage in which a better solution is achieved (Rock 1983, pp. 138–139).
An unfortunate implication of the use of reflectances is Metelli’s idea that r = 1 constitutes an effective upper boundary for transparency. Reformulating the episcotister model in terms of luminances (Gerbino 1988, 1994; Gerbino et al. 1990) helps to understand that this constraint can be relaxed. Using luminances as input values, Equations 1 and 2 change as follows:
P = tA + F (6)
Q = tB + F (7)
In Equations 6 and 7 the additive component F is also a luminance, equal to (1 − t) r Ie, where Ie is the illumination falling on the episcotister, in principle different from the illumination Ib falling on background regions whose reflectances are a and b.14 Following the inverse-optics logic, there is no reason to reject values of the additive component F larger than (1 − t) Ib (i.e., r = 1), since they are compatible with more illumination falling on the layer than on the background. In principle one could decompose even smaller F values as involving an increase of the illumination on a layer with r < 1. But this solution would be against the minimum principle (which leads to a decomposition with uniform illumination, unless required by specific stimulus information).
Photometric conditions of the episcotister luminance model are conveniently represented in
the diagram devised by Remondino (1975). Figure 20.7 includes two diagrams, to represent two
transparency solutions, one for each of the two edges crossing at the X-junction, for two 4-region
patterns having in common two luminances (30 and 80, in arbitrary units). In general, photomet-
ric conditions for the TRI-edge can be satisfied for both edges, only one, or none. In the pattern
at the bottom the two solutions correspond to the following APQB orderings: (80, 40, 20, 30) and
(80, 30, 20, 40), with t = 0.40 and 0.25, respectively, and r = 0.13 in both cases. Both transparency
solutions of the pattern at the top violate the r ≤ 1 constraint, but can be interpreted as cases in
which a layer made of perfectly white particles is more illuminated than the background (Ie = 1.3
Ib, if r = 1). The aspect of the diagram with the most prominent theoretical meaning is the shaded
region representing the set of PQ values compatible with a given AB pair and with the constraints
of the episcotister luminance model.
14 As anticipated in Footnote 11, capital letters are used for luminances and light intensities, while small letters are used for reflectances.
[Figure 20.7 appears here. Top pattern: solutions t = 0.7 and t = 0.5, with r = 1.0 and Ie = 1.3 Ib. Bottom pattern: solutions t = 0.4 and t = 0.25, with r = 0.13 and Ie = Ib.]
Fig. 20.7 A convenient visualization of transparency solutions in 4-region patterns is the diagram
proposed by Remondino (1975). Coordinates represent luminances in arbitrary units. Two 4-region
patterns are considered here, both compatible with two transparency solutions, corresponding to
two different t values. The component r has a low value (r = 0.13) in both solutions for the bottom
pattern; while it exceeds the r = 1 boundary (dashed line) in both solutions for the top pattern. Each
shaded trapezoidal region in the two diagrams represents the space of valid PQ luminance pairs for
a given AB pair (square symbol). Such a space is actually open in the direction of higher PQ values,
since the additive component (visualized by the projection of the oblique arrow on each axis) can take
any positive value, if constraints on illumination are relaxed. PQ pairs are shown in the two diagrams
as circular symbols, filled for the pattern at the bottom and empty for the pattern at the top.
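The photometric bookkeeping above can be checked directly. The following is a minimal sketch (the helper name `solve_layer` and the normalization of an ideal white to a luminance of 100 are assumptions made for illustration, matching the arbitrary units of Figure 20.7):

```python
def solve_layer(A, P, Q, B, white=100.0):
    """Invert the episcotister luminance model of Equations 6 and 7:
    P = t*A + F and Q = t*B + F, with F = (1 - t) * r * Ie."""
    t = (P - Q) / (A - B)     # transmittance from the two luminance steps
    F = P - t * A             # additive component (itself a luminance)
    r_Ie = F / (1.0 - t)      # the product r * Ie
    return t, r_Ie / white    # second value equals r when Ie = Ib

# Bottom pattern: both APQB orderings satisfy the r <= 1 constraint.
for apqb in [(80, 40, 20, 30), (80, 30, 20, 40)]:
    t, r = solve_layer(*apqb)
    print(apqb, round(t, 2), round(r, 2))   # t = 0.4 / 0.25, r = 0.13

# Top pattern: r exceeds 1 under equal illumination, but the same F is
# consistent with a perfectly white layer (r = 1) under Ie = 1.3 * Ib.
for apqb in [(80, 95, 60, 30), (60, 95, 80, 30)]:
    t, r = solve_layer(*apqb)
    print(apqb, round(t, 2), round(r, 2))   # t = 0.7 / 0.5, r * Ie/Ib = 1.3
```

Relaxing the illumination constraint simply means accepting any positive F, which is why the shaded region of the Remondino diagram is open toward higher PQ values.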
Effects of transparency
Transparency can be conceived of as the effect of appropriate stimulus conditions, but also as the
cause of specific changes in other perceptual properties. Kanizsa (1955) articulated this logic refer-
ring to Figure 20.8a, an ambiguous pattern supporting either an occlusion solution (a light lamina
with holes in front of an oblique opaque bar) or a transparency solution (a milky rectangular filter
in front of a rectangle with holes). The dominance of one solution over the other depends on the
relative intensities of the three regions (Ripamonti and Gerbino 2001); but when conditions are such
that both solutions are easily perceived, a clear effect of form organization on color is observed.15
15 Metelli (1985b, p. 304) reminded us that the devil—notoriously an excellent observer—treats Peter Schlemihl’s shadow as a thin mantle lying on the terrain: ‘He shook my hand, knelt down in front of me without delay, and I beheld him, with admirable dexterity, gently free my shadow, from the head down to the feet, from the grass, lift it up, roll it together, fold it, and finally tuck it into his pocket.’ (Chamisso, The Wonderful History of Peter Schlemihl).
Achromatic Transparency 427
Fig. 20.8 The ambiguous three-intensity pattern in panel a (Kanizsa 1955) can be perceived as a light
lamina with four holes in front of an oblique rectangle (as in panel b) or as a transparent oblique
rectangle in front of a lamina with holes (as in panel c). The addition of a thin outline disambiguates
the transparent layer, which takes on a definite milky appearance. The same color appearance is
observed in panel a, when the oblique rectangle appears in front.
Reproduced from G. Kanizsa, Condizioni ed effetti della trasparenza fenomenica, Rivista di Psicologia, 49,
pp. 3–19, Figure 12, Copyright © 1955, The Author.
In the occlusion solution (that may be primed by panel b, where intensity conditions do not favor transparency), the oblique bar is amodally completed but its modal parts have a hard surface color. In the
transparency solution the oblique bar is similar to the one in panel c, where the white outline makes
the bar unambiguously in front. Coming in front is associated with a distinctive change in color
appearance. The bar appears modally completed in front by the addition of illusory contours and all
its surface acquires a milky appearance (van Lier and Gerbino, Chapter 15, this volume).
There are two theoretically important points. First, the specific color appearance of transparent
surfaces cannot be explained by image properties only, given that the image remains the same dur-
ing occlusion/transparency reversals. Second, changes are consistent with scission: an invariant
stimulus-specified quantity splits into a layer component and a background component. Kanizsa
(1955) remarked that the measurement of such components is made difficult by opposite tenden-
cies in different observers: some focus their attention on the transparent layer in front, some on
surfaces seen through the layer.
As regards other effects (or at least, other couplings involving transparency), Kersten et al.
(1992) provided a nice demonstration of the interplay between transparency and rotation in
depth. Gerbino (1975) found that shrinkage by amodal completion extends to rectangles partially
occluded by a layer of variable transparency, and its amount correlates with the perceived opacity
of the layer. Sigman and Rock (1974; Rock 1983, p. 171) demonstrated that an opaque occluder,
but not a transparent object, vetoes the perception of stroboscopic motion, according to the idea
that this type of apparent motion is mediated by perceptual intelligence. Starting from the observation that transparency can be perceived in low-contrast disk-surround displays (Masin and Idone 1981), Ekroll and Faul (2012a, 2012b, 2013) argued that the perception of transparency can
provide a unifying account of simultaneous color contrast phenomena.16
16 Musatti (1953) articulated a theory of simultaneous color contrast, based on scission of the proximal color.
Motion transparency
In random dot kinematograms (RDK), grouping by common fate (Brooks, Chapter 4, this volume)
leads to the segmentation of textured overlapping surfaces. This phenomenon is usually called
motion transparency and has been intensively utilized to study motion mechanisms (Braddick
and Qian 2001; Curran et al. 2007; Durant et al. 2006; Meso and Zanker, 2009; van Doorn and
Koenderink 1982a, b), the maximum number of independent planes that the visual system can
effectively segregate (Edwards and Greenwood 2005; Gerbino and Bernetti 1984; Mulligan 1992),
depth ordering (Schütz 2011), global vs. local motion (Kanai et al. 2004), and directional biases
(Mamassian and Wallace 2010).
Transparency perceived in RDK is a by-product of grouping by motion and does not involve
layer decomposition with color changes. However, figure/ground stratification is correlated
with small but reliable effects on lightness and perceived contrast. As noted since Rubin
(1915/1921) and demonstrated by Wolff (1934; Gilchrist 2006) the figure appears more con-
trasted than the ground; and perceived contrast within the figure is higher than perceived
contrast within the ground (Kanizsa 1979). Since attention is normally directed towards the
figure, one should also consider that attention can enhance contrast, as postulated by James
(1890) and demonstrated in several studies (Barbot et al. 2012; Carrasco et al. 2000; Prinzmetal
et al. 2008; Treue 2004).
17 Musatti (1953, p. 555) attributed to Metzger the honor of first observing transparency in stereokinetic displays. Metzger mentioned the effect in the second edition of Gesetze des Sehens (1953) and discussed (1955) the paradoxical fact that stereokinesis can make a disk appear transparent and sliding over another even when the color of the superposition region is physically implausible, as later reported by Hupé and Rubin (2000).
Conclusion
Principles of perceptual organization prove to be an important source of inspiration for the under-
standing of phenomenal transparency. Concern for the physical plausibility of transparency models
has sometimes obscured the fundamental fact that notions like scission and layer decomposition,
combined with grouping by surface color similarity and contour good continuation, satisfactorily account for perception. Interested readers will find extensive treatments of other aspects of
phenomenal transparency in recent empirical and theoretical papers (Anderson, Chapter 22, this
volume; Faul and Ekroll 2011, 2012; Kingdom 2011; Kitaoka 2005; Koenderink et al. 2008, 2010;
Richards et al. 2009). Important evidence on the neural mechanisms related to the assignment of
border ownership in transparency patterns has been found by Qiu and von der Heydt (2007).
References
Adelson, E. H. (2000). ‘Lightness perception and lightness illusions’. In The New Cognitive Neurosciences,
edited by M. Gazzaniga, 2nd ed., pp. 339–51 (Cambridge, MA: MIT Press).
Albert, M. K. (2006). ‘Lightness and perceptual transparency’. Perception 35: 433–43.
Albert, M. K. (2008). ‘The role of contrast in the perception of achromatic transparency: Comment on
Singh and Anderson (2002) and Anderson (2003)’. Psychological Review 115: 1127–43.
Anderson, B. L. (1997). ‘A theory of illusory lightness and transparency in monocular and binocular
images: the role of contour junctions’. Perception 26: 419–53.
Anderson, B. L. (2003). ‘The role of occlusion in the perception of depth, lightness, and opacity’.
Psychological Review 110: 785–801.
Anderson, B. L. (2008). ‘Transparency and occlusion’. In The Senses: A Comprehensive Reference, edited by
A. I. Basbaum, A. Kaneko, G. M. Shepherd, and G. Westheimer, Vol. 2, Vision II, T. D. Albright and R.
H. Masland (Volume eds.), pp. 239–44 (San Diego: Academic Press).
Anderson, B. L. (2014). ‘The perceptual representation of transparency, lightness, and gloss’. In Oxford
Handbook of Perceptual Organization, edited by J. Wagemans, Chapter 22 (Oxford: Oxford University
Press).
Anderson, B. L. and Schmid, A. C. (2012). ‘The role of amodal surface completion in stereoscopic
transparency’. Frontiers in Psychology 3: 1–11.
Anderson, B. L. and Winawer, J. (2005). ‘Image segmentation and lightness perception’. Nature 434: 79–83.
Anderson, B. L., Singh, M., and Meng, J. (2006). ‘The perceived transmittance of inhomogeneous surfaces
and media’. Vision Research 46: 1982–95.
Anderson, B. L., Singh, M., and O’Vari, J. (2008a). ‘Natural psychological decompositions of perceived
transparency: Reply to Albert’. Psychological Review 115: 144–51.
Anderson, B. L., Singh, M., and O’Vari, J. (2008b). ‘Postscript: Qualifying and quantifying constraints on
transparency’. Psychological Review 115: 151–3.
Arnheim, R. (1974). Art and Visual Perception. [1954] (Berkeley: University of California Press).
Barbot, A., Landy, M. S., and Carrasco, M. (2012). ‘Differential effects of exogenous and endogenous
attention on second-order texture contrast sensitivity’. Journal of Vision 12: 1–15.
Beck, J. (1985). ‘Perception of transparency in man and machine’. Computer Vision, Graphics, and Image
Processing 31: 127–38.
Beck, J., Prazdny, K. and Ivry, R. (1984). ‘The perception of transparency with achromatic colors’.
Perception and Psychophysics 35: 407–22.
Bergström, S. S. (1977). ‘Common and relative components of reflected light as information about the
illumination, colour, and three-dimensional form of objects’. Scandinavian Journal of Psychology
18: 180–6.
430 Gerbino
Ekroll, V. and Faul, F. (2013). ‘Transparency perception: the key to understanding simultaneous color
contrast’. Journal of the Optical Society of America A 30: 342–52.
Elder, J. H. (2014). ‘Bridging the dimensional gap: Perceptual organization of contour in two-dimensional
shape’. In Oxford Handbook of Perceptual Organization, edited by J. Wagemans, Chapter 11 (Oxford:
Oxford University Press).
Epstein, W. (1982). ‘Percept-percept couplings’. Perception 11: 75–83. Reprinted in I. Rock (ed.) (1997).
Indirect Perception, pp. 17–29 (Cambridge, MA: MIT Press).
Faul, F., and Ekroll, V. (2011). ‘On the filter approach to perceptual transparency’. Journal of Vision 11: 1–33.
Faul, F. and Ekroll, V. (2012). ‘Transparent layer constancy’. Journal of Vision 12: 1–26.
Fazl, A., Grossberg, S., and Mingolla, E. (2008). ‘View-invariant object category learning, recognition,
and search: How spatial and object attention are coordinated using surface-based attentional shrouds’.
Cognitive Psychology 58: 1–48.
Felisberti, F. and Zanker, J. M. (2005). ‘Attention modulates perception of transparent motion’. Vision
Research 45: 2587–99.
Fuchs, W. (1923). ‘Experimentelle Untersuchungen über das simultane Hintereinandersehen auf derselben
Sehrichtung’. Zeitschrift für Psychologie 91: 145–235.
Gerbino, W. (1975). ‘Perceptual transparency and phenomenal shrinkage of visual objects’. Italian Journal of
Psychology 2: 403–15.
Gerbino, W. (1988). ‘Models of achromatic transparency: A theoretical analysis’. Gestalt Theory 10: 5–20.
Gerbino, W. (1994). ‘Achromatic transparency’. In Lightness, Brightness, and Transparency, edited by A. L.
Gilchrist, pp. 215–55 (Hillsdale, NJ: Erlbaum).
Gerbino, W. and Bernetti, L. (1984). ‘One, two, many: textural segregation on the basis of motion’.
Perception 13: A38–A39.
Gerbino, W., Stultiens, C., Troost, J., and de Weert, C. (1990). ‘Transparent layer constancy’. Journal of
Experimental Psychology: Human Perception and Performance 16: 3–20.
Gibson, J. J. (1975). ‘Three kinds of distance that can be seen, or how Bishop Berkeley went wrong’.
In Studies in Perception: Festschrift for Fabio Metelli, edited by G. B. Flores D’Arcais, pp. 83–7
(Firenze: Martello-Giunti).
Gibson, J. J. (1979). The Ecological Approach to Visual Perception (Boston: Houghton Mifflin).
Gilchrist, A. L. (2005). ‘Lightness perception: Seeing one color through another’. Current Biology 15,
9: 330–2.
Gilchrist, A. L. (2006). Seeing Black and White (New York: Oxford University Press).
Hatfield, G. (2011). ‘Transparency of mind: The contributions of Descartes, Leibniz, and Berkeley to
the genesis of the modern subject’. In Departure for Modern Europe: A Handbook of Early Modern
Philosophy (1400–1700), edited by H. Busche, pp. 361–75 (Hamburg: Felix Meiner Verlag).
Helmholtz, H. von (1867). Handbuch der physiologischen Optik (Leipzig: Voss). English translation by
J. P. C. Southall (ed.) of the third [1910] German edition (1924). Treatise on Physiological Optics.
(New York: Dover). Available at <http://poseidon.sunyopt.edu/BackusLab/Helmholtz/>
Hering, E. (1879). ‘Der Raumsinn und die Bewegungen des Auges’. In Handbuch der Physiologie der
Sinnesorgane, edited by L. Hermann, 3(1), S343-601 (Leipzig: Vogel).
Hiris, E. (2001). ‘Limits on the perception of transparency from motion’. Journal of Vision 1: 377a.
Hochberg, J. (1974). ‘Higher-order stimuli and inter-response coupling in the perception of the visual
world’. In Perception: Essays in Honor of James J. Gibson, edited by R. B. McLeod and H. L. Pick, Jr., pp.
17–39 (Ithaca, NY: Cornell University Press).
Hupé, J.-M., and Rubin, N. (2000). ‘Perceived motion transparency can override luminance / color cues
which are inconsistent with transparency’. Investigative Ophthalmology and Visual Science Supplement
41: 721.
James, W. (1890). The Principles of Psychology (New York: Holt).
Kanai, R., Paffen, C. L., Gerbino, W., and Verstraten, F. A. (2004). ‘Blindness to inconsistent local signals
in motion transparency from oscillating dots’. Vision Research 44: 2207–12.
Kanizsa, G. (1955). ‘Condizioni ed effetti della trasparenza fenomenica’. Rivista di Psicologia 49: 3–19.
Kanizsa, G. (1979). Organization in Vision (New York: Praeger).
Katz, D. (1925). Der Aufbau der Tastwelt (Leipzig: Barth). English translation by L. E. Krueger (ed.) (1989).
The World of Touch (Hillsdale, NJ: Erlbaum).
Kepes, G. (1944). Language of Vision (Chicago: Paul Theobald). Reissued 1995 (New York: Dover
Publications).
Kersten, D., Bülthoff, H. H., Schwartz, B., and Kurtz, K. (1992). ‘Interaction between transparency and
structure from motion’. Neural Computation 4: 573–89.
Kingdom, F. A. A. (2011). ‘Lightness, brightness and transparency: A quarter century of new ideas,
captivating demonstrations and unrelenting controversy’. Vision Research 51: 652–73.
Kitaoka, A. (2005). ‘A new explanation of perceptual transparency connecting the X-junction
contrast-polarity model with the luminance-based arithmetic model’. Japanese Psychological Research
47: 175–87.
Klee, P. (1961). The Thinking Eye, edited by J. Spiller (London: Lund Humphries).
Koenderink, J., van Doorn, A., Pont, S., and Richards, W. (2008). ‘Gestalt and phenomenal transparency’.
Journal of the Optical Society of America A 25: 190–202.
Koenderink, J., van Doorn, A., Pont, S., and Wijntjes, M. (2010). ‘Phenomenal transparency at
X-junctions’. Perception 39: 872–83.
Koffka, K. (1935). Principles of Gestalt Psychology (New York: Harcourt Brace).
Köhler, W. (1929). Gestalt Psychology (New York: Liveright).
Kramer, P. and Bressan, P. (2009). ‘Clear waters, murky waters: why transparency perception is good for
you and underconstrained’. Perception 38: 871–2, discussion 877.
Kramer, P. and Bressan, P. (2010). ‘Ignoring color in transparency perception’. Rivista di Estetica 43: 147–59.
Krueger, L. E. (1982). ‘Tactual perception in historical perspective: David Katz’s world of touch’. In
Tactual Perception: A Sourcebook, edited by W. Schiff and E. Foulke, pp. 1–54 (Cambridge: Cambridge
University Press).
Land, E. H. and McCann, J. J. (1971). ‘Lightness and retinex theory’. Journal of the Optical Society of
America 61: 1–11.
Leeuwenberg, E. L. J. (1976). ‘Figure-ground specification in terms of structural information’. In Advances
in Psychophysics, edited by H. G. Geissler and Y. M. Zabrodin, pp. 325–37 (Berlin: Deutscher Verlag der
Wissenschaften).
Leeuwenberg, E. L. J. (1982). ‘The perception of assimilation and brightness contrast’. Perception and
Psychophysics 32: 345–52.
Leeuwenberg, E. L. J. and van der Helm, P. A. (2013). Structural Information Theory: The Simplicity of
Visual Form (Cambridge: Cambridge University Press).
Leyton, M. (1992). Symmetry, Causality, Mind (Cambridge, MA: MIT Press, Bradford Books).
Libben, G. (1998). ‘Semantic transparency in the processing of compounds: Consequences for
representation, processing, and impairment’. Brain and Language 61: 30–44.
Mamassian, P. and Wallace, J. M. (2010). ‘Sustained directional biases in motion transparency’. Journal of
Vision 10: 1–12.
Mamassian, P., Knill, D. C., and Kersten, D. (1998). ‘The perception of cast shadows’. Trends in Cognitive
Sciences 2: 288–95.
Marr, D. (1982). Vision (San Francisco, CA: Freeman).
Masin, S. C. (1984). ‘An experimental comparison of three- versus four-surface phenomenal transparency’.
Perception and Psychophysics 35: 325–32.
Wertheimer, M. (1923). ‘Untersuchungen zur Lehre von der Gestalt, II’. Psychologische Forschung 4: 301–
50. English translation in L. Spillmann (ed.) (2012). On Perceived Motion and Figural Organization
(Cambridge, MA: MIT Press).
Wolff, W. (1934). ‘Induzierte Helligkeitsveränderung’. Psychologische Forschung 20: 159–94.
Wollschläger, D., Rodriguez, A. M., and Hoffman, D. D. (2001). ‘Flank transparency: transparent filters
seen in dynamic two-color displays’. Perception 30: 1423–6.
Wollschläger, D., Rodriguez, A. M., and Hoffman, D. D. (2002). ‘Flank transparency: The effects of gaps,
line spacing, and apparent motion’. Perception 31: 1073–92.
Wuerger, S., Shapley, R., and Rubin, N. (1996). ‘On the visually perceived direction of motion by Hans
Wallach: 60 years later’. Perception 25: 1317–68.
Zanforlin, M. (2006). ‘Illusory space and paradoxical transparency in stereokinetic objects’.
In Visual Thought: The Depictive Space of Perception, edited by L. Albertazzi, pp. 99–104.
(Amsterdam: Benjamins).
Zanforlin, M. and Vallortigara, G. (1990). ‘The magic wand: a new stereokinetic anomalous surface’.
Perception 19: 447–57.
Chapter 21
Perceptual Organization of Color
Background
Trichromacy suggests a three-dimensional space for the organization of color. In his Bakerian
Lecture to the Royal Society, Thomas Young (1802) made the explicit connection between the
three-dimensionality of human color vision—that any spectral light can be matched by a combin-
ation of just three independent lights—and the existence of three types of physiological receptor,
distinguished by the wavelengths of light to which they respond most vigorously. At the start of
the eighteenth century, trichromacy had been exploited extensively for the practical purpose of
color reproduction for which only three primaries are needed; and indeed, by the late eighteenth
century, George Palmer (1777) and John Elliot (1780) had also made explicit early statements of
biological trichromacy (see Mollon 2003 for review).
In a remarkable short treatise from the thirteenth century, Robert Grosseteste sets out a three-
dimensional space of color in which three bipolar qualities—specifically the Latin pairings multa–
pauca, clara–obscura, and purum–impurum—are used in combination to account for all possible
colors (Dinkova-Bruun et al. 2013). The qualities multa–pauca and clara–obscura are considered
as properties of the light, and purum–impurum is considered as a property of the ‘diaphanous
medium’ in which light is incorporated. According to Grosseteste, whiteness is associated with
multa–clara–purum; and blackness with pauca–obscura–impurum. But Grosseteste moves away
from the Aristotelian one-dimensional scale of seven colors between white and black, instead
defining seven colors close to whiteness that are generated by diminishing the three bipolar quali-
ties one at a time (to give three different colors), or two at once (to give a further three), or all three
at once (to give the seventh). A further seven colors are produced by increasing the qualities from
Perceptual Organization of Color 437
blackness. By allowing infinite degrees of intensification and diminution of the bipolar qualities, he
describes a continuous three-dimensional space of color (Smithson et al. 2012).
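Grosseteste’s enumeration is just the combinatorics of three binary choices. A small sketch (the variable names are ours; the Latin pairings quoted above are used as labels):

```python
from itertools import combinations

qualities = ('multa-pauca', 'clara-obscura', 'purum-impurum')

# Seven colors close to whiteness: diminish one, two, or all three of the
# bipolar qualities at once (3 + 3 + 1 = 7 combinations).
near_white = [set(c) for k in (1, 2, 3) for c in combinations(qualities, k)]
print(len(near_white))   # 7

# Intensifying the qualities from blackness instead yields the other seven,
# and allowing continuous degrees of each quality fills the 3-D space.
```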
Without wanting to over-interpret this particular text, it is worth noting several important
points that it raises about the perceptual organization of colour. First, for Grosseteste, the per-
ceptual experience of colour resides in a three-dimensional space, which can be conveniently
navigated via a combinatorial system. Second, the space of colours is continuous, but some direc-
tions in this space have a special status, for they identify discrete categories of colour. Third, the
interaction of light and materials is fundamental to our experience of colour—an observation
reiterated throughout the treatise and summarized in the opening statement, ‘Colour is light
embodied in a diaphanous medium.’ These three themes, albeit recast rather differently from the
thirteenth-century account, form the basis of the present chapter.
between test and comparison stimuli. All three color-proximity tasks failed these tests, suggesting
that observers do not employ a Euclidian distance measure when judging the similarity of colored
lights. The growth of the variability of judgments was consistent with the assumption that observ-
ers use a city-block metric.
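The two metrics diverge most for colors displaced diagonally in the space. A toy comparison (the coordinates below are hypothetical, not stimuli from the study):

```python
def euclidean(u, v):
    # straight-line distance in the color space
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def city_block(u, v):
    # sum of absolute differences along each axis
    return sum(abs(a - b) for a, b in zip(u, v))

ref = (0.0, 0.0, 0.0)
c1 = (1.0, 1.0, 0.0)    # displaced diagonally across two axes
c2 = (1.4, 0.0, 0.0)    # displaced along a single axis
print(round(euclidean(ref, c1), 2), round(euclidean(ref, c2), 2))  # 1.41 1.4
print(city_block(ref, c1), city_block(ref, c2))                    # 2.0 1.4
# A Euclidean observer would call the two comparisons nearly equally similar
# to the reference; a city-block observer would judge c1 clearly farther.
```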
Lights in Context
Metamerism—in which two lights with different spectral energy distributions are indiscriminable
because they offer the same triplet of cone signals—implies that the three-dimensional space of
cone signals is exhaustive in describing the gamut of color experience. This is true under certain
limited conditions of observation, for example when a small patch of light is seen in isolation
against a black surround, as if through an aperture. However, if we consider regions of extended
spatial extent, descriptions of color perception become more complex.
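Metamerism can be illustrated with a toy calculation (the four-sample ‘spectra’ and cone sensitivities below are hypothetical numbers chosen so the arithmetic is transparent, not real cone fundamentals):

```python
# Toy cone sensitivities at four wavelength samples; rows are L, M, S.
CONES = {'L': (0.2, 0.6, 0.9, 0.5),
         'M': (0.4, 0.9, 0.6, 0.1),
         'S': (0.9, 1.0, 0.2, 0.1)}

def cone_signals(spectrum):
    """Reduce a light to three numbers: its overlap with each cone class."""
    return tuple(sum(c * s for c, s in zip(CONES[k], spectrum))
                 for k in ('L', 'M', 'S'))

# The spectral difference (1, -1, 1, -1) happens to be invisible to all
# three toy cones, so these physically different lights are metamers.
light_a = (1.0, 0.8, 0.6, 0.9)
light_b = (1.2, 0.6, 0.8, 0.7)   # light_a + 0.2 * (1, -1, 1, -1)
same = all(abs(x - y) < 1e-9
           for x, y in zip(cone_signals(light_a), cone_signals(light_b)))
print(same)   # True: identical cone triplets despite different spectra
```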
For extended spatial regions that are nonhomogeneous in chromaticity and luminance, the dom-
inant mode of perception is that of illuminated surfaces. The spectral composition of light reach-
ing the eye from a point in a scene of illuminated surfaces is a function of the spectrally selective
reflectances of the surfaces, and the spectral composition of the illumination. The extent to which
observers compensate for changes in the illumination to extract a stable representation of the color
properties of a surface is known as color constancy, and will be discussed later (see ‘Objects and
Illumination’). The tendency for human observers to exhibit at least partial color constancy means
that color perception of objects, and of the materials from which they are made, is categorically dif-
ferent from the perception of isolated lights, or of surfaces viewed through an aperture.
Furthermore, object-colors have additional qualitative dimensions: for example they can appear
glossy or matte; rough or smooth; cloudy or transparent. These qualities are associated with par-
ticular signatures of chromaticity and luminance variation across space. Katz (1911) dedicates the
first chapter of his book on color to classifying modes of appearance of color and the phenomen-
ology of illumination. He draws distinctions between ‘film colors and surface colors’; ‘transparent
film, surface and volume colors’; ‘mirrored color and lustre’ and ‘luminosity and glow’. These terms
all refer to how colors appear in space. Katz’s examples frequently refer to material dimensions
of color, such as metallic lustre or the lustre of silk or of graphite, yet he is careful to distinguish
between the phenomena and the conditions of their production. One hundred years on, the cor-
respondences between the physical and perceptual variables associated with these higher qualities
remain relatively poorly understood (for reviews see Adelson 2001; Anderson 2011; Anderson,
this volume). With advances in computer graphics, it has become possible to generate physic-
ally accurate renders of materials and their interaction with the light that illuminates them, thus
allowing carefully controlled experiments on perception of object-colors. It is clear that percep-
tual qualities associated with color variation across space provide systematic information about
the stuff from which objects are made (Fleming, Wiebel, and Gegenfurtner 2013). It is also clear
that these judgments are often based on a range of simple but imperfect image measurements
that correlate with material properties, rather than physically ‘correct’ inverse-optics computa-
tions (see section, ‘Perceptual correlates of material properties’).
of proportionality and additivity of metameric matches can fail (see Koenderink 2010 for review).
These subtleties in colorimetry impose important constraints on the perceptual organization of
color across the visual field, and across the lifetime. The extent to which color appearance is main-
tained despite such changes suggests the operation of sophisticated recalibration or constancy
mechanisms (Webster et al. 2010; Werner and Schefrin 1993), discussed in more detail below (see
‘Organization imposed by environmental factors’).
Individuals who are missing one of the three classes of cone are described as having dichro-
matic color vision. A subset of the dichromat’s color matches will fail to match for the normal
trichromat, but all of the normal trichromat’s matches will be acceptable to the dichromat. In this
way, dichromacy is a reduction, rather than an alteration, of trichromatic color vision. However,
individuals who are described as anomalous trichromats, by virtue of possessing a cone class
with spectral sensitivity shifted from that of the normal trichromat, will require different ratios
of matching lights in a color matching experiment. There will therefore be pairs of lights with
different spectral power distributions that are metamers for the normal trichromat but that are
discriminable to the anomalous trichromat. Deuteranomalous individuals—about 6 per cent of
men—rely on signals from S-cones and two forms of long-wavelength cone (L′ and L). The spectral sensitivities of the L′- and L-cones are similar, but sufficiently different that comparison of
their signals yields a useful chromatic signal. By designing a set of stimuli that were separated
along this deuteranomalous dimension (but intermingled along the standard L versus M oppo-
nent dimension) Bosten et al. (2005) obtained multidimensional scaling data that revealed a color
dimension unique to these so-called ‘color deficient’ observers.
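The reduction relation can be sketched in a toy model (treating the dichromat’s percept as the trichromatic cone triplet with one coordinate simply dropped is an idealization; the triplets are hypothetical):

```python
def dichromat_view(lms):
    # model the dichromat as missing the M-cone signal
    L, M, S = lms
    return (L, S)

a = (0.5, 0.3, 0.2)
b = (0.5, 0.7, 0.2)   # differs from a only in the M-cone signal
# Some pairs the trichromat can discriminate match for the dichromat:
print(a != b and dichromat_view(a) == dichromat_view(b))   # True
# But any pair matching in all three signals matches in every subset of
# them, so all of the trichromat's matches remain acceptable.
```

An anomalous trichromat, by contrast, compares three signals whose spectral tunings differ from the normal observer’s, so the two observers’ match sets need not be nested in this way.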
A female carrier of anomalous trichromacy has the potential to exhibit tetrachromatic vision,
since she expresses in her retina four cone classes that differ in their spectral selectivity—the
standard S, M, and L cones, plus cones expressing the anomalous M′ or L′ pigment. However,
merely expressing four classes of cone photoreceptors does not imply that the signals from these
photoreceptors can be neurally compared to support tetrachromatic perception. From a targeted
search for tetrachromatic women, in which seventeen obligate carriers of deuteranomaly and
seven obligate carriers of protanomaly were tested, Jordan et al. (2010) found only one participant
who could make reliable discriminations along the fourth dimension of color space—the color
dimension she shares with her deuteranomalous son.
The ‘opposite’ nature of these colored after-effects does not require that the sensitivity adjustment
occurs at an opponent site. Since complementary colored after-effects can be obtained with any
colored adapting light, they are consistent either with a reduction in sensitivity of the three cone
classes by an amount that depends on the extent to which each class was stimulated by the adapt-
ing light, or with a rebound response at an opponent post-receptoral site.
With intense adapting lights, the resulting sensitivity adjustments show independence between
cone classes (Williams and MacLeod 1979), but at these levels the photochemical process of
bleaching within the cones dominates over neural adjustments. Below bleaching levels colored
after-effects may still be obtained, and independent adjustments of neural gain within cone
classes—as suggested by von Kries (1878)—are likely to contribute to color appearance. To a first
approximation, Weber’s law holds independently for the three cone classes, but two significant
failures—transient tritanopia (Mollon and Polden 1975; Stiles 1949) and combinative euchro-
matopsia (Polden and Mollon 1980)—provide evidence for sensitivity adjustments at a post-
receptoral opponent site.
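The von Kries idea of independent gain adjustment within each cone class can be sketched as follows (toy numbers; as the text notes, real adaptation only approximately follows this form):

```python
def von_kries(lms, adapt_lms):
    """Rescale each cone signal by the reciprocal of its adapting level,
    independently for the L, M, and S classes."""
    return tuple(s / a for s, a in zip(lms, adapt_lms))

reddish_field = (2.0, 1.0, 0.5)                  # strong L, weak S stimulation
print(von_kries(reddish_field, reddish_field))   # (1.0, 1.0, 1.0)
# After adaptation the adapting light itself is normalized toward neutral,
# while a physically neutral light now yields the complementary bias:
print(von_kries((1.0, 1.0, 1.0), reddish_field))  # (0.5, 1.0, 2.0)
```

This is one way to see why complementary colored after-effects alone cannot localize the sensitivity adjustment at an opponent site: purely receptoral gain changes of this kind predict them too.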
Slow temporal modulations of colored lights—from achromatic to saturated and back to achro-
matic—produce time-varying sensations. If the modulated region forms a figure against an achro-
matic surround, the figure merges with the background before figure and ground are objectively
equal, and a figure with the complementary color is apparent when there is no physical difference
between the figure and ground. The temporal signature of these after-effects, measured psycho-
physically, matches the time-varying response and rebound-response of retinal ganglion cells,
suggesting that the afterimage signals are generated in the retina, though they may subsequently
be modified by cortical processing (Zaidi et al. 2012).
retina and can be identified as morphologically distinct from the other cones (Curcio et al. 1991).
The S-cone pigment is coded on chromosome seven whereas both the M- and L-cone pigment
genes are carried on the X-chromosome and are 96 per cent homologous (Nathans, Thomas, and
Hogness 1986). The dichromatic system shared by most mammals achieves a two-dimensional
color discrimination by comparing the outputs of a short-wave sensitive receptor and a receptor
in the middle- to long-wavelength region of the spectrum. It is thought that the L- and M-cone
pigment genes diverged only fifty million years ago in our evolutionary history, perhaps confer-
ring a behavioural advantage to our primate ancestors in selecting ripe fruit against a background
of young leaves at a distance (Bompas, Kendall, and Sumner 2013; Regan et al. 2001; Sumner and
Mollon 2000a, 2000b) or at arm’s reach (Parraga, Troscianko, and Tolhurst 2002), and piggy-
backing on the machinery of spatial vision that operated with the longer wavelength receptor
(Martin et al. 2011).
There is some evidence that the S-cone signal, the basis of the ancient color vision system,
remains distinct from the machinery dedicated to the main business of photopic vision. The
S-cones, for example, show minimal projections to the subcortical pathways, and S-cone stim-
uli are processed differently from M- and L-cone stimuli in saccadic (but not attentional) tasks
(Sumner et al. 2002). This asymmetry suggests a further way in which not all ‘colors’ are equal in
specifying and shaping our perceptual world. S-cone isolating stimuli additionally elicit longer
reaction times than L/M-opponent stimuli (Smithson and Mollon 2004) and their signals are
delayed before combination with L- and M-cone signals (Lee et al. 2009). Within the color vision
system this presents a specific temporal binding problem (Blake, Land, and Mollon 2008).
Over-representation of units tuned to particular directions would provide a physiological basis for
the special status of some hues. However, there is a practical difficulty with testing this hypoth-
esis. For a meaningful discussion of the density with which cell-tuning samples the hue contin-
uum, we need to know how to scale the hue and saturation axes. Clumping of neurons’ preferred
directions in one region of hue-space is to be expected if the scaling of the underlying variable
is non-uniform or if some color directions are stimulated more strongly. One candidate scale is
the wavelength scale, but wavelength discrimination thresholds follow a ‘w’-shaped function of
wavelength (Pokorny and Smith 1970), so this is far from a perceptually uniform space. Stoughton
and Conway instead used test stimuli that were linear mixtures of the outputs of an RGB display
(i.e. R-G, G-B, and B-R). But this in itself may have meant that the strongest modulations of early
opponent cells were aligned with the unique hue directions, so that the responses of downstream
neurons inevitably showed a tuning preference for these directions (Mollon 2009).
Fig. 21.1 Illuminated glossy objects that illustrate several points about the interaction of light and
surfaces. The light reflected to the camera comes either from (i) direct specular reflections from
the surface in which the spectral content of the reflected light matches that of the illuminant, or
(ii) reflections from the body of the material in which the spectral content of the reflected light is
given by the illuminant modified by the spectral reflectance of the surface. Monge’s observation
is clear in the parts of the scene dominated by a single source of illumination, such as the front
of the purple mug. Significant chromatic variation is apparent across the purple-colored surface,
fading from purple to desaturated purple (mixed with white); whereas little chromatic variation is
apparent across the white-colored surface of the same mug.
Image: uncommongoods.com with permission.
environment. Some evidence for the special status of the skylight-sunlight locus in shaping our
perceptual apparatus is provided by the very low thresholds for chromatic discrimination of lights
in this region (Danilova and Mollon 2012).
linguistic labels for color that must be discrete. According to the Sapir-Whorf hypothesis, the per-
ception of stimuli depends on the names we give them, and the perception of color has provided
an important test case for the hypothesis. In a seminal study of the color terms used in twenty
unrelated languages, Berlin and Kay (1969) put forward two hypotheses: (1) there is a restricted
universal inventory of such categories; (2) a language adds basic color terms in a constrained
order. They have argued for an underlying structure to the lexicalization of color, which is based
on a universal neurobiological substrate (Kay and Berlin 1997; Kay and McDaniel 1978), but
which leaves scope for Whorfian effects to ‘distort’ perception (Kay and Kempton 1984). Their
thesis has become something of a ‘classic’ but has not achieved universal acclaim, being roundly
criticized by Saunders (2000) on both scientific and anthropological grounds.
If our perceptual space of color were dependent on linguistic labels we might expect several
(testable) consequences: (1) stimuli within categories (given the same name) should look more
similar than those between categories (given different names), and this similarity should have
measurable effects on perceptual judgments (Kay and Kempton 1984); (2) these category-based
effects should be associated with different physical stimuli, depending on the native language
of the participant (Roberson and Hanley 2007; Winawer et al. 2007); (3) pre-language children
should show different perceptual judgments from post-language children (Daoutis et al. 2006);
and (4) training to use new color terms may influence perception (Zhou et al. 2010).
One study in particular has sparked significant research effort in this area. Gilbert et al. (2006)
claimed that between-category visual search is faster than within-category search (by 24 ms), but
only for stimuli presented in the right visual field, a result they interpreted as suggesting that the
language centres in the left hemisphere mediate the reaction-time benefit. Such
experiments, however, are riddled with difficulties. As discussed above, there are significant inter-
observer differences in factors that influence the very first stages of color perception (pre-recepto-
ral filtering by lens and macular pigment, differences in receptor sensitivities), and the observer’s
adaptation state has a strong influence on perceived color difference. Witzel and Gegenfurtner
(2011) ran several different versions of the Gilbert et al. study and related studies, but in each
case they included individual specification of color categories, and implemented careful control of
color rendering and of the adaptation state. They found that naming patterns were less clear-cut
than the original studies suggested, and for some stimulus sets reaction times were better predicted by
JNDs than by category effects. As we saw with the search for the neural encoding of unique hues,
a recurrent difficulty is the choice of an appropriate space from within which to select test stimuli.
Brown, Lindsey, and Guckes (2011) identified this need for an appropriate null hypothesis—if lin-
guistic category effects do not predict reaction times for visual search, what are they predicted by?
They replicated the Gilbert et al. study, making methodological improvements that were similar to
those introduced by Witzel and Gegenfurtner (2011), but added an independent measurement of
the perceived difference between stimuli (assessed via Maximum Likelihood Difference Scaling,
MLDS). They were unable to replicate Gilbert et al.’s result, and reaction times were simply pre-
dicted by the reciprocal of the scaled perceived difference between colors.
in V1 and thin bands in V2. Although these anatomical subregions have been shown by several
labs to contain a high proportion of cells that are selective for color and a high proportion of
cells that are not selective for orientation (see Gegenfurtner 2003 for review), it cannot be con-
cluded from these measurements that it is, for example, the color-selective cells in the thin stripes
that are not orientation selective. Within-cell measurements of color- and form-selectivity in a
large number of neurons in V1 and V2 of awake behaving monkeys show no correlation between
color and form responses (Friedman, Zhou, and von der Heydt 2003), providing no evidence for
segregation.
Sumner et al. (2008) tested fMRI responses to orientation signals that were defined by lumi-
nance, or by L/M-opponent or S-opponent chromatic modulation. At arrival in V1, S-cone
information is segregated from the pathways carrying form information, while L/M-opponent
information is not. Nevertheless Sumner et al. found successful orientation discrimination, in V1
and in V2 and V3, for luminance and for both color dimensions, suggesting that a proportion of
cells shows joint selectivity to both color and orientation.
Friedman et al. (2003) have explicitly tested the contributions of color-selective cells to the
analysis of edges and surfaces. They found no difference in edge-enhancement between color- and
luminance-selective cells. This contradicts the ‘coloring book’ notion that the form of an object is
processed through achromatic channels, with color being filled-in later, and by separate mecha-
nisms. Instead we see color, orientation, and edge-polarity multiplexed in cortical signals.
Fig. 21.2 (a) The Boynton Illusion. The wavy color contour between yellow and grey in the left-hand
image is captured by the smooth black contour. The wavy luminance contour between dark and
light grey in the right-hand image is robust to capture. (b) A plaid constructed by adding a vertical
LM-opponent grating and a horizontal S-opponent grating (left) appears to be dominated by violet-
lime variation when horizontal black contours are applied (middle); and dominated by cherry-teal when
vertical black contours are applied (right).
Data from Stuart Anstis, Mark Vergeer, and Rob van Lier, Luminance contours can gate afterimage colours and
‘real’ colours, Journal of Vision, 12(10), pp. 1–13, doi: 10.1167/12.10.2, 2012.
Contrast sensitivity for low-frequency L-M square-wave gratings can be facilitated by the add-
ition of luminance variation, but the facilitation is abolished at a relative phase of 90° (Gowdy,
Stromeyer, and Kronauer 1999). The result is consistent with integration of color between lumi-
nance edges and comparison across edges. Anstis, Vergeer, and Van Lier (2012) have further
investigated the ‘gating’ of color by contours. For a colored plaid constructed by superimposing
a blue-yellow vertical sinusoidal grating on a red-green horizontal sinusoidal grating, they used
contours defined by a combination of thick black lines and regions of random-dot motion. When
the contours were horizontal and aligned with the zero-crossings of the horizontal grating, the
plaid appeared red-green; when the contours were vertical and aligned with the zero-crossings of
the vertical grating, the plaid appeared blue-yellow (see Figure 21.2b).
equivalent performance is obtained for color- and luminance-defined contours if the color-
defined contours are presented with a further two-fold increase in contrast. When contours are
defined by alternating elements of color and luminance, performance declines significantly, but
not as much as would be expected from entirely independent processing of color and luminance
edges.
Texture gradients provide a strong monocular cue to depth. Zaidi and Li (2006) showed that
chromatic orientation flows are sufficient for accurate perception of 3D shape. The cone-contrast
required to convey shape in chromatic flows is less than the cone-contrast required in achromatic
flows, indicating that sufficient signal is present in orientation-tuned mechanisms that are also
color-selective. Identification of shape from chromatic flows is masked by luminance modula-
tions, indicating either joint processing of color and luminance in orientation tuned neurons, or
competing organizations imposed by color and luminance.
Troscianko et al. (1991) had previously shown that estimates of the slant of a surface defined
by texture gradients are the same for textures defined by chromaticity and those defined by chro-
maticity and luminance. These authors also found that gradients of brightness and saturation (in
the absence of texture gradients, or in addition to texture gradients) can modify perceived depth,
consistent with the gradual changes in luminance or saturation that are produced as a result of
the increase in atmospheric scattering with distance. Luminance gradients are important in con-
veying 3D shape, through a process described as shape-from-shading, and interactions between
luminance and color gradients have been interpreted with respect to the correspondence between
luminance and color gradients in the natural environment of illuminated surfaces (Kingdom
2003), which we discuss in ‘Configural effects’.
Color can facilitate object segmentation. For example, color vision can reveal objects that are
camouflaged in a greyscale image. Random chromatic variations can also hamper segmentation
of luminance-defined texture boundaries—a phenomenon that is exploited in both natural and
man-made camouflage (Osorio and Cuthill 2013, this volume). Interestingly this presents an
opportunity for dichromatic observers to break such camouflage, since they do not perceive the
chromatic variation (Morgan, Adam, and Mollon 1992).
In the classical random-dot stereogram, the arrays presented to left and right eyes are composed
of binary luminance noise. If the random-dot pattern is made equiluminant, such that the cor-
respondence of matching elements is defined only by their chromaticity, stereopsis fails (Gregory
1977). However, introducing color similarity to matching elements improves stereopsis (Jordan,
Geisler, and Bovik 1990), and in global motion the introduction of a color difference between
target and distractor elements reduces the number of target dots required to identify the direction
of motion (Croner and Albright 1997).
Improvement in thresholds for luminance-defined global motion in the presence of color simi-
larity between target elements suggests that color may be a useful cue for grouping elements
that would otherwise be camouflaged. This color advantage, however, is dependent on select-
ive attention, and disappears in displays that are designed to render selective attention useless
(Li and Kingdom 2001). The ‘Colour Wagon Wheel’ illusion (Shapiro, Kistler, and Rose-Henig
2012) lends further support to the idea that color provides a feature-based motion signal that can
become perceptually uncoupled from the motion-energy signal.
after-effects for shape-frequency and shape-amplitude, are selective for contours defined for the
S-opponent and L/M-opponent cardinal axes (Gheorghiu and Kingdom 2007). Contrast-contrast
effects, in which a region of fixed contrast appears to have a lower contrast when surrounded
by a region of high contrast, are selective for contrast within a cardinal mechanism (Singer and
D’Zmura 1994). Plaids composed of drifting gratings modulated along different cardinal direc-
tions appear to slip with respect to one another, whereas gratings modulated along intermediate
directions in color space tend to cohere (Krauskopf and Farell 1990).
McKeefry, Laviers, and McGraw (2006) present a more nuanced account of the separability
of color inputs to motion processing. They found that the traditional motion after-effect, where
prolonged viewing of a stimulus moving in one direction causes a stationary stimulus to appear to
move in the opposite direction, exhibited a high degree of chromatic selectivity. However, biases
in the perceived position of a stationary stimulus following motion adaptation were insensitive to
chromatic composition. The dissociation between the two types of after-effect suggests that chro-
matic inputs remain segregated at early stages of motion analysis, while at later processing stages
there is integration across chromatic and achromatic inputs.
Grouping of elements that are similar in terms of the underlying physiological mechanisms
that process them is a recurrent theme in several modern accounts of perceptual organization.
For example, Gilchrist (this volume) shows how simultaneous contrast can be strengthened or
diminished by manipulating the relative spatial frequencies of the figure and ground of the stand-
ard display. Anderson (this volume) presents a strong argument for analysing scenes in terms
of physiologically relevant parameters, such as contrast ratios rather than luminance-difference
ratios. Whilst the Gestalt psychologists were critical of analyses that carve perception into under-
lying channels or modules, the organization of the underlying physiology may still be used to
inform us about the emergence of structure in perceptual experience. For it is likely that the
organization of our neural systems at least in part reflects the organization of our sensory world.
combinations that would be required to re-adapt the observer to a different norm (Vul, Krizay,
and MacLeod 2008).
Under conditions of binocular rivalry, it is possible for a pink-grey vertical grating presented to
the left eye and a green-grey horizontal grating presented to the right eye to be perceived as either
a horizontal or vertical pink-green grating—a perceptual misbinding of color from one eye into
a spatially selective part of the form defined in the other eye (Hong and Shevell 2006). It is also
possible to obtain afterimages of the misbound percept. Importantly, Shevell, St Clair, and Hong
(2008) argue that the afterimage is derived from a central representation of the misbound percept,
rather than as a result of resolution of rivalrous monocular afterimages. They showed that when
adapting stimuli were pulsed, simultaneously or in alternation to the two eyes, misbound after-
images were obtained only in the simultaneous condition. Since it is only this condition that has
rivalrous dichoptic stimuli, their results imply adaptation of a cortical mechanism that encodes
the observer’s (misbound) percept.
Fig. 21.3 The light that reaches the eye from a surface depends on the spectral reflectance of the
surface and the spectral energy content of the illuminant (e.g. sunlight or skylight). Example spectral
energy distributions or reflectances are shown in the inset panels. The scatter plots show the L-, M-, and
S-cone signals for a set of 100 surfaces under skylight (x-axis) or sunlight (y-axis). The effect of changing
illumination is approximately described by a multiplicative scaling of the signals in the three cone classes.
The multiplicative constant for each cone class, and the gradients of the line on which the points fall,
depends on the illuminants that are compared. The red symbols represent the cone signals from a
surface with uniform spectral reflectance, which correspond to the signals from the relevant illuminant.
452 Smithson
ratios for filtered and unfiltered regions and sequences that did not, observers identified the stable
cone-ratios with the transparent filter (Ripamonti and Westland 2003).
Faul and Ekroll (2002), however, contest the claim that invariance of cone-excitation ratios is
necessary for transparency. Westland and Ripamonti’s (2000) analysis was based on a simplified
model of transparency in which the effective reflectance (R′(λ)) of a surface covered by a filter
was given by a wavelength-by-wavelength multiplication of the reflectance spectrum of the sur-
face (R(λ)) with the transmittance spectrum of the filter (T(λ)), reduced by the internal reflectance of
the filter (r) and observed in double-pass, such that R′(λ) = R(λ)[T(λ)(1−r)²]². Starting from a
more complete model of physical filtering—in which the filter is specified by its absorption spec-
trum, thickness, and refractive index—Faul and Ekroll (2002) derive a psychophysical model of
perceptual transparency that uses a three-element scaling vector (operating on the cone signals)
to characterize the color and thickness of the filter (corresponding to the direction and magni-
tude respectively of the scaling vector) and an additional parameter to characterize the perceived
‘haziness’ of the filter. For the special case in which the refractive index of the filter equals one,
and is thus close to that of air, Faul and Ekroll’s model matches Westland and Ripamonti’s model, and
predicts constant cone-excitation ratios. For filters with higher refractive indices, the prediction
does not hold, and Faul and Ekroll’s model provides a better description of their perceptual data.
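The simplified double-pass filter model discussed above can be written directly as a short numerical sketch. The reflectance and transmittance spectra below are invented values sampled at three wavelengths, chosen only to illustrate the formula:

```python
# Sketch of the simplified filter model: the effective reflectance of a
# surface seen through a filter is
#     R'(lam) = R(lam) * [T(lam) * (1 - r)**2]**2,
# i.e. the light traverses the filter twice, losing a fraction r to
# internal reflectance on each traversal. Spectra here are invented.

def effective_reflectance(R, T, r):
    """Wavelength-by-wavelength reflectance of a surface viewed through a filter."""
    return [Ri * (Ti * (1 - r) ** 2) ** 2 for Ri, Ti in zip(R, T)]

R = [0.8, 0.5, 0.2]   # surface reflectance at three sample wavelengths
T = [0.9, 0.7, 0.4]   # filter transmittance at the same wavelengths
r = 0.05              # internal reflectance of the filter

print(effective_reflectance(R, T, r))
```

Because the filter enters only multiplicatively, this special case predicts the constant cone-excitation ratios that Westland and Ripamonti report; Faul and Ekroll's fuller model departs from it once the refractive index exceeds one.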
natural illuminants—explanations that might now sit comfortably within a Bayesian framework
(Feldman, Chapter 45, this volume).
Simple figure-ground displays are compatible with many different perceptual organizations.
The central disc may be an opaque surface lying on a colored background both illuminated by a
neutral light; the central disc may be an opaque surface lying on a neutral background both under
spectrally biased illumination; or the central disc may be transparent so that the light reaching the
eye is a mixture of the properties of the transparent layer and of the underlying surface.
Ekroll et al. have argued for transparency-based interpretations of classical demonstrations
of simultaneous color contrast (Ekroll and Faul 2013). Whilst it is true that the simple displays
typically used to show simultaneous color contrast do not include the multiple surfaces that are
required to parse appropriately the contributions from a transparent layer and from the back-
ground or illumination, ambiguous arrangements may also be perceived in terms of surfaces,
filters, and illuminants. A transparency-based interpretation suggests new laws of simultaneous
contrast that have some empirical support, particularly when temporal von Kries adaptation is
taken into account (Ekroll and Faul 2012). Bosten and Mollon (2012) provide a detailed discus-
sion of different theories of simultaneous contrast.
Configural Effects
Color constancy is often cast as the problem of perceiving stable color appearance of a sur-
face under changes in the illumination of the surface. We might also consider positional
color constancy, which describes the invariance of surface color under changes in pos-
ition (von Helmholtz 1867; Young 1807). Illuminant color constancy requires the chro-
matic context of the surface to be taken into account, since for isolated matte surfaces there is
no way to disentangle illuminant and reflectance. Positional color constancy requires the chro-
matic context to be discounted, since color perception would otherwise be an accident of loca-
tion (Whittle and Challands 1969). Amano and Foster (2004) obtained surface color matches in
Mondrian displays in which they were able to change the simulated illuminant and the position
of the test surface. Accuracy was almost as good for positional and illuminant constancy as for
illuminant constancy alone. A reliable cue in these cases was provided by the ratios of cone excita-
tions between the test surfaces and a spatial average over the whole pattern.
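The cone-excitation-ratio cue can be sketched numerically. The (L, M, S) values and the diagonal illuminant change below are invented for illustration; the point is only that ratios taken against the spatial average survive a per-cone-class multiplicative scaling:

```python
# Sketch of the cone-excitation-ratio cue (illustrative numbers only).
# Under a diagonal (per-cone-class multiplicative) illuminant change, the
# ratio of each cone's excitation between a test surface and the scene
# average is unchanged, so it can anchor surface-color judgements.

def cone_ratios(surface, reference):
    """Per-cone-class ratio of excitations between a surface and a reference."""
    return [s / r for s, r in zip(surface, reference)]

test_surface = [0.20, 0.35, 0.06]     # (L, M, S) under illuminant 1
scene_average = [0.40, 0.50, 0.12]

illum_change = [1.4, 1.1, 0.7]        # multiplicative scaling per cone class
test2 = [s * k for s, k in zip(test_surface, illum_change)]
avg2 = [s * k for s, k in zip(scene_average, illum_change)]

print(cone_ratios(test_surface, scene_average))  # [0.5, 0.7, 0.5]
print(cone_ratios(test2, avg2))                  # same ratios (to rounding)
```

The same invariance fails for a change of surface position, which is why positional constancy instead requires the chromatic context to be discounted.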
In natural viewing, shadows or multiple light sources mean that it is common for scenes to
include multiple regions of illumination. If a perceptual system is to ‘discount’ the illumination
in such scenes, elements that share the same illumination must be grouped together to allow the
appropriate corrections to be applied. Gilchrist’s anchoring theory of lightness (Gilchrist et al.
1999) adopts the term ‘framework’ to specify the frame of reference within which the target stim-
ulus belongs (see also Duncker 1929; Koffka 1935; and Herzog and Öğmen 2013, this volume, for
their discussion of the perceived motion of a target within a frame of reference which may itself
be in motion). The principles that promote grouping according to common illumination are dis-
cussed in detail by Gilchrist (this volume).
Schirillo and Shevell (2000) tested the relationship between color appearance of a small test
patch and the spatial organization of surrounding patches. They used a small set of chromatic
stimuli and varied only the spatial arrangement in different conditions of the experiment, whilst
keeping constant the immediate surround of the test patch, the space-average chromaticity of
the whole scene, and the range and ensemble of chromaticities present. Strong color appearance
effects were found with spatial arrangements that allowed the left and right halves of the display to
be interpreted as areas with identical objects under different illuminations. In achromatic cases,
Schirillo and Shevell (2002) showed that arranging grey-level patches to be consistent with sur-
faces covered by a luminance edge (i.e. one with a constant contrast ratio) caused shifts in bright-
ness that were in the direction predicted by a change in a real illuminant. Perceptual judgments
of color that are specific to the illuminant simulated in particular regions of the display can be
maintained even when eye-movements cause images of different regions to be interleaved on the
retina, implying that the regional specificity does not derive from peripheral sensory mechanisms
(Lee and Smithson 2012).
Geometric cues, such as X-junctions formed by the continuation of underlying contours across
the edges of a transparency, are vital for the perception of transparency in static scenes (see Figure
21.4). However, whilst X-junctions can promote perceptual scission, they are not necessarily
beneficial in identifying perceptual correlates of the spectral transmittance of the transparent
region, at least in cases where scission is supported by other cues, such as common motion. With
simulations of transparent overlays moving over a pattern of surface reflectances, rotating the
image region corresponding to the transparency by 180° disrupts X-junctions but does not impair
performance in the task of identifying identical overlays across different illuminant regions and
over different surfaces (Khang and Zaidi 2002). It seems that the identification of spectrally select-
ive transparencies in these conditions is well predicted by a process of color matching that oper-
ates with parameters estimated from the mean values in relevant image regions (Khang and Zaidi
2002; Zaidi 1998).
Geometric configuration is particularly important for the perception of three-dimensional sur-
faces and their interaction with illumination. Bloj, Kersten, and Hurlbert (1999) showed that color
perception is strongly influenced by three-dimensional shape perception. A concave folded card
with trapezoidal sides can be perceived correctly as an inward-pointing corner, or can be mis-
perceived as a ‘roof’ if viewed through a pseudoscope which reverses the binocular disparities
between the two eyes. Bloj et al. painted the left side of the folded card magenta, and the right
side white. The light reflected from the left side illuminated the right side, generating a strong
chromatic gradient across the white-painted area. Switching viewing mode from ‘corner’ to ‘roof’
caused large changes in color-appearance matches to the white-painted side, from a desaturated
pink to a more saturated magenta.
Kingdom (2003) has shown that the perception of shape-from-shading is strong when chro-
matic and luminance variations are not aligned or are out of phase, and suppressed when they are
aligned and in-phase (see Figure 21.5). One interpretation is that spatially corresponding changes
of chromaticity and luminance are most likely to originate from changes in surface reflectance.
Harding, Harris, and Bloj (2012), however, have shown that the use of illumination gradients as a
cue to three-dimensional shape can be flexibly learned, leading to the acquisition of assumptions
about lighting and scene parameters that subsequently allow gradients to be used as a reliable
shape cue.
Concluding Remarks
The perceptual attribute of color has its own inherent structure. Colors can be ordered and
grouped according to their perceptual similarities. For lights in a void, color resides in a three-
dimensional space, constrained by the spectral sensitivities of the three, univariant cone mecha-
nisms and conveniently described by the perceptual qualities of hue, saturation, and brightness.
However, once placed in a spatial and temporal context, and related to other lights, the same
spectral distribution of light reaching the retina can change dramatically in appearance.
Additionally, some hues or color directions have a special status, and the relative influences
Perceptual Organization of Color 457
Fig. 21.5 When chromatic gratings (left-hand column) and luminance gratings (middle column) are
spatially aligned their combination appears flat (right-hand column, (a) and (c)): but, when they are
spatially misaligned, the luminance component readily contributes ‘shape from shading’ (right-hand
column, (b) and (d)).
Data from Frederick A. A. Kingdom, Color brings relief to human vision, Nature Neuroscience 6(6), pp. 641–644,
Figures 2a-4, 3a, and 6a-b, 2003.
of physiological, environmental, and linguistic factors in conferring this status remain fiercely
debated.
Color has a strong organizational influence on scenes. Color can be used to impose spatial
structure, for example when pitted against spatial proximity in conferring rival perceptual organi-
zations or in supporting contour integration. It allows grouping of elements that aid extraction of
depth from random-dot-stereograms, motion from global-motion stimuli, and form from cam-
ouflage. Although color has traditionally been studied in isolation from other perceptual attrib-
utes, and has often been considered as secondary to form perception, there is increasing evidence
that color and form processing interact in subtle and flexible ways.
Color perception is strongly influenced by scene organization, particularly when the spatial
arrangement of surfaces introduces spatio-chromatic signatures that are consistent with the
References
Adelson, E. H. (2001). ‘On Seeing Stuff: The Perception of Materials by Humans and Machines’. Human
Vision and Electronic Imaging 6(4299): 1–12.
Amano, K. and D. H. Foster (2004). ‘Colour Constancy under Simultaneous Changes in Surface Position
and Illuminant’. Proceedings of the Royal Society B–Biological Sciences 271(1555): 2319–2326.
Anderson, B. L. (2011). ‘Visual Perception of Materials and Surfaces’. Current Biology 21(24): R978–R983.
Anstis, S., M. Vergeer, and R. Van Lier (2012). ‘Luminance Contours can Gate Afterimage Colors and
“Real” Colors’. Journal of Vision 12(10): 1–13.
Berlin, B. and P. Kay (1969). Basic Color Terms: Their Universality and Evolution. Berkeley: University of
California Press.
Blake, Z., T. Land, and J. Mollon (2008). ‘Relative Latencies of Cone Signals Measured by a Moving Vernier
Task’. Journal of Vision 8(16): 1–11.
Bloj, M. G., D. Kersten, and A. C. Hurlbert (1999). ‘Perception of Three-dimensional Shape Influences
Colour Perception through Mutual Illumination’. Nature 402(6764): 877–879.
Bompas, A., G. Kendall, and P. Sumner (2013). ‘Spotting Fruit versus Picking Fruit as the Selective
Advantage of Human Colour Vision’. iPerception 4(2): 84–94.
Bompas, A., G. Powell, and P. Sumner (2013). ‘Systematic Biases in Adult Color Perception Persist Despite
Lifelong Information Sufficient to Calibrate Them’. Journal of Vision 13(1): 19, 1–19.
Bosten, J. M., J. D. Robinson, G. Jordan, and J. D. Mollon (2005). ‘Multidimensional Scaling Reveals a
Color Dimension Unique to “Color Deficient” Observers’. Current Biology 15(23): R950–R952.
Bosten, J. M. and J. D. Mollon (2012). ‘Kirschmann’s Fourth Law’. Vision Research 53(1): 40–46.
Boynton, R. M. and J. Gordon (1965). ‘Bezold-Brucke Hue Shift Measured by Color-naming Technique’.
Journal of the Optical Society of America 55(1): 78–86.
Perceptual Organization of Color 459
Brainard, D. H. and B. A. Wandell (1986). ‘Analysis of the Retinex Theory of Color-vision’. Journal of the
Optical Society of America A: Optics Image Science and Vision 3(10): 1651–1661.
Brainard, D. H., W. A. Brunt, and J. M. Speigle (1997). ‘Color Constancy in the Nearly Natural Image.1.
Asymmetric Matches’. Journal of the Optical Society of America A: Optics Image Science and Vision
14(9): 2091–2110.
Brainard, D. H. and L. T. Maloney (2011). ‘Surface Color Perception and Equivalent Illumination Models’.
Journal of Vision 11(5):1, 1–18).
Brown, A. M., Lindsey, D. T., & Guckes, K. M. (2011). ‘Color names, color categories, and color-cued visual
search: Sometimes, color perception is not categorical’. Journal of Vision, 11(12): 2, 1–21.
Burns, B. and B. E. Shepp (1988). ‘Dimensional Interactions and the Structure of Psychological Space—the
Representation of Hue, Saturation, and Brightness’. Perception & Psychophysics 43(5): 494–507.
Burns, S. A., A. E. Elsner, J. Pokorny, and V. C. Smith (1984). ‘The Abney Effect—Chromaticity
Coordinates of Unique and Other Constant Hues’. Vision Research 24(5): 479–489.
Cavina-Pratesi, C., R. Kentridge, C. A. Heywood, and A. D. Milner (2010a). ‘Separate Channels for
Processing Form, Texture, and Color: Evidence from fMRI Adaptation and Visual Object Agnosia’.
Cerebral Cortex 20(10): 2319–2332.
Cavina-Pratesi, C., R. Kentridge, C. A. Heywood, and A. D. Milner (2010b). ‘Separate Processing of
Texture and Form in the Ventral Stream: Evidence from fMRI and Visual Agnosia’. Cerebral Cortex
20(2): 433–446.
Craven, B. J. and D. H. Foster (1992). ‘An Operational Approach to Color Constancy’. Vision Research
32(7): 1359–1366.
Croner, L. J. and T. D. Albright (1997). ‘Image Segmentation Enhances Discrimination of Motion in Visual
Noise’. Vision Research 37(11): 1415–1427.
Curcio, C. A., K. A. Allen, K. R. Sloan, Connie L. Lerea, James B. Hurley, et al. (1991). ‘Distribution and
Morphology of Human Cone Photoreceptors Stained with Anti-blue Opsin’. Journal of Comparative
Neurology 312(4): 610–624.
Dacey, D. M. and B. B. Lee (1994). ‘The Blue-on Opponent Pathway in Primate Retina Originates from a
Distinct Bistratified Ganglion-cell Type’. Nature 367(6465): 731–735.
Danilova, M. V. and J. D. Mollon (2012). ‘Foveal Color Perception: Minimal Thresholds at a Boundary
between Perceptual Categories’. Vision Research 62: 162–172.
Daoutis, C. A., A. Franklin, A. Riddett, A. Clifford and I. R. L. Davies (2006). ‘Categorical Effects In
Children’s Colour Search: A Cross-linguistic Comparison’. British Journal of Developmental Psychology
24: 373–400.
Daw, N. W. (1962). ‘Why After-images Are Not Seen in Normal Circumstances’. Nature
196(4860): 1143–1145.
Delahunt, P. B., M. A. Webster, L. Ma, and J. S. Werner (2004). ‘Long-term Renormalization of Chromatic
Mechanisms Following Cataract Surgery’. Visual Neuroscience 21(3): 301–307.
Derrington, A. M., J. Krauskopf, and P. Lennie (1984). ‘Chromatic Mechanisms in Lateral Geniculate
Nucleus of Macaque’. Journal of Physiology (London) 357: 241–265.
de Weert, C. M. M. and N. A. W. H. van Kruysbergen (1997). ‘Assimilation: Central and Peripheral Effects’.
Perception 26: 1217–1224.
Dinkova-Bruun, G., G. E. M. Gasper, M. Huxtable, T. C. B. McLeish, C. Panti, and H. Smithson (2013).
The Dimensions of Colour: Robert Grosseteste’s De colore (Edition, Translation and Interdisciplinary
Analysis). Toronto, Canada: PIMS.
Duncker, D. K. (1929). ‚Uber induzierte Bewegung (Ein Beitrag zur Theorie optisch wahrgenommener
Bewegung)’. Psychologische Forschung 12: 180–259.
D’Zmura, M. and G. Iverson (1993). ‘Color Constancy.1. Basic Theory of 2-Stage Linear Recovery of
Spectral Descriptions for Lights and Surfaces’. Journal of the Optical Society of America A: Optics Image
Science and Vision 10(10): 2148–2163.
460 Smithson
Ekroll, V. and F. Faul (2012). ‘New Laws of Simultaneous Contrast?’ Seeing and Perceiving 25(2): 107–141.
Ekroll, V. and F. Faul (2013). ‘Transparency Perception: The Key to Understanding Simultaneous
Color Contrast’. Journal of the Optical Society of America A: Optics Image Science and Vision
30(3): 342–352.
Elliot, J. (1780). Philosophical Observations on the Senses of Vision and Hearing. London: J. Murry.
Faul, F. and V. Ekroll (2002). ‘Psychophysical Model of Chromatic Perceptual Transparency Based on
Substractive Color Mixture’. Journal of the Optical Society of America A: Optics Image Science and Vision
19(6): 1084–1095.
Fleming, R. W., R. O. Dror, and E. H. Adelson (2003). ‘Real-World Illumination and the Perception of
Surface Reflectance Properties’. Journal of Vision 3(5): 347–368.
Fleming, R. W. and H. H. Bülthoff (2005). ‘Low-level Image Cues in the Perception of Translucent
Materials’. ACM Transactions on Applied Perception 2(3): 346–382.
Fleming, R. W., F. Jakel, and L. T. Maloney (2011). ‘Visual Perception of Thick Transparent Materials’.
Psychological Science 22(6): 812–820.
Fleming, R. W., C. Wiebel, and K. Gegenfurtner (2013). ‘Perceptual Qualities and Material Classes’. Journal
of Vision 13(8):9, 1–20.
Foster, D. H. and S. M. C. Nascimento (1994). ‘Relational Color Constancy from Invariant Cone-Excitation
Ratios’. Proceedings of the Royal Society B-Biological Sciences, 257(1349): 115–121.
Foster, D. H., S. M. C. Nascimento, K. Amano, L. Arend, K. J. Linnell, et al. (2001). ‘Parallel Detection of
Violations of Color Constancy’. Proceedings of the National Academy of Sciences of the United States of
America 98(14): 8151–8156.
Friedman, H. S., H. Zhou and R. von der Heydt (2003). ‘The Coding of Uniform Colour Figures in
Monkey Visual Cortex’. Journal of Physiology (London) 548(2): 593–613.
Fuchs, W. (1923). ‘Experimentelle Untersuchungen über die Änderung von Farben unter dem Einfluss von
Gestalten (Angleichungserscheinungen) [Experimental investigations on the alteration of color under
the influence of Gestalten]’. Zeitschrift für Psychologie 92: 249–325.
Garner, W. R. (1974). The Processing of Information and Structure. Potomac, MD: Erlbaum.
Gegenfurtner, K. R. (2003). ‘Cortical Mechanisms of Colour Vision’. Nature Reviews Neuroscience
4(7): 563–572.
Gelb, A. (1938). ‘Colour Constancy’. In A Source Book of Gestalt Psychology, edited by D. Willis, pp. 196–209.
London: Kegan Paul, Trench, Trubner and Co.
Gheorghiu, E. and F. A. A. Kingdom (2007). ‘Chromatic Tuning of Contour-shape Mechanisms
Revealed through the Shape-frequency and Shape-amplitude After-effects’. Vision Research
47(14): 1935–1949.
Gilbert, A. L., T. Regier, P. Kay, and R. B. Ivry (2006). ‘Whorf Hypothesis is Supported in the Right Visual
Field but not the Left’. Proceedings of the National Academy of Sciences of the United States of America
103(2): 489–494.
Gilchrist, A., C. Kossyfidis, F. Bonato, T. Agostini, J. Cataliotti, et al. (1999). ‘An Anchoring Theory of
Lightness Perception’. Psychological Review 106(4): 795–834.
Goldstein, K. and A. Gelb (1925). ‘Über Farbennamenamnesie’. Psychologische Forschung 6: 127–186.
Gowdy, P. D., C. F. Stromeyer, and R. E. Kronauer (1999). ‘Facilitation between the Luminance and
Red-green Detection Mechanisms: Enhancing Contrast Differences across Edges’. Vision Research
39(24): 4098–4112.
Grassmann, H. (1853). ‘Zur Theorie der Farbenmischung’. Annalen der Physik und Chemie 89: 60–84.
Gregory, R. L. (1977). ‘Vision with Isoluminant Colour Contrast. 1. A Projection Technique and
Observations’. Perception 6(1): 113–119.
Harding, G., J. M. Harris, and M. Bloj (2012). ‘Learning to Use Illumination Gradients as an Unambiguous
Cue to Three Dimensional Shape’. PLoS ONE 7(4): e35950.
Perceptual Organization of Color 461
Ho, Y. X., M. S. Landy, and L. T. Maloney (2008). ‘Conjoint Measurement of Gloss and Surface Texture’.
Psychological Science 19(2): 196–204.
Hong, S. W. and S. K. Shevell (2006). ‘Resolution Of Binocular Rivalry: Perceptual Misbinding of Color’.
Visual Neuroscience 23(3–4): 561–566.
Hurvich, L. M. and D. Jameson (1957). ‘An Opponent-process Theory of Color Vision’. Psychological Review
64(6): 384–404.
Indow, T. and K. Kanazawa (1960). ‘Multidimensional Mapping of Munsell Colors Varying in Hue,
Chroma, and Value’. Journal of Experimental Psychology 59(5): 330–336.
Indow, T. and T. Uchizono (1960). ‘Multidimensional Mapping of Munsell Colors Varying in Hue and
Chroma’. Journal of Experimental Psychology 59(5): 321–329.
Indow, T. (1980). ‘Global Color Metrics and Color-appearance Systems’. Color Research and Application
5(1): 5–12.
Jansch, E. R. (1921). ‘Über den Farbenkontrast und die so genannte Berücksichtigung der farbigen
Beleuchtung’. Zeitsschrift für Sinnesphysiologie 52: 165–180.
Jones, P. D. and D. H. Holding (1975). ‘Extremely Long-term Persistence of the McCollough Effect’. Journal
of Experimental Psychology—Human Perception and Performance 1(4): 323–327.
Joost, U., B. B. Lee, and Q. Zaidi (2002). ‘Lichtenberg’s letter to Goethe on “Farbige Schatten”—
Commentary’. Color Research and Application 27(4): 300–301.
Jordan, J. R., W. S. Geisler, and A. C. Bovik (1990). ‘Color as a Source of Information in the Stereo
Correspondence Process’. Vision Research 30(12): 1955–1970.
Jordan, G. and J. D. Mollon (1997). ‘Unique Hues in Heterozygotes for Protan and Deutan Deficiencies’.
Colour Vision Deficiencies XIII 59: 67–76.
Jordan, G., S. S. Deeb, J. M. Bosten, and J. D. Mollon (2010). ‘The dimensionality of color vision in carriers
of anomalous trichromacy’. Journal of Vision 10(8):12, 1–19.
Katz, D. (1911). The World of Colour, trans. R. B. MacLeod, C. W. Fox. London: Kegan Paul, Trench,
Trubner and Co.
Kay, P. and C. K. McDaniel (1978). ‘Linguistic Significance of Meanings of Basic Color Terms’. Language
54(3): 610–646.
Kay, P. and W. Kempton (1984). ‘What Is the Sapir-Whorf Hypothesis’. American Anthropologist 86(1): 65–79.
Kay, P. and B. Berlin (1997). ‘Science not Equal Imperialism: There Are Nontrivial Constraints on Color
Naming’. Behavioral and Brain Sciences 20(2): 196–201.
Khang, B. G. and Q. Zaidi (2002). ‘Cues and Strategies for Color Constancy: Perceptual Scission, Image
Junctions and Transformational Color Matching’. Vision Research 42(2): 211–226.
King, D. L. (1988). ‘Assimilation Is Due to One Perceived Whole and Contrast Is Due to Two Perceived
Wholes’. New Ideas in Psychology 6(3): 277–288.
King, D. L. (2001). ‘Grouping and Assimilation in Perception, Memory, and Conditioning’. Review of
General Psychology 5(1): 23–43.
Kingdom, F. A. A. (2003). ‘Color Brings Relief to Human Vision’. Nature Neuroscience 6(6): 641–644.
Koenderink, J. (2010). Color for the Sciences. Cambridge, MA: MIT Press.
Koffka, K. (1931). ‘Some Remarks on the Theory of Colour Constancy’. Psychologische Forschung
16: 329–345.
Koffka, K. and M. R. Harrower (1931). ‘Colour and Organization II’. Psychologische Forschung 15: 193–275.
Koffka, K. (1935). Principles of Gestalt Psychology. New York: Harcourt, Brace, and World.
Koffka, K. (1936). ‘On Problems of Colour-perception’. Acta Psychologica, 1, 129–134.
Krauskopf, J., D. R. Williams, and D. W. Heeley (1982). ‘Cardinal Directions of Color Space’. Vision
Research 22(9): 1123–1131.
Krauskopf, J. and B. Farell (1990). ‘Influence of Color on the Perception of Coherent Motion’. Nature
348(6299): 328–331.
462 Smithson
Land, E. H. and J. J. McCann (1971). ‘Lightness and Retinex Theory’. Journal of the Optical Society of
America 61(1): 1–11.
Land, E. H. (1986). ‘Recent Advances in Retinex Theory’. Vision Research 26(1): 7–21.
Lee, B. B., R. M. Shapley, M. J. Hawken, and H. Sun (2012). ‘Spatial Distributions of Cone Inputs to Cells
of the Parvocellular Pathway Investigated with Cone-isolating Gratings’. Journal of the Optical Society of
America A: Optics Image Science and Vision 29(2): A223–A232.
Lee, R. J., J. D. Mollon, Q. Zaidi, and H. E. Smithson (2009). ‘Latency Characteristics of the
Short-wavelength-sensitive Cones and their Associated Pathways’. Journal of Vision 9(12): 5, 1–17.
Lee, R. J. and H. E. Smithson (2012). ‘Context-dependent Judgments of Color that Might Allow Color
Constancy in Scenes with Multiple Regions of Illumination’. Journal of the Optical Society of America
A: Optics Image Science and Vision 29(2): A247–A257.
Li, H. C. O. and F. A. A. Kingdom (2001). ‘Segregation by Color/Luminance Does Not Necessarily
Facilitate Motion Discrimination in the Presence of Motion Distractors’. Perception & Psychophysics
63(4): 660–675.
Liebmann, S. (1927). ‘Über das Verhalten farbiger Formen bei Helligkeitsgleichheit von Figur und Grund’.
Psychologische Forschung 9: 300–353.
Linnell, K. J., and Foster, D. H. (1996). ‘Dependence of Relational Colour Constancy on the Extraction of a
Transient Signal’. Perception 25(2): 221–228.
McCollough, C. (1965). ‘Color Adaptation of Edge-detectors in the Human Visual System’. Science
149(3688): 1115–1116.
McIlhagga, W. H. and K. T. Mullen (1996). ‘Contour Integration with Colour and Luminance Contrast’.
Vision Research 36(9): 1265–1279.
McKeefry, D. J., E. G. Laviers, and P. V. McGraw (2006). ‘The Segregation and Integration of Colour in
Motion Processing Revealed by Motion After-effects’. Proceedings of the Royal Society B—Biological
Sciences 273(1582): 91–99.
MacLeod, D. I. A. (2003). ‘New Dimensions in Color Perception’. Trends in Cognitive Sciences 7(3): 97–99.
Maloney, L. T. and B. A. Wandell (1986). ‘Color Constancy—a Method for Recovering Surface Spectral
Reflectance’. Journal of the Optical Society of America A: Optics Image Science and Vision 3(1): 29–33.
Martin, P. R., E. M. Blessing, P. Buzas, B. A. Szmajda, and J. D. Forte (2011). ‘Transmission of Colour
and Acuity Signals by Parvocellular Cells in Marmoset Monkeys’. Journal of Physiology (London)
589(11): 2795–2812.
Mollon, J. D. and P. G. Polden (1975). ‘Colour Illusion and Evidence for Interaction between Colour
Mechanisms’. Nature 258: 421–422.
Mollon, J. D. (2003). ‘The Origins of Modern Color Science’. In Color Science, edited by S. Shevell.
Washington: Optical Society of America.
Mollon, J. D. (2006). ‘Monge—The Verriest Lecture, Lyon, July 2005’. Visual Neuroscience 23(3–4):
297–309.
Mollon, J. D. (2009). ‘A Neural Basis for Unique Hues?’ Current Biology 19(11): R441–R442.
Morgan, M. J., A. Adam, and J. D. Mollon (1992). ‘Dichromates Detect Color-camouflaged Objects
that Are Not Detected by Trichromates’. Proceedings of the Royal Society B—Biological Sciences
248(1323): 291–295.
Musatti, C. (1931). ‘Forma e assimilazione’ [Form and assimilation]. Archivo Italiano di Psicologica
9: 213–269.
Nathans, J., D. Thomas, and D. S. Hogness (1986). ‘Molecular Genetics of Human Color Vision—the
Genes Encoding Blue, Green, and Red Pigments’. Science 232(4747): 193–202.
Olkkonen, M. and D. H. Brainard (2010). ‘Perceived Glossiness and Lightness under Real-world
Illumination’. Journal of Vision 10(9): 5, 1–19.
Palmer, G. (1777). Theory of Colours and Vision. London: S. Leacroft.
Perceptual Organization of Color 463
Parraga, C. A., T. Troscianko, and D. J. Tolhurst (2002). ‘Spatiochromatic Properties of Natural Images and
Human Vision’. Current Biology 12(6): 483–487.
Pinna, B., G. Brelstaff, and L. Spillmann (2001). ‘Surface Color from Boundaries: A New “Watercolor”
Illusion’. Vision Research 41(20): 2669–2676.
Pokorny, J. and V. C. Smith (1970). ‘Wavelength Discrimination in the Presence of Added Chromatic
Fields’. Journal of the Optical Society of America 60(4): 562–569.
Polden, P. G. and J. D. Mollon (1980). ‘Reversed Effect of Adapting Stimuli on Visual Sensitivity’.
Proceedings of the Royal Society B—Biological Sciences 210(1179): 235–272.
Powell, G., A. Bompas, and P. Sumner (2012). ‘Making the Incredible Credible: Afterimages Are
Modulated by Contextual Edges More than Real Stimuli’. Journal of Vision 12(10): 17, 1–13.
Regan, B. C. and J. D. Mollon (1997). ‘The Relative Salience of the Cardinal Axes of Colour Space in
Normal and Anomalous Trichromats’. Colour Vision Deficiencies XIII 59: 261–270.
Regan, B. C., C. Julliot, B. Simmen, F. Vienot, P. Charles-Dominique, et al. (2001). ‘Fruits, Foliage and
the Evolution of Primate Colour Vision’. Philosophical Transactions of the Royal Society B—Biological
Sciences 356(1407): 229–283.
Ripamonti, C. and S. Westland (2003). ‘Prediction of Transparency Perception Based on Cone-excitation
Ratios’. Journal of the Optical Society of America A: Optics Image Science and Vision 20(9): 1673–1680.
Roberson, D. and J. R. Hanley (2007). ‘Color Vision: Color Categories Vary With Language After All’.
Current Biology 17(15): R605–R607.
Rushton, W. A. H. (1972). ‘Pigments and Signals in Color Vision’. Journal of Physiology (London)
220(3): 1–31P.
Rutherford, M. D. and D. H. Brainard (2002). ‘Lightness Constancy: A Direct Test of the
Illumination-estimation Hypothesis’. Psychological Science 13(2): 142–149.
Saunders, B. and J. van Brakel (1997). ‘Are There Nontrivial Constraints on Colour Categorization?’
Behavioral and Brain Sciences 20(2): 167–228.
Saunders, B. (2000). ‘Revisiting Basic Color Terms’. Journal of the Royal Anthropological Institute
6(1): 81–99.
Schirillo, J. A. and S. K. Shevell (2000). ‘Role of Perceptual Organization in Chromatic Induction’. Journal
of the Optical Society of America A—Optics Image Science and Vision 17(2): 244–254.
Schirillo, J. A. and S. K. Shevell (2002). ‘Articulation: Brightness, Apparent Illumination, and Contrast
Ratios’. Perception 31(2): 161–169.
Shapiro, A., W. Kistler, and A. Rose-Henig (2012). Color Wagon-Wheel (3rd place, Best Illusion of the
Year). http://illusionoftheyear.com/2012/color-wagon-wheel/.
Shepard, R. N. (1964). ‘Attention and the Metric Structure of the Stimulus Space’. Journal of Mathematical
Psychology 1(1): 54–87.
Shepard, R. N. (1991). ‘The Perceptual Organization of Colors: An Adaptation to Regularities of the
Terrestrial World?’ In J. Barkow, L. Cosmides, and J. Tooby (eds.), The Adapted Mind: Evolutionary
Psychology and the Generation of Culture. Oxford: Oxford University Press.
Shevell, S. K., R. St Clair, and S. W. Hong (2008). ‘Misbinding of Color to Form in Afterimages’. Visual
Neuroscience 25(3): 355–360.
Singer, B. and M. D’Zmura (1994). ‘Color Contrast Induction’. Vision Research 34(23): 3111–3126.
Smithson, H. E. and J. D. Mollon (2004). ‘Is the S-Opponent Chromatic Sub-System Sluggish?’ Vision
Research 44(25): 2919–2929.
Smithson, H. E. (2005). ‘Sensory, Computational and Cognitive Components of Human Colour Constancy’.
Philosophical Transactions of the Royal Society B—Biological Sciences 360(1458): 1329–1346.
Smithson, H. E., G. Dinkova-Bruun, G. E. M. Gasper, M. Huxtable, T. C. B. McLeish, et al. (2012).
‘A Three-dimensional Color Space from the 13th Century’. Journal of the Optical Society of America
A: Optics Image Science and Vision 29(2): A346–A352.
464 Smithson
Solomon, S. G. and P. Lennie (2005). ‘Chromatic Gain Controls in Visual Cortical Neurons’. Journal of
Neuroscience 25(19): 4779–4792.
Solomon, S. G., J. W. Peirce, and P. Lennie (2004). ‘The Impact of Suppressive Surrounds on Chromatic
Properties of Cortical Neurons’. Journal of Neuroscience 24(1): 148–160.
Stiles, W. S. (1949). ‘Increment Thresholds and the Mechanisms of Colour Vision’. Documenta
Ophthalmologica 3(1): 138–165.
Stockman, A. and D. H. Brainard (2009). ‘Color Vision Mechanisms’. In Vision and Vision Optics: The
Optical Society of America Handbook of Optics (3rd edn, Vol. 3), edited by Bass M., C. DeCusatis,
J. Enoch, V. Lakshminarayanan, G. Li, C. Macdonald, et al. New York: McGraw Hill.
Stoughton, C. M. and B. R. Conway (2008). ‘Neural Basis for Unique Hues’. Current Biology
18(16): R698–R699.
Sumner, P. and J. D. Mollon (2000a). ‘Catarrhine Photopigments are Optimized for Detecting Targets
against a Foliage Background’. Journal of Experimental Biology 203(13): 1963–1986.
Sumner, P. and J. D. Mollon (2000b). ‘Chromaticity as a Signal of Ripeness in Fruits Taken by Primates’.
Journal of Experimental Biology 203(13): 1987–2000.
Sumner, P., T. Adamjee, and J. D. Mollon (2002). ‘Signals Invisible to the Collicular and Magnocellular
Pathways can Capture Visual Attention’. Current Biology 12(15): 1312–1316.
Sumner, P., E. J. Anderson, R. Sylvester, J. D. Haynes, and G. Rees (2008). ‘Combined Orientation
and Colour Information in Human V1 for both L-M and S-cone Chromatic Axes’. Neuroimage
39(2): 814–824.
Tansley, B. W. and R. M. Boynton (1976). ‘A Line, Not a Space, Represents Visual Distinctness of Borders
Formed by Different Colors’. Science 191(4230): 954–957.
Tokunaga, R. and A. D. Logvinenko (2010). ‘Material and Lighting Dimensions of Object Colour’. Vision
Research 50(17): 1740–1747.
Troscianko, T., R. Montagnon, J. Leclerc, E. Malbert, and P. L. Chanteau (1991). ‘The Role of Color as a
Monocular Depth Cue’. Vision Research 31(11): 1923–1929.
von Helmholtz, H. (1867). Handbuch der physiologischen Optik (1st edn, Vol. 2). Leipzig: Leopold Voss.
Translation of 3rd edn, Helmholtz’s Treatise on Physiological Optics, 1909, edited by J. P. C. Southall, pp.
286–287. Washington, DC: Optical Society of America, 1924.
von Kries, J. (1878). ‘Beitrag zur Physiologie der Gesichtsempfindungen’ [ Physiology of Visual
Sensations]. In Sources of Color Science, ed. D. L. MacAdam, pp. 101–108. Cambridge, MA: MIT Press.
Vul, E., E. Krizay, and D. I. A. MacLeod (2008). ‘The McCollough Effect Reflects Permanent and Transient
Adaptation in Early Visual Cortex’. Journal of Vision 8(12):4, 1–12.
Webster, M. A., K. K. Devalois, and E. Switkes (1990). ‘Orientation and Spatial-Frequency Discrimination
for Luminance and Chromatic Gratings’. Journal of the Optical Society of America A: Optics Image
Science and Vision 7(6): 1034–1049.
Webster, M. A., K. Halen, A. J. Meyers, P. Winkler, and J. S. Werner (2010). ‘Colour Appearance
and Compensation in the Near Periphery’. Proceedings of the Royal Society B: Biological Sciences
277(1689): 1817–1825.
Werner, J. S. and B. E. Schefrin (1993). ‘Loci of Achromatic Points throughout the Life Span’. Journal of the
Optical Society of America A: Optics Image Science and Vision 10(7): 1509–1516.
Westland, S. and C. Ripamonti (2000). ‘Invariant Cone-Excitation Ratios May Predict Transparency’.
Journal of the Optical Society of America A: Optics Image Science and Vision 17(2): 255–264.
Whittle, P. and P. D. C. Challands (1969). ‘Effect of Background Luminance on Brightness of Flashes’.
Vision Research 9(9): 1095–1110.
Williams, D. R. and D. I. A. MacLeod (1979). ‘Interchangeable Backgrounds for Cone Afterimages’. Vision
Research 19(8): 867–877.
Perceptual Organization of Color 465
Winawer, J., N. Witthoft, M. C. Frank, L. Wu, A. R. Wade, et al. (2007). ‘Russian Blues Reveal Effects of
Language on Color Discrimination’. Proceedings of the National Academy of Sciences of the United States
of America 104(19): 7780–7785.
Witzel, C. and K. R. Gegenfurtner (2011). ‘Is There a Lateralized Category Effect for Color?’ Journal of
Vision 11(12):16, 1–25.
Wuerger, S. M., L. T. Maloney, and J. Krauskopf (1995). ‘Proximity Judgments in Color Space—Tests of a
Euclidean Color Geometry’. Vision Research 35(6): 827–835.
Wyszecki, G. and W. S. Stiles (1982). Color Science: Concepts and methods. Quantitative data and Formulae.
New York: Wiley.
Xian, S. X. (2004). ‘Perceptual Grouping in Colour Perception’. PhD, University of Chicago, Illinois.
Xian, S. X. and S. K. Shevell (2004). ‘Changes in Color Appearance Caused by Perceptual Grouping’. Visual
Neuroscience 21(3): 383–388.
Young, T. (1802). ‘The Bakerian Lecture. On the Theory of Light and Colours’. Philosophical Transactions of
the Royal Society of London 92: 12–48.
Young, T. (1807). A Course of Lectures on Natural Philosophy and the Mechanical Arts (Vol. I, lecture
XXXVIII). London: Joseph Johnson.
Zaidi, Q. (1998). ‘Identification of Illuminant and Object Colors: Heuristic-Based Algorithms’. Journal of the
Optical Society of America A: Optics Image Science and Vision 15(7): 1767–1776.
Zaidi, Q. and A. Li (2006). ‘Three-Dimensional Shape Perception from Chromatic Orientation Flows’.
Visual Neuroscience 23(3–4): 323–330.
Zaidi, Q., R. Ennis, D. C. Cao, and B. Lee (2012). ‘Neural Locus of Color Afterimages’. Current Biology
22(3): 220–224.
Zhou, K., L. Mo, P. Kay, V. P. Y. Kwok, T. N. M. Ip, et al. (2010). ‘Newly Trained Lexical Categories Produce
Lateralized Categorical Perception of Color’. Proceedings of the National Academy of Sciences of the
United States of America 107(22): 9974–9978.
Chapter 22

The perceptual representation of transparency, lightness, and gloss

1 Theoretical preliminaries
The adaptive role of vision is to provide information about the behaviorally relevant properties
of our visual environment. Our evolutionary success relies on recovering sufficient information
about the world to fulfill our biological and reproductive needs while avoiding environmental
dangers. The attempt to understand vision as a collection of adaptations to specific computational
problems has shaped a growing body of research that treats vision as a decomposable collection of
‘recovery’ problems. In this view, perceptual outputs are understood as approximately ideal solu-
tions to specific recovery problems, which have been dubbed the ‘natural tasks’ of vision (Geisler
and Ringach 2009). From this perspective, the science of understanding visual processing pro-
ceeds by identifying an organism’s natural tasks, evaluating the information available to perform
each task, developing models of how to perform a task optimally, and discovering the mechanisms
that implement these solutions.
The first aspect of this approach—the identification of ‘natural tasks’—is arguably
the most important because it defines the problem that needs to be solved. It is also the least con-
strained. Any environmental property can be hypothesized to have adaptive value, and therefore
to provide a selective advantage to any organism equipped to
recover it. Presumably, however, only some aspects of our environment were involved in directly
shaping the evolution of our senses. The scientific challenge is to differentiate properties that
actually exerted selective pressure in shaping the design of our senses from those that merely
came along for the ‘evolutionary ride’ (perceptual ‘spandrels’). But there is currently no principled
means of making such distinctions. For example, a general argument could be (and has been)
made that the computation of surface lightness would be useful because it provides information
about an intrinsic property of the external world, but it is much harder to fashion a clear argument
about how the recovery of surface albedo provides a specific adaptive benefit, or that any such
benefit played a role in natural selection.
The second aspect of the adaptationist approach—identifying the information available for a
computation—is in principle more constrained. Natural scenes are replete with information that
could be used to sense a particular world property. Once a recovery problem has been identified,
it is possible to inventory the sources of information that exist in the natural world that can be
used to sense it. However, most recovery problems in vision (such as shape, depth, color, lightness,
etc.) are considered in isolation, often in informationally impoverished laboratory settings. This
approach has led to the nearly universal acceptance of a belief in the poverty of the stimulus: the
presumption that the images do not contain sufficient information to recover the aspects of the
world that we experience. This view is typically defended by demonstrating that it is impossible
to derive a unique solution for a specific recovery problem based on the information available in
the images. Perception is construed as the outputs of a collection of under-constrained problems
of probabilistic inference, which are solved with the aid of additional information, assumptions,
or constraints. So construed, it is natural to turn to probability theory for guidance on how to
solve such inference problems ideally, which typically entails the application of Bayes’ theorem
(see Feldman’s chapter, this volume).
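To make this logic concrete, the following toy sketch (an illustration for this discussion, not a model from the literature) shows how Bayes’ theorem turns one under-constrained recovery problem into a well-posed inference: a single luminance value is consistent with infinitely many reflectance/illumination pairs, and a prior over illuminants selects a posterior over reflectance. The grids, noise model, and priors are all hypothetical choices.

```python
import numpy as np

# Generative model: luminance = reflectance * illumination.
# A single luminance value is consistent with infinitely many
# (reflectance, illumination) pairs, so recovery is ill-posed
# until priors are added.
reflectances = np.linspace(0.05, 1.0, 96)   # candidate surface albedos r
illuminations = np.linspace(0.1, 3.0, 96)   # candidate illuminant intensities e

# Hypothetical priors: flat over albedo, biased toward moderate illuminants.
p_r = np.ones_like(reflectances)
p_r /= p_r.sum()
p_e = np.exp(-0.5 * ((illuminations - 1.0) / 0.6) ** 2)
p_e /= p_e.sum()

def posterior_reflectance(luminance, noise_sd=0.05):
    """p(r | L): apply Bayes' theorem, then marginalize over illumination."""
    R, E = np.meshgrid(reflectances, illuminations, indexing="ij")
    likelihood = np.exp(-0.5 * ((luminance - R * E) / noise_sd) ** 2)
    joint = likelihood * np.outer(p_r, p_e)  # p(L | r, e) p(r) p(e)
    posterior = joint.sum(axis=1)            # sum out the nuisance variable e
    return posterior / posterior.sum()

post = posterior_reflectance(0.5)
map_r = reflectances[np.argmax(post)]        # the 'ideal' percept of albedo
```

The same luminance yields a different ‘ideal’ percept under a different illuminant prior, which is why, in the program described above, so much explanatory weight falls on the choice of priors and likelihoods.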
The third aspect of the adaptationist program is ostensibly the easiest, and is where theory
meets data. The percepts or perceptual performance of observers are compared to those of a Bayesian
ideal, constructed from a set of priors and likelihoods. When data and the Bayesian ideal are deemed
sufficiently similar, the explanatory circle is considered closed: the fit between model and data is
upheld as evidential support for the specification of the natural tasks, the selection of priors and
likelihoods needed to perform the inference, and the claim that perception instantiates a form
of Bayesian inference. All that remains is the discovery of the mechanisms that instantiate such
computations.
The preceding describes what may currently be considered one (if not the) dominant view on
how to approach the study and modeling of visual processes. My own view departs in a num-
ber of significant ways from this approach, which shapes both my selection of problems and the
theoretical approach taken to account for data. One of the main goals of this chapter is to provide
an overview of how my approach has shaped work in three areas of surface and material percep-
tion: transparency, lightness, and gloss. The gist of my approach may be articulated as follows.
First, I assume that the attempt to identify the ‘natural tasks’ of vision—i.e., the computational
‘problems’ that visual systems putatively evolved to solve—is at best a guessing game, and at worst
a theoretical fiction. Some of the ‘problems’ our visual systems seem to solve may be epiphenom-
enal outputs, not explicit adaptations. Second, the claim that vision is an ill-posed inference prob-
lem is a logical consequence of treating vision as a collection of recovery problems, for which it
can be shown that there is no closed form solution that can be derived from the information that
is currently available. But if the putative ‘recovery problem’ is misidentified, or the ‘information
available for solving it’ is artificially restricted (such as typically occurs in laboratory environ-
ments), then it may not be vision that is ill-posed, but our particular understanding of visual
processing that is misconstrued.
An alternative approach is to begin with what we visually experience about the world, and attempt
to determine what image properties modulate these experiences. The question is not whether there
is sufficient information in the images to specify the true states of the world, but rather, whether
there is sufficient information to explain what we experience about the world. This approach is
neutral as to the ‘computational goals’ of the visual system, or even whether the idea of a
computational goal has any real meaning for biological systems. Whereas the recovery of a world property
can be shown to be under-constrained by argument, whether there is sufficient information
available to explain what we experience about the world is an empirical question.
structure, despite the fact that they are conflated in the image. Much research into perceptual
organization has focused on how the visual system fills in missing information or groups image
fragments into a global structure or pattern. While such phenomena are an extremely important
aspect of our visual experience, another fundamental organizational problem involves
understanding how the visual system disentangles different sources of image structure into the
distinct surface and material qualities that we experience. In what follows, I consider a variety of
segmentation problems in the perception of surface and material attributes, and the light that
such problems shed on the broader theoretical issues raised above.
2.1 Transparency
One of the most perceptually explicit and theoretically challenging forms of image segmentation
occurs in the perception of transparency. Historically, the study of transparency focused on
achromatic surfaces, largely due to the seminal influence of Metelli’s model of transparency
(Metelli 1970, 1974a, 1974b, 1985; see also Gerbino’s chapter, this volume). The perception
of an (achromatic) transparent surface generates two distinct impressions: its perceived lightness
and its perceived opacity, or ‘hiding power’. Metelli’s model was based on a simple physical device
known as an episcotister: a rapidly rotating disc with a missing sector. The proportion of the disk
that is ‘missing’ determines the amount of light transmitted from the underlying surfaces through
the episcotister blades, which is the physical correlate of a transparent surface’s transmittance.
The lightness (or albedo) of the transparent surface corresponds to the color of the paint used on the front surface of the episcotister, which determines the color of the transparent layer (or, for achromatic paints, its lightness). Metelli’s model was restricted to ‘balanced’ transparency, which
referred to conditions where the episcotister had a uniform reflectance and transmittance, reduc-
ing each to a single scalar (number). For the simple bipartite fields Metelli used as backgrounds,
this allowed equations for the total reflected light in the regions of overlay to be written as a sum of
two components: a multiplicative transmittance term, which determines the weight for the contribution of the underlying surface; and an additive term, which corresponds to the light reflected by the episcotister surface. By construction, Metelli considered displays containing two uniformly
colored background regions, which gave him a system of two equations and two unknowns that
could be solved in closed form. A significant body of work showed that the perception of trans-
parency is often well predicted by Metelli’s episcotister model: balanced transparency is perceived when displays are consistent with the episcotister equations, but generally not otherwise. Note
that Metelli’s model served double duty as both a physical model of transparency and a psychologi-
cal model of the conditions that elicit percepts of transparency.
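For such a bipartite display, the episcotister algebra and its closed-form solution can be sketched as follows. This is a toy implementation of the two equations described above, using α for transmittance and t for the layer’s own reflectance; the variable names are illustrative, not Metelli’s notation.

```python
def metelli_decompose(a, b, p, q):
    """Solve Metelli's two episcotister equations for balanced transparency.

    a, b : luminances (or reflectances) of the two background regions
    p, q : luminances of the same two regions seen through the episcotister
    Model:  p = alpha*a + (1 - alpha)*t
            q = alpha*b + (1 - alpha)*t
    Returns (alpha, t): the transmittance and the layer's own reflectance.
    """
    alpha = (p - q) / (a - b)               # transmittance from the difference ratio
    t = (a * q - b * p) / (a - b - p + q)   # additive (reflective) component
    return alpha, t

# A 50%-open episcotister painted mid-grey (t = 0.4) over backgrounds 0.9 and 0.1:
# p = 0.5*0.9 + 0.5*0.4 = 0.65 ; q = 0.5*0.1 + 0.5*0.4 = 0.25
alpha, t = metelli_decompose(0.9, 0.1, 0.65, 0.25)
# alpha -> 0.5, t -> 0.4
```

The two unknowns are recoverable precisely because the display provides two equations; with a single background region the system would be under-determined.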
Despite these successes, Metelli himself noted a curious discrepancy between the predictions of
the episcotister model and perception: a light episcotister looks less transmissive than a dark episcotister (Metelli 1974a). From a ‘recovery’ point of view, this constitutes a perceptual error, and
hence non-ideal performance, but almost no experimental work was conducted to understand
this deviation from the predictions of Metelli’s model. We therefore performed a series of experi-
ments to test whether the physical independence of opacity and lightness is observed psychophys-
ically (Singh and Anderson 2002). Observers matched the transmittance of simulated surfaces
that varied in lightness, and the lightness of transparent filters that varied in transmittance. We
found that lightness judgments were modulated by simulated transmittance, and transmittance
judgments were modulated by simulated variations in lightness. Thus, although the transmittance
and reflectance of transparent layers are physically independent parameters in Metelli’s model,
they are not experienced as being independent perceptually.
The perceptual representation of transparency, lightness, and gloss 469
What theoretical conclusions can be drawn from these results? Metelli’s model treated a physical
model of transparency as a perceptual model of transparency. Our findings of mutual ‘contamination’ of the transmittance and lightness of the transparent filter imply one of two possibilities: (1) there is no simple correspondence between the dimensions of a physical model and a perceptual model, or (2) Metelli’s model is the wrong physical model on which to base theories of perceived transparency. With respect to (1), Metelli’s model equates the perceived opacity of an episcotister with its physical transmittance, and hence cannot explain why light episcotisters look more
opaque than dark episcotisters. The dependence of perceived opacity on lightness can be readily
understood, however, if the visual system relied on image contrast to assess the hiding power of
transparent surfaces. A light episcotister reduces the contrast of underlying surface structure more
than an otherwise identical dark episcotister, and hence, should appear more opaque if the visual
system uses image contrast to assess perceived opacity1. Indeed, it seems almost inevitable that the
visual system utilizes contrast to judge the perceived opacity of transparent filters, since contrast
determines the visibility of image structure in general. But this implies that the visual system is
using the ‘wrong’ image properties to generate our experience of a world property, and hence will
almost always result in the ‘wrong’ answer. From the perspective of explaining our experience, such
issues are largely irrelevant; the only issue is whether there is sufficient information in the image to
explain what it is we experience about the world, not whether such percepts are veridical.
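A small numerical sketch makes the contrast argument concrete. Michelson contrast is used here purely for convenience; nothing in the argument hinges on that particular contrast metric.

```python
def michelson(x, y):
    """Michelson contrast of a two-luminance pattern."""
    return abs(x - y) / (x + y)

def metelli_alpha(a, b, p, q):
    """Physical transmittance recovered from luminance differences (Metelli)."""
    return (p - q) / (a - b)

def contrast_transmittance(a, b, p, q):
    """Contrast-based proxy for perceived transmittance: the fraction of the
    background's contrast that survives viewing through the layer."""
    return michelson(p, q) / michelson(a, b)

# Backgrounds 0.9 and 0.1; two balanced layers with the same alpha = 0.5 but
# different reflectances (a light layer, t = 0.7, and a dark layer, t = 0.1):
a, b, alpha = 0.9, 0.1, 0.5
light = (alpha * a + (1 - alpha) * 0.7, alpha * b + (1 - alpha) * 0.7)  # (0.80, 0.40)
dark = (alpha * a + (1 - alpha) * 0.1, alpha * b + (1 - alpha) * 0.1)   # (0.50, 0.10)

# Identical physical transmittance under Metelli's equations...
assert abs(metelli_alpha(a, b, *light) - 0.5) < 1e-9
assert abs(metelli_alpha(a, b, *dark) - 0.5) < 1e-9
# ...but the light layer leaves less residual contrast, so a contrast-based
# observer should judge it less transmissive (more opaque):
assert contrast_transmittance(a, b, *light) < contrast_transmittance(a, b, *dark)
```

The asymmetry falls out directly: adding a light additive component raises the denominator of the contrast in the region of overlay, so the same physical transmittance yields a lower contrast ratio for the light layer.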
Alternatively, it could be (and has been) argued that the discrepancy between perception and
Metelli’s model merely provides evidence that there is something wrong with Metelli’s model, and
does not impact on the more general claim that perception can be identified with the recovery
of some physical model. Faul and Ekroll (2011) have made precisely this argument. They con-
tend that a subtractive filter model better captures the perception of chromatic transparency, and
hence may be a more appropriate model of achromatic transparency as well. Although there is
currently insufficient data to determine which of these alternatives is ultimately correct for achro-
matic stimuli, Faul and Ekroll reported substantial discrepancies between their filter model and
perceived transparency when the chromatic content of the illuminant was varied, despite demon-
strating that there was theoretically sufficient information for a much better level of performance
(Faul and Ekroll 2012). At this juncture, there is currently no physical model that maps directly
onto our experience of transparent surfaces, and it is largely a matter of scientific faith that such a
model may ultimately be discovered.
2.2 Lightness
The perception of lightness has also been treated as a kind of segmentation problem. For achromatic surfaces, the term lightness (or albedo) refers to a surface’s diffuse reflectance. The light returned to the eye is a conflated mixture of the illuminant, surface reflectance, and 3D pose.
There is currently extensive debate over the computations, mechanisms, and/or assumptions that
are responsible for generating our experience of lightness (see Gilchrist’s chapter, this volume).
There are four general theoretical approaches to the problem of lightness: scission (or layers) models, equivalent illuminant models, anchoring models, and filter or filling-in models. I consider each model class in turn.
1 This reduction in contrast occurs for almost any definition of contrast that includes a divisive normalization term that is a function of the integrated or mean luminance in the region over which contrast is defined. Unfortunately, there is currently no general definition of contrast that adequately captures perceived contrast in arbitrary images, so the precise way in which contrast is reduced depends on the definition of contrast used in a particular context.
Anchoring models
Relative luminance alone underdetermines lightness, so some mapping or rule is needed to transform ambiguous information about relative lightness into an estimate of absolute surface reflectance. For example, an image containing a 2:1 range of luminances could be generated by surfaces with reflectances of three per cent and six per cent, five per cent and 10 per cent, or 40 per cent and 80 per cent, ad infinitum. Anchoring theory asserts that this ambiguity must be
resolved with an anchoring rule, such that a specific relative image luminance (such as the high-
est) is mapped onto a fixed lightness value (such as white). All other lightness values in a scene are
putatively derived by computing ratios relative to this anchor value. A number of fixed points are
possible (e.g., the average luminance could be grey, the highest luminance could be white, or the
lowest luminance could be black), but a variety of experiments, especially those from Gilchrist’s
lab, have suggested that in many contexts, the highest luminance is perceived as white.
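The anchoring step itself is simple enough to state as code. This sketch assumes the highest-luminance rule and the conventional ~90 per cent reflectance for ‘white’; both are assumptions of the sketch rather than free predictions of the theory.

```python
def anchored_lightness(luminances, white_reflectance=90.0):
    """Highest-luminance anchoring rule (a sketch of anchoring theory's core step).

    The highest luminance in the scene is assigned 'white' (reflectance ~90%);
    every other lightness is derived from its luminance ratio to that anchor,
    independent of the absolute level of illumination.
    """
    anchor = max(luminances)
    return [white_reflectance * lum / anchor for lum in luminances]

# A 2:1 luminance range is ambiguous on its own (3% & 6%, 5% & 10%, 40% & 80% ...);
# anchoring resolves the ambiguity by pinning the highest value to white:
matches = anchored_lightness([30.0, 60.0])
# -> [45.0, 90.0]
```

Note that doubling every luminance (e.g. raising the illuminant) leaves the output unchanged, which is exactly the invariance that the Mondrian experiments described below put to the test.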
Filtering and filling-in models
A third approach to lightness treats lightness percepts as the outputs of local image filters applied
directly to the images (Blakeslee and McCourt 2004; Dakin and Bex 2003; Kingdom and Moulden
1988, 1992; Shapiro and Lu 2011). Such approaches typically do not distinguish between per-
ceived lightness (perceived surface reflectance) and brightness (perceived luminance), at least not
explicitly in the construction of the model. Rather, a new image is generated from a set of transfor-
mations applied to the input image. In a strict sense, filter models are not truly lightness models,
since they simply transform one image into another image. Such models are more appropriately
construed as models of brightness than lightness, since there is no explicit attempt to represent
surface reflectance, or distinguish reflectance from luminance. Their relevance to understanding
lightness depends on the extent to which the distinction between brightness and lightness makes
biological or psychological sense for a given image or experimental procedure. Like anchoring
models, filter approaches to lightness do not explicitly segment image luminance into separate
components of reflectance and illumination.
In a related manner, a variety of filling-in models have been proposed that do not explicitly dis-
tinguish lightness and brightness (Grossberg and Mingolla 1985; Paradiso and Nakayama 1991;
Rudd and Arrington 2001). Such models invoke a two stage process: one that responds to the
magnitude and orientation of ‘edges’ (oriented contrast) and/or gradients, and a second process
that propagates information between such localized ‘edge’ responses to generate a fully ‘filled-in’
or interpolated percept of brightness or color.
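Both classes of model start from localized filter responses. The toy 1-D difference-of-Gaussians filter below is a sketch only (published models such as ODOG use oriented, multi-scale, normalized filter banks); it illustrates the kind of ‘edge response’ that filling-in models then propagate into a surface percept.

```python
import math

def dog_response(signal, sigma_c=1.0, sigma_s=3.0, radius=8):
    """1-D difference-of-Gaussians filter: a toy version of the local-filter
    approach to brightness. The output is simply a transformed 'image', not an
    explicit reflectance estimate -- the point made in the text above."""
    def gauss(sigma):
        k = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
        s = sum(k)
        return [v / s for v in k]
    center, surround = gauss(sigma_c), gauss(sigma_s)
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j in range(-radius, radius + 1):
            if 0 <= i + j < len(signal):
                acc += signal[i + j] * (center[j + radius] - surround[j + radius])
        out.append(acc)
    return out

# A luminance step: in the interior of each uniform region the center and
# surround cancel, so the response is ~0; at the step the filter produces a
# biphasic edge response (negative on the dark side, positive on the light side)
# of the kind that filling-in models propagate between edges.
step = [10.0] * 20 + [20.0] * 20
resp = dog_response(step)
```

Because the response is zero within uniform regions, everything the model ‘knows’ about surface lightness is carried by the edge responses, which is why a second, propagation stage is needed to produce a filled-in percept.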
Scission models
The core claim of scission models is that our experience of lightness involves the decomposition of the input into separable causes. One of the difficulties in assessing scission models is that
it is not always clear whether (or when) such separation occurs, or what criteria should be applied to determine whether such decomposition occurs. One can begin by posing a question
of sufficiency: Can scission induce transformations in perceived lightness when it is phenomen-
ally apparent? The most phenomenologically compelling sense of scission occurs in conditions
of transparency, which requires the satisfaction of both geometric and photometric conditions.
One technique for inducing scission involves manipulating the relative depth and photometric
relationships of stereoscopic Kanizsa figures such as those depicted in Figure 22.1. When the grey,
wedge-shaped segments of the Kanizsa figure’s inducing elements in Figure 22.1 are decomposed
into a transparent layer overlying a white disk (second and fourth rows of Figure 22.1), they appear
substantially darker than when the same grey segment appears to overlie a dark disk (first and third
rows of Figure 22.1). Note that the color of the underlying circular inducing element appears to be
Fig. 22.1 Stereoscopic Kanizsa figure demonstrating the role of scission on perceived lightness for
two different grey values. The small pie shaped inducing sectors are the same shade of dark grey
in the top two rows, and the same shade of light grey in the bottom two rows. When the left two
images are cross-fused, or the right two images divergently fused, an illusory diamond is experienced.
Note that the diamonds in the first and third rows appear much lighter than their corresponding
figures in the second and fourth rows.
Adapted from Trends in Cognitive Sciences, 2(6), Richard A Andersen and David C Bradley, Perception of three-
dimensional structure from motion, pp. 222–8, Copyright (1998), with permission from Elsevier.
‘removed’ from the grey wedge-shaped segments and attributed to the more distant layer, which
putatively transforms the perceived lightness of the transparent layer. Note also that the direction of
the lightness transformation depends on which layer observers are asked to report. If observers are
asked to report the color of the far layer underneath the grey sectors of the top image, they report
it as appearing quite dark (nearly black), since this is the color of the interpolated disc. But if they
are asked to report the near layer of the transparent region, they report it as appearing quite light.
In order to provide more conclusive evidence for the effects of scission on perceived light-
ness, I constructed stereoscopic variants of Figure 22.1 using random noise textures. The goal
was to induce transparency in a texture such that the light and dark ‘components’ of the texture
would perceptually segregate into different depth planes. An example is presented in Figure 22.2.
When the left two columns are cross-fused, vivid percepts of inhomogeneous transparency can
be observed: The top image appears as dark clouds overlying light disks, and the bottom appears
as light clouds overlying dark disks. Note that the lightest components of the texture in the top
image appear as portions of the underlying disc in plain view, whereas the same regions in the
bottom image appear as the most opaque regions of the light clouds in the bottom image (and vice
versa for the dark regions). We subsequently showed that similar phenomena could be observed
in non-stereoscopic displays. In these images, scission was induced by embedding targets in sur-
rounds that contain textures that selectively group with either the light or dark ‘components’ of the
textures within the targets (Figure 22.3). As with their stereoscopic analogues, the white and black
chess pieces are actually physically identical (i.e., contain identical patterns of texture). Note that
the luminance variations within the texture of the chess piece figures are experienced as variations
in the opacity of a transparent layer that overlie a uniformly colored surface. The opacity of the
Fig. 22.2 Stereoscopic noise patterns can also be decomposed into layers in ways that induce large
transformations in perceived lightness. If the left two images are cross fused or the right two images
divergently fused, the top image appears to split into a pattern of dark clouds overlying light discs
(top), or light clouds overlying dark disks (bottom). The textures in the top and bottom are physically
identical.
Adapted from Neuron, 24(4), Barton L. Anderson, Stereoscopic Surface Perception, pp. 919–28, Copyright (1999),
with permission from Elsevier.
Fig. 22.3 Scission can also be induced by selectively grouping the light and dark components of the texture of the targets (chess pieces) with the surround. The textures within the chess pieces in the top and bottom images are identical, but appear as dark clouds overlying light chess pieces on the top, and light clouds overlying dark chess pieces on the bottom.
Reprinted by permission from Macmillan Publishers Ltd: Nature, 434, Barton L. Anderson and Jonathan Winawer,
Image segmentation and lightness perception, pp. 79–83, doi: 10.1038/nature03271 Copyright © 2005, Nature
Publishing Group.
transparent surface is greatest for luminance values that most closely match the surround along the borders of the chess pieces (dark on top, light on the bottom), and least opaque for luminance values that are most different from the surround (light on top, dark on the bottom). Note that
the lightest regions within the targets on the dark surround appear in plain view, and the darkest
regions within the targets appear in plain view on the light surround. This bias is evident for essen-
tially all ranges of target luminance tested, although this perceptual fact is in no way mandated by
the physics of transparency, particularly for underlying surfaces that do not appear black or white.
These phenomena demonstrate that scission can induce striking transformations in perceived
lightness in conditions of transparency, but it does not address the broader question of whether
scission plays a role in generating our experience of lightness in conditions that do not generate
explicit percepts of multiple layers or transparency.
Equivalent illuminant models
Equivalent illuminant models (EIMs) also assert that the perception of surface color and lightness is derived by decomposing
the image into estimates of the illuminant and surface reflectance. The evidence in support of this
model is, however, phenomenologically indirect. Work from Brainard’s and Maloney’s labs has
demonstrated that the parametric structure of a variety of matching data can be explained with a
two-stage model in which the first stage involves an estimation of the illuminant (an ‘equivalent
illuminant’), which is then used to derive observers’ reflectance matches from the input images
(Brainard and Maloney, 2011).
Unlike scission models or EIMs, anchoring theory asserts that lightness is derived without decomposing the images into an explicit representation of illumination and reflectance.
The central premise of anchoring theory is that the visual system solves the ambiguity of lightness
by treating a particular relative luminance as a fixed (anchor) point on the lightness scale (namely, the highest luminance as white), independent of the level of illumination or absolute luminance values in a scene. To test this claim, we constructed both paper Mondrians displayed in an
otherwise uniformly black laboratory, and simulated Mondrians displayed on a CRT in a dark
black lab room (Anderson et al. 2008; Anderson et al. 2014). In all cases, the highest luminance
in the room was the central target patch of the Mondrian display. We varied both the reflectance
range and illumination level of the former (i.e., the paper Mondrians), and the simulated reflectance range and simulated illuminant levels of the latter (the simulated Mondrians). For restricted reflectance
ranges (3:1 or less), we found that the highest luminance could vary in perceived lightness as a
function of illumination. For our simulated illuminants and Mondrian displays, observers’ light-
ness matches (expressed as a percentage of reflectance) were a logarithmic function of (simulated)
illuminant, rather than an invariant ‘white’ as predicted by anchoring theory. These results suggest
that the apparent ‘anchoring’ of luminance to ‘white’ is a consequence of the particular experimen-
tal conditions that have been used to assess this model, rather than reflecting an invariant ‘anchor
point’ used to scale other lightness values.
Recent data have provided strong evidence against an explicit illumination estimation model and, more generally, against any model that relies on luminance ratios to compute perceived lightness (such as anchoring theory). Radonjić et al. (2011) conducted experiments with checkerboard stimuli presented on a display capable of an extremely large dynamic range, and found that observers mapped a very high dynamic range (~10,000:1) onto an extended lightness range of 100:1, which spanned from ‘white’ to ‘dark black’ (the darkest values were obtained using
glossy papers). Such behavior would not be expected for any model that attempts to infer a phys-
ically realizable illuminant, or any realizable reflectance ratios of real surfaces, as embraced by
anchoring theory or the EIM.
One common assumption of anchoring theory and the EIM is that the visual system expli-
citly attempts to extract an estimate of lightness that corresponds to the physical dimension of
surface albedo. The results of Radonjić et al. (2011) provide compelling evidence against this
view. Just as our experience of transparency may not have any direct correspondence to the
physical dimensions that modulate perceived transparency (such as transmittance), the per-
ception of lightness may not represent an approximation of the physical dimension of surface
albedo. The results of Radonjić et al. provide evidence that directly challenges any attempt to
interpret the visual response as a ‘best guess’ as to the environmental sources that produced
their stimuli, since there is no combination of surface reflectance and illuminant that can
produce such stimuli (at least in a common illuminant). I will return to this general point in
the general discussion below.
3 Gloss
The experience of gloss is another aspect of our experience of surface reflectance that has received a
growing amount of experimental attention. Whereas the concept of surface lightness has been cast
as the problem of understanding how we experience the diffuse reflectance of a surface, the percep-
tion of gloss is typically cast as the problem of understanding how we experience the specular ‘com-
ponent’ of reflectance. From a generative point of view, the diffuse and specular ‘components’ of
reflectance are treated as computationally separable. So construed, the problem of gloss perception
involves understanding how the visual system segments the image structure generated by specular
reflectance from diffuse reflectance (and all other sources of image structure).
The apparent intractability of this problem has inspired attempts to find computational
short-cuts to avoid the complexity of this decomposition problem. One approach asserts that the
visual system uses simple image statistics that do not require any explicit decomposition of the
images into distinct components of reflectance to derive our experience of gloss. Motoyoshi et al. (2007) argued that perceived gloss was well predicted by an image’s histogram skew or sub-band skew: measures of the asymmetry of the pixel histogram or of the responses of center-surround filters, respectively. This claim was evaluated for a class of stucco surfaces with a statistically fixed level of surface relief that were viewed in a fixed illumination field. In these conditions, glossy surfaces generated images with a strong positive skew, whereas matte surfaces generated images with negative skew.
The attractive feature of this kind of model is that it potentially reduces a complex mid-level vision
problem into a comparatively simple problem of detecting low-level image properties.
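The histogram statistic itself is easy to state. The sketch below computes ordinary sample skewness of a list of pixel values; sub-band skew, which is computed on filter responses rather than raw pixels, is omitted, and the toy ‘image’ is illustrative rather than one of the stucco stimuli.

```python
def pixel_skew(pixels):
    """Sample skewness of a luminance histogram: the simple statistic that
    Motoyoshi et al. (2007) linked to perceived gloss (positive skew for the
    glossy stucco surfaces in their stimulus set)."""
    n = len(pixels)
    mean = sum(pixels) / n
    m2 = sum((p - mean) ** 2 for p in pixels) / n  # second central moment
    m3 = sum((p - mean) ** 3 for p in pixels) / n  # third central moment
    return m3 / m2 ** 1.5

# A mostly mid-grey 'image' with a few very bright specular pixels has a
# long positive tail, and hence a strongly positive skew:
glossy_like = [0.3] * 95 + [0.95] * 5
assert pixel_skew(glossy_like) > 0
```

The appeal of the statistic is that it requires no segmentation at all; the results discussed next show why that is also its weakness.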
However, subsequent work has shown that our experience of gloss cannot be understood so
easily (Anderson and Kim 2009; Kim and Anderson 2010; Kim et al. 2011; Marlow et al. 2011;
Olkkonen and Brainard 2010, 2011). One of the main problems with the proposed image statistics
is that they fail to take into account the kind of image structure that predicts when gloss will or
Fig. 22.4 The perception of gloss depends critically on highlights appearing in the ‘right places’ of a surface’s diffuse shading profile. In A, the highlights appear near the luminance maxima of the diffuse shading profile and have similar orientations, and the surface appears relatively glossy. In B, the highlights have been rotated so that they appear with random positions and orientations relative to the diffuse shading profile, and do not appear glossy.
Reproduced from Barton L. Anderson and Juno Kim, Image statistics do not explain the perception of gloss and
lightness, Journal of Vision, 9(11), pp. 1–17, figure 3, doi: 10.1167/9.11.10 © 2009, Association for Research in
Vision and Ophthalmology.
won’t be perceived. Specular highlights, and specular reflections more generally, must appear in the
‘right places’ on surfaces to elicit a percept of gloss (see Figure 22.4). From a physical perspective,
specular highlights cling to regions of high surface curvature. Perceptually, highlights must likewise appear in specific places and have orientations consistent with surface shading for a surface to appear glossy, a geometric constraint that is not captured by histogram or sub-band skew.
Although these results suggest that the visual system in some sense ‘understands’ the physics of
specular reflection, there are other findings that reveal that the extent of any such understanding
is limited. The perception of gloss has been shown to interact with a surface’s 3D shape and its
lighting conditions, which are physically independent sources of image variability (Ho et al. 2008;
Marlow et al. 2012; Olkkonen and Brainard 2011). These interactions have been observed by a var-
iety of authors and have resisted explanation. Indeed, these interactions are difficult to understand
from a physical perspective, since gloss and 3D shape are independent sources of image structure.
However, we recently presented evidence that these interactions can be understood as a conse-
quence of a simple set of image cues that the visual system uses to generate our experience of gloss,
which are only roughly correlated with a surface’s physical gloss level (Marlow et al. 2012). Some of
the intuition shaping this theoretical proposal can be gained by considering the surfaces depicted
in Figure 22.5. All of the surfaces in these images have the same physical gloss level, yet appear
to vary appreciably in perceived gloss. Each column contains surfaces with a common degree of relief, and each row contains images that were placed in an illumination field with the same direction of the primary light sources (frontal or oblique illumination). We varied the structure of the light field, the direction of the
primary light sources, and 3D surface relief. Observers performed paired comparison judgments
of the perceived gloss of all surfaces, where they chose which of a pair of surfaces was perceived as
glossier. The data revealed complex interactions between the light field and surface shape on gloss
judgments. As can be seen in Figure 22.6, the variation of the illumination field and shape had a sig-
nificant impact on the sharpness, size, and contrast of specular highlights in these images. We rea-
soned that if observers were basing their gloss judgments on these cues, then it should be possible
[Fig. 22.6 panels: perceived coverage, depth, contrast, and sharpness of specular reflections, image skew, and gloss judgements (with the weighted-average model fit), each plotted against relief height for the Grace (frontal), Grace (oblique), and Grove (oblique) illumination conditions; cue weights of 16 per cent, 20 per cent, and 0 per cent are shown beside the model.]
Fig. 22.6 Data and model fits for the experiments we performed on the interactions between perceived
gloss, 3D shape (as captured by a measure of surface relief), and the illumination field. The stimuli were
viewed either with or without stereoscopic depth (the ‘disparity’ and ‘no disparity’ conditions respectively).
The different colored curves in each graph correspond to a different illumination direction of a particular
illumination field (called ‘Grace’). The gloss judgments are in the two top right panels. The panels on the
left represent the judgments of a separate group of observers of four different cues to gloss: the depth,
coverage, contrast, and sharpness of specular reflections. The panel labeled ‘skew’ was computed directly
from images. The dotted lines in the two graphs on the top right correspond to the best fitting linear
combination of the cues on the left, which account for 94 per cent of the variance of gloss judgments. The
weights are denoted in the boxes adjacent to the small arrows in the center of the graphs.
Reprinted from Current Biology, 22 (20), Phillip J. Marlow, Juno Kim, and Barton L. Anderson, The Perception and
Misperception of Specular Surface Reflectance, pp. 1909–13, figure 3, Copyright (2012), with permission from Elsevier.
to model observers’ gloss judgments with a weighted combination of these image cues. However,
there is currently no known method for computing these cues directly from images. We therefore
had independent sets of observers judge each of these cues, and tested whether it was possible to
predict gloss judgments with a weighted sum of these cues. We found that a simple weighted sum
model was capable of predicting over 94 per cent of the variance of the other observers’ gloss judg-
ments. Thus, although surfaces with the same physical gloss level can appear to vary significantly in perceived gloss, these effects can be understood with a set of relatively simple,
albeit imperfect, ‘cues’ that the visual system uses to generate our experience of gloss.
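The cue-combination analysis amounts to ordinary least-squares regression of gloss judgments on the cue ratings. A minimal sketch follows; the toy data and variable names are illustrative, not the published dataset or fitting code.

```python
def fit_weighted_sum(cues, gloss):
    """Least-squares weights for predicting gloss judgments from cue ratings.

    cues : one row per stimulus, each a list of cue values
           (e.g. coverage, contrast, sharpness of highlights)
    gloss: one gloss judgment per stimulus
    Solves the normal equations (X^T X) w = X^T y by Gaussian elimination.
    """
    n = len(cues[0])
    xtx = [[sum(r[i] * r[j] for r in cues) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * g for r, g in zip(cues, gloss)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(xtx[r][col]))
        xtx[col], xtx[piv] = xtx[piv], xtx[col]
        xty[col], xty[piv] = xty[piv], xty[col]
        for r in range(col + 1, n):
            f = xtx[r][col] / xtx[col][col]
            for c in range(col, n):
                xtx[r][c] -= f * xtx[col][c]
            xty[r] -= f * xty[col]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (xty[r] - sum(xtx[r][c] * w[c] for c in range(r + 1, n))) / xtx[r][r]
    return w

# Toy data: four 'stimuli' rated on two cues, with gloss generated as
# 2*cue1 + 3*cue2; the fit recovers those weights exactly.
weights = fit_weighted_sum([[1, 0], [0, 1], [1, 1], [2, 1]], [2.0, 3.0, 5.0, 7.0])
# -> weights approximately [2.0, 3.0]
```

In the actual experiments the predictors were other observers’ ratings of each cue rather than image-computable quantities, which is precisely the limitation noted in the text.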
model of transmittance. We argued that one of the main reasons for this failure was that Metelli’s
model is based on a ratio of luminance differences, which are not available to a visual system that
transforms retinal luminance into local contrast signals. We showed that our matching data were
well predicted by a model in which observers matched contrast ratios, rather than luminance dif-
ference ratios. One of the key points of our model was to define transmittance in a way that was
consistent with the intrinsic coding properties of the visual system, even if this results in a failure to compute a physically accurate measure of surface opacity. This physiologically motivated approach has also been pursued by Vladusich, who proposed an alternative model of our transmittance matching data (Vladusich 2013). He shows that the data can be captured with a modified version of Metelli’s model in which log luminance values are used instead of luminance values (Vladusich, submitted). Like our model, the choice to use log luminance values cannot be derived from the physics of transparent surfaces; it is derived from intrinsic response properties of the visual system.
The different theories of lightness perception are even more contentious and diverse than those
found in the transparency literature. One of the basic issues involves the distinction between lightness and brightness: lightness is defined as the perception of diffuse (achromatic) surface reflectance, whereas brightness is defined as the perception of image luminance.
The presumption is that these physical distinctions have psychological meaning. But this is far from
self-evident. The majority of work on lightness has used 2D (flat) matte displays of surfaces with
uniform albedos, for which the distinction between lightness and brightness is arguably least valid
(or meaningful) perceptually. For some experimental conditions, observers’ matching data will dif-
fer substantially if instructed to match either brightness or lightness. But in others, a difference in
instructions may make little or no difference. Consider, for example, the problem of matching the
‘brightness’ versus the ‘lightness’ of the checker-shadow illusion. A given patch appears a particular
shade of grey, and there is no evidence that observers can distinguish its brightness and lightness. In support of this view, we found that perceived lightness increased as a function of luminance in both simulated and ‘real’ Mondrian displays. Moreover, the data of Radonjić et al.
(2011) demonstrate that observers will readily map a physically unrealizable set of luminances, spanning four orders of magnitude, onto a lightness scale two orders smaller. These results are impossible
to reconcile with models that treat the problem of lightness as a recovery problem, since the range
of reflectances in a natural scene can only span a range of ~30:1.
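The scale of this compression can be made concrete with a small sketch. The logarithmic mapping below is purely illustrative (it is not the model fitted to the Radonjić et al. data, and all parameter values are hypothetical); it merely shows how luminances spanning four orders of magnitude can be squeezed onto a reflectance-like scale spanning only ~30:1:

```python
import math

def log_compress(luminance, lum_min, lum_max, refl_min=0.03, refl_max=0.9):
    """Map luminances in [lum_min, lum_max] onto a reflectance-like scale
    spanning ~30:1 (0.03..0.9), linearly in log-log coordinates.
    Illustrative only; not the model of Radonjic et al. (2011)."""
    t = (math.log10(luminance) - math.log10(lum_min)) / \
        (math.log10(lum_max) - math.log10(lum_min))  # position in log range, 0..1
    return refl_min * (refl_max / refl_min) ** t

# Luminances spanning 4 orders of magnitude (arbitrary units).
lums = [1.0, 10.0, 100.0, 1000.0, 10000.0]
matches = [log_compress(L, 1.0, 10000.0) for L in lums]
print(matches)  # all values fall in 0.03..0.9, a 30:1 range
```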
In the perception of gloss, we found that observers’ experience of gloss can be well predicted by
a set of simple cues that are only imperfectly correlated with the physical gloss of a surface. Gloss
is not defined with respect to some physically specified dimension of surface optics, but with
respect to a set of cues the visual system uses as a proxy for an objectively defined surface property.
What general understanding can be gleaned from these patterns of results? All of these results
reveal the insufficiency of attempting to identify psychological dimensions of our experience with
physical sources of image variability. The fact that we have a particular experience of lightness,
gloss, or transparency does not imply that the dimensions of our experience map onto a particu-
lar physical dimension and/or its parameterization. The general argument used to justify ‘natural
tasks’ takes the generic form that ‘getting an environmental property right increases adaptive fit-
ness.’ The presumed identification of fitness with veridical perception is actually fallacious (see
Hoffman 2009; cf. Lewontin 1996), but even if such views were accepted, they are incapable of
distinguishing perceptual abilities that were actually shaped by natural selection from the ‘span-
drels’ that came along for the evolutionary ride. The fact that human observers will readily map
an ecologically unobtainable range of luminance values (in a single illuminant) onto lightness
estimates suggests that lightness may be one example of a perceptual spandrel. Although human
observers can usually distinguish reflectance differences from other sources of image variation, the
The perceptual representation of transparency, lightness, and gloss 481
perception of absolute lightness may simply be the result of low-level processes of adaptation that
allow the visual system to encode a particular range of luminance values. Indeed, I am aware of
no compelling evidence or argument about why lightness constancy per se provided an adaptive
advantage, or is something that the visual system is explicitly ‘designed’ to compute. A similar
argument holds for the perception of transparency and gloss. We can readily distinguish between
surfaces or media that transmit light from those that do not, or distinguish between surfaces that
reflect light specularly from those that do not. But the data also suggest that we do not scale these
dimensions in a way that is physically correct for any of these properties.
Although it is difficult to craft a compelling argument for the specific adaptive utility of developing
a physically accurate model of lightness, gloss, and transparency, the fact that we experience
these different sources of variability as different underlying causes implies that the visual system is
capable of at least qualitatively distinguishing different sources of image structure. This ‘source
segmentation’ is arguably one of the most important general properties of our visual system. The
visual system may, in fact, be quite poor in estimating lightness in arbitrary contexts, but it is
nonetheless typically quite good at distinguishing image structure generated by lightness differ-
ences from illumination changes, or variations in the opacity of a transparent surface, or from
specular reflections. The identification of specular reflections as specular reflections depends on
their compatibility with diffuse surface shading and 3D surface geometry, and is modulated by the
structure, intensity, and distribution of image structure so identified, even if it does not accurately
capture the ‘true’ gloss level of a surface. And although the physical transmittance (or opacity) of a
surface does not vary as a function of its albedo or color, the psychological analog of opacity—its
‘hiding power’—will for a visual system that uses contrast to determine the visibility of image
structure. The visual system may not determine the ‘true’ opacity of a surface, but nonetheless
is effective at performing a segmentation that captures the presence or absence of transmissive
surfaces and media.
References
Adelson, E. H. (1999). ‘Lightness perception and lightness illusions’. In The new cognitive neurosciences, 2nd
ed., pp. 339–51. (Cambridge, MA: MIT Press).
Anderson, B. L. (1997). ‘A theory of illusory lightness and transparency in monocular and binocular
images: the role of contour junctions’. Perception 26(4): 419–53.
482 Anderson
Kingdom, F. A. (2011). ‘Lightness, brightness and transparency: a quarter century of new ideas,
captivating demonstrations and unrelenting controversy’. Vision Res 51(7): 652–73. doi: 10.1016/j.
visres.2010.09.012.
Kingdom, F., and Moulden, B. (1988). ‘Border effects on brightness: a review of findings, models and
issues’. Spat Vis 3(4): 225–62.
Kingdom, F., and Moulden, B. (1992). ‘A multi-channel approach to brightness coding’. Vision Res
32(8): 1565–82.
Lewontin, R.C. (1996). ‘Evolution as Engineering’. In Integrative Approaches to Molecular Biology, edited by
J. Collado et. al. (Cambridge, MA: MIT Press).
Marlow, P., Kim, J., and Anderson, B. L. (2011). ‘The role of brightness and orientation congruence in the
perception of surface gloss’. Journal of Vision 11(9): 1–12. doi: 10.1167/11.9.16
Marlow, P. J., Kim, J., and Anderson, B. L. (2012). ‘The perception and misperception of specular surface
reflectance’. Curr Biol 22(20): 1909–13. doi: 10.1016/j.cub.2012.08.009.
Metelli, F. (1970). ‘An algebraic development of the theory of perceptual transparency’. Ergonomics
13: 59–66.
Metelli, F. (1974a). ‘Achromatic color conditions in the perception of transparency’. In Perception: Essays in
honor of J.J. Gibson, edited by R. B. MacLeod and H. L. Pick, pp. 95–116. (Ithaca, NY: Cornell University
Press).
Metelli, F. (1974b). ‘The perception of transparency’. Scientific American 230: 90–8.
Metelli, F. (1985). ‘Stimulation and perception of transparency’. Psychol Res 47(4): 185–202.
Motoyoshi, I., Nishida, S., Sharan, L., and Adelson, E. H. (2007). ‘Image statistics and the perception of
surface qualities’. Nature 447(7141): 206–9. doi: 10.1038/nature05724.
Olkkonen, M., and Brainard, D. H. (2010). ‘Perceived glossiness and lightness under real-world
illumination’. Journal of Vision 10(9): 5. doi: 10.1167/10.9.5.
Olkkonen, M., and Brainard, D. H. (2011). ‘Joint effects of illumination geometry and object shape in the
perception of surface reflectance’. Iperception 2(9): 1014–34. doi: 10.1068/i0480.
Paradiso, M. A., and Nakayama, K. (1991). ‘Brightness perception and filling-in’. Vision Res
31(7–8): 1221–36.
Radonjić, A., Allred, S. R., Gilchrist, A. L., and Brainard, D. H. (2011). ‘The dynamic range of human
lightness perception’. Curr Biol 21(22): 1931–6. doi: 10.1016/j.cub.2011.10.013.
Rudd, M. E., and Arrington, K. F. (2001). ‘Darkness filling-in: a neural model of darkness induction’. Vision
Res 41(27): 3649–62.
Shapiro, A., and Lu, Z. L. (2011). ‘Relative brightness in natural images can be accounted for by removing
blurry content’. Psychol Sci 22(11): 1452–9. doi: 10.1177/0956797611417453.
Singh, M., and Anderson, B. L. (2002). ‘Toward a perceptual theory of transparency’. Psychological Review
109(3): 492–519. doi: 10.1037//0033-295X.109.3.492.
Vladusich, T. (2013). ‘Gamut relativity: A new computational approach to brightness and lightness
perception’. Journal of Vision 13(1): 1–21. doi: 10.1167/13.1.14.
Wallach, H. (1948) ‘Brightness constancy and the nature of achromatic colors’. Journal of Experimental
Psychology 38: 310–24.
Section 6
1 Note that the terms apparent/real motion may refer to the stimulus or to the percept generated by the stimu-
lus, depending on the context. Stroboscopic motion and sampled motion are synonymous terms for apparent
motion; the former derived from the equipment used to generate it (a stroboscope), while the latter term
highlights its relation to real motion (see Section Motion detection as orientation detection in space-time).
Apparent Motion and Reference Frames 489
Since this early work, there have been a large number of studies investigating systematically
the dependence of motion perception on a broader range of stimulus parameters. Around the
1980s, the focus of research shifted from explaining the complex phenomenology of motion to
the more basic question of how we detect motion. Several computational models have been pro-
posed and were eventually united under a broad umbrella. In The Computational Basis of Motion
Detection we briefly review these models after which we will return to the main theme of our
chapter, namely phenomenal and organizational aspects of motion.
Fig. 23.1 (a) The trajectory of a stimulus moving with a constant speed can be described as an
oriented line in a space–time diagram. (b) Apparent motion stimulus is a sampled version of
continuous motion. (c) A motion detector samples the input at two spatial locations and carries out
a delay-and-compare operation. (d) The denser sampling in space–time yields an oriented receptive
field for the motion detector. This detector will become maximally active when the space–time
orientation of the motion stimulus matches the orientation of its receptive field.
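The delay-and-compare scheme in panel (c) can be sketched in a few lines. The following is a minimal, hypothetical Reichardt-type correlator; the stimulus format, sampled positions, and delay are all choices made here for illustration, not a specific published implementation:

```python
def reichardt_response(stimulus, x1, x2, delay):
    """Minimal delay-and-compare (Reichardt-type) motion detector sketch.
    stimulus: 2D list stimulus[t][x] of luminance values.
    Correlates the delayed signal at x1 with the current signal at x2
    (preferred direction x1 -> x2) and subtracts the mirror-symmetric
    subunit, yielding an opponent, direction-selective output."""
    resp = 0.0
    for t in range(delay, len(stimulus)):
        resp += stimulus[t - delay][x1] * stimulus[t][x2]   # preferred direction
        resp -= stimulus[t - delay][x2] * stimulus[t][x1]   # null direction
    return resp

# A bright dot stepping rightward one position per time step: an oriented
# line in the space-time diagram of Fig. 23.1(a).
T, X = 6, 6
rightward = [[1.0 if x == t else 0.0 for x in range(X)] for t in range(T)]
leftward  = [[1.0 if x == X - 1 - t else 0.0 for x in range(X)] for t in range(T)]

# Detector tuned to x=2 -> x=3 with a one-step delay.
print(reichardt_response(rightward, 2, 3, 1))  # positive for its preferred direction
print(reichardt_response(leftward,  2, 3, 1))  # negative for the opposite direction
```

The subtraction of the two subunits is what makes the output signed: a stimulus whose space–time orientation matches the detector drives it positively, the opposite orientation negatively.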
(for a review see Albright and Stoner 1995). These areas are located in the dorsal stream as opposed
to the form-related areas located in the ventral stream. In sum, there is broad evidence that
different systems are dedicated to the processing of motion and form, and that motion constitutes
an independent perceptual dimension. However, there is also evidence that
these systems are not strictly independent, but rather interact.
Fig. 23.2 (a) A simple Ternus–Pikler display. (b) An apparent motion stimulus with two different
shapes. (c) The influence of shape is strong in correspondence matching when there is overlap
between stimuli (left) and becomes weaker as the overlap is eliminated (right). (d) A stimulus
configuration used by Ternus to investigate the relationship between local motion matches and
global shape configurations.
principles can be applied to stimuli in motion. The fundamental question he posed was what he
termed the problem of phenomenal identity: ‘Experience consists far less in haphazard multiplicity
than in the temporal sequence of self-identical objects. We see a moving object, and we say that
‘this object moves’ even though our retinal images are changing at each instant of time and for
each place it occupies in space. Phenomenally the object retains its identity’ (Ternus 1926). He
adopted a stimulus previously used by Pikler (1917), shown in Figure 23.2(A).
The first frame of this stimulus contains three identical elements. In the second frame, these elements
are displaced so that some of them overlap spatially with the elements in the previous frame. In the
example of Figure 23.2(A), the three discs are shifted by one interdisc distance so that two of the discs
overlap across the two frames. Given that all elements in the two frames are identical, one can then
ask how the elements will be grouped across the two frames. This question was later termed the ‘motion
correspondence’ problem. If we consider the central disc in frame 2 (Figure 23.2A), will this disc be grouped
with the rightmost disc of the first frame based on their common absolute spatial location, i.e. the same
retinal position, or will it be grouped with the central disc of frame 1 based on their relative position as
the central elements of spatial groups of three elements? The answer to this question turned out to be
quite complex, with several variables influencing the outcome. For example, when the ISI between the
two frames is short, the leftmost element in the first frame appears to move to the rightmost element
in the second frame while the spatially overlapping elements in the centre appear stationary (i.e. they
are grouped together). For longer ISIs, a completely different organization emerges: the three elements
appear to move in tandem as a group, i.e. their relative spatial organization prevails in the spatiotempo-
ral organization. These two distinct percepts are called element and group motion, respectively. Many
other variables, such as interelement separation, element size, spatial frequency, contrast, ISI, lumi-
nance, frame duration, eccentricity, and attention influence which specific organization emerges as
the prevailing percept (e.g. Pantle and Picciano 1976; Pantle and Petersik 1980; Breitmeyer and Ritter
492 ÖĞMEN AND HERZOG
1986a, 1986b; Casco and Spinelli 1988; Dawson et al. 1994; He and Ooi 1999; Alais and Lorenceau
2002; Ma-Wyatt et al. 2005; Aydin et al. 2011; Hein and Moore 2012). Like many other Gestalt grouping
phenomena, spatiotemporal grouping is governed by multivariate complex processes (see the demos
TP Feature Bias, TP Element Motion, TP Group Motion, TP Complex Configuration Long ISI, TP
Complex Configuration Short ISI).
Form–Motion Interactions
How local form information influences the perception of motion
The apparent motion stimulus lends itself nicely to the study of form–motion interactions (for other
examples of form motion interactions see Blair et al., this volume). Remember that Zeno claimed
that motion is an illusion created by the observer in order to reconcile the existence of an object
at two different spatial locations at two different instants of time. The observer would compare the
two stimuli from memory and if a suitable match is found a phenomenal identity will be attributed
to these two stimuli as two instances of the same object. Perceived motion from one object to the
other would signal the conclusion that these two objects are one and the same. Thus, according to
this view, form analysis is a precursor of motion perception and the match of the form of the two
objects is a prerequisite for motion perception. This can be tested directly by creating an apparent
motion stimulus in which the shapes presented in the two frames are different (Figure 23.2(B); see also
the demo ‘AM—different shapes’). Many such experiments have been carried out, showing that
form has little effect on the perception of apparent motion, i.e. motion percepts between the two
stimuli are strong (Kolers 1972). In the example of Figure 23.2(B), one perceives the square morphing
into a circle along the path of apparent motion. That the shape of an object in apparent motion
should remain constant can, in general, be expected to hold only for small displacements. This is
because the proximal stimulus is a two-dimensional projection of a three-dimensional object, and
during motion one experiences perspective changes resulting in different views of the object. It is
this very fact that Ternus used in defining the problem of phenomenal identity.
In the case of the example shown in Figure 23.2(B) there is no motion ambiguity and the interpre-
tation of an object whose form changes (presumably due to perspective change) appears to be a nat-
ural solution. What happens, however, if the correspondences in the display are more complex and
represent ambiguities such as the ones shown in Figure 23.2(C)? Results indicate that form informa-
tion (or in general feature information such as colour or texture) can be used to resolve ambigui-
ties in the case where there is physical overlap between elements of the two frames (Ternus–Pikler
displays; see for example the demo ‘TP—feature bias’) but this influence becomes weaker when the
overlap is reduced and the distance between the elements is increased (Hein and Cavanagh 2012).
Taken together, all these results indicate that motion and form are separate but interacting systems.
Fig. 23.3 (a) Two stimulus configurations studied by Duncker. The top diagrams represent the
stimuli and the bottom ones depict the corresponding percepts. Left panels: induced motion.
Right panels: rolling wheel illusion. (b) An example illustrating Johansson’s vector decomposition
principles: a, the stimulus; b, the decomposition of the motion of the central dot so as to identify
common vector components for all three dots; c, the resulting percept.
appears to be that of a single object rotating 180 degrees in three dimensions (Ternus 1926). Note
that in these complex displays, multiple possible correspondences of motion exist (e.g. Dawson and
Wright 1994; Otto et al. 2008) and the percept may vary from subject to subject, or even from trial
to trial for the same subject. The reader can experiment with the demo ‘TP complex configuration’.
Having established that form and motion information interact, the next question is to under-
stand how. Combining signals from form and motion systems requires a common basis upon
which they can be expressed. In other words, what is the reference frame that allows interac-
tions between these two systems? We will proceed first by discussing reference frames within the
motion system and then by extending these reference frames to form computations.
Reference Frames
Relativity of motion and reference frames
The work of Gestalt psychologist Karl Duncker was instrumental in highlighting the importance
of reference frames in perception (Duncker 1929; for review see Wallach 1959; Mack, 1986). In
one of his experiments, he presented a small stimulus embedded in a larger one (Figure 23.3A,
left panel). He moved the large surrounding stimulus while keeping the smaller one stationary.
Observers perceived the smaller stimulus as moving in the direction opposite to the physical
motion of the surrounding stimulus (for a recent paper with demos see Anstis and Casco 2006).
To account for this illusory induced motion, he proposed that the larger surrounding stimulus
served as the reference frame against which the position of the embedded stimulus is computed.
The right panel of Figure 23.3(A) shows another configuration studied by Duncker, the ‘rolling
wheel’. If a light dot stimulus is placed on the rim of a wheel rolling in the dark, the perceived
trajectory of this dot is cycloidal. If a second dot at the centre of the wheel is added to the display,
one perceives the central dot to move in a linear trajectory and the dot on the rim is perceived
to rotate around the central dot. In other words, the central dot serves as a reference against
which the motion of the second dot is computed (for demos on the relativity of motion using the
Ternus–Pikler paradigm, the reader is referred to Boi et al. 2009).
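The rolling-wheel geometry is easy to verify numerically. In the sketch below (unit radius and unit angular speed are assumed for simplicity), the rim dot’s screen trajectory is a cycloid, yet its position relative to the hub stays at a constant distance, i.e. it is simply a circular orbit around the moving centre:

```python
import math

def rim_dot(t, radius=1.0):
    """Retinal (screen) trajectory of a dot on the rim of a wheel rolling
    at unit angular speed: a cycloid."""
    return (radius * (t - math.sin(t)), radius * (1 - math.cos(t)))

def hub(t, radius=1.0):
    """The wheel's centre translates linearly at constant height."""
    return (radius * t, radius)

# Relative to the hub (Duncker's reference frame), the rim dot is simply
# rotating at a constant distance: cycloid minus translation = circle.
for t in [0.0, 1.0, 2.5, 4.0]:
    rx, ry = rim_dot(t)
    hx, hy = hub(t)
    dist = math.hypot(rx - hx, ry - hy)
    print(round(dist, 6))  # always 1.0: a circular orbit around the hub
```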
To explain these effects, Johansson (1973) proposed a theory of vector analysis based on three
principles. The first principle states that elements in motion are always perceptually related to each
other. According to his second principle, simultaneous motions in a series of proximal elements
perceptually connect these elements into rigid perceptual units. Finally, when the motion vectors
of proximal elements can be decomposed to produce equal and simultaneous motion compo-
nents, per the second principle, these components will be perceptually united into the percept of
common motion. Figure 23.3(B) illustrates these concepts. Figure 23.3(B-a) shows the stimulus. By
the first principle, the movements of these dots are not perceived in isolation but are related to
each other. By the second principle, the top and bottom dots are connected together as a single
rigid unit moving together horizontally. By the third principle, a horizontal component equal to
and simultaneous with the horizontal motion of the top and bottom dots is extracted from the
motion of the central dot (Figure 23.3(B-b)). The resulting percept is the horizontal movement
of three dots during which the central dot moves up and down between the two flanking dots
(Figure 23.3(B-c)) (Johansson 1973).
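A minimal sketch of this decomposition for the display of Figure 23.3(b), with hypothetical velocity vectors (flankers moving purely horizontally, the central dot diagonally); the equality test used to define the group’s common motion is a simplification that holds only for this idealized display:

```python
def decompose(central, group):
    """Johansson's third principle, sketched: subtract the common motion
    of the rigid group (here, the two flanking dots) from the central
    dot's motion, leaving its residual motion relative to the group.
    Vectors are (dx, dy) tuples."""
    assert all(g == group[0] for g in group), "flankers must share one motion"
    common = group[0]
    residual = (central[0] - common[0], central[1] - common[1])
    return common, residual

# Flanking dots translate horizontally; the central dot moves diagonally
# (a horizontal component plus a vertical component).
common, residual = decompose(central=(1.0, 1.0), group=[(1.0, 0.0), (1.0, 0.0)])
print(common)    # (1.0, 0.0): the three dots seen moving together horizontally
print(residual)  # (0.0, 1.0): the central dot seen moving vertically between the flankers
```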
In a more natural setting, the distal stimulus generates a complex optic flow pattern on the
retina. For example, while watching a street scene, one perceives the background (shops, houses,
etc.) as stationary, the cars and pedestrians as moving with respect to this stationary background,
and the legs and arms of pedestrians as undergoing periodic motion with respect to their body,
their hands moving with respect to the moving arms, etc. Thus, the stimulus can be analysed as
a hierarchical series of moving reference frames, and motions are perceived with respect to the
appropriate reference frame in the hierarchy (e.g. the hand with respect to the arm, the arm with
respect to the body). While powerful and intuitively appealing, the basic principles of this theory
are not sufficient to specify unambiguously how vectors will be decomposed in complex natu-
ralistic stimuli. In fact, a vector can be expressed as the sum of infinitely many pairs of vectors,
and it is not clear a priori how to predict which combination will prevail for complex stimuli.
The difficulty faced here is similar to the one encountered when we attempt to apply the Gestalt
‘laws’ derived from simple stimuli to complex stimuli. To address this issue, Gestaltists proposed
the ‘law of Prägnanz’ (or the law of good Gestalt) which states that among the different possible
organizations, the one that is the ‘simplest’ is the one that will prevail (Koffka 1935; Cutting and
Proffitt 1982; for a review see van der Helm, this volume). However, the criterion for ‘simplest’
remains arbitrary and elusive. The same concept has been adopted by other researchers who have
tried to quantify the simplicity of organizations. For example, Restle (1979) adopted coding
theory, in which different solutions are expressed as quantifiable ‘codes’. A stimulus undergoing
circular motion can be described by three parameters: amplitude, phase, and wavelength.
Restle used the number of parameters describing a configuration as the ‘information load’ and
predicted that the configuration with the lowest information load would be the preferred (i.e.
perceived) configuration. Dawson (1991) used a neural network to combine three heuristics in
solving the correspondence problem. However, these approaches all suffer from the same general
problems: as acknowledged by Restle, the method does not have an automatic way to generate all
possible interpretations. Moreover, the choice of parametrization and its generality, the heuristics,
their benefits and costs, as well as the optimization criteria remain arbitrary.
2 A similar concept was also proposed by Pylyshyn in his FINST theory (Pylyshyn 1989). Several extensions
and variants of the object file theory have been proposed, including the detailed analysis of object updating
(Moore and Enns 2004; Moore et al. 2007) and hierarchies in object structures (Lin and He 2012).
is conveyed following the optics of the eye. The mechanism of image formation can be described
by projective geometry. Neighbouring points in the environment are imaged on neighbouring
photoreceptors in the retina. The projections from retina to early visual cortical areas preserve
these neighbourhood relationships creating a retinotopic representation of the environment. To
analyse the impact of motion on these representations we need to consider the dynamical proper-
ties of the visual system.
A fundamental dynamical property of vision is visible persistence: Under normal viewing con-
ditions, a briefly presented stationary stimulus remains visible for approximately 120 ms after
its physical offset (e.g. Haber and Standing 1970; Coltheart 1980). Based on this duration of
visible persistence, we would expect moving objects to appear highly blurred. For example, a
target moving at 10 degrees per second would generate a trailing smear of 1.2 degrees. The situ-
ation is similar to taking pictures of moving objects with a film camera at an exposure dura-
tion that mimics visible persistence. Not only do the moving objects exhibit extensive motion
smear, they also have a ghost-like appearance without any significant form information. This is
because static objects remain for long enough on a fixed region of the film to expose the chemi-
cals sufficiently while moving objects expose each part of the film only briefly, thus failing to
provide sufficient exposure to any specific part of the film. Similarly, in retinotopic representa-
tions, a moving object will stimulate each retinotopically localized receptive field briefly, and
incompletely processed form information would spread across the retinotopic space just like the
ghost-like appearances in photographs (Öğmen 2007). Unlike photographic images, however,
in human vision objects in motion typically appear relatively sharp and clear (Ramachandran
et al. 1974; Burr 1980; Burr et al. 1986; Bex et al. 1995; Westerink and Teunissen 1995; Burr and
Morgan 1997; Hammett 1997).
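The expected smear in this argument is simple arithmetic, smear = speed × persistence duration; a minimal sketch:

```python
def smear_extent(speed_deg_per_s, persistence_ms=120.0):
    """Trailing smear (degrees of visual angle) expected if visible
    persistence simply summed the moving stimulus over ~120 ms."""
    return speed_deg_per_s * persistence_ms / 1000.0

print(smear_extent(10.0))  # 1.2 degrees, the example given in the text
print(smear_extent(30.0))  # 3.6 degrees for a faster target
```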
In normal viewing, we tend to track moving stimuli with pursuit eye movements and thereby
stabilize them on the retina. While pursuit eye movements can help reduce the perceived blur of
a moving object (Bedell and Lott 1996), the problem of motion blur remains for other objects
present in the scene, since we can pursue only one object at a time. Eye movements also cause
a retinotopic movement for the stationary background, creating the blur problem for the back-
ground. Furthermore, the initiation of an eye movement can take about 150–200 ms during which
a moving object can generate considerable blur. How does the visual system solve the problems
of motion blur and moving ghosts? A potential solution to the motion blur problem is the use of
mechanisms that inhibit motion smear in retinotopic representations (Öğmen 1993, 2007; Chen
et al. 1995; Purushothaman et al. 1998). A potential solution to the moving ghosts problem is the
use of reference frames that move along with moving objects rather than being anchored in reti-
notopic coordinates (Öğmen 2007).
a hierarchical set of reference frames. These exo-centred reference frames3 establish and maintain
the identity of objects in space and time. As we discuss in the section Non-retinotopic Feature
Attribution, these reference frames can also provide the basis for feature attribution.
3 Reference frames can be broadly classified into two types: ego-centred reference frames are those centred on
the observer (e.g. eye-centred, head-centred, limb-centred); exo-centred reference frames are those centred
outside the observer (e.g. centred on an object in a scene).
Fig. 23.5 The Ternus–Pikler display (a) and the associated percepts of ‘element motion’ (b) and
‘group motion’ (c). The dashed arrows in panels (b) and (c) depict the perceived motion
correspondences between the elements in the two frames. Experimental results for the
Ternus–Pikler stimulus (d) and for the control stimulus (e). Reprinted from Vision Research, 46 (19),
Haluk Öğmen, Thomas U. Otto, and Michael H. Herzog, Perceptual grouping induces non-retinotopic
feature attribution in human vision, pp. 3234–42, Figures 1a–c, 2a, and 2c, Copyright (2006), with
permission from Elsevier.
hand, not all processes are non-retinotopic; motion and tilt adaptation have been found to be
retinotopic (Wenderoth and Wiese 2008; Knapen et al. 2009; Boi et al. 2011a) indicating that they
are by-products of computations occurring prior to the transfer of information from retinotopic
to non-retinotopic representations.
Concluding Remarks
Motion is ubiquitous in the ecological environment and most biological systems devote extensive
neural processing to its analysis. This importance has been recognized by philosophers and sci-
entists who have carried out extensive studies on how motion is processed and perceived. While
there has been convergence in the types of computational models that can detect motion, the
broader issue of how motion is organized as a spatiotemporal Gestalt remains a challenging ques-
tion. The discovery of the relativity of motion led to the introduction of hierarchical reference
frames according to which part–whole relations can be constructed. This chapter has provided a
review of why reference frames are needed from ecological and neurophysiological (retinotopic
organization) perspectives. These analyses show that reference frames are needed not just for
motion computation but for all stimulus attributes. We expect future research to develop in more
depth the properties of these reference frames which will provide a common geometry wherein
all stimulus attributes can be processed jointly.
References
Alais, D. and J. Lorenceau (2002). ‘Perceptual grouping in the Ternus display: Evidence for an ‘association
field’ in apparent motion’. Vision Res 42: 1005–1016.
Albright, T. D. and G. R. Stoner (1995). ‘Visual motion perception’. Proc Natl Acad Sci USA 92: 2433–2440.
Anstis, S. and C. Casco (2006). ‘Induced movement: the flying bluebottle illusion’. J Vision
10(8): 1087–1092.
Aydin, M., M. H. Herzog, and H. Öğmen (2011). ‘Attention modulates spatio-temporal grouping’. Vision
Res 51: 435–446.
Bachmann, T. (1994). Psychophysiology of Visual Masking: the Fine Structure of Conscious Experience
(New York: Nova Science Publishers).
Barlow H. B. and W. R. Levick (1965). ‘The mechanism of directionally selective units in rabbit’s retina’.
J Physiol 178: 477–504.
Bedell, H. E. and L. A. Lott (1996). ‘Suppression of motion-produced smear during smooth-pursuit
eye-movements’. Curr Biol 6: 1032–1034.
Bex, P. J., G. K. Edgar, and A. T. Smith (1995). ‘Sharpening of blurred drifting images’. Vision Res 35: 2539–2546.
Boi, M., H. Öğmen, J. Krummenacher, T. U. Otto, and M. H. Herzog (2009). ‘A (fascinating) litmus test for
human retino- vs. non-retinotopic processing’. J Vision 9(13): 5.1–11; doi: 10.1167/9.13.5.
Boi, M., H. Öğmen, and M. H. Herzog (2011a). ‘Motion and tilt aftereffects occur largely in retinal, not in
object coordinates, in the Ternus–Pikler display’. J Vision 11(3): 7.1–11; doi: 10.1167/11.3.7, 2011.
Boi M., M. Vergeer, H. Öğmen, and M. H. Herzog (2011b). ‘Nonretinotopic exogenous attention’. Curr Biol
21: 1732–1737.
Breitmeyer, B. G. and Öğmen, H. (2006). Visual Masking: Time Slices through Conscious and Unconscious
Vision, 2nd edn (Oxford University Press: Oxford).
Breitmeyer, B. G. and A. Ritter (1986a). ‘The role of visual pattern persistence in bistable stroboscopic
motion’. Vision Res 26: 1801–1806.
Breitmeyer, B. G. and A. Ritter (1986b). ‘Visual persistence and the effect of eccentric viewing, element
size, and frame duration on bistable stroboscopic motion percepts’. Percept Psychophys 39: 275–280.
Knapen T., Rolfs M., and Cavanagh P. (2009). ‘The reference frame of the motion aftereffect is retinotopic’.
J Vision 9(5):16, 1–7.
Koffka, K. (1935). Principles of Gestalt Psychology (New York: Harcourt).
Kolers, P. A. (1972). Aspects of Motion Perception (Oxford: Pergamon Press).
Korte, A. (1915). ‘Kinematoskopische Untersuchungen’. Z Psychol 72: 194–296.
Lin, Z. and S. He (2012). ‘Automatic frame-centered object representation and integration revealed by
iconic memory, visual priming, and backward masking’. J Vision 12(11): pii: 24;
doi: 10.1167/12.11.24
Lu, Z.-L. and G. Sperling (2001). ‘Three-systems theory of human visual motion perception: review and
update’. J Opt Soc Am A 18: 2331–2370.
Ma-Wyatt, A., C. W. G. Clifford, and P. Wenderoth (2005). Contrast configuration influences grouping in
apparent motion. Perception 34: 669–685.
Mack, A. (1986). ‘Perceptual aspects of motion in the frontal plane’. In Handbook of Perception and Human
Performance, edited by K. R. Boff, L. Kaufman, and J. P. Thomas (New York: Wiley), pp. 17-1–17-38.
McDougall, W. (1904). ‘The sensations excited by a single momentary stimulation of the eye’. British Journal
of Psychology, 1: 78–113.
Moore, C. M. and J. T. Enns (2004). ‘Object updating and the flash-lag effect’. Psychol Sci 15: 866–871.
Moore, C. M., J. T. Mordkoff, and J. T. Enns (2007). ‘The path of least persistence: object status mediates
visual updating’. Vision Res 47: 1624–1630.
Neuhaus, W. (1930). ‘Experimentelle Untersuchung der Scheinbewegung’. Arch Gesamte Psychol 75:
315–458.
Nishida, S. (2004). ‘Motion-based analysis of spatial patterns by the human visual system’. Curr Biol
14: 830–839.
Nishida, S., J. Watanabe, I. Kuruki, and T. Tokimoto (2007). ‘Human visual system integrates color signals
along a motion trajectory’. Curr Biol 17: 366–372.
Öğmen, H. (1993). ‘A neural theory of retino-cortical dynamics’. Neural Networks, 6: 245–273.
Öğmen, H. (2007). ‘A theory of moving form perception: Synergy between masking, perceptual grouping,
and motion computation in retinotopic and non-retinotopic representations’. Advances in Cognitive
Psychology, 3: 67–84.
Öğmen, H., T. Otto, and M. H. Herzog (2006). ‘Perceptual grouping induces non-retinotopic feature
attribution in human vision’. Vision Res 46: 3234–3242.
Otto, T. U., H. Öğmen, and M. H. Herzog (2006). ‘The flight path of the phoenix-the visible trace of
invisible elements in human vision’. J Vision 6: 1079–1086.
Otto, T. U., H. Öğmen, and M. H. Herzog (2008). ‘Assessing the microstructure of motion correspondences
with non-retinotopic feature attribution’. J Vision 8(7): 16.1–15; doi: 10.1167/8.7.16.
Otto, T. U., H. Öğmen, and M. H. Herzog (2009). ‘Feature integration across space, time, and orientation’.
J Exp Psychol: Human Percept Perform 35: 1670–1686.
Otto, T. U., H. Öğmen, and M. H. Herzog (2010a). ‘Attention and non-retinotopic feature integration’.
J Vision 10: 8.1–13; doi: 10.1167/10.12.8.
Otto, T. U., H. Öğmen, and M. H. Herzog (2010b). ‘Perceptual learning in a nonretinotopic frame of
reference’. Psychol Sci 21(8): 1058–1063.
Pantle, A. J. and J. T. Petersik (1980). ‘Effects of spatial parameters on the perceptual organization of a
bistable motion display’. Percept Psychophys 27: 307–312.
Pantle, A. and L. Picciano (1976). ‘A multistable movement display: evidence for two separate motion
systems in human vision’. Science 193: 500–502.
Piéron, H. (1935). ‘Le processus du métacontraste’. J Psychol Normale Pathol 32: 1–24.
Pikler, J. (1917). Sinnesphysiologische Untersuchungen (Leipzig: Barth).
Apparent Motion and Reference Frames 503
Purushothaman, G., H. Öğmen, S. Chen, and H. E. Bedell (1998). ‘Motion deblurring in a neural network
model of retino-cortical dynamics’. Vision Res 38: 1827–1842.
Pylyshyn, Z. (1989). ‘The role of location indexes in spatial perception: a sketch of the FINST spatial-index
model’. Cognition 32: 65–97.
Ramachandran, V. S., V. M. Rao, and T. R. Vidyasagar (1974). ‘Sharpness constancy during movement
perception’. Perception 3: 97–98.
Restle, F. (1979). ‘Coding theory of the perception of motion configurations’. Psychol Rev 86: 1–24.
Shimozaki S. S., M. P. Eckstein, and J. P. Thomas (1999). ‘The maintenance of apparent luminance of an
object’. J Exp Psychol: Human Percept Perform 25: 1433–1453.
Ternus, J. (1926). ‘Experimentelle Untersuchung über phänomenale Identität’. Psychol Forsch 7: 81–136.
Wallach, H. (1959). ‘The perception of motion’. Sci Am 201: 56–60.
Wenderoth P. and Wiese M. (2008). ‘Retinotopic encoding of the direction aftereffect’. Vision Research
48:1949–1954.
Wertheimer, M. (1912). ‘Experimentelle Studien uber das Sehen von Bewegung’. Z Psychol 61: 161–265.
Westerink J. H. D. M. and K. Teunissen (1995). ‘Perceived sharpness in complex moving images’. Displays
16: 89–97.
Chapter 24
Perceptual Organization and the Aperture Problem
Nicola Bruno and Marco Bertamini

Introduction: the ambiguity of local motion signals
We live in a world of objects that move. To perceive them, the visual system must use the motion information available in the spatiotemporal structure of the optic array. These motion signals, however, are inherently ambiguous. To perceive moving objects, therefore, the visual system cannot simply record sensory signals. To overcome this ambiguity (underdeterminacy) and to achieve a coherent global interpretation, sensory motion signals must be combined across space and time. In this chapter, we review strategies for performing such combination. We argue that the combination of motion signals cannot be reduced to relatively simple vector operations, such as averaging or intersecting constraints in velocity space, but is instead a complex form of perceptual organization, which dynamically takes into account the spatial structure of the stimulus. To set the stage for our discussion of motion organization, we begin with a brief account of the two main sources of local ambiguity in motion signals: the aperture problem (AP) and the edge classification problem (ECP).
Fig. 24.1 The ambiguity of local motion signals. (a) Consider two contours moving in different
directions relative to the environment (e.g. horizontally and vertically, see black vectors). The
physical motions are the sum of components along the direction of the contour and in the direction
orthogonal to the contour (grey vectors). Because the contour is locally featureless, the component
along the contour cannot be recorded. Thus only the component orthogonal to the contour will
be available and the two physical motions will be indistinguishable (apright.mov, apdown.mov).
(b) In fact, an infinite class of physical motions having different speeds and directions (dashed) will
give rise to the same motion signal (black orthogonal vector). The orientation of the contour
defines a constraint line (CL) in velocity space. (c) An additional ambiguity arises when the contour
is interpreted as the border of a surface. Consider an orthogonal motion signal at a local point on a
contour. The signal could be due to the left surface progressively covering the background (visible to
its right), to a right surface progressively uncovering a background (visible to its left), or to a circular
hole moving over a stationary edge in the opposite direction. (d) Finally, when two borders meet to
form a T-junction, the local motion signal at the junction is along the hat of the T rather than in the
direction orthogonal to the moving contour.
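The ambiguity illustrated in panels (a) and (b) can be made concrete with a few lines of arithmetic. The following sketch is our own illustration (the vector conventions and the 45° orientation are assumed for simplicity, not taken from the original figure): it projects two different physical velocities onto the unit normal of a tilted contour and shows that the locally measurable signal is identical.

```python
import numpy as np

def normal_component(v, theta):
    """Signed speed of a contour (oriented at angle theta, in radians)
    along its unit normal -- the only component that an aperture-limited
    motion detector can measure."""
    n = np.array([np.sin(theta), -np.cos(theta)])  # unit normal to the contour
    return float(np.dot(v, n))

theta = np.pi / 4               # contour tilted 45 degrees
v_right = np.array([1.0, 0.0])  # physical motion: horizontal, rightward
v_down = np.array([0.0, -1.0])  # physical motion: vertical, downward

# Both physical motions yield exactly the same local signal through the aperture:
s1 = normal_component(v_right, theta)
s2 = normal_component(v_down, theta)
print(np.isclose(s1, s2))  # → True
```

Any velocity whose projection onto the normal equals this value satisfies the measurement, which is precisely the constraint line (CL) in velocity space described in panel (b).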
viewpoint in the three-dimensional environment. Other issues include the perception of struc-
ture from motion (see Vezzani et al., this volume), and the analysis of moving edges in shadows,
shading, and highlights. In this chapter, we limit our discussion to organization in 2D and to the
segmentation of the scene into figures and grounds. When applied to this domain, the ECP refers
to the fact that the same local motion signal can be attributed to a leading surface edge (progres-
sively covering a background) or to a trailing edge (progressively revealing a background). This
distinction implies a classification of the edge in relation to the surface that owns it within the glo-
bal segmentation of the scene into figure and ground. In the example of Figure 24.1c, the leading
edge interpretation implies that the left surface is the figure and the edge belongs to it; the trailing
edge interpretation, conversely, implies that the right surface is the figure.
Edge classification in turn has consequences for the organization of local motions in relation
to a hierarchy of frames of reference, a topic that we address later in this chapter. Referring again
to the example, the leading edge interpretation implies that the left surface is moving relative to a
background to its right; the trailing edge interpretation, conversely, that the right surface is mov-
ing relative to a background to its left. Additionally, in both interpretations the edge is moving
relative to a stationary aperture. As an alternative, the edge (either belonging to the left or to the
right surface) could be interpreted as stationary, and the aperture itself could be interpreted as
moving relative to the edge and the two surfaces. Thus the same motion signal can be attributed
to either surface or to neither, depending on which region of the scene is interpreted as figure and
which as ground. Contemporary research has begun to reveal constraints and biases that may play
a role in solving this form of the ECP (Barenholtz and Tarr 2009).
An important aspect of the ECP is related to surface edges that meet other edges to form a
T-junction (Figure 24.1d). In these cases, the motion signal at the junction is not orthogonal to
the contour forming the stem of the T but moves along the contour forming the hat of the T. As we
shall see in Section 3, these local ‘terminator’ motion signals play an important part in the global
perception of the movement of contours, and are themselves weighted differently depending on
their classification as ‘intrinsic’ to the line (true endings of a moving object) or ‘extrinsic’ (acci-
dental alignments due to occlusion).
preference, but in some cases it is parallel to it. In striate and extrastriate areas motion selectiv-
ity is secondary to direction selectivity (Gizzi et al. 1990). By contrast, in temporal areas there is
selectivity for global motion, defined as the motion of a whole pattern. When contours form a
pattern, neurons do not respond to the motion per se, but to the motion of the configuration as a
whole. Finally, several other visual areas are known to receive MT output, including areas coding
complex motions such as expansion and rotation (Tanaka and Saito 1989) and eye movements
(Schall 2000).
Although the functional interpretation of these networks remains the object of empirical inves-
tigation and theoretical debate (see Grossberg and Mingolla 1993; Grossberg 2011), it is clear
that higher-level motion processing in the human brain involves long-range, integrative interac-
tions. These interactions are thus quite consistent with the notion that global motion perception
involves sophisticated processes of organization and interpretation of the local signals to solve the
AP and ECP. In the following sections, we review some of these processes.
Fig. 24.2 Three proposed solutions to the AP in plaid patterns. The intersection of constraints (IOC)
strategy consists in determining the unique vector that is consistent with both constraint lines of
the component motions. The feature tracking (FT) strategy consists in attributing to the global
pattern the motion of identifiable features such as the intersections between the component edges.
The vector average (VA) solution consists in computing the vector lying halfway between the two
components. (a) The IOC and FT strategies always yield the true pattern motion in a plaid, assuming
rigidity. (b) In Type-2 plaids, the VA solution can differ markedly from the IOC or FT solutions.
from the study of Type-2 plaids. Type-2 plaids have both component vectors lying on the same
side of the IOC resultant, such that the VA predictions differ markedly from those of the IOC–FT.
Perceived motion direction in Type-2 plaids has been reported to be biased toward the VA solu-
tion with short presentation times but to approach the IOC solution after a contrast-dependent
time lag (Yo and Wilson 1992). Similar results have been reported in plaids involving second-order
(i.e., texture boundary) motion signals (Wilson and Kim 1994; Cropper et al. 1994).
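The divergence between the VA and IOC predictions in Type-2 plaids can be illustrated numerically. In this sketch (our own illustration; the component angles and speeds are arbitrary choices, not stimulus values from the studies cited above), each component grating contributes one constraint v · nᵢ = sᵢ; the IOC solves the two constraints exactly, whereas the VA simply averages the component normal vectors.

```python
import numpy as np

def ioc(n1, s1, n2, s2):
    """Intersection of constraints: the unique velocity v satisfying
    v . n_i = s_i for both component gratings (n_i = unit normals,
    s_i = normal speeds)."""
    return np.linalg.solve(np.array([n1, n2]), np.array([s1, s2]))

def vector_average(n1, s1, n2, s2):
    """Vector average of the two component (normal) motion vectors."""
    return 0.5 * (s1 * np.asarray(n1) + s2 * np.asarray(n2))

def direction(v):
    """Direction of a velocity vector in degrees."""
    return np.degrees(np.arctan2(v[1], v[0]))

# Component normals 30 degrees apart with unequal speeds: a Type-2 geometry
# in which both component vectors lie on the same side of the IOC resultant.
n1 = np.array([np.cos(np.radians(10)), np.sin(np.radians(10))])
n2 = np.array([np.cos(np.radians(40)), np.sin(np.radians(40))])
v_ioc = ioc(n1, 1.0, n2, 2.0)
v_va = vector_average(n1, 1.0, n2, 2.0)
print(direction(v_ioc), direction(v_va))  # IOC and VA point in markedly different directions
```

With these values the IOC solution points near 76° while the VA points near 30°, so a percept biased toward the VA early on and drifting toward the IOC over time, as reported by Yo and Wilson (1992), is directly measurable in such displays.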
Type-2 plaids have also been used to assess the FT strategy. Alais et al. (1994) adapted partici-
pants to a translating Type-2 plaid (simultaneous adaptation condition) or to its alternately pre-
sented components (alternating adaptation). They found that perceived direction in the motion
after-effect reflected more the VA predictions after alternating adaptation, whereas it reflected
more the IOC–FT prediction after simultaneous adaptation. Because feature motion signals
were available when components were simultaneous, but not when they were alternated, these
results are consistent with a mechanism that retrieves the true plaid motion using FT. Follow
up experiments (Alais et al. 1997) have provided support for this conclusion by demonstrating
that both feature size and feature number modulate the bias in the FT direction. Overall, there-
fore, it seems that two mechanisms are involved in the perception of pattern motion in plaids, an
earlier integration mechanism that employs the VA strategy, and a slower and presumably more
Perceptual Organization and the Aperture Problem 509
global mechanism that employs the FT strategy. The interaction between these two mechanisms
can be captured by models that diffuse motion signals from the local to the global scale by parallel
excitatory connections weighted by distance (Loffler and Orbach 2003) or by motion-based pre-
dictive coding (Perrinet and Masson 2012).
Fig. 24.3 (a) The perceived direction of a translating grating depends on the shape of the
surrounding frame (barber-pole.mov). Suppose that for all gratings true motion is horizontal and
to the right (central grey vector). The grating within the circular frame will appear to move diagonally
in the direction orthogonal to the orientation of the contour. The grating within the vertical frame,
vertically downwards. That within the horizontal frame, horizontally and to the right. The grating
within the square will alternate between vertical and horizontal motion. The grating within the
narrower bent frame, finally, will appear to change direction as the aperture changes orientation
(perceived motions are represented by black vectors). (b) If a diamond shape is translated behind
three vertical bars without revealing the corners, each visible segment actually moves vertically as
shown on the left. These vertical motions are readily seen when only the segments are presented,
but become invisible after adding the occluding bars. In this case, observers perceive the true
motion of the diamond (shiffrar.mov, shiffrar-ill.mov). Without the occluding bars, the segment
terminators are perceived as intrinsic to the lines and their vertical motion overcomes the orthogonal
components. With the occluding bars, the segment terminators are perceived as extrinsic or
accidental (due to the occlusion interpretation). The vector average of the orthogonal components
determines the correctly perceived translation.
of oblique translating lines is underestimated compared to that of vertical lines. This bias increases
with the tilt and length of the line, as would be expected if the orthogonal and terminator signals
were weighted according to their perceptual salience (Castet et al. 1993). This in turn is consist-
ent with a wealth of physiological data. For instance, there is evidence that MT is implicated in
integrating not only local signals along multiple contours (Movshon et al. 1986), but also signals
along contours and at contour terminators (Pack 2001; Pack et al. 2003; Pack et al. 2004), and
with temporal dynamics consistent with the hypothesis that the integration stage occurs later in
processing than the coding of local motions (Pack and Born 2001).
visual system disregarded the motion of their terminators (vallobres-sottile.mov). When the
stripes and the frame were the same width, such that they formed a single perceptual unit, the bars
tended to move in the direction of the terminators (vallobres-spesso.mov). Related effects have
been demonstrated using illusory-surface frames (Bertamini et al. 2004) and by several manipula-
tions aimed at making the motion of contour terminators less salient or reliable (Lorenceau and
Shiffrar 1992). Consider, for instance, an outline diamond translating horizontally behind three
occluding bars (see Figure 24.3b). Suppose that the movement stops and reverses direction before
revealing the corners of the diamond, such that only the diagonal contours are visible in any given
frame. Participants will perceive the motion of the diamond correctly, as one would expect if the
orthogonal components were averaged to compute the motion of the whole. The terminators of
the diamond contours, however, bear a motion signal in the vertical direction as can be easily seen
by removing the occluding bars as in Figure 24.3b, right. Presumably, the visual system interprets
the up-down motion of the line terminators as being due to occlusion, and discards it from the
integration process.
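The averaging account of the occluded diamond can be verified directly. In the sketch below (our own illustration; the ±45° contour orientations and unit speed are simplifying assumptions), averaging the orthogonal components of the two visible diagonal contour families recovers the horizontal direction of the diamond, though at the reduced speed typical of a vector average.

```python
import numpy as np

def orthogonal_component_vector(v, theta):
    """The motion vector measurable at a contour of orientation theta:
    the projection of the true velocity v onto the contour's unit normal."""
    n = np.array([np.sin(theta), -np.cos(theta)])
    return np.dot(v, n) * n

v_true = np.array([1.0, 0.0])  # diamond translating horizontally at unit speed
# The two visible contour families of the diamond are oriented at +/-45 degrees.
c1 = orthogonal_component_vector(v_true, np.radians(45))
c2 = orthogonal_component_vector(v_true, np.radians(-45))
v_avg = 0.5 * (c1 + c2)
print(v_avg)  # direction is horizontal (y component ~ 0), speed halved
```

The vertical components of the two orthogonal signals cancel, which is why discarding the extrinsic terminator motions and averaging what remains yields the correct direction of translation.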
Sliding effect
In his pioneering observations, Wallach (1935) was the first to note that adding a visible feature,
such as a dot, to a contour moving within an aperture fails to abolish the barberpole effect. He
rightly noted that this is surprising, as the dot provides an unambiguous signal potentially
specifying the true motion of the contour. This unambiguous signal, however, does not typically affect
the perceived motion of the contour. In most cases, instead, the moving contour continues to move in the
same direction as the corresponding contour without the feature (i.e., it shows the barberpole
effect). At the same time, the feature appears to move obliquely along the contour. This ‘sliding’
effect is quite robust (sliding.mov). For instance, it remains visible if several features are placed
on the line (Wallach 1935), and if the orientation of the aperture or the duration of the motion
are varied (Castet and Wuerger 1997). Critically, the sliding remains visible even with very brief
durations, which argues against an explanation in terms of retinal slip during smooth pursuit of
the line (Castet and Wuerger 1997). Thus, the sliding effect seems to be consistent with a hierar-
chical organization of the motion signals into separate frames of reference (separation of systems,
Duncker 1938). The motion of the feature is perceived in relation to the moving line, which in
turn is perceived in relation to the aperture. Consistent with this account, it has been shown that
the sliding effect is abolished when a conspicuous static frame of reference is placed outside the
aperture (Castet and Wuerger 1997).
Fig. 24.4 Selected demonstrations of hierarchical organization affecting the solutions to the AP.
(a) In the chopstick illusion, two chopsticks appear to rotate counterclockwise in counterphase (top,
chopstick.mov). Isolating the ‘+’ at the cross-over by a circular aperture reveals that this central
feature is actually rotating clockwise (bottom, chopstick-occl.mov). However, clockwise rotation
is never perceived in the unoccluded chopsticks. (b) In the apparent rest demonstration (metelli2.
mov), a circle is rotated around its center. Other visual structures are presented within the circle
(left) or over it (right). This generates moving features at the intersections with the circular
contour. However, the circle appears completely stationary and the other structures appear to rotate
relative to it. (c) In the so-called ‘breathing illusions’, an illusory figure is rotated relative to stationary
elements. The movement is rigid but various deformations are perceived. For instance, with a square
rotating over four stationary disks, the figure appears to expand and shrink cyclically during the
rotation like a breathing lung (expansion.mov). With a triangle rotating over a spoke pattern, the
figure appears to deform, growing suddenly in one direction while shrinking in another during the
rotation. Interestingly, no comparable deformations are visible when the background elements are
rotated relative to the figure, although the relative motions are identical (nickeffect.mov).
514 Bruno and Bertamini
segments and therefore fail to capture the circle. A plausible reason for this outcome, given that
the pattern contains no disparity or figural information for figure-ground organization, is that the
circle itself remains stable relative to the observer and for this reason tends to become a reference for
the Y figure. In the second pattern reproduced in the figure, as in other variants studied by Metelli,
the circle is completed amodally behind the occluder and the rectangle appears to rotate above it.
Given that terminator signals are present at the T-junctions between the circle and the rectangle,
it could be argued that these terminators ought to be classified as extrinsic and therefore should
have no role in determining the circle movement. Presumably, this organization is further rein-
forced by the stability of the amodally completed circle relative to the observer, which makes it a
strong candidate frame of reference for the motion of the rectangle.
Breathing illusions
The role of the self as a frame of reference for the interpretation of visual motion is also apparent
in the so-called breathing illusions (for a review see Bruno 2001). These are cases where a figure,
such as for instance a square or a triangle (see Figure 24.4c), is rotated rigidly over other surround
elements. In typical demonstrations, the figures are illusory but equivalent configurations can
be obtained by reversing the depth order such that the elements become holes and the figure is
seen through them (note that this implies that the same optical transformations occur within, for
instance, the disks of the left figure). Although the rotation is perfectly rigid, the rotating figure
appears to deform in various ways. The square over the disks, for instance, appears to breathe, that
is, to shrink and expand cyclically during each cycle of rotation.
Shiffrar and Pavel (1991) suggested that the breathing percept arises because the motion of the
square is perceived in relation to different frames of reference when the corners are visible and
when they are not. According to their proposal, when the corners of the square are not visible
within one of the disks, because of the AP the center of rotation for each of the visible contours
is misperceived and placed near to, or at, the local center of the rotating side. As a consequence,
local motion signals that are oriented toward or away from the actual center of rotation become
available. These signals specify a change in size, and this causes the apparent breathing. However,
the deformations are never perceived when the background elements are rotated relative to a sta-
tionary figure (Bruno and Gerbino, 1991).
Given that in this modification all relative motions are exactly equivalent to the case where
the figure rotates, one might find this asymmetry surprising. However, considering what
structure acts as a frame of reference for the perceived motion reveals an obvious difference.
When the figure rotates, the disks or lines have the role of a stable frame of reference relative
to the observer, and the figure moves relative to these. When the disks rotate, conversely, it is
the figure that remains stable relative to the self. Thus all motion signals are coded in relation
to this frame of reference. Bruno and Gerbino (1991) and Bruno and Bertamini (1990) have
argued that the local motion signals that are coded in this fashion are critical to the bound-
ary formation process that reconstructs partly invisible edges from sparse spatiotemporal
information.
Recent results
Recent results have provided evidence that contributions to the solution of the AP in visual motion
perception may also come, surprisingly, from non-visual sources of information. These results are
in line with the growing interest in multisensory processes in perception (Calvert
et al. 2004). It has long been known that multisensory interactions bias the preferred percept in
multistable motion displays. For instance, adding an auditory signal switches the perception of
two dots moving in phase along an X pattern from streaming (one dot crosses over on top of the
other) to bouncing (the dots collide at the intersection of the X and bounce back; Sekuler et al.
1997). Tactile information about direction of rotation disambiguates the visual three-dimensional
structure of a computer-generated random-dot globe (Blake et al. 2004). During dichoptic viewing
of dynamic rival stimuli, moving a computer mouse extends dominance durations and abbreviates
suppression durations for the rival stimulus moving in the same direction as the hand
movement (Maruya et al. 2007). The perceived direction of motion of an ambiguous visual dis-
play is biased by several aspects of preceding actions (Wohlschlager 2000). Finally, pursuit eye
movements promote coherent motion of four line segments that are ambiguous during fixation
(Hafed and Krauzlis 2006). These findings suggest that multisensory contributions as well as other
top-down nonvisual factors may affect the solution to the AP.
Fig. 24.5 Schematic of an apparatus for assessing the role of kinesthetic motion signals in the
solution of the AP. (a) A CRT monitor is suspended above a mirror. Behind the mirror is a cube
manipulandum connected to a motion-tracking device. The participant moves the cube with one
hand while an image of the cube in its current position is rendered on the monitor. (b) On top of
the rendered cube experimental software presents a sinewave grating within a circular aperture.
Two motion signals are potentially available: a visual signal, which because of the AP is always in the
direction orthogonal to the orientation of the sinewave, and a kinesthetic signal that is a function of
the hand movement.
Reprinted from Current Biology, 20(10), Bo Hu and David C. Knill, Kinesthetic information disambiguates visual
motion signals, pp. R436–37, Figures 1a and 1b, Copyright (2010), with permission from Elsevier.
also in the direction orthogonal to the orientation of the grating. Finally, when the aperture was
circular but a 200 ms delay was imposed between the visual and kinesthetic signals, almost all
reports were in the direction orthogonal to the grating orientation. These results are consistent
with a multisensory interaction of kinesthetic and visual signals occurring for simultaneous, but
not delayed stimulation (see Stein and Meredith 1993). These results also suggest that the weight
of the kinesthetic component is highest when visual information is most ambiguous (circular
aperture) and becomes less strong when unambiguous motion signals from terminators are pro-
vided (square aperture). Thus, this pattern can also be interpreted in terms of optimal Bayesian
integration (Ernst and Banks 2002) for visual and kinesthetic signals.
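Optimal integration in the sense of Ernst and Banks (2002) amounts to precision weighting: each cue is weighted by its inverse variance. The sketch below uses illustrative numbers of our own choosing (not data from Hu and Knill 2010) to show how a low-reliability visual direction estimate, as in the circular-aperture condition, is pulled toward a more reliable kinesthetic estimate.

```python
def fuse(mu_v, var_v, mu_k, var_k):
    """Minimum-variance (inverse-variance weighted) fusion of two
    independent Gaussian estimates of motion direction, in degrees."""
    w_v = (1.0 / var_v) / (1.0 / var_v + 1.0 / var_k)
    mu = w_v * mu_v + (1.0 - w_v) * mu_k
    var = 1.0 / (1.0 / var_v + 1.0 / var_k)
    return mu, var

# Circular aperture: the visual direction estimate is highly ambiguous
# (large variance), so the fused percept lies close to the kinesthetic cue.
mu, var = fuse(mu_v=90.0, var_v=400.0, mu_k=45.0, var_k=25.0)
print(round(mu, 1))  # → 47.6, near the kinesthetic direction of 45
```

On this account, the square aperture reduces visual variance (terminator signals disambiguate the grating), shifting the weight back toward vision, while the fused variance is always lower than that of either cue alone.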
In a related experiment, DeLucia and Ott (2011) presented lines that translated within circular
or rectangular moving or stationary apertures. In one condition, participants passively viewed the
lines. In a second condition, they actively moved a joystick that controlled the direction of the
translating line. In accord with the barberpole effect, they found that with rectangular apertures
participants tended to report movement in the direction of the orientation of the aperture. With
circular apertures, conversely, they tended to report movement orthogonal to the orientation of
the line. For both apertures, however, active control of the line movement biased perceived move-
ments away from the orthogonal direction and in the direction of the joystick movement. Thus,
although the reported effects were smaller than those of Hu and Knill (2010), these results provide
converging evidence that kinesthetic signals contribute to the solution of the AP.
Top-down factors
In a second experiment, DeLucia and Ott (2011) also manipulated attentional load by asking
participants to report the motion of the line (no load condition) or both the motion of the line
and that of the aperture (load condition). While it is not clear how this manipulation affected the
spatial distribution of attention, results provided some evidence that this manipulation affects
the relative weighting of orthogonal and terminator motions in solving the AP. This result is in
line with previous reports that voluntary attentional control can influence contextual integration
processes in motion perception (Freeman and Driver 2008) and can modulate the spatial extent
over which local motion signals are integrated (Burr et al. 2009). It seems likely, therefore, that
top-down processes may also have a role in solutions to the AP. Related studies suggest that these are
not limited to attention but can include expectations learned through perceptual (Graf et al. 2004)
or sensorimotor individual experience (Yabe et al. 2011), as well as high-level knowledge about
the visibility of surfaces during occlusion and disocclusion (McDermott et al. 2001).
Conclusions
We have reviewed strategies for solving the local ambiguities of motion signals (the AP) and for
perceiving coherent object motion. This is arguably one of the greatest challenges faced by the
human visual system. We have argued that the solution cannot be reduced to relatively simple vec-
tor operations, such as averaging or intersecting constraints in velocity space. Solutions to the AP
reflect complex processes of perceptual organization, which dynamically take into account visual
stimulus structure as well as additional constraints from nonvisual sensory channels. We believe
that studies on effects of perceptual organization on the solution to the AP will continue to be a
fertile and active area of research. In this area, key findings may come from studies of dynamic
grouping of connected surfaces (see Hock, this volume) and of interactions between motion and
form (see Blair et al., this volume).
References
Adelson, E. H. and Movshon, J. A. (1982). ‘Phenomenal coherence of moving visual patterns’. Nature
300(5892): 523–5.
Alais, D. M., Wenderoth, P. M., and Burke, D. C. (1994). ‘The contribution of 1-D motion mechanisms to
the perceived direction of drifting plaids and their aftereffects'. Vision Research 34: 1823–34.
Alais, D., Wenderoth, P., and Burke, D. (1997). ‘The size and number of plaid blobs mediate the
misperception of type-II plaid direction'. Vision Research 37(1): 143–50.
Albright, T. D. (1984). ‘Direction and orientation selectivity of neurons in visual area MT of the macaque’.
Journal of Neurophysiology 52(6): 1106–30.
Anderson, B. L. and Sinha, P. (1997). ‘Reciprocal interactions between occlusion and motion computations’.
Proc Natl Acad Sci USA, 94(7), 3477–80.
Anstis, S. (1990). 'Imperceptible intersections: the chopstick illusion'. In AI and the Eye, edited by A. Blake
and T. Troscianko, pp. 105–117 (Chichester: John Wiley).
Barenholtz, E. and Tarr, M. J. (2007). 'Reconsidering the role of structure in vision'. In The Psychology of
Learning and Motivation, vol. 47: Categories in Use, edited by M. Markman and B. Ross, pp. 157–180
(Orlando, FL: Academic Press).
Bertamini, M., Bruno, N., and Mosca, F. (2004). ‘Illusory surfaces affect the integration of local motion
signals’. Vision Research 44(3): 297–308.
Blake, R., Sobel, K. V., and James, T. W. (2004). ‘Neural synergy between kinetic vision and touch’. Psychol
Sci 15(6): 397–402.
Blair et al. (this volume). In The Oxford Handbook of Perceptual Organization, edited by J. Wagemans. (Oxford: Oxford
University Press).
Bressan, P., Ganis, G., and Vallortigara, G. (1993). ‘The role of depth stratification in the solution of the
aperture problem’. Perception 22(2): 215–28.
Bruno, N. (2001). ‘Breathing illusions and boundary formation in space-time’. In From Fragments to
Objects: Segmentation and Grouping in Vision (Advances in Psychology 130), edited by T. F. Shipley and
P. J. Kellman, pp. 531–56. (North-Holland).
Bruno, N. and Bertamini, M. (1990). ‘Identifying contours from occlusion events’. Perception and
Psychophysics 48(4): 331–42.
Bruno, N. and Gerbino, W. (1991) ‘Illusory figures based on local kinematics’. Perception 20: 259–74.
Burr, D. C., Baldassi, S., Morrone, M. C., and Verghese, P. (2009). ‘Pooling and segmenting motion signals’.
Vision Research 49(10): 1065–72.
Calvert, G. A., Spence, C., and Stein, B. E. (2004). The Handbook of Multisensory Processes. (Cambridge,
MA: MIT Press).
Castet, E. and Wuerger, S. (1997). ‘Perception of moving lines: interactions between local perpendicular
signals and 2D motion signals’. Vision Research 37(6): 705–20.
Castet, E., Lorenceau, J., Shiffrar, M., and Bonnet, C. (1993). ‘Perceived speed of moving lines depends on
orientation, length, speed and luminance’. Vision Research 33(14): 1921–36.
Castet, E., Charton, V., and Dufour, A. (1999). ‘The extrinsic/intrinsic classification of two-dimensional
motion signals with barber-pole stimuli’. Vision Research 39(5): 915–32.
Cropper, S. J., Badcock, D. R., and Hayes, A. (1994). 'On the role of second-order signals in the perceived
direction of motion of type II plaid patterns’. Vision Research 34(19): 2609–12.
DeLucia, P. R. and Ott, T. E. (2011). ‘Action and attentional load can influence aperture effects on motion
perception’. Exp Brain Research 209(2): 215–24.
Duncan, R. O., Albright, T. D., and Stoner, G. R. (2000). ‘Occlusion and the interpretation of visual
motion: perceptual and neuronal effects of context’. J Neurosci 20(15): 5885–97.
518 Bruno and Bertamini
Duncker, K. (1938). ‘Über induzierte Bewegung [Concerning induced movement] ’. In Source book of
Gestalt psychology, edited and translated by W D. Ellis, pp. 161–72. (London: Routledge and Kegan
Paul). Reprinted from Psychologische Forschung (1929), 12: 180–259.
Ernst, M. O. and Banks, M. S. (2002). ‘Humans integrate visual and haptic information in a statistically
optimal fashion’. Nature 415(6870): 429–33.
Fennema, C. L. and Thompson, W. B. (1979). ‘Velocity determination in scenes containing several moving
objects’. Computer Graphics and Image Processing 9: 310–15.
Freeman, E. and Driver, J. (2008). ‘Voluntary control of long-range motion integration via selective
attention to context’. Journal of Vision 8(11): 18.1–18.22.
Gerbino, W. and Bruno, N. (1997). ‘Paradoxical rest’. Perception 26: 1549–54.
Gizzi, M. S., Katz, E., Schumer, R. A., and Movshon, J. A. (1990). ‘Selectivity for orientation and direction
of motion of single neurons in cat striate and extrastriate visual cortex’. J Neurophysiol 63(6): 1529–43.
Graf, E. W., Adams, W. J., and Lages, M. (2004). ‘Prior depth information can bias motion perception’.
Journal of Vision 4(6): 427–33.
Grossberg, S. (2011). ‘Visual motion perception’. In Encyclopedia of Human Behavior, edited by V. S.
Ramachandran, second edn. (Oxford: Elsevier).
Grossberg, S. and Mingolla, E. (1993). ‘Neural dynamics of motion perception: direction fields, apertures,
and resonant grouping’. Percept Psychophys 53(3): 243–78.
Hafed, Z. M. and Krauzlis, R. J. (2006). ‘Ongoing eye movements constrain visual perception’. Nat Neurosci
9(11): 1449–57.
Hedges, J. H., Stocker, A. A., and Simoncelli, E. P. (2011). ‘Optimal inference explains the perceptual
coherence of visual motion stimuli’. Journal of Vision 11(6): 14, 1–16.
Hildreth, E. C. (1983). The Measurement of Visual Motion. (Cambridge, MA: MIT Press).
Hock (this volume). In The Oxford Handbook of Perceptual Organization, edited by J. Wagemans. (Oxford: Oxford
University Press).
Hu, B. and Knill, D. C. (2010). ‘Kinesthetic information disambiguates visual motion signals’. Curr Biol
20(10): R436–7.
Hubel, D. H. and Wiesel, T. N. (1968). ‘Receptive fields and functional architecture of monkey striate
cortex’. The Journal of Physiology 195(1), 215–43.
Kane, D., Bex, P., and Dakin, S. (2011). ‘Quantifying “the aperture problem” for judgments of motion
direction in natural scenes’. Journal of Vision 11(3): 25, 1–20.
Kim, J. and Wilson, H. R. (1993). ‘Dependence of plaid motion coherence on component grating
directions’. Vision Research 33(17): 2479–89.
Kooi, F. L. (1993). ‘Local direction of edge motion causes and abolishes the barberpole illusion’. Vision
Research 33(16): 2347–51.
Loffler, G. and Orbach, H. S. (2003). ‘Modeling the integration of motion signals across space’. J Opt Soc
Am A Opt Image Sci Vis 20(8): 1472–89.
Lorenceau, J. and Shiffrar, M. (1992). ‘The influence of terminators on motion integration across space’.
Vision Research 32(2): 263–73.
Lorenceau, J., Shiffrar, M., Wells, N., and Castet, E. (1993). ‘Different motion sensitive units are involved in
recovering the direction of moving lines’. Vision Research 33(9): 1207–17.
Maruya, K., Yang, E., and Blake, R. (2007). ‘Voluntary action influences visual competition’. Psychol Sci
18(12): 1090–8.
McDermott, J., Weiss, Y., and Adelson, E. H. (2001). ‘Beyond junctions: nonlocal form constraints on
motion interpretation’. Perception 30(8): 905–23.
Metelli, F. (1940) ‘Ricerche sperimentali sulla percezione del movimento’. Rivista di psicologia 36: 319–60.
Movshon, J. A., Adelson, E. H., Gizzi, M. S., and Newsome, W. T. (1986) ‘The analysis of moving visual
patterns’. In Pattern recognition mechanisms, edited by C. Chagas, R. Gattass and C. Gross, pp. 117–51.
(Vatican City: Vatican Press).
Montagnini, A., Mamassian, P., Perrinet, L., Castet, E., and Masson, G. S. (2007). ‘Bayesian modeling of
dynamic motion integration’. J Physiol Paris 101(1–3): 64–77.
Mussap, A. J. and Te Grotenhuis, K. (1997). ‘The influence of aperture surfaces on the barber-pole illusion’.
Perception 26(2): 141–52.
Nakayama, K. and Silverman, G. H. (1988). ‘The aperture problem—II. Spatial integration of velocity
information along contours’. Vision Research 28(6): 747–53.
Pack, C. C. (2001). ‘The aperture problem for visual motion and its solution in primate cortex’. Sci Prog
84(Pt 4): 255–66.
Pack, C. C. and Born, R. T. (2001). ‘Temporal dynamics of a neural solution to the aperture problem in
visual area MT of macaque brain’. Nature 409(6823): 1040–2.
Pack, C. C., Gartland, A. J., and Born, R. T. (2004). ‘Integration of Contour and Terminator Signals in
Visual Area MT of Alert Macaque’. J Neurosci 24(13): 3268–680.
Pack, C. C., Livingstone, M. S., Duffy, K. R., and Born, R. T. (2003). ‘End- stopping and the aperture
problem: two-dimensional motion signals in macaque V1’. Neuron 39(4): 671–80.
Pei, Y. C., Hsiao, S. S., and Bensmaia, S. J. (2008). 'The tactile integration of local motion cues is analogous
to its visual counterpart’. Proc Natl Acad Sci USA 105(23): 8130–5.
Perrinet, L. U. and Masson, G. S. (2012). ‘Motion-Based Prediction is Sufficient to Solve the Aperture
Problem’. Neural Computation 24(10): 2726–50.
Petter, G. (1956) ‘Nuove ricerche sperimentali sulla totalizzazione percettiva’. Rivista di psicologia
50: 213–27.
Schall J. D. (2000). ‘Decision making: From sensory evidence to a motor command’. Current Biology
10(11): R404–6.
Sekuler, R., Sekuler, A. B., Lau, R. (1997). ‘Sound alters visual motion perception’. Nature 385: 308.
Shiffrar, M. and Pavel, M. (1991). ‘Percepts of rigid motion within and across apertures’. JEPHPP
17(3): 749–61.
Shimojo, S., Silverman, G. H., and Nakayama, K. (1989). ‘Occlusion and the solution to the aperture
problem for motion’. Vision Research 29(5): 619–26.
Stein, B. E. and Meredith, M. A. (1993). The Merging of the Senses. (Cambridge, MA: MIT Press).
Stoner, G., Albright, T., and Ramachandran, V. (1990). ‘Transparency and coherence in human motion
perception’. Nature 344(6262): 153–5.
Tanaka, K. and Saito, H. A. (1989). ‘Analysis of motion of the visual field by direction, expansion/
contraction, and rotation cells clustered in the dorsal part of the medial superior temporal area of the
macaque monkey’. Journal of Neurophysiology 62(3): 626–41.
Todorovic, D. (1996). 'A gem from the past: Pleikart Stumpf's (1911) anticipation of the aperture problem,
Reichardt detectors, and perceived motion loss at equiluminance’. Perception 25(10): 1235–42.
Tootell, R. B. H., Reppas, J. B., Kwong, K. K., Malach, R., Born, R. T., Brady, T. J., et al. (1995). ‘Functional
analysis of human MT and related visual cortical areas using magnetic resonance imaging’. Journal of
Neuroscience 15(4): 3215.
Vallortigara, G. and Bressan, P. (1991). ‘Occlusion and the perception of coherent motion’. Vision Research
31(11): 1967–78.
Vezzani et al. (this volume). In The Oxford Handbook of Perceptual Organization, edited by J. Wagemans.
(Oxford: Oxford University Press).
Wallach, H. (1935). 'Über visuell wahrgenommene Bewegungsrichtung'. Psychologische Forschung
20: 325–380.
Weiss, Y., Simoncelli, E. P., and Adelson, E. H. (2002). ‘Motion illusions as optimal percepts’. Nat Neurosci
5(6): 598–604.
Wilson, H. R. and Kim, J. (1994). ‘Perceived motion in the vector sum direction’. Vision Research
34(14): 1835–42.
Wilson, H. R., Ferrera, V. P., and Yo, C. (1992). ‘A psychophysically motivated model for two-dimensional
motion perception’. Vis Neurosci 9(1): 79–97.
Wohlschläger, A. (2000). 'Visual motion priming by invisible actions'. Vision Research 40(8): 925–30.
Wright, M. J. and Gurney, K. N. (1997). ‘Coherence and motion transparency in rigid and nonrigid plaids’.
Perception 26(5): 553–67.
Wuerger, S., Shapley, R., and Rubin, N. (1996). ‘ “On the visually perceived direction of motion” by Hans
Wallach: 60 years later’. Perception, 25, 1317–67.
Yabe, Y., Watanabe, H., and Taga, G. (2011). ‘Treadmill experience alters treadmill effects on perceived
visual motion’. PLoS One 6(7): e21642.
Yo, C. and Wilson, H. R. (1992). ‘Perceived direction of moving two- dimensional patterns depends on
duration, contrast and eccentricity’. Vision Research 32(1): 135–47.
Chapter 25
Stereokinetic Effect, Kinetic Depth Effect, and Structure from Motion
Vezzani, Kramer, and Bressan
Introduction
Relative motion is one of the phylogenetically oldest and most compelling sources of information
about distance from one’s viewpoint (depth). Disparities between the left and right eye’s perspec-
tives are quite informative too, and stereopsis (depth perception on the basis of such disparities) is
of great help in breaking camouflage (Wardle et al. 2010). Oddly, though, the prerequisite orbital
convergence of the eyes from a lateral to a frontal position seems to have evolved, in primates,
only after the use of vision for reaching and grasping (Isbell 2006). It thus seems that, in order to
see depth, we were getting by just fine without stereopsis—relying only on monocular depth cues
like relative motion.
In part because we move about, the projection of the world on our retinae is constantly in
motion. Even when proprioceptive and motor information is unavailable to help us distinguish
between motion generated by the environment and motion generated by ourselves, and even in
the face of conflicting binocular disparity and other depth cues, motion generates strong impres-
sions of depth. Here we review this particular kind of depth perception that depends solely on
relative motion.
The oldest studies in this field concern the phenomenon of stereokinesis, which we discuss first.
Most of the more recent studies focus, instead, on the kinetic-depth effect (KDE), also known as
structure from motion (SfM), which we discuss afterwards.
Stereokinetic effect
Early work
Mach
Ernst Mach (1868, 1886) was the first to report a depth effect created by a figure moving in the
frontoparallel plane. He writes: “A flat linear drawing, monocularly observed, often seems flat. But
if the angles are made variable and motion is introduced, any such drawing immediately stretches
out in depth. One then usually sees a rigid body in rotation”1 (Mach 1886, pp. 99–100). (What
“angles” Mach refers to here remains unclear.)
Mach (1886, p. 102; 1897, p. 108) also discovered an unusual percept induced by either of two
kinds of motion. In the first case, an egg is rolled over a table in such a way that it performs jolting
1 Our translation.
522 Vezzani, Kramer, and Bressan
Fig. 25.1 (a) An ellipse on a rotating turntable (here represented by the circle) becomes, at the
stereokinetic stage, a rigid disc. (b) A circle with an eccentric dot on a rotating turntable (here
partially represented by the arc) becomes, at the stereokinetic stage, a rigid cone, either pointing
outward or receding inward.
Reproduced from V. Benussi, Introduzione alla psicologia sperimentale, Lezioni tenute nell'anno 1922–23, Bicocca
University: Milan, 1922–1923.
movements, rather than smooth rotation. In the second case, the egg is placed horizontally on
the table and is rotated smoothly around a vertical axis. If viewed from a particular angle, in both
cases but more strikingly in the latter, the egg is perceived as a liquid body or large oscillating
drop. The effect disappears immediately if trackable spots are added to the egg’s surface.
Benussi
Peculiarly, the investigation of stereokinesis has been dominated by researchers from the Italian
University of Padua: Benussi, Musatti, Zanforlin, Beghi, Xausa, Vallortigara, and Bressan. In 1921,
Vittorio Benussi noted that some flat stimuli in slow rotation in the frontal plane appear to trans-
form into solid, cyclically moving 3-D objects (Musatti 1924; see also Benussi 1922–1923, 1925,
1927). Because the perceived corporeity of these illusory objects is similar to that of stereoscopi-
cally perceived ones, Benussi called the phenomenon stereokinetic. He thought the illusion arises
because of past experience with solid objects.
Benussi observed that, while watching an ellipse on a rotating turntable (Figure 25.1a)2, three
separate percepts arise in order. First, the ellipse appears to rotate rigidly around both the turntable’s
centre and its own. Second, the ellipse becomes an elastic, constantly deforming ring or disc that still
rotates around the turntable’s centre, but no longer around its own centre (best effects are obtained
if the ellipse’s axes have a 3:2 ratio; Wallach et al. 1956). At this stage, the percept is similar to Mach’s
rotating egg, but still 2-D, and therefore strictly speaking not stereokinetic; nevertheless, it has since
been studied in its own right (e.g., Weiss and Adelson 2000). Third, the ellipse suddenly appears
to disconnect from the turntable and becomes a rigid ring or disc slanted in depth, that while still
rotating around the turntable’s centre, also oscillates about its own centre. It is perceived to repeat-
edly reverse in depth, with its farthest edge becoming its closest and vice versa (Benussi 1922–1923).
Bressan and Vallortigara (1986a) later reported that, if observation continues, the third percept
is followed by a fourth—an elongated egg whose ends are located at different distances from the
observer and rotate in the frontal plane (see also Mefferd’s “cigar:” Mefferd 1968a, 1968b; Wieland
and Mefferd 1968). The disc and the egg alternate in time, separated by brief intervals in which
either a rotating rigid ellipse or a distorting elastic one is perceived (Vallortigara et al. 1988; see also
Mefferd 1968a). Benussi and his student Musatti (1924) basically only studied contour ellipses, but
all the percepts described above, including the fourth, obtain with both contour and filled ellipses.
Benussi (1927) described stereokinetic solids as “moving with astounding grace, smoothness,
elasticity, and ease, rhythmically and adroitly.”3 No surprise they attracted the attention of artists. In
the early 1920s, artist Marcel Duchamp created a series of Rotoreliefs: discs depicting circles and spi-
rals that, when rotating, produce percepts of depth. His stereokinetic displays were basically com-
plex versions of Benussi’s, and were created later. However, Duchamp had already used rotation in
previous art works (<www.marcelduchamp.net/ecatalogue.htm>). Quite possibly, therefore, he dis-
covered the stereokinetic effect independently from Benussi. In 1926, Duchamp portrayed ten of his
Rotoreliefs in the six-minute film Anémic Cinéma (D’Aversa 2007; note the illusory-contour rings
at 1:50 minutes into the film). Some Rotoreliefs were also used in Hans Richter’s 1947 surrealist film
Dreams that Money can Buy (<www.youtube.com/watch?feature=player_embedded&v=mJ5Cl30_
KvE>). More recently, the psychologist and artist Frederick S. Duncan (1975) has created remark-
ably powerful stereokinetic discs he called psychokinematic objects.
Musatti
Benussi’s assistant at the University of Padua, Cesare Musatti, authored the first published paper
on stereokinesis (Musatti 1924), followed by several others (e.g., Musatti 1928, 1975). He general-
ized to other stereokinetic stimuli Benussi’s three perceptual stages. First, rigid veridical motion is
perceived on a plane. Second, either relative motion between different parts of the stimulus or an
“ameboid” deformation is seen. And third, a stereokinetic solid emerges. Musatti argued that, with
few exceptions (such as inhomogeneously colored ellipses, e.g. Musatti 1929; for an English transla-
tion of some of Musatti’s observations, see Albertazzi 2004), the relative-motion or ameboid stage
is a necessary precursor to the stereokinetic stage. He proposed two completely different explana-
tions for the second and third stages (Musatti 1924). He explained the third, like Benussi, with past
experience with rotating solids, and the second with what he called “orientation stability.”
Orientation stability
Before turning to perception Musatti had studied mathematics, and in 1928 he was the first to use
vector analysis to describe perceptual phenomena—a particularly helpful approach subsequently
adopted by others (e.g., Johansson 1950; Wallach 1935; see also Giese chapter, this volume).
Musatti suggested considering, for example, a rotating turntable with two nested circles and two
virtual points, one on each circle (Figure 25.2a). During a 90° rotation, the two points maintain
the same position relative to each other (compare Figure 25.2a to Figure 25.2b). However, if the
two points are not marked, it is impossible to keep track of them, and the rotation goes unno-
ticed: a phenomenon called orientation stability (Musatti 1924) or identity imposition (Wallach
and Centrella 1990). If the rotational component of the stimulus’ motion is removed, only a trans-
latory component remains, and this is what is observed. That is, during the 90° rotation, the virtual
points on the two circles appear neither to take part in this rotation, nor to remain fixed relative
to one another, but to translate relative to one another (Figure 25.2c).
If, instead of two circles, only a single ellipse is presented, then this relative translation is not
seen between virtual points on different shapes, but between different virtual points on the same
shape. In this case, the ellipse is perceived to continually deform.
The phenomenon of orientation stability also occurs with some figures whose contours are not
uniform and should therefore not produce it (Musatti 1924, 1955, 1975; Proffitt et al. 1992). For
Fig. 25.2 After a 90° clockwise rotation, the two points marked by grey triangles in (a) will have
moved as in (b), but due to orientation stability they seem to have moved as in (c).
Adapted from Dennis R. Proffitt, Irvin Rock, Heiko Hecht, and Jim Schubert, Stereokinetic effect and its relation to
the kinetic depth effect, Journal of Experimental Psychology: Human Perception and Performance,
18(1), pp. 3–21, http://dx.doi.org/10.1037/0096-1523.18.1.3 © 1992, American Psychological Association.
example, if the contours of the two circles in Figure 25.2 are dashed rather than solid, one still
does not see the circles rotate together, as they physically do, but translate relative to each other.
Meanwhile, the dashes are perceived to slide along the circles’ contours—an effect that Musatti
recognized but never reconciled with his theory.
The more concentric circles the stimulus contains, the more compelling the stereokinetic effect, but whether
this also affects the height of the cone is unclear: some reported that it does (e.g., Wallach and
Centrella 1990), others that it does not (e.g., Robinson et al. 1985; Zanforlin 1988a).
Musatti (1924, 1928–1929, 1955, 1975) reasoned that the cone could appear rigid only if its
base were physically slanted relative to the observer, and the base does indeed look slanted. But,
if the base were physically slanted, its retinal projection would be an ellipse; instead, it is a circle.
To solve this “geometrical paradox,” Musatti (1955, 1975) proposed that, because of a general ten-
dency of all points on the stimulus to appear equally far from the observer, (a) the eccentric dot
that becomes the cone’s apex “resists” coming closer to the observer, and (b) the circle “resists”
becoming slanted. Whereas the first kind of “resistance” should decrease the cone’s height and
increase its slant, the second should do the opposite. Some compromise between the two might
then determine how the cone is perceived. However, because the two “resistances” cannot be
quantified, this hypothesis is untestable (Zanforlin 1988b).
Recent work
The minimum-relative-motion principle
Zanforlin (1988a,b; see also related work by Beghi et al. 1991a,b; Beghi et al. 2008; Liu 2003)
proposed a new model, based on a version of the Gestalt “minimum principle” (see van der Helm
chapter, this volume), which includes the minimization of relative velocity differences within a
percept. When this minimization eliminates them all, the percept is rigid, but this rigidity is a
mere byproduct.
In the case of the stereokinetic cone, the model of Zanforlin and colleagues involves two sep-
arate minimizations of relative velocity differences: the first explains orientation stability, the
second the emergence of the stereokinetic solid. The process is illustrated in Figure 25.3. First
minimization: the farther away each point of the circle is from the turntable’s centre c, the longer
the physical trajectory it covers during rotation and, thus, the faster it moves (Figure 25.3a). When
orientation stability is reached, however, all these differences in velocity disappear (Figure 25.3b).
Second minimization: the velocity of the eccentric dot e is different from that of the points on the
Fig. 25.3 (a) When the circle rotates around the turntable’s centre c, its points move at different
velocities. For example, the trajectory a-a’ is longer than the trajectory b-b’, and a moves therefore
faster than b. (b) When stability of orientation is reached, all points cover equally long trajectories
and therefore have the same velocity. The trajectory and velocity of the eccentric dot e, however,
are unaffected by the orientation stability of the circle, and remain different from those of a
and b. (c) The bar ab moves (solid arrows) around the turntable's centre c. After a 90° rotation of the
turntable, it ends up as a’’b’’. What is perceived before the stereokinetic transformation, however,
is that the bar ab rotates clockwise around its own centre, which concurrently moves from o to o’
along a clockwise circular path. The two components into which the linear velocity of a and b can
be subdivided occur simultaneously, but their description may be simplified by imagining them as
consecutive: in this case, ab would move to a’b’ (dashed arrows) and a’b’ would move to a’’b’’
(dotted arrows).
circle, and by the addition of a depth component, another minimization of velocity differences
takes place. It results in a rigid cone whose points, including e, all have the same velocity (for a
complete geometrical analysis, see Zanforlin 1988a,b).
The minimum-relative-motion explanation can be extended to the rotating ellipse and the
rotating bar (Beghi et al. 2008; Zanforlin 1988b, 2000; Zanforlin and Vallortigara 1988). Here we
will describe how it applies to the latter, which is a case of stereokinesis on an inadequate basis.
At first, a bar drawn radially on a rotating turntable is simply perceived to move around
the turntable’s centre, like a rotating clock hand. After a while, it seems to rotate around its
Stereokinetic Effect, Kinetic Depth Effect, and Structure from Motion 527
own centre as well (Figure 25.3c), and finally, all of a sudden, it looks slanted into 3-D space
(Mefferd and Wieland 1967; Musatti 1955; Renvall 1929). The bar end that is farther away from
the centre of rotation appears closer to the observer. The bar never becomes elastic; hence, its
stereokinetic transformation cannot be explained as a rigid interpretation of a non-rigidity.
It can, however, be explained within the minimum-relative-motion model (Zanforlin and
Vallortigara 1988).
Again, two separate minimizations of relative velocity differences are involved. The first explains
the rotation of the bar around its own centre, the second the bar’s dislocation in depth. In Figure
25.3c, a moves faster than o and o moves faster than b. The linear velocity of a and b can be
subdivided into a common component, identical to that of o, and a residual one. If only the first
component were present, the points a, b, and o would be motionless relative to one another, and
would move at the same velocity with respect to the turntable’s centre c. Once this component is
subtracted from the motion of a and b, a second component remains: a and b appear to rotate
around o, at the same speed but in opposite directions. This corresponds to the apparent rotation
of the bar around its own centre.
The speed difference between a and b disappears as a result of the first minimization. However,
because of the residual motion component, the velocities of a and b are still different from the
velocity of o. According to Zanforlin and Vallortigara (1988; for a geometrical demonstration see
also Beghi et al. 2008; Zanforlin 2000), the second minimization makes the three velocities iden-
tical by slanting the bar in depth.
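The first minimization reduces to a few lines of arithmetic. In this sketch (all numerical values hypothetical), the bar lies along a radius of the turntable, so at the chosen instant the velocity vectors of a, o, and b are tangential and parallel, and can be treated as signed speeds:

```python
omega = 1.0                      # turntable angular velocity (rad/s)
r_b, r_o, r_a = 2.0, 3.0, 4.0    # distances of b, midpoint o, and a from centre c

# Instantaneous tangential speeds: v = omega * r for each point.
v_a, v_o, v_b = omega * r_a, omega * r_o, omega * r_b

# First minimization: subtract the common component (the velocity of o).
# The residuals are equal in magnitude and opposite in sign, i.e. a
# rotation of the bar about its own centre o.
res_a, res_b = v_a - v_o, v_b - v_o
print(res_a, res_b)              # equal magnitude, opposite sign
```

After this step the speed difference between a and b is gone, but both residuals still differ from o's (zero) residual; on Zanforlin and Vallortigara's account, the second minimization removes that remaining difference by slanting the bar in depth.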
Fig. 25.4 The stimulus (a), in rotation, produces the Saturn illusion, which includes a (partially)
illusory ring. The stimulus (b) produces the Saturn illusion, but no moving phantoms connecting
the three bottom bars to the illusory ring. The stimulus (c) produces the Saturn illusion with a
“diadem-like” illusory ring in which the three bottom bars, although locally identical to (b), are
connected to the ring by moving phantoms, as depicted in (d).
Reproduced from P. Bressan and G. Vallortigara, Stereokinesis with moving visual phantoms, Perception
16(1), pp. 73–8, Figures 25.1, 25.3, and 25.4 Copyright © 1987, Pion. With kind permission from Pion Ltd,
London www.pion.co.uk and www.envplan.com.
nested dashed circles: occasionally, the gaps between the dashes on one circle appeared to link up
with the gaps on the other, fleetingly forming illusory contours. For details, see Albertazzi 2004.)
Stereokinesis can also affect perceived color, by creating 3-D perceptual objects that are then
filled-in with the color of nearby elements (neon color spreading: for a review, see Bressan et al.
1997). For example, after some observation time, two small red discs on a rotating turntable
give rise to a slightly reddish cylinder spanning between them (Figure 25.5a; see Zanforlin 2003;
Zanforlin and Vallortigara 1990). If the two red discs are replaced by red circles, neon color
spreading does not occur (Figure 25.5b), unless at least one of the circles has a gap that is oriented
towards the other (Figure 25.5c). (For a separate demonstration of neon color spreading in ste-
reokinesis, see Bressan and Vallortigara 1991.)
Fig. 25.5 Rotation of each of the stimuli (a), (b), and (c) produces an illusory cylinder. The inducing
elements are red (here shown in grey) and the cylinder is reddish in (a) and (c), and colorless in
(b). Similar stereokinetic effects can also be obtained with black inducers, but in this case only the
illusory-contour cylinder in (a) is tinged.
Reproduced from M. Zanforlin and G. Vallortigara, The magic wand: a new stereokinetic anomalous surface,
Perception 19(4), pp. 447–57, Copyright © 1990, Pion. With kind permission from Pion Ltd, London www.pion.
co.uk and www.envplan.com.
Fig. 25.6 The device used by Metzger (1934). The turntable b with the vertical rods is set in rotation.
The rods are illuminated by the light source c and their shadows are projected onto a translucent
screen a.
Reproduced from Psychologische Forschung, 19(1), pp. 1–60, Beobachtungen über phänomenale Identität,
Wolfgang Metzger, © 1934, Springer-Verlag. With kind permission from Springer Science and Business Media.
It had long been known that the blades of a windmill silhouetted against the sky often reverse their apparent direction of motion. To investigate this phenomenon, Miles (1931) projected on a screen the shadow
of a two-bladed rotating fan. His observers reported, among other things, a rotary motion that
often reversed. As Musatti (1955) had already noticed in stereokinesis, what the observers saw was
affected by the experimenter’s suggestions.
Metzger used a method similar to Miles’s, but with the device illustrated in Figure 25.6. A set of
thin rods stood on a rotating horizontal turntable; the rods’ shadows were cast onto a translucent
screen. The relatively large distance between the light source and the turntable (five meters) and
the relatively small distance between the turntable and the screen (as small as possible) ensured
that the projection was approximately orthographic rather than perspective. Whereas in a per-
spective projection all imaginary projection lines meet at one point, in orthographic projection
they are (a) parallel to one another (parallel projection) and (b) orthogonal to the projection
Fig. 25.7 If stimulus (a) is set in rotation behind aperture (b), observers see a solid pyramid (c).
Data from Wolfgang Metzger, Laws of Seeing, translated by Lothar Spillmann, The MIT Press, 2006.
plane. Thus, in orthographic projection, unlike in perspective projection, identical objects at dif-
ferent distances all cast identical images onto the projection plane. In this way, orthographic pro-
jections allow the removal of perspective cues to depth. To ensure that indeed all perspective cues
to depth were eliminated, Metzger also blocked the ends of the rods from view; on the screen, they
all had the same height. The shadows of the rods moved horizontally over the screen, with con-
stantly changing distances between them. The velocity of the turntable was uniform, and hence,
each shadow performed a simple harmonic motion.
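This last claim is easy to verify: under orthographic projection, a rod at radius r on a turntable rotating at constant angular velocity ω casts a shadow at x(t) = r·cos(ωt + φ), and such a trajectory satisfies the defining property of simple harmonic motion, ẍ = −ω²x. A short numerical check (parameter values hypothetical):

```python
import math

def shadow_x(r, phi, omega, t):
    """Orthographic projection onto the screen of a rod at radius r and
    initial phase phi, on a turntable spinning at omega rad/s."""
    return r * math.cos(omega * t + phi)

omega, r, phi = 2.0, 1.5, 0.3
t, h = 0.7, 1e-4

# Central-difference estimate of the shadow's acceleration.
x = shadow_x(r, phi, omega, t)
accel = (shadow_x(r, phi, omega, t + h) - 2 * x
         + shadow_x(r, phi, omega, t - h)) / h ** 2

# Simple harmonic motion: acceleration is proportional to -x.
print(abs(accel + omega ** 2 * x) < 1e-4)  # True
```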
With this device, observers initially see the shadows move horizontally in 2-D. When they overlap,
the shadows can be seen to either stream (that is, to continue in the same direction) or bounce. For
individuals who tend to see streaming rather than bouncing, the 2-D percept is eventually replaced by
one of circular motion in 3-D: the kinetic depth effect (KDE). While the variable (harmonic) motion
of each shadow becomes perceptually uniform, the relative motion between them disappears and
they unite into a rigid whole. The shadows then appear as edges and no longer as independent lines.
Metzger’s explanation is that, in accordance with Gestalt theory (e.g., Wertheimer 1923; for reviews,
see Wagemans et al. 2012a,b; also Wagemans, this volume; van der Helm, this volume), the visual
system appears to adopt the simplest and most stable (least changing) interpretation of the stimulus.
Metzger noted that the initial 2-D percept might be due to the thin rods’ shadows appearing, at
first, as figures (e.g., Metzger 1935, section 19). At this stage there would be no deforming surfaces
because the space between the shadows is seen as background, and backgrounds have no shape of
their own (Rubin 1921). Later, the rods’ shadows appear as borders of continually deforming sur-
faces. Only then can a tendency to minimize deformations arise—producing the rigid 3-D percept.
This idea was put to the test by Giorgio Tampieri (1956, 1968), who used stimuli composed of colored
areas that could only be perceived as surfaces (Figure 25.7a). If the hypothesis were correct, the 3-D
percept should emerge virtually right away. For example, Tampieri rotated Figure 25.7a’s polygon
around its centre, behind a screen with a wedge-shaped aperture whose apex coincided with the
polygon’s centre (Figure 25.7b). What observers saw was one face after another of a solid rotating
pyramid (Figure 25.7c). Tampieri reported that the impression of depth was more compelling than
in Benussi and Musatti’s stimuli and indistinguishable from that produced by a real pyramid. More
importantly, the depth percept emerged instantaneously, confirming the hypothesis.
Wallach
According to Wallach and colleagues (Wallach and O’Connell 1953; Wallach et al. 1953), any 3-D
percept of a monocular, static stimulus is based on a learned association between a 2-D retinal
projection and a 3-D structure. Wallach and colleagues argued that, initially, it is the KDE that
allows the 3-D structure of an object to be perceived. Because such a structure becomes associated
with the object’s retinal projection, this projection will subsequently evoke the 3-D structure even
when the object does not move.
To test this hypothesis Wallach and colleagues investigated, using Metzger’s technique, various
simple wire objects, whose orthographic 2-D projections are interpreted as 3-D only when they
move. They presented stationary projections up to seven days after subjects had viewed the moving
ones. Nearly all subjects perceived the stationary projections as coming from 3-D objects, whereas
before exposure to the KDE, they did not. (For a related modern study, see Sinha and Poggio 1996.)
Wallach and O’Connell (1953) thought they had demonstrated the necessary and sufficient
conditions of the KDE: the projected contours had to change in both length and orientation.
Although Metzger had shown that changes in length (of the spaces between contours) were
enough, Wallach and O’Connell doubted whether the phenomenon described by Metzger could
be experienced by naïve observers—unless prompted about what they should see. However,
White and Mueser (1960) confirmed Metzger’s findings, and actually extended them to displays
with two rods only. Later studies showed that whereas the KDE is stronger with both length
and orientation changes, the former is sufficient (e.g., Börjesson and von Hofsten 1972, 1973;
Johansson and Jansson 1968).
Wallach and colleagues also proposed that stereokinesis could be explained by simultaneous
changes in the length and orientation of virtual, rather than real, lines. Consider, for example,
a rotating disc with two nested, non-concentric circles and a virtual line that connects them.
Because of orientation stability, the two circles appear to move relative to each other and this
causes the virtual line to change in both length and orientation. Thus, at least some stereokinetic
stimuli could be seen as forms of KDE (Wallach and Centrella 1990; Wallach et al. 1956).
Ullman
The rigidity assumption
Wallach and O’Connell (1953) investigated, but did not explain, the KDE. Ullman (1977; 1979a,b), calling the same phenomenon structure from motion (SfM), did, and his pioneering use of a computational approach proved very influential.
Ullman studied the orthographic projection of two transparent virtual cylinders with a
common vertical axis (Figure 25.8; for a related demonstration, see <www.youtube.com/
watch?v=RdwU28bghbQ>). Each cylinder was defined by 100 points, scattered across its virtual
surface. The cylinders were perceived as such when rotating, but appeared flat when stationary.
The perception of SfM with this type of stimulus allowed the exclusion of an explanation (based
on Gestalt grouping by common fate) in which points must be grouped into objects before any
depth is recovered. In fact, even if the points sitting on each cylinder move at the same speed in
3-D space, their 2-D projections span an ample range of velocities. In the stimulus of Figure 25.8,
various points belonging to the same cylinder move at different speeds, whereas various points
belonging to different cylinders move at the same speed.
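This velocity argument is easy to verify with a small sketch; the radii and angles below are illustrative values of our own, not Ullman's stimulus parameters.

```python
import math

def projected_speed(radius, theta, omega=1.0):
    """Instantaneous horizontal speed, under orthographic projection, of a
    point at angle `theta` on a cylinder of `radius` rotating about its
    vertical axis at angular velocity `omega`."""
    # x(t) = r*cos(theta + omega*t), so |dx/dt| = r*omega*|sin(...)|.
    return radius * omega * abs(math.sin(theta))

r_in, r_out = 1.0, 2.0  # illustrative radii for the two nested cylinders

# Points on the SAME (outer) cylinder span a wide range of projected speeds:
same_cylinder = [projected_speed(r_out, th) for th in (0.1, 0.8, math.pi / 2)]

# whereas a point on the inner cylinder (at theta = 90 degrees) matches the
# projected speed of an outer-cylinder point at sin(theta) = r_in / r_out:
inner = projected_speed(r_in, math.pi / 2)
outer = projected_speed(r_out, math.asin(r_in / r_out))
print(same_cylinder, inner, outer)
```

Grouping by common 2-D speed would therefore mix the two cylinders together, which is why common fate cannot explain the percept.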
In principle, the 2-D projections can be produced by an infinite number of rotating 3-D objects
(Eriksson 1973). Like others before him (e.g., Johansson 1975), Ullman assumed that 3-D objects
are perceived as rigid. His structure-from-motion theorem states that, given this rigidity assumption,
three distinct orthographic or perspective views of just four non-coplanar points4 suffice to narrow the possibilities down to just one correct solution.
4 How the points in one view are correctly matched to those in another view is called the correspondence problem. Because this is typically studied as a separate topic, we will not discuss it here; see Herzog and Ogmen, this volume.
Fig. 25.8 A side view of two nested cylinders exclusively defined by dots (outlines were not presented), illuminated from the right and projected orthographically onto a screen on the left.
Adapted from Ullman, Shimon, The Interpretation of Visual Motion, figure 4.1, page 135, © 1979 Massachusetts Institute of Technology, by permission of The MIT Press.
It follows that an object cannot possibly be perceived as rigid when it is not, and that incorrect “phantom structures” cannot emerge either; “the interpretation scheme is virtually immune to misinterpretation” (Ullman 1979b, p. 411). However,
2-D orthographic projection determines a 3-D object only up to a reflection about the frontal plane.
That is, the perceived 3-D object can reverse in depth, while simultaneously inverting its apparent
direction of rotation, a bistability that is unavoidable with orthographically projected stimuli.
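The reflection ambiguity can be verified numerically; this is a sketch with our own choice of point coordinates and rotation angles.

```python
import math

def orthographic_x(x0, z0, angle):
    """Orthographic image coordinate (x only) of the point (x0, z0) after
    rotation by `angle` about a vertical axis through the origin."""
    return x0 * math.cos(angle) - z0 * math.sin(angle)

# A point and its depth-reversed twin (z -> -z), rotating in opposite
# directions, cast identical orthographic images at every instant:
x0, z0 = 0.3, 0.7
for step in range(20):
    angle = 0.31 * step
    assert abs(orthographic_x(x0, z0, angle)
               - orthographic_x(x0, -z0, -angle)) < 1e-12
print("depth-reversed rotations are indistinguishable")
```

Because every point of the object obeys this identity, the whole projected image is consistent with both depth orders, hence the bistability.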
Braunstein and Andersen (1984) presented evidence against the rigidity assumption. However,
Ullman (1979a,b; 1984a) was already aware that 2-D projections could lead not only to rigid, but
also to non-rigid, SfM percepts (e.g., Braunstein 1962; Green 1961; Wallach and O’Connell 1953;
Wallach et al. 1956; White and Mueser 1960). He claimed that non-rigid SfM only occurs if the
2-D projection (a) looks 3-D even when stationary—as in the case of a distorting Necker cube—or
(b) is misperceived—as in the case of smooth contours lacking distinguishable, traceable features.
Because it tends to be initially inaccurate and to improve with each update, the internal
model accounts at least qualitatively for the fact that human SfM perception improves with
observation time. Yet, Ullman (1984b) admitted that the model had an important draw-
back: even after a long exposure time, the recovered model of a rigid 3-D object still con-
tains residual non-rigid distortions. (For an elaboration of Ullman’s ideas, see Grzywacz and
Hildreth 1987; Hildreth et al. 1995.)
rather than globally (Domini and Braunstein 1998; for a review, see Domini and Caudek 2003).
Locally computed optic-flow deformation does suffice to recover the local affine properties of objects
(Koenderink 1986; Koenderink and van Doorn 1991). By itself, though, the recovery of these affine
properties still leaves room for an infinite number of interpretations of a particular projection. Figure
25.9, for example, shows two doors. The first is narrow and swings open fast (Figure 25.9a). The sec-
ond is wide, already partially open, but swings further open more slowly (Figure 25.9b). In both cases
the projected widths of the doors shrink; and, for particular widths and rotational velocities, the two
doors produce exactly the same optic flow. In fact, the number of doors that can produce this optic
flow is infinite. Yet, at any one time, our visual system chooses only one of them as its SfM solution.
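The ambiguity can be made concrete: given one observed projected width and its rate of change, a door of any slant can be fitted to the data. The sketch below is our own rendering of the geometry of Figure 25.9, not Domini and Caudek's model; all numerical values are illustrative.

```python
import math

def door_flow(width, theta, omega):
    """Projected width and its rate of change for a door of `width`, at
    angle `theta` from the projection plane, opening at angular velocity
    `omega` (orthographic projection, viewed from above)."""
    return width * math.cos(theta), -width * omega * math.sin(theta)

# Observed optic flow: projected width 1.0, currently shrinking at rate 0.5.
w_obs, wdot_obs = 1.0, -0.5

doors = []
for theta in (0.2, 0.7, 1.2):           # any slant in (0, pi/2) will do
    width = w_obs / math.cos(theta)     # larger slants need wider doors ...
    omega = -wdot_obs / (width * math.sin(theta))   # ... rotating more slowly
    w, wdot = door_flow(width, theta, omega)
    assert abs(w - w_obs) < 1e-12 and abs(wdot - wdot_obs) < 1e-12
    doors.append((width, omega))

# The narrow door swings open fast, the wide one slowly: identical optic flow.
print(doors)
```

Each `(width, omega)` pair reproduces the observed flow exactly, so the flow alone cannot single out one door.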
It has been proposed that, even if other depth cues are ignored, the visual system need not
necessarily be constrained by optic flow alone. In all likelihood, it is also constrained by noise
within the visual system. If it is assumed that deformation values are subject to Gaussian noise,
then it turns out that, given the observed 2-D deformation, different 3-D interpretations have a
different posterior probability of being correct (Domini and Caudek 2003). As its SfM solution,
the visual system might therefore adopt the particular 3-D interpretation that maximizes this
posterior probability. In the example of Figure 25.9, it will thus adopt one particular pair of slant
and rotational velocity values to arrive at one unambiguous SfM solution. The authors suggest,
though, that in order to assess posterior probabilities some learning may be required. With this observation, we thus seem to have come full circle in this chapter; one of the first conjectures we reported here about how 3-D percepts might arise from 2-D stimuli involved this very idea that learning from past experience would be essential.
Fig. 25.9 Projections of two opening doors viewed from above. In each panel, the solid bar on the left represents a door that opens until it reaches the position indicated by the dashed bar. The solid bar on the right represents a 2-D projection screen. The dotted lines represent projection lines from the door onto the 2-D screen. The door is relatively narrow and initially closed in (a) and relatively wide and initially already partially open in (b). Notice, however, that although the doors differ in width, their projections on the screen are identical.
Reprinted from Trends in Cognitive Sciences, 7(10), Fulvio Domini and Corrado Caudek, 3-D structure perceived from dynamic information: a new theory, pp. 444–9, Copyright (2003), with permission from Elsevier.
Until now, we have only considered orthographic projections of dynamic stimuli. The projec-
tion of the world onto our retinae, however, is perspective, not orthographic. In orthographic
projections, the projected distance between two points in a frontal plane does not depend on this
plane’s depth (i.e., its distance along the z-axis). In perspective projections, in contrast, it does; it
decreases with depth until it approaches zero at the vanishing point. Consequently, in perspective
projections, the further away a point is that moves a particular distance, the smaller its projected
traversed distance—and thus, the smaller its projected velocity. Stated more generally, in perspec-
tive projections, unlike orthographic ones, projected velocity is inversely proportional to depth.
This motion perspective is indeed used by our visual system (Jain and Zaidi 2011). Still, when
objects are fairly shallow, or not very close to the observer, their perspective projection approxi-
mates an orthographic one. At this point, the use of motion perspective becomes impossible. For
this reason, even though strictly speaking it is unwarranted, it is often reasonable to assume that
the projection of an object onto our retinae is orthographic.
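The inverse relation can be sketched under a simple pinhole model; the focal length and depths below are our own illustrative values.

```python
def perspective_velocity(v_lateral, depth, focal=1.0):
    """Projected image velocity of a point moving laterally at `v_lateral`,
    at constant distance `depth`, under pinhole projection with focal
    length `focal`: x_img = focal*x/z, so dx_img/dt = focal*v/z."""
    return focal * v_lateral / depth

# Doubling the depth halves the projected velocity: the inverse
# proportionality that motion perspective exploits.
assert perspective_velocity(1.0, 2.0) == 2 * perspective_velocity(1.0, 4.0)

# For a shallow object far from the observer, projected velocities across
# its depth extent are nearly equal, approximating orthographic projection:
near_face = perspective_velocity(1.0, 100.0)
far_face = perspective_velocity(1.0, 101.0)
print(near_face, far_face)
```

In the second case the velocity difference across the object is under one per cent, which is why motion perspective carries no usable depth signal there.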
Conclusion
There is a consensus that the recovered structure in structure from motion (a) depends on
local, rather than global, computations, (b) is—under most conditions—at best affine, rather
than Euclidean, and (c) need not be rigid. A recurring idea, in both structure from motion and
stereokinesis, is that the visual system favours interpretations—whether 3-D or not—of 2-D
motion that contain as little motion as possible. Finally, an idea that has been around almost
since the beginning, but has attracted little systematic study, is that past experience may play a
key role.
Among others, studies of congenitally blind patients who gained their sight only late in life suggest that past experience may, in fact, be more important for perception than has previously been thought (Ostrovsky et al. 2006; Ostrovsky et al. 2009). These patients, for example, have
difficulty parsing a simple stimulus consisting of a circle and a square that overlap; to them, the
stimulus appears to contain three non-overlapping shapes rather than just two overlapping ones.
However, if the circle and square are set in motion relative to each other, the patients suddenly
perceive what remains invariant: not the three non-overlapping shapes, but the circle and the
square. Even more importantly, even though a critical period for the development of visual perception has presumably long passed, this experience subsequently helps the patients to parse stationary stimuli in a normal way too. It has been argued that the processing of invariants is critical to the
perception of optic flow as well (e.g., Gibson 1979; Marr 1982). If so, uncovering how this percep-
tual learning unfolds over time could be a particularly fruitful way forward in the study of both
stereokinesis and structure from motion.
References
Albertazzi, L. (2004). Stereokinetic shapes and their shadows. Perception 33: 1437–52.
Andersen, R. A. and Bradley, D. C. (1998). Perception of three-dimensional structure from motion. Trends
in Cognitive Sciences 2: 222–8.
Ban, H., Preston, T. J., Meeson, A., and Welchman, A. E. (2012). The integration of motion and disparity
cues to depth in dorsal visual cortex. Nature Neuroscience 15: 636–43.
Beghi, L., Xausa, E., and Zanforlin, M. (2008). Modelling stereokinetic phenomena by a minimum relative
motion assumption: The tilted disk, the ellipsoid and the tilted bar. Biological Cybernetics 99: 115–23.
Beghi, L., Xausa, E., De Biasio, C., and Zanforlin, M. (1991a). Quantitative determination of the
three-dimensional appearances of a rotating ellipse without a rigidity assumption. Biological Cybernetics
65: 433–40.
Beghi, L., Xausa, E., and Zanforlin, M. (1991b). Analytic determination of the depth effect in stereokinetic
phenomena without a rigidity assumption. Biological Cybernetics 65: 425–32.
Benussi, V. (1922–1923). Introduzione alla psicologia sperimentale. Lezioni tenute nell’anno 1922–23.
Typescript by Dr. C. Musatti, Fondo Benussi. Milan: Bicocca University.
Benussi, V. (1925). La suggestione e l’ipnosi come mezzi di analisi psichica reale. Bologna: Zanichelli.
Benussi, V. (1927). Zur experimentellen Grundlegung hypnosuggestiver Methoden psychischer Analyse.
Psychologische Forschung 9: 197–274.
Börjesson, E. and von Hofsten, C. (1972). Spatial determinants of depth perception in two dot patterns.
Perception & Psychophysics 11: 263–8.
Börjesson, E. and von Hofsten, C. (1973). Visual perception of motion in depth: Application of vector
model to three-dot motion patterns. Perception & Psychophysics 13: 169–79.
Braunstein, M. L. (1962). Depth perception in rotating dot patterns: Effects of numerosity and perspective.
Journal of Experimental Psychology 64: 415–20.
Braunstein, M. L. and Andersen, G. J. (1984). A counterexample to the rigidity assumption in the visual
perception of structure from motion. Perception 13: 213–17.
Bressan, P. and Vallortigara, G. (1986a). Multiple 3-D interpretations in a classic stereokinetic effect.
Perception 15: 405–8.
Bressan, P. and Vallortigara, G. (1986b). Subjective contours can produce stereokinetic effects. Perception
15: 409–12.
Bressan, P. and Vallortigara, G. (1987a). Stereokinesis with moving visual phantoms. Perception 16: 73–8.
Bressan, P. and Vallortigara, G. (1987b). Learning to see stereokinetic effects. Perception 16: 187–92.
Bressan, P. and Vallortigara, G. (1991). Illusory depth from moving subjective figures and neon colour
spreading. Perception 20: 637–44.
Bressan, P., Mingolla, E., Spillmann, L., and Watanabe T. (1997). Neon colour spreading: A review.
Perception 26: 1353–66.
D’Aversa, A. S. [Lottedyskolia] (2007, April 20). Marcel Duchamp—Anemic Cinema [Video file]. Retrieved from <http://www.youtube.com/watch?v=dXINTf8kXCc&list=UU4CDskGLhCGq0jYuHRTR81g&index=18>.
Domini, F. and Braunstein, M. L. (1998). Recovery of 3-D structure from motion is neither Euclidean nor
affine. Journal of Experimental Psychology: Human Perception and Performance 24: 1273–95.
Domini, F. and Caudek, C. (2003). 3-D structure perceived from dynamic information: A new theory.
Trends in Cognitive Sciences 7: 444–9.
Domini F., Caudek, C., and Tassinari, H. (2006). Stereo and motion information are not independently
processed by the visual system. Vision Research 46: 1707–23.
Duncan, F. S. (1975). Kinetic art: On my psychokinematic objects. Leonardo 8: 97–101.
Eriksson, E. S. (1973). Distance perception and the ambiguity of visual stimulation: A theoretical note.
Perception & Psychophysics 13: 379–81.
Fischer, G. J. (1956). Factors affecting estimation of depth with variations of the stereokinetic effect.
American Journal of Psychology 69: 252–7.
Gibson, J. J. (1979). The ecological approach to visual perception. Boston: Houghton Mifflin.
Green, B. F., Jr. (1961). Figure coherence in the kinetic depth effect. Journal of Experimental Psychology
62: 272–82.
Grzywacz, N. M. and Hildreth, E. C. (1987). Incremental rigidity scheme for recovering structure from
motion: Position-based versus velocity-based formulations. Journal of the Optical Society of America A
4: 503–18.
Hildreth, E. C., Ando, H., Andersen, R. A., and Treue, S. (1995). Recovering three-dimensional structure
with surface reconstruction. Vision Research 35: 117–35.
Isbell, L. A. (2006). Snakes as agents of evolutionary change in primate brains. Journal of Human Evolution
51: 1–35.
Jain, A. and Zaidi, Q. (2011). Discerning non-rigid 3-D shapes from motion cues. Proceedings of the
National Academy of Sciences 108: 1663–8.
Jansson, G. and Johansson, G. (1973). Visual perception of bending motion. Perception 2: 321–6.
Johansson, G. (1950). Configurations in event perception. Uppsala: Almkvist and Wiksell.
Johansson, G. (1975). Visual motion perception. Scientific American 232: 76–88.
Johansson, G. and Jansson, G. (1968). Perceived rotary motion from changes in a straight line. Perception &
Psychophysics 6: 193–8.
Koenderink, J. J. (1986). Optic flow. Vision Research 26: 161–80.
Koenderink, J. J. and van Doorn, A. J. (1991). Affine structure from motion. Journal of the Optical Society
of America A—Optics Image Science and Vision 8: 377–85.
Landy, M. S., Maloney, L. T., Johnston, E. B., and Young, M. (1995). Measurement and modeling of depth
cue combination: In defense of weak fusion. Vision Research 35: 389–412.
Liu, Z. (2003). On the principle of minimal relative motion—the bar, the circle with a dot, and the ellipse.
Journal of Vision 3: 625–9.
Mach, E. (1868). Beobachtungen über monokulare Stereoskopie. Sitzungsberichte der Wiener Akademie 58.
Mach, E. (1886). Beiträge zur Analyse der Empfindungen. Jena: Gustav Fischer. English
translation: Contributions to the analysis of the sensations, C. M. Williams (trans.), 1897. Chicago: The
Open Court.
Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual
information. New York: W.H. Freeman and Company.
Mefferd, R. B., Jr. (1968a). Perception of depth in rotating objects: 4. Fluctuating stereokinetic perceptual
variants. Perceptual and Motor Skills 27: 255–76.
Mefferd, R. B., Jr. (1968b). Perception of depth in rotating objects: 7. Influence of attributes of depth on
stereokinetic percepts. Perceptual and Motor Skills 27: 1179–93.
Mefferd, R. B., Jr. and Wieland, B. A. (1967). Perception of depth in rotating objects: 1. Stereokinesis and
the vertical-horizontal illusion. Perceptual and Motor Skills 25: 93–100.
Metzger, W. (1934). Beobachtungen über phänomenale Identität. Psychologische Forschung 19: 1–60.
Metzger, W. (1935). Tiefenerscheinungen in optischen Bewegungsfeldern. Psychologische Forschung
20: 195–260.
Metzger, W. (1975). Gesetze des Sehens. Eschborn: Klotz.
Miles, W. R. (1931). Movement interpretations of the silhouette of a rotating fan. American Journal of
Psychology 48: 392–405.
Musatti, C. L. (1924). Sui fenomeni stereocinetici. Archivio Italiano di Psicologia 3: 105–20.
Musatti, C. L. (1928). Sui movimenti apparenti dovuti ad illusione di identità di figura. Archivio Italiano di
Psicologia 6: 205–19.
Musatti, C. L. (1928–1929). Sulla percezione di forme di figura oblique rispetto al piano frontale. Rivista di
Psicologia 25: 1–14.
Musatti, C. L. (1929). Sulla plasticità reale, stereocinetica e cinematografica. Archivio Italiano di Psicologia
7: 122–37.
Musatti, C. L. (1930). I fattori empirici della percezione e la teoria della forma. Rivista di Psicologia 26: 259–64.
Musatti, C. L. (1931). Forma e assimilazione. Archivio Italiano di Psicologia 9: 61–156.
Musatti, C. L. (1937). Forma e movimento. Atti del Reale Istituto Veneto di Scienze, Lettere e Arti 97: 1–35.
Musatti, C. L. (1955). La stereocinesi e il problema della struttura dello spazio visibile. Rivista di Psicologia
49: 3–57.
Musatti, C. L. (1975). On stereokinetic phenomena and their interpretation. In: G.B. Flores D’Arcais (ed.),
Studies in Perception. Festschrift for Fabio Metelli, pp. 166–89. Milan-Florence: Martello-Giunti.
Nadler, J. W., Angelaki, D. E., and DeAngelis, G. C. (2008). A neural representation of depth from motion
parallax in macaque visual cortex. Nature 452: 642–5.
Nawrot, M. and Blake, R. (1989). Neural integration of information specifying structure from stereopsis
and motion. Science 244: 716–18.
Nawrot, M. and Blake, R. (1991). The interplay between stereopsis and structure from motion. Perception
& Psychophysics 49: 230–44.
Norman, J. F., Todd, J. T., and Orban, G. A. (2004). Perception of three-dimensional shape from specular
highlights, deformations of shading, and other types of visual information. Psychological Science
15: 565–70.
Ostrovsky, Y., Andalman, A., and Sinha, P. (2006). Vision following extended congenital blindness. Psychological Science 17: 1009–14.
Ostrovsky, Y., Meyers, E., Ganesh, S., Mathur, U., and Sinha, P. (2009). Parsing images via dynamic cues.
Psychological Science 20: 1484–91.
Piggins, D., Robinson, J., and Wilson, J. (1984). Illusory depth from slowly rotating 2-D figures: The
stereokinetic effect. In: W. N. Charman (ed.), Transactions of the First International Congress, “The
Frontiers of Optometry”. London: British College of Ophthalmic Opticians [Optometrists], Vol. 1,
pp. 171–82.
Proffitt, D. R., Rock, I., Hecht, H., and Schubert, J. (1992). Stereokinetic effect and its relation to the
kinetic depth effect. Journal of Experimental Psychology: Human Perception and Performance 18: 3–21.
Renvall, P. (1929). Zur Theorie der stereokinetischen Phänomene, in E. Kaila (ed.) Annales Universitatis
Aboensis, Series B, 10.
Robinson, J. O., Piggins, D. J., and Wilson, J. A. (1985). Shape, height and angular movement in
stereokinesis. Perception 14: 677–83.
Rogers, B. J. and Graham, M. E. (1984). After effects from motion parallax and stereoscopic
depth: Similarities and interactions. In: L. Spillman and B. R. Wooten (eds.), Sensory experience,
adaptation, and perception: Festschrift for Ivo Kohler, pp. 603–19. Hillsdale: Lawrence Erlbaum and
Associates.
Rubin, E. (1921). Visuell wahrgenommene Figuren. Copenhagen: Gyldendalske.
Sinha, P. and Poggio, T. (1996). Role of learning in three-dimensional form perception. Nature 384: 460–3.
Smith, R. (1738). A Complete System of Optics in Four Books. Cambridge: Printed for the author.
Tampieri, G. (1956). Contributo sperimentale all’analisi dei fenomeni stereocinetici. Rivista di Psicologia
50: 83–92.
Tampieri, G. (1968). Sulle condizioni del movimento stereocinetico. In: G. Kanizsa, G. Vicario (eds.),
Ricerche sperimentali sulla percezione, pp. 199–217. Trieste: Università degli Studi di Trieste.
Todd, J. T. (1998). Theoretical and biological limitations on the visual perception of three-dimensional structure from motion. In: T. Watanabe (ed.), High-level motion processing: Computational, neurophysiological and psychophysical perspectives, pp. 359–80. Cambridge: MIT Press.
Todd, J. T. and Bressan, P. (1990). The perception of 3-dimensional affine structure from minimal apparent
motion sequences. Perception & Psychophysics 48: 419–30.
Todd, J. T., Oomes, A. H. J., Koenderink, J. J., and Kappers, A. M. L. (2001). On the affine structure of
perceptual space. Psychological Science 12: 191–6.
Todorović, D. (1993). Analysis of two- and three-dimensional rigid and nonrigid motions in the
stereokinetic effect. Journal of the Optical Society of America A 10: 804–26.
Tynan, P. and Sekuler, R. (1975). Moving visual phantoms: A new contour completion effect. Science
188: 951–2.
Ullman, S. (1977). The interpretation of visual motion (Unpublished doctoral dissertation). MIT,
Cambridge, MA.
Ullman, S. (1979a). The interpretation of visual motion. Cambridge: MIT Press.
Ullman, S. (1979b). The interpretation of structure from motion. Proceedings of the Royal Society of London.
Series B, Biological Sciences 203: 405–26.
Ullman, S. (1984a). Rigidity and misperceived motion. Perception 13: 219–20.
Ullman, S. (1984b). Maximizing rigidity: The incremental recovery of 3-D structure from rigid and
nonrigid motion. Perception 13: 255–74.
Vallortigara, G., Bressan, P., and Bertamini, M. (1988). Perceptual alternations in stereokinesis. Perception 17: 31–4.
Vallortigara, G., Bressan, P., and Zanforlin, M. (1986). The Saturn illusion: A new stereokinetic effect.
Vision Research 26: 811–13.
Wagemans, J., Elder, J. H., Kubovy, M., Palmer, S. E., Peterson, M. A., Singh, M., and von der
Heydt, R. (2012a). A Century of Gestalt Psychology in Visual Perception: I. Perceptual Grouping and
Figure-Ground Organization. Psychological Bulletin 138: 1172–217.
Wagemans, J., Feldman, J., Gepshtein, S., Kimchi, R., Pomerantz, J. R., van der Helm, P.A., and van
Leeuwen, C. (2012b). A Century of Gestalt Psychology in Visual Perception: II. Conceptual and
Theoretical Foundations. Psychological Bulletin 138: 1218–52.
Wallach, H. (1935). Über visuell wahrgenommene Bewegungsrichtung. Psychologische Forschung
20: 325–80.
Wallach, H. and Centrella N. M. (1990). Identity imposition and its role in a stereokinetic effect. Perception
& Psychophysics 48: 535–42.
Wallach, H. and O’Connell, D. N. (1953). The kinetic depth effect. Journal of Experimental Psychology
45: 205–17.
Wallach, H., O’Connell, D. N., and Neisser, U. (1953). The memory effect of visual perception of
three-dimensional form. Journal of Experimental Psychology 45: 360–8.
Wallach, H., Weisz, A., and Adams, P. A. (1956). Circles and derived figures in rotation. American Journal
of Psychology 69: 48–59.
Wardle, S. G., Cass, J., Brooks, K. R., and Alais, D. (2010). Breaking camouflage: Binocular disparity
reduces contrast masking in natural images. Journal of Vision 10(14) 38: 1–12.
Weiss, Y. and Adelson, E. H. (2000). Adventures with gelatinous ellipses—constraints on models of human
motion analysis. Perception 29: 543–66.
Wertheimer, M. (1923). Untersuchungen zur Lehre von der Gestalt. II. Psychologische Forschung 4: 301–50.
English translation in: L. Spillmann (ed.), On perceived motion and figural organization, pp. 127–82.
Cambridge: MIT Press.
Wexler, M. and van Boxtel, J. A. (2005). Depth perception by the active observer. Trends in Cognitive
Sciences 9: 431–8.
White, B. W. and Mueser, G. E. (1960). Accuracy in reconstructing the arrangement of elements generating
kinetic depth displays. Journal of Experimental Psychology 60: 1–11.
Wieland, B. A. and Mefferd, R. B., Jr. (1968). Perception of depth in rotating objects: 3. Asymmetry and
velocity as the determinants of the stereokinetic effect. Perceptual and Motor Skills 26: 671–81.
Wilson, J. A., Robinson, J. O., and Piggins, D. J. (1983). Wobble cones and wobble holes—the stereokinetic effect revisited. Perception 12: 187–93.
Zanforlin, M. (1988a). The height of a stereokinetic cone: A quantitative determination of a 3-D effect from a 2-D moving pattern without a “rigidity assumption”. Psychological Research 50: 162–72.
Zanforlin, M. (1988b). Stereokinetic phenomena as good gestalts. The minimum principle applied to circles
and ellipses in rotation: A quantitative analysis and a theoretical discussion. Gestalt Theory 10: 187–214.
Zanforlin, M. (1999). La visione tridimensionale dal movimento o stereocinesi. In: F. Purghé, N. Stucchi,
A. Olivero (eds.), La percezione visiva, pp. 438–59. Turin: UTET.
Zanforlin, M. (2000). The various appearances of a rotating ellipse and the minimum principle: A review
and an experimental test with non-ambiguous percepts. Gestalt Theory 22: 157–84.
Zanforlin, M. (2003). Stereokinetic anomalous contours: Demonstrations. Axiomathes 13: 389–98.
Zanforlin, M. and Vallortigara, G. (1988). Depth effect from a rotating line of constant length. Perception & Psychophysics 44: 493–9.
Zanforlin, M. and Vallortigara, G. (1990). The magic wand: A new stereokinetic anomalous surface.
Perception 19: 447–57.
Chapter 26
Interactions of Form and Motion in the Perception of Moving Objects
Introduction
This chapter covers a few highlights from the past 20 years of research demonstrating that there is
‘motion from form’ processing. It has long been known that the visual system can construct ‘form
from motion.’ For example, appropriate dot motions on a two-dimensional computer screen can
lead to a percept of, say, a rotating three-dimensional cylinder or sphere. Less appreciated has
been the degree to which perceived motion follows from processes that rely upon rapid analyses
of form cues. Percepts that depend on such form-motion interactions reveal that form informa-
tion can be processed and integrated with motion information to determine both the perceived
velocity and shape of a moving object. These integration processes must be rapid enough to occur
in the brief period, probably less than a quarter of a second, between retinal activation and visual
experience.
Data suggest that global form analyses subserve motion processing in at least five ways (Porter
et al., 2011). Here, we describe three examples in which the analysis of form significantly influ-
ences our experience of moving objects. The following examples have been chosen not only for
their distinctiveness, but also to complement other examples described in detail within other
chapters of this book (Bruno & Bertamini; Herzog & Öğmen; Hock; Vezzani et al.). First, we
describe Transformational Apparent Motion, a phenomenon that reveals how form analyses
permit the figural segmentation dedicated to solving the problem of figure-to-figure match-
ing over time (Hsieh and Tse, 2006; Tse, 2006; Tse & Caplovitz, 2006; Tse & Logothetis, 2002).
Secondly, we describe how the size and shape of an object can influence how fast it is perceived
to rotate. These interactions reveal the way in which form analyses permit the definition of
trackable features whose unambiguous motion signals can be generalized to ambiguously mov-
ing portions of an object to solve the aperture problem (Caplovitz et al., 2006; Caplovitz & Tse,
2007a,b). Finally, we describe a number of peculiar ways in which the motions of individual
elements can interact with the perceived shape and motion of a global object constructed by
the grouping of these elements. These phenomena reveal that the form analyses that underlie
various types of perceptual grouping can lead to the generation of emergent motion signals
belonging to the perceptually grouped object that appear to underlie the conscious experience
of motion (Caplovitz & Tse, 2006, 2007b; Hsieh & Tse, 2007; Kohler et al., 2010; Kohler et al.,
2009).
Fig. 26.1 (a) Transformational Apparent Motion (TAM). Two abutting shapes are flashed in sequence, as shown on the left. The resulting percept is of
one shape smoothly extending from, and retracting back into the other, as depicted on the right. (b) TAM v. Translational Apparent Motion. In TAM
displays (top), when two frames are flashed in sequence, if the shapes in the second frame abut those in the first frame the percept is of smooth
deformation that is based on the figural parsing of the objects in both frames. However, in translational apparent motion displays (bottom), when the
shapes in the second frame do not abut those in the first frame, rigid motion to the nearest neighbor is perceived independent of any figural parsing.
Interactions of Form and Motion in the Perception of Moving Objects 543
the first and second scene, motion correspondences tend to be formed between spatially-proximal
objects. This is true even if the proximal objects have dramatically dissimilar shape and surface
characteristics. As with TAM, this would imply that the object had grossly deformed from one
scene to the next. However, this deformation is determined not on the basis of object parsing and
figural matching, but rather on the basis of spatiotemporal proximity (Ullman, 1979). As such,
observations such as these led to the discounting of the importance of form features in determin-
ing object motion in the past (Baro and Levinson, 1988; Burt and Sperling, 1981; Cavanagh and
Mather, 1989; Dawson, 1991; Kolers and Pomerantz, 1971; Kolers and von Grünau, 1976; Navon,
1976; Ramachandran et al., 1983; Victor and Conte, 1990). However, as illustrated in Figure 26.1B,
TAM can still be observed in cases where the nearest-neighbor principle is violated in favor of
matching shapes across scenes, even when those matches involve more distant figures. This has been
demonstrated to result from a set of parsing and matching principles involving the analysis of contour
relationships among successive and abutting figures (Tse et al., 1998; Tse and Logothetis, 2002).
This appears to result largely from an analysis of good contour continuity, which indicates main-
tained figural identity, and contour discontinuity, which implies figural differences. Given the
lack of figural overlap in most translational apparent motion displays, this parsing is generally
unnecessary in determining ‘what went where?’
Neural correlates
Functional magnetic resonance imaging has determined which areas of the brain show the great-
est blood oxygen level dependent (BOLD) activity in response to TAM displays, as compared
with control stimuli (Tse, 2006). Using a region of interest analysis, this study found greater activ-
ity in response to TAM than to control displays in V1, V2, V3, V4, V3A/B, hMT+, and the Lateral
Occipital Complex (LOC). An additional whole-brain analysis identified an area in the posterior
fusiform gyrus that was also found to be more active during the perception of TAM than control
stimuli. The recruitment of early retinotopically organized areas highlights the importance of the
basic visual processes (i.e. spatially specific detection of edges and contour features) that underlie
the perception of TAM. The recruitment of higher-level areas likely reflects the more global pro-
cessing that must underlie figural parsing and subsequent figural matching.
Of particular interest is the recruitment of the LOC. The LOC is now well established as playing
a fundamental role in form processing and object recognition (Grill-Spector et al., 2001; Haxby
et al., 2001; Kanwisher et al., 1996; Malach et al., 1995) and, in line with the demands of TAM, has
been shown to process global 3D object shape, as opposed to just local 2D shape features (Avidan
et al., 2002; Gilaie-Dotan et al., 2001; Grill-Spector et al., 1998, 1999; Malach et al., 1995; Mendola et al., 1999; Moore and
Engel, 2001; Tse and Logothetis, 2002; Kourtzi and Kanwisher, 2000, 2001; Kourtzi et al., 2003a).
A reasonable interpretation of the increased activity in LOC during the viewing of TAM displays
relative to control stimuli is that, in addition to processing global form and figural relationships,
the LOC also outputs this information to motion-processing areas of the brain, such as hMT+.
Given this interpretation, and the increased activity demonstrated in both LOC and hMT+ dur-
ing TAM displays, it seems that hMT+ and LOC, rather than being motion processing and form
processing areas, respectively, may both serve as part of a form/motion processing circuit. In fact,
multiple studies have shown functional and anatomical overlap between LOC and hMT+ (Ferber
et al., 2003; Kourtzi et al., 2003a; Liu and Cooper, 2003; Liu et al., 2004; Murray et al., 2003;
Stone, 1999; Zhuo et al., 2003). As noted later in this chapter, it is likely that V3A/B, an area that
also shows increased activity in response to TAM displays, plays a key role in this form/motion
processing circuit. These findings call into question the traditional view of separate motion and
form processing streams contained in the dorsal ‘where’ and ventral ‘what’ pathways (Goodale
and Milner, 1992; Ungerleider and Mishkin, 1982). Although at the very highest representational
levels ‘what’ and ‘where’ may be largely independent (Goodale and Milner, 1992; Ungerleider and
Mishkin, 1982), form and motion processes are likely to be non-independent within the process-
ing stages that serve as inputs to these later representations.
Additional work has been done using electroencephalography (EEG) to study visually-evoked
potentials (VEP) in response to TAM displays as compared with displays that only flashed, but
lacked the TAM percept (Mirabella & Norcia, 2008). This study found that the VEP waveform
evoked by pattern onset and offset was significantly more symmetrical for TAM displays than for
flashing displays. Such TAM-related processing appears within the first 150 ms of
object appearance and disappearance, once again implicating the involvement of early visual areas
in processing TAM. Furthermore, it was shown in the frequency domain that there was a notice-
able reduction in the odd-harmonic components in the frequency spectra for the TAM display,
as compared with that for a flashing patch alone. This further reflects the increased symmetry in
the TAM VEP waveform. Interestingly, as the contrast between the cue and flashing patch in the
TAM display was increased, the symmetry in the resulting VEP waveform decreased. Behavioral
data matched this observation, as the likelihood of participants perceiving TAM in the display was
strongly correlated with the symmetry of the VEP waveform. Thus, both behavioral and EEG data
further demonstrate the influence of object surface features on perceived movement.
involvement as early as V1 had previously been demonstrated. In more recent years, visual areas
V1 and V2 have been implicated in the processing of global shape (Allman et al., 1985; Fitzpatrick,
2000; Gilbert, 1992, 1998; Lamme et al., 1998) despite the traditional view that V1 is only involved
in the processing of local features (Hubel and Wiesel, 1968). However, it is still unclear whether
such activity in V1 results from bottom-up or top-down activation. A recent fMRI study found
increased activity in response to the spatial integration of individual elements into perceptually
grouped wholes in early visual cortex, possibly as early as V1 (Caplovitz et al., 2008). This was
true despite each individual element being located in the periphery of a different visual quadrant,
suggesting such increases in activity are likely due to top-down feedback.
Separate from TAM, parsing can be important in other standard and apparent motion displays,
as pooling the motion energy of multiple objects moving through the same point in space would
lead to inaccurate motion signals (Born and Bradley, 2005). Motion signals arising at occlusion
boundaries may also be spurious (Nakayama and Silverman, 1988), and parsing can facilitate
the segmentation of spurious from real motion signals. It would appear that the visual system
possesses such parsing mechanisms, which help us to accurately perceive the motion of multi-
ple overlapping objects (Hildreth et al., 1995; Nowlan and Sejnowski, 1995). While there is evi-
dence that hMT+ plays some role in such motion parsing processes (Bradley et al., 1995; Stoner
and Albright, 1992, 1996), other evidence suggests that aspects of this process, such as figure
segmentation, do not take place in hMT+. Rather, it is more likely that specialized areas, such
as LOC handle global figural segmentation and similar processes, and that the resulting neural
activity is then output to hMT+. Given such an interaction, the analyses of form and motion, and
thus shape over time and space, can be seen as interacting, inseparable processes. That form and
motion should be analyzed in an integrated spatiotemporal fashion was suggested as early as Gibson
(1979), and has been re-emphasized in more recent years (Gepshtein and Kubovy, 2000; Wallis
and Bülthoff, 2001).
This hypothesis is rooted in the works of Wallach (Wallach, 1935; Wallach & O’Connell, 1953;
Wallach et al., 1956) and Ullman (1979), which highlight the importance of such form features in
extracting 3D structure from motion (i.e. the Kinetic Depth Effect). In the case of a skinny ellipse,
the regions of high curvature located at the ends of the major axis may serve as an additional
source of motion information that is unavailable in the case of a fat ellipse. Moreover, this hypoth-
esis is consistent with the lack of effect observed with rotating rectangles whose corners may act
as trackable features regardless of whether they belong to a skinny or fat rectangle. To directly test
this hypothesis, an experiment was conducted in which the corners of a rectangle were ‘rounded
off ’ to a lesser or greater degree (Caplovitz et al., 2006). The more the corners were rounded, the
slower the rounded-rectangle appeared to rotate, thereby providing strong support in favor of the
form-defined trackable features hypothesis (see Figure 26.2A).
A third hypothesis, and one consistent with the data derived from the experiments described
above, is that the perceived speed of a rotating object is determined by the magnitudes of locally
detected 1D motion signals (Weiss and Adelson, 2000). Changes to an object’s shape will change
the distribution of component motion signals detected along its contour. When the magnitudes of
component motion signals derived from a skinny ellipse were compared with those derived from
a fat ellipse (see Figure 26.2B) it was found that they scaled in a manner wholly consistent with the
changes in perceived speed. Moreover, because the magnitudes of component motion signals scale
(a)
(b) (c)
Fig. 26.2 Trackable features and component vectors. (a) Proposed trackable features on rectangles,
ellipses, and rounded rectangles. (b) Changes in local component motion vectors of a rotating
ellipse as a function of changes in aspect ratio. (c) Changes in local component motion vectors as a
function of changes in the size of rotating objects.
548 Blair, Tse, and Caplovitz
as a function of their distance from the center of rotation, there are no differences in distribution
of such signals between skinny and fat rectangles. Although the relationship between component
motion magnitude and perceived speed is not as precise for the case of the rounded rectangles,
there is indeed a parametric decrease in the local distribution of component motion signals in the
corner regions as the corners become more and more rounded (Caplovitz et al., 2006).
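The local component-motion account can be made concrete numerically. For a point p = (a cos t, b sin t) on an ellipse rotating at angular velocity ω, the true velocity is ω(−y, x), but a local detector viewing the contour through a small aperture registers only the projection of that velocity onto the contour normal; this projection works out to ω sin t cos t (a² − b²)/‖(b cos t, a sin t)‖, which vanishes for a circle and grows with aspect ratio. A sketch with illustrative aspect ratios of our own choosing, not values from the cited experiments:

```python
import math

def mean_component_speed(a, b, omega=1.0, n=1000):
    """Mean magnitude of the contour-normal ('component') motion signal
    along an ellipse x = a*cos(t), y = b*sin(t) rotating at omega.

    A local motion detector can only register the velocity component
    normal to the contour (the aperture problem).
    """
    total = 0.0
    for i in range(n):
        t = 2 * math.pi * i / n
        # True velocity of the contour point under rotation about the origin.
        vx, vy = -omega * b * math.sin(t), omega * a * math.cos(t)
        # Outward normal direction to the ellipse at parameter t.
        nx, ny = b * math.cos(t), a * math.sin(t)
        total += abs(vx * nx + vy * ny) / math.hypot(nx, ny)
    return total / n

skinny = mean_component_speed(a=1.0, b=0.2)  # high aspect ratio
fat = mean_component_speed(a=1.0, b=0.8)     # low aspect ratio
circle = mean_component_speed(a=1.0, b=1.0)  # no aspect ratio
```

The skinny ellipse yields larger mean component signals than the fat one, mirroring its faster perceived rotation, and the circle yields none at all, consistent with a featureless rotating circle producing no motion percept.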
As such, these initial sets of experiments were unable to conclusively determine whether
shape-related changes in perceived rotational speed arise due to trackable features or the inte-
gration of local component motion signals. It was not until very recently that experiments were
conducted to explicitly dissociate between these two hypotheses (Blair et al., 2014). This study
specifically examined the case of angular velocity. A hallmark of angular velocity is that it is
size invariant. Making a rotating object smaller will not change its angular velocity. However,
doing so will systematically decrease the magnitudes of the component motion signals derived
along its contour (see Figure 26.2C). The study compared the perceived rotational speeds of
small and large objects. There were two primary findings: first, across a range of object
categories (ellipses, rectangles, stars, and rounded rectangles), smaller objects appeared to
rotate more slowly than larger objects. This finding is what would be predicted by the local-
motion integration hypothesis. However, the second main finding of the study is that the degree
to which smaller objects appear to rotate slower is dependent upon the shape of the object.
Specifically, while the relative change in perceived speed of rectangles with very rounded cor-
ners is nearly perfectly predicted by the relative magnitudes of the component motion signals,
very little change in perceived speed is observed for regular rectangles, skinny ellipses, and star
shapes. Indeed, simply reducing the degree to which the corners of the rounded rectangles were
rounded off reduced the size of the effect on perceived rotational speed. These two findings suggest
that both hypotheses are likely to be true: the perceived speed of a rotating object is determined
by a combination of locally detected motion signals, which comprise a scale-variant source
of information, and the motion of form-defined trackable features, which comprise a scale-
invariant source of information.
What is important to note is that both sources of information are shape-dependent. However,
only the trackable feature motion requires an analysis of form, because in order to provide a use-
ful source of information, the trackable feature must first be classified as belonging to the object
that is rotating (see figural parsing above). Moreover, the motion of the trackable feature must be
attributed to other locations along the object’s contour. Lastly, in order to produce a size-invariant
representation (i.e. angular velocity), the motion of a trackable feature must be integrated with
information about its distance from the center of rotation, a necessarily non-local computation.
In the case of objects that simultaneously translate as they rotate, it appears that the
rotational motion around the object’s center is segmented from the overall translational motion of
the object (Porter et al., 2011). This suggests that the size invariant signal derived from the motion
of a trackable feature involves the computation of the object’s center.
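This dissociation can be checked arithmetically. Scaling an ellipse by a factor s scales every contour velocity, and hence every component motion magnitude, by s, whereas the angular velocity recovered from a trackable feature (its linear speed divided by its distance from the center of rotation) is unchanged. A brief numeric sketch, taking the high-curvature tip of an ellipse as an assumed trackable feature and using illustrative sizes:

```python
import math

def component_speed_at(a, b, t, omega=1.0):
    """Contour-normal motion magnitude at parameter t on an ellipse
    x = a*cos(t), y = b*sin(t) rotating at angular velocity omega."""
    vx, vy = -omega * b * math.sin(t), omega * a * math.cos(t)
    nx, ny = b * math.cos(t), a * math.sin(t)
    return abs(vx * nx + vy * ny) / math.hypot(nx, ny)

def tip_angular_velocity(a, omega=1.0):
    """Angular velocity recovered from the trackable tip at (a, 0):
    the tip's linear speed divided by its distance from the center.
    Size cancels out, so the estimate is scale-invariant."""
    return (omega * a) / a

t = math.pi / 4
full = component_speed_at(1.0, 0.3, t)
half = component_speed_at(0.5, 0.15, t)  # same shape, half the size
# Component signals halve with size; the trackable-feature estimate does not.
```

Halving the object halves the locally detected component magnitudes (predicting slower perceived rotation) while the trackable-feature estimate of angular velocity stays fixed, which is exactly the pattern of partial size dependence reported by Blair et al. (2014).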
The effects of object shape on the perceived speed of rotational motion have also been observed
and examined in the context of motion fading. Motion fading occurs when a slowly drifting or
rotating pattern appears to slow down and then momentarily stop, while the form of the pattern is
still visible (Campbell and Maffei, 1979, 1981; Lichtenstein, 1963; Spillman and De Weerd, 2003).
Experiments have shown that the presence of trackable features extends the time that it takes
motion fading to occur for rotating objects, as compared with rotating objects that do not
possess distinct trackable features (Hsieh and Tse, 2007). Furthermore, if the trackable features
of objects such as ellipses are made even more distinct by increasing a rotating ellipse’s aspect
ratio, it takes even longer for motion fading to occur (Kohler et al., 2010). It was further shown
that the effect of shape on the time for motion fading to occur is mediated by the perceived speed
of the rotating object. For example, a fatter ellipse will appear slower than a skinny ellipse and will
therefore take less time for motion fading to occur. Thus, by influencing the perceived speed of
rotation, an object’s contour features dictate how long it takes for a slowly rotating object to appear
to cease moving. This demonstrates the importance of the form-motion interaction that underlies
the role of trackable features in the perception of rotational motion. Not only do trackable features
directly affect perceived speed, they also indirectly affect other aspects of motion perception.
Neural correlates
Clearly, there is strong behavioral evidence for the existence of multiple form–motion interac-
tions. The question stands: where in the brain might these interactions take place? In the context
of the role form plays in the perceived speed of rotating objects, evidence from fMRI studies has
implicated the involvement of V3A. When shown rotating objects that modulated their contour
curvature at one point while remaining constant in speed and area, BOLD activity was also modu-
lated in area V3A of observers’ brains (Caplovitz & Tse, 2007b). Previous research focused on this
area has led to findings consistent with the interpretation that V3A makes use of areas of contour
curvature to process the rotational motion of objects. For one, it has been shown in several studies
that area V3A is motion selective (Tootell et al., 1997; Vanduffel et al., 2002). Motion processing
is only half of the story, and sure enough, percent BOLD signal change in V3A has also been cor-
related with contour and figural processing, even when contours and figures are not consciously
perceived (Schira et al., 2004). To go a step further, BOLD activity in V3A has been correlated
with various additional form-motion interactions. Specifically, it has been shown multiple times
that there is a greater percent BOLD signal change in V3A when participants observe coher-
ent, as opposed to random motion (Braddick et al., 2000, 2001; Moutoussis et al., 2005; Vaina
et al., 2003). Finally, it was found that V3A is more responsive to rotational than translational
motion (Koyama et al., 2005). In combination, these various findings indicate that V3A makes use
of form information, specifically contour curvature, to process motion information about moving
objects. The strongest activity may result in situations where the motion is more difficult for the
visual system to interpret, such as with rotation (Kaiser, 1990).
Neurophysiological data recorded in area MT of macaques has further elucidated some specifics
of how areas of contour curvature on objects may be used in processing object motion. Specifically,
certain neurons in macaque MT have been shown to respond more to the terminator motion of
lines than to the ambiguous motion signals present along a line’s contour. In addition, these neu-
rons respond strongest when terminators are intrinsically owned, as opposed to when they are
extrinsic (Pack et al., 2004). Interestingly, this process is not instantaneous, as it takes roughly 60 ms
for neurons in macaque MT to shift their response properties from those consistent with motion
perpendicular to a moving line, regardless of its actual direction of motion, to those consistent with
the true motion of the line independent of its orientation (Pack and Born, 2001). Behavioral data
examining initial pursuit eye movements support this finding, in that observers will initially follow
the motion perpendicular to the moving line before exhibiting eye movements that follow the
unambiguous motion of line terminators. Further neurophysiological evidence has indicated that
neurons of this sort (dubbed end-stopped neurons) may be present in the visual system as early as
area V1 (Pack et al., 2003). This would mean that trackable feature information could be extracted
and utilized as early on as V1 in the visual processing stream. All these findings could help explain
how the visual system is capable of overcoming the aperture problem under various circumstances
using trackable features, and also, why it does not always do so perfectly.
Fig. 26.3 Emergent motion on the basis of perceptual grouping. (a) When four dot pairs, each
pair rotating around its own common center, are perceived as separate objects, they are perceived
to rotate faster than when dots are perceived to form the corners of two squares translating in
a circular pattern with one sliding in front of the other. (b) The percept of individual elements or
square corners may be biased by element shape and arrangement, with individual elements most
likely to be seen when misaligned (top), and squares more likely to be seen when the elements are
aligned (bottom).
Wallach and O’Connell, 1953). Recently, it has been demonstrated that movement-dependent
shape distortions can arise from local form/motion interactions in elements grouped to
form a larger perceived object. As previously mentioned, elongated objects are perceived to move
faster when moving in a direction parallel, as opposed to orthogonal, to their elongated axis
(Georges et al., 2002; Seriès et al., 2002). Taking advantage of this observation, an experiment
was conducted in which differentially elongated Gaussian blobs were used to form the corners
of illusory four-sided translating shapes. In the experiment, the blobs were oriented such
that those on the leading edge of the illusory object would be either parallel or orthogonal to
the direction of motion, and those on the trailing edge of the illusory shape would be oriented
orthogonally to those on the leading edge. It was found that when those on the leading edge were
parallel to the direction of motion, the resulting illusory object appeared to be elongated, while
the opposite effect was observed when blobs on the leading edge were oriented orthogonally to
the direction of motion, as depicted in Figure 26.4 (McCarthy et al., 2012). This example reveals
how form and motion interact with each other across a range of visual processing stages from
very early (local orientation dependent perceived speed) to later representations of perceived
global shape.
As mentioned in the introduction, a 3D representation of a moving object can be derived from
appropriate 2D velocities of seemingly random dot displays. In such form-from-motion displays,
depth, 3D object shape, and 3D object motion may be perceived if seemingly random dot fields
are moved in ways consistent with the dots in motion being affixed to a particular 3D shape
(Green, 1961). This process represents a form of perceptual grouping in which the individual
dots are grouped into a single perceptual whole. Intriguingly, the shape and motion of the per-
ceived object do not always match what would be predicted based upon the individual motions
of the dots that make up the display. Instead, characteristics of the shape and motion of the global
object depend upon the perceived shape and motion of the object as a whole. For example, perceived variations
in the angular velocity of rotating 3D shapes simulated by dot fields were more closely tied to the
perceived deformation of the rotating shapes than to actual variations in their angular velocities
(Domini et al., 1998). Similarly, the perceived slant of a simulated surface varies as a function
of the angular velocity with which it rotates when other factors are kept constant (Domini &
Caudek, 1999). These various effects have been demonstrated both when objects are rotated,
while being passively observed, and when object motion is a function of simulated optic flow in
response to observer movement (Caudek et al., 2011; Fantoni et al., 2010, 2012). Additionally,
even when binocular visual cues such as disparity are available, such biases and misperceptions
are still observed (Domini et al., 2006). The perception of these effects and visual biases is also
correlated with changes in grasping movements for the simulated objects (Foster et al., 2011).
A model based on the assumption that the analysis of 3D shape is performed locally accounts
well for successful and unsuccessful interpretation of 3D shape and the movement of 3D shapes
by human observers, as demonstrated by a variety of form motion interactions observed using
this paradigm (Domini & Caudek, 2003). Thus, not only is visual perception affected by form
motion interactions, but the practical behaviors in response to such perceptions are also adjusted
accordingly.
Conclusion
These results can be taken as further evidence for the inherently constructive nature of motion
processing, and the importance of form operators in motion processing. While it is not clear
where in the brain the analysis of form occurs that results in the perception of rotational
motion, it probably occurs within some or all of the neural circuitry that realizes the form–
motion interactions described above. These results support the general thesis that there are,
broadly speaking, two stages to motion perception – one, where motion energy is detected by
cells in early visual areas tuned to motion magnitude and direction, and another stage where
this detected information is operated upon by grouping and other visual operators that then
construct the motion that will be perceived (Caplovitz & Tse, 2007a; Hsieh & Tse, 2007; Kohler
et al., 2009, 2010). This means that perceived motion, while constructed on the basis of locally
detected motion information, is not itself detected or even present in the stimulus. It should
also be noted that, while we have focused on specific examples from only three broad catego-
ries of form motion interaction, these examples represent only a small subset of what has been
identified and tested to date, with further examples ranging as far as the processes under-
lying the perception of biological motion and how motion is conveyed through static images
(i.e. motion streaks).
Classically, form and motion perception were considered to be mediated by independent pro-
cesses in the visual system. Indeed there is a good deal of evidence for such independence at
the earliest stages of visual processing, as well as at the highest levels of perceptual represen-
tation. However, there is growing evidence suggesting that the mechanisms that process form
and motion characteristics of the visual scene mutually interact in numerous and complex ways
across a range of mid-level visual processing stages. These form-motion interactions appear to
help resolve fundamental ambiguities that arise at the earliest stages in the processing of the reti-
nal image. By combining information from both domains, these form motion interactions allow
potentially independent high-level representations of an object’s shape and motion to more accu-
rately reflect what is actually occurring in the world around us.
Acknowledgment
This work was supported by an Institutional Development Award (IDeA) from the National
Institute of General Medical Sciences of the National Institutes of Health under grant number
1P20GM103650-01, and a grant from the National Eye Institute: 1R15EY022775.
References
Allman, J. M., Miezin, F., and McGuinness, E. (1985). Stimulus specific responses from beyond the classical
receptive field: neurophysiological mechanisms for local-global comparisons in visual neurons. Ann Rev
Neurosci 8: 407–430.
Altmann, C. F., Bülthoff, H. H. and Kourtzi, Z. (2003). Perceptual organization of local elements into
global shapes in the human visual cortex. Curr Biol 13(4): 342–349.
Anstis, S. (2003). Levels of motion perception. In Levels of Perception, edited by L. Harris & M. Jenkin,
pp. 75–99. New York: Springer.
Anstis, S., and Kim, J. (2011). Local versus global perception of ambiguous motion displays. J Vision
11(3): 13, 1–12. Available at: http://www.journalofvision.org/content/11/3/13.
Avidan, G., Harel, M., Hendler, T., Ben-Bashat, D., Zohary, E., and Malach, R. (2002). Contrast sensitivity
in human visual areas and its relationship to object recognition. J Neurophysiol 87: 3102–3116.
Baloch, A. A., and Grossberg, S. (1997). A neural model of high-level motion processing: line motion and
form-motion dynamics. Vision Res 37(21): 3037–3059.
Baro, J. A., and Levinson, E. (1988). Apparent motion can be perceived between patterns with dissimilar
spatial frequencies. Vision Res 28: 1311–1313.
Blair, C. B., Goold, J., Killebrew, K., & Caplovitz, G. P. (2014). Form features provide a cue to the angular
velocity of rotating objects. Journal of Experimental Psychology: Human Perception and Performance
40(1): 116–128. doi: 10.1037/a0033055
Born, R. T., and Bradley, D. C. (2005). Structure and function of visual area MT. Ann Rev Neurosci
28: 157–189.
Braddick, O. J., O’Brien, J. M., Wattam-Bell, J., Atkinson, J., Hartley, T., and Turner, R. (2001). Brain areas
sensitive to coherent visual motion. Perception 30: 61–72.
Braddick, O. J., O’Brien, J. M., Wattam-Bell, J., Atkinson, J., and Turner, R. (2000). Form and motion
coherence activate independent but not dorsal/ventral segregated, networks in the human brain. Curr
Biol 10: 731–734.
Bradley, D. C., Qian, N., and Andersen, R. A. (1995). Integration of motion and stereopsis in middle
temporal cortical area of macaques. Nature 373(6515): 609–611.
Brown, J. F. (1931). The visual perception of velocity. Psychol Res 14(1): 199–232.
Bruno, N., & Bertamini, M. (2013). Perceptual organization and the aperture problem. In J. Wagemans
(Ed.), Oxford Handbook of Perceptual Organization (in press). Oxford, U.K.: Oxford University Press.
Burt, P., and Sperling, G. (1981). Time, distance, and feature trade-offs in visual apparent motion. Psychol
Rev 88: 171–195.
Campbell, F. W., & Maffei, L. (1979). Stopped visual motion. Nature 278: 192–193.
Campbell, F. W., & Maffei, L. (1981). The influence of spatial frequency and contrast on the perception of
moving patterns. Vision Res 21: 713–721.
Caplovitz, G. P., Hsieh, P-J., & Tse, P. U. (2006). Mechanisms underlying the perceived angular velocity of a
rigidly rotating object. Vision Res 46(18): 2877–2893.
Caplovitz, G. P., & Tse, P. U. (2007a). Rotating dotted ellipses: motion perception driven by grouped figural
rather than local dot motion signals. Vision Res 47(15): 1979–1991.
Caplovitz, G. P., & Tse, P. U. (2007b). V3A processes contour curvature as a trackable feature for the
perception of rotational motion. Cerebral Cortex 17(5): 1179–1189.
Caplovitz, G. P., Barroso, D. J., Hsieh, P. J., & Tse, P. U. (2008). fMRI reveals that non-local processing
in ventral retinotopic cortex underlies perceptual grouping by temporal synchrony. Hum Brain Map
29(6): 651–661.
Caudek, C., Fantoni, C., & Domini, F. (2011). Bayesian modeling of perceived surface slant from
actively-generated and passively-observed optic flow. PLoS ONE 6(4): 1–12.
Cavanagh, P., Arguin, M., and von Grünau, M. (1989). Interattribute apparent motion. Vision Res
29(9): 1197–1204.
Cavanagh, P., and Mather, G. (1989). Motion: the long and short of it. Spatial Vis 4: 103–129.
Dawson, M. R. W. (1991). The how and why of what went where in apparent motion: modeling solutions to
the motion correspondence problem. Psychol Rev 98(4): 569–603.
Domini, F., & Caudek, C. (1999). Perceiving surface slant from deformation of optic flow. J Exp Psychol
Hum Percept Perform 25(2): 426–444.
Domini, F., & Caudek, C. (2003). 3-D structure perceived from dynamic information: a new theory. Trends
Cogn Sci 7(10): 444–449.
Interactions of Form and Motion in the Perception of Moving Objects 555
Domini, F., Caudek, C., & Tassinari, H. (2006). Stereo and motion information are not independently
processed by the visual system. Vision Res 46: 1707–1723.
Domini, F., Caudek, C., Turner, J., & Favretto, A. (1998). Discriminating constant from variable angular
velocities in structure from motion. Percept Psychophys 60(5): 747–760.
Downing, P., and Treisman, A. (1995). The shooting line illusion: attention or apparent motion? Invest
Ophthalmol Vision Sci 36: S856.
Downing, P., and Treisman, A. (1997). The line motion illusion: attention or impletion? J Exp Psychol Hum
Percept Perform 23(3): 768–779.
Fantoni, C., Caudek, C., & Domini, F. (2010). Systematic distortions of perceived planar surface motion in
active vision. J Vision 10(5): 12, 1–20.
Fantoni, C., Caudek, C., & Domini, F. (2012). Perceived slant is systematically biased in actively-generated
optic flow. PLoS ONE 7(3): 1–12.
Faubert, J., and von Grünau, M. (1995). The influence of two spatially distinct primers and attribute
priming on motion induction. Vision Res 35(22): 3119–3130.
Ferber, S., Humphrey, G. K. and Vilis, T. (2003). The lateral occipital complex subserves the perceptual
persistence of motion-defined groupings. Cereb Cortex 13: 716–721.
Fitzpatrick, D. (2000). Seeing beyond the receptive field in primary visual cortex. Curr Opin Neurobiol
10: 438–443.
Foster, R., Fantoni, C., Caudek, C., & Domini, F. (2011). Integration of disparity and velocity information
for haptic and perceptual judgments of object depth. Acta Psychol 136: 300–310.
Georges, S., Seriès, P., Frégnac, Y., & Lorenceau, J. (2002). Orientation dependent modulation of apparent
speed: Psychophysical evidence. Vision Res 42: 2757–2772.
Gepshtein, S., and Kubovy, M. (2000). The emergence of visual objects in spacetime. Proc Natl Acad Sci
USA 97(14): 8186–8191.
Gibson, J. J. (1979). The Ecological Approach to Visual Perception. Boston: Houghton Mifflin.
Gilaie-Dotan, S., Ullman, S., Kushnir, T., and Malach, R. (2001). Shape-selective stereo processing in
human object-related visual areas. Hum Brain Map 15: 67–79.
Goodale, M., and Milner, A. (1992). Separate visual pathways for perception and action. Trends Neurosci
15: 20–25.
Green, B. F., Jr. (1961). Figure coherence in the kinetic depth effect. J Exp Psychol 62(3): 272–282.
Gilbert, C. D. (1992). Horizontal integration and cortical dynamics. Neuron 9: 1–13.
Gilbert, C. D. (1998). Adult cortical dynamics. Physiol Rev 78: 467–485.
Grill-Spector, K., Kushnir, T., Edelman, S., Avidan, G., Itzchak, Y., and Malach, R. (1999). Differential
processing of objects under various viewing conditions in the human lateral occipital complex. Neuron
24: 187–203.
Grill-Spector, K., Kushnir, T., Edelman, S., Itzchak, Y., and Malach, R. (1998). Cue-invariant activation in
object-related areas of the human occipital lobe. Neuron 21: 191–202.
Grill-Spector, K., Kourtzi, Z., and Kanwisher, N. (2001). The lateral occipital complex and its role in object
recognition. Vision Res 41: 1409–1422.
Haxby, J. V., Gobbini, M. I., Furey, M. L., Ishai, A., Schouten, J. L., and Pietrini, P. (2001). Distributed
and overlapping representations of faces and objects in ventral temporal cortex. Science
293(5539): 2425–2430.
Herzog, M. H., & Öğmen, H. (2013). Apparent motion and reference frames. In J. Wagemans (Ed.), Oxford
Handbook of Perceptual Organization (in press). Oxford, U.K.: Oxford University Press.
Hikosaka, O., Miyauchi, S., and Shimojo, S. (1991). Focal visual attention produces motion sensation in
lines. Invest Ophthalmol Vis Sci 32(4): 176.
Hikosaka, O., Miyauchi, S., and Shimojo, S. (1993a). Focal visual attention produces illusory temporal
order and motion sensation. Vision Res 33(9): 1219–1240.
556 Blair, Tse, and Caplovitz
Hikosaka, O., Miyauchi, S., and Shimojo, S. (1993b). Visual attention revealed by an illusion of motion.
Neurosci Res 18(1): 11–18.
Hildreth, E. C., Ando, H., Andersen, R. A., and Treue, S. (1995). Recovering three-dimensional structure
from motion with surface reconstruction. Vision Res 35(1): 117–137.
Hock, H. S. (2013). Dynamic grouping motion: A method for determining perceptual organization for
objects with connected surfaces. In J. Wagemans (Ed.), Oxford Handbook of Perceptual Organization
(in press). Oxford, U.K.: Oxford University Press.
Hsieh, P-J., Caplovitz, G. P., and Tse, P. U. (2005). Illusory rebound motion and the motion continuity
heuristic. Vision Res 45(23): 2972–2985.
Hsieh, P-J., and Tse, P. U. (2006). Stimulus factors affecting illusory rebound motion. Vision Res
46(12): 1924–1933.
Hsieh, P-J., & Tse, P. U. (2007). Grouping inhibits motion fading by giving rise to virtual trackable features.
J Exp Psychol Hum Percept Perform 33: 57–63.
Hubel, D. H., and Wiesel, T. N. (1968). Receptive fields and functional architecture of monkey striate
cortex. J Physiol 195: 215–243.
Kaiser, M. K. (1990). Angular velocity discrimination. Percept Psychophys 47: 149–156.
Kanizsa, G. (1951). Sulla polarizzazione del movimento gamma [The polarization of gamma movement].
Arch Psichol Neurol Psichiatr 3: 224–267.
Kanizsa, G. (1979). Organization in Vision: Essays on Gestalt Perception. New York: Praeger.
Kanwisher, N., Chun, M. M., McDermott, J., and Ledden, P. J. (1996). Functional imaging of human
visual recognition. Brain Res Cogn Brain Res 5(1–2): 55–67.
Kenkel, F. (1913). Untersuchungen über den Zusammenhang zwischen Erscheinungsgrösse und
Erscheinungsbewegung bei einigen sogenannten optischen Täuschungen. Zeitschrift für Psychologie
67: 358–449.
Kohler, P. J., Caplovitz, G. P., Hsieh, P-J., Sun, J., & Tse, P. U. (2010). Motion fading is driven by perceived,
not actual angular velocity. Vision Res 50: 1086–1094.
Kohler, P. J., Caplovitz, G. P., & Tse, P. U. (2009). The whole moves less than the spin of its parts. Attention,
Percept Psychophys 71(4): 675–679.
Kolers, P. A., and Pomerantz, J. R. (1971). Figural change in apparent motion. J Exp Psychol 87: 99–108.
Kolers, P. A., and von Grünau, M. (1976). Shape and color in apparent motion. Vision Res
16: 329–335.
Koyama, S., Sasaki, Y., Andersen, G. J., Tootell, R. B., Matsuura, M., and Watanabe, T. (2005). Separate
processing of different global-motion structures in visual cortex is revealed by FMRI. Curr Biol
15(22): 2027–2032.
Kourtzi, Z., Erb, M., Grodd, W., and Bülthoff, H. H. (2003a). Representation of the perceived 3-D object
shape in the human lateral occipital complex. Cereb Cortex 13(9): 911–920.
Kourtzi, Z., and Kanwisher, N. (2000). Cortical regions involved in perceiving object shape. J Neurosci
20: 3310–3318.
Kourtzi, Z., and Kanwisher, N. (2001). Representation of perceived object shape by the human lateral
occipital complex. Science 293: 1506–1509.
Kourtzi, Z., Tolias, A. S., Altmann, C. F., Augath, M., and Logothetis, N. K. (2003b). Integration of local
features into global shapes. Monkey and human FMRI studies. Neuron 37(2): 333–346.
Lamme, V. A., Super, H., and Spekreijse, H. (1998). Feedforward, horizontal, and feedback processing in
the visual cortex. Curr Opin Neurobiol 8: 529–535.
Lichtenstein, M. (1963). Spatio-temporal factors in cessation of smooth apparent motion. J Opt Soc Am
53: 304–306.
Liu, T., and Cooper, L. A. (2003). Explicit and implicit memory for rotating objects. J Exp Psychol Learn
Mem Cogn 29: 554–562.
Liu, T., Slotnick, S. D., and Yantis, S. (2004). Human MT+ mediates perceptual filling-in during apparent
motion. NeuroImage 21(4): 1772–1780.
Malach, R., Reppas, J. B., Benson, R. R., Kwong, K. K., Jiang, H., Kennedy, W. A., Ledden, P. J., Brady,
T. J., Rosen, B. R., and Tootell, R. B. (1995). Object-related activity revealed by functional magnetic
resonance imaging in human occipital cortex. Proc Natl Acad Sci 92(18): 8135–8139.
McCarthy, J. D., Cordeiro, D., and Caplovitz, G. P. (2012). Local form-motion interactions influence
global form perception. Attention Percept Psychophys 74: 816–823.
Mendola, J. D., Dale, A. M., Fischl, B., Liu, A. K., and Tootell, R. B. H. (1999). The representation of real
and illusory contours in human cortical visual areas revealed by fMRI. J Neurosci 19: 8560–8572.
Mirabella, G., and Norcia, A. N. (2008). Neural correlates of transformational apparent motion. Perception
37: 1368–1379.
Moore, C., and Engel, S. A. (2001). Neural response to perception of volume in the lateral occipital
complex. Neuron 29: 277–286.
Moutoussis, K., Keliris, G., Kourtzi, Z., and Logothetis, N. (2005). A binocular rivalry study of motion
perception in the human brain. Vision Res 45(17): 2231–2243.
Murray, S. O., Olshausen, B. A., and Woods, D. L. (2003). Processing shape, motion and three-dimensional
shape-from-motion in the human cortex. Cereb Cortex 13: 508–516.
Nakayama, K., and Silverman, G. H. (1988b). The aperture problem II. Spatial integration of velocity
information along contours. Vision Res 28(6): 747–753.
Navon, D. (1976). Irrelevance of figural identity for resolving ambiguities in apparent motion. J Exp Psychol
Hum Percept Perform 2: 130–138.
Nowlan, S. J., and Sejnowski, T. J. (1995). A selection model for motion processing in area MT of primates.
J Neurosci 15(2): 1195–1214.
Pack, C. C., and Born, R. T. (2001). Temporal dynamics of a neural solution to the aperture problem in
visual area MT of macaque brain. Nature 409(6823): 1040–1042.
Pack, C. C., Gartland, A. J., and Born, R. T. (2004). Integration of contour and terminator signals in visual
area MT of alert macaque. J Neurosci 24(13): 3268–3280.
Pack, C. C., Livingstone, M. S., Duffy, K. R., and Born, R. T. (2003). End-stopping and the aperture
problem: two-dimensional motion signals in macaque V1. Neuron 39(4): 671–680.
Porter, K. B., Caplovitz, G. P., Kohler, P. J., Ackerman, C. M., & Tse, P. U. (2011). Rotational and
translational motion interact independently with form. Vision Res 51: 2478–2487.
Ramachandran, V. S., Ginsburg, A. P., and Anstis, S. M. (1983). Low spatial frequencies dominate apparent
motion. Perception 12: 457–461.
Ramachandran, V. S., and Gregory, R. L. (1978). Does colour provide an input to human motion
perception? Nature 275: 55–56.
Schira, M. M., Fahle, M., Donner, T. H., Kraft, A., and Brandt, S. A. (2004). Differential contribution of
early visual areas to the perceptual process of contour processing. J Neurophysiol 91(4): 1716–1721.
Seriès, P., Georges, S., Lorenceau, J., & Frégnac, Y. (2002). Orientation dependent modulation of apparent
speed: a model based on the dynamics of feedforward and horizontal connectivity in V1 cortex. Vision
Res 42: 2781–2797.
Spillmann, L., & De Weerd, P. (2003). Mechanisms of surface completion: perceptual filling-in of texture.
In Filling-in: From Perceptual Completion to Cortical Reorganization, edited by L. Pessoa & P. De Weerd,
pp. 81–105. Oxford: Oxford University Press.
Stelmach, L. B., and Herdman, C. M. (1991). Directed attention and perception of temporal order. J Exp
Psychol Hum Percept Perform 17(2): 539–550.
Stelmach, L. B., Herdman, C. M., and McNeil, K. R. (1994). Attentional modulation of visual processes
in motion perception. J Exp Psychol Hum Percept Perform 20(1): 108–121.
Sternberg, S., and Knoll, R. L. (1973). The perception of temporal order: fundamental issues and
a general model. In: Attention and Performance, Vol. IV, edited by S. Kornblum, pp. 629–685.
New York: Academic Press.
Stone, J. V. (1999). Object recognition: view-specificity and motion-specificity. Vision Res 39: 4032–4044.
Stoner, G. R., and Albright, T. D. (1992). Motion coherency rules are form-cue invariant. Vision Res
32(3): 465–475.
Stoner, G. R., and Albright, T. D. (1996). The interpretation of visual motion: evidence for surface
segmentation mechanisms. Vision Res 36(9): 1291–1310.
Titchener, E. B. (1908). Lectures on the Elementary Psychology of Feeling and Attention. New York: Macmillan.
Tootell, R. B., Mendola, J. D., Hadjikhani, N. K., Ledden, P. J., Liu, A. K., Reppas, J. B., Sereno, M. I., and
Dale, A. M. (1997). Functional analysis of V3A and related areas in human visual cortex. J Neurosci
17(18): 7060–7078.
Tse, P. U. (2006). Neural correlates of transformational apparent motion. NeuroImage 31(2): 766–773.
Tse, P. U., and Caplovitz, G. P. (2006). Contour discontinuities subserve two types of form analysis that
underlie motion processing. In: Progress in Brain Research 154: Visual Perception. Part I. Fundamentals
of Vision: Low and Mid-level Processes in Perception, edited by S. Martinez-Conde, S. L. Macknick,
L. M. Martinez, J-M. Alonso, and P. U. Tse, pp. 271–292. Amsterdam: Elsevier.
Tse, P. U., and Cavanagh, P. (1995). Line motion occurs after surface parsing. Invest Ophth Vision Sci
36: S417.
Tse, P. U., Cavanagh, P., and Nakayama, K. (1996). The roles of attention in shape change apparent motion.
Invest Ophthalmol Vision Sci 37: S213.
Tse, P. U., Cavanagh, P., and Nakayama, K. (1998). The role of parsing in high-level motion processing.
In: High-Level Motion Processing: Computational, Neurobiological, and Psychophysical Perspectives, edited
by T. Watanabe, pp. 249–266. Cambridge, MA: MIT Press.
Tse, P. U., and Logothetis, N. K. (2002). The duration of 3-d form analysis in transformational apparent
motion. Percept Psychophys 64(2): 244–265.
Ullman, S. (1979). The Interpretation of Visual Motion. Cambridge, MA: MIT Press.
Ungerleider, L., and Mishkin, M. (1982). Two cortical visual systems. In: Analysis of Visual Behavior, edited
by D. Ingle, M. Goodale, and R. Mansfield, pp. 549–586. Cambridge, MA: MIT Press.
Vaina, L. M., Grzywacz, N. M., Saiviroonporn, P., LeMay, M., Bienfang, D. C., and Conway, A. (2003).
Can spatial and temporal motion integration compensate for deficits in local motion mechanisms?
Neuropsychologia 41: 1817–1836.
Vanduffel, W., Fize, D., Peuskens, H., Denys, K., Sunaert, S., Todd, J. T., and Orban, G. A. (2002).
Extracting 3D from motion: differences in human and monkey intraparietal cortex. Science
298: 413–415.
Vezzani, S., Kramer, P., & Bressan, P. (2013). Stereokinetic effect, kinetic depth effect, and structure
from motion. In J. Wagemans (Ed.), Oxford Handbook of Perceptual Organization (in press). Oxford,
U.K.: Oxford University Press.
Victor, J. D., and Conte, M. M. (1990). Motion mechanisms have only limited access to form information.
Vision Res 30: 289–301.
von Grünau, M., and Faubert, J. (1994). Intraattribute and interattribute motion induction. Perception
23(8): 913–928.
von der Heydt, R., Peterhans, E., and Baumgartner, G. (1984). Illusory contours and cortical neuron
responses. Science 224(4654): 1260–1262.
Wallach, H. (1935). Über visuell wahrgenommene Bewegungsrichtung. Psychol Forsch 20: 325–380.
Wallach, H., and O’Connell, D. N. (1953). The kinetic depth effect. J Exp Psychol 45(4): 205–217.
Wallach, H., Weisz, A., & Adams, P. A. (1956). Circles and derived figures in rotation. Am J Psychol
69: 48–59.
Wallis, G., and Bülthoff, H. (2001). Effects of temporal association on recognition memory. Proc Natl Acad
Sci USA 98(8): 4800–4804.
Weiss, Y., & Adelson, E. H. (2000). Adventures with gelatinous ellipses—constraints on models of human
motion analysis. Perception 29: 543–566.
Wertheimer, M. (1961). Experimental studies on the seeing of motion. In T. Shipley (Ed.), Classics in
psychology (pp. 1032–1088). New York: Philosophical Library. (Original work published 1912)
Zhuo, Y., Zhou, T. G., Rao, H. Y., Wang, J. J., Meng, M., Chen, M., Zhou, C., and Chen, L. (2003).
Contributions of the visual ventral pathway to long-range apparent motion. Science 299: 417–420.
Chapter 27
Dynamic Grouping Motion: A Method for Determining Perceptual Organization for Objects with Connected Surfaces
Howard S. Hock
Overview
Rather than focusing on a particular aspect of perceptual organization, the purpose of this chapter
is to describe and extend a new methodology, dynamic grouping, which cuts across and addresses
a wide variety of phenomena and issues related to perceptual organization. The need for this new
methodology, which was introduced by Hock and Nichols (2012), arises from its relevance to the
most common stimulus in our natural environment, objects composed of multiple, connected
surfaces. Remarkably, and despite Palmer and Rock’s (1994) identification of connectedness as a
grouping variable, there has been no systematic research concerned with the perceptual organ-
ization of connected surfaces. This chapter demonstrates the potential of the dynamic grouping
method for furthering our understanding of how grouping processes contribute to object percep-
tion and recognition. It shows how the dynamic grouping method can be used to identify new
grouping variables, examines its relevance for how the visual system solves the ‘surface corres-
pondence problem’ (i.e., determines which of an object’s connected surfaces are grouped together
when different groupings are possible), and provides a concrete realization of the classical idea
that the whole is more than the sum of the parts. The chapter examines the relationship between
dynamic grouping and transformational apparent motion (Tse et al. 1998) and provides insights
regarding the nature of amodal completion and how it can be used to examine classical Gestalt
grouping variables entailing disconnected surfaces (e.g., proximity). Finally, it demonstrates that
perceptual grouping should have a more prominent role in theories of object recognition than is
currently the case, and proposes new theoretical approaches for characterizing the compositional
structure of objects in terms of ‘multidimensional affinity spaces’ and ‘affinity networks’.
with the recovery of objects from surface fragments that have become disconnected as a result
of degraded viewing conditions (e.g., Lamote and Wagemans 1999; Shipley and Kellman 2001;
Fantoni et al. 2008). Under non-degraded conditions, however, objects always are composed of
connected surfaces. It would not be surprising, therefore, if a different set of grouping variables
applied. Nor would it be surprising that a substantially different methodology would be required
in order to study these grouping variables.
The great success of the lattice method stems from the isolation of grouping variables and the
determination of their effects from competition between alternative perceptual organizations.
Similarity in shape is isolated for the Wertheimer (1923) lattice in Figure 27.1a; parallel rows are
perceived because the surfaces composing alternating rows are more similar than the surfaces
composing columns, so there is greater grouping strength horizontally than vertically. Proximity
is isolated for the lattice in Figure 27.1b; parallel columns are perceived because the surfaces com-
posing each column are closer together than the surfaces composing each row, so there is greater
grouping strength vertically than horizontally. Finally, shape similarity competes with proximity
for the lattice in Figure 27.1c. Parallel columns are perceived because grouping strength due to
proximity is greater than grouping strength due to shape similarity. Significantly, however, the
outcome of this competition between proximity and shape similarity does not hold in general. It
holds only for the particular differences in proximity and the particular differences in shape for
the stimulus depicted in Figure 27.1c.
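The competitive logic of the lattice method can be summarized in a small sketch: whichever orientation accumulates greater grouping strength determines the percept. The function and the numeric strengths below are illustrative assumptions, not measured values.

```python
# Hypothetical sketch of the lattice method's competition logic:
# whichever orientation has greater cumulative grouping strength
# determines the perceived organization. Strength values are invented
# for illustration.

def lattice_organization(strength_horizontal: float,
                         strength_vertical: float) -> str:
    """Predict the percept from competing grouping strengths."""
    if strength_horizontal > strength_vertical:
        return "rows"
    if strength_vertical > strength_horizontal:
        return "columns"
    return "ambiguous"

# Figure 27.1a: shape similarity favors horizontal grouping.
assert lattice_organization(0.8, 0.3) == "rows"
# Figure 27.1c: proximity (vertical) outweighs shape similarity (horizontal).
assert lattice_organization(0.5, 0.7) == "columns"
```

The sketch makes the limitation explicit: the lattice method delivers an ordinal comparison for one specific stimulus, not strengths that generalize across stimuli.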
What is needed for significant progress in our understanding of perceptual organization, espe-
cially as it applies to the connected surfaces of objects, is the development of a new empirical
tool for assessing grouping strength for pairs of adjacent surfaces, and the determination of how
the effects of cooperating grouping variables are combined to establish overall grouping strength
(affinity) for pairs of adjacent surfaces. The prospect offered by a methodology meeting these
requirements is a fully described compositional structure for an object (i.e., the pair-wise affinities
for all the object’s surfaces), and the determination that this compositional structure is central to
the recognition of the object.
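As a rough illustration of what a fully described compositional structure might look like computationally, the sketch below tabulates pair-wise affinities for a toy three-surface object. The surface features and the affinity rule are hypothetical, not the chapter’s measured quantities.

```python
# Toy "compositional structure": the pair-wise affinity for every pair
# of an object's surfaces. The features and the affinity rule are
# hypothetical illustrations.

from itertools import combinations

# invented surface descriptions: luminance plus adjacency relations
features = {"A": {"lum": 0.9, "connected": {"B"}},
            "B": {"lum": 0.8, "connected": {"A", "C"}},
            "C": {"lum": 0.2, "connected": {"B"}}}

def affinity(s1: str, s2: str) -> float:
    """Luminance similarity plus a bonus for connectedness."""
    lum_similarity = 1.0 - abs(features[s1]["lum"] - features[s2]["lum"])
    connected = 1.0 if s2 in features[s1]["connected"] else 0.0
    return lum_similarity + connected

def compositional_structure(surfaces):
    """Pair-wise affinities for all of the object's surfaces."""
    return {frozenset(pair): affinity(*pair)
            for pair in combinations(surfaces, 2)}

structure = compositional_structure(list(features))
# A and B are similar in luminance and connected, so their affinity is
# highest; A and C are dissimilar and unconnected, so theirs is lowest.
```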
1 Watt and Phillips (2000) use the term ‘dynamic grouping’ in a much different sense. Rather than motion induced by changing values of grouping variables, their emphasis is on the dynamical, self-organizational aspect of perceptual grouping for both moving and static stimuli.
[Figure 27.1, panels a–k, appears here. Panels f, g, j, and k plot affinity against the cumulative strength of grouping variables (connectivity, good continuation, luminance similarity); the stimulus panels show Frame 1 and Frame 2 of each display. See the caption below.]
Dynamic Grouping Motion 563
the target surface always is greater than the luminance of the surfaces with which it is connected.
While some grouping variables remain the same during the transition from Frame 1 to Frame 2,
dynamic grouping variables change in value as a result of changes to the target surface. The change
(say in luminance) increases or decreases the affinity of the target surface with each of the surfaces
adjacent to it, without qualitatively changing the perceptual organization of the geometric object.
Changes (perturbations) in surface affinities that are created by dynamic grouping (DG) variables,
when large enough, elicit the perception of motion across the changing target surface.2,3 The dir-
ection of the DG motion is diagnostic for the affinity relationships among the stimulus’ surfaces
that were established during Frame 1, prior to the change in the target surface during Frame 2.
Fig. 27.1 (a,b,c) Examples using Wertheimer’s (1923) lattice method to identify grouping variables and
determine their relative strength by the outcome of competition between two perceptual organizations.
(d,e) Examples of stimuli for which dynamic grouping (DG) motion is perceived. (f,g) Nonlinear functions
relating the combined effect of grouping variables to the affinity of the surfaces in panels d and e.
Because of super-additivity, changes in affinity are larger and therefore, DG motion is stronger, when
pre-perturbation luminance similarity is greater. (h) Example of a stimulus from Tse et al. (1998) for
which transformational apparent motion (TAM) is perceived in relation to the square. (i) A version of Tse
et al.’s (1998) stimulus for which DG motion also is perceived in relation to the square. (j,k) Nonlinear
functions relating the combined effect of grouping variables to affinity for the two pairs of surfaces in
panel i. Because of super-additivity, changes in affinity are larger and therefore, DG motion is stronger,
for the surface pairs that benefit in pre-perturbation grouping strength from good continuation.
Parts a-c: Data from M. Wertheimer, A Source Book of Gestalt Psychology, tr. W.D. Ellis, Routledge and Kegan Paul,
London, 1923. Parts d-g and i-k: Reprinted from Vision Research, 59, Howard S. Hock and David F. Nichols,
Motion perception induced by dynamic grouping: A probe for the compositional structure of objects, pp. 45–63,
Figure 4, doi: 10.1016/j.visres.2011.11.015 Copyright (c) 2012, with permission from Elsevier. Part h: Reproduced
from Watanabe, Takeo, ed., High-Level Motion Processing: Computational, Neurobiological, and Psychophysical
Perspectives, figure from pages 154–183, © 1998, Massachusetts Institute of Technology, by permission of The
MIT Press.
2 Previous experiments concerned with perceptual grouping and motion perception have studied the effects of
unchanging grouping variables on the perceptual organization of motions elicited by the displacement of sur-
faces (e.g. Kramer and Yantis 1997; Martinovic et al. 2009). Dynamic grouping differs in that the perception
of motion is across a changing surface that is not displaced, and is elicited by changes in grouping variables.
3 Dynamic grouping motion, although weaker, is phenomenologically similar to the line motion illusion that is obtained when the changing surface is darker than the surfaces adjacent to it (Hock and Nichols 2010). For the latter, motion perception results from the detection of oppositely signed changes in edge and/or surface contrast (i.e., counterchange). The avoidance of counterchange-determined motion is why the dynamic grouping method requires the target surface to be lighter than surfaces adjacent to it.
564 Hock
the boundary is momentarily less salient, as if for the moment the grouping of the surfaces is
strengthened. These directions are characteristic of DG-induced motion. The implications of
fluctuations in eye position, or of covert attention shifts without eye movements (Posner 1980), are
discussed in the section entitled ‘Further implications’ at the end of this chapter.
Frame 2 perturbation in luminance similarity produces a larger change in the affinity of the two
surfaces, and thereby elicits a stronger signal for motion across the changing surface in characteristic
DG-determined directions (i.e., away from the boundary of the surfaces when their affinity
increases, and toward the boundary when their affinity decreases).
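The diagnostic direction rule just stated can be expressed in a small sketch; the threshold below is a hypothetical stand-in for the requirement that the perturbation be large enough to elicit motion.

```python
# Sketch of the dynamic grouping (DG) direction rule: motion across the
# changing surface is away from the shared boundary when the surfaces'
# affinity increases, and toward it when affinity decreases. The
# threshold is an invented stand-in for "large enough" perturbations.

def dg_motion_direction(affinity_before: float, affinity_after: float,
                        threshold: float = 0.05) -> str:
    """Classify the DG motion elicited by an affinity perturbation."""
    change = affinity_after - affinity_before
    if abs(change) < threshold:   # perturbation too small to elicit motion
        return "none"
    return "away from boundary" if change > 0 else "toward boundary"

assert dg_motion_direction(0.5, 0.8) == "away from boundary"
assert dg_motion_direction(0.8, 0.5) == "toward boundary"
```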
[Figure 27.2, panels a–g, appears here. Panels c and d plot affinity against increases in grouping strength for Frames 1 and 2. See the caption below.]
Fig. 27.2 (a) A version of Tse et al.’s (1998) stimulus for which unidirectional dynamic grouping motion is perceived in the direction determined by hue
similarity. (b) A similar stimulus, but with the horizontal bar presented only during Frame 2. Transformational apparent motion is perceived in the direction
determined by good continuation. (c,d) Nonlinear functions relating the combined effect of grouping variables to affinity for the two pairs of surfaces
in panels a and b. Both are consistent with hue similarity more strongly affecting grouping strength than good continuation. (e) For relatively long
boundary lengths, dynamic grouping (DG) motion is perceived across the changing surface on the left when its luminance is increased. (f) For the same
change in luminance, either no motion or symmetrically divergent motion is perceived when the boundary is shorter. (g) The perception of DG motion
across the surface on the left is restored when the luminance of the surface on the right is raised, increasing the luminance similarity and thereby the
pre-perturbation affinity of the two surfaces.
horizontal and vertical bars, compared with when good continuation contributes to the grouping
of the horizontal bar and square (Figure 27.2c). As a result of the affinity for the horizontal and
vertical bars being located on a steeper segment of the grouping/affinity function, the perturba-
tion of luminance similarity produces a greater change in affinity, and therefore, stronger DG
motion across the horizontal bar in relation to the vertical bar than in relation to the square.
(It is noteworthy that this difference in grouping strength between good continuation and hue
similarity for this stimulus would not be discernible without something like the DG method.)
When the horizontal bar is presented only during the second frame (Figure 27.2b), as in Tse
et al.’s (1998) TAM paradigm, good continuation predominates despite the apparently stronger
affinity of the horizontal and vertical bars because of their hue similarity; i.e., the square appears to
expand into a long horizontal bar. As illustrated in Figure 27.2d, there is minimal pre-perturbation
affinity during the first frame for this stimulus (the effect of proximity grouping for the separated
surfaces is assumed to be negligible), and the insertion of the horizontal bar results in a larger
change in affinity for the grouping of the horizontal and vertical bars compared with the horizon-
tal bar and square. If the perception of motion depended only on the size of the affinity change,
TAM, like DG motion, would have been in relation to the vertical bar. This is the opposite of what
is actually perceived.
The perceptual differences between DG and TAM for the stimuli in Figures 27.2a and 27.2b
indicate that they do not always reflect identical aspects of perceptual organization. What then is
the relationship between them? It can be shown with a dynamical model (Hock & Schöner, 2010)
that DG and TAM can entail the same processing mechanisms, with both depending on differ-
ences in the rate of change in affinity that results from changes in grouping variables. DG and TAM
function differently in the model in that TAM depends on different grouping variables having dif-
ferent rates of change in affinity, whereas DG motion depends as well on rates of change varying
according to the level of stable, pre-perturbation affinity. The perceptual results described above
suggest that hue similarity may have a stronger effect on surface affinity than good continuation,
but the contribution of good continuation to surface affinity may emerge more rapidly.
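A minimal sketch of this account, assuming (with the text) that the percept is captured by whichever surface pairing’s affinity changes fastest; the rate values below are invented for illustration.

```python
# Speculative sketch, in the spirit of the dynamical account described
# above: the motion percept is captured by the surface pairing whose
# affinity rises fastest. The rate values are invented for illustration.

def capturing_pair(affinity_change_rates: dict) -> str:
    """Return the pairing with the fastest-rising affinity."""
    return max(affinity_change_rates, key=affinity_change_rates.get)

# TAM case (Figure 27.2b): good continuation acts rapidly, so the
# square-and-bar grouping wins even though hue similarity may yield a
# stronger asymptotic affinity for the bars.
rates = {"square + horizontal bar (good continuation)": 0.9,
         "vertical bar + horizontal bar (hue similarity)": 0.6}
assert capturing_pair(rates) == "square + horizontal bar (good continuation)"
```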
[Figure 27.3, panels a–e, appears here. Panel e shows surfaces A, B, and C across Frames 1 and 2. See the caption below.]
Fig. 27.3 (a) A stimulus for which the perception of dynamic grouping (DG) motion is indicative of
amodal completion behind the occluding cube. The direction of the motion is consistent with the
implied presence of a discontinuous luminance boundary separating surfaces A and C. (b) Unidirectional
DG motion is perceived across the square surface on the right when its luminance is decreased and the
occluding surface is relatively narrow (the squares are relatively close together). (c) For the same change
in luminance, DG motion is not perceived when the occluding surface is relatively wide (the squares
are further apart). (d) The perception of DG motion across the square on the right is restored when the
luminance of the square on the left is lowered, increasing the luminance similarity and therefore the pre-
perturbation affinity of the two physically separated surfaces. (e) Variation of a stimulus from Biederman
(1987). The dynamic grouping motion that is perceived when the luminance of surface B is decreased
is consistent with its grouping with surface A, perhaps to form a truncated cone, a ‘geon’ which
contributes to the recognition of the object as a lamp in Biederman’s (1987) recognition-by-components
theory.
Adapted from Irving Biederman, Recognition-by-components: A theory of human image understanding,
Psychological Review, 94(2), pp. 115–147, http://dx.doi.org/10.1037/0033-295X.94.2.115 © 1987, American
Psychological Association.
the grouping variable increases with increases in the length of the boundary separating pairs of
adjacent surfaces.
Implications of super-additivity
Super-additivity, according to which the combined effects of cooperating grouping variables on
the overall affinity of two surfaces exceed their linear sum, is a concrete realization of the
principle that the whole is more than the sum of the parts (von Ehrenfels 1890; Wagemans, this volume).
An important consequence of super-additive nonlinearity is that the effect of a particular group-
ing variable on the affinity of a pair of adjacent surfaces is context dependent. That is, it will
vary, depending on the presence or absence of other cooperating grouping variables. This con-
trasts with Bayesian analyses indicating that the effects of grouping variables are independent,
or additive (e.g., Elder and Goldberg 2002). Although Bayesian independence was confirmed
by Claessens and Wagemans (2008) using the lattice method, they also found, inconsistent with
Bayesian-determined independence, that the relative strength of proximity and co-linearity
depended on whether their lattice aligned with cardinal axes or was oblique.
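As an illustration, the context dependence implied by super-additive combination can be made concrete with a toy rule in which two grouping variables contribute a linear sum plus a cooperative cross-term; the rule and its gain parameter are illustrative assumptions, not the actual affinity model discussed in this chapter:

```python
def combined_affinity(a1, a2, gain=0.5):
    """Toy super-additive combination of two grouping variables:
    linear sum plus a cooperative cross-term (gain is hypothetical)."""
    return a1 + a2 + gain * a1 * a2

# The marginal effect of variable 1 depends on whether variable 2 is present:
print(combined_affinity(1.0, 1.0) - combined_affinity(0.0, 1.0))  # -> 1.5
print(combined_affinity(1.0, 0.0) - combined_affinity(0.0, 0.0))  # -> 1.0
```

With any positive cross-term, the joint contribution exceeds the linear sum, and the effect of one grouping variable grows when the other is present, capturing the context dependence at issue.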
Amodal completion
The DG method can be used to gain further insights into amodal completion, which is typi-
cally concerned with the continuity of unseen stimulus information in time (e.g., Yantis 1995;
Joseph and Nakayama 1999) and space (e.g., Michotte et al. 1964; Tse 1999; van Lier and Gerbino,
this volume). It also can be used to establish the strength of grouping variables for disconnected
surfaces.
Hidden boundaries
For the stimulus in Figure 27.3a, a partially occluded light gray bar composed of surfaces A and C
is readily perceived during the first frame of a two-frame trial. When surface A’s luminance is
decreased during the second frame, its luminance similarity with surface C decreases, resulting in
diagonally upward DG motion across A, toward an amodal hidden boundary with C. In addition
to its effect on the affinity of surfaces A and C, the luminance decrease for surface A increases
its similarity with surface B, so if DG motion were determined strictly on the basis of whether
surfaces are adjacent on the retina, the motion across surface A would have been in the opposite
direction, away from surface B. That the direction of DG motion is consistent with the grouping
of surfaces A and C is important because: (1) it shows that amodal completion can entail discontinuous luminance boundaries, not just continuity; (2) it shows that the DG method can be diagnostic for the grouping of surfaces even when their common boundaries are hidden; and (3) it enables the measurement of affinity for non-adjacent surfaces. The latter feature is the basis for the measurement of proximity effects, which is described next.
luminance is lowered for the square on the left (Figure 27.3d). This is because the change in lumi-
nance increases the pre-perturbation luminance similarity of the two square surfaces, which are
physically separate but nonetheless perceptually grouped.
The pre-perturbation luminance similarity required in order to perceive motion in
DG-determined directions increases (the Michelson contrast of the physically separated surfaces
decreases) with successive increases in the distance between the squares. Precise psychophysi-
cal measurements with systematically varied pre-perturbation luminance similarity will make it
possible to determine whether the ratios based on the equivalent luminance similarity for each
proximity value (including a proximity value of zero) will be consistent with the distance ratios
measured by Kubovy and Wagemans (1995) in their experiments using the lattice method.
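For reference, the Michelson contrast invoked here is defined for two luminances L1 and L2 as |L1 − L2|/(L1 + L2); a minimal sketch with arbitrary luminance values:

```python
def michelson_contrast(l1, l2):
    """Michelson contrast of two luminances: |L1 - L2| / (L1 + L2)."""
    return abs(l1 - l2) / (l1 + l2)

# Greater pre-perturbation luminance similarity means lower contrast:
print(michelson_contrast(60.0, 40.0))  # -> 0.2
print(michelson_contrast(55.0, 45.0))  # -> 0.1
```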
surface grouping precedes comparison with component information in memory would reduce
the complexity of object recognition (Jacobs 1996; Feldman 1999), but it also is possible that the
affinity values for all pairings of the surfaces composing an object are unique, and therefore suf-
ficient for the recognition of the object. In either case, the ultimate test for dynamic grouping, or
any other method for assessing the compositional structure of a multi-surface object, is that the
compositional structure is determinative for the recognition of the object.
Further implications
The example in Figure 27.3e shows that grouping processes should have an explicit role in theories
of object perception, but it is quite another thing to specify what the role should be. The approach
taken in this chapter is that grouping variables determine the affinity of pairs of surfaces, and
thereby, the compositional structure of the object comprising those surfaces. Experiments and
demonstrations with simple, 2D objects composed of two or three surfaces have provided evidence
for the usefulness of the dynamic grouping method for the determination of affinity. Extending
the method to multi-surface, 3D objects creates opportunities for discovering new grouping vari-
ables, and determining how ambiguities in perceptual grouping are resolved (the ‘surface corre-
spondence problem’) in the context of the other surfaces composing a complex object.
The key theoretical concepts are: (1) the affinity of a pair of surfaces belonging to an object
depends on the nonlinear (super-additive) summation of the affinity values ascribable to indi-
vidual grouping variables, and (2) the compositional structure of the object is revealed by
embedding the pairwise affinity relationships among the surfaces composing the object into a
multidimensional affinity space. This would entail multidimensional scaling (MDS) based on
matrices of DG-measured affinity for all the pairwise combinations of an object’s surfaces. Points
in the space would represent the surfaces composing an object, and the distance between the
points would represent the affinity of the surfaces. In contrast with multidimensional models of
object recognition that specify particular features, like color, shape, and texture (e.g., Mel 1997),
the compositional structures determined with the dynamic grouping method will be based on an
abstract entity, affinity, so they will not be specific to the particular features of familiar objects.
They therefore would have the potential to exhibit a degree of invariance; i.e., generalize to other
objects with different features but a similar compositional structure, and to new viewpoints for
the same object.
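The proposed embedding can be sketched with classical (Torgerson) MDS; the affinity matrix below is hypothetical, chosen so that four surfaces form two tightly grouped pairs, and affinities are converted to dissimilarities before the eigendecomposition:

```python
import numpy as np

def mds_from_affinity(affinity, n_dims=2):
    """Classical MDS embedding of surfaces from a pairwise affinity matrix.
    High affinity maps to small distance (d = max_affinity - affinity)."""
    d = affinity.max() - affinity                # dissimilarities
    np.fill_diagonal(d, 0.0)
    n = d.shape[0]
    c = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    b = -0.5 * c @ (d ** 2) @ c                  # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:n_dims]      # largest eigenvalues first
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0.0, None))

# Hypothetical DG-measured affinities for four surfaces forming two pairs
aff = np.array([[0., 9., 2., 2.],
                [9., 0., 2., 2.],
                [2., 2., 0., 9.],
                [2., 2., 9., 0.]])
pts = mds_from_affinity(aff)
within = np.linalg.norm(pts[0] - pts[1])         # distance inside a pair
between = np.linalg.norm(pts[0] - pts[2])        # distance across pairs
print(within < between)                          # surfaces cluster by pair
```

In such an embedding, candidate parts of an object would appear as clusters of nearby points in the affinity space.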
Using MDS methods, the compositional structure of an object can be determined without
restrictions or pre-conceptions; e.g., without the typical assumption that the structure is hierarchi-
cal (Palmer 1977; Brooks 1983; Cutting 1986; Feldman 1999; Joo et al., this volume). Although
there are no restrictions in the compositional structure’s form, the existence of parts could be
indicated by the clustering of surfaces in multidimensional affinity space, and significant relations
between the parts, including possible hierarchical relations, could be indicated when pairs of sur-
faces from different clusters are relatively close in that abstract space.
An important consideration is the extent to which affinity relationships indicated by the
dynamic grouping method are definitive. In the experiments and demonstrations discussed in
this chapter, instructions have emphasized fixating on a dot placed in the center of the target sur-
face and maintaining attention on the dot for the entire two-frame trial. The purpose is to estab-
lish relatively unbiased conditions for determining the direction of dynamic grouping motion.
However, it is as yet undetermined whether fluctuations in eye position or covert attentional shifts
without eye movements (Posner 1980) will alter the compositional structures that are indicated by
572 Hock
the dynamic grouping method. Indeed, when stimuli like those in Figures 27.1i and 27.2a are
freely examined there is the sense that the surfaces can be grouped in more than one way.
These uncertainties do not undermine the usefulness of the dynamic grouping method for
objects with more complex surface relationships. That is, changes in fixation or shifts of atten-
tion that reduce the measured affinity of a target surface with another surface would be likely
to also change its affinity with the other surfaces composing the object. Such changes can be
conceived of as the equivalent of the perturbations in luminance similarity that can result
in the perception of dynamic grouping motion. That is, they can temporarily alter the multidi-
mensional compositional structure of an object, but the structure is nonetheless restored after
the perturbation.
The relationships among the surfaces composing an object also could be characterized as
an ‘affinity network’ in which each surface is represented by an activation variable and the
coupling strength for pairs of activation values is determined by their affinity. Changes in
luminance, eye position, or attention could perturb coupling strengths, but the inherent sta-
bility of the network would restore the couplings to their stable values. Exceptions are bistable
objects for which perturbations could result in new couplings among the object’s surfaces that
qualitatively change the compositional structure of the object (e.g., the Necker cube). As in the
case of bistable motion patterns (Hock et al. 2003; Hock & Schöner 2010), such bistable objects
may provide an ideal vehicle for investigating the nature of compositional structure for static
objects.
References
Arseneault, J-L, Bergevin, R., and Laurendeau, D. (1994). ‘Extraction of 2D groupings for 3D object
recognition’. Proceedings SPIE 2239: 27.
Biederman, I. (1987). ‘Recognition-by-components: a theory of human image understanding’. Psychological
Review 94: 115–47.
Blair, C.D., Caplovitz, G.P., and Tse, P.U. (this volume). ‘Interactions of form and motion in the perception
of moving objects’. In The Handbook of Perceptual Organization, edited by J. Wagemans. (Oxford: Oxford
University Press).
Brooks, R.A. (1983). ‘Model-based three-dimensional interpretations of two-dimensional images’. IEEE
Transactions on pattern Analysis and Machine Intelligence, 5: 140–149.
Claessens, P.M.E. and Wagemans, J. (2008). ‘A Bayesian framework for cue integration in multistable
grouping: Proximity, colinearity, and orientation priors in zigzag lattices’. Journal of Vision 8: 1–23.
Cutting, J. (1986). Perception with an eye for motion. Cambridge, MA: MIT Press.
Edelman, S. (1997). ‘Computational theories of object recognition’. Trends in Cognitive Sciences 1: 296–304.
Elder, J., and Goldberg, R.M. (2002). ‘Ecological statistics of Gestalt laws for the perceptual organization of
contours’. Journal of Vision 2: 324–53.
Fantoni, C., Hilger, J., Gerbino, W., and Kellman, P. J. (2008). ‘Surface interpolation and 3D relatability’.
Journal of Vision 8: 1–19.
Feldman, J. (1999). ‘The role of objects in perceptual grouping’. Acta Psychologica 102: 137–63.
Gori, S., and Spillmann, L. (2010). ‘Detection vs. grouping thresholds for elements differing in spacing,
size and luminance. An alternative approach towards the psychophysics of Gestalten’. Vision Research
50: 1194–202.
Hock, H.S., and Nichols, D.F. (2010). ‘The line motion illusion: The detection of counterchanging edge and
surface contrast’. Journal of Experimental Psychology: Human Perception and Performance 36: 781–96.
Hock, H.S., and Nichols, D.F. (2012). ‘Motion perception induced by dynamic grouping: A probe for the
compositional structure of objects’. Vision Research 59: 45–63.
Hock, H.S., and Schöner, G. (2010). ‘A neural basis for perceptual dynamics’. In Nonlinear dynamics in human
behavior, edited by R. Huys and V. Jirsa, pp. 151–77. (Berlin: Springer Verlag).
Hock, H. S., Schöner, G., and Giese, M. A. (2003). ‘The dynamical foundations of motion pattern
formation: Stability, selective adaptation, and perceptual continuity’. Perception & Psychophysics
65: 429–57.
Iqbal, Q., and Aggarwal, J.K. (2002). ‘Retrieval by classification of images containing large manmade
objects using perceptual grouping’. Pattern Recognition 35: 1463–79.
Jacobs, D. (1996). ‘Robust and efficient detection of salient convex groups’. I.E.E.E. Transactions on Pattern
Analysis and Machine Intelligence 18: 23–37.
Jacot-Descombes, A., and Pun, T. (1997). ‘Asynchronous perceptual grouping: from contours to relevant
2-D structures’. Computer Vision and Image Understanding 66: 1–24.
Joo, J., Wang, S., and Zhu, S.-C. (this volume). ‘Hierarchical organization by and-or tree’. In The Handbook of
Perceptual Organization, edited by J. Wagemans. (Oxford: Oxford University Press).
Joseph, J.S., and Nakayama, K. (1999). ‘Amodal representation depends on the object seen before partial
occlusion’. Vision Research 39: 283–92.
Kramer, P., and Yantis, S. (1997). ‘Perceptual grouping in space and time: Evidence from the Ternus
display’. Perception & Psychophysics 59: 87–99.
Kubovy, M., and Wagemans, J. (1995). ‘Grouping by proximity and multistability in dot lattices: A quantitative
gestalt theory’. Psychological Science 6: 225–34.
Lamote, C., and Wagemans, J. (1999). ‘Rapid integration of contour fragments: From simple filling-in to
parts-based description’. Visual Cognition 6: 345–61.
Lowe, D.G. (1987). ‘Three-dimensional object recognition from single two-dimensional images’. Artificial
Intelligence 31: 355–95.
Marr, D., and Nishihara, H.K. (1978). ‘Representation and recognition of the spatial organization of
three-dimensional shapes’. Proceedings of the Royal Society of London, Series B 200: 269–94.
Martinovic, J., Meyer, G., Muller, M.M., and Wuerger, S.M. (2009). ‘S-cone signals invisible to the motion
system can improve motion extraction via grouping by color’. Visual Neuroscience 26: 237–48.
Mel, B.W. (1997). ‘Combining color, shape, and texture histogramming in a neurally inspired approach to
visual object recognition’. Neural Computation 9: 777–804.
Michotte, A., Thinès, G., and Crabbè, G. (1964). Les compléments amodaux des structures perceptives
(Amodal completion of perceptual structures). (Leuven, Belgium: Publications Universitaires de
Louvain).
Palmer, S.E. (1999). Vision science: Photons to phenomenology. (Cambridge MA: Bradford Books).
Palmer, S.E., and Rock, I. (1994). ‘Rethinking perceptual organization: the role of uniform connectedness’.
Psychonomic Bulletin and Review 1: 29–55.
Palmer, S.E., Neff, J., and Beck, D. (1996). ‘Late influences on perceptual grouping: Amodal completion’.
Psychonomic Bulletin and Review 3: 75–80.
Pentland, A.P. (1987). ‘Perceptual organization and the representation of natural form’. Artificial Intelligence
28: 293–331.
Posner, M.I. (1980). ‘Orienting of attention’. Quarterly Journal of Experimental Psychology 32: 3–25.
Rush, G. (1937). ‘Visual grouping in relation to age’. Archives of Psychology, N.Y. 31: No. 217.
Shipley, T.F., and Kellman, P.J., (Eds.) (2001). From Fragments to Objects: Segmentation and Grouping in
Vision. (Amsterdam: Elsevier Science Press).
Tarr, M. J., and Bülthoff, H.H. (1995). ‘Is human object recognition better described by
geon-structural-descriptions or by multiple-views? Comment on Biederman and Gerhardstein (1993)’.
Journal of Experimental Psychology: Human Perception and Performance 21: 1494–505.
Tarr, M. J., Williams, P., Hayward, W. G., and Gauthier, I. (1998). ‘Three dimensional object recognition is
viewpoint-dependent’. Nature Neuroscience 1: 275–77.
A huge variety of empirical studies has been collected that treat different aspects of the perception of biological and body motion, ranging from psychophysical questions and the processing of social signals, through ecological and developmental aspects, to clinical implications. Due to space limitations, this chapter focuses primarily on aspects related to pattern formation and the organization of Gestalt for dynamic patterns.
Many topics in body motion perception, which cannot be covered in this chapter due to space
limitations, are treated in many excellent review articles and books. This includes the original work
by Gunnar Johansson (review: Jansson et al. 1994), the psychophysics and the neural basis of body
and facial motion processing (Puce and Perrett 2003; Allison et al. 2000; O’Toole et al. 2002; Blake &
Shiffrar, 2007), computational principles (Giese and Poggio 2003), imaging results (Blakemore and
Decety 2001; Puce and Perrett 2003), and its relationship to emotion processing (de Gelder 2006).
Another important topic that cannot be adequately treated in this review due to space limi-
tations is the relationship between body motion perception and motor representations. Several
recent books treat exhaustively different aspects of biological and body motion perception,
which could not be included in this review (e.g. Knoblich et al. 2006; Johnson and Shiffrar 2013;
Rizzolatti and Sinigaglia 2008).
Historical Background
While Aristotle had already written about the principles of the movements of animals, the systematic scientific investigation of body motion perception started more than a century ago with the works of Eadweard Muybridge (1887) and Etienne-Jules Marey (1894), who studied body motion by applying the technique of sequential photography. While classical Gestalt psychologists had treated the organization of complex motion patterns only sparsely, the systematic study of biological and body motion was initiated by the Swedish psychologist Gunnar Johansson in
the 1970s. He was originally interested in studying Gestalt laws of motion organization, and for
him body motion was an example of a complex motion pattern with relevance for everyday life
(Jansson et al. 1994). His work on biological motion grew out of studies on the organization of
much simpler motion patterns during his PhD thesis (Johansson 1950), aiming at the develop-
ment of a general ‘theory of event perception’.
Already classical Gestalt psychologists had described pattern organization phenomena for
simple motion patterns. This includes the classical law of ‘common fate’ (Wertheimer 1923),
work on motion grouping (Ternus 1926) and on ‘induced motion’ by Duncker (1929) (see
Figure 28.1a), and studies by Metzger (1937) on the ‘Prägnanz’ in motion perception
(see Herzog and Öğmen, this volume). In addition, some more recent work by Albert Michotte
576 Giese
Fig. 28.1 Perceptual organization of simple motion displays. (a) Induced motion (Duncker 1929): while
in reality the external frame moves and the dot is stationary, the dot is perceived as the moving element.
(The following examples are taken from Johansson (1950)): (b) three dots that move along straight lines
are perceptually grouped into two pairs of dots that move up and down, with a periodic ‘contraction’
of their virtual connection line horizontally. (c) Two dots that move vertically and two that move along
a circle are grouped into a single line that moves vertically. In addition, the exterior points are perceived
as moving horizontally. (d) Two dots, of which one moves along a straight line and the second along
piecewise curved paths, are perceived as a ‘rotating wheel’, with one dot rotating about the other.
Part a: Reproduced from Psychologische Forschung, 12(1), pp. 180–259, Über induzierte Bewegung, Karl
Duncker, © 1929, Springer Science and Business Media. With kind permission from Springer Science and Business
Media. Parts b-d: Reproduced from G. Johansson, ‘Configurations in Event Perception: An experimental study’.
Dissertation, Högskolan, Stockholm, 1950.
(1946/1963) addressed the interpretation of simple motion displays in terms of the perception
of ‘causality’.
Johansson tried to study Gestalt grouping principles systematically in simple motion displays that consisted of small numbers of moving dots, varying their geometrical and temporal parameters. A variety of his observations are in line with modern theories about
the estimation of optic flow from spatiotemporal image data, such as the tendency to group dots
with similar motion vectors in the image plane, or a tendency to favor correspondences in terms
of slow motion.
In addition, Johansson made the important additional discovery that he formalized in his theory
of vector analysis: often even simple motion patterns are perceptually organized in terms of interpre-
tations that impose a hierarchy of spatial frames of reference, instead of a simple perceptual repre-
sentation that reflects just the physical structure of the motion. Some example stimuli that illustrate
this phenomenon are shown in Figure 28.1b–d. The physical motion of the stimulus is decomposed into one component that describes deformations (sometimes non-rigid) within the grouped structure (e.g. a contracting bar), and a second motion component that describes the motion of the whole grouped structure within the external frame of reference (e.g. the movement of the whole bar). The
key point is that the perceptual interpretation provides a description in terms of relative motion
Biological and Body Motion Perception 577
that is described within frames of reference, which partially result from the grouping process itself.
This can be interpreted as a form of vectorial decomposition of the motion, e.g. in a component
that describes the motion of a whole group of dots, and an additive second vectorial component
that describes the relative motion between the individual dots within the groups. It seems obvious that the principle might be extendable to more complex displays, e.g. ones consisting of multiple non-rigid parts that move against each other. The human body is an example of such a more complex system, and this originally motivated Johansson's interest in these types of stimuli.
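Johansson's vectorial decomposition can be sketched numerically under the simplifying assumption that the common component is the mean frame-to-frame motion of the grouped dots and the relative component is the residual (the function and the two-dot stimulus below are illustrative):

```python
import numpy as np

def decompose_motion(trajectories):
    """Split dot trajectories of shape (n_dots, n_frames, 2) into a common
    (group) motion component and residual relative motion components."""
    velocities = np.diff(trajectories, axis=1)   # frame-to-frame displacements
    common = velocities.mean(axis=0)             # shared component: group motion
    relative = velocities - common               # motion within the moving frame
    return common, relative

# Two dots translating rightward together while oscillating vertically in antiphase
t = np.arange(6, dtype=float)
dot1 = np.stack([t, np.sin(t)], axis=1)
dot2 = np.stack([t, -np.sin(t)], axis=1)
common, relative = decompose_motion(np.stack([dot1, dot2]))
print(np.allclose(common[:, 0], 1.0))   # common horizontal drift of 1 unit/frame
print(np.allclose(common[:, 1], 0.0))   # antiphase oscillations cancel in the common part
```

The perceptual organization then corresponds to reporting the common drift as the motion of the group and the residuals as motion within the group's own frame of reference.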
The analysis of such hierarchical patterns of relative motion is an interesting theoretical problem,
and has motivated theoretical work in psychology that tried to account for the organization of such
patterns by an application of coding theory and the principle of minimum description length (Restle
1979). The underlying idea is to characterize different possible encodings of the motion patterns by
the required number of describing parameters (such as amplitude, phase, and frequency for a sinusoidal oscillation). Encodings in terms of hierarchies of relative motions are often more compact, i.e. require fewer parameters than the direct encoding of the physical movements. In computer
vision the minimum description length principle has been successfully applied, e.g., for motion seg-
mentation (Shi et al. 1998) and the compression of motion patterns in videos (e.g. Nicolas et al.
1997). However, general models that decompose complex motion patterns in terms of hierarchies of
relative motion, in the way envisioned by Johansson, remain to be developed.
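Restle's minimum-description-length idea can be illustrated with a toy parameter count; the encoding scheme below is an illustrative assumption, not Restle's actual code:

```python
# Two dots oscillate in antiphase about a midpoint that translates at
# constant velocity.
# Direct encoding: each dot separately needs a linear drift (2 parameters)
# plus its own sinusoid (amplitude, phase, frequency = 3 parameters).
direct = 2 * (2 + 3)

# Hierarchical encoding: one shared drift (2), one shared sinusoid (3),
# plus a sign flag per dot capturing the antiphase relation (2 * 1).
hierarchical = 2 + 3 + 2 * 1

print(direct, hierarchical)  # -> 10 7
```

The hierarchical code is shorter, which on a minimum description length account predicts that the hierarchical organization is the one perceived.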
Phenomenological Studies
Subsequent early research on body motion perception verified that different categories of move-
ments could be recognized from point-light stimuli, such as walking, running, or dancing (e.g.
Johansson 1973; Dittrich 1993). Further studies showed that humans can also recognize animals,
such as dogs, from such point-light stimuli (e.g. Bellefeuille and Faubert 1998; Jokisch and Troje
2003). Many early experiments tried to characterize the capability to derive subtle information from such motion cues, such as gender (Barclay et al. 1978; Cutting et al. 1978; Pollick et al. 2005), the gaits of familiar people or friends (e.g. Beardsworth and Buckner 1981; Cutting and Kozlowski 1977), age (Montepare et al. 1988), or emotions (e.g. Dittrich et al. 1996; Walk and Homan 1984;
Atkinson et al. 2004; Roether et al. 2009). Also, it has been shown that observers can derive phys-
ical properties, such as the weights of lifted objects from such point-light stimuli (e.g. Runeson
and Frykholm 1981). In the context of these early studies, the first mathematical descriptions of critical features, e.g. for gender perception, and simplified mathematical models of gait trajectories, suitable for the synthesis of point-light patterns by computer graphics (Cutting et al. 1978), were also developed. In addition, minimum coding theory was extended to gait patterns (Cutting 1981).
Fig. 28.2 Point-light biological motion stimulus. (a) Light bulbs or markers are fixed to the major
joints of a moving human. (b) Presentation of the moving dots alone results in a point-light stimulus that
induces the vivid perception of a moving human.
Reprinted by permission from Macmillan Publishers Ltd: Nature Reviews Neuroscience, 4(3), Martin A. Giese and
Tomaso Poggio, Neural mechanisms for the recognition of biological movements, page 180, Copyright © 2003,
Nature Publishing Group.
Already starting to investigate the underlying critical processes, another stream of experiments investigated the robustness of the perception of body motion from point-light stimuli, introducing specific manipulations of Johansson’s original stimuli. This includes the masking of point-light stimuli by moving dot masks, generated from randomly positioned moving
dots from point-light stimuli (‘scrambled walker noise’) (Bertenthal and Pinto 1994; Cutting
et al. 1978). Other studies tried to degrade the local motion information by introducing tem-
poral delays between the stimulus frames (Thornton et al. 1998), variations of contrast polar-
ity and spatial-frequency information, or by changing the relative phase of the dots or their
disparity information (Ahlström et al. 1997). The depth information in binocularly presented
point-light stimuli could be strongly degraded without the observers even noticing this manipu-
lation (Bülthoff et al. 1998). This observation seems incompatible with mechanisms of biologi-
cal motion recognition that rely on a veridical reconstruction of depth. However, more recent
studies show that depth has an important influence and can disambiguate bistable point-light
stimuli whose orientation in space cannot be uniquely derived from two-dimensional informa-
tion (Vanrie et al. 2004; Jackson and Blake 2010). Other studies tried to degrade point-light
stimuli by randomizing the positions of the dots on the body (Cutting 1981) and by limiting the
life time of individual dots (e.g. Neri et al. 1998; Beintema and Lappe 2002). Another interesting manipulation, looking specifically at the organization of biological motion patterns in terms of spatial units, was used in studies that randomized the position of individual parts of the body while leaving their internal motion invariant (showing e.g. all limbs vs. only the ipsi- or contralateral limbs) (Pinto and Shiffrar 1999; Neri 2009).
Finally, another set of studies used the rotation of point-light walkers in the image plane (inversion) in order to study the frames of reference in which the underlying perceptual processing happens. As for the perception of faces, rotations in the image plane strongly degrade the perception of body motion from point-light stimuli (e.g. Sumi 1984; Pavlova and Sokolov 2000). The orientation dependence seems to be tied to an egocentric rather than to the external frame of reference (e.g. Troje 2003). Also the ‘Thatcher illusion’ (that is, the difficulty of recognizing inverted face parts in faces that are presented upside down) has been generalized to biological motion patterns (Mirenzi
and Hiris 2011). In line with this, a recent study has shown that the features of the local dots (e.g.
color) are less accessible for consciousness when they are embedded in an upright than in an
inverted biological motion walker (Poljac et al. 2012). These results strongly suggest that the per-
ceptual processing of biological motion might be critically dependent on templates that are tied to
the visual frame of reference, rather than on a generic process that reconstructs three-dimensional
shape from motion.
Fig. 28.3 Informative cues in body motion stimuli. The global configuration of a human body can be
recovered either from: (a) local form features (e.g. orientation and positions of limbs or limb parts),
or (b) from local motion features, which specify for each time point a complex instantaneous optic
flow field. (c) Trajectories of individual dots, like the ones of the feet, can also provide sufficient
information for the solution of specific biological motion tasks, e.g. detection of walking direction.
(d) Equivalent of a ‘life detector’ in the form domain. The direction of the nose in a scrambled face
image (middle panel) makes it easy to determine the heading direction of the face (upper panel).
This detection is more difficult if the picture is rotated upside down (‘inversion effect’).
The fact that it is easy to recognize walking or running from static pictures of stick figures
shows that form information is relevant for the processing of body motion (Todd 1983). In
addition, it seems obvious that humans can learn to recognize point-light configurations, just
as any other shape, after sufficient training (Reid et al. 2009). Computational work has tried
to identify critical features for body motion perception, which generalize spontaneously from
full-body figures to point-light stimuli, applying principal component analysis to motion and
form features. It turns out that such generalization is easier to achieve for motion than for form
features (Casile and Giese 2005). In addition, the opponent motion of the hand and the feet
seems to be a critical feature for the recognition of biological motion (Casile and Giese 2005;
Chang and Troje 2009). Challenging the potential relevance of local motion cues, Beintema and Lappe (2002) demonstrated that point-light walkers can be recognized from stimuli where the dot positions are randomized on the skeleton in every frame. This manipulation degrades the local motion information, but leaves some of the critical motion features intact (Casile and Giese 2005).
While Lappe and colleagues hypothesized that local motion processing is completely irrelevant
for biological motion processing, unless the moving figure has to be segmented from a (station-
ary) background (Lange and Lappe 2006), studies comparing the relevance of form and motion
cues sometimes found a primary relevance of form and sometimes of motion cues (e.g. Lu and
Liu 2006; Hiris et al. 2007; Thurman and Grossman 2008). Instead of denying the relevance of
individual cues, more recent work has rather studied how the cues are integrated. A recent set of
studies tried to develop reverse correlation techniques in order to identify critical features that
drive the categorization of biological motion patterns (Lu and Liu 2006; Thurman and Grossman
2008; Thurman et al. 2010). These studies found evidence for a relevance of both types of features,
consistent with the hypothesis that the nervous system fuses different informative cues during the processing of body motion (instead of discarding classes of informative cues). Further evidence suggests that which cue is more effective depends on the task (Thirkettle et al. 2009). Pointing in the same direction, a recent study suggests the existence of separate high-level after-effects that are dependent on form or motion cues (Theusner et al. 2011).
A further stream of research about features in the recognition of body motion has been initi-
ated by the observation that the walking direction of point-light walkers can even be derived
from scrambled walkers, for which the configural information about the body shape has been
destroyed. In addition, the recognition of walking direction from these stimuli is worse if these
stimulus patterns are rotated upside down, implying an inversion effect (Troje and Westhoff
2006). The fact that the walking direction can be recognized without the configural informa-
tion in a forced-choice task is due in particular to the fact that the foot movement trajectory of
walking is highly asymmetrical (Figure 28.3c). (This fact is analogous to the observation that it
is easy to detect the facing direction of side views of faces from only the direction in which the
nose points, see Figure 28.3d.) The recognition of walking direction from such individual dot
trajectories is consistent with motion template detectors that are defined in a retinal frame of
reference. It is unclear to what extent such detectors are learned or partially innate. Some research-
ers have interpreted the above observation as evidence for a special-purpose mechanism for the
detection of the asymmetric foot trajectories, which has been termed ‘life detector’. Since a similar
inversion effect was observed for the tendency of newly hatched chicks to align their bodies with
point-light patterns (Vallortigara and Regolin 2006), it has also been hypothesized that this special-purpose mechanism is evolutionarily old, and potentially universal across many species.
(See also Koenderink’s chapter on Gestalts as ecological templates, this volume.) The concept
of the ‘life detector’ has initiated a number of follow-up studies, investigating the processing of
582 Giese
biological motion information in absence of configural cues. For example, the perceived temporal
duration of biological motion and scrambled biological motion is prolonged compared to similar
non-biological stimuli (Wang and Jiang 2012).
A further general approach for the characterization of signals that are specific for biological
movements, and which can be processed even in absence of configural cues, has been motivated
by work in motor control on the differential invariants of body movements. An example for such
an invariant is the two-thirds power law that links the speed and the curvature of the endpoint
trajectories of arm and finger movements, and which holds even for trajectories in locomotion.
Psychophysical and imaging work shows that trajectories compatible with this law are perceived as
smoother (Viviani and Stucchi 1989; Bidet-Ildei et al. 2006), and activate brain structures involved
in body motion processing more strongly than dot trajectories that are incompatible with this
invariant (Dayan et al. 2010; Casile et al. 2011).
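This speed-curvature relation is easy to verify numerically. The sketch below is purely illustrative (the elliptical trajectory and all numerical choices are assumptions, not taken from the cited studies): a harmonically traced ellipse obeys the two-thirds power law exactly, so a log-log regression of speed against curvature should recover an exponent of −1/3:

```python
import numpy as np

def speed_and_curvature(x, y, dt):
    """Numerical speed and curvature of a planar trajectory."""
    dx, dy = np.gradient(x, dt), np.gradient(y, dt)
    ddx, ddy = np.gradient(dx, dt), np.gradient(dy, dt)
    v = np.hypot(dx, dy)                         # tangential speed
    kappa = np.abs(dx * ddy - dy * ddx) / v**3   # curvature
    return v, kappa

# Harmonically traced ellipse: x = a*cos(t), y = b*sin(t).
dt = 1e-3
t = np.arange(0.0, 2 * np.pi, dt)
x, y = 3.0 * np.cos(t), 1.0 * np.sin(t)

v, kappa = speed_and_curvature(x, y, dt)

# Fit log v = beta * log kappa + const; the two-thirds power law
# (v proportional to kappa**(-1/3)) predicts beta = -1/3.
beta = np.polyfit(np.log(kappa), np.log(v), 1)[0]
print(round(beta, 3))  # close to -0.333
```

For the harmonically traced ellipse the relation holds analytically: the curvature is κ = ab/v³, hence v = (ab)^(1/3) κ^(−1/3).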
Fig. 28.4 Top-down effects in the processing of body motion. (a) Visual search task for point-light
walkers: The target is the walker walking to the left side. Reproduced with permission from
Cavanagh et al. (2001). Attention-based visual routines: sprites. Cognition 80, p. 56, with permission
from Elsevier. (b) Stimulus demonstrating strong interference between shape recognition and body
motion perception. Reproduced from Hunt and Halper (2008). Disorganizing biological motion.
J. Vis. 8(9) 12, p. 3, with permission of the Association for Research in Vision and Ophthalmology.
(c) Motion stimulus by Fujimoto and Yagi (2008), showing that body motion processing interacts
with the organization of ambiguous coherent motion of a grating. The background is preferentially
perceived as moving in the direction that would be compatible with a forward locomotion of walker /
runner. Similar observations hold for point-light patterns.
Adapted from Kiyoshi Fujimoto and Akihiro Yagi, ‘Motion Illusion in Video Images of Human Movement’, in
Entertainment Computing - ICEC 2005, Lecture Notes in Computer Science, p. 532, Copyright © 2005,
Springer-Verlag Berlin Heidelberg. With kind permission from Springer Science and Business Media.
Grouping principles interact with the perceptual organization of biological motion displays. This
was, for example, demonstrated by replacing the dots of point-light walkers by oriented Gabor
patches that support or disfavor the correct grouping into limbs (Poljac et al. 2011).
Relevance of Learning
Several studies show that the perception of body motion and other complex motion patterns is dependent on learning. It is a classical result that observers can learn to recognize individuals from their
body movements (e.g., Hill and Pollick 2000; Cutting and Kozlowski 1977; Troje et al. 2005).
The discrimination of biological from scrambled patterns can be successfully trained, and this
training induces corresponding changes of the BOLD activity in relevant areas (Grossman et al.
2004). Several studies have compared the learning of biological and similar non-biological motion
patterns, finding substantial learning effects for both stimulus classes (Hiris et al. 2005; Jastorff
et al. 2006). It seems critical for the learning process that the learned patterns are related to an
underlying skeleton. Beyond this, the learning seems to be very fast, requiring less than 30 repeti-
tions, and it is associated with BOLD activity changes along the whole visual pathway (Jastorff
et al. 2009). Finally, the learning of the visual discrimination of body motion patterns has been
studied extensively in the context of different application domains. For example, experience seems
to improve body motion recognition of identity and emotional expression in dance (e.g. Sevdalis
and Keller 2011), or the efficiency of the prediction of dangerous events in surveillance videos
(e.g. Troscianko et al. 2004).
Related to the role of learning in body motion recognition is the question of the extent to
which this capability is innate, and how it has changed in the course of evolution. This
question is addressed, on the one hand, by many developmental studies showing that the capability to discriminate point-light from scrambled stimuli emerges very early in child development
(e.g. Fox and McDaniel 1982; Bertenthal 1993). Space does not permit a more detailed
review of this interesting literature. In addition, a variety of studies have investigated biological
motion perception in other species, such as cats, pigeons, or macaques (e.g. Blake 1993; Dittrich
et al. 1998). While many species can discriminate intact point-light from scrambled stimuli,
more detailed investigations suggest that even macaques might not perceive point-light stimuli in the
same way as humans do and require extensive training until they can recognize these patterns
correctly (Vangeneugden et al. 2010). This makes it crucial to carefully dissociate the relevant
computational levels of the processing of body motion in such experiments with other species,
before drawing far-reaching conclusions about potential evolutionary aspects.
Neural Mechanisms
Electrophysiological Studies
Substantial insights have been gained about neural mechanisms that are involved in the process-
ing of body motion. In particular, the imaging literature on action processing is vast, and a review
would by far exceed the scope of this chapter. In the following only a few key results from monkey
physiology and functional imaging can be highlighted that are particularly relevant for aspects
of visual pattern organization. In addition, it will not be possible to discuss the relevant literature
from neuropsychology and the relationship between body motion perception, brain lesions, and
psychiatric disorders, such as autism. More comprehensive discussions can be found in reviews
about the neural basis of body motion processing (e.g. Decety and Grezes 1999; Vaina et al. 2004;
Puce and Perrett 2003; Knoblich et al. 2006; Blake and Shiffrar 2007; Johnson and Shiffrar 2013).
Neurons with visual selectivity for body motion and point-light stimuli were first described
in the superior temporal sulcus (STS) by the group of David Perrett (Perrett et al. 1985; Oram and
Perrett 1996). This region contains neurons that respond selectively to human movements and
body shapes, and in the monkey likely represents a site of convergence of form and motion infor-
mation along the visual processing stream. Some neurons in this area show specific responses to
combinations of articulatory and translatory body motion, and many of them show selectivity for
the temporal order of the stimulus frames (Jellema and Perrett 2003; Barraclough et al. 2009). The
responses of many of these neurons are specific for certain stimulus views, and such view depend-
ence has been observed even at very high levels of the processing pathway, e.g. in mirror neurons
in premotor cortex (Caggiano et al. 2011). An extensive study of the neural encoding of body
motion in the STS has been realized by Vangeneugden et al. (2009) using a stimulus set that was
generated by motion morphing, and defining a triangular configuration in the morphing space.
By applying multi-dimensional scaling to the responses of populations of STS neurons, corresponding metric configurations in the 'neural space' were recovered from the cell activities, which closely
resembled the configurations in the physical space (consistent with a veridical neural encoding
of the physical space). In addition, this study reports ‘motion neurons’, especially in the upper
bank and fundus of the STS, which respond to individual and small groups of dots in point-light
stimuli, even in absence of global shape information. Conversely, the lower bank contains many
Biological and Body Motion Perception 585
‘shape neurons’ that are specifically selective for the global shape of the body. Recent studies also
applied neural decoding approaches using classifiers to responses of populations of STS neurons
for stick figure stimuli, as well as for densely textured avatars, showing that such stimuli can be
decoded from such population responses (Singer and Sheinberg 2010; Vangeneugden et al. 2011).
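The logic of the multi-dimensional scaling analysis described above can be sketched with classical (Torgerson) MDS. Everything below is a toy stand-in (a random linear population code for three stimuli at the corners of a triangle in a two-dimensional morphing space), not the actual data or analysis pipeline of the study:

```python
import numpy as np

def classical_mds(D, dims=2):
    """Classical (Torgerson) MDS: embed points from a distance matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    w, V = np.linalg.eigh(B)                 # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:dims]         # keep the largest ones
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

# Hypothetical 'population responses': three stimuli spanning a triangle
# in morphing space, each encoded linearly by 50 model neurons.
rng = np.random.default_rng(0)
stim = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.9]])   # physical space
W = rng.normal(size=(2, 50))
resp = stim @ W + 0.01 * rng.normal(size=(3, 50))        # neural space

D = np.linalg.norm(resp[:, None] - resp[None, :], axis=2)
config = classical_mds(D)   # recovered 2-D 'neural space' configuration
```

The recovered configuration reproduces the inter-stimulus distances in the neural space (up to rotation and reflection), which is the sense in which a metric configuration can be 'recovered from the cell activities'.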
Another literature in the field of electrophysiology that is highly relevant for body motion processing is related to the 'mirror neuron system', and shows that neurons in parietal and premotor
cortex are also strongly activated by the observation of body motion. Space limitations do not
permit a thorough review of this aspect here, and the reader is referred to reviews and
books that specifically treat this aspect (e.g. Rizzolatti et al. 2001; Rizzolatti and Craighero 2004;
Rizzolatti and Sinigaglia 2008).
Imaging Studies
By now there exists a vast imaging literature on the perception of body motion, and we can
highlight only a very small number of aspects related to the mechanisms of pattern formation.
Further details can be found in the reviews mentioned at the beginning of this chapter.
Early positron emission tomography (PET) and fMRI studies found evidence for the involve-
ment of a network of areas, including the posterior STS, in the processing of point-light biological
motion (Bonda et al. 1996; Vaina et al. 2001; Grossman and Blake 2002). The relevant network
also includes human MT, parts of the lateral occipital complex (LOC), and the cerebellum. An
inversion effect could also be demonstrated for the activity in the STS (Grossman and Blake 2001).
Subsequent studies tried to dissociate activation components related to the action vs. human
shape (Peuskens et al. 2005), where specifically the right pSTS seems to respond selectively to
human motion. The human STS can also be robustly activated by full-body motion patterns (e.g.
Pelphrey et al. 2003), and several studies have investigated body motion-induced activation pat-
terns using natural stimuli such as movies (e.g. Hasson et al. 2004; Bartels and Zeki 2004), even
being able to decode semantic categories from action videos (Huth et al. 2012). TMS over
the STS reduces the sensitivity to biological motion stimuli (Grossman et al. 2005).
Substantial work has been dedicated to the study of body-selective areas in the inferotemporal
cortex and their involvement in the processing of body motion. One such area is the extrastriate
human body area (EBA) (Peelen and Downing 2007), which is selectively activated by static body
shapes and responds also strongly to body motion. Another relevant area is the fusiform body
area (FBA), which is very close to the fusiform face area (FFA) (Peelen and Downing 2005). Both
areas have been interpreted as specifically processing the form aspects of body motion. Recent
studies, controlling for structure as well as motion cues, suggest that EBA and FBA might represent an essential stage of body motion processing that links the body information with the action
(Jastorff and Orban 2009). Very similar imaging results have been obtained by fMRI studies in the
monkey cortex, making it possible to establish a homology between human and monkey imaging data on
body motion perception (e.g. Jastorff et al. 2012).
Again, there exists a vast and continuously growing imaging literature about the involvement
of motor and mirror representations in the perceptual processing of body motion. We refer
to other more specialized reviews (e.g. Buccino et al. 2004; van Overwalle and Baetens 2009) with
respect to this aspect.
[Figure 28.5: panel diagrams not reproducible in text. Panel (a) includes 'observed sensory feedback'. Panel (b) shows view-specific modules with a form pathway (Gabor filters, complex feature detectors, snapshot neurons, motion pattern neurons) and a motion pathway (recurrent NN, temporal summation), converging via view integration onto view-independent motion pattern neurons; associated areas: V1/2, V2/V4, IT/FBA, STS/FBA/F5.]
Fig. 28.5 Models of body motion recognition. (a) Example for a model for movement recognition by internal simulation of the underlying motor behavior. The core of the
MOSAIC model by Wolpert et al. (2003) is a mixture of expert controllers for different motor behaviors, such as walking or kicking. Forward models for each individual
controller predict the sensory signals that would be caused by the corresponding motor commands. These predictions are compared with the actual sensory input. The
classification of observed movements is obtained by choosing the controller model that produces the smallest prediction error. (b) Neural architecture for body motion
recognition, following models by Giese and Poggio (2003) and Fleischer et al. (2013). The model assumes processing in two parallel pathways that are specialized for form
and motion features. Model neurons at different levels mimic properties of cortical neurons. Recognition in the form pathway is accomplished by integrating the information
from sequences of recognized body shapes (recognized by ‘snapshot neurons’). Recognition from local motion features is accomplished by the detection of sequences of
characteristic optic flow patterns. Recognition is first accomplished in a view-specific manner within view-specific modules. Only at the highest hierarchy level are the outputs of
these view-specific modules combined, achieving view-independent recognition. (Potentially relevant cortical areas in monkey and human cortex are indicated by the
abbreviations below the modules of the model. See above references for further details.)
Adapted from Daniel M. Wolpert, Kenji Doya, and Mitsuo Kawato, A unifying computational framework for motor control and social interaction, Philosophical Transactions B, 358 (1431),
pp. 593–602, DOI: 10.1098/rstb.2002.1238, Copyright © 2003, The Royal Society.
Only a small number of these approaches is relevant for biological systems. For a recent overview of
technical approaches see e.g. Moeslund et al. (2006). We will briefly sketch here some computational approaches that have been developed in the psychological literature on body motion perception, and then discuss existing neural models more thoroughly.
Computational Models
Early theories of body motion recognition were based on simple invariants that can be derived from
the three-dimensional movements of articulated figures (e.g., Hoffman and Flinchbaugh 1982;
Webb and Aggarwal 1982). For example, for point-light stimuli the distances between dots on the
same limb tend to vary less than the distances between dots on different limbs. Alternatively, one
can try to derive geometrical constraints for the two-dimensional motion of points that are rigidly
connected in the three-dimensional space. Classical work by Marr and Vaina (1982) assumed
that the brain might recover the body shape, and track body movements, using parametric body
models that are composed from cylindrical shape primitives. Other models have exploited other
shape primitives, such as spheres (e.g. O’Rourke and Badler 1980).
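The rigidity invariant mentioned above (within-limb dot distances vary less than between-limb distances) can be demonstrated with a toy display. The two hinged 'limbs' below are an illustrative assumption, not a stimulus from the cited work:

```python
import numpy as np

t = np.linspace(0.0, 2 * np.pi, 200)

# Two rigid limb segments hinged at a common 'hip', swinging in anti-phase,
# with one point-light marker per joint.
hip = np.zeros((t.size, 2))
theta = 0.6 * np.sin(t)                # left limb angle over time
phi = 0.6 * np.sin(t + np.pi)          # right limb angle (anti-phase)
left_knee = hip + np.stack([np.sin(theta), -np.cos(theta)], axis=1)
right_knee = hip + np.stack([np.sin(phi), -np.cos(phi)], axis=1)

def distance_std(a, b):
    """Temporal variability of the distance between two point-light dots."""
    return np.linalg.norm(a - b, axis=1).std()

within = distance_std(hip, left_knee)         # same rigid limb: ~0
across = distance_std(left_knee, right_knee)  # different limbs: varies
print(within < across)  # True
```

The distance between dots on the same rigid segment is constant up to rounding error, whereas the distance between dots on different limbs fluctuates over the gait cycle.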
Building on this idea, another class of theoretical models has been developed that is presently
very influential in cognitive neuroscience. This class of models assumes that the recognition of
body movements and actions is based on the internal simulation of observed motor behaviors.
A tight interaction between body motion recognition and motor control is suggested by many
experiments (for reviews see e.g. Knoblich et al. 2006; Schütz-Bosbach and Prinz 2007). For example,
a study by Jacobs and Shiffrar (2005) shows that the perception of gait speeds of point-light walk-
ers depends on whether the observers are walking or running during the observation. A direct
and highly selective coupling between motor control and mechanisms for the perception of bio-
logical motion is also suggested by a study that used Virtual Reality technology in order to control
point-light stimuli by the concurrent movements of the observer (e.g. Christensen et al. 2011).
In this case, detection of biological motion was facilitated if the stimulus was spatially and tem-
porally coherent with the ongoing movements of the observer, but impaired if this congruency
was destroyed. In addition, a variety of studies demonstrate that motor expertise (independent of
visual expertise) influences performance in body motion perception (e.g. Hecht et al. 2001; Casile
and Giese 2006; Calvo-Merino et al. 2006).
The analysis-by-synthesis idea that underlies this class of models goes back to classical motor
theory of speech recognition, which assumes that perceived speech is mapped onto ‘vocal gestures’
that form the units of the production of speech in the vocal tract (Liberman et al. 1967). For action
recognition this idea has been formulated, for example, by Wolpert and colleagues who suggested
that controller models for the execution of body movements might also be used for motion and
social recognition (Wolpert et al. 2003). The underlying idea is illustrated in Figure 28.5a. Their
MOSAIC model is based on a mixture of controller experts (forward models) for the execution of
different behaviors. Recognition is accomplished by predicting the observed sensory signals using
all controller models, and selecting the one that generates the smallest prediction error. Models
based on similar ideas have been suggested as an account of the function of the 'mirror neuron system' in action recognition, and as a basis for the learning of movements by imitation (e.g. Oztop and
Arbib 2002; Erlhagen et al. 2006). In addition, related models have also been formulated exploit-
ing a Bayesian framework (e.g. Kilner et al. 2005).
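The selection principle shared by these models (recognition as the forward model with the smallest prediction error) can be sketched in a few lines. The two linear 'forward models' below are hypothetical stand-ins for controller experts such as walking and running, not components of the actual MOSAIC implementation:

```python
import numpy as np

def rot(a):
    """2-D rotation matrix, used here as a toy one-step forward model."""
    return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])

forward_models = {"walk": rot(0.1), "run": rot(0.3)}

def simulate(A, steps=50):
    """Generate an observed trajectory from one of the dynamics."""
    xs = [np.array([1.0, 0.0])]
    for _ in range(steps):
        xs.append(A @ xs[-1])
    return np.array(xs)

def recognize(obs):
    """Pick the behavior whose forward model best predicts the data."""
    errors = {
        name: np.sum((obs[1:] - obs[:-1] @ A.T) ** 2)
        for name, A in forward_models.items()
    }
    return min(errors, key=errors.get)

print(recognize(simulate(forward_models["run"])))   # run
```

Each forward model predicts the next observation from the current one; the behavior label is assigned by the minimum of the accumulated prediction errors, mirroring the MOSAIC selection step.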
Many of the discussed analysis-by-synthesis approaches require the reconstruction of
motor-relevant sensory variables, such as joint angles, at the input level. The estimation of such
variables from monocular image sequences is a very difficult computer vision problem that is
partially unsolved. Correspondingly, only a few of the discussed models are implemented to a level
that would demonstrate their performance on real video data. It is unclear if and how the
brain solves the underlying reconstruction problem. Alternatively, the visual system might
circumvent this difficult computational problem, recognizing body motion by computationally simpler strategies.
Neural Models
Another class of models has been inspired by fundamental properties of the architecture of
the visual cortex and extends biologically-inspired models for the recognition of stationary
shapes (e.g. Riesenhuber and Poggio 1999) in space-time. Such an architecture, which
reproduces a broad range of data about body motion recognition from psychophysics, electrophysiology, imaging, and neuropsychology, is illustrated in Figure 28.5b. (See Giese and Poggio
(2003), Casile and Giese (2005), Giese (2006), Fleischer et al. (2013) for a detailed description.)
Consistent with the anatomy of the visual cortex, the model is organized in terms of two hier-
archical neural pathways, modeling the ventral and dorsal processing streams. The first pathway
is specialized for the processing of form information, while the second pathway processes local
motion information.
Both pathways consist of hierarchies of neural detectors that mimic properties of cortical neu-
rons, and which converge to a joint representation at a level that corresponds to the STS. The
complexity of the extracted features as well as the receptive field sizes of the feature detectors
increase along the hierarchy. The model creates position and scale invariance along the hierarchy
by pooling of the responses of detectors for the same feature over different positions and scales,
using a maximum operation (e.g. Riesenhuber and Poggio 1999). Stimuli can thus be recognized
largely independently of their size and positions in the visual field.
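The MAX-pooling step can be illustrated with a minimal sketch; the Gaussian 'response map' below is an illustrative stand-in for one detector's responses over positions, not the model's actual filters:

```python
import numpy as np

def response_map(center, size=32):
    """Toy response map of one feature detector: a Gaussian bump at the
    stimulus position (same feature, different place in the visual field)."""
    xs = np.arange(size)
    g = np.exp(-0.5 * ((xs - center) / 2.0) ** 2)
    return np.outer(g, g)

def max_pool(feature_map):
    """Pool the detector's responses over all positions (MAX operation)."""
    return feature_map.max()

# The pooled response is the same wherever the stimulus appears.
r1 = max_pool(response_map(center=8))
r2 = max_pool(response_map(center=24))
print(np.isclose(r1, r2))  # True
```

Because the maximum over positions is unchanged when the bump is displaced, the pooled unit responds to its preferred feature independently of stimulus position; pooling over scales works analogously.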
The detectors in the form pathway mimic properties of shape-selective neurons in the ventral
stream (including simple and complex cells in primary visual cortex, V4 neurons, and shape-
selective neurons in inferotemporal cortex). The detectors on the highest level of the form path-
way (‘snapshot neurons’) are selective for body postures that are characteristic for snapshots from
movies showing the relevant body movement. They are modeled by radial basis function (RBF)
units, which represent a form of fuzzy shape template (the RBF center defining the template).
The motion pathway of the model has the same hierarchical architecture, where its input level is
formed by local motion energy detectors. This pathway recognizes temporal sequences of com-
plexly-structured optic flow patterns, which are characteristic for body motion.
A central idea of the model is that body motion can be recognized by identifying temporal
sequences of features, such as body shapes or optic flow patterns in ‘snapshots’ from a movie (Giese
2000). In order to make the neural detectors selective for the temporal order of such sequences,
the model assumes the existence of asymmetric lateral connections between the snapshot neurons
in the form and motion pathway. The resulting network dynamics suppresses responses to mov-
ies for which the stimulus frames appear in the wrong temporal order (Giese and Poggio 2003).
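This order-selectivity mechanism can be caricatured in a few lines. The gating rule and all parameters below are illustrative assumptions, not the published model equations: each 'snapshot neuron' in a chain receives extra gain from its predecessor, so frames shown in the trained order produce more summed activity than the same frames in reverse:

```python
import numpy as np

def chain_response(frame_order, n=10, steps_per_frame=3, w_asym=0.8, leak=0.5):
    """Summed activity of a chain of snapshot neurons in which lateral
    input from the preceding neuron amplifies the feedforward drive."""
    u = np.zeros(n)
    total = 0.0
    for frame in frame_order:
        for _ in range(steps_per_frame):
            boost = w_asym * np.roll(u, 1)     # excitation from predecessor
            boost[0] = 0.0                     # no wrap-around connection
            drive = np.zeros(n)
            drive[frame] = 1.0                 # feedforward input for this frame
            u = leak * u + drive * (1.0 + boost)
            total += u.sum()
    return total

forward = chain_response(range(10))            # trained frame order
backward = chain_response(range(9, -1, -1))    # reversed movie
print(forward > backward)  # True
```

In the reversed sequence the predecessor of each stimulated neuron has not yet been active, so the lateral boost never engages and the summed response stays at baseline.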
The model accomplishes recognition first in a view-specific manner, within view-specific modules that are trained with different views of the body motion sequence. Only at the highest
hierarchy level is the information from different view-specific modules combined by pooling,
resulting in view-independent motion recognition (cf. Figure 28.5b).
If such a model is trained with normal full-body motion and tested with point-light walkers,
the motion pathway spontaneously generalizes to point-light stimuli, while this is not the case
for the form pathway. This does not imply that configural information is irrelevant, because
the optic flow templates in the motion pathway also depend on the global body configuration.
In addition, this result does not imply that the form pathway cannot process point-light patterns.
If trained with them, the form pathway also responds well to dot patterns (Casile and Giese
2005), consistent with the fact that trained observers can learn to recognize actions even from
static point-light patterns (Reid et al. 2009).
A strongly related model has been proposed by Beintema et al. (2006). This model was designed
originally in order to account for the processing of a biological motion from stimuli that degrade
local motion information by repositioning the dots on the skeleton of a moving point-light fig-
ure in every frame (Beintema and Lappe 2002). This model is very similar to the form pathway
of the model by Giese and Poggio (2003), where the major differences are: (i) The model does
not contain a motion pathway; (ii) it does not contain a mechanism that accounts for position and
scale invariance; and (iii) it implicitly assumes that the form template detectors (RBFs) are always
perfectly positioned and scaled relative to the stimulus. In the presence of static backgrounds this
perfect alignment might be accomplished by motion segmentation (Lange and Lappe 2006), while
this approach does not seem applicable in the presence of motion clutter, e.g. for dynamically masked
point-light stimuli. (More extensive discussions of related models can be found in Giese (2006)
and Fleischer et al. (2013).)
Meanwhile, much more computationally efficient versions of the Giese-Poggio model have
been developed in computer vision, reaching state-of-the-art performance for action detection
(e.g. Jhuang et al. 2007; Escobar et al. 2009; Schindler et al. 2008). In addition, the model has been
extended for the recognition of goal-directed actions (Fleischer et al. 2013). For this purpose, add-
itional modules were integrated that model the properties of neurons in parietal and premotor
cortex. One of these modules computes the spatial relationship (relative position and motion)
between the moving effector (e.g. the hand) and the goal object. The other module contains
neurons (probably in the STS and parietal cortex) that combine the information about the goal
object, the effector movement, and the spatial relationship between effector and goal. The model
accomplishes recognition of goal-directed hand actions from real videos, at the same time repro-
ducing a whole spectrum of properties of action-selective neurons in the STS, parietal and the
premotor cortex. In contrast to the architecture shown in Figure 28.5a, recognition by this model
is accomplished without the explicit reconstruction of three-dimensional structure parameters,
such as joint angles, from monocular image sequences. In addition, it has been shown (Fleischer
et al. 2012) that the model even accounts for certain forms of causality perception (Michotte
1946/1963).
Conclusion
This chapter has reviewed some central results and theories about the perception of body motion.
Work on this topic in psychology started from the original work of Johansson, who studied body
motion as an example of complex and ecologically relevant natural motion, and who was aiming
at uncovering and testing Gestalt rules for the perceptual organization of motion. Since then, this
field has developed substantially, absorbing many approaches from outside Gestalt psychology
and pattern formation. This includes psychophysical theories of pattern
detection, top-down control by attention, learning-based recognition theories, ecological and
developmental psychology, and modern approaches in physiology and imaging, including neu-
ral decoding by machine learning techniques. The large body of existing work has revealed some
neural and computational principles. However, we have no clear picture of the underlying
neural and computational processes, and many existing explanations remain phenomenological,
theoretically not rigorously defined, or only loosely tied to experimental data. The main stream
of present research is dominated, on the one hand, by pattern recognition approaches, implic-
itly assuming signal detection or filtering mechanisms, partly combined with ecological ideas.
Contrasting with this approach, research in cognitive neuroscience is fascinated by the idea of
an analysis by internal simulation of motor behavior, often entirely bypassing the aspects of
visual pattern recognition. Both streams depart from Johansson's original idea of uncovering
the dynamic processes that control pattern formation in the organization of complex motion
patterns. It seems likely that such processes play a central role in the organization of ambigu-
ous stimulus information about body motion, and it seems quite interesting to pick up this old
line of research. Modern mathematical approaches in neurodynamics, Bayesian inference, and
computational learning, combined with the now available computer power, will provide a meth-
odological basis to re-address these questions. An approach in this direction seems all the more
promising since previous work has revealed insights about relevant features and underlying
basic processes, laying a basis for the study of active pattern formation in the processing of natu-
ralistic body motion stimuli.
Acknowledgments
I thank M. Angelovska for help with the illustrations and the editing of the references. I thank
J. Vangeneugden and an anonymous reviewer for helpful comments. Supported by EU Commission,
EC FP7-ICT-248311 AMARSi, F7 7-PEOPLE-2011-ITN: ABC PITN-GA-011-290011, HBP
FP7-ICT-2013-FET-F/ 604102; FP7-ICT-2013-10/ 611909 KOROIBOT, Deutsche Forschungsgemeinschaft: DFG GI 305/4-1, DFG GZ: KA 1258/15-1, and German Federal Ministry of Education
and Research: BMBF, FKZ: 01GQ1002A.
References
Ahlström, V., Blake, R., and Ahlström, U. (1997). Perception of biological motion. Perception 26: 1539–48.
Allison, T., Puce, A., and McCarthy, G. (2000). Social perception from visual cues: role of the STS region.
Trends Cogn Sci. 4: 267–78.
Atkinson, A.P., Dittrich, W.H., Gemmel, A.J., and Young A.W. (2004). Emotion perception from dynamic
and static body expressions in point-light and full-light displays. Perception 33: 717–46.
Barclay, C., Cutting, J., and Kozlowski, L. (1978). Temporal and spatial factors in gait perception that
influence gender recognition. Percept. Psychophys. 23: 145–52.
Barraclough, N.E., Keith, R.H., Xiao, D., Oram, M.W., and Perrett, D.I. (2009). Visual adaptation to
goal-directed hand actions. J. Cogn. Neurosci. 21: 1806–20.
Bartels, A. and Zeki, S. (2004). Functional brain mapping during free viewing of natural scenes. Hum.
Brain Mapp. 21: 75–85.
Battelli, L., Cavanagh, P., and Thornton, I.M. (2003). Perception of biological motion in parietal patients.
Neuropsychologia 41: 1808–16.
Beardsworth, T. and Buckner, T. (1981). The ability to recognize oneself from a video recording of one’s
movements without seeing one’s body. Bulletin of the Psychonomic Society 18: 19–22.
Bellefeuille, A. and Faubert, J. (1998). Independence of contour and biological-motion cues for
motion-defined animal shapes. Perception 27: 225–35.
Beintema, J.A. and Lappe, M. (2002). Perception of biological motion without local image motion.
Proceedings of the National Academy of Sciences USA 99: 5661–3.
Beintema, J.A., Georg, K., and Lappe, M. (2006). Perception of biological motion from limited lifetime
stimuli. Percept. Psychophys. 68(4): 613–24.
Decety, J. and Grèzes, J. (1999). Neural mechanisms subserving the perception of human actions. Trends
Cogn. Sci. 3(5): 172–8.
de Gelder B. (2006). Towards the neurobiology of emotional body language. Nat. Rev. Neurosci. 7(3): 242–9.
Dittrich, W.H. (1993). Action categories and the perception of biological motion. Perception 22: 15–22.
Dittrich, W. H., Troscianko, T., Lea, S. E., and Morgan, D. (1996). Perception of emotion from dynamic
point-light displays represented in dance. Perception 25: 727–38.
Dittrich, W.H., Lea, S.E.G., Barrett, J., and Gurr, P.R. (1998). Categorization of natural movements by
pigeons: visual concept discrimination and biological motion. J. Exp. Anal. Behav. 70: 281–99.
Duncker, K. (1929). Über induzierte Bewegung (Ein Beitrag zur Theorie optisch wahrgenommener
Bewegung). Psychologische Forschung 12: 180–259.
Erlhagen, W., Mukovskiy, A., and Bicho, E. (2006). A dynamic model for action understanding and
goal-directed imitation. Brain Res. 1083(1): 174–88.
Escobar, M.J., Masson, G.S., Vieville, T., and Kornprobst, P. (2009). Action recognition using a
bio-inspired feedforward spiking network. Int. J. Comput. Vision 82: 284–301.
Fleischer, F., Christensen, A., Caggiano, V., Thier, P., and Giese, M.A. (2012). Neural theory for the perception
of causal actions. Psychol. Res. 76(4): 476–93.
Fleischer, F., Caggiano, V., Thier, P., and Giese, M.A. (2013). Physiologically inspired model for the visual
recognition of transitive hand actions. Journal of Neuroscience 33(15): 6563–80.
Fox, R. and McDaniel, C. (1982). The perception of biological motion by human infants. Science
218(4571): 486–7.
Fujimoto, K. (2003). Motion induction from biological motion. Perception 32: 1273–7.
Fujimoto, K. and Yagi, A. (2005). Motion illusion in video images of human movement. In: F. Kishino et al.
(eds.), ICEC 2005, LNCS 3711, Springer-Verlag, Berlin/Heidelberg, pp. 531–4.
Fujimoto, K. and Yagi, A. (2008). Biological motion alters coherent motion perception. Perception
37(12): 1783–9.
Giese, M.A. (2000). Neural field model for the recognition of biological motion patterns. Proceedings of
the Second International ICSC Symposium on Neural Computation (NC 2000), pp. 1–12.
Giese, M.A. (2006). Computational principles for the recognition of biological movements: model-based
versus feature-based approaches. In: Knoblich, G., Thornton, I.M., Grosjean, M., and Shiffrar, M. (eds),
The Human Body: Perception From the Inside Out, pp. 323–59. Oxford University Press.
Giese, M.A. and Lappe, M. (2002). Measurement of generalization fields for the recognition of biological
motion. Vision Res. 42(15): 1847–58.
Giese, M.A. and Poggio, T. (2003). Neural mechanisms for the recognition of biological movements. Nat.
Rev. Neurosci. 4: 179–92.
Giese, M. A., Thornton, I.M., and Edelman, S. (2008). Metrics of the perception of body movement.
Journal of Vision 8(9): 1–18.
Grossman, E.D. and Blake, R. (2001). Brain activity evoked by inverted and imagined biological motion.
Vision Res. 41(10–11): 1475–82.
Grossman, E.D. and Blake, R. (2002). Brain areas active during visual perception of biological motion.
Neuron 35(6): 1167–75.
Grossman, E.D., Blake, R., and Kim, C.Y. (2004). Learning to see biological motion: brain activity parallels
behavior. J. Cogn. Neurosci. 16: 1669–79.
Grossman, E.D., Battelli, L., and Pascual-Leone A. (2005). Repetitive TMS over STSp disrupts perception
of biological motion. Vis. Res. 45: 2847–53.
Hasson, U., Nir, Y., Levy, I., Fuhrmann, G., and Malach, R. (2004). Intersubject synchronization of cortical
activity during natural vision. Science 303: 1634–1640.
Hecht, H., Vogt, S., and Prinz, W. (2001). Motor learning enhances perceptual judgment: a case for
action-perception transfer. Psychol. Res. 65(1): 3–14.
Biological and Body Motion Perception 593
Herzog, M. H. and Öğmen, H. (2014). Apparent motion and reference frames. In: J. Wagemans (ed.),
Oxford Handbook of Perceptual Organization (in press). Oxford University Press.
Hill, H. and Pollick, F.E. (2000). Exaggerating temporal differences enhances recognition of individuals
from point light displays. Psychological Science 11(3): 223–8.
Hiris, E. (2007). Detection of biological and nonbiological motion. J Vis. 7(12) 4: 1–16.
Hiris, E., Krebeck, A., Edmonds, J., and Stout, A. (2005). What learning to see arbitrary motion tells us
about biological motion perception. J. Exp. Psychol.: Hum. Percept. Perform. 31: 1096–106.
Hoffman, D.D. and Flinchbaugh, B.E. (1982). The interpretation of biological motion. Biol Cybern.
42(3): 195–204.
Hunt, A.R. and Halper, F. (2008). Disorganizing biological motion. J Vis. 8(9)12: 1–5.
Huth, A.G., Nishimoto, S., Vu, A.T., and Gallant, J.L. (2012). A continuous semantic space describes
the representation of thousands of object and action categories across the human brain. Neuron
76(6): 1210–24.
Jackson, S. and Blake, R. (2010). Neural integration of information specifying human structure from form,
motion, and depth. J. Neurosci. 30(3): 838–48.
Jacobs, A. and Shiffrar, M. (2005). Walking perception by walking observers. J. Exp. Psychol.: Hum. Percept.
Perform. 31: 157–69.
Jansson, G., Bergström, S.S., Epstein, W., and Johansson, G. (1994). Perceiving Events and Objects.
Hillsdale: Lawrence Erlbaum Associates.
Jastorff, J. and Orban, G.A. (2009). Human functional magnetic resonance imaging reveals
separation and integration of shape and motion cues in biological motion processing. J. Neurosci.
29(22): 7315–29.
Jastorff, J., Kourtzi, Z., and Giese, M.A. (2006). Learning to discriminate complex movements: biological
versus artificial trajectories. J Vis. 6(8): 791–804.
Jastorff, J., Kourtzi, Z., and Giese, M.A. (2009). Visual learning shapes the processing of complex
movement stimuli in the human brain. J. Neurosci. 29(44): 14026–38.
Jastorff, J., Popivanov, I.D., Vogels, R., Vanduffel, W., and Orban, G.A. (2012). Integration of shape and
motion cues in biological motion processing in the monkey STS. Neuroimage. 60(2): 911–21.
Jellema, T. and Perrett, D.I. (2003). Perceptual history influences neural responses to face and body
postures. J. Cogn. Neurosci. 15(7): 961–71.
Jhuang, H., Serre, T., Wolf, L., and Poggio, T. (2007). A biologically inspired system for action recognition.
In: IEEE 11th International Conference on Computer Vision, ICCV 2007, Rio de Janeiro, Brazil,
October 14-20, pp. 1-8.
Johansson, G. (1950). Configurations in event perception: an experimental study, dissertation.
Stockholm: Högskolan.
Johansson, G. (1973). Visual perception of biological motion and a model for its analysis. Perception and
Psychophysics 14: 201–11.
Johansson, G. (1976). Spatio-temporal differentiation and integration in visual motion perception: An
experimental and theoretical analysis of calculus-like functions in visual data processing. Psychological
Research 38: 379–93.
Johnson, K. and Shiffrar, M. (2013). People Watching. Oxford University Press.
Jokisch, D. and Troje, N.F. (2003). Biological motion as a cue for the perception of size. J. Vis. 3: 252–64.
Jordan, H., Fallah, M., and Stoner, G.R. (2006). Adaptation of gender derived from biological motion. Nat.
Neurosci. 9(6): 738–9.
Kilner, J., Friston, K.J., and Frith, C.D. (2005). The mirror-neuron system: a Bayesian perspective.
Neuroreport 18(6): 619–23.
Knoblich, G., Thornton, I.M., Grosjean, M., and Shiffrar, M. (2006). Human Body Perception from the
Inside Out. New York: Oxford University Press.
594 Giese
Koenderink, J. (2014). Gestalts as ecological templates. In: J. Wagemans (ed.), Oxford Handbook of
Perceptual Organization (in press). Oxford University Press.
Lange, J. and Lappe, M. (2006). A model of biological motion perception from configural form cues.
J. Neurosci. 26: 2894–906.
Leopold, D.A., O’Toole, A.J., Vetter, T., and Blanz, V. (2001). Prototype-referenced shape encoding
revealed by high-level aftereffects. Nat. Neurosci. 4: 89–94.
Liberman, A.M., Cooper, F.S., Shankweiler, D.P., and Studdert-Kennedy, M. (1967). Perception of the
speech code. Psychol. Rev. 74(6): 431–61.
Lu, H. (2010). Structural processing in biological motion perception. J. Vis. 10(12): 1–13.
Lu, H. and Liu, Z. (2006). Computing dynamic classification images from correlation maps. J Vis.
6(4): 475–83.
Ma, Y., Paterson, H.M., and Pollick, F.E. (2006). A motion-capture library for the study of identity, gender,
and emotion perception from biological motion. Behav. Res. Methods 38: 134–41.
Marey, E.J. (1894). Le Mouvement. Paris: Masson.
Marr, D. and Vaina, L. (1982). Representation and recognition of the movements of shapes. Proc. R. Soc.
Lond. B. Biol. Sci. 214(1197): 501–24.
Mather, G., Radford, K., and West, S. (1992). Low level visual processing of biological motion. Proc. R. Soc.
Lond. B. Biol. Sci. 249: 149–55.
Metzger, W. (1937). Gesetze des Sehens (Laws of Vision), 1st German edition.
Michotte, A. (1946). La perception de la causalité. Louvain: Publications Universitaires. (English
translation: The perception of causality. (1963) London: Methuen.)
Mirenzi, A. and Hiris, E. (2011). The Thatcher effect in biological motion. Perception 40(10): 1257–60.
Moeslund, T.B., Hilton, A., and Kruger, V. (2006). A survey of advances in vision-based human motion
capture and analysis. Computer Vision and Image Understanding 104: 90–126.
Montepare, J.M., Zebrowitz, M., and McArthur, L. (1988). Impressions of people created by age-related
qualities of their gaits. Journal of Personality and Social Psychology 55: 547–56.
Muybridge, E. (1887). Muybridge’s Complete Human and Animal Locomotion. (All 781 Plates from the
1887 ‘Animal Locomotion.’ Volume I. Dover Publications, Inc. 1979.)
Neri, P. (2009). Wholes and subparts in visual processing of human agency. Proc. Biol. Sci.
276(1658): 861–9.
Neri, P., Morrone, M.C., and Burr, D. (1998). Seeing biological motion. Nature 395: 894–6.
Nicolas, H., Pateux, S., and Le Guen, D. (1997). Minimum description length criterion for region-based
video compression. In: Proceedings of the International Conference on Image Processing, 1: 346–9.
Oram, M.W., and Perrett, D.I. (1996). Integration of form and motion in the anterior superior temporal
polysensory area (STPa) of the macaque monkey. J. Neurophysiol. 76: 109–29.
O’Rourke, J. and Badler, N. (1980). Model-based image analysis of human motion using constraint
propagation. IEEE Trans. on Pattern Analysis and Machine Intelligence 2(6): 522–36.
O’Toole, A.J., Roark, D.A., and Abdi, H. (2002). Recognizing moving faces: a psychological and neural
synthesis. Trends Cogn. Sci. 6 (6): 261–6.
Oztop, E. and Arbib, M.A. (2002). Schema design and implementation of the grasp-related mirror neuron
system. Biol. Cybern. 87(2): 116–40.
Pavlova, M. and Sokolov, A. (2000). Orientation specificity in biological motion perception. Percept.
Psychophys. 62 (5): 889–99.
Peelen, M.V. and Downing, P.E. (2005). Selectivity for the human body in the fusiform gyrus.
J. Neurophysiol. 93(1): 603–8.
Peelen, M.V. and Downing, P.E. (2007). The neural basis of visual body perception. Nat. Rev. Neurosci.
8(8): 636–48.
Pelphrey, K.A., Mitchell, T.V., McKeown, M.J., Goldstein, J., Allison, T., and McCarthy, G. (2003).
Brain activity evoked by the perception of human walking: controlling for meaningful coherent motion.
J. Neurosci. 23: 6819–25.
Perrett, D.I., Smith, P.A., Mistlin, A.J., Chitty, A.J., Head, A.S., Potter, D.D., Brönnimann, R., Milner,
A.D., and Jeeves, M.A. (1985). Visual analysis of body movements by neurons in the temporal cortex of
the macaque monkey: a preliminary report. Behav. Brain Res. 16: 153–70.
Peuskens, H., Vanrie, J., Verfaillie, K., and Orban, G.A. (2005). Specificity of regions processing
biological motion. Eur. J. Neurosci. 21: 2864–75.
Pinto, J. and Shiffrar, M. (1999). Subconfigurations of the human form in the perception of
biological motion displays. Acta Psychol. 102: 293–318.
Poljac, E., Verfaillie, K., and Wagemans, J. (2011). Integrating biological motion: the role of grouping in the
perception of point-light actions. PLoS ONE 6(10): e25867.
Poljac, E., de-Wit, L., and Wagemans, J. (2012). Perceptual wholes can reduce the conscious accessibility of
their parts. Cognition 123: 308–12.
Pollick, F.E., Paterson, H.M., Bruderlin, A., and Sanford, A.J. (2001). Perceiving affect from arm
movement. Cognition 82(2): B51–B61.
Pollick, F.E., Kay, J.W., Heim, K., and Stringer, R. (2005). Gender recognition from point-light walkers.
J. Exp. Psychol.: Hum. Percept. Perform. 31: 1247–65.
Puce, A. and Perrett, D. (2003). Electrophysiology and brain imaging of biological motion. Philos. Trans.
R. Soc. Lond. B Biol. Sci. 358: 435–45.
Reid, R., Brooks, A., Blair, D., and van der Zwan, R. (2009). Snap! Recognising implicit actions in static
point-light displays. Perception 38(4): 613–16.
Restle, F. (1979). Coding theory of the perception of motion configurations. Psychol. Rev. 86(1): 1–24.
Riesenhuber, M. and Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nat. Neurosci.
2(11): 1019–25.
Rizzolatti, G., Fogassi, L., and Gallese, V. (2001). Neurophysiological mechanisms underlying the
understanding and imitation of action. Nat. Rev. Neurosci. 2: 661–70.
Rizzolatti, G. and Craighero, L. (2004). The mirror-neuron system. Annu. Rev. Neurosci. 27: 169–92.
Rizzolatti, G. and Sinigaglia, C. (2008). Mirrors in the brain: How our minds share actions and emotions.
New York: Oxford University Press.
Roether, C.L., Omlor, L., Christensen, A., and Giese, M. A. (2009). Critical features for the perception of
emotion from gait. Journal of Vision 9(6): 1–32.
Rose, C., Cohen, M.F., and Bodenheimer, B. (1998). Verbs and adverbs: multidimensional motion
interpolation. Computer Graphics and Applications 18(5): 32–40.
Runeson, S. and Frykholm, G. (1981). Visual perception of lifted weight. J. Exp. Psychol.: Hum. Percept.
Perform. 7: 733–40.
Safford, A.S., Hussey E.A., Parasuraman, R., and Thompson, J.C. (2010). Object-based attentional
modulation of biological motion processing: spatiotemporal dynamics using functional magnetic
resonance imaging and electroencephalography. J. Neurosci. 30 (27): 9064–73.
Schindler, K., Van Gool, L., and de Gelder, B. (2008). Recognizing emotions expressed by body pose: a
biologically inspired neural model. Neural Netw. 21(9): 1238–46.
Schütz-Bosbach, S. and Prinz, W. (2007). Perceptual resonance: action-induced modulation of perception.
Trends Cogn. Sci. 11(8): 349–55.
Shi, J., Pan, J., and Yu, S. (1998). Joint motion estimation and segmentation based on the MDL principle.
ICSP ’98. Fourth International Conference on Signal Processing, Proceedings, 2(2): 963–7.
Singer, J.M. and Sheinberg, D.L. (2010). Temporal cortex neurons encode articulated actions as slow sequences
of articulated poses. J. Neurosci. 30: 3133–45.
Sevdalis, V. and Keller, P.E. (2011). Perceiving performer identity and intended expression intensity in
point-light displays of dance. Psychol. Res. 75(5): 423–34.
Sumi, S. (1984). Upside-down presentation of the Johansson moving light-spot pattern. Perception
13: 283–6.
Theusner, S., de Lussanet, M.H.E., and Lappe, M. (2011). Adaptation to biological motion leads to a
motion and a form after effect. Atten. Percept. Psychophys. 73(6): 1843–55.
Thirkettle, M., Benton, C.P., and Scott-Samuel, N.E. (2009). Contributions of form, motion and task to
biological motion perception. J. Vis. 9(3)28: 1-11.
Thornton, I.M. and Vuong, Q.C. (2004). Incidental processing of biological motion. Curr. Biol.
14(12): 1084–9.
Thornton, I.M., Pinto, J., and Shiffrar, M. (1998). The visual perception of human locomotion. Cognitive
Neuropsychology 15: 535–52.
Thornton, I.M., Rensink, R.A., and Shiffrar, M. (2002). Active versus passive processing of biological
motion. Perception 31(7): 837–53.
Thurman, S.M. and Grossman, E.D. (2008). Temporal ‘Bubbles’ reveal key features for point-light
biological motion perception. J. Vis. 8(3) 28: 1–11.
Thurman, S.M., Giese, M.A., and Grossman, E.D. (2010). Perceptual and computational analysis of critical
features for biological motion. J. Vis. 10: 1–15.
Ternus, J. (1926). Experimentelle Untersuchungen über phänomenale Identität (Experimental
investigations of phenomenal identity). Psychologische Forschung 7: 81–136.
Todd, J.T. (1983). Perception of gait. J. Exp. Psychol.: Hum. Percept. Perform. 9(1): 31–42.
Troje, N.F. (2002). Decomposing biological motion: A framework for analysis and synthesis of human gait
patterns. J. Vis. 2(5) 2: 371–87.
Troje, N.F. (2003). Reference frames for orientation anisotropies in face recognition and biological-motion
perception. Perception 32 (2): 201–10.
Troje, N.F., Sadr, J., Geyer, H., and Nakayama, K. (2006). Adaptation aftereffects in the perception of gender
from biological motion. J. Vis. 6: 850–7.
Troje, N.F. and Westhoff, C. (2006). The inversion effect in biological motion perception: evidence for a ‘life
detector’? Curr. Biol. 16(8): 821–4.
Troje, N.F., Westhoff, C., and Lavrov, M. (2005). Person identification from biological motion: effects of
structural and kinematic cues. Percept. Psychophys. 67(4): 667–75.
Troscianko, T., Holmes, A., Stillman, J., Mirmehdi, M., Wright, D., and Wilson, A. (2004). What happens next?
The predictability of natural behaviour viewed through CCTV cameras. Perception 33(1): 87–101.
Unuma, M., Anjyo, K., and Takeuchi, R. (1995). Fourier principles for emotion-based human figure
animation. In: Proceedings of ACM SIGGRAPH ’95, ACM Press, pp. 91–6.
Vaina, L.M., Solomon, J., Chowdhury, S., Sinha, P., and Belliveau, J.W. (2001). Functional neuroanatomy
of biological motion perception in humans. Proc. Natl. Acad. Sci. USA 98(20): 11656–61.
Vaina, L.M.V., Beardsley, S.A., and Rushton, S. (2004). Optic Flow and Beyond. Dordrecht: Kluwer
Academic Press.
Vallortigara, G. and Regolin, L. (2006). Gravity bias in the interpretation of biological motion by
inexperienced chicks. Curr. Biol. 16(8): R279–R280.
Vangeneugden, J., Pollick, F., and Vogels, R. (2009). Functional differentiation of macaque visual temporal
cortical neurons using a parametric action space. Cereb. Cortex. 19(3): 593–611.
Vangeneugden, J., Vancleef, K., Jaeggli, T., Van Gool, L., and Vogels, R. (2010). Discrimination of
locomotion direction in impoverished displays of walkers by macaque monkeys. J. Vis. 10: 22.1–22.19.
Vangeneugden, J., De Mazière, P.A., Van Hulle, M.M., Jaeggli, T., Van Gool, L., and Vogels, R.
(2011). Distinct mechanisms for coding of visual actions in macaque temporal cortex. J. Neurosci.
31(2): 385–401.
Van Overwalle, F. and Baetens, K. (2009). Understanding others’ actions and goals by mirror and
mentalizing systems: a meta-analysis. Neuroimage 48(3): 564–84.
Vanrie, J. and Verfaillie, K. (2004). Perception of biological motion: a stimulus set of human point-light
actions. Behav. Res. Methods Instrum. Comput. 36(4): 625–9.
Vanrie, J., Dekeyser, M., and Verfaillie, K. (2004). Bistability and biasing effects in the perception of
ambiguous point-light walkers. Perception 33(5): 547–60.
Viviani, P. and Stucchi, N. (1989). The effect of movement velocity on form perception: geometric illusions in
dynamic displays. Percept. Psychophys. 46(3): 266–74.
Walk, R.D. and Homan, C.P. (1984). Emotion and dance in dynamic light displays. Bull. Psychon. Soc.
22: 437–40.
Wang, L. and Jiang, Y. (2012). Life motion signals lengthen perceived temporal duration. Proc. Natl. Acad.
Sci. USA 109(11): E673–E677.
Webb, J.A. and Aggarwal, J.K. (1982). Structure from motion of rigid and jointed objects. Artif. Intell.
19: 107–30.
Wiley, D.J. and Hahn, J.K. (1997). Interpolation synthesis of articulated figure motion. IEEE Computer
Graphics and Applications 17(6): 39–45.
Wertheimer, M. (1923). Laws of organization in perceptual forms. First published as Untersuchungen zur
Lehre von der Gestalt II, in Psychologische Forschung 4: 301–50.
Wolpert, D. M., Doya, K., and Kawato, M. (2003). A unifying computational framework for motor control
and social interaction. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 358(1431): 593–602.
Section 7
Perceptual organization
and other modalities
Chapter 29
Auditory perceptual organization
Chapter overview
How does the auditory system achieve the remarkable feat of (generally correctly) decom-
posing the sound mixture into perceptual objects under the time constraints imposed by
the need to behave in a timely manner? Based on our review we will argue for two key processing
strategies: firstly, perceptual representations should be predictive (Friston 2005;
602 Denham and Winkler
Summerfield and Egner 2009), and secondly, perceptual decisions should be flexible
(Winkler et al. 2012). In this chapter, we will first consider the principles that guide the forma-
tion of links between sounds, and their separation from other sounds. Next, some of the key
experimental paradigms that have been used to investigate auditory perceptual organization are
described, and the behavioral and neural correlates of perceptual organization summarized. We
use this information to motivate our working definition of an auditory perceptual object (Kubovy
and Van Valkenburg 2001; Griffiths and Warren 2004; Winkler et al. 2009), and demonstrate the
utility of this concept for understanding auditory perceptual organization. For the purposes of
this chapter we ignore the influences of other modalities, but see Spence (this volume) for the
importance of cross-modal perceptual organization.
grouping effects of coherent correlations between some other features (e.g. frequency modula-
tions (Darwin and Sandell 1995; Lyzenga and Moore 2005) or spatial trajectories (Bőhm et al.
2012)) is lacking.
Disjoint allocation (or belongingness) refers to the principle that each element of the sensory
input is only assigned to one perceptual object. In an auditory analogy to the exclusive bor-
der assignment in Rubin’s face–vase illusion, Winkler et al. (2006) showed that a tone which
could be equally assigned to two different groups was only ever part of one of them at any
given point in time. However, while this principle often holds in auditory perception, there are
some notable violations; e.g. in duplex perception, the same sound component can contribute
to the perception of a complex sound as well as being heard separately (Rand 1974; Fowler and
Rosenblum 1990).
Finally, the principle of closure refers to the tendency of objects to be perceived as continuing
unless there is evidence for their stopping, e.g. a glide continuing through a masking noise (Miller
and Licklider 1950; Riecke et al. 2008). For example, in ‘temporal induction’ (or phonemic res-
toration), the replacement of part of a sound (speech) with noise results in the perception of the
original, unmodified, sound as well as a noise that is heard separately (Samuel 1981; Warren et al.
1988). However, temporal induction only works if the sound that is deleted is expected, as is found
for over-learnt sounds such as speech; see also Seeba and Klump (2009).
Perception as inference. This raises an important point: namely, that the key idea of a ‘Gestalt’
as a pattern implicitly carries within it the notion of predictability; i.e., parts can evoke the rep-
resentation of the whole pattern. Specifically in the case of sounds, this allows one to generate
expectations about sound events that have not yet occurred. This notion goes beyond Gestalt
theory, aligning it with the empiricist tradition of unconscious inference (Helmholtz 1885) and
perception as hypothesis formation (Gregory 1980; Feldman this volume). Indeed, whereas
Gestalt psychologists thought that grouping principles were rooted in the laws of physics, more
recent thinking (Bregman 1990) regards them as heuristics acquired through evolution and
learning. By detecting patterns (or feature regularities) in the sensory input the brain can con-
struct compressed representations that allow it to ‘explain away’ (Pearl 1988) future events and so
radically reduce the amount of sensory data needed for adequately describing the environment
(Summerfield and Egner 2009). The use of schemata (with the corresponding loss of some detail)
has long been accepted as an explanation for the nature of long-term memory (Bartlett 1932) and
seems also to be the basis for the formation of perceptual representations in general (Neisser 1967;
Hochberg 1981; Bar 2007). In accordance with these ideas, Winkler and Cowan (2005) suggested
that sound sequences are represented by feature regularities (i.e. relationships between features
that define the detected pattern) with only a few items described in full detail for anchoring the
representation.
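As a toy illustration of this idea (the symbolic tone sequence and all function names here are our own, not from Winkler and Cowan 2005), the following sketch detects a repeating feature regularity in a sequence, stores only the repeating unit plus a few anchor items, and uses the regularity to predict, and hence "explain away", future events:

```python
# Sketch: a sequence of tone "features" is scanned for a repeating
# regularity; the stored representation is then just the repeating unit
# plus a few anchor items, and events that match the prediction carry
# no new information.

def find_regularity(seq, max_period=8):
    """Return the shortest repeating unit of seq, or None."""
    for p in range(1, max_period + 1):
        if len(seq) >= 2 * p and all(seq[i] == seq[i % p] for i in range(len(seq))):
            return tuple(seq[:p])
    return None

def compress(seq, n_anchors=2):
    """Compress seq to its regularity plus a few fully detailed anchors."""
    unit = find_regularity(seq)
    if unit is None:
        return {"unit": None, "anchors": list(seq)}   # no regularity: store all
    return {"unit": unit, "anchors": list(seq[:n_anchors])}

def predict_next(model, position):
    """Predict the feature at a given position from the stored regularity."""
    unit = model["unit"]
    return unit[position % len(unit)] if unit else None

tones = ["A", "B", "A", "A", "B", "A", "A", "B", "A"]  # a repeated ABA pattern
model = compress(tones)
print(model["unit"])           # ('A', 'B', 'A')
print(predict_next(model, 9))  # 'A': the expected (absorbed) next event
```

Storing the three-item unit instead of the full nine-item sequence is the compression described above; any incoming event that matches `predict_next` adds nothing new to the representation.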
Auditory perceptual objects as predictive representations. Based on the Gestalt principles
and ideas of perceptual inference outlined above, Winkler and colleagues (Winkler 2007; Winkler
et al. 2009; Winkler 2010) proposed a definition of auditory perceptual objects as predictive rep-
resentations, constructed on the basis of feature regularities extracted from the incoming sounds
(see also Koenderink this volume for a more general treatment of ecological Gestalts). Object
representations are persistent, and absorb expected sensory events. Object representations encode
distributions over featural and temporal patterns and can generalize appropriately with regard to
the current context. Thus in accordance with the ideas of the Gestalt psychologists, it was sug-
gested that individual sound events are processed within the context of the whole, and the con-
solidated object representation refers to patterns of sound events.
In accord with Griffiths and Warren (2004), Winkler et al. (2009) do not distinguish ‘concrete’
from ‘abstract auditory objects’, where the former refers to the physical source and the latter to the
pattern of emission (Wightman and Jenison 1995; Kubovy and Van Valkenburg 2001). Thus, the
notion of an auditory perceptual object is compatible with the definition of an auditory stream, as
a coherent sequence of sounds separable from other concurrent or intermittent sounds (Bregman
1990). However, whereas the term ‘auditory stream’ refers to a phenomenological unit of sound
organization, with separability as its primary property, the definition proposed by Winkler et al.
(2009) concerns the extraction and representation of the unit as a pattern with predictable com-
ponents (Winkler et al. 2012). This definition of an auditory perceptual object is compatible
with the memory component assumed in hierarchical predictive coding theories of perception
(Friston 2005; Hohwy 2007). These theories posit that the brain acts to minimize the discrep-
ancy between its predictions and the actual sensory input (termed the error signal), and that this
occurs at many different levels of processing (e.g. Friston and Kiebel 2009). Error signals propa-
gate towards higher levels which then attempt to suppress them through refinements to internal
models. Auditory perceptual objects can be regarded as models working at intermediate levels of
this predictive coding hierarchy (Winkler and Czigler 2012).
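The error-minimization loop these theories describe can be sketched minimally. This is a deliberately simple single-level toy of ours (the learning rate and input stream are arbitrary choices, not a model from the cited papers): the unit's prediction is repeatedly nudged by its own error signal, so a regular input stream is progressively absorbed.

```python
# Minimal predictive-unit sketch: the internal estimate is refined to
# suppress the error signal (input minus prediction).

def predictive_unit(inputs, lr=0.2):
    """Track a stream of inputs by repeatedly reducing prediction error."""
    prediction = 0.0
    errors = []
    for x in inputs:
        error = x - prediction          # bottom-up error signal
        prediction += lr * error        # top-down model refinement
        errors.append(abs(error))
    return prediction, errors

# A regular (predictable) stream: the error is driven towards zero,
# so successive events are absorbed by the object representation.
pred, errs = predictive_unit([1.0] * 30)
print(round(pred, 3))       # close to 1.0
print(errs[0] > errs[-1])   # True: error shrinks as the model improves
```

In the hierarchical case the residual error of one such unit would itself become the input to a higher-level unit; the single level shown here is only the core update step.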
fully disconnected from feature extraction and binding, by necessity, we will address grouping as
a separate process.
Auditory Scene Analysis. In the currently most widely accepted framework describing per-
ceptual sound organization, Auditory Scene Analysis, Bregman (1990) proposes two separable
processing stages. The first stage is suggested to be concerned with partitioning sound events
into possible streams (groups) based primarily on featural differences (e.g. spectral content, loca-
tion, timbre). The second stage, within which prior knowledge, context, and/or task demands
exert their influence, is a competitive process between candidate organizations that ultimately
determines which one is perceived. Three notable further assumptions are included in the frame-
work: (1) Initially, the brain assumes that all sounds belong to the same stream and segregat-
ing them requires evidence attesting to the probability that they originate from different sources;
(2) For sequences with repeating patterns, perception settles on a final ‘perceptual decision’ after
the evidence-gathering stage is complete; (3) Solutions that include the continuation of a previ-
ously established stream are preferred to alternatives (the ‘old+new’ strategy).
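The two stages and the three assumptions can be caricatured in a few lines (the scores and the continuity bonus of 0.5 are illustrative assumptions of ours, not quantities from Bregman 1990):

```python
# Toy competition between two candidate organizations: featural evidence
# favours segregation, the default favours integration (assumption 1),
# and a previously established stream gets a continuity bonus
# (assumption 3, the 'old+new' strategy).

def choose_organization(feature_separation, established):
    """Compare the integrated and segregated candidate organizations."""
    scores = {
        "integrated": 1.0,                  # default: one stream
        "segregated": feature_separation,   # evidence from featural differences
    }
    if established in scores:
        scores[established] += 0.5          # bias towards the old organization
    return max(scores, key=scores.get)

print(choose_organization(0.8, established=None))          # 'integrated'
print(choose_organization(1.2, established=None))          # 'segregated'
print(choose_organization(1.2, established="integrated"))  # 'integrated'
```

The third call shows the 'old+new' preference at work: evidence that would otherwise favour segregation is outweighed by the continuation of an established stream.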
The grouping stage. Most behavioral studies have targeted the first processing stage, assessing
the effects of various cues on auditory group formation. Bregman (1990) distinguishes two classes
of grouping processes: grouping based on concurrent (spectral, instantaneous, or vertical) cues,
and grouping based on sequential (temporal, contextual, or horizontal) cues. However, although
these two classes seem intuitively to be distinct, it turns out that instantaneous cues are susceptible
to the influences of prior sequential grouping (Bendixen, Jones, et al. 2010); e.g. a harmonic can
be pulled out of a complex with which it would otherwise be grouped if there are prior examples
of that tone (Darwin et al. 1995).
So what triggers the automatic grouping and segregation of individual sound events? There
have been surprisingly few experiments addressing this question explicitly, but the gap transfer
illusion (Nakajima et al. 2000) suggests that the auditory system tends to try to match onsets to
offsets according to their temporal proximity, and that the result (which also depends on the
extent to which features at the onset and offset match; Nakajima et al. 2004) is a perceptual event,
as defined above. Since listeners reliably reported the illusory event even though they were not
trying to hear it out, these experiments provide some evidence for obligatory grouping. Another
typical example of this class of obligatory grouping is the mistuned partial phenomenon. When
one partial of a complex harmonic tone is mistuned listeners perceive two concurrent sounds,
a complex tone and a pure tone, the latter corresponding to the mistuned partial (Moore et al.
1986). However, not all features trigger concurrent grouping; e.g. common interaural time differ-
ences between a subset of frequency components within a single sound event do not generate a
similar segregation of component subsets (Culling and Summerfield 1995).
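The mistuned-partial phenomenon can be sketched as a simple harmonicity check (the frequency values and the 3% tolerance below are illustrative assumptions, not values from Moore et al. 1986): partials lying close to an integer multiple of the fundamental fuse into the complex tone, while a sufficiently mistuned partial is segregated and heard as a separate pure tone.

```python
# Harmonicity-based concurrent grouping sketch: partials deviating from
# integer multiples of the fundamental by more than a tolerance are
# segregated from the complex.

def segregate_mistuned(partials, f0, tolerance=0.03):
    """Split partial frequencies into a fused complex and heard-out tones."""
    complex_tone, heard_out = [], []
    for f in partials:
        n = round(f / f0)                  # nearest harmonic number
        if n >= 1 and abs(f - n * f0) <= tolerance * f0:
            complex_tone.append(f)         # fused into the complex
        else:
            heard_out.append(f)            # perceived as a separate tone
    return complex_tone, heard_out

f0 = 200.0
partials = [200.0, 400.0, 648.0, 800.0]    # 3rd harmonic mistuned by 8%
fused, separate = segregate_mistuned(partials, f0)
print(separate)   # [648.0]: the mistuned partial is heard out
```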
In contrast to concurrent grouping, sequential grouping is necessarily based on some repre-
sentation of the preceding sounds. Most studies of this class of grouping have used sequences of
discrete sound events, and asked two main questions: (a) How do the various stimulus param-
eters affect sequential grouping of sound events, and (b) What are the temporal dynamics of
this grouping process (for reviews, see Carlyon 2004; Haykin and Chen 2005; Snyder and Alain
2007; Ciocca 2008; Shamma et al. 2011). In the most widely used stimulus paradigm (termed
the auditory streaming paradigm), sequences of the structure ABA- (where A and B denote two
sounds (typically tones) differing in some auditory feature(s) and ‘-’ stands for a silent interval)
are presented to listeners (van Noorden 1975). When the feature separation between A and B
is small and/or they are delivered at a slow pace, listeners predominantly hear a single coher-
ent stream with a galloping rhythm (termed the integrated percept). With a large separation
between the two sounds and/or fast presentation rates, they most often experience the sequence
in terms of two separated streams, one consisting only of the A tones and the other of the
B tones, with each stream having its own isochronous rhythm (termed the segregated percept).
Throughout most of the feature-separation/presentation-rate space there is a trade-off between
the two cues: smaller feature separation can be compensated with higher presentation rate, and
vice versa (van Noorden 1975).
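The van Noorden trade-off can be caricatured in code (the multiplicative rule and fixed threshold below are deliberately crude assumptions of ours; the measured perceptual boundaries are far richer): a small feature separation can be compensated by a fast presentation rate, and vice versa.

```python
# Toy sketch of the ABA- auditory streaming paradigm.

def aba_sequence(n_repeats):
    """Build the symbolic ABA- sequence ('-' is the silent interval)."""
    return ["A", "B", "A", "-"] * n_repeats

def dominant_percept(df_semitones, rate_hz, threshold=40.0):
    """Toy trade-off rule: the two cues multiply, so each can compensate
    for the other; the product is compared against a fixed threshold."""
    drive = df_semitones * rate_hz
    return "segregated" if drive > threshold else "integrated"

print(aba_sequence(2))           # ['A', 'B', 'A', '-', 'A', 'B', 'A', '-']
print(dominant_percept(1, 5))    # 'integrated': small separation, slow pace
print(dominant_percept(10, 15))  # 'segregated': large separation, fast pace
```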
Differences in various auditory features, including frequency, pitch, loudness, location, timbre,
and amplitude modulation, have been shown to support auditory stream segregation (Vliegen
and Oxenham 1999; Grimault et al. 2002; Roberts et al. 2002). Thus it appears that sequential
grouping is based on perceptual similarity, rather than on specific low-level auditory features
(Moore and Gockel 2002; Moore and Gockel 2012). As for the timing of the sounds, it was shown
that the critical parameter is the silent interval between consecutive tones of the same set (the
within-stream inter-stimulus interval; Bregman et al. 2000); however, see Bee and Klump (2005)
for a counter-view. Temporal structure has also been suggested as a key factor in segregating
streams either by guiding attentive grouping processes (Jones 1976; Jones et al. 1981) or through
temporal coherence between elements of the auditory input (Elhilali, Ma, et al. 2009). Finally,
contextual effects, such as the presence of additional sounds or attentional set, can bias the final
perceptual outcome, suggesting that the second-stage processes of competition consider all pos-
sible alternative groupings (Bregman 1990; Winkler, Sussman, et al. 2003). In summary, sequen-
tial grouping effects generally conform to the Gestalt principles of similarity/good continuation
and common fate.
The competition/selection stage: Multistability in auditory streaming. Although the results
of many experiments have painted a picture consistent with Bregman’s assumptions (e.g. Cusack
et al. 2004; Snyder et al. 2006), other results appear to be at odds with the notion that the auditory
system (a) always starts from the integrated organization, and (b) eventually reaches a stable
final perception. When listeners are presented with ABA- (or ABAB) sequences of a
few minutes duration and are asked to report their perception in a continuous manner, it has
been found that perception fluctuates between alternative organizations in all listeners and with
all of the combinations of stimulus parameters tested (Anstis and Saida 1985; Roberts et al. 2002;
Denham and Winkler 2006; Pressnitzer and Hupe 2006; Kondo and Kashino 2009; Hill et al. 2011;
Schadwinkel and Gutschalk 2011; Kondo et al. 2012; Denham et al. 2013). Thus the perception
of these sequences appears to be bi- or multistable (Schwartz et al. 2012), similar to some other
auditory (Wessel 1979) and visual stimulus configurations (e.g. Leopold and Logothetis 1999;
Alais and Blake this volume). Furthermore, segregated and integrated percepts are not the only
ones that listeners experience in response to ABA- sequences (Bendixen, Denham, et al. 2010; Bendixen et al. 2013; Bőhm et al. 2013; Denham et al. 2013; Szalárdy et al. 2013), and, with stimu-
lus parameters strongly promoting the segregated organization, participants often report segrega-
tion first (Deike et al. 2012; Denham et al. 2013). It has also been found that the first experienced
perceptual organization is more strongly determined by stimulus parameters than those experi-
enced later (Denham et al. 2013).
Finally, higher-order cues, such as regularities embedded separately within the A and B streams,
promote perception of the segregated organization (Jones et al. 1981; Drake et al. 2000; Devergie et al. 2010; Andreou et al. 2011; Rimmele et al. 2012; Rajendran et al. 2013), probably by extend-
ing the duration of the phases (continuous intervals with the same percept) during which lis-
teners experience the segregated percept, while they do not affect the duration of the phases of
the integrated percept (Bendixen, Denham, et al. 2010; Bendixen et al. 2013). This suggests that
predictability (closure in terms of the Gestalt principles) also plays into the competition between
alternative sound organizations, although differently from cues based on the rate of perceptual
Auditory Perceptual Organization 607
change (similarity/good continuation and common fate). Closure in auditory perceptual organ-
ization may therefore be seen to resonate with Koffka’s early intuition as acting not so much as a
low-level grouping cue but rather as something that helps to determine the final perceptual form
(Wagemans et al. 2012). Just as closure in vision allows the transformation of a 1D contour into a
2D shape (Elder and Zucker 1993), so the discovery of a predictable temporal pattern transforms
a sequential series of unrelated sounds into a distinctive motif.
In contrast to the laboratory findings of multistable perception, everyday experience tells us
that we perceive the world in a stable, continuous manner. We may find that initially we are not
able to distinguish individual sound sources when suddenly confronted with a new auditory
scene, such as entering a noisy classroom or stepping out onto a busy street. But generally within
a few seconds, we are able to differentiate them, especially sounds that are relevant to our task.
This experience is well captured by Bregman’s assumptions of initial integration and subsequent
settling on a stable segregated organization. In support of these assumptions, when averaging
over the reports of different listeners, it is generally found that within the initial 5–15 s of an ABA-
sequence, the probability of reporting segregation monotonically increases (termed the build-up
of auditory streaming; but see Deike et al. 2012), and that a break during this early period, or directing attention away from the sounds, causes a reset (i.e. a return to integration followed by a gradual increase in the likelihood of segregation; Cusack et al. 2004). So, should we
disregard the perceptual multistability observed in the auditory streaming paradigm as simply a
consequence of the artificial stimulation protocol used? We suggest not. Illusions and artificially
constructed stimulus configurations have played an important role in the study of perception (e.g.
as the main method of Gestalt psychology), because they provide insights into the machinery of
perception. In the following, we provide a description of auditory perceptual organization based
on insights gained from multistable phenomena.
Winkler et al. (2012) suggested that one should consider sound organization in the brain in
terms of the continuous discovery of proto-objects (alternative groupings) and ongoing com-
petition between them. Continuous discovery and competition are well suited to the everyday
demands on auditory perceptual organization in a changing world. Proto-objects (Rensink 2000)
are the candidate set of representations that have the potential to emerge as the perceptual objects
of conscious awareness (Mill et al. 2013). Within this framework, proto-objects represent patterns
which have been discovered embedded within the incoming sequence of sounds; they are con-
structed by linking sound events and recognizing when a previously discovered sequence recurs
and can thus be used to predict future events. In a new sound scene, the proto-object that is easiest
to discover determines the initial percept. Since the time needed for discovering a proto-object
depends largely on the stimulus parameters (i.e., to what extent successive sound events satisfy/
violate the similarity/good continuation principle), the first percept strongly depends on stimulus
parameters. However, the duration of the first perceptual phase is independent of the percept
(Hupe and Pressnitzer 2012), since it depends on how long it takes for other proto-objects to be
discovered (Winkler et al. 2012).
Once alternative organizations have been discovered they start competing with each other.
Competition between organizations is dynamic both because proto-objects are discovered on the
fly, and may come and go, and because their strength, which determines which of them becomes
dominant at a given time, is probably affected by dynamic factors, such as how often they success-
fully predict upcoming sound events (cf. predictive coding theories (Friston 2005) and Bregman’s
‘old+new’ heuristic (Bregman 1990)), adaptation, and noise (Mill et al. 2013). The latter two influ-
ences are also often assumed in computational models of bi-stable visual perceptual phenomena
(e.g. Shpiro et al. 2009; van Ee 2009); adaptation ensures the observed inevitability of perceptual
608 Denham and Winkler
switching (the dominant percept cannot remain dominant forever), and noise accounts for the
observed stochasticity in perceptual switching (successive phase durations are largely uncor-
related, and the distribution of phase durations resembles a gamma distribution) (Levelt 1968;
Leopold and Logothetis 1999). Generalizing the two-stage account of perceptual organization
proposed by Bregman (1990) to two concurrent stages which operate continuously and in par-
allel, the first consisting of the discovery of predictive representations (proto-objects), and the
second, competition for dominance between proto-objects, results in a theoretical and compu-
tational framework that explains a wide set of experimental findings (Winkler et al. 2012; Mill
et al. 2013). For example, perceptual switching, first-phase choice and duration, and differences
between the first and subsequent perceptual phases can all be explained within this framework.
It also accounts for the different influences of similarity and closure on perception; the rate of
perceptual change (similarity/good continuation) determines how easy it is to form links between
the events that make up a proto-object, while predictability (closure) does not affect the discovery
of proto-objects, but can increase the competitiveness (salience) of a proto-object once it has been
discovered (Bendixen, Denham, et al. 2010).
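The adaptation-plus-noise dynamics assumed by such models can be illustrated with a minimal rate model of two mutually inhibiting percept units, in the generic spirit of Shpiro et al. (2009). All parameter values below are illustrative, not fitted to auditory data.

```python
import numpy as np

def simulate_bistable(n_steps=20000, dt=1.0, seed=1):
    """Two mutually inhibiting percept units with slow adaptation and noise.
    Adaptation ensures no percept stays dominant forever; noise makes
    successive phase durations stochastic."""
    rng = np.random.default_rng(seed)
    tau_r, tau_a = 10.0, 200.0          # fast activity vs slow adaptation time constants
    beta, phi, inp, sigma = 3.0, 3.0, 1.0, 0.03
    r = np.array([0.6, 0.0])            # activities of the two percepts
    a = np.zeros(2)                     # adaptation variables
    dominant = np.empty(n_steps, dtype=int)
    for t in range(n_steps):
        drive = inp - beta * r[::-1] - phi * a   # input - cross-inhibition - adaptation
        r += dt / tau_r * (-r + np.maximum(drive, 0.0))
        r += sigma * np.sqrt(dt) * rng.standard_normal(2)
        r = np.maximum(r, 0.0)
        a += dt / tau_a * (-a + r)
        dominant[t] = int(r[1] > r[0])
    return dominant

def phase_durations(dominant):
    """Lengths of runs during which the same percept is dominant."""
    change = np.flatnonzero(np.diff(dominant)) + 1
    edges = np.concatenate(([0], change, [len(dominant)]))
    return np.diff(edges)

dom = simulate_bistable()
phases = phase_durations(dom[2000:])   # discard the initial transient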
Perceptual organization. Up to this point we have used the term ‘sound organization’ in a gen-
eral sense. Now we consider it in a narrower sense. The two sound organizations most commonly
(but not exclusively) appearing in the ABA- paradigm are integration and segregation. Whereas
the integrated percept is fully specified, there are in fact two possible segregated percepts: one may
hear the A sounds in the foreground and the Bs in the background, or vice versa. It is compara-
tively easy to switch between these two variants of the segregated percept (since we are aware of
both of them at the same time), while it is more difficult to voluntarily switch between segregation
and integration (as we are not simultaneously aware of both these organizations, i.e. we don’t hear
the integrated galloping rhythm while we experience the sequence in terms of two streams). In
essence, a specific sound organization corresponds to a set of possible perceptual experiences,
which are, in Bregman’s terms, compatible with each other, while perceptual experiences which are
mutually exclusive belong to different sound organizations.
What determines compatibility? Winkler et al. (2012) suggested that two (or more) proto-objects
are compatible if they never predict the same sound event (i.e. they have no common element—cf.
the Gestalt principle of disjoint allocation), and considered three possible ways in which competi-
tion may be implemented in order to account for perceptual experience. The first possibility they
considered is that compatibility is explicitly extracted and organizations are formed during the first
processing stage. This leads to the assumption of hierarchical competition, one between organiza-
tions, and another within each organization that includes multiple proto-objects. The second pos-
sibility is a foreground–background solution. In this case all proto-objects compete directly with
each other and once a dominant one emerges, all remaining sounds are grouped together into a
background representation. Results showing no clear separation of sounds in the background are
compatible with this solution (Brochard et al. 1999; Sussman et al. 2005). However, other stud-
ies suggest that the background is not always undifferentiated (Winkler, Teder-Salejarvi, et al.
2003). A third possibility is that proto-objects only compete with each other when they predict
the same sound event (collide). In this case organizations emerge because of the simultaneous
dominance of proto-objects that never collide with each other, and their alternation with other
compatible sets with which they do collide; i.e. when one proto-object becomes dominant in
the ongoing competition, others with which it doesn’t collide will also become strong, while all
proto-objects with which this set does collide are suppressed. Noise and adaptation ensure that at
some point a switch will occur to one of the suppressed proto-objects and the cycle will continue.
A computational model that demonstrates the viability of this solution for modeling perceptual
experience in the ABA- paradigm has recently been developed (Mill et al. 2013). The assumption
that the perceptual organization of sounds is based on continuous competition between predic-
tive proto-objects leads to a system that is flexible, because alternative proto-objects are available
all the time, ready to emerge into perceptual awareness when they prove to be the best predictors
of the auditory input. The system is also stable and robust, because it does not need to reassess all
of its representations with the arrival of a new sound source in the scene, or in the event of tempo-
rary disturbances (such as a short loss of input, or during attentional switching between objects).
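The collision-based notion of compatibility can be made concrete with a toy sketch in which each proto-object is reduced to the set of event indices it predicts in an ABA- sequence. This is a deliberate simplification — real proto-objects are predictive representations, not static sets — but it captures the disjoint-allocation logic.

```python
# Index the tones of an ABA-ABA-... sequence: within each triplet,
# A tones fall at positions 0 and 2, the B tone at position 1.
n_triplets = 4
A_stream   = {3*i for i in range(n_triplets)} | {3*i + 2 for i in range(n_triplets)}
B_stream   = {3*i + 1 for i in range(n_triplets)}
integrated = A_stream | B_stream           # the galloping ABA- pattern

def collides(p, q):
    """Two proto-objects collide if they predict at least one common event
    (violating the Gestalt principle of disjoint allocation)."""
    return not p.isdisjoint(q)
```

Because the A- and B-streams never collide, dominance of one strengthens the other, and together they form the segregated organization; the integrated pattern collides with both, so it must alternate with them in the cycle of suppression and switching described above.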
Secondly, finding that segregation can be reported first contradicts the assumption of integration
as default (see The competition/selection stage section). Thirdly, it has been shown that while a
similar distinct clustering of neural responses can be found when the A and B tones are overlap-
ping in time, in this case, listeners report hearing an integrated pattern (Elhilali, Ma, et al. 2009).
So, while differential suppression may be necessary, it is not a sufficient condition for segregation.
Event-related potential correlates of sound organization. Auditory event-related brain poten-
tials (AERPs) represent the synchronized activity of large neuronal populations, time-locked to
some auditory event. Because they can be recorded non-invasively from the human scalp, one
can use them to study the brain responses accompanying perceptual phenomena, such as audi-
tory stream segregation. An AERP correlate of concurrent sound organization is found when a
partial of a complex tone is mistuned, giving rise to the perception of two concurrent sounds (see
The grouping stage section); a negative wave peaking at about 180 milliseconds after stimulus
onset, whose amplitude increases with the degree of mistuning, is elicited (Alain, Arnott et al.
2001). This AERP component, termed the ‘object-related negativity’ (ORN), is proposed to signal
the automatic segregation of concurrent auditory objects (Alain et al. 2002). An AERP correlate
of sequential sound organization was found in an experiment showing that the amplitudes of two early sensory AERP components, the auditory P1 and N1, vary depending on whether the same
sounds are perceived as part of an integrated or segregated organization (Gutschalk et al. 2005;
Szalárdy et al. 2013).
Another electrophysiological measure that has been extensively used to probe sequential per-
ceptual organization is the Mismatch Negativity (MMN); for recent reviews see (Winkler 2007;
Näätänen et al. 2011). MMN is elicited by sounds that violate some regular auditory feature of
the preceding sound sequence; therefore, it can be used to probe what auditory regularities are
encoded in the brain. By setting up stimulus configurations which result in different regularities
depending on how the sounds are organized, MMN can be used as an indirect index of auditory
stream segregation. The first studies using MMN in this way (Sussman et al. 1999; Nager et al.
2003; Winkler, Sussman, et al. 2003) showed that the elicitation of MMN can be made dependent
on sound organization, and furthermore, that MMN is only elicited by violations of regularities
characterizing the stream to which a sound belongs, but not by violating the regularities of some
other parallel sound stream (Ritter et al. 2000; Winkler et al. 2006). These observations allowed a
number of issues, not easily accessible to behavioral methods, to be addressed. Here we highlight
three important questions: interactions between concurrent and sequential perceptual organiza-
tion, evidence for the existence of two stages in sound organization, and the role of attention in
forming and maintaining auditory stream segregation.
In a study delivering sequences of harmonic complexes in which the probability of a mistuned
component was manipulated, it was found that the ORN was reliably elicited by mistuning in
all conditions, but its magnitude increased with decreasing probability of occurrence (Bendixen,
Jones, et al. 2010). This was interpreted as a heightened response to the onset of a
possible new auditory object. The additional finding that a positive AERP component, the P3a,
usually associated with involuntary attentional switching (Escera et al. 2000), was elicited by mis-
tuned sounds in the low mistuning probability condition but not by tuned sounds in the high
mistuning probability condition, suggested that the auditory system is primarily interested in the
onset of new sound sources rather than their disappearance (Dyson and Alain 2008; Bendixen,
Jones, et al. 2010); a view further supported by results obtained in a different behavioral paradigm
(Cervantes Constantino et al. 2012).
It has been shown that the early (<100 ms) AERP correlates of auditory stream segregation,
the P1 and N1 components, are governed by the acoustic parameters (Winkler et al. 2005; Snyder et al. 2006), whereas later (>120 ms) responses (N2) correlate with perceptual experience (Winkler et al. 2005; Szalárdy et al. 2013). Furthermore, the amplitude of the later AERP response
correlates with the probability of reporting segregation (the build-up of streams) and it is aug-
mented by attention (Snyder et al. 2006). These results suggest that the initial grouping, which
precedes temporal integration between sound events (Yabe et al. 2001; Sussman 2005), is mainly
stimulus-driven, whereas later occurring perceptual decisions are susceptible to top-down modu-
lation, a view compatible with Bregman’s theoretical framework.
Whereas most accounts of auditory streaming assume that perceptual similarity affects group-
ing through automatic grouping processes, Jones et al. (1978) suggested that segregation results
from a failure to rapidly shift attention between perceptually dissimilar items in a sequence. The
literature is divided on the role of attention in auditory stream segregation. Some electrophysi-
ological studies suggested that auditory stream segregation can occur in the absence of focused
attention (Winkler, Sussman, et al. 2003; Winkler, Teder-Salejarvi, et al. 2003; Sussman et al.
2007). In contrast, results of some behavioral and AERP studies suggest that attention may at least
be needed for the initial formation of streams (Cusack et al. 2004; Snyder et al. 2006); however,
see Sussman et al. (2007). How can attention affect sound organization? Snyder et al. (2012) argue
for an attentional ‘gain model’ in which the representation of attended sounds is enhanced, while
unattended ones are suppressed. Due to the short latency of the observed gain modulation they
suggested that attention operates both on the group formation phase of segregation as well as the
later selection phase (Bregman 1990). However, attention can also have other effects on sound
organization: attention can retune and sharpen representations in order to improve the segregation of signals from noise (Ahveninen et al. 2011); attention to a stream improves the phase locking of neural responses to the attended sounds (Elhilali, Xiang, et al. 2009); attention allows the utilization of learned (non-primitive) grouping algorithms, thus providing additional processing capacities (Lavie et al. 2004); and attention can bias the competition between alternative sound organizations (as found in the visual system; Desimone 1998). Which of these are most relevant
to auditory perceptual organization has yet to be established.
The neuroscience view of auditory objects. ‘. . . in neuroscientific terms, the concepts of an
object and of object analysis can be regarded as inseparable’ (Griffiths and Warren 2004: 887).
Thus, neuroscientific descriptions of auditory perceptual objects focus on the processes involved
in forming and maintaining object representations. The detection and representation of regulari-
ties by the brain, as indexed by the MMN, has been used to establish a functional definition of an
auditory object (Winkler et al. 2009). Using evidence from a series of MMN studies, Winkler et al.
(2009) proposed that an auditory object is a perceptual representation of a possible sound source,
derived from regularities in the sensory input (Näätänen et al. 2001) that has temporal persistence
(Winkler and Cowan 2005) and can link events separated in time (Näätänen and Winkler 1999).
This representation forms a separable unit (Winkler et al. 2006) that generalizes across natural
variations in the sounds (Winkler, Teder-Salejarvi, et al. 2003) and generates expectations of parts
of the object not yet available (Bendixen et al. 2009).
Evidence for the representation of auditory objects in cortex, consistent with this defini-
tion, is found in fMRI (Hill et al. 2011; Schadwinkel and Gutschalk 2011), and in MEG and
multi-electrode surface recording studies of people listening to two competing talkers (Ding
and Simon 2012; Mesgarani and Chang 2012). By decoding MEG signals correlated with the
amplitude fluctuations of each of the speech signals it was shown that the brain preferentially
locks onto the temporal patterns of the attended talker, and that this representation adapts
to the sound level of the attended talker and not the interfering one (Ding and Simon 2012).
Multi-electrode recordings in non-primary auditory cortex similarly show that the brain locks
onto critical features in the attended speech stream, and furthermore that a simple classifier built
from a set of linear filters can be used to decode both the attended speaker and the words being
uttered (Mesgarani and Chang 2012). Other experiments showing that context-dependent pre-
dictive activity in the hippocampus encoded temporal relationships between events and corre-
lated with subsequent recall of episodes (Paz et al. 2010), suggest that the hippocampus may also
be involved, although this work used multisensory cinematic material, so it is not clear whether the findings hold for sounds alone.
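The decoding logic of the two-talker studies can be sketched as a linear "stimulus reconstruction" toy model: fit a linear filter mapping multichannel neural responses to the attended talker's envelope, then classify attention by which talker's envelope the reconstruction correlates with best. Everything below — the synthetic envelopes, the channel gains, the ridge penalty — is invented for illustration and is not taken from the cited studies.

```python
import numpy as np

rng = np.random.default_rng(0)

def smooth_envelope(n):
    """Synthetic slowly varying 'speech envelope' (illustrative only)."""
    return np.convolve(rng.standard_normal(n + 40), np.ones(40) / 40, mode='valid')[:n]

def ridge_fit(R, s, lam=1.0):
    """Linear reconstruction filter from responses R (time x channels) to envelope s."""
    return np.linalg.solve(R.T @ R + lam * np.eye(R.shape[1]), R.T @ s)

def decode_attended(R, env_a, env_b, w):
    """Classify the attended talker by comparing reconstruction correlations."""
    rec = R @ w
    ca = np.corrcoef(rec, env_a)[0, 1]
    cb = np.corrcoef(rec, env_b)[0, 1]
    return 'A' if ca > cb else 'B'

n, d = 2000, 8
env_a, env_b = smooth_envelope(n), smooth_envelope(n)
gains = rng.standard_normal(d)                  # fixed channel weights
noise = 0.2 * rng.standard_normal((n, d))

# Training segment: the listener attends talker A, so responses track env_a.
R_attend_a = np.outer(env_a, gains) + noise
w = ridge_fit(R_attend_a[:1000], env_a[:1000])

# Held-out segments: still attending A, then a segment where responses track B.
R_attend_b = np.outer(env_b, gains) + noise
attended_1 = decode_attended(R_attend_a[1000:], env_a[1000:], env_b[1000:], w)
attended_2 = decode_attended(R_attend_b[1000:], env_a[1000:], env_b[1000:], w)
```

The same spatial filter recovers whichever envelope the "neural" responses lock onto, so the decoder follows the attended talker without retraining — the qualitative point made by the MEG and multi-electrode results.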
While traditional psychological accounts implicitly or explicitly refer to representations
of objects, there are models of auditory streaming, and of perception in general, that are not concerned with positing a representation that would directly correspond to the contents of conscious perception; we have already referred to two such theories. Although hierarchical
predictive coding (e.g. Friston and Kiebel 2009) includes predictive memory representations,
which are in many ways compatible with the notion of auditory object representations (Winkler
and Czigler 2012), no explicit connection with object representations is made. Indeed, whereas
predictive coding models have been successful in matching the statistics of perceptual decisions
(Lee and Mumford 2003; Aoyama et al. 2006; Yu 2007; Garrido et al. 2009; Daunizeau et al.
2010), they are better suited to describing the neural responses observed during perception
(Grill-Spector et al. 2006), than perceptual experience per se. Shamma and colleagues’ temporal
coherence model of auditory stream segregation (Elhilali and Shamma 2008; Elhilali, Ma, et al.
2009; Shamma et al. 2011) provides another way to avoid the assumption that object represen-
tations are necessary for sound organization; instead it is proposed that objects are essentially
whatever occupies the perceptual foreground and exist only insofar as they do occupy the fore-
ground. Temporal coherence can be calculated using relatively short time windows without
building a description of the past stimulation. Thus auditory streams can be separated in a
single pass. It is also claimed that object formation (binding) occurs late, i.e. the composite mul-
tifeatured percept of conscious awareness is formed through selective attention to some feature
that causes all features correlated with the attended feature to emerge together into perceptual
awareness (and thus form a perceptual object), while the background remains undifferentiated
(Shamma et al. 2011). In summary, there is currently little consensus on the role of auditory
object representations in perceptual organization and the importance placed on object repre-
sentations by the various models differs markedly.
Hierarchical predictive coding (Friston and Kiebel 2009) and the notion of predictive auditory object representation (Winkler et al. 2009) are compatible (Winkler and Czigler 2012), although they have somewhat different aims. However, as yet, there have been few attempts to face up to
the complexity of real auditory scenes in which grouping and categorization cues are not imme-
diately available; but see Yildiz and Kiebel (2011).
Progress may come from building bridges between competing theories. The instantiation of the
principle of common fate in the form of temporal coherence (Shamma et al. 2011) suggests a basis
for linking features and possibly events within a proto-object. Due to its generic nature, temporal
coherence as a cue is not limited to discrete, well-defined sound events and can thus help to generalize models that rely on such events. The suggestion of a hierarchical decomposition of the sound world
into objects which are differentiated by attention and task demands, while others remain rather
more amorphous (Cusack and Carlyon 2003), can also be accommodated within the framework
of predictive object representations. The patterns or regularities encoded by proto-objects rep-
resent distributions over featural and temporal structures. Thus it is entirely feasible for some
proto-objects to represent well-differentiated and separated patterns, such as the voice of the
person to whom one is talking, while others may represent the undifferentiated combination of
background sounds, such as the background babble at a cocktail party (Cherry 1953). Finally,
decomposing complex sounds and finding events in long continuous sounds (Coath and Denham
2007; Yildiz and Kiebel 2011) may feed into models concerned with grouping events into auditory
object representations.
We started out by highlighting the two questions that the auditory system needs to answer: ‘What
is out there?’ and ‘What will it do next?’. In this chapter, we outlined the main approaches currently
being pursued to provide insights into how the human auditory system answers these questions
quickly and accurately under a variety of conditions, which can dramatically affect the cues that
are available. We suggest that in order to deliver robust performance within a changing world,
the human brain builds auditory object representations that are predictive of upcoming events,
and uses these in the formation of perceptual organizations that represent its interpretation of the
world. Flexible switching between candidate organizations ensures that the system can explore
alternative interpretations, and revise its perceptual decisions in the light of further information.
However, there is much that remains to be understood and current models are far from matching
the capabilities of human auditory perception. Perhaps, as outlined above, convergence between
the alternative approaches will provide a more satisfactory account of the processes underlying
auditory perceptual organization.
Acknowledgements
This work was supported in part by the Lendület project awarded to István Winkler by the
Hungarian Academy of Sciences (contract number LP2012-36/2012).
References
Ahveninen, J., M. Hamalainen, I. P. Jaaskelainen, S. P. Ahlfors, S. Huang, F. H. Lin, T. Raij, M. Sams, C.
E. Vasios, and J. W. Belliveau (2011). ‘Attention-Driven Auditory Cortex Short-Term Plasticity Helps
Segregate Relevant Sounds from Noise’. Proc Natl Acad Sci USA 108(10): 4182–4187.
Alain, C., S. R. Arnott, and T. W. Picton (2001). ‘Bottom-Up and Top-Down Influences on Auditory
Scene Analysis: Evidence from Event-Related Brain Potentials’. J Exp Psychol Hum Percept Perform
27(5): 1072–1089.
Alain, C., B. M. Schuler, and K. L. McDonald (2002). ‘Neural Activity Associated with Distinguishing
Concurrent Auditory Objects’. J Acoust Soc Am 111(2): 990–995.
Alais, D. and R. Blake (this volume). ‘Multistability and Binocular Rivalry’. In The Oxford Handbook of
Perceptual Organization, ed. J. Wagemans (Oxford: Oxford University Press).
Anderson, L. A., G. B. Christianson, and J. F. Linden (2009). ‘Stimulus-Specific Adaptation Occurs in the
Auditory Thalamus’. J Neurosci 29(22): 7359–7363.
Andreou, L.-V., M. Kashino, and M. Chait (2011). ‘The Role of Temporal Regularity in Auditory
Segregation’. Hear Res 280(1–2): 228–235.
Anstis, S. and S. Saida (1985). ‘Adaptation to Auditory Streaming of Frequency-Modulated Tones’. J Exp
Psychol Hum Percept Perform 11: 257–271.
Aoyama, A., H. Endo, S. Honda, and T. Takeda (2006). ‘Modulation of Early Auditory Processing by
Visually Based Sound Prediction’. Brain Res 1068(1): 194–204.
Bar, M. (2007). ‘The Proactive Brain: Using Analogies and Associations to Generate Predictions’. Trends
Cogn Sci 11(7): 280–289.
Bar-Yosef, O., Y. Rotman, and I. Nelken (2002). ‘Responses of Neurons in Cat Primary Auditory Cortex to
Bird Chirps: Effects of Temporal and Spectral Context’. J Neurosci 22(19): 8619–8632.
Bartlett, F. C. (1932). Remembering: A Study in Experimental and Social Psychology (Cambridge: Cambridge
University Press).
Bee, M. A. and G. M. Klump (2005). ‘Auditory Stream Segregation in the Songbird Forebrain: Effects of
Time Intervals on Responses to Interleaved Tone Sequences’. Brain Behav Evol 66(3): 197–214.
Bee, M. A., C. Micheyl, A. J. Oxenham, and G. M. Klump (2010). ‘Neural Adaptation to Tone Sequences in
the Songbird Forebrain: Patterns, Determinants, and Relation to the Build-Up of Auditory Streaming’.
J Comp Physiol A Neuroethol Sens Neural Behav Physiol 196(8): 543–557.
Bendixen, A., E. Schröger, and I. Winkler (2009). ‘I Heard That Coming: Event-Related Potential Evidence
for Stimulus-Driven Prediction in the Auditory System’. J Neurosci 29(26): 8447–8451.
Bendixen, A., S. L. Denham, K. Gyimesi, and I. Winkler (2010). ‘Regular Patterns Stabilize Auditory
Streams’. J Acoust Soc Am 128(6): 3658–3666.
Bendixen, A., S. J. Jones, G. Klump, and I. Winkler (2010). ‘Probability Dependence and Functional
Separation of the Object-Related and Mismatch Negativity Event-Related Potential Components’.
Neuroimage 50(1): 285–290.
Bendixen, A., T. M. Bőhm, O. Szalárdy, R. Mill, S. L. Denham, and I. Winkler (2012). ‘Different Roles of
Similarity and Predictability in Auditory Stream Segregation'. Learn Percept, in press.
Bertrand, O. and C. Tallon-Baudry (2000). ‘Oscillatory Gamma Activity in Humans: A Possible Role for
Object Representation’. Int J Psychophysiol 38(3): 211–223.
Bőhm, T. M., L. Shestopalova, A. Bendixen, A. G. Andreou, J. Georgiou, G. Garreau, P. Pouliquen, A.
Cassidy, S. L. Denham, and I. Winkler (2013). ‘The Role of Perceived Source Location in Auditory
Stream Segregation: Separation Affects Sound Organization, Common Fate Does Not’. Learn Percept
5(Suppl 2): 55–72.
Bregman, A. S. and G. Dannenbring (1973). ‘The Effect of Continuity on Auditory Stream Segregation’.
Percept Psychophys 13: 308–312.
Bregman, A. S. (1990). Auditory Scene Analysis: The Perceptual Organization of Sound (Cambridge,
MA: MIT Press).
Bregman, A. S., P. A. Ahad, P. A. Crum, and J. O’Reilly (2000). ‘Effects of Time Intervals and Tone
Durations on Auditory Stream Segregation’. Percept Psychophys 62(3): 626–636.
Brochard, R., C. Drake, M. C. Botte, and S. McAdams (1999). ‘Perceptual Organization of Complex
Auditory Sequences: Effect of Number of Simultaneous Subsequences and Frequency Separation’. J Exp
Psychol Hum Percept Perform 25(6): 1742–1759.
Brunswik, E. (1955). ‘Representative Design and Probabilistic Theory in a Functional Psychology’.
Psychological Review 62(3): 193–217.
Carlyon, R. P. (2004). ‘How the Brain Separates Sounds.’ Trends Cogn Sci 8(10): 465–471.
Cervantes Constantino, F., L. Pinggera, S. Paranamana, M. Kashino, and M. Chait (2012). ‘Detection of
Appearing and Disappearing Objects in Complex Acoustic Scenes.’ PLoS One 7(9): e46167.
Cherry, E. C. (1953). ‘Some Experiments on the Recognition of Speech, with One and with Two Ears’.
J Acoust Soc Am 25(5): 975–979.
Ciocca, V. (2008). ‘The Auditory Organization of Complex Sounds’. Front Biosci 13: 148–169.
Coath, M. and S. L. Denham (2007). ‘The Role of Transients in Auditory Processing’. Biosystems
89(1–3): 182–189.
Culling, J. F. and Q. Summerfield (1995). ‘Perceptual Separation of Concurrent Speech Sounds: Absence of
Across-Frequency Grouping by Common Interaural Delay’. J Acoust Soc Am 98(2, Pt 1): 785–797.
Cusack, R. and R. P. Carlyon (2003). ‘Perceptual Asymmetries in Audition’. J Exp Psychol Hum Percept
Perform 29(3): 713–725.
Cusack, R., J. Deeks, G. Aikman, and R. P. Carlyon (2004). ‘Effects of Location, Frequency Region, and
Time Course of Selective Attention on Auditory Scene Analysis’. J Exp Psychol Hum Percept Perform
30(4): 643–656.
Darwin, C. J. and R. P. Carlyon (1995). Auditory Grouping. In The Handbook of Perception and Cognition,
vol. 6: Hearing, ed. B. C. J. Moore, pp. 387–424 (London: Academic Press).
Darwin, C. J., R. W. Hukin, and B. Y. al-Khatib (1995). ‘Grouping in Pitch Perception: Evidence for
Sequential Constraints’. J Acoust Soc Am 98(2, Pt 1): 880–885.
Darwin, C. J. and G. J. Sandell (1995). ‘Absence of Effect of Coherent Frequency Modulation on Grouping a
Mistuned Harmonic with a Vowel’. J Acoust Soc Am 97(5, Pt 1): 3135–3138.
Daunizeau, J., H. E. den Ouden, M. Pessiglione, S. J. Kiebel, K. E. Stephan, and K. J. Friston (2010).
‘Observing the Observer (I): Meta-Bayesian Models of Learning and Decision-Making’. PLoS One
5(12): e15554.
Deike, S., P. Heil, M. Böckmann-Barthel, and A. Brechmann (2012). ‘The Build-Up of Auditory Stream
Segregation: A Different Perspective’. Frontiers in Psychology 3: 461.
Denham, S. L. and I. Winkler (2006). ‘The Role of Predictive Models in the Formation of Auditory Streams’.
J Physiol Paris 100(1–3): 154–170.
Denham, S. L., K. Gymesi, G. Stefanics, and I. Winkler (2013). ‘Multistability in Auditory Stream
Segregation: The Role of Stimulus Features in Perceptual Organisation’. Learn Percept 5(Suppl 2): 55–72.
Desimone, R. (1998). ‘Visual Attention Mediated by Biased Competition in Extrastriate Visual Cortex’.
Philos Trans R Soc Lond B Biol Sci 353(1373): 1245–1255.
Devergie, A., N. Grimault, B. Tillmann, and F. Berthommier (2010). ‘Effect of Rhythmic Attention on the
Segregation of Interleaved Melodies’. J Acoust Soc Am 128(1): EL1–EL7.
Ding, N. and J. Z. Simon (2012). ‘Emergence of Neural Encoding of Auditory Objects while Listening to
Competing Speakers’. Proc Natl Acad Sci USA 109(29): 11854–11859.
Drake, C., M. R. Jones, and C. Baruch (2000). ‘The Development of Rhythmic Attending in Auditory
Sequences: Attunement, Referent Period, Focal Attending’. Cogn 77(3): 251–288.
Dyson, B. J. and C. Alain (2008). ‘Is a Change as Good with a Rest? Task-Dependent Effects of Inter-trial
Contingency on Concurrent Sound Segregation’. Brain Res 1189: 135–144.
Elder, J. and S. Zucker (1993). ‘The Effect of Contour Closure on the Rapid Discrimination of
Two-Dimensional Shapes’. Vision Res 33(7): 981–991.
Elhilali, M. and S. A. Shamma (2008). ‘A Cocktail Party with a Cortical Twist: How Cortical Mechanisms
Contribute to Sound Segregation’. J Acoust Soc Am 124(6): 3751–3771.
Elhilali, M., L. Ma, C. Micheyl, A. J. Oxenham, and S. A. Shamma (2009). ‘Temporal Coherence in the
Perceptual Organization and Cortical Representation of Auditory Scenes’. Neuron 61(2): 317–329.
Elhilali, M., J. Xiang, S. A. Shamma, and J. Z. Simon (2009). ‘Interaction between Attention and
Bottom-Up Saliency Mediates the Representation of Foreground and Background in an Auditory Scene’.
PLoS Biol 7(6): e1000129.
616 Denham and Winkler
Escera, C., K. Alho, E. Schroger, and I. Winkler (2000). ‘Involuntary Attention and Distractibility as
Evaluated with Event-Related Brain Potentials’. Audiol Neurootol 5(3–4): 151–166.
Feldman, J. (this volume). In The Oxford Handbook of Perceptual Organization, ed. J. Wagemans
(Oxford: Oxford University Press).
Fishman, Y. I., J. C. Arezzo, and M. Steinschneider (2004). ‘Auditory Stream Segregation in Monkey
Auditory Cortex: Effects of Frequency Separation, Presentation Rate, and Tone Duration’. J Acoust Soc
Am 116(3): 1656–1670.
Fowler, C. A. and L. D. Rosenblum (1990). ‘Duplex Perception: A Comparison of Monosyllables and
Slamming Doors’. J Exp Psychol Hum Percept Perform 16(4): 742–754.
Friston, K. (2005). ‘A Theory of Cortical Responses’. Philos Trans R Soc Lond B Biol Sci 360(1456): 815–836.
Friston, K. and S. Kiebel (2009). ‘Predictive Coding under the Free-Energy Principle’. Philos Trans R Soc
Lond B Biol Sci 364(1521): 1211–1221.
Garrido, M. I., J. M. Kilner, K. E. Stephan, and K. J. Friston (2009). ‘The Mismatch Negativity: A Review of
Underlying Mechanisms’. Clin Neurophysiol 120(3): 453–463.
Gibson, J. J. (1979). The Ecological Approach to Visual Perception (Boston: Houghton Mifflin).
Gregory, R. L. (1980). ‘Perceptions as Hypotheses’. Philos Trans R Soc Lond B Biol Sci 290(1038): 181–197.
Griffiths, T. D. and J. D. Warren (2004). ‘What is an Auditory Object?’ Nat Rev Neurosci 5(11): 887–892.
Grill-Spector, K., R. Henson, and A. Martin (2006). ‘Repetition and the Brain: Neural Models of
Stimulus-Specific Effects’. Trends Cogn Sci 10(1): 14–23.
Grimault, N., S. P. Bacon, and C. Micheyl (2002). ‘Auditory Stream Segregation on the Basis of
Amplitude-Modulation Rate’. J Acoust Soc Am 111(3): 1340–1348.
Gutschalk, A., C. Micheyl, J. R. Melcher, A. Rupp, M. Scherg, and A. J. Oxenham (2005). ‘Neuromagnetic
Correlates of Streaming in Human Auditory Cortex’. J Neurosci 25(22): 5382–5388.
Haykin, S. and Z. Chen (2005). ‘The Cocktail Party Problem’. Neural Comput 17(9): 1875–1902.
Hill, K. T., C. W. Bishop, D. Yadav, and L. M. Miller (2011). ‘Pattern of BOLD Signal in Auditory Cortex
Relates Acoustic Response to Perceptual Streaming’. BMC Neurosci 12: 85.
Hochberg, J. (1981). ‘Levels of Perceptual Organization’. In Perceptual Organization, ed. M. K. J. Pomerantz,
pp. 255–278 (Hillsdale, NJ: Erlbaum).
Hohwy, J. (2007). ‘Functional Integration and the Mind’. Synthese 159: 315–328.
Hupe, J. M. and D. Pressnitzer (2012). ‘The Initial Phase of Auditory and Visual Scene Analysis’. Philos
Trans R Soc Lond B Biol Sci 367(1591): 942–953.
Jones, M. R. (1976). ‘Time, our Lost Dimension: Toward a New Theory of Perception, Attention, and
Memory’. Psychological Review 83: 323–355.
Jones, M. R., D. J. Maser, and G. R. Kidd (1978). ‘Rate and Structure in Memory for Auditory Patterns’.
Memory & Cognition 6: 246–258.
Jones, M. R., G. Kidd, and R. Wetzel (1981). ‘Evidence for Rhythmic Attention’. J Exp Psychol Hum Percept
Perform 7: 1059–1073.
Koenderink, J. (this volume). Gestalts as Ecological Templates. In The Oxford Handbook of Perceptual
Organization, ed. J. Wagemans (Oxford: Oxford University Press).
Köhler, W. (1947). Gestalt Psychology: An Introduction to New Concepts in Modern Psychology
(New York: Liveright Publishing Corporation).
Kondo, H. M. and M. Kashino (2009). ‘Involvement of the Thalamocortical Loop in the Spontaneous
Switching of Percepts in Auditory Streaming’. J Neurosci 29(40): 12695–12701.
Kondo, H. M., N. Kitagawa, M. S. Kitamura, A. Koizumi, M. Nomura, and M. Kashino (2012).
‘Separability and Commonality of Auditory and Visual Bistable Perception’. Cer Cort 22(8): 1915–1922.
Kubovy, M. and D. Van Valkenburg (2001). ‘Auditory and Visual Objects’. Cognition 80(1–2): 97–126.
Auditory Perceptual Organization 617
Lavie, N., A. Hirst, J. W. de Fockert, and E. Viding (2004). ‘Load Theory of Selective Attention and
Cognitive Control’. J Exp Psychol Gen 133(3): 339–354.
Lee, T. S. and D. Mumford (2003). ‘Hierarchical Bayesian Inference in the Visual Cortex’. J Opt Soc Am
A Opt Image Sci Vis 20(7): 1434–1448.
Leopold, D. A. and N. K. Logothetis (1999). ‘Multistable Phenomena: Changing Views in Perception’.
Trends Cogn Sci 3(7): 254–264.
Levelt, W. J. M. (1968). On Binocular Rivalry (Paris: Mouton).
Lyzenga, J. and B. C. Moore (2005). ‘Effect of Frequency-Modulation Coherence for Inharmonic
Stimuli: Frequency-Modulation Phase Discrimination and Identification of Artificial Double Vowels’.
J Acoust Soc Am 117(3, Pt 1): 1314–1325.
Malmierca, M. S., S. Cristaudo, D. Perez-Gonzalez, and E. Covey (2009). ‘Stimulus-Specific Adaptation in
the Inferior Colliculus of the Anesthetized Rat’. J Neurosci 29(17): 5483–5493.
Mesgarani, N. and E. F. Chang (2012). ‘Selective Cortical Representation of Attended Speaker in
Multi-talker Speech Perception’. Nature 485(7397): 233–236.
Micheyl, C., B. Tian, R. P. Carlyon, and J. P. Rauschecker (2005). ‘Perceptual Organization of Tone
Sequences in the Auditory Cortex of Awake Macaques’. Neuron 48(1): 139–148.
Micheyl, C., R. P. Carlyon, A. Gutschalk, J. R. Melcher, A. J. Oxenham, J. P. Rauschecker, B. Tian, and E.
Courtenay Wilson (2007). ‘The Role of Auditory Cortex in the Formation of Auditory Streams’. Hear
Res 229(1–2): 116–131.
Mill, R., T. Bőhm, A. Bendixen, I. Winkler, and S. L. Denham (2013). ‘Competition and Cooperation
between Fragmentary Event Predictors in a Model of Auditory Scene Analysis’. PLoS Comput Biol in press.
Miller, G. A. and J. C. R. Licklider (1950). ‘The Intelligibility of Interrupted Speech’. J Acoust Soc Am
22: 167–173.
Moore, B. C., B. R. Glasberg, and R. W. Peters (1986). ‘Thresholds for Hearing Mistuned Partials as
Separate Tones in Harmonic Complexes’. J Acoust Soc Am 80(2): 479–483.
Moore, B. C. J. and H. E. Gockel (2002). ‘Factors Influencing Sequential Stream Segregation’. Acta Acust
88: 320–333.
Moore, B. C. J. and H. E. Gockel (2012). ‘Properties of Auditory Stream Formation’. Philos Trans R Soc Lond
B Biol Sci 367(1591): 919–931.
Näätänen, R. and I. Winkler (1999). ‘The Concept of Auditory Stimulus Representation in Cognitive
Neuroscience’. Psychol Bull 125(6): 826–859.
Näätänen, R., M. Tervaniemi, E. Sussman, P. Paavilainen, and I. Winkler (2001). ‘ “Primitive Intelligence”
in the Auditory Cortex’. Trends Neurosci 24(5): 283–288.
Näätänen, R., T. Kujala, and I. Winkler (2011). ‘Auditory Processing that Leads to Conscious
Perception: A Unique Window to Central Auditory Processing Opened by the Mismatch Negativity and
Related Responses’. Psychophysiology 48(1): 4–22.
Nager, W., W. Teder-Sälejärvi, S. Kunze, and T. F. Münte (2003). ‘Preattentive Evaluation of Multiple
Perceptual Streams in Human Audition’. Neuroreport 14(6): 871–874.
Nakajima, Y., T. Sasaki, K. Kanafuka, A. Miyamoto, G. Remijn, and G. ten Hoopen (2000).
‘Illusory Recouplings of Onsets and Terminations of Glide Tone Components’. Percept Psychophys
62(7): 1413–1425.
Nakajima, Y., T. Sasaki, G. B. Remijn, and K. Ueda (2004). ‘Perceptual Organization of Onsets and Offsets
of Sounds’. J Physiol Anthropol Appl Human Sci 23(6): 345–349.
Neisser, U. (1967). Cognitive Psychology (New York: Appleton-Century-Crofts).
Nelken, I. (2008). ‘Processing of Complex Sounds in the Auditory System’. Curr Opin Neurobiol
18(4): 413–417.
618 Denham and Winkler
Oertel, D., R. R. Fay, and A. N. Popper (2002). Integrative Functions in the Mammalian Auditory Pathway
(New York: Springer-Verlag).
Paz, R., H. Gelbard-Sagiv, R. Mukamel, M. Harel, R. Malach, and I. Fried (2010). ‘A Neural Substrate in
the Human Hippocampus for Linking Successive Events’. Proc Natl Acad Sci USA 107(13): 6046–6051.
Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (San
Mateo: Morgan Kaufmann Publishers).
Pressnitzer, D. and J. M. Hupe (2006). ‘Temporal Dynamics of Auditory and Visual Bistability Reveal
Common Principles of Perceptual Organization’. Curr Biol 16(13): 1351–1357.
Rajendran, V. G., N. S. Harper, B. D. Willmore, W. M. Hartmann, and J. W. H. Schnupp (2013).
‘Temporal Predictability as a Grouping Cue in the Perception of Auditory Streams’. J Acoust Soc Am
134(1): EL98–104.
Rand, T. C. (1974). ‘Letter: Dichotic Release from Masking for Speech’. J Acoust Soc Am 55(3): 678–680.
Rensink, R. A. (2000). ‘Seeing, Sensing, and Scrutinizing’. Vision Res 40(10–12): 1469–1487.
Riecke, L., A. J. Van Opstal, and E. Formisano (2008). ‘The Auditory Continuity Illusion: A Parametric
Investigation and Filter Model’. Percept Psychophys 70(1): 1–12.
Rimmele, J. M., E. Schröger, and A. Bendixen (2012). ‘Age-Related Changes in the Use of Regular Patterns
for Auditory Scene Analysis’. Hear Res 289(1–2): 98–107.
Ritter, W., E. Sussman, and S. Molholm (2000). ‘Evidence that the Mismatch Negativity System Works on
the Basis of Objects’. Neuroreport 11(1): 61–63.
Roberts, B., B. R. Glasberg, and B. C. Moore (2002). ‘Primitive Stream Segregation of Tone Sequences
without Differences in Fundamental Frequency or Passband’. J Acoust Soc Am 112(5, Pt 1): 2074–2085.
Samuel, A. G. (1981). ‘The Role of Bottom-Up Confirmation in the Phonemic Restoration Illusion’. J Exp
Psychol Hum Percept Perform 7(5): 1124–1131.
Schadwinkel, S. and A. Gutschalk (2011). ‘Transient Bold Activity Locked to Perceptual Reversals
of Auditory Streaming in Human Auditory Cortex and Inferior Colliculus’. J Neurophysiol
105(5): 1977–1983.
Schofield, A. R. (2010). Structural Organization of the Descending Auditory Pathway. In The Oxford
Handbook of Auditory Science, vol. 2: The Auditory Brain, ed. A. Rees and A. R. Palmer, pp. 43–64
(Oxford: Oxford University Press).
Schwartz, J. L., N. Grimault, J. M. Hupe, B. C. Moore, and D. Pressnitzer (2012). ‘Multistability
in Perception: Binding Sensory Modalities, an Overview’. Philos Trans R Soc Lond B Biol Sci
367(1591): 896–905.
Seeba, F. and G. M. Klump (2009). ‘Stimulus Familiarity Affects Perceptual Restoration in the European
Starling (Sturnus vulgaris)’. PLoS One 4(6): e5974.
Shamma, S. A., M. Elhilali, and C. Micheyl (2011). ‘Temporal Coherence and Attention in Auditory Scene
Analysis’. Trends Neurosci 34(3): 114–123.
Shpiro, A., R. Moreno-Bote, N. Rubin, and J. Rinzel (2009). ‘Balance between Noise and Adaptation in
Competition Models of Perceptual Bistability’. J Comput Neurosci 27(1): 37–54.
Snyder, J. S., C. Alain, and T. W. Picton (2006). ‘Effects of Attention on Neuroelectric Correlates of
Auditory Stream Segregation’. J Cogn Neurosci 18(1): 1–13.
Snyder, J. S. and C. Alain (2007). ‘Toward a Neurophysiological Theory of Auditory Stream Segregation’.
Psychol Bull 133(5): 780–799.
Snyder, J. S., M. K. Gregg, D. M. Weintraub, and C. Alain (2012). ‘Attention, Awareness, and the
Perception of Auditory Scenes’. Front Psychol 3: 15.
Spence, C. (this volume). ‘Cross-modal Perceptual Organization’. In The Oxford Handbook of Perceptual
Organization, ed. J. Wagemans (Oxford: Oxford University Press).
Stoffgren, T. A. and B. G. Brady (2001). ‘On Specification and the Senses’. Behavioral and Brain Sciences
24: 195–222.
Auditory Perceptual Organization 619
Summerfield, C. and T. Egner (2009). ‘Expectation (and Attention) in Visual Cognition’. Trends Cogn Sci
13(9): 403–409.
Sussman, E. S., W. Ritter, and H. G. Vaughan, Jr (1999). ‘An Investigation of the Auditory Streaming Effect
Using Event-Related Brain Potentials’. Psychophysiology 36(1): 22–34.
Sussman, E. S. (2005). ‘Integration and Segregation in Auditory Scene Analysis’. J Acoust Soc Am 117(3, Pt 1):
1285–1298.
Sussman, E. S., A. S. Bregman, W. J. Wang, and F. J. Khan (2005). ‘Attentional Modulation of
Electrophysiological Activity in Auditory Cortex for Unattended Sounds within Multistream Auditory
Environments’. Cogn Affect Behav Neurosci 5(1): 93–110.
Sussman, E. S., J. Horváth, I. Winkler, and M. Orr (2007). ‘The Role of Attention in the Formation of
Auditory Streams’. Percept Psychophys 69(1): 136–152.
Szalárdy, O., A. Bendixen, D. Tóth, S. L. Denham, and I. Winkler (2012). ‘Modulation-Frequency Acts as a
Primary Cue for Auditory Stream Segregation’. J Learning & Perception in press.
Szalárdy, O., T. Bőhm, A. Bendixen, and I. Winkler (2013). ‘Perceptual Organization Affects the
Processing of Incoming Sounds: An ERP Study’. Biol Psychol 93(1): 97–104.
Taaseh, N., A. Yaron, and I. Nelken (2011). ‘Stimulus-Specific Adaptation and Deviance Detection in the
Rat Auditory Cortex’. PLoS One 6(8): e23369.
Teki, S., M. Chait, S. Kumar, K. von Kriegstein, and T. D. Griffiths (2011). ‘Brain Bases for Auditory
Stimulus-Driven Figure-Ground Segregation’. J Neurosci 31(1): 164–171.
Ulanovsky, N., L. Las, and I. Nelken (2003). ‘Processing of Low-Probability Sounds by Cortical Neurons’.
Nat Neurosci 6(4): 391–398.
van Ee, R. (2009). ‘Stochastic Variations in Sensory Awareness are Driven by Noisy Neuronal
Adaptation: Evidence from Serial Correlations in Perceptual Bistability’. J Opt Soc Am A Opt Image Sci
Vis 26(12): 2612–2622.
van Leeuwen, C. (this volume). ‘Continuous versus Discrete Stages, Emergence versus Microgenesis.’ In The
Oxford Handbook of Perceptual Organization, ed. J. Wagemans (Oxford: Oxford University Press).
van Noorden, L. P. A. S. (1975). Temporal Coherence in the Perception of Tone Sequences. Doctoral
dissertation, Technical University Eindhoven.
Vliegen, J. and A. J. Oxenham (1999). ‘Sequential Stream Segregation in the Absence of Spectral Cues’.
J Acoust Soc Am 105(1): 339–346.
von Helmholtz, H. (1885). On the Sensations of Tone as a Physiological Basis for the Theory of Music
(London: Longmans, Green, and Co.).
Wagemans, J., J. H. Elder, M. Kubovy, S. E. Palmer, M. A. Peterson, M. Singh, and R. von der Heydt
(2012). ‘A Century of Gestalt Psychology in Visual Perception, I: Perceptual grouping and figure-ground
organization’. Psychol Bull 138(6): 1172–1217.
Warren, R. M., J. M. Wrightson, and J. Puretz (1988). ‘Illusory Continuity of Tonal and Infratonal Periodic
Sounds’. J Acoust Soc Am 84(4): 1338–1342.
Wessel, D. L. (1979). ‘Timbre space as a musical control structure’. Computer Music Journal 3: 45–52.
Wightman, F. L. and R. Jenison (1995). ‘Auditory Spatial Layout’. In Perception of Space and Motion, ed.
W. Epstein and S. J. Rogers, pp. 365–400 (San Diego, CA: Academic Press).
Winkler, I., E. Sussman, M. Tervaniemi, J. Horváth, W. Ritter, and R. Näätänen (2003). ‘Preattentive
Auditory Context Effects’. Cogn Affect Behav Neurosci 3(1): 57–77.
Winkler, I., W. A. Teder-Salejarvi, J. Horváth, R. Näätänen, and E. Sussman (2003). ‘Human Auditory
Cortex Tracks Task-Irrelevant Sound Sources’. Neuroreport 14(16): 2053–2056.
Winkler, I. and N. Cowan (2005). ‘From Sensory to Long-Term Memory: Evidence from Auditory Memory
Reactivation Studies’. Exp Psychol 52(1): 3–20.
Winkler, I., R. Takegata, and E. Sussman (2005). ‘Event-Related Brain Potentials Reveal Multiple Stages in
the Perceptual Organization of Sound’. Brain Res Cogn Brain Res 25(1): 291–299.
620 Denham and Winkler
Winkler, I., T. L. van Zuijen, E. Sussman, J. Horváth, and R. Näätänen (2006). ‘Object Representation in
the Human Auditory System’. Eur J Neurosci 24(2): 625–634.
Winkler, I. (2007). ‘Interpreting the Mismatch Negativity’. Journal of Psychophysiology 21: 147–163.
Winkler, I., S. L. Denham, and I. Nelken (2009). ‘Modeling the Auditory Scene: Predictive Regularity
Representations and Perceptual Objects’. Trends Cogn Sci 13(12): 532–540.
Winkler, I. (2010). ‘In Search for Auditory Object Representations’. In Unconscious Memory Representations
in Perception: Processes and Mechanisms in the Brain, ed. I. Winkle and I. Czigler, pp. 71–106
(Amsterdam: John Benjamins).
Winkler, I. and I. Czigler (2012). ‘Evidence from Auditory and Visual Event-Related Potential (ERP)
Studies of Deviance Detection (MMN and vMMN) Linking Predictive Coding Theories and Perceptual
Object Representations’. Int J Psychophysiol 83(2): 132–143.
Winkler, I., S. Denham, R. Mill, T. M. Bohm, and A. Bendixen (2012). ‘Multistability in Auditory Stream
Segregation: A Predictive Coding View’. Philos Trans R Soc Lond B Biol Sci 367(1591): 1001–1012.
Yabe, H., I. Winkler, I. Czigler, S. Koyama, R. Kakigi, T. Sutoh, T. Hiruma, and S. Kaneko (2001).
‘Organizing Sound Sequences in the Human Brain: The Interplay of Auditory Streaming and Temporal
Integration’. Brain Res 897(1–2): 222–227.
Yildiz, I. B. and S. J. Kiebel (2011). ‘A Hierarchical Neuronal Model for Generation and Online Recognition
of Birdsongs’. PLoS Comput Biol 7(12): e1002303.
Yu, A. J. (2007). ‘Adaptive Behavior: Humans Act as Bayesian Learners’. Curr Biol 17(22): R977–980.
Zhuo, G. and X. Yu (2011). ‘Auditory Feature Binding and its Hierarchical Computational Model’. Artificial
Intelligence and Computational Intelligence: Lecture Notes in Computer Science 7002: 332–338.
Zwicker, E. and H. Fastl (1999). Psychoacoustics: Facts and Models (Heidelberg, New York: Springer).
Chapter 30
Tactile and Haptic Perceptual Organization
Introduction
Tactile perception refers to perception by means of touch mediated only through the cutaneous receptors (mechanoreceptors and thermoreceptors) located in the skin (Lederman and Klatzky, 2009; Loomis and Lederman, 1986). When kinesthetic receptors (mechanoreceptors embedded in muscles, joints, and tendons) are also involved, the term haptic perception is used. Four main types of cutaneous mechanoreceptors have been distinguished: Merkel nerve endings (small receptive field, slowly adapting), Meissner corpuscles (small receptive field, rapidly adapting), Pacinian corpuscles (large receptive field, rapidly adapting), and Ruffini endings (large receptive field, slowly adapting). Together, these are responsible for the human's wide range of sensitivity to all kinds of stimulation, such as pressure, vibration, and skin stretch. The kinesthetic sense, or
kinesthesia, contributes to the perception of the positions and movement of the limbs (Proske and
Gandevia, 2009). The main kinesthetic receptor is the muscle spindle that is sensitive to changes
in length of the muscle; its sensitivity can be adapted to the circumstances. Most of our everyday
activities involving touch (think of handling and identifying objects, maintenance of body pos-
ture, sensing the texture of food in the mouth, estimating the weight of an object, etc.) fall into the
class of haptic perception.
An interesting difference from the sense of vision is that visual receptors are restricted to a small, well-delineated organ (namely, the eye), whereas touch receptors are distributed all over the body. However, the sensitivity of these receptors varies widely over the body. A commonly used measure of this sensitivity is the two-point threshold: the smallest distance between two simultaneous stimuli at which they can still be distinguished from a single stimulus. Such thresholds are typically 2–4 mm for the fingertips, but can be more than 40 mm for the calf, thigh, and shoulder
(Lederman and Klatzky, 2009; Weinstein, 1968). Another interesting fact compared with vision is
that the extremities (limbs) are not only exploratory sense organs, but they are also performatory
motor organs (Gibson, 1966).
The availability of tactual information is usually taken for granted and, as a consequence, its importance is severely underestimated. The importance of haptics, or of touch in general, is usually illustrated by referring to its significance for individuals who lack the use of one of the
other major senses, particularly sight. Blind (or blindfolded) humans clearly have to rely heavily
on the sense of touch. However, this observation disregards the fact that in daily life touch is of
vital importance for everyone, not just for the visually disabled: living without the sense of touch
is virtually impossible (e.g. Cole and Paillard, 1995). Patients suffering from peripheral neuropa-
thy (a condition that deafferents the limbs, depriving the person of cutaneous and haptic touch)
are unable to control their limbs without visual feedback: in the dark or when covered under a
622 Kappers and Bergmann Tiest
blanket, they are completely helpless. Such patients are fortunately rare, but they make us aware of
our reliance on touch in basically all our daily activities.
Humans are able to perceive a wide range of properties by means of touch. Some of these are
shared with vision, for example, shape and size, but others are specific for touch, such as weight,
compliance, and temperature. Properties like texture can be perceived both visually and haptically, but in quite different ways, and the two can contradict each other: an object might look smooth but feel rough, and vice versa. In 1987, Lederman and Klatzky made an inventory of the typical hand movements humans make when assessing object and material properties. Information about weight, size, texture, shape, compliance, and temperature can be obtained by unsupported holding, enclosure, lateral movement, contour following, pressure, and static touch, respectively (Lederman and Klatzky, 1987). These so-called exploratory procedures not only suffice to assess these properties; they are optimal and often even necessary.
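The pairing of properties with exploratory procedures can be restated as a small lookup table. The sketch below is only an illustrative summary of the list above; the dictionary and function names are our own, not code from the literature:

```python
# The six object properties named in the text, each paired with the
# exploratory procedure that Lederman and Klatzky (1987) found optimal for it.
EXPLORATORY_PROCEDURES = {
    "weight": "unsupported holding",
    "size": "enclosure",
    "texture": "lateral movement",
    "shape": "contour following",
    "compliance": "pressure",
    "temperature": "static touch",
}

def optimal_procedure(prop: str) -> str:
    """Return the exploratory procedure best suited to assessing `prop`."""
    return EXPLORATORY_PROCEDURES[prop]
```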
This chapter aims to give a concise overview of the human haptic perception of object and
spatial properties. Insight into perceptual organization can often be obtained by studying percep-
tual illusions, as many of these rely on tricks with perceptual organization. The theoretical basis
for this idea lies in the way information from the world around us is processed. A great deal of our
representation of the world is not actually perceived, but supplemented by our brain according
to certain mechanisms. When this process goes wrong, as is the case with illusions, these mecha-
nisms are laid bare and their operation can be fathomed. The topics in this chapter will, therefore,
where possible, be illustrated with tactile or haptic illusions (e.g. Hayward, 2008; Lederman and
Jones, 2011; Robertson, 1902; Suzuki and Arashida, 1992).
Object Properties
The question ‘What is an object?’ or, in particular, ‘How do humans segregate figure from ground?’
has been investigated extensively in vision. In touch, however, only a few studies are relevant in
this respect. For example, Pawluk and colleagues (2010) asked observers to distinguish between
figure and ground by means of a ‘haptic glance’, a very brief gentle contact with all five fingers of a
hand. They showed that such a brief contact is, indeed, sufficient for the distinction between figure
and ground. A similar pop-out phenomenon, immediately separating different aspects of a haptic
scene, has been reported for haptically relevant properties such as roughness (Plaisier et al., 2008)
and compliance (van Polanen et al., 2012). Other studies report on numerosity perception: by actively grasping a bunch of a few objects (in this case, spheres), one can rapidly determine the correct number of objects (Plaisier et al., 2009), which gives clear evidence of fast object individuation by touch.
This section will focus on the haptic perception of object properties, such as curvature, shape, size, and weight, that have received considerable attention. It will also be shown that some of these properties are susceptible to strong illusions; these illusions are important for our understanding of how, and which aspects of, objects can be perceived by touch.
Curvature
An important aspect of a smooth shape is its curvature, and it is therefore of interest whether and how well humans can perceive and discriminate curvature, and which perceptual mechanism underlies haptic curvature perception. The first studies on curvature perception focused on the question of how well humans could decide whether a stimulus was concave, straight, or convex. Hunter (1954)
and later Davidson (1972) presented curved strips on the horizontal plane and found that what
observers perceive as straight is actually somewhat concave (the middle of the stimulus bent away
Tactile and Haptic Perceptual Organization 623
from the observer). They also compared the performance of blind and blindfolded sighted observers and concluded that blind observers give more 'objective' (that is, veridical) responses. Davidson found that when sighted observers were instructed to use the scanning strategies of the blind, their performance improved. He concluded that the exploratory movement of an arm sweep might obscure the stimulus curvature.
Gordon and Morrison (1982) were interested in how well observers could discriminate curved
from flat stimuli. Using small curved stimuli explored by active touch, they could express the
discrimination threshold in terms of geometrical stimulus properties: the base-to-peak height
of the curved stimulus divided by half its length is constant (see Figure 30.1(a)). This expression
indicates the overall gradient of the stimulus. To exclude the possible influence of kinesthetic perception on curvature discrimination, Goodwin et al. (1991) pressed small curved
stimuli onto the fingers of observers, so that only cutaneous receptors in the finger pads could play
a role. In this way, a 10 per cent difference in curvature could be detected. In a subsequent study
(Goodwin and Wheat, 1992), they found that discrimination thresholds remained the same even
if contact area was kept constant, so contact area was not the determining factor for curvature dis-
crimination. However, discrimination performance increased with contact area. For stimuli with
a larger contact area, the base-to-peak height is also larger, so their finding was consistent with
the conclusion of Gordon and Morrison that the stimulus gradient determines the discrimination
threshold (see Figure 30.1).
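Gordon and Morrison's threshold measure can be made concrete with a short numerical sketch. Assuming, purely for illustration, that the stimuli are circular arcs (the function name and the numerical values below are hypothetical, not taken from the study), the base-to-peak height follows from the sagitta of a circular segment, and the gradient is that height divided by half the base length:

```python
import math

def arc_geometry(curvature, base_length):
    """Base-to-peak height and overall gradient of a circular-arc stimulus.

    curvature:   1/R in 1/m (0 means a flat stimulus)
    base_length: chord length of the stimulus in m
    """
    if curvature == 0:
        return 0.0, 0.0
    R = 1.0 / curvature
    half = base_length / 2.0
    # Sagitta of a circular segment: h = R - sqrt(R^2 - (l/2)^2)
    height = R - math.sqrt(R * R - half * half)
    gradient = height / half  # Gordon and Morrison's threshold measure
    return height, gradient

# Same curvature, shorter stimulus -> smaller gradient (cf. Figure 30.1(c)):
# for small arcs, gradient is approximately curvature * base_length / 4.
h_long, g_long = arc_geometry(curvature=5.0, base_length=0.02)    # R = 20 cm, l = 2 cm
h_short, g_short = arc_geometry(curvature=5.0, base_length=0.01)  # same R, l = 1 cm
```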
Pont et al. (1997) used stimuli that were similar in curvature and size to those of Hunter (1954) and Davidson (1972), but they used these stimuli upright and performed discrimination instead
Fig. 30.1 Illustration of the threshold expression of Gordon and Morrison (1982). (a) A curved
stimulus has a base-to-peak height and a base length. The ratio of the height to half the base length
gives the gradient or slope. (b) A stimulus with a higher curvature has a larger base-to-peak height
if the length is the same as in (a). As a consequence, the gradient is also larger. (c) A stimulus with
the same curvature as in (a), but of smaller length. The gradient is smaller than in (a) because of
the nonlinear relation between slope and stimulus length.
of classification experiments. In various conditions, observers placed their hand on two successive stimuli and had to decide which of the two had the higher curvature. Figure 30.2(a)–
(c) shows a few of their experimental conditions: stimuli could be placed along the various fingers
as in (a), across the fingers at several locations as in (b), or even at the dorsal side of the hand as in
(c). Consistent with the previous findings, they found that the gradient of the stimuli determined
the curvature discrimination threshold. As the dorsal side of the hand contains far fewer cutaneous mechanoreceptors than the palmar side, the worse discrimination performance at the dorsal side demonstrated the importance of the cutaneous receptors in curvature perception. They also found that performance with static and with dynamic touch of the stimuli did not differ significantly (Pont et al., 1999), possibly because of the important role the cutaneous receptors play in discrimination performance.
If the overall gradient or slope of the stimulus plays a major role in curvature discrimination
performance, then height and local curvature are of minor importance. Pont et al. (1999) inves-
tigated this explicitly by creating a new set of stimuli in which the order of information that
the stimulus contained was varied (see Figure 30.2(d)–(f)). The first stimulus set contained only
height differences (zeroth order information), the second set contained both height differences and slopes (zeroth and first order information), and the third set contained, in addition, local curvature information (zeroth, first, and second order information). Participants placed their fingers on
the stimuli as shown in Figure 30.2(d)–(f) and had to decide for each stimulus pair (within a set),
which of the two was more convex. All thresholds could be expressed in terms of base-to-peak
height. Convincingly, the thresholds for the zeroth order set were much higher than those for the other two sets. Thresholds did not differ significantly when local curvature information was added to the stimuli, so thresholds are indeed based on the gradient information.
The experiments on stimulus order by Pont et al. (1999) were necessarily done using static
touch. Dostmohamed and Hayward (2005) designed a haptic device that made it possible to per-
form similar experiments using active touch. Participants had to place a finger on a small metal
Fig. 30.2 Illustration of some of the conditions in the experiments by Pont and colleagues (1997,
1999). (a) Stimulus placed along the index finger. (b) Stimulus placed across the fingers. (c) Stimulus
presented dorsally. (d) Stimulus containing just height differences (zeroth order information).
(e) Stimulus containing height and slope differences (zeroth and first order information). (f) Stimulus
containing height, slope, and curvature information (zeroth, first, and second order information).
plate; when they actively moved the plate, it followed the trajectory of a preprogrammed stimulus shape. In this way, Wijntjes et al. (2009) could compare discrimination performance with the same stimulus shapes that Pont et al. (1999) used. They also included a condition of directly touching
the real curved shapes. Their results were consistent with those obtained for static touch: height
information alone is not sufficient, but as soon as first order information (slope) is present, perfor-
mance is just as good as with the curved shapes. Therefore, the determining factor for curvature
discrimination performance is the overall gradient in the stimulus. It is clear that the principles
of perceptual organization are at work here: from just the orientation of the surface in a few loca-
tions, the entire curved surface is reconstructed according to the principle of good continuation.
Not only is the surface reconstructed, its curvature can also be perceived as accurately as in the
case of a complete surface.
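The sufficiency of first order information can be illustrated with a small numerical sketch. The radius and finger positions below are our own illustrative assumptions, not values from the cited studies; the point is only that the slope samples at a few contact points already determine the curvature, without any height cue:

```python
import math

# Illustrative values (not from the cited studies): a shallow convex arc
# of radius R touched at two outer finger positions +/- x0.
R = 200.0          # mm, radius of curvature of the stimulus
x0 = 30.0          # mm, half the distance between the outer fingers

def slope(x, R=R):
    # Local surface orientation of the circular profile
    # h(x) = sqrt(R**2 - x**2) - R  (apex at x = 0).
    return -x / math.sqrt(R**2 - x**2)

# First order information alone: the slope difference across the contact
# points, divided by their separation, approximates the curvature 1/R.
curvature = (slope(-x0) - slope(x0)) / (2 * x0)
print(1 / curvature)   # close to R: no height information was used
```

The estimate is not exact (the chord-based gradient slightly overestimates the apex curvature), but it recovers the radius to within a few per cent, which is the sense in which the overall gradient carries the shape.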
Illusions of curvature
Although humans are sensitive to only small differences in curvature, their perception of
curvature is not veridical. Both Hunter (1954) and Davidson (1972) reported that what is
perceived as straight is actually curved away from the observer. Davidson’s explanation was
that a natural hand movement also follows a curved line, obscuring the stimulus’ curvature.
Vogels et al. (1996, 1997) found that a three-dimensional surface that is perceived as flat cor-
responds to a geometrically concave surface. In other words, an actually flat surface is usually
perceived as convex. There are other, even more pronounced, curvature illusions that will be
described below.
during the decay. As they did not find differences between the three conditions, they concluded
that peripheral receptors do not play a major role in causing the after effect. In a small experiment
with only two participants, they also tested whether the after effect transferred to the other hand.
As they did not find an indication of such a transfer, they had to conclude that the after effect does
not originate at a high level either.
Van der Horst et al. (2008a) found not only a substantial after effect when the curved surface
was just touched by a single finger, they also found a partial transfer of the after effect to other
fingers, both of the same hand and of the other hand. Because the transfer is only partial, they
conclude that the major part of the after effect is caused at a level where the individual fingers are
represented, but that in addition a part has to occur at a level shared by the fingers. Interestingly, in
another study Van der Horst et al. (2008b) found a full transfer of the after effect when the curved
surfaces were touched dynamically. They conclude that the level of the representation of curvature
apparently depends on the way the information is acquired (see Kappers (2011) for an overview
of all after effect studies).
Shape
Curvature is an important property of smooth shapes, but it is also of interest to investigate the
perception of shape itself. A first study was conducted by Gibson (1963), who used a set of smooth
solid objects that were ‘equally different’ from one another to perform matching and discrimi-
nation experiments. He concluded that blindfolded observers could distinguish such shapes by
touch. Klatzky and colleagues (1985) used a large set of common daily life objects, such as a comb,
wallet, screw, and tea bag, and they established that such three-dimensional objects could be rec-
ognized accurately and rapidly by touch alone. Norman and colleagues (2004) made plastic copies
of bell peppers, which they used in matching and discrimination experiments, both unimodally
(touch or vision) and bimodally (touch and vision). As the results in the various conditions were
quite similar, they concluded that the visual and haptic representations of three-dimensional
shape are functionally overlapping.
A different approach was followed by van der Horst and Kappers (2008). They used a set of
cylindrical objects with different elliptical cross-sections and a set of blocks with rectangular
cross-sections. The task of the observers was to grasp (without lifting) a pair of objects and deter-
mine which of the two had the circular (for the cylinders) or square (for the blocks) cross-section.
They found that an aspect ratio (i.e. ratio between the longer and the shorter axes) of 1.03 was
sufficient to distinguish circular from elliptical, but an aspect ratio of 1.11 was necessary for
distinguishing square from rectangular. This was somewhat surprising, since the aspect ratio is
more readily available in the blocks than in the cylinders. They concluded that apparently the
curvature information present in the cylinders could be used in a reliable manner. Using a similar set
of objects, Panday et al. (2012) explicitly studied how local object properties (such as curvature
variation and edges) influence global object perception. They found that both curvature and
curvature change could enhance performance in an object orientation detection task, whereas
edges degraded performance.
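One way to see why curvature is such a usable cue in the cylinders is that, for an ellipse, the curvature at the vertices varies as the cube of the aspect ratio, so a 3 per cent axis difference already yields roughly a 9 per cent curvature modulation along the contour. This back-of-the-envelope computation is ours, for illustration only:

```python
def vertex_curvatures(aspect_ratio, b=1.0):
    """Curvatures of an ellipse with semi-axes a = aspect_ratio * b and b.

    Standard results: curvature is a/b**2 at the ends of the major axis
    and b/a**2 at the ends of the minor axis.
    """
    a = aspect_ratio * b
    return a / b**2, b / a**2

k_major, k_minor = vertex_curvatures(1.03)
modulation = k_major / k_minor   # equals aspect_ratio**3
print(modulation)                # ~1.09: a 3% aspect ratio difference
                                 # gives ~9% curvature variation
```

Under this reading, the curvature signal in the cylinders is amplified relative to the bare aspect ratio, which is consistent with the lower threshold reported for the cylinders.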
Size
Objects are always extended and thus have a certain size. Size can be measured in one, two, or
three dimensions, which correspond to length, area, and volume. In this section, we will restrict
ourselves to the haptic perception of length and volume.
Length
An object’s length can basically be perceived in two ways. The first is the finger-span method, in
which the object is enclosed between thumb and index finger. This method is restricted to lengths
of about 10 cm or less, depending on hand size. The best accuracy (discrimination threshold)
with which lengths can be perceived in this way is about 0.5 mm (1 per cent) for a 5-cm reference
length (Langfeld, 1917). For greater lengths, the thresholds increase somewhat up to about 3 mm
for a 9-cm reference length (Stevens and Stone, 1959).
For even larger objects, the finger-span method cannot be used and movement is required to
perceive the object’s length. When moving the finger over the side of an object, two sources of
information are available: the distance travelled can be derived from the kinesthetic information
from muscles and joints, and it can also be extracted from the cutaneous information of the
fingertip moving over the surface, by estimating the movement speed and duration.
Length perception with the movement method is considerably less accurate than with the finger-span method.
Based on kinesthetic information, the length discrimination threshold for an 8-cm reference
length is 11 mm (14 per cent), while based on cutaneous information, it is 25 mm (32 per cent)
(Bergmann Tiest et al., 2011). In conclusion, haptic length perception can be done with either the
finger-span method, kinesthetic movement information, or cutaneous movement information,
with varying degrees of accuracy.
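Expressed as Weber fractions (discrimination threshold divided by reference length), the numbers above can be compared directly across methods. A minimal sketch of that bookkeeping, using the thresholds quoted in the text:

```python
# Thresholds (mm) and reference lengths (mm) as quoted in the text.
methods = {
    "finger span (5 cm reference)": (0.5, 50.0),
    "movement, kinesthetic (8 cm)": (11.0, 80.0),
    "movement, cutaneous (8 cm)":   (25.0, 80.0),
}

for name, (threshold, reference) in methods.items():
    weber = threshold / reference
    print(f"{name}: Weber fraction {weber:.2f}")
# The finger-span method is an order of magnitude more precise than
# either movement-based estimate.
```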
Illusions of length
A well-known illusion in haptic length perception is the radial-tangential illusion, in which lengths
explored in the radial direction (away from and towards the body) are perceived to be larger than
lengths explored in the tangential direction (parallel to the frontoparallel plane; Armstrong and
Marks, 1999). This indicates that haptic space is anisotropic and that the perceived length of an
object depends on its orientation.
Regarding the different methods, it has been found that lengths perceived by the
finger-span method are judged to be shorter than by the movement method, both in a
perception-and-reproduction task (Jastrow, 1886) and in a magnitude estimation task using a
visual scale (Hohmuth et al., 1976). The difference in perceived length between the methods was
as high as a factor of 2.5 in some cases. Furthermore, lengths perceived using the movement
method with only cutaneous information were underestimated more than with only kinesthetic
information (Terada et al., 2006). When kinesthesia and cutaneous perception yielded conflicting
information, the estimate was found to be based on the greatest length.
Finally, the well-known Müller-Lyer illusion, in which the length of a line is perceived differ-
ently depending on the type of arrowheads present at the ends, has been demonstrated in touch
as well as in vision (Millar and Al-Attar, 2002; Robertson, 1902). All in all, these illusions indicate
that haptic length perception is not independent of the direction or the type of movements made,
nor of the direct environment of the object to be perceived.
628 Kappers and Bergmann Tiest
Volume
Although quite a number of studies focused on the perception of weight (see below), which
usually correlates with object size unless different materials are compared, only a few studies
investigated the haptic perception of volume. Volume is typically assessed by enclosing the
object with the hand(s) (Lederman and Klatzky, 1987). Kahrimanovic et al. (2011b) investi-
gated the just noticeable difference (JND) of spheres, cubes, and tetrahedrons that fitted in the
hand. They found that for the smaller stimuli of their set, the volumes of tetrahedra were sig-
nificantly more difficult to discriminate than those of cubes and spheres, with Weber fractions
of 0.17, 0.15, and 0.13, respectively. The availability of weight information did not improve
performance.
As visual estimates of volume were found to be biased depending on the object geometry,
Krishna (2006) decided to investigate this so-called ‘elongation bias’ haptically. She found that in
touch, an effect opposite to that in vision occurred: a tall glass was perceived as larger in volume
than a wide glass of the same volume. Her conclusion was that, whereas in vision, ‘height’ is a sali-
ent feature, for touch ‘width’ would be more salient. As objects can differ along more geometric
dimensions than just height or width, Kahrimanovic et al. (2010) investigated volume discrimin-
ation of spheres, cubes and tetrahedra (see Figure 30.3 left). These stimuli were of a size that fitted
in one hand. They found substantial biases: tetrahedra were perceived as much larger than spheres
(about 60 per cent) and cubes (about 30 per cent). Somewhat smaller, but still substantial biases
were found when observers had access to the mass (weight) of the object (although they were not
told explicitly that weight correlated with volume).
The subsequent step in the research was to investigate the physical correlates of these volume
biases. If the volumes of spheres, cubes, and tetrahedra are the same, then, among other
properties, their surface areas and maximal lengths are not identical. It turned out that for
volumes that were perceived as being equal, the surface areas of the objects were almost the same (Kahrimanovic et al.,
2010). If participants were instructed to compare surface area of these shapes, their performance
was almost unbiased. This outcome makes sense, if one realizes that surface area correlates with
skin stimulation, which is a more direct measure of object size than the more ‘abstract’ volume.
When the surface-area cue was eliminated by using wire frame objects, biases in the
cube–tetrahedron comparison increased to an average of 69 per cent. In this condition, the
maximum length between two vertices was the factor correlating with the participants’
perceived volume. Again, this can be understood by realizing that now length is the more
direct stimulus compared with volume. It seems to be a general principle of haptic perceptual
organization that volume is perceived on the basis of the most readily available geometric prop-
erty of the stimulus.
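The geometric relation underlying the surface-area correlate can be made explicit: at equal volume, surface area scales as a shape-dependent constant times V to the power 2/3, and that constant is largest for the tetrahedron. A sketch (the volume value is an arbitrary assumption for illustration):

```python
import math

def sphere_area(V):
    r = (3.0 * V / (4.0 * math.pi)) ** (1.0 / 3.0)
    return 4.0 * math.pi * r**2

def cube_area(V):
    return 6.0 * V ** (2.0 / 3.0)

def tetrahedron_area(V):
    # Regular tetrahedron: V = a**3 / (6*sqrt(2)), A = sqrt(3) * a**2.
    a = (6.0 * math.sqrt(2.0) * V) ** (1.0 / 3.0)
    return math.sqrt(3.0) * a**2

V = 100.0  # cm^3, an arbitrary common volume for illustration
for name, area in [("sphere", sphere_area(V)), ("cube", cube_area(V)),
                   ("tetrahedron", tetrahedron_area(V))]:
    print(f"{name:11s}: {area:6.1f} cm^2")
# At equal volume, the tetrahedron has ~49% more surface area than the
# sphere and ~20% more than the cube: the same ordering and roughly the
# same direction as the perceptual biases reported above.
```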
In a follow-up study, similar shapes but of a size much larger than the hand were used (see
Figure 30.3 right). Again a tetrahedron was perceived as larger than both the sphere (22 per cent)
and the cube (12 per cent), and the cube was perceived as larger than the sphere (8 per cent),
although the latter difference was not significant. Because these differences are smaller than those
in the previous study, it could already be seen that surface area could not be the (sole) responsible factor.
This need not be surprising. The objects are larger than the hands, so the skin area stimulated
when holding the objects is probably very similar (namely the whole hand surface) for all shapes.
Moreover, bimanual perception necessarily takes place at a higher level than unimanual
perception, so the experimental findings need not be the same.
Weight
One of the first to report on weight perception was Weber (1834/1986). Since then, quite a number
of studies investigated human discriminability of weight (for an overview, see Jones (1986)). The
methods used to measure these thresholds are rather diverse and as a consequence the reported
Weber fractions also vary over a wide range, from 0.09 to 0.13 for active lifting. Thresholds
obtained with passively resting hands are higher, suggesting that receptors in muscles play a role
in weight discrimination (Brodie and Ross, 1984). Jones (1986) also gives an overview of the
relationships between perceived weight and physical weight, and these too vary widely: most authors
report power functions, but their exponents range from 0.7 to 2.0. When participants were asked
to enclose the objects (spheres, cubes, or tetrahedra), Weber fractions for weight discrimination
were even higher (0.29). They were also higher than volume discrimination thresholds obtained
with the same objects, so apparently weight information could not be the determining factor in
volume discrimination (Kahrimanovic et al., 2011a).
These illusions show that different cues, which may not always be relevant to the task, contrib-
ute to the final percept. This suggests the existence of a mechanism, also in haptic perception, that
synthesizes the perception of an object from different information sources, possibly operating
according to Gestalt laws.
Spatial Properties
The haptic sense not only provides us with object properties; the relations between objects, or
between parts of objects, also have to be perceived. The perception of such spatial relations has
been studied most extensively with raised line drawings.
Line drawings
Although three-dimensional objects are easy to recognize by touch (see above), two-dimensional
raised line drawings are very hard to recognize (e.g. Heller, 1989; Klatzky et al., 1993; Loomis et al.,
1991; Magee and Kennedy, 1980; Picard and Lebaz, 2012), even with extended exploration times.
To illustrate this phenomenon, blindfolded observers in an informal experiment had to explore a
wire frame stimulus of a house; when they felt confident that they could draw what they had felt,
they stopped the exploration (which typically took several minutes), removed the blindfold, and
made a drawing without seeing the stimulus. As can be seen in Figure 30.4, some of the
participants clearly recognized a house, but most of them missed several details, such as the door,
the bottom line of the roof, or the placement of the chimney. Other participants had no idea of the
shape and were also not able to draw it. They additionally missed more important aspects, such as
the straightness of lines, the relations between lines, or the fact that many of the angles are right
angles. Note that observer LB was able to recognize the house only after he saw his own drawing.
One of the explanations given for the poor performance in recognizing line drawings lies in the
difficulty of integrating spatial information. In the case of line drawings, information is acquired
sequentially and has to be integrated over time into a coherent representation, a process possibly
governed by Gestalt laws. Loomis et al. (1991) compared tactual performance with that of explor-
ing a drawing visually with just a very limited field of view. If the field of view was similar in size
[Figure 30.4 panels: Original, and drawings by participants MM, LB, SP, IH, ML, PD, GO, and MH.]
Fig. 30.4 Result of an informal experiment. The original ‘house’ is a wire frame placed flat on a table
in the correct orientation. Blindfolded participants were asked to explore the stimulus and draw it
when they felt ready to do so. Exploration time was free and usually in the order of minutes. The
resulting drawings of the eight participants are shown.
to that of a finger pad, visual and tactual recognition performance was comparable. In an experi-
ment where the finger of the observer was either guided by the experimenter or actively moved
by the observer, performance was better in the guided condition (Magee and Kennedy, 1980). The
explanation could be that in the active condition movements are much noisier, making integra-
tion of information harder.
The role of vision in recognizing raised line drawings is somewhat controversial (e.g. Picard
and Lebaz, 2012). Some authors report similar performance of blindfolded sighted and con-
genitally blind observers (e.g. Heller, 1989), whereas others report worse performance for blind
observers (e.g. Lederman et al., 1990). In any case, from several studies, notably those by Kennedy
(e.g. 1993), it follows that congenitally blind observers are able to use raised line drawings to their
advantage.
Based on an idea by Ikeda and Uchikawa (1978), Wijntjes and colleagues (2008) gave blind-
folded observers 45 s to recognize drawings of common objects, such as a hammer, a car and
a duck. After this time period, they were forced to guess what they thought the object was.
Subsequently, in the case of a wrong answer (about 50 per cent of the cases), they had to draw
what they felt. Half of the observers had to do this without a blindfold, the other half with a
blindfold. Those who drew without a blindfold recognized their own drawing in about 30 per cent
of the cases; those who drew with a blindfold mostly remained unaware of what the object was.
These different outcomes showed that the execution of motor movements during drawing could
not be the cause of the recognition. Naive observers also recognized the drawings that their
makers had recognized. The authors therefore concluded that the haptically acquired information
was sufficient, but that the mental capacities required to identify the drawing internally fall short.
Externalization of the stimulus, as done by drawing on a sketchpad, seems to be a process that can
be used in the identification of serial input that needs to be integrated.
Spatial patterns
Gestalt psychologists have identified a number of regularities or ‘laws’ that can be used to explain
how humans categorize and group individual items, and how they perceive spatial patterns.
Principles of ‘similarity’, ‘proximity’, and ‘good continuation’ can explain how humans group
items that seem to belong together. Almost all research has been performed using visual
experiments, and only recently have a few studies investigated the existence of such laws in the
touch domain (Gallace and Spence, 2011).
Overvliet et al. (2012) used a search task to investigate the influence of similarity and proxim-
ity on finding a target item pair among distractor pairs. Their stimuli consisted of two columns
of small vertical and horizontal bars. They found, among other things, that if distractors consisted
of pairs of different items and the target was a pair of identical items, performance was worse
(longer reaction times) than in the reverse condition. However, when searching for a different pair among
identical pairs, the task can be performed by just searching for the odd-one-out in either the left or
the right column. There is no need to correlate the input from the left and right fingers (although
that was the task instruction). This makes the task inherently easier than the reverse task, but in
our opinion, it is questionable whether this has to do with the Gestalt concept of similarity. The
finding that there is no influence of proximity (between the pairs of stimuli in the two columns)
can be explained in the same way.
Good continuation
Items that are aligned tend to be perceived as a group and will be integrated into a perceptual whole.
Chang and colleagues (2007a) also designed a ‘good continuation’ experiment, once again
comparing visual and haptic performance. They constructed 16 different layouts of partially
occluded shapes. The occlusion was represented by both color and texture, so that the same
stimuli could be used in the visual and haptic experiments. They found that overall visual and
haptic behavior was nearly the same, indicating that the Gestalt principle of continuation is also
applicable to touch.
Spatial relations
Helmholtz (1867/1962) was one of the first to notice that visual perception of the world around
us is not veridical. Hillebrand (1902) showed that lines that appeared parallel to the eye were not
at all parallel. A few years later, Blumenfeld (1913) showed that visually equidistant lines, too,
are not physically parallel, and, interestingly, that they differ from the ‘parallel alleys’ of
Hillebrand. In the literature, a discussion started about the concept and existence of ‘visual space’.
Inspired by these findings, Blumenfeld (1937) decided to perform similar experiments to investi-
gate the veridicality of haptic space. With pushpins, he fixed two threads to a table and he asked
blindfolded observers to straighten these threads by pulling them towards themselves in such
a way that they would be parallel to each other. Blumenfeld found that these threads were not
parallel: if the distance between the two pushpins was smaller than the observer’s shoulder width,
the threads diverged; if the distance was larger, the threads converged. In the same year, von
Skramlik (1937) also reported on the distortion of haptic space.
For a long time, hardly any research on the perception of haptic space was performed. In the
late 1990s, Kappers and colleagues decided to investigate the haptic perception of parallelity in
more detail. Their first set-up consisted of a table on which 15 protractors in a 5 by 3 grid were
placed (e.g. Kappers and Koenderink, 1999). An aluminum bar of 20 cm could be placed on each
of the protractors. The bars could rotate around the center of the protractor. A typical experiment
consisted of a reference bar placed at a certain location in an orientation fixed by the experimenter
and a test bar at another location in a random orientation. The task of the blindfolded observers
was to rotate the test bar in such a way that it felt parallel to the reference bar. In all conditions,
either uni- or bimanual, large but systematic deviations from parallelity were found. Depending on
the condition, these deviations could be more than 90°. The bar at the right hand side (either the
reference or the test) had to be rotated clockwise with respect to a bar to the left of it in order to be
perceived as haptically parallel (e.g. Kappers, 1999, 2003; Kappers and Koenderink, 1999). These
findings were reproduced in other labs (e.g. Fernández-Díaz and Travieso, 2011; Kaas and van
Mier, 2006; Newport et al., 2002).
The current explanation for the deviations is that they are caused by the biasing influence of an
egocentric reference frame (e.g. Kappers, 2005, 2007; Zuidhoek et al., 2003). The task of the obser-
ver is to make the two bars parallel in an allocentric (physical) reference frame, but of course, the
observer only has recourse to egocentric reference frames, such as the hand or the body reference
frame (see Figure 30.5). If the task were performed (unintentionally) in a purely egocentric
reference frame, the deviations would indeed occur in the direction found. The observed
deviations are not as extreme as a purely egocentric reference frame would predict, but they are
biased in that direction.
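This biasing can be sketched as a weighted compromise between the two frames. In the toy model below, the weighting value is our own assumption and nothing is fitted to the data; the point is that the deviation comes out proportional to the difference in hand orientation between the two locations, in line with the hand-orientation manipulations of Kappers and colleagues:

```python
def parallel_setting(ref_allo, hand_ref, hand_test, w_ego=0.4):
    """Toy model of a haptic parallelity setting (all angles in degrees).

    ref_allo: reference bar orientation in the allocentric (table) frame.
    hand_ref, hand_test: hand orientations at the two bar locations.
    w_ego: weight of the egocentric frame (0 = veridical/allocentric,
    1 = fully egocentric); 0.4 is an illustrative assumption, not a fit.
    """
    # A fully egocentric strategy preserves the bar's orientation
    # relative to the hand, shifting the setting by the difference in
    # hand orientation between the two locations.
    ego = ref_allo + (hand_test - hand_ref)
    allo = ref_allo
    return (1.0 - w_ego) * allo + w_ego * ego

# Reference explored with the hand at -30 deg, test set with the hand at
# +30 deg: the deviation equals w_ego * 60 deg, linear in the
# hand-orientation difference.
print(parallel_setting(0.0, -30.0, 30.0))
```

With w_ego = 0 the model reproduces veridical (allocentric) settings, with w_ego = 1 fully egocentric ones; intermediate weights give the partial bias the chapter describes.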
The evidence for this explanation is accumulating rapidly. For example, a time delay between
exploration of the reference bar and setting of the test bar causes a reduction of the deviation
(Zuidhoek et al., 2003), although in general a time delay would cause a deterioration of task per-
formance. The explanation is thought to lie in a shift during the delay from the egocentrically
biased spatial representation to a more allocentric reference frame, as suggested by Rossetti et al.
(1996) in pointing experiments. Non-informative vision (i.e. vision of the environment without
seeing the stimuli or set-up) strengthens the representation of the allocentric reference frame.
It was shown that this indeed leads to a reduction of the deviations (e.g. Newport et al., 2002;
Parallel in allocentric
reference frame
Haptically parallel
Parallel in egocentric
reference frame
Fig. 30.5 Illustration of different reference frames. (Top) Allocentric reference frame. This reference
frame coincides with a physical reference frame fixed to the table. Parallel bars have the same
orientation with respect to the protractor, independent of the location of the protractor. (Middle)
Haptically parallel. The two bars shown are perceived as haptically parallel by one of the observers
(the size of the deviations strongly depends on observer). (Bottom) Egocentric reference frame, in
this case fixed to the hand. The two bars have the same orientation with respect to the orientation
of the hand. The orientation of the hand will depend on its location, so the deviation from veridical
will directly depend on the hand. It can be seen that haptically parallel lies in between allocentrically
and egocentrically parallel.
Zuidhoek et al., 2004). Asking observers to make two bars perpendicular results, for some
observers, in almost parallel bars (Kappers, 2004). This is consistent with what would be predicted on the
basis of the reference frame hypothesis. Moreover, mirroring bars in the mid-sagittal plane gave
almost veridical performance (Kappers, 2004; Kaas and van Mier, 2006). This is to be expected as
performance in both an egocentric and an allocentric reference frame would lead to veridical set-
tings. Moreover, the deviations obtained on mid-sagittal (Kappers, 2002), frontoparallel (Volcic
et al., 2007) and three-dimensional set-ups (Volcic and Kappers, 2008) can all be explained with
this same hypothesis.
The nature of the biasing egocentric reference frame originates most probably in a combina-
tion of the hand and the body. Kappers and colleagues (Kappers and Liefers, 2012; Kappers and
Viergever, 2006) manipulated the orientation of the hand during the exploration of the bars and
they showed that the deviation was linearly related to the orientation of the hand, that is, the
orientation of the hand reference frame. However, even when the two hands were aligned, a small
but significant deviation remained, which is consistent with an influence of the body reference frame.
Illusions of orientation
The above-described investigations on the non-veridicality of haptic space already show that per-
ception of orientation is apt to yield illusions. Another class of illusions concerns the so-called
oblique effect (e.g. Appelle and Countryman, 1986; Gentaz et al., 2008; Lechelt and Verenka,
1980). This effect, also reported in vision, shows itself in more variable performance for oblique
orientations (usually 45° or 135°) than for horizontal and vertical orientations (0° and 90°). Gentaz
and colleagues (Gentaz et al., 2008) attribute the haptic oblique effect to gravitational cues and
memory constraints that are specific for haptics.
Concluding Remarks
We focused this chapter on the haptic perception of objects and spatial properties, and left out
all mention of the perception of material properties. Using haptic perception, our mind creates
a representation of the world around us based on observed curvatures, shapes, sizes, weights,
and orientations of objects. It remains to be seen whether all these elements fit together into a
consistent representation governed by rules similar to those formulated by Gestalt psychologists
for visual perception. As we have seen, the perception of these elements is fraught with illusory
effects. The perception of size, orientation, shape, and weight all interact with each other, produc-
ing different results in different situations. It is these interactions that may be very instructive in
the deconstruction of the haptic perceptual system, and it is for this reason that, in addition to
studying the elements in isolation, the interactions between them should be studied and their
mechanisms fathomed.
References
Appelle, S., and Countryman, M. (1986). Eliminating the haptic oblique effect: influence of scanning
incongruity and prior knowledge of the standards. Perception 15(3): 325–329.
Armstrong, L., and Marks, L. E. (1999). Haptic perception of linear extent. Percept Psychophys
61(6): 1211–1226.
Bergmann Tiest, W. M., van der Hoff, L. M. A., and Kappers, A. M. L. (2011). Cutaneous and kinesthetic
perception of traversed distance. In Proc. IEEE World Haptics Conference, edited by C. Basdogan,
S. Choi, M. Harders, L. Jones, and Y. Yokokohji, pp. 593–597 (Istanbul: IEEE).
Blumenfeld, W. (1913). Untersuchungen über die scheinbare Grösse im Sehraume. Zeitschr Psychol 65:
241–404.
Blumenfeld, W. (1937). The relationship between the optical and haptic construction of space. Acta Psychol
2: 125–174.
Brodie, E. E. and Ross, H. E. (1984). Sensorimotor mechanisms in weight discrimination. Percept
Psychophys 36(5): 477–481.
Carter, O., Konkle, T., Wang, Q., Hayward, V., and Moore, C. (2008). Tactile rivalry demonstrated with an
ambiguous apparent-motion quartet. Curr Biol 18(14): 1050–1054.
Chang, D., Nesbitt, K. V., and Wilkins, K. (2007a). The Gestalt principle of continuation applies to both
the haptic and visual grouping of elements. In Second Joint EuroHaptics Conference and Symposium on
Haptic Interfaces for Virtual Environment and Teleoperator Systems (WHC’07), pp. 15–20.
Chang, D., Nesbitt, K. V., and Wilkins, K. (2007b). The Gestalt principles of similarity and proximity
apply to both the haptic and visual grouping of elements. In Proc. Eighth Australasian Conference on User
Interface, Vol. 64: pp. 79–86 (Darlinghurst: Australian Computer Society, Inc.).
Cole, J., and Paillard, J. (1995). Living without touch and peripheral information about body position and
movement: studies with deafferented patients. In The Body and the Self, edited by J. L. Bermudez, N.
Eilan, and A. Marcel, pp. 245–266 (Cambridge, MA: MIT press).
Davidson, P. W. (1972). Haptic judgments of curvature by blind and sighted humans. J Exp Psychol
93(1): 43–55.
Dostmohamed, H., and Hayward, V. (2005). Trajectory of contact region on the fingerpad gives the illusion
of haptic shape. Exp Brain Res 164(3): 387–394.
Ellis, R. R., and Lederman, S. J. (1993). The role of haptic versus visual volume cues in the size-weight
illusion. Percept Psychophys 53(3): 315–324.
Ellis, R. R., and Lederman, S. J. (1999). The material-weight illusion revisited. Percept Psychophys
61(8): 1564–1576.
Fernández-Díaz, M., and Travieso, D. (2011). Performance in haptic geometrical matching tasks depends
on movement and position of the arms. Acta Psychol 136(3): 382–389.
Gallace, A., and Spence, C. (2011). To what extent do Gestalt grouping principles influence tactile
perception? Psychol Bull 137(4): 538–561.
Gentaz, E., Baud-Bovy, G., and Luyat, M. (2008). The haptic perception of spatial orientations. Exp Brain
Res 187(3): 331–348.
Gibson, J. J. (1933). Adaptation, after-effect and contrast in the perception of curved lines. J Exp Psychol
16(1): 1–31.
Gibson, J. J. (1963). The useful dimensions of sensitivity. Am Psychol 18: 1–15.
Gibson, J. J. (1966). The Senses Considered as Perceptual Systems (Boston: Houghton Mifflin Company).
Goodwin, A. W., John, K. T., and Marceglia, A. H. (1991). Tactile discrimination of curvature by humans
using only cutaneous information from the fingerpads. Exp Brain Res 86(3): 663–672.
Goodwin, A. W., and Wheat, H. E. (1992). Human tactile discrimination of curvature when contact area
with the skin remains constant. Exp Brain Res 88(2): 447–450.
Gordon, I. A., and Morison, V. (1982). The haptic perception of curvature. Percept Psychophys 31: 446–450.
Hayward, V. (2008). A brief taxonomy of tactile illusions and demonstrations that can be done in a
hardware store. Brain Res Bull 75(6): 742–752.
Heller, M. A. (1989). Texture perception in sighted and blind observers. Percept Psychophys 45(1): 49–54.
Hillebrand, F. (1902). Theorie der scheinbaren Grösse bei binocularem Sehen. Denkschrift Wiener Akad
Mathemat-Naturwissensch Klasse 72: 255–307.
Hohmuth, A., Phillips, W. D., and VanRomer, H. (1976). A discrepancy between two modes of haptic
length perception. J Psychol 92(1): 79–87.
636 Kappers and Bergmann Tiest
Hunter, I. M. L. (1954). Tactile-kinesthetic perception of straightness in blind and sighted humans. Q J Exp
Psychol 6: 149–154.
Ikeda, M., and Uchikawa, K. (1978). Integrating time for visual pattern perception and a comparison with
the tactile mode. Vision Res 18(11): 1565–1571.
Jastrow, J. (1886). The perception of space by disparate senses. Mind 11(44): 539–554.
Jones, L. A. (1986). Perception of force and weight: theory and research. Psychol Bull 100(1): 29–42.
Kaas, A., and van Mier, H. (2006). Haptic spatial matching in near peripersonal space. Exp Brain Res
170: 403–413.
Kahrimanovic, M., Bergmann Tiest, W. M., and Kappers, A. M. L. (2010). Haptic perception of volume
and surface area of 3-D objects. Atten Percept Psychophys 72(2): 517–527.
Kahrimanovic, M., Bergmann Tiest, W. M., and Kappers, A. M. L. (2011a). Characterization of the haptic
shape-weight illusion with 3-dimensional objects. IEEE Trans Haptics 4(4): 316–320.
Kahrimanovic, M., Bergmann Tiest, W. M., and Kappers, A. M. L. (2011b). Discrimination thresholds for
haptic perception of volume, surface area, and weight. Atten Percept Psychophys 73(8): 2649–2656.
Kappers, A. M. L. (1999). Large systematic deviations in the haptic perception of parallelity. Perception
28(8): 1001–1012.
Kappers, A. M. L. (2002). Haptic perception of parallelity in the midsagittal plane. Acta Psychol
109(1): 25–40.
Kappers, A. M. L. (2003). Large systematic deviations in a bimanual parallelity task: further analysis of
contributing factors. Acta Psychol 114(2): 131–145.
Kappers, A. M. L. (2004). The contributions of egocentric and allocentric reference frames in haptic spatial
tasks. Acta Psychol 117(3): 333–340.
Kappers, A. M. L. (2005). Intermediate frames of reference in haptically perceived parallelity. In Proc
1st Joint EuroHaptics Conference and Symposium on Haptic Interfaces for Virtual Environment and
Teleoperator Systems, pp. 3–11 (Pisa, Italy: IEEE Computer Society).
Kappers, A. M. L. (2007). Haptic space processing—allocentric and egocentric reference frames. Can J Exp
Psychol 61(3): 208–218.
Kappers, A. M. L. (2011). Human perception of shape from touch. Phil Trans R Soc B 366: 3106–3114.
Kappers, A. M. L., and Koenderink, J. J. (1999). Haptic perception of spatial relations. Perception
28(6): 781–795.
Kappers, A. M. L., and Liefers, B. J. (2012). What feels parallel strongly depends on hand orientation. In
Haptics: Perception, Devices, Mobility, and Communication, Vol. 7282 of Lecture Notes in Computer
Science, edited by P. Isokoski and J. Springare, pp. 239–246 (Berlin Heidelberg: Springer-Verlag).
Kappers, A. M. L., and Viergever, R. F. (2006). Hand orientation is insufficiently compensated for in haptic
spatial perception. Exp Brain Res 173(3): 407–414.
Kennedy, J. M. (1993). Drawing & the Blind: Pictures to Touch (New Haven, CT: Yale University Press).
Klatzky, R. L., Lederman, S. J., and Metzger, V. A. (1985). Identifying objects by touch: an ‘expert system’.
Percept Psychophys 37(4): 299–302.
Klatzky, R. L., Loomis, J. M., Lederman, S. J., Wake, H., and Fujita, N. (1993). Haptic identification of
objects and their depictions. Percept Psychophys 54(2): 170–178.
Krishna, A. (2006). Interaction of senses: the effect of vision versus touch on the elongation bias. J Consum
Res 32(4): 557–566.
Langfeld, H. S. (1917). The differential spatial limen for finger span. J Exp Psychol 2(6): 416–430.
Lechelt, E. C., and Verenka, A. (1980). Spatial anisotropy in intramodal and cross-modal judgments of
stimulus orientation: the stability of the oblique effect. Perception 9(5): 581–589.
Lederman, S. J., and Jones, L. A. (2011). Tactile and haptic illusions. IEEE Trans Haptics 4(4): 273–294.
Lederman, S. J., and Klatzky, R. L. (1987). Hand movements: a window into haptic object recognition.
Cogn Psychol 19(3): 342–368.
Lederman, S. J., and Klatzky, R. L. (2009). Haptic perception: a tutorial. Atten Percept Psychophys
71(7): 1439–1459.
Lederman, S. J., Klatzky, R. L., Chataway, C., and Summers, C. D. (1990). Visual mediation and the haptic
identification of 2-dimensional pictures of common objects. Percept Psychophys 47(1): 54–64.
Loomis, J. M., Klatzky, R. L., and Lederman, S. J. (1991). Similarity of tactual and visual picture
recognition with limited field of view. Perception 20(2): 167–177.
Loomis, J. M., and Lederman, S. J. (1986). Tactual perception. In Cognitive Processes and Performance,
Vol. 2 of Handbook of Perception and Human Performance, edited by K. R. Boff, L. Kaufman, and
J. P. Thomas, Chapter 31, 31.1–31.41 (New York: John Wiley & Sons).
Magee, L. E., and Kennedy, J. M. (1980). Exploring pictures tactually. Nature 283: 287–288.
Millar, S., and Al-Attar, Z. (2002). The Müller-Lyer illusion in touch and vision: implications for
multisensory processes. Percept Psychophys 64(3): 353–365.
Murray, D., Ellis, R., Bandomir, C., and Ross, H. (1999). Charpentier (1891) on the size–weight illusion.
Percept Psychophys 61: 1681–1685.
Newport, R., Rabb, B., and Jackson, S. R. (2002). Noninformative vision improves haptic spatial
perception. Curr Biol 12(19): 1661–1664.
Norman, J. F., Norman, H. F., Clayton, A. M., Lianekhammy, J., and Zielke, G. (2004). The visual and
haptic perception of natural object shape. Percept Psychophys 66(2): 342–351.
Overvliet, K., Krampe, R., and Wagemans, J. (2012). Perceptual grouping in haptic search: the influence of
proximity, similarity, and good continuation. J Exp Psychol Hum Percept Perform 38(4): 817–821.
Panday, V., Bergmann Tiest, W. M., and Kappers, A. M. L. (2012). Influence of local properties on haptic
perception of global object orientation. IEEE Trans Haptics 5: 58–65.
Pawluk, D., Kitada, R., Abramowicz, A., Hamilton, C., and Lederman, S. J. (2010). Haptic figure-ground
differentiation via a haptic glance. In IEEE Haptics Symposium, 25–26 March, Waltham Massachusetts,
USA, 63–66.
Picard, D., and Lebaz, S. (2012). Identifying raised-line drawings by touch: a hard but not impossible task. J
Visual Impair Blindness 106(7): 427–431.
Plaisier, M. A., Bergmann Tiest, W. M., and Kappers, A. M. L. (2008). Haptic pop-out in a hand sweep.
Acta Psychol 128: 368–377.
Plaisier, M. A., Bergmann Tiest, W. M., and Kappers, A. M. L. (2009). One, two, three, many—subitizing
in active touch. Acta Psychol 131(2): 163–170.
Pont, S. C., Kappers, A. M. L., and Koenderink, J. J. (1997). Haptic curvature discrimination at several
regions of the hand. Percept Psychophys 59(8): 1225–1240.
Pont, S. C., Kappers, A. M. L., and Koenderink, J. J. (1998). Anisotropy in haptic curvature and shape
perception. Perception 27(5): 573–589.
Pont, S. C., Kappers, A. M. L., and Koenderink, J. J. (1999). Similar mechanisms underlie curvature
comparison by static and dynamic touch. Percept Psychophys 61(5): 874–894.
Proske, U., and Gandevia, S. C. (2009). The kinesthetic senses. J Physiol 587(17): 4139–4146.
Robertson, A. (1902). Studies from the Psychological Laboratory of the University of California: VI.
‘Geometric-optical’ illusions in touch. Psychol Rev 9: 549–569.
Robles-De-La-Torre, G., and Hayward, V. (2001). Force can overcome object geometry in the perception of
shape through active touch. Nature 412(6845): 445–448.
Rossetti, Y., Gaunet, F., and Thinus-Blanc, C. (1996). Early visual experience affects memorization and
spatial representation of proprioceptive targets. NeuroReport 7(6): 1219–1223.
Stevens, S. S., and Stone, G. (1959). Finger span: ratio scale, category scale and JND scale. J Exp Psychol
57(2): 91–95.
Suzuki, K., and Arashida, R. (1992). Geometrical haptic illusions revisited—haptic illusions compared with
visual illusions. Percept Psychophys 52(3): 329–335.
Terada, K., Kumazaki, A., Miyata, D., and Ito, A. (2006). Haptic length display based on
cutaneous-proprioceptive integration. J Robot Mechatron 18(4): 489–498.
van der Horst, B. J., Duijndam, M. J. A., Ketels, M. F. M., Wilbers, M. T. J. M., Zwijsen, S. A., and
Kappers, A. M. L. (2008a). Intramanual and intermanual transfer of the curvature aftereffect. Exp Brain
Res 187(3): 491–496.
van der Horst, B. J., and Kappers, A. M. L. (2008). Using curvature information in haptic shape perception
of 3D objects. Exp Brain Res 190(3): 361–367.
van der Horst, B. J., Willebrands, W. P., and Kappers, A. M. L. (2008b). Transfer of the curvature aftereffect
in dynamic touch. Neuropsychologia 46(12): 2966–2972.
van Polanen, V., Bergmann Tiest, W. M., and Kappers, A. M. L. (2012). Haptic search for hard and soft
spheres. PLoS ONE 7(10): e45298.
von Helmholtz, H. (1867/1962). Treatise on Physiological Optics, Vol. 3 (English transl. by J. P. C. Southall
for the Optical Society of America, 1925, from the 3rd German edn of Handbuch der physiologischen
Optik) (New York: Dover).
Vogels, I. M. L. C., Kappers, A. M. L., and Koenderink, J. J. (1996). Haptic aftereffect of curved surfaces.
Perception 25(1): 109–119.
Vogels, I. M. L. C., Kappers, A. M. L., and Koenderink, J. J. (1997). Investigation into the origin of the
haptic after-effect of curved surfaces. Perception 26: 101–107.
Volcic, R., and Kappers, A. M. L. (2008). Allocentric and egocentric reference frames in the processing of
three-dimensional haptic space. Exp Brain Res 188(2): 199–213.
Volcic, R., Kappers, A. M. L., and Koenderink, J. J. (2007). Haptic parallelity perception on the
frontoparallel plane: the involvement of reference frames. Percept Psychophys 69(2): 276–286.
von Skramlik, E. (1937). Psychophysiologie der Tastsinne (Leipzig: Akademische Verlagsgesellschaft).
Weber, E. H. (1834/1986). E. H. Weber on the Tactile Senses, translated and edited by H. E. Ross and
D. J. Murray (Hove: Erlbaum (UK) Taylor & Francis).
Weinstein, S. (1968). Intensive and extensive aspects of tactile sensitivity as a function of body part, sex,
and laterality. In The Skin Senses, edited by D. Kenshalo, pp. 195–222 (Springfield, IL: Thomas).
Wijntjes, M. W. A., Sato, A., Hayward, V., and Kappers, A. M. L. (2009). Local surface orientation
dominates haptic curvature discrimination. IEEE Trans Haptics 2(2): 94–102.
Wijntjes, M. W. A., van Lienen, T., Verstijnen, I. M., and Kappers, A. M. L. (2008). The influence of picture
size on recognition and exploratory behavior in raised-line drawings. Perception 37(4): 602–614.
Zuidhoek, S., Kappers, A. M. L., van der Lubbe, R. H. J., and Postma, A. (2003). Delay improves
performance on a haptic spatial matching task. Exp Brain Res 149(3): 320–330.
Zuidhoek, S., Visser, A., Bredero, M. E., and Postma, A. (2004). Multisensory integration mechanisms in
haptic space perception. Exp Brain Res 157(2): 265–268.
Chapter 31
Cross-modal perceptual organization
Charles Spence
Introduction
The last quarter of a century or so has seen a dramatic resurgence of research interest in the ques-
tion of how sensory inputs from different modalities are combined, merged, and/or integrated,
and, more generally, come to affect one another in perception (see Bremner et al. 2012; Stein
2012; Stein et al. 2010, for reviews). Until very recently, however, the majority of this research,
inspired as it often has been by neurophysiological studies of orienting responses in model brain
systems, such as the superior colliculus, has tended to use simple stimuli (e.g., a single beep,
flash, and/or tactile stimulus) on any given trial (see Stein & Meredith 1993 for a review). As a
result, to date, problems of perceptual organization have generally taken something of a back seat
in the world of multisensory perception research. That said, there has recently been a surge of
scientific interest in trying to understand how the perceptual system (normally in humans) deals
with, or organizes, more complex streams/combinations of multisensory inputs into meaningful
perceptual units, and how ambiguous (often bistable) inputs are interpreted over time. In trying
to answer such questions, it is natural that researchers look for inspiration in the large body of
empirical research that has been published over the last century on the Gestalt grouping prin-
ciples identified within the visual (Beck 1982; Kimchi et al. 2003; Kubovy & Pomerantz 1981;
Wagemans et al. 2012; Wertheimer 1923/1938; see also the many other chapters in this publi-
cation), auditory (Bregman 1990; Wertheimer 1923/38; see also Denham in this publication),
and occasionally tactile systems (Gallace & Spence 2011; see also ‘Tactile and haptic perceptual
organization’ by Kappers & Tiest). One might reasonably imagine that those classic grouping
principles, such as common fate, binding by proximity, and binding by similarity, that have been
shown to influence perceptual organization when multiple stimuli are presented within the same
sensory modality should also operate when combinations of stimuli originating from different
sensory modalities are presented instead.
In this review, the evidence concerning the existence of general principles of cross-modal per-
ceptual organization and multisensory Gestalt grouping is summarized. The focus here is pri-
marily on cross-modal perceptual organization and multisensory Gestalten for the spatial (some
would say ‘higher’) senses of audition, vision, and touch. Given the space constraints, this review
will focus primarily on the results of research that has been published more recently.1 The main
body of the text is arranged around a review of the evidence that is relevant to answering four key
questions that run through the literature on cross-modal perceptual organization.
1 Researchers interested in more of a historical perspective should see Spence et al. (2007) and/or Spence and
Chen (2012).
2 Note that the stimulus displays capitalized on the cross-modal correspondence between pitch and elevation.
[Figure 31.1: (a, b) the physical display of auditory stimuli (frequency) and visual stimuli (vertical
position) at times T1–T6; (c) the one-object percept (slow rate) and (d) the two-object percept (fast rate),
each plotted as frequency in audition/vertical position in vision against time.]
Fig. 31.1 (a, b) Schematic illustration of the sequence of auditory and visual stimuli presented by O’Leary
and Rhodes (1984) in their study of cross-modal influences on perceptual organization. T1–T6 indicate
the temporal order (from first to last) in which the six stimuli were presented in each sensory modality.
Half of the stimuli were from an upper group (frequency in sound, spatial location in vision), the rest
from a lower group. The stimuli were presented in sequence, alternating between events from the
upper and lower groups, either delivered individually (unimodal condition) or else together in synchrony
(in the cross-modal condition). (c, d) Perceptual correlates associated with different rates of stimulus
presentation. In either sensory modality, at slow rates of stimulus presentation (c), a single stream
(auditory or visual) was perceived, as shown by the continuous line connecting the points. At faster rates
of stimulus presentation (d), however, two separate streams were perceived concurrently, one in the
upper range (frequency or spatial position, for sound or vision, respectively) and the other in the lower
range. In the cross-modal condition, at intermediate rates of stimulus presentation, participants’ reports
of whether they perceived one stream versus two in a given sensory modality were influenced by their
perception of there being one or two streams in the other modality. O’Leary and Rhodes took these
results to show that the nature of the perceptual organization in one sensory modality can influence
how the perceptual scene may be organized (or segregated) in another modality.
Reproduced from Stein, Barry E., ed., The New Handbook of Multisensory Processing, figure 14.1, © 2012
Massachusetts Institute of Technology, by permission of The MIT Press.
The visual stimuli in Hupé et al.’s (2008) first experiment consisted of a network of crossing
lines (square wave gratings) viewed through a circular aperture. This display could either be per-
ceived as two gratings moving in opposite directions or as a single plaid moving in an inter-
mediate direction. Meanwhile, pure tones alternating in frequency in the pattern High (pitch)/
Low/High-High/Low/High could be presented over headphones. The participants either heard
two segregated streams (High-High-High, and --Low---Low--) or a single stream with the pitch
alternating from item to item. While the statistics of switching between alternative perceptual
interpretations were similar for the two modalities, there was absolutely no correlation between
the perceptual switches taking place in audition and vision.
This first experiment can, though, be criticized on the grounds that the participants would
have had no particular reason to treat the auditory and visual stimuli as belonging to the same
object or event (that is, they were completely unrelated). Hence, the fact that Hupé et al. (2008)
obtained a null result is perhaps not so surprising. In a second experiment, the auditory and visual
stimuli were spatiotemporally correlated: the auditory stimuli were as in Experiment 1, but were
now presented in an alternating sequence from one of a pair of loudspeaker cones, one placed on
either side of central fixation. The visual stimuli consisted of the illumination of an LED placed in
front of either loudspeaker that could be perceived either as two lights flashing independently, or
else could give rise to the perception of horizontal visual apparent motion. However, once again,
there was no evidence of any correlation between the perceptual switches taking place in the two
modalities. Therefore, despite the fact that the spatiotemporal presentation of the auditory and
visual stimuli was correlated in this study, the participants would presumably not have had any
particularly good reason to bind the contents of their visual and auditory experience.
One other study that is worth mentioning here comes from Sato et al. (2007). They investigated
the auditory and visual verbal transformation effect. In the auditory version of this phenomenon
(see Warren & Gregory 1958), as a participant listens to a speech stimulus that is played repeatedly,
such as the word ‘life’, after a number of repetitions, it alternates and the observer will likely hear
it as ‘fly’ instead. As time passes by, the percept alternates back and forth. Sato et al. discovered
that the same thing happens if we look at moving lips repeatedly uttering the same syllable instead
(this is known as the visual transformation effect). Sato and his colleagues presented auditory
alone, visual alone, and audiovisual stimulus combinations (either congruent or incongruent).
The participants were instructed to report their initial auditory ‘percept’, and whenever it changed
over the course of the 90 seconds of each trial. In Sato et al.’s study, either /psə/ or /səp/ were used
as the speech stimuli. The results of their first experiment revealed that the incongruent audio-
visual condition, where the visual stimulus alternated between being congruent and incongruent
with what was heard, resulted in a higher rate of perceptual alternations as compared to any of the
other three conditions. Note here that what is seen and what is heard may be taken by participants
to refer to the same phonological entity. In fact, Kubovy and Yu (2012) have argued recently that
this (speech) may constitute a unique case when it comes to multisensory multistability.3
To date, the only studies that have attempted to investigate the question of whether the perceptual
organization taking place in one modality affects the perceptual organization taking place in the other
have involved the presentation of audiovisual stimuli (Hupé et al. 2008; O’Leary & Rhodes 1984; Sato
et al. 2007). It is interesting to speculate, then, on whether a similar conclusion would also have been
reached on the basis of visuotactile studies.4 There is currently surprisingly little unequivocal support
for the view that the perceptual organization (or interpretation) of an ambiguous, or bistable, stimulus
(or stimuli) in one sensory modality will necessarily, and automatically, affect the perceptual organization
(or interpretation) of a stimulus (or stimuli) that happens to be presented in another modality at around
the same time (even when the auditory and visual stimuli can plausibly be related to one another—e.g., as
a result of their cross-modal correspondence, see O’Leary & Rhodes 1984, or due to their spatiotemporal
patterning, see Hupé et al. 2008; see also Kubovy & Yu 2012).
3 One final thing to note here is that it is unclear from Sato et al.’s (2007) study whether their participants
ever experienced the audiovisual stimulus stream as presenting one stimulus auditorily and another
visually, as sometimes happens in McGurk-type experiments.
4 One way to test this possibility would be to look for correlations in the changing interpretation of
bistable spatial displays such as the Ternus display (Harrar & Harris 2007; cf. Shi et al. 2010), or in
simultaneously presented visual and tactile apparent motion quartets (Carter et al. 2008). Suggestive
evidence from Harrar and Harris, not to mention one’s own intuition, would appear to suggest that if the
appropriate stimulus timings could be established, such that synchronous stimulus presentation was
maintained while both modality inputs retained their individual bistability, then any switch in the
perceptual interpretation of the visual display would likely also trigger a switch in the interpretation of
the tactile display (one might certainly frame such a result in terms of visual dominance).
5 The temporal ventriloquism effect has most frequently been demonstrated between pairs of auditory
and visual stimuli. It occurs when the perceived timing of an event in one modality (normally vision) is
pulled toward temporal alignment with a slightly asynchronous event presented in another modality
(e.g., audition; see Morein-Zamir et al. 2003; Vroomen et al. 2004).
topic of apparent motion). At the same time, the participants are instructed to ignore any cues
delivered by the simultaneous presentation of an irrelevant visual (or, on occasion, tactile) appar-
ent motion stream (see Soto-Faraco et al. 2004b for a review). The results of numerous stud-
ies have now demonstrated that people simply cannot ignore the visual apparent motion (even
though it may be entirely task-irrelevant), and will often report that they perceived the sound as
moving in the same direction, even if the opposite was, in fact, the case (e.g., Soto-Faraco et al.
2002). As hinted at already, similar cross-modal dynamic capture effects have also been reported
in experiments involving the presentation of tactile stimuli as well, both when tactile apparent
motion happens to act as the target modality, and when it acts as the to-be-ignored distractor
modality (Lyons et al. 2006; Sanabria et al. 2005b; Soto-Faraco et al. 2004a).
One other area of research that is relevant to the question of cross-modal perceptual organiza-
tion relates to the local versus global perceptual grouping taking place within a given modality
and its effect on perceptual organization within another sensory modality. For instance, Sanabria
et al. (2004) demonstrated the dominance of global field effects over local visual apparent motion
when the two were pitted directly against each other in the setting of the cross-modal dynamic
capture task (see Figure 31.2). In this particular experiment, the four-lights display (see Figure
31.2B) induced the impression of two pairs of lights moving in one direction, while the central
pair of lights (if considered in isolation) appeared to move in the opposite direction. In other
words, if the local motion of the two central lights was from right to left, the global motion of the
four-light display was from left to right instead. However, Sanabria et al.’s results revealed that it
was the direction of global visual motion that ‘captured’ the perceived direction of auditory appar-
ent motion (see also Sanabria et al. 2005a).
[Figure 31.2: (a) the 2-lights display and (b) the 4-lights display, each showing the positions of the lights
and the sound at times T1 and T2.]
Fig. 31.2 Schematic illustration of the different trial types presented in Sanabria et al.’s (2004)
study of the effect of local versus global visual perceptual grouping on the cross-modal dynamic
capture effect. The horizontal arrows indicate the (global) direction of visual apparent motion.
The magnitude of the cross-modal dynamic capture effect was significantly greater in the 2-lights
displays (a) than in the 4-lights displays (b). More importantly for present purposes though, the
results also revealed that the modulatory cross-modal effect of visual apparent motion on the
perceived direction of auditory apparent motion was determined by the global direction of visual
apparent motion rather than by the local motion of the central pair of lights (which appeared to
move in the opposite direction).
Data from Daniel Sanabria, Salvador Soto-Faraco, Jason S. Chan, and Charles Spence, When does visual
perceptual grouping affect multisensory integration? Cognitive, Affective, and Behavioural Neuroscience, 4(2),
pp. 218–29, 2004.
Elsewhere, Rahne et al. (2008) have used an alternating high/low tone sequence, similar to that
used by O’Leary and Rhodes (1984), to demonstrate the effect of visual segmentation cues on audi-
tory stream segregation. The participants in their study either saw a circle presented in synchrony
with every third tone (thus being paired successively with a high tone, then with a low tone, then
with a high tone, etc.) or else they saw a square that appeared in synchrony with just the low-pitched
tones. The likelihood that the participants would perceive the auditory sequence as a single stream
was significantly higher in the former (circle) condition than in the latter (square) condition.
In terms of visuotactile interactions, Yao et al. (2009) have investigated whether the presenta-
tion of visual information would affect the cutaneous rabbit illusion (Geldard & Sherrick 1972).
They placed tactile stimulators at either end of a participant’s arm. LEDs were also placed at the
same locations, as well as at the ‘illusory’ locations where the tactile stimuli are generally per-
ceived to have been presented following the activation of the tactors (in this case, at the interven-
ing position, along the arm). Yao et al. reported that the activation of the lights that mimicked the
hopping percept strengthened the tactile illusion, while the activation of the lights at the veridical
locations of tactile stimulation weakened it. This result shows that the tactile grouping underly-
ing the cutaneous rabbit illusion can be modulated by concurrently presented visual information,
even if it is not relevant to the participant’s task.
At this point, it is worth noting that the majority of studies reported thus far in the text have
involved situations in which the conditions for intramodal perceptual grouping were established
prior to the presentation of the critical cross-modal stimuli (e.g., see Ngo & Spence 2010; Vroomen &
De Gelder 2000; Watanabe & Shimojo 2001; Yao et al. 2009). However, it turns out that even
when the situation is temporally reversed, and the strength of intramodal perceptual grouping is
modulated by any stimuli that happen to be presented after the critical cross-modal stimuli, the
story remains unchanged (e.g., see Sanabria et al. 2005b). Thus, it would appear that intramodal
perceptual grouping normally tends to take precedence over cross-modal perceptual grouping
(see also Cook & Van Valkenburg 2009 for a similar conclusion).
In summary, then, a relatively large body of empirical evidence involving a range of different
behavioural paradigms has by now convincingly demonstrated that as the strength of intramodal
perceptual grouping increases, the magnitude of any cross-modal effects on visual, auditory, or
tactile perception are reduced. Thus, the answer to the second of the questions posed in this chapter
would appear to be unequivocally in the affirmative: that is, the strength of intramodal perceptual
grouping can indeed modulate the strength/magnitude of cross-modal interactions (at least when
the stimuli can be meaningfully related to one another; cf. Cook & Van Valkenburg 2009).
Before moving on, it should be noted that a large body of research shows that the rate of stimulus
presentation in one sensory modality can influence the perceived rate of presentation of stimuli
delivered in another modality (e.g., Gebhard & Mowbray 1959; Recanzone 2003; Wada et al. 2003;
Welch et al. 1986). However, as highlighted by Spence et al. (2007), given the high rates of stimulus
presentation used in the majority of studies in this area, it could plausibly be argued that most of
the results that have been published to date actually tell us more about cross-modal influences on
the perception of a discrete stimulus attribute (e.g., the flicker or flutter rate) rather than necessar-
ily telling us anything meaningful about the cross-modal constraints on perceptual organization.
An argument could certainly be made here that it is only when the stimuli are presented at rates
that are slow enough to allow for the individuation of the elements within the relevant stimulus
streams, and thus the matching of those elements across sensory modalities, that the results of
such research will really start to say anything interesting about cross-modal perceptual organiza-
tion (rather than just being relevant to researchers interested in multisensory integration).
Relevant to this discussion is research by Fujisaki and Nishida (e.g., Fujisaki & Nishida 2010).
They conducted a number of studies demonstrating that people can only really pair (or bind) pairs
of auditory, visual, and/or tactile stimulus streams cross-modally (i.e., in order to make in/out-of-
phase judgements) when the stimuli in those streams are presented at rates that do not exceed
4 Hz.6 If we take this as a legitimate argument (and I am the first to flag up that some may find it
controversial), then the majority of research on cross-modal influences on rate perception and on
flicker/flutter thresholds may, ultimately, turn out not to be relevant to the topic of cross-modal
perceptual organization (see also Benjamins et al. 2008).
6 The one modality pairing where this limit did not apply was for cross-modal interactions between audi-
tory and tactile stimuli. There phase judgements are possible at stimulus presentation rates as high as 12 Hz
(Fujisaki & Nishida 2010).
[Figure 31.3: the display consisted of loudspeakers and LEDs; the legend distinguishes visual apparent
motion (observed), auditory apparent motion (observed), and intermodal apparent motion (anticipated).]
Fig. 31.3 Schematic illustration of the stimulus displays used to investigate the possibility of an
intersensory motion Gestalt (i.e., supramodal apparent motion) by Huddleston et al. (2008). When
the interstimulus intervals were adjusted appropriately, participants reported visual apparent motion
(vertically), auditory apparent motion (horizontally), but there were no reports of any circular
supramodal (or intermodal) apparent motion, thus providing evidence against the existence of an
intersensory Gestalt, at least in this case of audiovisual apparent motion.
apparent motion was stronger than the tactile motion. However, the interesting result for present
purposes was that the mean ratings of the strength of apparent motion, while much lower than those for intramodal motion, were significantly greater than zero for the cross-modal trials at many of the ISIs
tested. However, one could imagine that if Allen and Kolers (1981) were still writing, they might
not be convinced by such effects based, as they are, on self-report. It would seem plausible that
task demands might have played some role in modulating how participants respond in this kind
of task. Thus, more objective data using a more indirect task would certainly be useful in order
to convince the sceptic. On the other hand, Harrar et al. might want to argue that there
is, in fact, nothing fundamentally wrong with using subjective ratings to assess the strength of
apparent motion.
Researchers have also looked for evidence to support the existence of intersensory Gestalten in
the area of intersensory rhythm perception. The idea here is that it might be possible to experience
a cross-modal (or intermodal) rhythm that is not present in any one of the component unisensory
stimulus streams. However, just as for the other studies already mentioned, a closer look at the
literature reveals that while claims of intermodal rhythm perception certainly do exist (Guttman
et al. 2005), there is actually surprisingly little reliable psychophysical evidence to back up such
assertions. Furthermore, many authors have explicitly argued against the possibility of intermodal
rhythm perception (e.g., Fraisse 1963). Perhaps the strongest evidence in favour of intermodal rhythm perception comes from recent research on the perception of musical metre.
Huang et al. (2012) have recently provided some intriguing evidence that appears to suggest that
people can efficiently extract the musical metre (defined as the abstract temporal structure corre-
sponding to the periodic regularities of the music) from a temporal sequence of elements, some of
which happen to be presented auditorily, others via the sense of touch. Importantly, here, the metre
information was not available to either modality stream when considered in isolation. Huang et al.’s
results can therefore be taken as providing support for the claim that audiotactile musical metre per-
ception constitutes one of the first genuinely intersensory Gestalten to have been documented to date.
In conclusion, despite a number of attempts having been made over the decades, there is still
surprisingly little scientific evidence to support the claim that intersensory (or cross-modal)
Gestalten really do exist (see Guttman et al. 2005, p. 234; Huddleston et al. 2008).7 That said, both
of the examples just described (Harrar et al. 2008; Huang et al. 2012) might be taken to challenge
the conclusion forwarded recently by Spence and Chen (2012) that truly intersensory Gestalten
do not exist (see also Spence & Bayne 2015). One suggestion here as to why they may be so elu-
sive in laboratory studies (and presumably also in daily life) is that the nature of the experience
that we have in each of the senses is so fundamentally different that it may make cross- or trans-
modal Gestalten particularly difficult, if not impossible, to achieve or find (see Kubovy & Yu 2012;
Spence & Bayne 2015, on this point; though see Aksentijević et al. 2001; Julesz & Hirsh 1972;
Lakatos & Shepard 1997, for evidence that similar grouping principles may structure our experi-
ence in the different modalities).
7 Those working in the field of flavour perception often suggest that flavours constitute a form of multisensory Gestalt (e.g., Delwiche 2004; Small & Green 2011; Spence et al. 2012; Verhagen & Engelen 2006). If such a
claim were to be true, then this could constitute another example of (genuinely intermodal) perceptual group-
ing. However, it is difficult to determine whether many of the authors making such claims really mean any-
thing more by the suggestion that flavour is a Gestalt than merely that the combination of gustatory, retronasal
olfactory, and trigeminal inputs gives rise to an emergent property, or object, that is, the flavour of a food or
beverage that happens to be localized to the mouth. There really isn’t time to do justice to these questions here,
but the interested reader is directed to Kroeze for further discussion of this issue.
8 It is perhaps worth noting that cross-modal causality also plays an important role in audiovisual integration (see Armontrout et al. 2009; Kubovy & Schutz 2010; Schutz & Kubovy 2009).
stimuli. The pair of visual and auditory stimuli presented on each trial was either cross-modally
congruent (i.e., a smaller circle was presented together with a higher-pitched sound or a larger
circle with a lower-pitched sound) or else they were incongruent (i.e., a smaller circle was paired
with a lower-pitched sound or a larger circle paired with a higher-pitched sound). The results
revealed that participants found it significantly harder to report the temporal order in which
the stimuli had been presented on the cross-modally congruent trials than on the cross-modally incongruent trials. The same pattern of results was also documented in a second experiment in which the cross-modal correspondence between visual shape (angularity)
and auditory pitch/waveform was assessed. In a final study, Parise and Spence (2009) went on
to demonstrate a larger spatial ventriloquism effect for pairs of spatially-misaligned auditory
and visual stimuli when they were cross-modally congruent than when they were incongruent.
The results demonstrate enhanced spatiotemporal integration (as measured by the temporal and
spatial ventriloquism effects), thus leading to poorer temporal and spatial resolution of the com-
ponent unimodal stimuli, on cross-modally congruent as opposed to cross-modally incongruent
trials. Such findings suggest that cross-modal correspondences, which can perhaps be thought
of as a form of cross-modal Gestalt grouping by similarity, influence multisensory perception/
integration.
A growing number of studies published over the last few years have also demonstrated that the
perception of a bistable or ambiguous stimulus in one modality (normally vision) can be biased
by the information presented in another sensory modality, usually audition (e.g., Conrad et al.
2010; Guzman-Martinez et al. 2012; Kang & Blake 2005; Takahashi & Watanabe 2010, 2011; Van
Ee et al. 2009) but, on occasion, touch/haptics (see Binda et al. 2010; Bruno et al. 2007; Lunghi
et al. 2010). Often, such studies have contrasted pairings of stimuli that do, or do not, correspond
cross-modally. So, for example, in one study, the frequency of an amplitude-modulated auditory
stimulus was shown to bias subjective reports (e.g., in the binocular rivalry situation) toward
one of two competing visual stimuli (gratings) whose phase and contrast modulation frequency
happened to match that of the sound (see Kang & Blake 2005). Similarly, exploring an oriented
grooved surface haptically can also bias a participant’s perception in the binocular rivalry situa-
tion toward a congruently (as opposed to an orthogonally) oriented visual image (grating) of the
same spatial frequency (see Binda et al. 2010; Lunghi et al. 2010).
Thus, taken together, the latest evidence on the topic of cross-modal correspondences demon-
strates that when the stimuli presented in different sensory modalities correspond, there may be
perceptual interactions observed that are absent when the stimuli do not correspond (either because they are incongruent, or else because they are simply unrelated to the stimuli/task that a
participant has been given to perform; Sweeny et al. 2012). What is more, there is also a feeling
of rightness that accompanies the pairing of stimuli that correspond cross-modally (which isn’t
there for pairs of stimuli that do not correspond; Koriat 2008). Such correspondences need not
be based on a perceptual mapping, but they often are. Moreover, they can often affect both
perceptual organization and awareness. Such phenomena can be conceptualized in terms of the
Gestalt grouping based on similarity. Indeed, cross-modal correspondences have been described
as cross-modal similarities by some researchers (e.g., see Marks 1987a, b).9
9 Note here that there is likely also an interesting link to questions of perceptual organization in synaesthesia
proper (with which cross-modal correspondences are often confused; though see Deroy & Spence 2013) and
their potential use within the burgeoning literature on sensory substitution (see Stiles & Shimojo in this publication).
Conclusions
The latest evidence from a number of psychophysical studies of cross-modal scene perception and
perceptual organization that have been reviewed in this chapter provides some answers to the four
questions that were outlined at the start of this chapter. First, it would appear that the perceptual
organization of the stimuli taking place in one sensory modality does not automatically influ-
ence the perceptual organization of stimuli presented in another sensory modality (Hupé et al.
2008; O’Leary & Rhodes 1984), except perhaps in the case of speech (Sato et al. 2007; see also
Kubovy & Yu 2012). Second, intramodal perceptual grouping frequently modulates the strength
of cross-modal perceptual grouping (or interactions; Soto-Faraco et al. 2002; see Spence & Chen
2012 for a review). The evidence suggests that unimodal auditory, visual, and tactile perceptual grouping can, and does, affect the cross-modal interactions taking place between auditory and visual stimuli. Third, there is currently little convincing evidence for the existence of intersensory
Gestalten (see Allen & Kolers 1981; Huddleston et al. 2008), despite various largely anecdotal or
introspective claims to the contrary (e.g., see Harrar et al. 2008; Zapparoli & Reatto 1969). We
should keep in mind that several of the latest findings might nevertheless require us to revise this
view (see Harrar et al. 2008; Huang et al. 2012; Yao et al. 2009, on this question). Finally, I have
reviewed the latest evidence showing that cross-modal correspondences (Spence 2011), which
sometimes modulate both perceptual organization and awareness, can be conceptualized in terms
of cross-modal grouping by similarity.
It seems probable that our understanding of the cross-modal constraints on perceptual organization will be furthered in the coming years by animal (neurophysiological) studies
(see Rahne et al. 2008 for one such study). Furthermore, although beyond the scope of the present chapter, it should also be noted that attention is likely to play an important role in cross-modal
perceptual organization (see Kimchi & Razpurker-Apfeld 2004; Sanabria et al. 2007; Talsma et al.
2010; and the chapters by Alais, Holcombe, Humphreys, and Rees in this publication). What does
seem clear already, though, is that cross-modal perceptual organization is modulated by Gestalt
grouping principles such as grouping by spatial proximity, common fate, and similarity just as in
the case of intramodal perception.
References
Aksentijević, A., Elliott, M.A., and Barber, P.J. (2001). ‘Dynamics of Perceptual Grouping: Similarities in
the Organization of Visual and Auditory Groups’. Visual Cognition 8: 349–358.
Allen, P. G., and Kolers, P. A. (1981). ‘Sensory Specificity of Apparent Motion’. Journal of Experimental
Psychology: Human Perception and Performance 7: 1318–1326.
Armontrout, J. A., Schutz, M., and Kubovy, M. (2009). ‘Visual Determinants of a Cross-modal Illusion’.
Attention, Perception, & Psychophysics 71: 1618–1627.
Beck, J. (Ed.) (1982). Organization and Representation in Vision (Hillsdale, NJ: Erlbaum).
Benjamins, J. S., van der Smagt, M. J., and Verstraten, F. A. J. (2008). ‘Matching Auditory and Visual
Signals: Is Sensory Modality Just Another Feature?’ Perception 37: 848–858.
Binda, P., Lunghi, C., and Morrone, C. (2010). ‘Touch Disambiguates Rivalrous Perception at Early Stages
of Visual Analysis’. Journal of Vision 10(7): 854.
Bregman, A. S. (1990). Auditory Scene Analysis: The Perceptual Organization of Sound (Cambridge,
MA: MIT Press).
Bremner, A., Lewkowicz, D., and Spence, C. (Eds.) (2012). Multisensory Development (Oxford: Oxford
University Press).
Bruno, N., Jacomuzzi, A., Bertamini, M., and Meyer, G. (2007). ‘A Visual-haptic Necker Cube Reveals Temporal
Constraints on Intersensory Merging During Perceptual Exploration’. Neuropsychologia 45: 469–475.
Carter, O., Konkle, T., Wang, Q., Hayward, V., and Moore, C. (2008). ‘Tactile Rivalry Demonstrated with
an Ambiguous Apparent-motion Quartet’. Current Biology 18: 1050–1054.
Conrad, V., Bartels, A., Kleiner, M., and Noppeney, U. (2010). ‘Audiovisual Interactions in Binocular
Rivalry’. Journal of Vision 10(10): 1–15.
Cook, L. A., and Van Valkenburg, D. L. (2009). ‘Audio-visual Organization and the Temporal
Ventriloquism Effect Between Grouped Sequences: Evidence that Unimodal Grouping Precedes
Cross-modal Integration’. Perception 38: 1220–1233.
Delwiche, J. (2004). ‘The Impact of Perceptual Interactions on Perceived Flavor’. Food Quality and
Preference 15: 137–146.
Deroy, O., and Spence, C. (2013). ‘Weakening the Case for “Weak Synaesthesia”: Why Crossmodal
Correspondences are not Synaesthetic’. Psychonomic Bulletin & Review 20: 643–664.
Fraisse, P. (1963). The Psychology of Time (London: Harper & Row).
Fujisaki, W., and Nishida, S. (2010). ‘A Common Perceptual Temporal Limit of Binding Synchronous
Inputs Across Different Sensory Attributes and Modalities’. Proceedings of the Royal Society B
277: 2281–2290.
Gallace, A., and Spence, C. (2011). ‘To What Extent do Gestalt Grouping Principles Influence Tactile
Perception?’ Psychological Bulletin 137: 538–561.
Gebhard, J. W., and Mowbray, G. H. (1959). ‘On Discriminating the Rate of Visual Flicker and Auditory
Flutter’. American Journal of Psychology 72: 521–528.
Geldard, F. A., and Sherrick, C. E. (1972). ‘The Cutaneous “Rabbit”: A Perceptual Illusion’. Science
178: 178–179.
Gilbert, G. M. (1938). ‘A Study in Inter-sensory Gestalten’. Psychological Bulletin 35: 698.
Gilbert, G. M. (1941). ‘Inter-sensory Facilitation and Inhibition’. Journal of General Psychology 24: 381–407.
Guttman, S. E., Gilroy, L. A., and Blake, R. (2005). ‘Hearing What the Eyes See: Auditory Encoding of
Visual Temporal Sequences’. Psychological Science 16: 228–235.
Guzman-Martinez, E., Ortega, L., Grabowecky, M., Mossbridge, J., and Suzuki, S. (2012). ‘Interactive
Coding of Visual Spatial Frequency and Auditory Amplitude-modulation Rate’. Current Biology
22: 383–388.
Harrar, V., and Harris, L. R. (2007). ‘Multimodal Ternus: Visual, Tactile, and Visuo-tactile Grouping in
Apparent Motion’. Perception 36: 1455–1464.
Harrar, V., Winter, R., and Harris, L. R. (2008). ‘Visuotactile Apparent Motion’. Perception & Psychophysics
70: 807–817.
Huang, J., Gamble, D., Sarnlertsophon, K., Wang, X., and Hsiao, S. (2012). ‘Feeling Music: Integration of
Auditory and Tactile Inputs in Musical Meter Perception’. PLoS ONE 7(10): e48496.
Huddleston, W. E., Lewis, J. W., Phinney, R. E., and DeYoe, E. A. (2008). ‘Auditory and Visual
Attention-based Apparent Motion Share Functional Parallels’. Perception & Psychophysics 70: 1207–1216.
Hupé, J. M., Joffo, L. M., and Pressnitzer, D. (2008). ‘Bistability for Audiovisual Stimuli: Perceptual
Decision is Modality Specific’. Journal of Vision 8(7): 1–15.
Julesz, B., and Hirsh, I. J. (1972). ‘Visual and Auditory Perception—An Essay of Comparison’. In Human
Communication: A Unified View, edited by E. E. David, Jr., and P. B. Denes, pp. 283–340
(New York: McGraw-Hill).
Kang, M.-S., and Blake, R. (2005). ‘Perceptual Synergy Between Seeing and Hearing Revealed During
Binocular Rivalry’. Psichologija 32: 7–15.
Keetels, M., Stekelenburg, J., and Vroomen, J. (2007). ‘Auditory Grouping Occurs Prior to Intersensory
Pairing: Evidence From Temporal Ventriloquism’. Experimental Brain Research 180: 449–456.
Kimchi, R., Behrmann, M., and Olson, C. R. (Eds.). (2003). Perceptual Organization in Vision: Behavioral
and Neural Perspectives (Mahwah, NJ: Erlbaum).
Kimchi, R., and Razpurker-Apfeld, I. (2004). ‘Perceptual Grouping and Attention: Not All Groupings are
Equal’. Psychonomic Bulletin & Review 11: 687–696.
Koriat, A. (2008). ‘Subjective Confidence in One’s Answers: The Consensuality Principle’. Journal of
Experimental Psychology: Learning, Memory, and Cognition 34: 945–959.
Kubovy, M., and Pomerantz, J. R. (Eds.) (1981). Perceptual Organization (Hillsdale, NJ: Erlbaum).
Kubovy, M., and Schutz, M. (2010). ‘Audio-visual Objects’. Review of Philosophy & Psychology 1: 41–61.
Kubovy, M., and Yu, M. (2012). ‘Multistability, Cross-modal Binding and the Additivity of Conjoint
Grouping Principles’. Philosophical Transactions of the Royal Society B 367: 954–964.
Lakatos, S., and Shepard, R. N. (1997). ‘Constraints Common to Apparent Motion in Visual, Tactile,
and Auditory Space’. Journal of Experimental Psychology: Human Perception & Performance
23: 1050–1060.
Lunghi, C., Binda, P., and Morrone, M. C. (2010). ‘Touch Disambiguates Rivalrous Perception at Early
Stages of Visual Analysis’. Current Biology 20: R143–R144.
Lyons, G., Sanabria, D., Vatakis, A., and Spence, C. (2006). ‘The Modulation of Crossmodal Integration by
Unimodal Perceptual Grouping: A Visuotactile Apparent Motion Study’. Experimental Brain Research
174: 510–516.
Marks, L. E. (1987a). ‘On Cross-modal Similarity: Auditory-visual Interactions in Speeded Discrimination’.
Journal of Experimental Psychology: Human Perception and Performance 13: 384–394.
Marks, L. E. (1987b). ‘On Cross-modal Similarity: Perceiving Temporal Patterns by Hearing, Touch, and
Vision’. Perception & Psychophysics 42: 250–256.
Metzger, W. (1934). ‘Beobachtungen über Phänomenale Identität (Observations on Phenomenal Identity)’.
Psychologische Forschung 19: 1–60.
Michotte, A. (1946/1963). The Perception of Causality (London: Methuen).
Morein-Zamir, S., Soto-Faraco, S., and Kingstone, A. (2003). ‘Auditory Capture of Vision: Examining
Temporal Ventriloquism’. Cognitive Brain Research 17: 154–163.
Ngo, M., and Spence, C. (2010). ‘Crossmodal facilitation of masked visual target identification’. Attention,
Perception, & Psychophysics 72: 1938–1947.
O’Leary, A., and Rhodes, G. (1984). ‘Cross-modal Effects on Visual and Auditory Object Perception’.
Perception & Psychophysics 35: 565–569.
Parise, C., and Spence, C. (2009). ‘When Birds of a Feather Flock Together: Synesthetic Correspondences
Modulate Audiovisual Integration in Non-synesthetes’. PLoS ONE 4(5): e5664.
Parise, C. V., and Spence, C. (2012). ‘Audiovisual Crossmodal Correspondences and Sound Symbolism: An
IAT Study’. Experimental Brain Research 220: 319–333.
Rahne, T., Deike, S., Selezneva, E., Brosch, M., König, R., Scheich, H., Böckmann, M., and Brechmann,
A. (2008). ‘A Multilevel and Cross-modal Approach Towards Neuronal Mechanisms of Auditory
Streaming’. Brain Research 1220: 118–131.
Recanzone, G. H. (2003). ‘Auditory Influences on Visual Temporal Rate Perception’. Journal of
Neurophysiology 89: 1078–1093.
Sanabria, D., Soto-Faraco, S., Chan, J. S., and Spence, C. (2004). ‘When Does Visual Perceptual Grouping
Affect Multisensory Integration?’ Cognitive, Affective, & Behavioral Neuroscience 4: 218–229.
Sanabria, D., Soto-Faraco, S., Chan, J. S., and Spence, C. (2005a). ‘Intramodal Perceptual Grouping
Modulates Multisensory Integration: Evidence from the Crossmodal Congruency Task’. Neuroscience
Letters 377: 59–64.
Sanabria, D., Soto-Faraco, S., and Spence, C. (2005b). ‘Assessing the Effect of Visual and Tactile Distractors
on the Perception of Auditory Apparent Motion’. Experimental Brain Research 166: 548–558.
Sanabria, D., Soto-Faraco, S., and Spence, C. (2007). ‘Spatial Attention Modulates Audiovisual Interactions
in Apparent Motion’. Journal of Experimental Psychology: Human Perception and Performance
33: 927–937.
Sato, M., Basirat, A., and Schwartz, J. (2007). ‘Visual Contribution to the Multistable Perception of Speech’.
Perception & Psychophysics 69: 1360–1372.
Schutz, M., and Kubovy, M. (2009). ‘Causality and Cross-modal Integration’. Journal of Experimental
Psychology: Human Perception & Performance 35: 1791–1810.
Sekuler, R., Sekuler, A. B., and Lau, R. (1997). ‘Sound Alters Visual Motion Perception’. Nature
385: 308.
Shi, Z., Chen, L., and Müller, H. (2010). ‘Auditory Temporal Modulation of the Visual Ternus Display: The
Influence of Time Interval’. Experimental Brain Research 203: 723–735.
Small, D. M., and Green, B. G. (2011). ‘A Proposed Model of a Flavour Modality’. In Frontiers in the Neural
Bases of Multisensory Processes, edited by M. M. Murray and M. Wallace, pp. 705–726 (Boca Raton,
FL: CRC Press).
Soto-Faraco, S., Lyons, J., Gazzaniga, M., Spence, C., and Kingstone, A. (2002). ‘The Ventriloquist in
Motion: Illusory Capture of Dynamic Information Across Sensory Modalities’. Cognitive Brain Research
14: 139–146.
Soto-Faraco, S., Spence, C., and Kingstone, A. (2004a). ‘Congruency Effects Between Auditory and
Tactile Motion: Extending the Phenomenon of Crossmodal Dynamic Capture’. Cognitive, Affective, &
Behavioral Neuroscience 4: 208–217.
Soto-Faraco, S., Spence, C., Lloyd, D., and Kingstone, A. (2004b). ‘Moving Multisensory Research
Along: Motion Perception Across Sensory Modalities’. Current Directions in Psychological Science
13: 29–32.
Spence, C. (2011). ‘Crossmodal Correspondences: A Tutorial Review’. Attention, Perception, & Psychophysics
73: 971–995.
Spence, C., and Bayne, T. (2015). ‘Is Consciousness Multisensory?’ In Perception and its Modalities, edited by D. Stokes, M. Matthen, and S. Biggs, pp. 95–132 (Oxford: Oxford University Press).
Spence, C., and Chen, Y.-C. (2012). ‘Intramodal and Crossmodal Perceptual Grouping’. In The New
Handbook of Multisensory Processing, edited by B. E. Stein, pp. 265–282 (Cambridge, MA: MIT Press).
Spence, C., Ngo, M., Percival, B., and Smith, B. (2012). ‘Crossmodal Correspondences: Assessing Shape
Symbolism for Cheese’. Food Quality and Preference 28: 206–212.
Spence, C., Sanabria, D., and Soto-Faraco, S. (2007). ‘Intersensory Gestalten and Crossmodal Scene
Perception’. In Psychology of Beauty and Kansei: New Horizons of Gestalt Perception, edited by
K. Noguchi, pp. 519–579 (Tokyo: Fuzanbo International).
Stein, B. E. (Ed.) (2012). The New Handbook of Multisensory Processing (Cambridge, MA: MIT Press).
Stein, B. E., and Meredith, M. A. (1993). The Merging of the Senses (Cambridge, MA: MIT Press).
Stein, B. E., Burr, D., Costantinides, C., Laurienti, P. J., Meredith, A. M., Perrault, T. J., et al. (2010).
‘Semantic Confusion Regarding the Development of Multisensory Integration: A Practical Solution’.
European Journal of Neuroscience 31: 1713–1720.
Sweeny, T. D., Guzman-Martinez, E., Ortega, L., Grabowecky, M., and Suzuki, S. (2012). ‘Sounds
Exaggerate Visual Shape’. Cognition 124: 194–200.
Takahashi, K., and Watanabe, K. (2010). ‘Implicit Auditory Modulation on the Temporal Characteristics of
Perceptual Alternation in Visual Competition’. Journal of Vision 10(4): 1–13.
Takahashi, K., and Watanabe, K. (2011). ‘Visual and Auditory Influence on Perceptual Stability in Visual
Competition’. Seeing and Perceiving 24: 545–564.
Talsma, D., Senkowski, D., Soto-Faraco, S., and Woldorff, M. G. (2010). ‘The Multifaceted Interplay
Between Attention and Multisensory Integration’. Trends in Cognitive Sciences 14: 400–410.
van Ee, R., van Boxtel, J. J. A., Parker, A. L., and Alais, D. (2009). ‘Multisensory Congruency
as a Mechanism for Attentional Control over Perceptual Selection’. Journal of Neuroscience,
29: 11641–11649.
Verhagen, J. V., and Engelen, L. (2006). ‘The Neurocognitive Bases of Human Multimodal Food
Perception: Sensory Integration’. Neuroscience and Biobehavioral Reviews 30: 613–650.
Vroomen, J., and de Gelder, B. (2000). ‘Sound Enhances Visual Perception: Cross-modal Effects of
Auditory Organization on Vision’. Journal of Experimental Psychology: Human Perception and
Performance 26: 1583–1590.
Vroomen, J., Keetels, M., de Gelder, B., and Bertelson, P. (2004). ‘Recalibration of Temporal Order
Perception by Exposure to Audio-visual Asynchrony’. Cognitive Brain Research 22: 32–35.
Wada, Y., Kitagawa, N., and Noguchi, K. (2003). ‘Audio-visual Integration in Temporal Perception’.
International Journal of Psychophysiology 50: 117–124.
Wagemans, J., Elder, J. H., Kubovy, M., Palmer, S. E., Peterson, M. A., Singh, M., and von der Heydt,
R. (2012). ‘A Century of Gestalt Psychology in Visual Perception. I. Perceptual Grouping and
Figure-ground Organization’. Psychological Bulletin 138: 1218–1252.
Warren, R. M., and Gregory, R. L. (1958). ‘An Auditory Analogue of the Visual Reversible Figure’. American
Journal of Psychology 71: 612–613.
Watanabe, K., and Shimojo, S. (2001). ‘When Sound Affects Vision: Effects of Auditory Grouping on Visual
Motion Perception’. Psychological Science 12: 109–116.
Welch, R. B., DuttonHurt, L. D., and Warren, D. H. (1986). ‘Contributions of Audition and Vision to
Temporal Rate Perception’. Perception & Psychophysics 39: 294–300.
Wertheimer, M. (1923/1938). ‘Laws of Organization in Perceptual Forms’. In A Source Book of Gestalt
Psychology, edited by W. Ellis, pp. 71–88 (London: Routledge & Kegan Paul).
Yao, R., Simons, D., and Ro, T. (2009). ‘Keep Your Eye on the Rabbit: Cross-modal Influences on the
Cutaneous Rabbit Illusion’. Journal of Vision 9: 705.
Yau, J. M., Olenczak, J. B., Dammann, J. F., and Bensmaia, S. J. (2009). ‘Temporal Frequency Channels are
Linked across Audition and Touch’. Current Biology 19: 561–566.
Zapparoli, G. C., and Reatto, L. L. (1969). ‘The Apparent Movement Between Visual and Acoustic Stimulus
and the Problem of Intermodal Relations’. Acta Psychologica 29: 256–267.
Chapter 32
Sensory substitution:
A new perceptual experience
Noelle R. B. Stiles and Shinsuke Shimojo
Introduction
The theme of this book, ‘perceptual organization’, asks how sensory inputs are organized into an
integrated, structured percept. Whereas most of the chapters address this question within a single modality, several chapters, including this one and the one by Spence (this volume), ask the same question across
modalities. We may rephrase it as how cross-modal organization generates our unique perceptual
experience. Individual modalities have been traditionally isolated as specific sensations, yet all
senses are seamlessly blended into a holistic experience in the typical daily environment. Where
is the line segregating each modality? Is vision visual because the information comes from the
retina, or could it be ‘vision’ if the information derives from an image even if it is encoded by
a sound? As recent studies have shown evidence for the processing of both auditory and tactile
information in visual cortex (Bavelier and Neville 2002; Cohen et al. 1997; Collignon et al. 2009;
Sadato et al. 1996), the definition of vision in the brain has become increasingly blurry. Sensory
substitution (SS) encodes an image into a sound or tactile stimulation, and trained subjects have
been found not only to utilize the stimulus to coordinate adaptive behavior, but also to process
it in early visual areas. Some superusers of a sensory substitution device have further claimed to
subjectively experience a vision-like perception associated with device usage (Ward and Meijer
2010). This chapter will not only review the technical and historical background of SS but will also, more importantly, highlight the implications of SS for cross-modal plasticity and its potential to reveal cross-modal perceptual organization.
Sensory substitution is processed like vision at cortical levels but is transduced by audition (or somatosensation) at receptor levels; thus it should be considered neither pure vision nor audition/somatosensation, but rather a third type of subjective sensation, or ‘qualia’. If perceptual
experience in sensory substitution is unique, do the same visual primitives hold? Are these visual
primitives fundamental to all vision-like processing, or are they dependent on the visual sen-
sory transduction process? Several other questions fundamental to the essential nature of visual
experience also become feasible to investigate with this new broader definition of ‘visual’ process-
ing, such as holistic vs. local processing, static vs. dynamic recognition and depth perception, and
perception based on purely sensory vs. sensory-motor neural processing. Studies of sensory substitution attempt to aid the blind by addressing these questions and thereby improving both SS devices and the users’ quality of life. Further, these investigations advance neuroscience
by demonstrating the roles that neural plasticity and sensory integration play in the organization
of visual perception. In short, SS provides scientists and philosophers with a new artificial
dimension to examine perceptual organization processes.
656 Stiles and Shimojo
1 Discussion of the ‘effort’ and ‘practice’ required for sensory substitution learning implies top-down attention
(Browne 2003, p. 277). Further the lack of blind subject ‘confidence’ due to ‘long experimental time’ indicates
slow conscious processing rather than automatic perception (Dunai 2010, p. 84).
Sensory Substitution 657
[Figure labels: Frequency (high to low); Portable computer running vOICe software; Video input; Audio output.]
Fig. 32.1 Schematic diagram of the vOICe device, which encodes an image into sound in real time.
A subject wears a pair of glasses with a camera attached that transmits live video to a portable
computer. The computer runs the vOICe software, which transforms the image into a soundscape: within each column of pixels, brightness is encoded as the loudness of a frequency that is high for upper pixels and progressively lower for middle and bottom pixels, and this column is scanned across the image at 1 Hz with stereo panning (the scan rate is adjustable). The soundscape representing
an image frame is communicated to the user via headphones.
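The column-scan encoding described in the caption can be sketched in a few lines of Python. This is a simplified illustration under assumed parameters, not the actual vOICe implementation: the frequency range, sample rate, and per-column normalization are all assumptions made for the sketch.

```python
import numpy as np

def encode_image_to_soundscape(image, duration=1.0, sample_rate=22050,
                               f_low=500.0, f_high=5000.0):
    """Sketch of a vOICe-style encoding: scan a grayscale image
    left-to-right over `duration` seconds. Each column becomes a sum of
    sinusoids whose frequency is high for upper pixels and progressively
    lower for lower pixels, with loudness set by pixel brightness and
    stereo panning following the scan position."""
    image = np.asarray(image, dtype=float)
    n_rows, n_cols = image.shape
    # One frequency per row: top row -> f_high, bottom row -> f_low.
    freqs = np.linspace(f_high, f_low, n_rows)
    samples_per_col = int(duration * sample_rate / n_cols)
    t = np.arange(samples_per_col) / sample_rate
    left, right = [], []
    for c in range(n_cols):
        # Sum the row sinusoids for this column, weighted by brightness.
        col = image[:, c]
        chunk = (col[:, None] * np.sin(2 * np.pi * freqs[:, None] * t)).sum(axis=0)
        peak = np.abs(chunk).max()
        if peak > 0:
            chunk = chunk / peak  # normalize to avoid clipping
        pan = c / max(n_cols - 1, 1)  # 0 = far left, 1 = far right
        left.append(chunk * (1.0 - pan))
        right.append(chunk * pan)
    return np.concatenate(left), np.concatenate(right)

# A single bright dot in the upper-left corner of a dark image:
img = np.zeros((8, 8))
img[0, 0] = 1.0
left, right = encode_image_to_soundscape(img)
```

With this input, a brief high-pitched tone appears at the start of the left channel only, mirroring how a vOICe user would hear a bright spot in the upper-left of the scene.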
while conveying useful information about obstacles. An alternative route to reducing training time and enhancing performance may be improved training methods, such as training that exploits intrinsic cross-modal correspondences (Pratt 1930; Spence 2011; Stevens and Marks 1965), making devices more intuitive, as will be elaborated later in this chapter.
such that brain regions are segregated by processing of different types of information and not by
stimulus modality (Pascual-Leone and Hamilton 2001). The metamodal theory of the brain was
supported by the activation of the shape-decoding region, Lateral Occipital tactile-visual area
(LOtv), by audition when shape was conveyed by vOICe-encoded sounds (Amedi et al. 2007).
Modalities are also plastic after development and can generate learned relations across senses,
as witnessed in visual activation during echolocation, sound localization, and braille reading in
the blind (late blind vs. early blind) (Bavelier and Neville 2002; Cohen et al. 1997; Collignon et al.
2009; Sadato et al. 1996). Braille reading activated primary visual cortex (BA 17) and extrastri-
ate cortices bilaterally in blind subjects (Sadato et al. 1996). Repetitive Transcranial Magnetic
Stimulation (rTMS) was used to deactivate visual cortical regions in blind braille experts and
generated errors in braille interpretation (Cohen et al. 1997). These results demonstrate a func-
tional and causal link between visual activation and the ability to read braille in the blind. Other
studies provide even more evidence for plasticity in the handicapped such as enhanced visual
ERPs (Event Related Potentials) in early-onset deaf (Neville et al. 1983; Neville and Lawson 1987),
auditory ERPs in the posterior (occipital) region in the early and late blind (Kujala et al. 1995), and posterior DC potentials in the blind during tactile reading (Uhl et al. 1991).
Perceptual organization usually refers to Gestalt principles, such as proximity-based (both in
space and time) grouping/segregation, regularity, and Prägnanz (good shape). Vision, audition,
and somatosensation have partly the same, but partly different (unique) perceptual organization
rules. For example, segregation or chunking rules operate across modalities in the same way at
the most abstract level, yet the grouping may be spatial in vision but temporal in audition (Bregman
and Campbell 1971; Neri and Levi 2007; Vroomen and De Gelder 2000; see also Denham and
Winkler, this volume). SS provides an opportunity to investigate what happens to such perceptual
organization rules when between-modality connectivity is enhanced by training. More specifically,
questions such as (a) whether the auditory or the tactile modality acquires vision-like perceptual
organization rules, and (b) whether cross-modal combinations themselves self-organize and
generate new cross-modal organization principles, can be investigated in detail with sensory
substitution.
Existing literature on cross-modal interactions is a guide to understanding and interpreting
the visual nature of sensory substitution processing. Sensory substitution also requires plastically
generating new learned relationships across modalities, but it may also rely on existing develop-
mental connections. In fact, SS might modulate the strength of existing developmental connec-
tions, and thereby alter cross-modal perception, even in sighted subjects. Ideally, the training of
participants can exploit these existing cross-modal interactions and mappings to enable effortless
training and signal interpretation. In addition, training on SS devices should take into account
cross-modal interaction variance across both functional and experimental subject groups, includ-
ing the early blind with no visual experience, the late blind who have limited visual experience,
and the sighted with normal visual perception (Bavelier and Neville 2002; Poirier et al. 2007b).
as a ‘red color with yellow seeds all around it and a green stalk’; whereas for unfamiliar objects her
brain ‘guesses’ at the color such as ‘greyish black’ for a sweater, and occasionally reduces the object
detail to a line drawing (Ward and Meijer 2010, p. 497). When rTMS was applied to her visual cor-
tex, she claimed to have the visual experience damped, causing her to ‘carefully listen to the details
of the soundscapes’ instead of having an automatic ‘seeing’ sensation, qualitatively linking visual
activation to ‘visual’ characteristics of the subjective experience (Merabet et al. 2009, p. 136). The
vOICe ‘visual’ experience according to PF:
‘Just sound? . . . No, it is by far more, it is sight! . . . When I am not wearing the vOICe, the light I perceive
from a small slit in my left eye is a grey fog. When wearing the vOICe the image is light with all the little
greys and blacks . . . The light generated is very white and clear, then it erodes down the scale of color to
the dark black.’
Ward and Meijer 2010, p. 495
Subject PF has not been the only blind user who has reported visual experiences with sensory sub-
stitution devices. A study with eighteen blind subjects and ten sighted controls found that in the
last three weeks of a three month training period, seven blind subjects claimed to perceive phos-
phenes while using a tactile sensory substitution device (Ortiz et al. 2011). Four out of seven sub-
jects with visual experiences retained light perception; they ranged in blindness onset from one
to 35 years old. In most cases the phosphenes appeared in the shape and angle of the line stimulus
tactilely presented; the ‘visual’ perception over time dominated the tactile perception (Ortiz et al.
2011). The blind group with ‘visual’ experience had activation in occipital lobe regions such as BA
17, 18, and 19 measured via electroencephalography (EEG); in contrast, the non-phosphene blind
subjects did not have visual activation (Ortiz et al. 2011).
Tactile devices have been studied for distal attribution of users (i.e. the externalization of the
stimulus) as defined by: (1) the coupling of subject movement and stimulation; (2) the presence of
an external object; and (3) the existence of ‘perceptual space’ (Auvray et al. 2005). Distal attribu-
tion was tested on sixty subjects naïve to the auditory sensory substitution device and its encod-
ing. Subjects moved freely with headphones, webcam attached, and a luminous object in hand
and in some conditions were provided an object with which to occlude the luminous object. A link
between the subjects' actions and the auditory stimulation was often perceived; this coupling was
perceived more often than a distal object or an environmental space.
Key questions about 'visual' sensations with sensory substitution remain. These include the
connection between 'visual' perception and functionality with the device, namely whether the
'visual' quality of the experience enhances recognition and localization with sensory substitution. The cause of
visual perception with sensory substitution is also still unclear. Is ‘visual perception’ via sensory
substitution just mediated by primary visual areas, or do prefrontal and higher visual cortices play
a key role? Further, a quantitative rTMS study of Ortiz’s subjects that have ‘visual’ experience may
show if the visual cortical activation is necessary for their visual perception of sensory substitu-
tion stimuli. Deactivation of prefrontal regions (via rTMS) might demonstrate if those regions are
a part of a top-down cognitive network necessary to the distinctively unique subjective experience
of ‘visual’ nature with sensory substitution.
A major complication in visual activation and ‘visual’ perception with sensory substitution is
the role of visualization, particularly in the late blind. The late blind have experienced vision and
therefore are more familiar with visual principles but also have the ability to activate visual cortex
via visualization, or a mental effort to visually imagine a scene/object. PF is late blind (blindness
onset at age of twenty-one years) and five out of seven of Ortiz’s blind subjects with ‘visual’ percep-
tion had blindness onset at the age of four years or later (Ortiz et al. 2011). Therefore, it is possible
660 Stiles and Shimojo
that the visual activation in these late-blind subjects is due to top-down cognitive visualization
rather than an automatic 'visual' perception. The major evidence against visualization is limited
to the qualitative claims that (1) the 'visual' perception happens automatically, and (2) (in Ortiz's
subjects) the tactile sensations fade while 'visual' perception dominates. A quantitative study of the
automaticity of 'visual' perception with a sensory substitution device (i.e. does it occur even when
top-down attention is distracted?) may further clarify the role of visualization in the sensory
substitution 'visual' experience. It will no doubt provide empirical seeds for a theoretical
reconsideration of the subjective aspects of perception, including the issue of 'qualia'.
[Figure 32.2 panels (data shown graphically in the original). Aa: Object recognition with the Tongue Display Unit: proportion correct vs. pattern size (4–9); conditions RD (fingertip-perceived raised dots), TO (electrotactile tongue discrimination), ET (fingertip electrotactile discrimination, subject dynamically modulates current), and ES (fingertip electrostatic stimulation); chance performance 0.33. Ab: Percentage of correct responses before vs. after training, for Elements and Patterns; differences statistically significant (Wilcoxon tests for paired samples: Elements Z = 1.99, p < 0.05; Patterns Z = −2.23, p < 0.03). Ba: Object localization with tactile sensory substitution (Chebat et al. 2011): percent correct for congenitally blind (CB) vs. sighted controls (SC), large (L) vs. small (S) objects, and step-around (SA) vs. step-over (SO) obstacles (*P ≤ 0.05; **P ≤ 0.001). Bb: Object localization with auditory sensory substitution (Auvray et al. 2007): pointing error (cm) as a function of vertical and horizontal distance to the elbow.]
Fig. 32.2 Behavioral outcomes of Sensory Substitution training. Psychophysical testing with tactile
and auditory sensory substitution devices has had similar outcomes. Object recognition testing
with the Tongue Display Unit (Aa) has shown a correlation between pattern size and proportion
correct; all subjects exceeded chance performance. Pattern recognition with an auditory
device (Ab) significantly improved with training and had a similar average proportion correct as
tactile pattern recognition (between 0.6 and 0.8). Obstacle localization in an uncluttered
maze environment with a tactile device (Ba) yielded between 0.8 and 1 proportion correct for most
object types. Localization of a four cm diameter ball with an auditory device (Bb) showed that
inaccuracy increased with distance to the object (the webcam viewing the environment was held
in the right hand and aligned with the elbow).
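The before-vs-after comparison in panel Ab rests on Wilcoxon tests for paired samples. The exact small-sample version of that test can be sketched as follows; the per-subject scores are made up for illustration, not the study's data:

```python
from itertools import product

def wilcoxon_signed_rank(before, after):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.
    Assumes no zero or tied absolute differences (fine for this sketch)."""
    diffs = [a - b for b, a in zip(before, after)]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    w_pos = sum(rank + 1 for rank, i in enumerate(order) if diffs[i] > 0)
    n = len(diffs)
    total = n * (n + 1) // 2
    w = min(w_pos, total - w_pos)
    # Enumerate all 2^n sign assignments to build the exact null distribution.
    count = 0
    for signs in product((0, 1), repeat=n):
        wp = sum((rank + 1) * s for rank, s in enumerate(signs))
        if min(wp, total - wp) <= w:
            count += 1
    return w, count / 2 ** n

# Hypothetical percent-correct scores for eight subjects (illustration only)
before = [55, 60, 48, 62, 58, 50, 65, 57]
after  = [65, 71, 60, 75, 72, 65, 81, 74]
w, p = wilcoxon_signed_rank(before, after)   # every subject improved
```

With every subject improving, the statistic is 0 and the exact two-sided p is 2/2⁸ ≈ 0.008, comfortably below the 0.05 threshold reported in the figure.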
the same category of an original object. Subjects performed above chance at recognizing specific
objects even within the same category and subjects were more accurate when there were fewer
objects in each category.
A majority of the studies on object recognition with sensory substitution have focused on artificial
stimuli in simplified environments. No studies have yet explored natural objects in natural
environments (such as finding a shirt in a closet or a clock on a nightstand) or the role of
distractor objects in object perception (such as recognizing an object in the center of the field of
view with two objects to the left and right). A potential reason is that artificial patterns are easier
to identify, can be manipulated to test sensory substitution resolution, and allow object complexity
to be quantified relatively easily, in the hope that more cluttered scenes would eventually become
recognizable over the course of training. Several key visual questions, such as spatially
segregating objects, object recognition independent of point of view (i.e. shape constancy), and
the differentiation of shadows and reflections from physical objects, remain unanswered.
Vision is to perceive ‘what is where by looking’ (Marr 1982, p. 3). Recognition studies investi-
gated the ‘what’ element of perception, and now localization studies will highlight the ‘where’ ele-
ment of vision. Clinically, object localization has been most commonly studied with locomotion
through a maze of obstacles. Chebat and his collaborators (2011) constructed a life-sized maze
consisting of a white hallway with black boxes, tubes, and bars (horizontal (on the floor or partially
protruding from the wall) or vertical (aligned with the left or right wall)). Sixteen congenitally blind
and eleven sighted controls navigated the maze with a tactile display unit (10 × 10 pixels) and were
scored for obstacle detection (pointing at the obstacle) and obstacle avoidance (walking past the
obstacle without touching it) (Figure 32.2Ba). Congenitally blind subjects (CB in the figure) were able to detect
and avoid obstacles significantly more accurately than the sighted controls (SC in figure). Both
groups performed the tasks above chance. Larger obstacles (white bars labeled L in figure) were
easier to avoid and detect than smaller obstacles (black bars labeled S in figure), and step-around
obstacles (white bars labeled SA in figure) were easier to negotiate than step-over obstacles (black
bars labeled SO in figure) (Figure 32.2Ba). A study by Proulx and colleagues (2008) showed that
auditory sensory substitution localization was enhanced when subjects were allowed to use the
SS device in normal life (in addition to device assessments) compared to subjects only using the
device during assessments. Other localization studies have also investigated artificial maze envi-
ronments and tracking of stimuli in 2-D and 3-D space (Chekhchoukh et al. 2011; Kupers et al.
2010). Auvray and colleagues (2007) used an auditory sensory substitution device to study the
accuracy of localization with a pointing task (Figure 32.2Bb) and found a mean error of 7.8 cm for
pointing at a 4 cm diameter ball. The pointing inaccuracy varied proportionally with distance to
the hand-held camera (vertically aligned with the subject's elbow).
Depth perception is also a key part of visual processing. With sensory substitution's monocular
camera and low resolution, it can be especially challenging for users to learn. Nevertheless, sighted
users have been found to perceive key monocular depth illusions. As described earlier in this
chapter, Renier and colleagues (2005b) tested for perception of the Ponzo illusion with an
auditory sensory substitution device and found that blindfolded sighted subjects could perceive
it similarly to the sighted, but early-blind subjects could not (Renier et al. 2005b). Investigation
of the vertical-horizontal illusion (vertical lines appear longer than horizontal lines) showed that
sighted subjects could perceive this illusion with an auditory sensory substitution device, but early
blind subjects could not perceive it (Renier et al. 2006). These results may indicate either that pre-
vious visual experience is essential for the perception of certain illusions, or that the duration of
training may have been too short or superficial. Testing late-blind subjects may further elucidate
why congenitally blind subjects did not perceive these illusions.
The perceptual organization of sensory substitution perception has many properties yet to be
determined. Recognition and localization properties in natural environments are not thoroughly
quantified nor are performances in cluttered environments or in shadowy and glare-ridden set-
tings. Further questions as to what could be sensory substitution primitives (such as edges or
spatial frequencies in vision) have not been answered. Scene perception with sensory substitution
is also ambiguous. Questions such as whether the spatial relations of a scene can be generated with
sensory substitution, and how much this depends on past visual experience and the mode of
stimulation (auditory or tactile), are still unanswered. The active allocation of attention via gaze is also a critical
component of the normal visual function that is entirely absent in sensory substitution encodings.
Does the absence of active sensation inhibit the processing of sensory substitution stimuli and the
generation of choice? Or instead, would exploration/orienting with the head turn compensate
for the gaze shift easily with minimal training? How does the absence of the gaze cascade impact
preference in the sensory substitution ‘visual’ experience (Shimojo et al. 2003)? Finally, Gestalt
binding principles of proximity and shared properties may or may not be perceived with sensory
substitution, and may be controlled by the transducing modality (somatosensation or audition) or
the processing modality (vision). These questions need to be answered in future research.
[Figure 32.3 panels. A: Activation in blind and sighted subjects during a shape estimation task (Amedi et al. 2007): (a) single sighted subject's neural activation, (b) blind subject's neural activation, (c) single sighted subject's activation from an auditory control task, (d) average across seven vOICe-trained users (subjects in a and b; n = 7, P = 0.05, corrected). B: Sighted subject activation as a function of training session on a pattern recognition task (Poirier et al. 2006b), sessions SV3/SA3 and SV5/SA5; voxels corrected for multiple comparisons in the whole brain, threshold exceeding p < 0.05; six sighted subjects.]
Fig. 32.3 Imaging with Sensory Substitution. Neural activation was shown in the left
occipitotemporal cortex in all sighted and blind expert users during sensory substitution shape
classification (Aa–Ab), whereas sighted users did not show visual activation in an auditory control
task (Ac). Averaged results show activation in several multimodal regions (Ad). During a sensory
substitution pattern recognition task, six sighted subjects showed a progressive increase in occipital
activation with training on an auditory sensory substitution device (B).
fMRI and PET studies have demonstrated that visual cortex activation correlates with sensory
substitution use, but cannot prove causality. Repetitive Transcranial Magnetic Stimulation (rTMS)
deactivates a region of cortex, allowing examination of the possible causal link between neural activation and
subject performance. Collignon and colleagues (2007) applied rTMS to the right dorsal extrastriate
occipital cortex of seven sighted and seven early blind subjects (both trained on the PSVA audi-
tory sensory substitution device) preceding sensory substitution pattern recognition (Collignon et
al. 2007). Early blind subjects had longer reaction times and lower accuracies with rTMS applied
as compared to a sham rTMS condition; sighted subjects had no performance change (Collignon
et al. 2007) (Figure 32.4B). Merabet et al. (2009) also deactivated occipital peristriate regions of a
late blind sensory substitution superuser, PF, and demonstrated a decrement in recognition accur-
acy relative to pre-rTMS and post-sham rTMS conditions (Figure 32.4A). In the tactile domain,
TMS applied to occipital cortex elicited somatotopic tactile sensations in blind but not blindfolded
sighted users of a tactile sensory substitution device (Kupers et al. 2006). Overall, rTMS studies
indicate that the blind users of sensory substitution devices functionally and causally recruit the
occipital cortex, potentially due to long-term cross-modal plasticity from visual deprivation.
Dynamic Causal Modeling (DCM) studies in the blind have constructed a cross-modal net-
work for auditory and somatosensory processing and the visual cortex (Fujii et al. 2009; Klinge
et al. 2010). It remains to be shown if these networks are used in blind subjects with sensory
[Figure 32.4 panels. A: rTMS on a late blind auditory sensory substitution expert (Merabet et al. 2009): percent correct at baseline vs. post-rTMS for occipital pole and vertex stimulation sites (NS: not significant; *: P < 0.05). B: rTMS on early blind auditory sensory substitution users (Collignon et al. 2007): PSVA form recognition, percent correct under sham vs. real rTMS for sighted and blind groups (*: P < 0.05; error bars indicate standard errors).]
Fig. 32.4 rTMS with Sensory Substitution. Repetitive Transcranial Magnetic Stimulation (rTMS)
decreases neural activation and influences behavior, thereby establishing a causal link between
behavioral outcomes and the activation of a neural region. rTMS of an occipital region significantly
reduced percent correct at object identification in an expert vOICe user, PF (A). PF's recognition
was not significantly impaired by rTMS of a vertex location. Seven early blind subjects were also
impaired at a sensory substitution pattern recognition task by rTMS to right dorsal extrastriate
occipital cortex (B). Seven sighted subjects' performance was not significantly affected by rTMS (B).
substitution, and if the cross-modal network in the sighted is similar to, or different from blind
subjects. Nevertheless, literature on functional connectivity of sensory substitution ‘stimuli’ and
dynamic causal modeling of the blind can be used to generate several neural network possibilities
(Figure 32.5A and 32.5B) with feedforward and feedback connections. The network likely includes
the primary sensory region of the transducing modality (somatosensation or audition), which
connects to a multimodal region that further connects to primary visual regions (V3, V2, or V1).
The filtering of stimuli as sensory substitution stimuli or natural stimuli could occur at the pri-
mary region of transducing modality (A1 or S1) or the multimodal region. More studies on the
specificity of the plasticity would be required to elucidate this. The role of prefrontal regions in
top-down cognitive processing of the cross-modal stimulus has yet to be shown. More critically,
it remains to be fully determined which specific regions in the network are causally linked to
performance and therefore the role each region plays in stimulus processing. Feedback between
visual regions and the multimodal regions may play a significant role in stimulus processing, yet
the degree of feedback in sensory substitution processing is unclear. Motor regions and other
primary sensory regions may also play an important role in plastic changes in the sensory substi-
tution neural network.
[Figure 32.5 diagram: left- (L) and right-hemisphere (R) networks linking primary sensory regions (S1 in panel A, A1 in panel B) to multimodal regions (PC, STS) and on to visual areas V3 and V1, with feedforward and feedback connections.]
Fig. 32.5 Network with Sensory Substitution. Visual, auditory, and tactile regions generate a neural
network in blind and sighted sensory substitution users that processes sensory information within a
feedforward and feedback hierarchy (A for tactile devices and B for auditory devices) (after Poirier
et al. 2007b). The sensory information is first filtered by primary sensory regions (A1 or S1 for
auditory and tactile devices, respectively). Sensory information is then communicated to multimodal
regions (such as STS or parietal cortex) and forwarded to primary visual regions (V3, V2 (not shown),
or V1). It is also likely that feedback and reiterative processing play a role in the perception of
sensory substitution stimuli.
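One way to make the hypothesized hierarchy concrete is to write it down as a small graph. The edge list below is an illustrative reading of the figure for an auditory device, not established anatomical connectivity:

```python
# Hypothesized feedforward edges for an auditory device, read off the
# figure (A1 -> multimodal regions -> extrastriate -> primary visual).
FEEDFORWARD = {
    "A1":  ["STS", "PC"],
    "STS": ["V3"],
    "PC":  ["V3"],
    "V3":  ["V1"],
}

def feedback_edges(ff):
    """Feedback connections are modeled as the reverse of each edge."""
    return {(dst, src) for src, dsts in ff.items() for dst in dsts}

def downstream(ff, start):
    """Regions reachable from `start` along feedforward edges only."""
    seen, stack = set(), [start]
    while stack:
        for nxt in ff.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen
```

On this reading, everything downstream of A1 is {STS, PC, V3, V1}, so a filtering stage at A1 or at the multimodal regions would gate all later 'visual' processing.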
is closely interlinked with perceptual organization principles in that modality (e.g., Palmer et al.,
this volume; van Tonder and Vishwanath, this volume). Since sensory substitution adds new critical
associative dimensions to our perceptual experiences, it attracts artists with the possibility of
significant changes in the overall structure of multisensory aesthetics. If some subjects (primarily
the late blind) perceive sensory substitution with a vision-like perception, do their preferences
for stimuli then follow the aesthetics of vision rather than those of the transducing modality, i.e.
audition or somatosensation?
One interesting, though anecdotal, case is Neil Harbisson, a congenitally achromatic artist, who
uses a sensory substitution device to perceive ‘color’ as sound (Harbisson 2012). He seems to still
‘hear’ the color rather than ‘see’ it, and as such his perception of beautiful color combinations
derives from the aesthetics of audition rather than those of vision. His 'color' perception may
qualify as 'a third kind of qualia', given that it mixes the information of vision (i.e., color) as the
decoded content with that of audition as the decoding medium. He also misinterprets natural
sounds as colors, thereby generating a new, artificial synaesthesia. He uses these misinterpretations
to generate visual artwork representing the colors he perceives when listening to natural sounds,
such as famous music or speeches. One remaining question in his case, however, is whether his
'color' experience is just a form of associative imagery or a real percept, as in true synaesthesia.
Aside from being an interesting case study, his experience opens the question as to whether typical
sensory substitution users have aesthetics that are more typical of audition or of vision (or else
newly-emerged cross-modal aesthetic organization), and if this depends on how they perceive
the stimulus. It might be true that aesthetics follows the mode of perception, such that late-blind
users, who are more likely to perceive sensory substitution as ‘vision’, will prefer different stimuli
to those of blindfolded sighted users, who are more likely to have an auditory experience with
sensory substitution.
Discussion
The practical objective of sensory substitution research is the rehabilitation potential for the
blind. Training methods and device encodings have yet to generate a high functionality outcome
with minimal training requirements. Several thrusts are attempting to ameliorate this problem,
including encodings that utilize spatial auditory processing, and optimizing training algorithms.
Improving training of existing devices such as the vOICe device may be possible by incorpo-
rating the findings from multimodal research. Well-known cross-modal correspondences or
intrinsic mappings of visual and auditory stimuli may enhance participant performance by using
pre-existing connections between auditory and visual stimuli to implicitly teach subjects how
to interpret sensory substitution stimuli. An alternative to improving training is to employ new
devices such as CASBLiP and MVSS that use 3-D sound to generate artificial sounds with a 3-D
spatial location, thereby indicating obstacles and overhangs to blind users, bypassing the 2-D
representation (‘image’). The idea behind them is unique and potentially innovative, because it
abandons the idea of vision as a 2-D (fronto-parallel) image whose parameters need to be
translated into auditory (or somatosensory) parameters. Instead, it relies on the very simple idea of
direct perception, which immediately guides action for navigation and obstacle avoidance. While
CASBLiP and MVSS have been developed, no extensive psychophysical evaluation of subject
capabilities has yet been published, leaving their impact on rehabilitation an open question.
Systematic evaluations of obstacle avoidance in cluttered environments and of object
identification will clarify the potential role of these new devices in improving blind users' quality of life.
With both approaches, sensory substitution may have significant possibilities in blind rehabilita-
tion, up to the degree to which the brain has vigorous cross-modal plasticity.
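Devices like CASBLiP and MVSS depend on placing a sound at a 3-D location. The two binaural cues such synthesis relies on, interaural time and level differences, can be sketched with a simplified spherical-head model; the constants and the sine-law pan are textbook approximations, not either device's actual algorithm:

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, air at room temperature
HEAD_RADIUS = 0.09       # m, a rough average head radius

def binaural_cues(azimuth_deg):
    """Return (itd, (gain_left, gain_right)) for a source at the given
    azimuth (0 = straight ahead, +90 = hard right). Woodworth-style ITD
    plus an equal-power sine-law pan; a sketch, not a full HRTF."""
    az = math.radians(azimuth_deg)
    itd = (HEAD_RADIUS / SPEED_OF_SOUND) * (az + math.sin(az))
    pan = (math.sin(az) + 1.0) / 2.0      # 0 = hard left, 1 = hard right
    return itd, (math.sqrt(1.0 - pan), math.sqrt(pan))
```

Delaying one channel by `itd` and scaling by the gains places a mono obstacle 'beep' at the desired azimuth; conveying elevation and distance would additionally require spectral filtering and level roll-off.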
Sensory substitution of vision may not only help rehabilitate the blind, but also provide a
powerful and unique method for studying cross-modal interactions and visual perception. While
sensory substitution is similar to visual perception and often retains visual illusions, properties, and
activation in visual cortex, most sighted subjects still experience it as auditory or somatosensory
perception. As reviewed above, a select few device users, often superusers and the late blind,
claim to have vision-like experiences with device use. The imaging, rTMS, and behavioral data
indicate that the visual or auditory/somatosensory dominance of sensory substitution depends
on the plasticity of an individual's multimodal neural network and previous visual experience. Key
questions remain about the structure of the multimodal network, and which unimodal or amodal
regions process the temporal and spatial aspects of sensory substitution stimuli. Unanswered
questions include the topographical mapping of sensory substitution stimuli onto visual cortex
via training, the decay rate of visual activation from sensory substitution after a period of disuse
(in the blind and sighted), the automaticity of sensory substitution processing (i.e., whether it is
possible to acquire effortless perception without massive top-down attention), and how temporal
coordination is accomplished across modalities.
Although the information provided to subjects by sensory substitution devices may be derived
from the same source as visual stimuli in the sighted, it is interpreted and processed in a unique way
by the central nervous system, generating a percept that is neither visual nor auditory but instead
is intrinsically cross-modal. Blind and sighted subjects interpret auditory cues differently, have dif-
ferent connectivity between visual and auditory/somatosensory cortices, and therefore likely use
different aspects of the information from sensory substitution to generate perception. Sensory
substitution is a new way of pairing sensory modalities, such that it may be understood as a new
sub-modality using the transduction of audition or somatosensation and processed by visual cortex.
How could such new sensory experiences be perceptually organized, be experienced, and guide
action? It will be a challenge to further quantify the unique aspects of this third type of qualia and
to understand the features, such as new illusions, that are wholly unique to this form of perception.
References
Amedi, A., Stern, W.M., Camprodon, J.A., et al. (2007). Shape conveyed by visual-to-auditory sensory
substitution activates the lateral occipital complex. Nature Neuroscience 10: 687–9.
Andersen, T.S., Tiippana, K., and Sams, M. (2004). Factors influencing audiovisual fission and fusion
illusions. Cognitive Brain Research 21: 301–8.
Araque, N.O., Dunai, L., Rossetti, F., et al. (2008). Sound map generation for a prototype blind mobility
system using multiple sensors. Service Robotics and Smart Homes: How a gracefully adaptive integration
of both environments can be envisaged? Bilbao, Spain.
Arno, P., De Volder, A.G., Vanlierde, A., et al. (2001). Occipital activation by pattern recognition in the
early blind using auditory substitution for vision. Neuroimage 13: 632–45.
Auvray, M., Hanneton, S., Lenay, C., and O’Regan, K. (2005). There is something out there: distal
attribution in sensory substitution, twenty years later. Journal of Integrative Neuroscience 4: 505–21.
Auvray, M., Hanneton, S., and O’Regan, J.K. (2007). Learning to perceive with a visuo-auditory
substitution system: localisation and object recognition with the vOICe. Perception 36: 416–30.
Bach-y-Rita, P., Collins, C.C., Saunders, F.A., White, B., and Scadden, L. (1969). Vision substitution by
tactile image projection. Nature 221: 963–4.
Bach-y-Rita, P., Kaczmarek, K.A., Tyler, M.E., and Garcia-Lara, J. (1998). Form perception with a 49-point
electrotactile stimulus array on the tongue: a technical note. Journal of Rehabilitation Research and Development 35: 427–30.
Bavelier, D. and Neville, H.J. (2002). Cross-modal plasticity: where and how? Nature Reviews Neuroscience
3: 443–52.
Bouvrie, J.V. and Sinha, P. (2007). Visual object concept discovery: observations in congenitally blind
children, and a computational approach. Neurocomputing 70: 2218–33.
Bregman, A.S. and Campbell, J. (1971). Primary auditory stream segregation and perception of order in
rapid sequences of tones. Journal of Experimental Psychology 89: 244–9.
Browne, R.F. (2003). Toward mobility aid for the blind. Image and Vision Computing New Zealand, pp.
275–9. Palmerston North, New Zealand.
Capelle, C., Trullemans, C., Arno, P., and Veraart, C. (2002). A real-time experimental prototype for
enhancement of vision rehabilitation using auditory substitution. IEEE Transactions on Biomedical
Engineering 45: 1279–93.
Chalmers, D.J. (1995). Facing up to the problem of consciousness. Journal of Consciousness Studies
2: 200–19.
Chebat, D.R., Schneider, F.C., Kupers, R., and Ptito, M. (2011). Navigation with a sensory substitution
device in congenitally blind individuals. Neuroreport 22: 342–7.
Chekhchoukh, A., Vuillerme, N., and Glade, N. (2011). Vision substitution and moving objects tracking
in 2 and 3 dimensions via vectorial electro-stimulation of the tongue. Actes de ASSISTH 2011, 2eme
Conference internationale sur l’Accessibilite et les Systemes de Suppleance aux personnes en situaTions de
Handicaps. Paris.
Clemons, J., Bao, S.Y., Savarese, S., Austin, T., and Sharma, V. (2012). MVSS: Michigan Visual Sonification
System. 2012 IEEE International Conference on Emerging Signal Processing Applications (ESPA),
pp. 143–6. Las Vegas.
Cohen, L.G., Celnik, P., Pascual-Leone, A., et al. (1997). Functional relevance of cross-modal plasticity in
blind humans. Nature 389: 180–2.
Collignon, O., Lassonde, M., Lepore, F., Bastien, D., and Veraart, C. (2007). Functional cerebral
reorganization for auditory spatial processing and auditory substitution of vision in early blind subjects.
Cerebral Cortex 17: 457–65.
Collignon, O., Voss, P., Lassonde, M., and Lepore, F. (2009). Cross-modal plasticity for the spatial
processing of sounds in visually deprived subjects. Experimental Brain Research 192: 343–58.
Dunai, L. (2010). Design, modeling and analysis of object localization through acoustical signals for
cognitive electronic travel aid for blind people. Universidad Politecnica De Valencia, School of Design
Engineering, PhD Thesis.
Ernst, M.O. and Banks, M.S. (2002). Humans integrate visual and haptic information in a statistically
optimal fashion. Nature 415: 429–33.
Fujii, T., Tanabe, H.C., Kochiyama, T., and Sadato, N. (2009). An investigation of cross-modal plasticity
of effective connectivity in the blind by dynamic causal modeling of functional MRI data. Neuroscience
Research 65: 175–86.
Harbisson, N. (2012). I listen to color. TEDGlobal, [Online] Jul 2012, Available at: http://www.ted.com/
talks/neil_harbisson_i_listen_to_color.html, accessed 26 Sept 2012.
Humayun, M.S., Weiland, J.D., Fujii, G.Y., et al. (2003). Visual perception in a blind subject with a chronic
microelectronic retinal prosthesis. Vision Research 43: 2573–81.
Klinge, C., Eippert, F., Roder, B., and Buchel, C. (2010). Corticocortical connections mediate
primary visual cortex responses to auditory stimulation in the blind. The Journal of Neuroscience
30: 12798–805.
Kujala, T., Huotilainen, M., Sinkkonen, J., et al. (1995). Visual cortex activation in blind humans during
sound discrimination. Neuroscience Letters 183: 143–6.
Kupers, R., Fumal, A., de Noordhout, A.M., Gjedde, A., Schoenen, J., and Ptito, M. (2006). Transcranial
Magnetic Stimulation of the visual cortex induces somatotopically organized qualia in blind subjects.
Proceedings of the National Academy of Sciences 103: 13256–60.
Kupers, R., Chebat, D.R., Madsen, K.H., Paulson, O.B., and Ptito, M. (2010). Neural correlates of
virtual route recognition in congenital blindness. Proceedings of the National Academy of Sciences
107: 12716–21.
Marr, D. (1982). Vision: A Computational Investigation into the Human Representation and Processing of
Visual Information. WH San Francisco: Freeman and Company.
Meijer, P.B.L. (1992). An experimental system for auditory image representations. IEEE Transactions on
Biomedical Engineering 39: 112–21.
Merabet, L., Rizzo, J., Amedi, A., Somers, D., and Pascual-Leone, A. (2005). What blindness can tell us
about seeing again: merging neuroplasticity and neuroprostheses. Nature Reviews Neuroscience 6: 71–7.
Merabet, L.B., Battelli, L., Obretenova, S., Maguire, S., Meijer, P., and Pascual-Leone, A. (2009).
Functional recruitment of visual cortex for sound encoded object identification in the blind.
Neuroreport 20: 132–8.
Mishra, J., Martinez, A., Sejnowski, T.J., and Hillyard, S.A. (2007). Early cross-modal interactions in
auditory and visual cortex underlie a sound-induced visual illusion. The Journal of Neuroscience
27: 4120–31.
Neri, P. and Levi, D.S. (2007). Temporal dynamics of figure-ground segregation in human vision. Journal of
Neurophysiology 97: 951–7.
670 Stiles and Shimojo
Neville, H.J. and Lawson, D. (1987). Attention to central and peripheral visual space in a movement
detection task: an event-related potential and behavioral study. II. Congenitally deaf adults. Brain
Research 405: 268–83.
Neville, H.J., Schimidt, A., and Kutas, M. (1983). Altered visual-evoked potentials in congenitally deaf
adults. Brain Research 266: 127–32.
Ortiz, T., Poch, J., Santos, J.M., et al. (2011). Recruitment of occipital cortex during sensory substitution
training linked to subjective experience of seeing in people with blindness. PLoS One 6: e23264.
Palmer, S.E. (1999). Vision Science: Photons to Phenomenology. Cambridge, MA: MIT Press.
Pascual-Leone, A. and Hamilton, R. (2001). The metamodal organization of the brain. In: Casanova, C.
and Ptito, M. (eds.). Vision: From Neurons to Cognition, pp. 427–45. Amsterdam: Elsevier Science.
Plaza, P., Cuevas, I., Collignon, O., Grandin, C., De Volver, A.G., and Renier, L. (2009). Percieving
schematic faces and man-made objects using a visual-to-auditory sensory substitution activates the
fusiform gyrus. 10th International Multisensory Research Forum. New York.
Poirier, C.C., Richard, M.A., Duy R.T., and Veraart C. (2006a). Assessment of sensory substitution
prosthesis potentialities in minimalist conditions of learning. Applied Cognitive Psychology 20: 447–60.
Poirier, C.C., De Volder, A.G., Tranduy, D., and Scheiber, C. (2006b). Neural changes in the ventral
and dorsal visual streams during pattern recognition learning. Neurobiology of Learning and Memory
85: 36–43.
Poirier, C., De Volder, A., Tranduy, D., and Scheiber, C. (2007a). Pattern recognition using a device
substituting audition for vision in blindfolded sighted subjects. Neuropsychologia 45: 1108–21.
Poirier, C., De Volder, A.G., and Scheiber, C. (2007b). What neuroimaging tells us about sensory
substitution. Neuroscience and Biobehavioral Reviews 31: 1064–70.
Pratt, C.C. (1930). The spatial character of high and low tones. Journal of Experimental Psychology 13: 278.
Proulx, M.J., Stoerig, P., Ludowig, E., and Knoll, I. (2008). Seeing ‘where’ through the ears: effects of
learning-by-doing and long-term sensory deprivation on localization based on image-to-sound
substitution. PLoS One 3: e1840.
Ptito, M., Moesgaard, S.M., Gjedde, A. and Kupers, R. (2005). Cross-modal plasticity revealed by
electrotactile stimulation of the tongue in the congenitally blind. Brain 128: 606–14.
Renier, L., Collignon, O., Poirier, C., et al. (2005a). Cross-modal activation of visual cortex during depth
perception using auditory substitution of vision. Neuroimage 26: 573–80.
Renier, L., Laloyaux, C., Collignon, O., et al. (2005b). The ponzo illusion with auditory substitution of
vision in sighted and early-blind subjects. Perception 34: 857–67.
Renier, L., Bruyer, R., and De Volder, A. (2006). Vertical-horizontal illusion present for sighted but not
early blind humans using auditory substitution of vision. Perception and Psychophysics 68: 535–42.
Renier, L. and De Volder, A. (2010). Vision substitution and depth perception: early blind subjects
experience visual perspective through their ears. Disability & Rehabilitation: Assistive Technology
5: 175–83.
Resnikoff, S., Pascolini, D., Etya’ale, D., et al. (2004). Global data on visual impairment in the year 2002.
Bulletin of the World Health Organization 82: 844–52.
Rosenthal, O., Shimojo, S., and Shams, L. (2009). Sound-induced flash illusion is resistant to feedback
training. Brain Topography 21: 185–92.
Sadato, N., Pascual-Leone, A., Grafman, J., et al. (1996). Activation of the primary visual cortex by braille
reading in blind subjects. Nature 380: 526–8.
Shams, L., Kamitani, Y., and Shimojo, S. (2000). What you see is what you hear. Nature, 40: 788.
Shams, L., Kamitani, Y., and Shimojo, S. (2002). Visual illusion induced by sound. Cognitive Brain Research
14: 147–52.
Shimojo, S. and Shams, L. (2001). Sensory modalities are not separate modalities: plasticity and
interactions. Current Opinion in Neurobiology 11: 505–9.
Sensory Substitution 671
Shimojo, S., Simion, C., Shimojo, E., and Scheier, C. (2003). Gaze bias both reflects and influences
preference. Nature Neuroscience 6: 1317–22.
Spence, C. (2011). Crossmodal correspondences: a tutorial review. Attention, Perception, and Psychophysics
73: 971–95.
Stevens, J.C. and Marks, L.E. (1965). Cross-modality matching of brightness and loudness. Proceedings of
the National Academy of Sciences of the United States of America 54: 407–11.
Stiles, N.R.B., McIntosh, B.P., Nasiatka, P.J., et al. (2010). An intraocular camera for retinal protheses:
restoring sight to the blind. In: A. Serpenguzel and A.W. Poon (eds.). Optical Processes in Microparticles
and Nanostructures: A Festschrift Dedicated to Richard Kounai Chang on His Retirement from Yale
University, pp. 385–430. Singapore: World Scientific.
Uhl F., Lindinger, G., Lang, W., and Deecke, L. (1991). On the functionality of visually deprived occipital
cortex in early blind persons. Neuroscience Letters 124: 256–9.
Vroomen, J. and De Gelder, B. (2000) Sound enhances visual perception: crossmodal effects of auditory
organization on vision. Journal of Experimental Psychology: Human Perception and Performance 26:
1583–90.
Ward, J. and Meijer, P. (2010). Visual experiences in the blind induced by an auditory sensory substitution
device. Consciousness and Cognition 19: 492–500.
Winter, J.O., Cogan, S.F., and Rizzo, J.F. (2007). Retinal prostheses: current challenges and future outlook.
Journal of Biomaterials Science, Polymer Edition 18: 1031–55.
World Health Organization. (2009). Visual impairment and blindness. [Online] June 2012, Available
at: http://www.who.int/mediacentre/factsheets/fs282/en/index.html, accessed 4 Oct 2012.
Zangenehpour, S. and Zatorre, R.J. (2010). Crossmodal recruitment of primary visual cortex following
brief exposure to bimodal audiovisual stimuli. Neuropsychologia 48: 591–600.
Chapter 33
Different Modes of Visual Organization for Perception and for Action
Melvyn A. Goodale and Tsvi Ganel
Introduction
We depend on vision, more than on any other sense, to perceive the world of objects and events
beyond our bodies. We also use vision to move around that world and to guide our goal-directed
actions. Over the last 25 years, it has become increasingly clear that the visual pathways in the
brain that mediate our perception of the world are quite distinct from those that mediate the
control of our actions. This distinction between ‘vision-for-perception’ and ‘vision-for-action’ has
emerged as one of the major organizing principles of the visual brain, particularly with respect to
the visual pathways in the cerebral cortex (Goodale and Milner, 1992; Milner and Goodale, 2006).
According to Goodale and Milner’s (1992) account, the ventral stream of visual processing,
which arises in early visual areas and projects to inferotemporal cortex, constructs the rich and
detailed representation of the world that serves as a perceptual foundation for cognitive opera-
tions, allowing us to recognize objects, events and scenes, attach meaning and significance to them,
and infer their causal relations. Such operations are essential for accumulating a knowledge-base
about the world. In contrast, the dorsal stream, which also arises in early visual areas, but projects
instead to the posterior parietal cortex, provides the necessary visual control of skilled actions,
such as manual prehension. Even though the two streams have different functions and operating
principles, in everyday life they have to work together. The perceptual networks of the ventral
stream interact with various high-level cognitive mechanisms, and enable an organism to select
a goal and an associated course of action, while the visuomotor networks in the dorsal stream
(and their associated cortical and subcortical pathways) are responsible for the programming and
on-line control of the particular movements the action entails. Of course, the dorsal and ventral
streams have other roles to play as well. For example, the dorsal stream, together with areas in
the ventral stream, plays a role in spatial navigation – and areas in the dorsal stream appear to
be involved in some aspects of working memory (Kravitz et al., 2011). This review, however, will
focus on the respective roles of the two streams in perception and action – and will concentrate
largely on the implications of the theory for the principles governing perceptual organization and
visuomotor control.
But why should there be two visual streams? Why could a single, general-purpose visual
system not handle both vision-for-perception and vision-for-action? The answer to this question lies
in the differences in the computational requirements of vision-for-perception on the one hand
and vision-for-action on the other. To be able to grasp an object successfully, for example, the
visuomotor system has to deal with the actual size of the object, and its orientation and posi-
tion with respect to the hand you intend to use to pick it up. These computations need to reflect
the real metrics of the world, or at the very least, make use of learned ‘look-up tables’ that link
neurons coding a particular set of sensory inputs with neurons that code the desired state of
the limb (Thaler and Goodale, 2010). The time at which these computations are performed is
equally critical. Observers and goal objects rarely stay in a static relationship with one another
and, as a consequence, the egocentric location of a target object can often change radically from
moment-to-moment. In other words, the required coordinates for action need to be computed at
the very moment the movements are performed.
In contrast to vision-for-action, vision-for-perception does not need to deal with the abso-
lute size of objects or their egocentric locations. In fact, very often such computations would be
counter-productive because our viewpoint with respect to objects does not remain constant –
even though our perceptual representations of those objects do show constancy. Indeed, one can
argue that it would be better to encode the size, orientation, and location of objects relative to
each other. Such a scene-based frame of reference permits a perceptual representation of objects
that transcends particular viewpoints, while preserving information about spatial relationships
(as well as relative size and orientation) as the observer moves around. The products of perception
also need to be available over a much longer time scale than the visual information used in the
control of action. By working with perceptual representations that are object- or scene-based, we
are able to maintain the constancies of size, shape, color, lightness, and relative location, over time
and across different viewing conditions.
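The computational contrast drawn here can be made concrete with a toy sketch (the object names and coordinates are invented for illustration): an egocentric code must be recomputed every time the observer moves, whereas scene-based relations between objects are viewpoint-invariant.

```python
# Toy illustration (not from the chapter) of the contrast between
# egocentric coding for action and scene-based coding for perception.

def egocentric(obj, observer):
    """Object position in observer-centred (egocentric) coordinates."""
    return (obj[0] - observer[0], obj[1] - observer[1])

def relative(obj_a, obj_b):
    """Position of obj_a relative to obj_b (scene-based coordinates)."""
    return (obj_a[0] - obj_b[0], obj_a[1] - obj_b[1])

cup, plate = (4, 2), (6, 2)  # hypothetical scene coordinates

# Vision-for-action: the egocentric location must be computed at the
# moment of movement, because it changes with every observer position.
print(egocentric(cup, (0, 0)))   # (4, 2)
print(egocentric(cup, (3, 1)))   # (1, 1)

# Vision-for-perception: the cup-plate relation is the same from any
# viewpoint, supporting constancy as the observer moves around.
print(relative(cup, plate))      # (-2, 0)
```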
The differences between the relative frames of reference required for vision-for-perception and
absolute frames of reference required for vision-for-action lead, in turn, to clear differences in the
way in which visual information about objects and their spatial relationships is organized and
represented. These differences can be most readily seen in the way in which the two visual systems
deal with visual illusions.
A representative example of these differences comes from studies that have compared the
effects of the Ebbinghaus illusion on action and perception. In this illusion, a circle surrounded by
an annulus of smaller circles appears to be larger than the same circle surrounded by an annulus
of larger circles (see Figure 33.1A). It is thought that the illusion arises because of an obligatory
comparison between the size of the central circle and the size of the surrounding circles, with
one circle looking relatively smaller than the other (Coren and Girgus, 1978). It is also possible
that the central circle within the annulus of smaller circles will be perceived as more distant (and
therefore larger) than the circle of equivalent retinal-image size within the array of larger circles.
In other words, the illusion may be simply a consequence of the perceptual system’s attempt to
make size-constancy judgments on the basis of an analysis of the entire visual array (Gregory,
1963). In addition, the distance between the surrounding circles and the central circle may also
play a role; if the surrounding circles are close to the central circle, then the central circle appears
larger, but if they are further away, the central circle appears smaller (Roberts et al., 2005). In many
[Figure 33.1: panels (a)–(d). Panel labels: (a) 'Perceptually different / Physically identical'; (b) 'Perceptually identical / Physically different'; (d) grip aperture (mm), 0 s–1.0 s, for the large and small disks.]
Fig. 33.1 The effect of a size-contrast illusion on perception and action. (a) The traditional
Ebbinghaus illusion in which the central circle in the annulus of larger circles is typically seen as
smaller than the central circle in the annulus of smaller circles, even though both central circles are
actually the same size. (b) The same display, except that the central circle in the annulus of larger
circles has been made slightly larger. As a consequence, the two central circles now appear to be the
same size. (c) A 3D version of the Ebbinghaus illusion. Participants are instructed to pick up one of
the two 3D disks placed either on the display shown in panel A or the display shown in panel B.
(d) Two trials with the display shown in panel B, in which the participant picked up the small disk on
one trial and the large disk on another. Even though the two central disks were perceived as being
the same size, the grip aperture in flight reflected the real not the apparent size of the disks.
Reprinted from Current Biology, 5(6), Salvatore Aglioti, Joseph F.X. DeSouza, and Melvyn A. Goodale, Size-
contrast illusions deceive the eye but not the hand, pp. 679–85, Copyright (1995), with permission from Elsevier.
Different Modes of Visual Organization for Perception and for Action 675
experiments, the size of the surrounding circles and the distance between them and the central
circle are confounded. But whatever the critical factors might be in any particular Ebbinghaus
display, it is clear that the apparent size of the central circle is influenced by the context in which it is
embedded. These contextual effects are remarkably resistant to cognitive information about the
real size of the circles. Thus, even when people are told that the two circles are identical in size
(and this fact is demonstrated to them), they continue to experience a robust illusion of size.
The first demonstration that grasping might be refractory to the Ebbinghaus illusion was car-
ried out by Aglioti et al. (1995). These investigators constructed a 3-D version of the Ebbinghaus
illusion, in which a poker-chip type disk was placed in the centre of a 2-D annulus made up of
either smaller or larger circles (Figure 33.1C). Two versions of the Ebbinghaus display were used.
In one case, the two central disks were physically identical in size, but one appeared to be larger
than the other (Figure 33.1A). In the second case, the size of one of the disks was adjusted so that
the two disks were now perceptually identical, but had different physical sizes (Figure 33.1B).
Despite the fact that the participants in this experiment experienced a powerful illusion of size,
their anticipatory grip aperture was unaffected by the illusion when they reached out to pick up
each of the central disks. In other words, even though their perceptual estimates of the size of
the target disk were affected by the presence of the surrounding annulus, maximum grip aper-
ture between the index finger and thumb of the grasping hand, which was reached about 70% of
the way through the movement, was scaled to the real not the apparent size of the central disk
(Figure 33.1D).
The findings of Aglioti et al. (1995) have been replicated in a number of other studies (for
a review, see Carey, 2001; Goodale, 2011). Nevertheless, other studies using the Ebbinghaus
illusion have failed to replicate these findings. Franz et al. (2000a,b, 2001), for example, used
a modified version of the illusion and found similar (and significant) illusory effects on both
vision-for-action and vision-for-perception, arguing that the two systems are not dissociable
from one another, at least in healthy participants. These authors argued that the difference
between their findings and those of Aglioti et al. resulted from different task demands. In
particular, in the Aglioti study (as well as in a number of other studies showing that visuo-
motor control is resistant to visual illusions), subjects were asked to attend to both central
disks in the illusory display in the perceptual task, but to grasp only one object at a time in
the action task. Franz and colleagues argued that this difference in attention in the perceptual
and action tasks could have accounted for the pattern of results in the Aglioti et al. study. In
the experiments by Franz and colleagues, participants were presented with only a single disk
surrounded by an annulus of either smaller or larger circles. Under these conditions, Franz
and colleagues found that both grip aperture and perceptual reports were affected by the
presence of the surrounding annulus. The force of this demonstration, however, was undercut by experiments by Haffenden and Goodale (1998), who asked participants either to
estimate the size of one of the central disks manually by opening their finger and thumb a
matching amount or to pick it up. Even though in both cases participants were arguably
directing their attention to only one of the disks, there was a clear difference in the effect of
the illusion: the manual estimates, but not the grasping movements were affected by the size
of the circles in the surrounding annulus.
Franz (2003) later argued that the slope of the function describing the relationship between manual estimates and the real size of the target object was far steeper than that of more 'conventional' psychophysical measures and that, when one adjusted for the difference in slope, both action and
perception were affected to the same degree by the Ebbinghaus and by other illusions. Although
this explanation, at least on the face of it, is a compelling one, it cannot explain why Aglioti et al.
(1995), and Haffenden and Goodale (1998) found that when the relative sizes of the two target
676 Goodale and Ganel
objects in the Ebbinghaus display were adjusted so that they appeared to be perceptually identical,
the grip aperture that participants used to pick up the two targets continued to reflect the physical
difference in their size. Nor can it explain the findings of a recent study by Stöttinger and col-
leagues (2012) who showed that even when slopes were adjusted, manual estimates of object size
were much more affected by the illusion (in this case, the Diagonal illusion), than were grasping
movements.
Recently, several studies have suggested that online visual feedback during grasping could be a
relevant factor accounting for some of the conflicting results in the domain of visual illusions and
grasping. For example, Bruno and Franz (2009) have performed a meta-analysis of studies that
looked at the effects of the Müller–Lyer illusion on perception and action, and concluded that the
dissociation between the effects of this illusion on grasping and perception is most pronounced
when online visual feedback is available. According to this account, feedback from the fingers
and the target object during the grasp can be effectively used by the visuomotor system to counteract
the effect of visual illusions on grip aperture. Further support for this proposal comes from stud-
ies that showed that visual illusions, such as the Ebbinghaus illusion, affect grasping trajectories
only during initial stages of the movement, but not in later stages, in which visual feedback can
be effectively used to allow the visuomotor system to compensate for the effects of the illusory context (Glover and Dixon, 2002). However, other studies that manipulated the availability of visual
feedback during the grasp failed to find any effect of such feedback on grasping performance in the
context of visual illusions (Ganel et al., 2008a; Westwood and Goodale, 2003).
The majority of studies that have claimed that action escapes the effects of pictorial illusions
have demonstrated this by finding a null effect of the illusory context on grasping movements. In
other words, they have found that perception (by definition) was affected by the illusion, but peak
grip aperture of the grasping movement was not. Null effects like this are never as compelling as
double dissociations between action and perception.
As it turns out, a more recent study has, in fact, demonstrated a double dissociation between
perception and action. Ganel and colleagues (2008a) used the well-known Ponzo illusion in
which the perceived size of an object is affected by its location within pictorial depth cues.
Objects located at the diverging end of the display appear to be smaller than those located at
the converging end. To dissociate the effects of real size from those of illusory size, Ganel and col-
leagues manipulated the real sizes of two objects that were embedded in a Ponzo display so that
the object that was perceived as larger was actually the smaller one of the pair (see Figure 33.2A).
When participants were asked to make a perceptual judgment of the size of the objects, their per-
ceptual estimates reflected the illusory Ponzo effect. In contrast, when they picked up the objects,
the aperture between the finger and thumb of their grasping hand was tuned to their actual size.
In short, the difference in their perceptual estimates of size for the two objects, which reflected the
apparent difference in the size, went in the opposite direction from the difference in their peak
grip aperture, which reflected the real difference in size (Figure 33.2B). This double dissociation
between the effects of apparent and real size differences on perception and action respectively
cannot be explained away by appealing to differences in attention or differences in slope (Franz et
al., 2001; Franz et al., 2000a,b; Franz, 2003).
In a series of experiments that used both the Ebbinghaus and the Ponzo illusions, Gonzalez
and her colleagues provided a deeper understanding of the conditions under which grasping can
escape the effects of visual illusions (Gonzalez et al., 2006). They argued that many of the earlier
studies showing that actions are sensitive to the effects of pictorial illusions required participants
to perform movements requiring different degrees of skill under different degrees of deliberate
control and with different degrees of practice. If one accepts the idea that high-level conscious
processing of visual information is mediated by the ventral stream (Milner and Goodale, 2006),
[Figure 33.2, panels (a) and (b): the Ponzo display and the corresponding perceptual and grasping measures; panel label 'Long object'.]
then it is perhaps not surprising that the less skilled, less practiced, and thus, more deliberate an
action, the greater the chances that the control of this action would be affected by ventral stream
perceptual mechanisms. Gonzalez et al. (2006) provided support for this conjecture by demon-
strating that awkward, unpracticed grasping movements, in contrast to familiar precision grips,
were sensitive to the Ponzo and Ebbinghaus illusions. In a follow-up experiment, they showed that
the effects of these illusions on initially awkward grasps diminished with practice (Gonzalez et al.,
2008). Interestingly, similar effects of practice were not obtained for right-handed subjects grasp-
ing with their left hand. Even more intriguing is the finding that grasping with the left hand, even
for many left-handed participants, was affected to a larger degree by pictorial illusions compared
with grasping with the right hand (Gonzalez et al., 2006). Gonzalez and colleagues have interpreted
these results as suggesting that the dorsal-stream mechanisms that mediate visuomotor control
may have evolved preferentially in the left hemisphere, which primarily controls right-handed
grasping. Additional support for this latter idea comes from work with patients with optic
ataxia from unilateral lesions of the dorsal stream (Perenin and Vighetto, 1988). Patients with
left-hemisphere lesions typically show what is often called a ‘hand effect’ – they exhibit a deficit
in their ability to visually direct reaching and grasping movements to targets situated in both the
contralesional and the ipsilesional visual field. In contrast, patients with right-hemisphere lesions
are impaired only when they reach out to grasp objects in the contralesional field.
Although the debate over whether or not action escapes the effects of perceptual illusions is far
from being resolved (for recent findings, see Foster et al., 2012; Heed et al., 2011; van der Kamp
et al., 2012), the focus on this issue has directed attention away from the more general question
of the nature of the computations underlying visuomotor control in more natural situations. One
example of an issue that has received only minimal attention from researchers is the role of information about object shape in visuomotor control (but see Cuijpers et al., 2004, 2006; Goodale
et al., 1994b; Lee et al., 2008) – and how that information might differ in its organization from
conventional perceptual accounts of shape processing.
Fig. 33.3 An example of a within-object illusion of shape. Although the two rectangles have an
equal width, the shorter rectangle is perceived as wider than the taller rectangle (see Ganel and
Goodale, 2003; Ben-Shalom and Ganel, 2012).
Whereas perception is affected by relative frames of reference, the visual control of action is more analytical and is therefore immune to the effects of both within-object and between-objects pictorial illusions.
Recent work also suggests that there are fundamental differences in scene segmentation for
perception and action planning. It is well established that our perceptual system parses complex scenes into discrete objects, but what is less widely appreciated is that such parsing is also required for planning
visually-guided movements, particularly when more than one potential target is present. In a
recent study, Milne et al. (2013) explored whether perception and motor planning use the same
or different parsing strategies, and whether perception is more sensitive to contextual effects than
is motor planning. To do this, they used the ‘connectedness illusion’, in which observers typically
report seeing fewer targets if pairs of targets are connected by short lines (Franconeri et al., 2009;
He et al., 2009; see Figure 33.4).
Milne et al. (2013) tested participants in a rapid reaching paradigm they had developed that
requires subjects to initiate speeded arm movements toward multiple potential targets before one
of the targets is cued for action (Chapman et al., 2010). In their earlier work, they had shown that
when there was an equal number of targets on each side of a display, participants aimed their initial trajectories toward a midpoint between the two target locations. Furthermore, when the distribution of targets on each side of a display was not equal (but each potential target had an equal
probability of becoming the goal target), initial trajectories were biased toward the side of the
display that contained a greater number of targets. They argued that this behavior maximizes the
chances of success on the task because movements are directed toward the most probable location
of the eventual goal, thereby minimizing the ‘cost’ of correcting the movement in-flight. Because
it provides a behavioral ‘read-out’ of rapid comparisons of target numerosity for motor planning,
the paradigm is an ideal way to measure object segmentation in action in the context of the con-
nectedness illusion. When participants were asked to make speeded reaches towards the targets
where sometimes the targets were connected by lines, their reaches were completely unaffected by
the presence of the connecting lines. Instead, their movement plans, as revealed by their move-
ment trajectories, were influenced only by the difference in the number of targets present on each
side of the display, irrespective of whether connecting lines were there or not. Not unexpectedly,
Fig. 33.4 There appear to be fewer circles on the right than on the left, even though in both cases
there are 22 individual circles. Connecting the circles with short lines creates the illusion of fewer
circles. Even so, when our brain plans actions to these targets it computes the actual number of
targets. In the task used by Milne et al. (2013) far fewer circles were used, but the effect was still
present in perceptual judgments but not in the biasing of rapid reaching movements. In the action
task, it was the actual not the apparent number of circles that affected performance.
Reproduced from Jennifer L. Milne, Craig S. Chapman, Jason P. Gallivan, Daniel K. Wood, Jody C. Culham, and
Melvyn A. Goodale, Psychological Science, 24(8), Connecting the Dots: Object Connectedness Deceives Perception
but Not Movement Planning, pp. 1456–1465, doi:10.1177/0956797612473485, Copyright © 2013 by SAGE
Publications. Reprinted by Permission of SAGE Publications.
however, when they were asked to report whether there were fewer targets present on one side
compared with the other, their reports were biased by the connecting lines between the targets.
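The cost-minimization logic invoked above, that aiming at the most probable goal location minimizes the expected cost of an in-flight correction, can be made concrete with a small numerical sketch. This is our own illustration, not the authors' analysis: the squared-distance correction cost and the target positions are assumed for the example. Under a squared cost, the optimal initial aim point is the probability-weighted mean of the potential target locations, which is biased toward the side containing more equiprobable targets.

```python
# Illustrative sketch (assumed cost function and target layout): the initial
# aim point that minimizes expected squared correction distance is the
# probability-weighted mean of the potential target locations.
import numpy as np

def expected_cost(aim, targets, probs):
    """Expected squared correction distance if the eventual goal is drawn
    from `targets` with probabilities `probs`."""
    return sum(p * (t - aim) ** 2 for t, p in zip(targets, probs))

# Three equiprobable targets on the left (-3, -2, -1) and one on the right (+2).
targets = np.array([-3.0, -2.0, -1.0, 2.0])
probs = np.full(4, 0.25)

# Analytically, the minimizer of a squared cost is the weighted mean ...
best_aim = np.dot(probs, targets)

# ... which a brute-force search over candidate aim points confirms.
grid = np.linspace(-4, 4, 8001)
numeric_best = grid[np.argmin([expected_cost(a, targets, probs) for a in grid])]

print(best_aim)      # -1.0: biased toward the side with more targets
print(numeric_best)  # ~ -1.0
```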
The work by Milne et al. (2013) suggests that scene segmentation for perception depends
on mechanisms that are distinct from those that allow humans to plan rapid and efficient
target-directed movements in situations where there are multiple potential targets. While the per-
ception of object numerosity can be dramatically influenced by manipulations of object grouping,
such as the connectedness illusion, the visuomotor system is able to ignore such manipulations,
and to parse individual objects and accurately plan, execute, and control rapid reaching move-
ments to multiple goals. These results are especially compelling considering that initial goal selec-
tion is undoubtedly based on a perceptual representation of the goal (for a discussion of this issue,
see Milner and Goodale, 2006). The planning of the final movement, however, is able to effectively
by-pass the contextual biases of perception, particularly in situations where rapid planning and
execution of the movement is paramount.
In short, the magnitude of the ‘just-noticeable difference’ (JND) increases with the magnitude
or intensity of the stimulus. The German physicist-turned-philosopher Gustav Fechner later for-
malized this basic psychophysical principle mathematically and called it Weber’s Law.
Weber’s law is one of the most fundamental features of human perception. It is not clear, how-
ever, if the visual control of action is subject to the same universal psychophysical function. To
investigate this possibility, Ganel and colleagues (Ganel et al., 2008b) carried out a series of psy-
chophysical and visuomotor experiments in which participants were asked either to grasp or to
make perceptual estimations of the length of rectangular objects. The JNDs were defined in this
study by using the standard deviation of the mean grip aperture and the standard deviation of the
mean perceptual judgment for a given stimulus. This is akin to the classical Method of Adjustment
in which the amount of variation in the responses for a given size of a stimulus reflects an ‘area of
uncertainty’ in which participants are not sensitive to fluctuations in size. Not surprisingly, Ganel
and colleagues found that the JNDs for the perceptual estimations of the object’s length showed
a linear increase with length, as Weber’s law would predict. The JNDs for grip aperture, however,
showed no such increase with object length and remained constant as the length of the object
increased (see Figure 33.5). In other words, the standard deviation for grip aperture remained
the same despite increases in the length of the object. Simply put, visually guided actions appear
to violate Weber's law, reflecting a fundamental difference in the way that object size is computed
for action and for perception (Ganel et al., 2008a,b). This fundamental difference in the psycho-
physics of perception and action has been found to emerge in children as young as 5 years of age
(Hadad et al., 2012, see Figure 33.6).
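The JND-as-variability logic described above can be sketched in a few lines. This is an illustrative simulation of our own, not the authors' analysis code; the Weber fraction for perceptual estimates and the constant motor noise for grip aperture are assumed values chosen to mimic the qualitative pattern reported.

```python
# Toy simulation (assumed noise parameters): JND defined as the standard
# deviation of repeated responses at each object size, as in the
# Method-of-Adjustment logic described in the text.
import numpy as np

rng = np.random.default_rng(0)
sizes = np.array([20, 30, 40, 50, 60, 70], dtype=float)  # object length, mm
n_trials = 2000
weber_k = 0.05   # assumed Weber fraction for perceptual estimates
motor_sd = 2.0   # assumed constant grip-aperture noise, mm

def jnd_per_size(noise_sd):
    """SD of simulated responses at each size = the JND estimate."""
    return np.array([rng.normal(s, sd, n_trials).std()
                     for s, sd in zip(sizes, noise_sd)])

jnd_perc = jnd_per_size(weber_k * sizes)                 # noise scales with size
jnd_grip = jnd_per_size(np.full_like(sizes, motor_sd))   # noise constant

# Slope of JND against size: clearly positive for the perceptual responses
# (Weber's law) and near zero for the grip responses.
slope_perc = np.polyfit(sizes, jnd_perc, 1)[0]
slope_grip = np.polyfit(sizes, jnd_grip, 1)[0]
print(round(slope_perc, 3), round(slope_grip, 3))
```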
[Figure 33.5 appears here: two panels plotting JND (mm) against object size (mm, 20–70).]
Fig. 33.5 Effects of object size on visual resolution (Just Noticeable Difference: JND). (Left panel) The
effect of object size on JNDs for Maximum Grip Apertures (MGAs) during grasping. (Right panel)
The effect of object size on JNDs during perceptual estimations. Note that JNDs for the perceptual
condition increased linearly with length, following Weber’s law, whereas the JNDs for grasping were
unaffected by size.
Adapted from Current Biology, 18(14), Tzvi Ganel, Eran Chajut, and Daniel Algom, Visual coding for action
violates fundamental psychophysical principles, pp. R599–R601, Copyright (2008), with permission from Elsevier.
[Figure 33.6 appears here: panels (a) and (b) plotting JND (mm) against disk size (mm, 20–50).]
Fig. 33.6 JNDs for perceptual estimations (a) and for grasping (b) in different age groups. In all age
groups, JNDs for the perceptual condition increased with object size, following Weber's law. Importantly,
however, the JNDs for grasping in all groups were unaffected by changes in the size of the target.
Reproduced from Functional dissociation between perception and action is evident early in life, Bat-Sheva
Hadad, Galia Avidan, and Tzvi Ganel, Developmental Science, 15(5), pp. 653–658, DOI: 10.1111/j.1467-
7687.2012.01165.x Copyright © 2012, Blackwell Publishing Ltd.
This difference in the psychophysics of perception and action can be observed in other contexts
as well. In a recent study (Ganel et al., 2012), for example, participants were asked to grasp or to
make perceptual comparisons between pairs of circular disks. Importantly, the actual difference in
size between the members of the pairs was set below the perceptual JND. Again, a dissociation was
observed between perceptual judgments of the size and the kinematic measures of the aperture of
the grasping hand. Regardless of whether or not participants were accurate in their judgments
of the difference in size between the two disks, the maximum opening between the thumb and
forefinger of their grasping hand in flight reflected the actual difference in size between the two
disks (see Figure 33.7). These findings provide additional evidence for the idea that the computa-
tions underlying the perception of objects are different from those underlying the visual control
of action. They also suggest that people can show differences in the tuning of grasping movements
directed to objects of different sizes even when they are not conscious of those differences in size.
The demonstrations showing that the visual control of grasping does not obey Weber’s law
resonates with Milner and Goodale’s (2006) proposal that there is a fundamental difference in the
frames of reference and metrics used by vision-for-perception and vision-for-action (Ganel et al.
2008b). These findings also converge with the results of imaging studies that suggest that the ventral
and the dorsal streams represent objects in different ways (James et al., 2002; Konen and Kastner,
2008; Lehky and Sereno, 2007). Yet, the interpretation of these results has not gone unchallenged
(Heath et al., 2011, 2012; Holmes et al., 2011; Smeets and Brenner, 2008). For example, in a series
of papers, Heath and his colleagues (Heath et al., 2011, 2012; Holmes et al., 2011) have exam-
ined the effects of Weber’s law on grip aperture throughout the entire movement trajectory and
found an apparent adherence to Weber’s law early, but not later in the trajectory of the movement.
A recent paper by Foster and Franz (2013), however, has suggested that these effects are con-
founded by movement velocity. In particular, due to task demands that require subjects to hold
their finger and thumb together prior to each grasp, subjects tend to open their fingers faster for
larger compared with smaller objects, a feature that characterizes only early stages of the grasping
Different Modes of Visual Organization for Perception and for Action 683
[Figure 33.7 appears here: panel (a) shows the set-up; panel (b) plots mean MGA (mm) for correct and incorrect trials against the smaller and larger disk.]
Fig. 33.7 Grasping objects that are perceptually indistinguishable. (a) The set-up with examples of
the stimuli that were used. Participants were asked on each trial to report which object of the two
was the larger and then to grasp the object in each pair that was in the centre of the table (task
order was counterbalanced between subjects). (b) MGAs for correct and for incorrect perceptual
size classifications. MGAs reflected the real size differences between the two objects even in trials in
which subjects erroneously judged the larger object in the pair as the smaller one.
Reproduced from Tzvi Ganel, Erez Freud, Eran Chajut, and Daniel Algom, Accurate Visuomotor Control below
the Perceptual Threshold of Size Discrimination, PLoS One, 7(4), e36253, Figures 1 and 2, DOI: 10.1371/journal.
pone.0036253 Copyright © 2012, The Authors. This work is licensed under a Creative Commons Attribution 3.0
License.
trajectory. Therefore, the increased grip variability for larger compared with smaller objects dur-
ing the early portion of the trajectories could be attributed to velocity differences in the opening
of the fingers rather than to the effects of Weber’s law.
In their commentary on Ganel et al.’s (2008b) paper, Smeets and Brenner (2008) argue that
the results can be more efficiently accommodated by a ‘double-pointing’ account of grasping.
According to this model, the movements of each finger of a grasping hand are controlled indepen-
dently, each digit being simultaneously directed to a different location on the goal object (Smeets
and Brenner, 1999, 2001). Thus, when people reach out to pick up an object with a precision grip,
for example, the index finger is directed to one side of the object and the thumb to the other. No
computation of object size is required, only the computation of two separate locations on the
object, one for the finger and the other for the thumb. The apparent scaling of the grip to object size
is nothing more than a by-product of the fact that the index finger and thumb are moving towards
their respective end points. Smeets and Brenner go on to argue that because size is not computed
for grasping, and only location matters, Weber’s law would not apply. In other words, because
location, unlike size, is a discrete, rather than a continuous dimension, Weber’s law is irrelevant
for grasping. Smeets and Brenner’s account also comfortably explains why grasping escapes the
effects of pictorial illusions, such as the Ebbinghaus and Ponzo illusions. In fact, more generally,
their double-pointing or position-based account of grasping would appear to offer a more parsi-
monious account of a broad range of apparent dissociations between vision-for-perception and
vision-for-action than appealing to a two-visual-systems model.
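The distinctive prediction of the double-pointing account, namely constant grip-aperture variability across object sizes without any size computation, can be illustrated with a toy simulation. This is our own sketch, not Smeets and Brenner's model code, and the positional noise value is an assumption.

```python
# Toy sketch (assumed noise value): under a double-pointing model, each digit
# is aimed independently at its own landing position, so aperture variability
# does not depend on object size.
import numpy as np

rng = np.random.default_rng(1)
sizes = np.array([20.0, 40.0, 60.0])  # object widths, mm
n = 5000
pos_sd = 1.5  # assumed constant positional noise per digit, mm

aperture_sd = []
for s in sizes:
    thumb = rng.normal(0.0, pos_sd, n)   # thumb directed at one edge
    finger = rng.normal(s, pos_sd, n)    # finger directed at the other edge
    aperture_sd.append((finger - thumb).std())

# Independent digit control makes aperture SD roughly sqrt(2)*pos_sd
# (about 2.1 mm here) at every size: no Weber-like scaling, with no size
# computation anywhere in the model.
print([round(x, 2) for x in aperture_sd])
```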
Although Smeets and Brenner’s (1999, 2001) interpretation is appealing, there are several lines
of evidence showing that the fingers' trajectories during grasping are tuned to object size, rather than
location. For example, van de Kamp and Zaal (2007) have shown that when one side of a target
object, but not the other, is suddenly pushed in or out (with a hidden compressed-air device)
as people are reaching out to grasp it, the trajectories of both digits are adjusted in flight. In
other words, the trajectories of both the finger and the thumb change to reflect the change in
size of the target object. Smeets and Brenner’s model would not predict this. According to their
double-pointing hypothesis, only the digit going to the perturbed side of the goal object should
change course. The fact that the trajectories of both digits show an adjustment is entirely consist-
ent with the idea that the visuomotor system is computing the size of the target object. In other
words, as the object changes size, so does the grip.
Another line of evidence that goes against Smeets and Brenner’s double-pointing hypoth-
esis comes from the neuropsychological literature. Damage to the ventral stream in the human
occipitotemporal cortex can result in visual form agnosia, a deficit in visual object recognition.
The best-documented example of such a case is patient DF, who has bilateral lesions to the lateral
occipital area rendering her unable to recognize or discriminate between even simple geometric
shapes such as a rectangle and a square. Despite her profound deficit in form perception, she is
able to scale her grasp to the dimensions of the very objects she cannot describe or recognize,
presumably using visuomotor mechanisms in her dorsal stream. As is often the case for neurological
patients, DF is able to (partially) compensate for her deficits by relying on non-natural
strategies based on her residual intact abilities. Schenk and Milner (2006), for example, found
that, under certain circumstances, DF could use her intact visuomotor skills to compensate for
her marked impairment in shape recognition. When DF was asked to make simple shape clas-
sifications (rectangle/square classifications), her performance was at chance. Yet, her shape clas-
sifications markedly improved when performed concurrently with grasping movements toward
the target objects she was being asked to discriminate. Interestingly, this improvement appeared
not to depend on afferent feedback from the grasping fingers, because it was found even
when DF was still planning her actions, just before the fingers actually started to move. Schenk
and Milner therefore concluded that information about an object’s dimensions is available at
some level via visuomotor activity in DF’s intact dorsal stream and this, in turn, improves her
shape-discrimination performance. For this to happen, the dorsal-stream mechanisms would
have to be computing the relevant dimension of the object to be grasped and not simply the
locations on that object to which the finger and thumb are being directed (for similar evidence
in healthy individuals, see Linnell et al., 2005). Again, these findings are clearly not in line with
Smeets and Brenner’s double-pointing hypothesis and suggest that the dorsal stream uses infor-
mation about object size (more particularly, the relevant dimension of the target object) when
engaged in visuomotor control. Parenthetically, it is interesting to note that the results of one of
the experiments in the Schenk and Milner study also provide indirect evidence that grip aper-
ture is not affected by the irrelevant dimension of the object to be grasped (Ganel and Goodale,
2003). When DF was asked to grasp objects across a dimension that was not informative of shape
(i.e., grasp across rectangles of constant width that varied in length), no grasping-induced per-
ceptual improvements in distinguishing between the different rectangles were found. This find-
ing not only shows that shape per se was not being used in the earlier tasks where she did show
some enhancement in her ability to discriminate between objects of different widths, but it also
provides additional evidence for the idea that visuomotor control is carried out in an analytical
manner (e.g. concentrating entirely on object width) without being influenced by differences in
the configural aspects of the objects.
As mentioned at the beginning of the chapter, Milner and Goodale (2006) have argued
that visuomotor mechanisms in the dorsal stream tend to operate in real time. If the target
object is no longer visible when the imperative to begin the movement is given, then any
object-directed action would have to be based on a memory of the target object, a memory
that is necessarily dependent on earlier processing by perceptual mechanisms in the ventral
stream. Thus, DF is unable to scale her grasp for objects that she saw only seconds earlier,
presumably because of the damage to her ventral stream (Goodale et al., 1994a). Similarly,
when neurologically intact participants are asked to base their grasping on memory repre-
sentations of the target object, rather than on direct vision, the kinematics of their grasping
movements are affected by Weber’s law and by pictorial illusions (Ganel et al. 2008b; for
review, see Goodale, 2011). Again, without significant modification, Smeets and Brenner’s
double-pointing model does not provide a parsimonious account for why memory-based
action control should be affected by size, whereas real-time actions should not. However,
as we have already seen, according to the two-visual systems account, when vision is not
allowed and memory-based actions are performed, such actions have to rely on earlier per-
ceptual processing of the visual scene, processing that in principle is subject to Weber’s law
and pictorial illusions of size.
Conclusions
The visual control of skilled actions, unlike visual perception, operates in real time and reflects
the metrics of the real world. This means that many actions, such as reaching and grasping, are
immune to the effects of a range of pictorial illusions, which by definition affect perceptual judg-
ments. Only when the actions are deliberate and cognitively ‘supervised’ or are initiated after the
target is no longer in view do the effects of illusions emerge. All of this suggests that our perceptual
representations of objects are organized in a fundamentally different way from the visual informa-
tion underlying the control of skilled actions directed at those objects. As we have seen, the visual
perception of objects and their relations tends to be holistic and contextual with relative poor
real-world metrics, whereas the visual control of skilled actions is more analytical, circumscribed,
and metrically accurate. Of course, in everyday life, vision-for-perception and vision-for-action
work together in the production of purposive behavior – vision-for-perception, together with
other cognitive systems, selects the goal object from the visual array, while vision-for-action work-
ing with associated motor networks, carries out the required computations for the goal-directed
action. In a very real sense, then, the strengths and weaknesses of these two kinds of vision com-
plement each other in the production of adaptive behavior.
References
Aglioti, S., DeSouza, J. F., and Goodale, M. A. (1995). Size-contrast illusions deceive the eye but not the
hand. Curr Biol 5(6): 679–685.
Behrmann, M., Richler, J., and Avidan, G. (2013). Holistic face perception. In Oxford Handbook of
Perceptual Organization, edited by J. Wagemans. Oxford: Oxford University Press.
Ben-Shalom, A., and Ganel, T. (2012). Object representations in visual memory: evidence from visual
illusions. J Vision 12(7): 1–11.
Bruno, N., and Franz, V. H. (2009). When is grasping affected by the Müller-Lyer illusion? A quantitative
review. Neuropsychologia 47(6): 1421–1433.
Carey, D. P. (2001). Do action systems resist visual illusions? Trends Cogn Sci 5(3): 109–113.
Chapman, C. S., Gallivan, J. P., Wood, D. K., Milne, J. L., Culham, J. C., and Goodale, M. A. (2010).
Reaching for the unknown: multiple target encoding and real-time decision making in a rapid reach
task. Cognition 116: 168–176.
Coren, S., and Girgus, J. S. (1978). Seeing is Deceiving: the Psychology of Visual Illusions. Hillsdale,
NJ: Lawrence Erlbaum Associates.
Cuijpers, R. H., Brenner, E., and Smeets, J. B. J. (2006). Grasping reveals visual misjudgements of shape.
Exp Brain Res 175(1): 32–44.
Cuijpers, R. H., Smeets, J. B. J., and Brenner, E. (2004). On the relation between object shape and grasping
kinematics. J Neurophysiol 91(6): 2598–2606.
Culham, J. C., and Valyear, K. F. (2006). Human parietal cortex in action. Curr Opin Neurobiol 16(2): 205–212.
Duncan, J. (1984). Selective attention and the organization of visual information. J Exp Psychol Gen
113(4): 501–517.
Foster, R. M., and Franz, V. H. (2013). Inferences about time course of Weber’s Law violate statistical
principles. Vision Res 78: 56–60.
Foster, R. M., Kleinholdermann, U., Leifheit, S., and Franz, V. H. (2012). Does bimanual grasping of
the Müller-Lyer illusion provide evidence for a functional segregation of dorsal and ventral streams?
Neuropsychologia 50(14): 3392–3402.
Franconeri, S. L., Bemis, D. K., and Alvarez, G. A. (2009). Number estimation relies on a set of segmented
objects. Cognition 113: 1–13.
Franz, V. H. (2003). Manual size estimation: a neuropsychological measure of perception? Exp Brain Res
151(4): 471–477.
Franz, V. H., Fahle, M., Bülthoff, H. H., and Gegenfurtner, K. R. (2001). Effects of visual illusions on
grasping. J Exp Psychol Hum Percept Perform 27(5): 1124–1144.
Franz, V. H., and Gegenfurtner, K. R. (2008). Grasping visual illusions: consistent data and no dissociation.
Cogn Neuropsychol 25(7–8): 920–950.
Franz, V. H., Gegenfurtner, K. R., Bülthoff, H. H., and Fahle, M. (2000a). Grasping visual illusions: no
evidence for a dissociation between perception and action. Psychol Sci 11(1): 20–25.
Franz, V. H., Gegenfurtner, K. R., Bülthoff, H. H., and Fahle, M. (2000b). Grasping visual illusions: no
evidence for a dissociation between perception and action. Psychol Sci 11(1), 20–25.
Ganel, T., Chajut, E., and Algom, D. (2008b). Visual coding for action violates fundamental psychophysical
principles. Curr Biol 18(14): R599–601.
Ganel, T., Freud, E., Chajut, E., and Algom, D. (2012). Accurate visuomotor control below the perceptual
threshold of size discrimination. PloS One 7(4): e36253.
Ganel, T., and Goodale, M. A. (2003). Visual control of action but not perception requires analytical
processing of object shape. Nature 426(6967): 664–667.
Ganel, T., Tanzer, M., and Goodale, M. A. (2008a). A double dissociation between action and
perception in the context of visual illusions: opposite effects of real and illusory size. Psychol Sci
19(3): 221–225.
Glover, S., and Dixon, P. (2002). Dynamic effects of the Ebbinghaus illusion in grasping: support for a
planning/control model of action. Percept Psychophys 64(2): 266–278.
Gonzalez, C. L. R, Ganel, T., Whitwell, R. L., Morrissey, B., and Goodale, M. A. (2008). Practice makes
perfect, but only with the right hand: sensitivity to perceptual illusions with awkward grasps decreases
with practice in the right but not the left hand. Neuropsychologia 46(2): 624–631.
Gonzalez, C. L. R, Ganel, T., and Goodale, M. A. (2006). Hemispheric specialization for the visual control
of action is independent of handedness. J Neurophysiol 95(6): 3496–3501.
Goodale, M. A. (2011). Transforming vision into action. Vision Res 51(13): 1567–1587.
Goodale, M. A, Jakobson, L. S., and Keillor, J. M. (1994a). Differences in the visual control of pantomimed
and natural grasping movements. Neuropsychologia 32(10): 1159–1178.
Goodale, M. A, Meenan, J. P., Bülthoff, H. H., Nicolle, D. A., Murphy, K. J., and Racicot, C. I. (1994b).
Separate neural pathways for the visual analysis of object shape in perception and prehension. Curr Biol
4(7): 604–610.
Goodale, M. A, and Milner, A. D. (1992). Separate visual pathways for perception and action. Trends
Neurosci 15(1): 20–25.
Goodale, M. A., and Milner, A. D. (2005). Sight Unseen: An Exploration of Conscious and Unconscious
Vision. New York: Oxford University Press.
Gregory, R. L. (1963). Distortion of visual space as inappropriate constancy scaling. Nature 199: 678–680.
Hadad, B-S., Avidan, G., and Ganel, T. (2012). Functional dissociation between perception and action is
evident early in life. Develop Sci 15(5): 653–658.
Haffenden, A. M., and Goodale, M. A. (1998). The effect of pictorial illusion on prehension and perception.
J Cogn Neurosci 10(1): 122–136.
He, L., Zhang, J., Zhou, T., and Chen, L. (2009). Connectedness affects dot numerosity
judgment: Implications for configural processing. Psychonom Bull Rev 16: 509–517.
Heath, M., Holmes, S. A., Mulla, A., and Binsted, G. (2012). Grasping time does not influence the early
adherence of aperture shaping to Weber’s law. Frontiers Hum Neurosci 6: 332.
Heath, M., Mulla, A., Holmes, S. A., and Smuskowitz, L. R. (2011). The visual coding of grip aperture
shows an early but not late adherence to Weber’s law. Neurosci Lett 490(3): 200–204.
Heed, T., Gründler, M., Rinkleib, J., Rudzik, F. H., Collins, T., Cooke, E., and O’Regan, J. K. (2011). Visual
information and rubber hand embodiment differentially affect reach-to-grasp actions. Acta Psychol
138(1): 263–271.
Holmes, S. A., Mulla, A., Binsted, G., and Heath, M. (2011). Visually and memory-guided grasping: aperture
shaping exhibits a time-dependent scaling to Weber’s law. Vision Res 51(17): 1941–1948.
James, T. W., Humphrey, G. K., Gati, J. S., Menon, R. S., and Goodale, M. A. (2002). Differential effects of
viewpoint on object-driven activation in dorsal and ventral streams. Neuron 35(4): 793–801.
Janczyk, M., and Kunde, W. (2012). Visual processing for action resists similarity of relevant and irrelevant
object features. Psychonom Bull Rev 19(3): 412–417.
Koffka, K. (1935). Principles of Gestalt Psychology. New York: Harcourt, Brace.
Konen, C. S., and Kastner, S. (2008). Two hierarchically organized neural systems for object information in
human visual cortex. Nature Neurosci 11(2): 224–231.
Kravitz, D. J., Saleem, K., Baker, C. I., and Mishkin, M. (2011). A new neural framework for visuospatial
processing. Nature Rev Neurosci 12(4): 217–230.
Kunde, W., Landgraf, F., Paelecke, M., and Kiesel, A. (2007). Dorsal and ventral processing under
dual-task conditions. Psychol Sci 18(2): 100–104.
Lee, Y-L., Crabtree, C. E., Norman, J. F., and Bingham, G. P. (2008). Poor shape perception is the reason
reaches-to-grasp are visually guided online. Percept Psychophys 70(6): 1032–1046.
Lehky, S. R., and Sereno, A. B. (2007). Comparison of shape encoding in primate dorsal and ventral visual
pathways. J Neurophysiol 97(1): 307–319.
Linnell, K. J., Humphreys, G. W., McIntyre, D. B., Laitinen, S., and Wing, A. M. (2005). Action modulates
object-based selection. Vision Res 45(17): 2268–2286.
Milne, J. L., Chapman, C. S., Gallivan, J. P., Wood, D. K., Culham, J. C., and Goodale, M. A. (2013).
Connecting the dots: object connectedness deceives perception but not movement planning. Psychol Sci
24(8): 1456–1465.
Milner, A. D., and Goodale, M. A. (2006). The Visual Brain in Action, 2nd edn. New York: Oxford
University Press.
O’Craven, K. M., Downing, P. E., and Kanwisher, N. (1999). fMRI evidence for objects as the units of
attentional selection. Nature 401(6753): 584–587.
Perenin, M. T., and Vighetto, A. (1988). Optic ataxia: a specific disruption in visuomotor mechanisms.
I. Different aspects of the deficit in reaching for objects. Brain: J Neurol 111(3): 643–674.
Pomerantz, J. R., and Cragin, A. I. (2014). Emergent features and feature combination. In Oxford
Handbook of Perceptual Organization, edited by J. Wagemans. Oxford, U.K: Oxford University Press.
Roberts, B., Harris, M. G., and Yates, T. A. (2005). The roles of inducer size and distance in the Ebbinghaus
illusion (Titchener circles). Perception 34(7): 847–856.
Schenk, T., and Milner, A. D. (2006). Concurrent visuomotor behaviour improves form discrimination in a
patient with visual form agnosia. Eur J Neurosci 24(5): 1495–1503.
Schum, N., Franz, V. H., Jovanovic, B., and Schwarzer, G. (2012). Object processing in visual perception
and action in children and adults. J Exp Child Psychol 112(2): 161–177.
Smeets, J. B., and Brenner, E. (1999). A new view on grasping. Motor Control 3(3): 237–271.
Smeets, J. B., and Brenner, E. (2001). Independent movements of the digits in grasping. Exp Brain Res
139(1): 92–100.
Smeets, J. B., and Brenner, E. (2008). Grasping Weber’s law. Curr Biol 18(23): R1090–1091.
Stöttinger, E., Pfusterschmied, J., Wagner, H., Danckert, J., Anderson, B., and Perner, J. (2012). Getting
a grip on illusions: replicating Stöttinger et al. [Exp Brain Res (2010) 202: 79–88] results with 3-D
objects. Exp Brain Res 216(1): 155–157.
Thaler, L., and Goodale, M. A. (2010). Beyond distance and direction: the brain represents target
locations non-metrically. J Vision 10(3): 3.1–27.
Van de Kamp, C., and Zaal, F. T. (2007). Prehension is really reaching and grasping. Exp Brain Res
182(1): 27–34.
Van der Kamp, J., De Wit, M. M., and Masters, R. S. W. (2012). Left, right, left, right, eyes to the front!
Müller-Lyer bias in grasping is not a function of hand used, hand preferred or visual hemifield, but
foveation does matter. Exp Brain Res 218(1): 91–98.
Westwood, D. A., and Goodale, M. A. (2003). Perceptual illusion and the real-time control of action.
Spatial Vision 16(3–4): 243–254.
Section 8
Development of perceptual
organization in infancy
Paul C. Quinn and Ramesh S. Bhatt
Introduction
Even simple visual displays can have multiple interpretations. Consider the stimulus depicted
in Figure 34.1A. Why is it that most adults report perceiving an overlapping hexagon and cross,
despite the fact that other interpretations, such as those in Figure 34.1B–D, are equally physically
possible? As put by Metzger (1936/2006, p. 43, italics from original text), the ‘stimulus distribution
in the eye is always infinitely ambiguous’. One could argue that the favoured interpretation receives
support from language and instruction, given that during development we come to learn that the
labels ‘hexagon’ and ‘cross’ refer to those particular constellations of contours. However, the rapid
emergence of visual cognition (with many grouping phenomena evident in the initial months of
life), combined with the difficulty of the problem, suggests that the development of perceptual
organization results from the imposition of strong constraints (Quinn et al. 2008a). This chapter
will take up the task of identifying those constraints and explicating their developmental deter-
minants. In particular, we will examine how the constraints are a mix of the inherent operational
characteristics of the visual system and the learning engendered by a structured environment
(Bhatt and Quinn 2011). First, however, we consider some theoretical accounts of the ontogeny
of perceptual organization.
Fig. 34.1 (a) Configuration of contours perceived as a hexagon and cross, even though one could
just as readily perceive (b), (c), and (d).
Reproduced from Metzger, Wolfgang, translated by Lothar Spillmann, Laws of Seeing, figure 27, © 2006
Massachusetts Institute of Technology, by permission of The MIT Press.
point for infants is an unorganized ‘mosaic of sensory impressions’ (Zuckerman and Rock 1957,
p. 278), and experience with different shapes and forms must somehow induce the transforma-
tion of sensory data into bounded regions. Such transformation is presumably mediated through
memory but, according to Zuckerman and Rock, if that memory consists of amorphous sensa-
tions rather than cohesive shapes then it is unclear how it could lead to subsequent organized
percepts. Instead, it is simpler to assume that innate organizing processes account for the initial
structuring of visual displays into coherent patterns. As summarized by Zuckerman and Rock
(1957, p. 291), ‘the organization of the visual field into shaped areas is not an outcome of learn-
ing—past experience cannot carve visual form out of initially formless perception’.
Learning accounts
Two other views of the development of perceptual organization have proposed mechanisms that
allow one to more readily envision how organization could emerge, even if it is not the initial start-
ing point. For Hebb (1949), perception of a whole object is a learned process that is founded in per-
ception of the individual features of the object and the integration of those perceptions as achieved
through eye movements. As described by Hebb (1949, p. 83), ‘If line and angle are the bricks from
which form perceptions are built, the primitive unity of the figure might be regarded as the mortar,
and eye movement as the hand of the builder’. For Hebb, the emergence of perceptual organiza-
tion would take considerable developmental time because of dependence on improvements in eye
movements that yield more holistic perceptions as visual scanning becomes more systematic.
Another account of the emergence of perceptual organization relies neither on inherent con-
straints nor on perceptual learning that occurs from the development of visual scanning, but
rather on the learning of probabilistic image statistics derived from regularities in the environ-
ment (Brunswik and Kamiya 1953; Elder and Goldberg 2002; Elder, this volume). Consider the
organizing principle of proximity, which specifies that close elements will be grouped together. In
the Brunswik and Kamiya view, proximity may actually be learned because image elements that
correspond to the same object are likely to be closer to each other than elements that correspond
to different objects. Likewise, in the case of lightness similarity, discontinuities in luminance cues
are correlated with boundaries where one object ends and another begins. The discovery of such
Development of Perceptual Organization in Infancy 693
correlations by infants can presumably be used as a basis for integrating sequences of elements
that project from common structures in a visual scene.
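The statistical claim behind this learning account — that image elements belonging to the same object tend to lie closer together than elements belonging to different objects — can be illustrated with a toy simulation. This sketch is purely illustrative (synthetic scenes and function names of our own devising, not data or code from Brunswik and Kamiya or Elder and Goldberg): it generates scenes of clustered elements and compares within-object with between-object pair distances.

```python
import random

random.seed(0)

# Toy scene generator: each "object" is a cluster of image elements.
# Elements of the same object scatter tightly around the object's centre;
# different objects sit at independently placed centres in the unit square.
def make_scene(n_objects=5, elements_per_object=4, spread=0.05):
    scene = []
    for obj_id in range(n_objects):
        cx, cy = random.random(), random.random()
        for _ in range(elements_per_object):
            x = cx + random.uniform(-spread, spread)
            y = cy + random.uniform(-spread, spread)
            scene.append((obj_id, x, y))
    return scene

def mean_pair_distance(scene, same_object):
    """Mean Euclidean distance over element pairs, restricted either to
    pairs from the same object or to pairs from different objects."""
    dists = []
    for i, (id1, x1, y1) in enumerate(scene):
        for id2, x2, y2 in scene[i + 1:]:
            if (id1 == id2) == same_object:
                dists.append(((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5)
    return sum(dists) / len(dists)

scenes = [make_scene() for _ in range(100)]
within = sum(mean_pair_distance(s, True) for s in scenes) / len(scenes)
between = sum(mean_pair_distance(s, False) for s in scenes) / len(scenes)
print(within < between)  # True: same-object pairs are reliably closer
```

A learner tracking such co-occurrence statistics could, on this account, acquire proximity as a grouping heuristic without it being built in.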
With different theorists offering differing accounts of the development of perceptual organiza-
tion, some stressing innate grouping factors and others emphasizing ways in which visual order
could emerge through maturation of internal mechanisms or experience with a structured envi-
ronment, we turn to a discussion of the evidence.
Configural superiority
A strategy for researchers interested in the start-up of visual cognition has been to take empirical
phenomena supportive of a particular mental faculty in adults and adapt looking-time procedures
to study those same phenomena in infants. One such occurrence relevant to perceptual organi-
zation is the configural-superiority effect (Pomerantz 1981; Chapter 26, this volume). In adults,
configural superiority is in evidence when the mirror image line elements shown in Figure 34.2A
are found easier to discriminate when embedded in the non-informative contextual frame shown
in Figure 34.2B (Pomerantz et al. 1977). This result poses difficulty for feature analytic models
of visual processing, because if one were processing only the features of the visual forms (i.e. the
individual line segments), then the stimuli in Figure 34.2B should be more easily confused than
those in Figure 34.2A given the overlap of features in the horizontal and vertical line segments.
Instead, the finding suggests that emergent relations between features (i.e. angles, corners, whole
forms) are represented when processing visual patterns.
It could be argued that the configural-superiority effect shown in Figure 34.2A and B is lin-
guistically based given that labels such as ‘arrow’ versus ‘triangle’ may generate an acquired
694 Quinn and Bhatt
Global precedence
Another perceptual effect that has been considered as evidence of organization in adults and that
has been of interest to developmentalists is the global-precedence effect (Navon 1977; Kimchi,
this volume). In the procedure used to generate this effect, adult observers are presented with a
multilevel stimulus consisting of a large letter made from small letters. The global letter matches
or does not match the local letter and the observer’s task is to identify either the global letter or
the local letters. The key findings are that: (1) response times are faster to the global letter, (2) con-
flicting local letters do not impact upon processing at the global level, and (3) a conflicting global
letter interferes with processing of the local letters. This pattern of outcomes indicates that global
aspects of a stimulus are processed and recognized before local aspects.
Ghim and Eimas (1988) investigated whether a global precedence effect could be demonstrated
in young infants. In one condition, 3- to 4-month-old infants were familiarized with a global
square made up of local squares followed by either a local or global preference test. The local test
contrasted a pair of global diamond stimuli, one constructed from local squares and the other
from local diamonds. By contrast, the global test paired a global square with a global diamond,
each composed of novel local diamonds. If global precedence is occurring, then in the local test,
the novelty at the global level would lead infants to divide their attention evenly between the two
stimuli, even though there is a source of novelty at the local level residing in the local diamonds.
However, in the global test, infants should prefer the global diamond, even though there is a com-
peting source of novelty from the local diamonds. These predictions were confirmed: infants in
the local test did not respond differentially, whereas those in the global test preferred the global
diamond (even though a control condition showed that infants were sensitive to the change in the
local elements). The findings provide evidence that, as is the case with adults, global information
has a processing advantage over local information in young infants (see also Frick et al. 2000).
Subjective contours
Yet another manifestation of organization in adult vision is the perception of subjective con-
tours (Kanizsa 1955; van Lier and Gerbino, this volume). Consider Figure 34.2C: adults perceive
a white square atop some pacman shapes. The contour appears to continue across the white
space between the shapes, thereby suggesting a completion process. Although one can argue for
a top-down explanation and suggest that the completion process is facilitated by knowledge of
the square form, this explanation is weakened by demonstrations that infants perceive illusory
contours (Ghim 1990; Johnson and Aslin 1998; Kavsek 2002; Hayden et al. 2008). For example,
Ghim (1990) reported that 3- to 4-month-olds were more likely to display novelty preferences in
tasks involving a pattern that elicited the perception of subjective contours (Figure 34.2C) versus
one that did not (Figure 34.2D) relative to tasks involving two patterns neither of which produced
subjective contours. In addition, after familiarization with an outline square, infants preferred
a pattern that did not produce subjective contours (Figure 34.2D) to one that did produce the
illusory square in adults (Figure 34.2C). This evidence suggests that, like adults, young infants are
capable of a completion process that produces the perception of subjective contours.
Demonstrations of configural superiority, global precedence, and subjective contours in infants
suggest that at least some of the mechanisms that produce perceptual organization in adults are
also functional in the initial months of life. However, these demonstrations are less informative
about how infants relate individual elements to each other. For example, in the cases of configural
superiority and global precedence, was it the Gestalt principles of closure, good continuation,
proximity, lightness similarity, or form similarity or some combination that allowed infants to
organize the patterns? Similarly, in the case of subjective contours, any of the above principles,
with the exception of proximity, could be involved. To better identify which specific grouping fac-
tors are functional during early development, some investigators have taken the approach of stud-
ying how infants will respond to displays of elements that could be organized by one or another
principle. We now turn to a discussion of these studies.
infants preferred the broken rod. An additional experiment that pitted common motion against good
continuation and similarity confirmed that it was common motion alone rather than the combination
of common motion, good continuation, and similarity that enabled infants to group the rod. Moreover,
using a similar methodology, Spelke (1982) reported that same-aged infants perceived the continuity
of two adjacent objects as long as their surfaces were contiguous and even when those surfaces were
dissimilar in size, shape, and textural markings.
These results led Spelke (1982) to develop a hybrid account of the development of object
perception, incorporating innate organizing principles as well as a role for learning based on
experience with a structured environment. Specifically, Spelke argued that infants at birth are
constrained by two core organizational principles, common movement and connected surface.
Adherence to these principles would parse from a visual scene those surfaces that move together
and maintain their coherence as they move and grant them the status of objects. The resulting
object ‘blobs’ can then be tracked over time. Such experience, according to Spelke, allows
infants to discover that objects exhibit other properties including proximity of parts, similarity
of surface, and good continuation of contour (Brunswik and Kamiya 1953). In this way, some of
the principles that were considered to be innate organizing principles by the Gestaltists were, by
the Spelke account, learned through their natural correlation with the core principles.
Lightness similarity
Quinn et al. (1993) asked whether 3-month-olds could utilize lightness similarity to organize col-
umns or rows of elements that could be grouped only on the basis of their lightness versus darkness
(see also Quinn and Bhatt 2006). The test stimuli were horizontal versus vertical bars (see Figure
34.3, top panel). If the organization in the row and column arrays is apprehended, then infants
familiarized with columns should prefer horizontal bars and infants familiarized with rows should
prefer vertical bars. The findings provided positive evidence for use of lightness similarity: infants
preferred the novel organization of bars. An additional control experiment showed that infants
could discriminate between arrays differing in the shape (square versus diamond) of the dark or
light elements. This latter finding mitigates explanations of the preference for the novel organization
based on immature resolution acuity and indicates that infants were able to perceive the individual
elements of the displays and organize them into larger perceptual units (i.e. rows versus columns)
based on lightness similarity. Of note is that Farroni et al. (2000) used a similar methodology to
Fig. 34.3 Luminance (top panel): familiarization and test stimuli used in the study of Quinn et al.
(1993) investigating whether 3-month-old infants can organize visual patterns in accord with
lightness similarity. Proximity (bottom panel): familiarization and test stimuli used to determine
whether infants adhere to proximity when organizing visual patterns.
Reprinted from Acta Psychologica, 127(2), Paul C. Quinn, Ramesh S. Bhatt, and Angela Hayden, Young infants
readily use proximity to organize visual pattern information, pp. 289–98, doi: 10.1016/j.actpsy.2007.06.002
Copyright (2008), with permission from Elsevier.
argue that even newborns adhere to lightness similarity when organizing visual patterns; however,
because that study did not determine if the individual light elements could be resolved, it left open
the question of whether the displays were organized via the proximity of the dark elements.
Proximity
Another classic grouping principle investigated was proximity (Quinn et al. 2008b). As shown
in the bottom panel of Figure 34.3, using the same methodology as Quinn et al. (1993), 3- to
4-month-olds were presented with arrays of elements that could be organized into rows or col-
umns via proximity, and then tested with horizontal versus vertical bars. Infants preferred the test
stimuli with the novel organization, and subsequent control experiments indicated that the pref-
erences were not attributable to an a priori preference or to an inability to resolve elements within
the rows and columns. The results indicate that proximity joins lightness similarity as a grouping
principle that can be used to organize visual patterns by young infants.
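The prediction that proximity licenses can be made concrete with a small sketch. This is an illustration of the grouping rule itself, not of the experimental procedure, and the function and display names are our own: given element coordinates on a regular grid, whichever nearest-neighbour spacing is smaller, horizontal or vertical, determines whether the array organizes into rows or columns.

```python
def grouping_by_proximity(positions):
    """positions: list of (x, y) element coordinates on a regular grid.
    Returns 'rows' or 'columns' depending on which nearest-neighbour
    spacing is smaller (proximity: closer elements group together)."""
    xs = sorted({x for x, _ in positions})
    ys = sorted({y for _, y in positions})
    h_spacing = min(b - a for a, b in zip(xs, xs[1:]))  # gap between columns of elements
    v_spacing = min(b - a for a, b in zip(ys, ys[1:]))  # gap between rows of elements
    if h_spacing < v_spacing:
        return "rows"      # horizontal neighbours are closer: elements chain sideways
    if v_spacing < h_spacing:
        return "columns"   # vertical neighbours are closer: elements chain downward
    return "ambiguous"

# Tight horizontal spacing (1 unit) with loose vertical spacing (3 units)
# should be seen as rows, and the transposed display as columns.
row_display = [(x, y) for x in range(0, 10, 1) for y in range(0, 12, 3)]
col_display = [(x, y) for x in range(0, 12, 3) for y in range(0, 10, 1)]
print(grouping_by_proximity(row_display))   # rows
print(grouping_by_proximity(col_display))   # columns
```

The infant findings above are consistent with the visual system applying exactly this kind of spacing comparison when no other grouping cue distinguishes the elements.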
Good continuation
A third classic static principle investigated was good continuation (Quinn and Bhatt 2005a). In con-
trast to the column versus row methodology used to study lightness similarity and proximity, a
methodology was adopted that had been used to investigate good continuation grouping by adults
Fig. 34.4 Examples of the familiarization and test stimuli used in Quinn and Bhatt (2005a). The
in-line condition is depicted in (a) and the off-line condition in (b).
(Prinzmetal and Banks 1977). The displays (shown in Figure 34.4) consisted of a line of circular dis-
tracters and a square or diamond target. Infants were presented with one pattern and then tested for
discrimination between the familiar pattern and a novel one. In the top panel A, the target appeared
in line, embedded, or aligned (and thus in good continuation) with the distracters, whereas in the
bottom panel B, the target was off line with the distracters. The expectation is that if infants per-
ceived the patterns in accord with good continuation, then the change in the target should be more
difficult to detect when the target is in a good continuation relation with the distracters, as in the
in-line condition. By contrast, in the off-line condition, the target would not group with the distract-
ers and would retain its status as an independently processed unit of information, thereby increasing
the likelihood that a change in its form would be detected. Three- to 4-month-olds preferred the
novel test stimulus in the off-line condition, but not in the in-line condition. This evidence suggests
that good continuation is a third organizational principle available to young infants.
Form similarity
The functionality of form similarity in young infants was examined by Quinn et al. (2002),
who drew upon the methodology that was used to investigate lightness similarity and prox-
imity. As shown in Figure 34.5, 3- to 4-month-olds were familiarized with rows or columns
of Xs versus Os, and then tested with horizontal versus vertical bars. If infants group the
familiarization stimulus into rows or columns via form similarity, then they should prefer the
novel organization of bars. However, the infants did not display such a preference, even when
familiarization time was doubled; instead, attention was divided between the test stimuli.
A control study showed that infants were capable of discriminating between the familiari-
zation arrays and arrays that consisted entirely of Xs or Os. This latter result indicates that
failure of the infants to use form similarity was not due simply to an inability to discriminate
between the constituent X and O shapes.
With the data thus far described not demonstrating the use of form similarity by young infants,
Quinn et al. (2002) tested older infants aged 6 to 7 months on the form similarity task. This age
group preferred the novel organization. Thus, 6- to 7-month-olds, but not 3- to 4-month-olds, can
organize visual patterns in accord with form similarity. In combination with outcomes indicat-
ing that 3- to 4-month-olds can utilize lightness similarity, proximity, and good continuation to
organize visual patterns under similar testing conditions (Quinn et al. 1993, 2008b; Quinn and
Bhatt 2005a), the results indicating that only 6- to 7-month-olds can use form similarity suggest
Fig. 34.5 Examples of the familiarization and test stimuli used to test for perceptual organization by
form similarity in Quinn et al. (2002).
Reproduced from Paul C. Quinn, Ramesh S. Bhatt, Diana Brush, Autumn Grimes, and Heather Sharpnack,
Psychological Science, 13(4), Development of Form Similarity as a Gestalt Grouping Principle in Infancy,
pp. 320–328, doi: 10.1111/1467-9280.00458, Copyright © 2002 by SAGE Publications. Reprinted by Permission
of SAGE Publications.
that different Gestalt principles may become functional over different time courses of develop-
ment and that not all principles are readily deployed.
The findings are inconsistent with a strict Gestalt view that all organizing principles are
automatically activated upon first encounter with a visual pattern (e.g. Köhler 1929). The
data are, however, consistent with evidence indicating that adults have independent lumi-
nance- and edge-based grouping mechanisms (Gilchrist et al. 1997). They are also in accord
with the finding that some visual agnosics show intact lightness similarity and proximity
grouping, but impaired shape configuring and form-based grouping ability (Behrmann and
Kimchi 2003; Humphreys 2003), and the result that individuals with Williams syndrome
show superior lightness similarity and good continuation grouping abilities relative to those
for form similarity (Farran 2005). The developmental evidence contrasting the time course
of emergence of the principles of proximity and form similarity in infants is moreover con-
sistent with microgenetic evidence in adults indicating that proximity grouping occurs more
rapidly than form-based grouping in the time course of processing (Ben-Av and Sagi 1995;
Han et al. 1999). However, we now consider evidence indicating that the inability of young
infants to use form to organize visual images is not absolute.
the horizontal and vertical bars. This result suggests that young infants’ inability to organize by
form similarity is not a specific deficit with Xs versus Os, but rather a more general phenomenon.
A second attempt to determine if 3- to 4-month-olds could be induced to use form similarity employed
a training regime. Specifically, Quinn and Bhatt (2005b) asked whether variations in the patterns used to
depict rows or columns during familiarization would enhance infants’ performance in the form similarity
task. One may reason that pattern variation will facilitate performance because the invariant organization
of the stimuli will be more easily detected against a changing background. In other words, variation might
provide infants with the opportunity to form concepts of ‘rows’ or ‘columns’. To investigate this possibil-
ity, the form similarity task that had previously produced null results (when each of the three different
form contrasts was presented individually) was administered, but in this instance with each of the three
form contrasts presented during a single familiarization session (see Figure 34.6). Younger infants now
preferred the novel organization of bars. This striking result suggests that 3- to 4-month-olds can use form
similarity to organize elements if they are provided with varied examples with which to abstract the invar-
iant arrangement of the pattern. The outcome is theoretically significant because it demonstrates that
perceptual learning may play a role in acquiring some aspects of visual organization. Moreover, following
Goldstone’s (2003) proposal that one mechanism by which perceptual learning occurs is by increasing
attention to relevant information and decreasing attention to irrelevant information, Bhatt and Quinn
(2011) have suggested that variability led to grouping based on shape similarity because it enhanced
infant attention to global structures and diminished attention to local elements.
Fig. 34.6 Familiarization and test stimuli used in Quinn and Bhatt (2005b).
Reproduced from Paul C. Quinn and Ramesh S. Bhatt, Psychological Science, 16(7), Learning Perceptual
Organization in Infancy, pp. 511–515, doi: 10.1111/j.0956-7976.2005.01567.x, Copyright © 2005 by SAGE
Publications. Reprinted by Permission of SAGE Publications.
introduced by Palmer and Rock in the 1990s (Rock and Palmer 1990; Palmer 1992; Palmer and
Rock 1994; see also Brooks, this volume).
Connectedness
Rock and Palmer (1990) described the principle of connectedness as the visual system’s tendency to
group together connected entities, and remarked that ‘connectedness . . . may be the most fundamental
principle of grouping yet uncovered’ (Rock and Palmer 1990, p. 86). To determine whether sensitivity
to connectedness is operational in early infancy, as shown in Figure 34.7, infants as young as 3 months
of age were habituated to the connected patterns shown in panels A or B, and then administered a pref-
erence test pairing connected elements (panel C) with disconnected elements (panel D) (Hayden et al.
2006). The expectation was that if the infants organize the habituation patterns on the basis of connect-
edness, then they should display a novelty preference for the disconnected-element test stimulus. This
outcome was observed, and a control condition showed that it could not be attributed to a spontaneous
preference. The results indicate that young infants are sensitive to the connectedness principle.
Common region
Another newer grouping principle is common region, which states that elements within a region
are grouped together and separated from those in other regions (Palmer 1992). Palmer has also
proposed that common region is driven by a characteristic that is external to the elements them-
selves. In other words, the ‘common region’ quality that engenders grouping of elements is not
inherent in the elements themselves. By contrast, other grouping principles such as similarity
are based on intrinsic characteristics of the elements to be grouped. Palmer thus distinguished
between ‘extrinsic’ versus ‘intrinsic’ organizational cues and suggested that common region is an
extrinsic cue. This distinction raises the possibility that common region could be a different kind
Fig. 34.7 The stimuli used in Hayden et al. (2006). Infants in the habituation conditions were
habituated to the connected patterns in panels (a) or (b) and tested with the patterns in panels (c)
and (d). Infants in the no-habituation condition were tested with the patterns in panels (c) and (d)
without prior exposure to the patterns in panels (a) and (b).
Reproduced from Psychonomic Bulletin & Review, 13(2), pp. 257–261, Infants’ sensitivity to uniform
connectedness as a cue for perceptual organization, Angela Hayden, Ramesh S. Bhatt, and Paul C. Quinn,
Copyright © 2006, Springer-Verlag. With kind permission from Springer Science and Business Media.
of organizational cue from many others, thereby adding to the importance of understanding its
emergence in infants.
To examine whether young infants use common region to organize visual patterns, 3- to
4-month-olds were familiarized with a display consisting of two pairs of shapes, with one pair
(e.g. A and B) located together in a region and the other pair (e.g. C and D) located together in
another region (see Figure 34.8) (Bhatt et al. 2007). The locations of the individual shapes changed
from one trial to the next, but the shapes A and B always shared a region while the shapes C and
D shared another region. Infants were then tested with a within-region grouping (e.g. AB) versus
a between-region grouping (e.g. BC; see Figure 34.8). Importantly, because the physical distance
between A and B versus B and C was equivalent, the only difference between the A and B versus
B and C pairs was that the former pair shared the same region, whereas the members of the lat-
ter pair were from different regions. If common region is functional in infancy, then the A and B
elements should be grouped together because they always shared the same region. That is, infants
should find the within-region grouping to be familiar and the between-region grouping to be novel,
and respond differentially to these patterns during the test.
Another aspect of the work of Bhatt et al. (2007) is that it asks whether grouping will carry over
to novel regions, given that infants were habituated to vertical regions and tested with horizontal
Fig. 34.8 Examples of the stimuli used in Bhatt et al. (2007). Infants were habituated to two pairs of
shapes, with one pair sharing a vertical region and the other pair a different vertical region. Infants
were then tested for their preference between a pair of shapes that had shared a common region
during habituation (within-region pair) versus a pair of shapes that had been in different regions during
habituation (between-region pair), both presented in novel horizontal regions.
Reproduced from Perceptual Organization Based on Common Region in Infancy, Ramesh S. Bhatt, Angela Hayden,
and Paul C. Quinn, Infancy, 12(2), pp. 147–168, Copyright © 2007 International Society on Infant Studies.
regions. This manipulation allows one to determine whether the perceptual system expects grouping to remain intact when elements that were previously grouped based on one set of regions are subsequently encountered in novel regional configurations. Presumably, if grouping
and perceptual organization are to be functionally advantageous, they need to allow the world to
be structured into meaningful entities that transcend particular situations.
The major result from Bhatt et al. (2007) was that the infants discriminated the grouping of
elements from different regions from the grouping of elements that had shared a common region
during habituation. Moreover, Hayden et al. (2008) extended these results to regions formed by
illusory contours. The findings that infants are sensitive to common region suggest that the extrin-
sic nature of this cue did not preclude its role as an organizing factor. In other words, infants,
like adults, are not solely dependent upon the intrinsic nature of elements to organize them; they
are able to use extrinsic factors such as common region to organize. Additionally, the result that
performance transferred across differently shaped regions from familiarization to test provides
evidence that the perceptual organizational abilities of infants can produce processing units of
an abstract nature. This latter result actually points toward a unitization process by which previ-
ously disparate elements become grouped and begin to function as coherent units in new contexts
(Goldstone 2003; Bhatt and Quinn 2011).
Perceptual scaffolding
Given transfer between lightness and form similarity, one can inquire as to whether evidence might be
found for perceptual scaffolding, a process by which learning based on an already functional organizational
principle enables an organizational process that is not yet functional. That is, might infants who are oth-
erwise not able to group based on an organizational principle be induced to do so if they are previously
allowed to group elements based on an already functional organizational process? To answer this ques-
tion, Quinn and Bhatt (2009) capitalized on previous evidence showing that 3- to 4-month-old infants
readily organize via lightness similarity (Quinn et al. 1993), whereas organization by form similarity is not
readily exhibited until 6 to 7 months of age (Quinn et al. 2002), and administered the procedure depicted
in Figure 34.9 (top panel) to a group of 3- to 4-month-olds. The younger infants succeeded in the task,
thereby showing that the already developed luminance-based organizational system facilitated grouping
based on form similarity. This conclusion was upheld by the null performance of a control group of 3- to
Fig. 34.9 Illustrations of the luminance–shape (top panel) and shape–shape (bottom panel) tasks
presented to infants by Quinn and Bhatt (2009) to examine whether infants will learn to use shape cues to
organize if presented in the context of organization based on luminance cues.
Reproduced from Paul C. Quinn and Ramesh S. Bhatt, Psychological Science, 20(8), Transfer and Scaffolding
of Perceptual Grouping Occurs Across Organizing Principles in 3- to 7-Month-Old Infants, pp. 933–938, doi:
10.1111/j.1467-9280.2009.02383.x, Copyright © 2009 by SAGE Publications. Reprinted by Permission of SAGE
Publications.
4-month-olds who were familiarized and tested with the form elements shown in Figure 34.9 (bottom
panel). Taken together, the results highlight a scaffolding process that may engender learning by enabling
infants to group based on a new cue using an already functioning organizational process. Importantly,
this work demonstrates that new organizational principles can be learned via bootstrapping onto already
functioning organizational principles, as Spelke (1982) had suggested.
A salience hierarchy?
Although the chapter has thus far documented that a variety of organizational principles are oper-
ational in infants, what has not yet been discussed is whether there is differential salience among
the cues. That is, are there differences in cue salience when multiple cues are concurrently avail-
able in a stimulus display presented to infants? This question derives significance because of the
previously discussed differences in how readily principles such as lightness similarity and form
similarity are deployed, and because of arguments that connectedness may be the most funda-
mental of all the principles (Rock and Palmer 1990).
In an initial experiment that tested the salience of connectedness versus form similarity, 6- to
7-month-olds were habituated to a pattern that could be organized on the basis of both connect-
edness and shape similarity (Hayden et al. 2009). The stimuli contained alternating rows or col-
umns of two different shapes (Xs and Os). The shapes were connected by a black bar in the same
configuration (rows or columns) in which the shapes were organized (see Figure 34.10). Following
habituation, infants were tested with a pair of new stimuli: one in which connectedness was altered
(by breaking the connectedness among the shapes), and the other in which shape organization
was altered (a change from rows to columns or vice versa). The connectedness manipulation was
accomplished by positioning the previously connecting lines higher, rather than using shorter
lines in their original familiarization location, to keep the total amount of contour constant across
the displays. X–O stimuli were used to depict the shape contrast; although alternative displays (e.g. square versus diamond) could have been used, several different shape contrasts presented to infants have yielded equivalent grouping results (Quinn and Bhatt 2005b).
If one of the perceptual organizational cues (connectedness versus shape similarity) was more
salient than the other, the change induced by the manipulation of this cue should be more novel
and the infants should look longer at this pattern than at the pattern in which the less salient cue
was altered. The key finding was that infants preferred the pattern displaying the change in con-
nectedness, a result suggesting that connectedness is more salient than shape similarity.
Hayden et al. (2009) next examined the salience relations of connectedness and lightness simi-
larity by repeating their experimental procedure, except that the patterns previously organized
by shape (i.e. X versus O) were now organized by lightness (i.e. dark versus light squares). In
this case, infants preferred to look at the pattern displaying a luminance change to a significantly
greater degree than the pattern displaying a connectedness change, a result suggesting that luminance similarity was more salient than connectedness. The pattern of results of Hayden et al. (2009) provides evidence for a luminance–connectedness–shape salience hierarchy operating among the organizational cues to which 6- to 7-month-olds have been shown to be sensitive.
Fig. 34.11 Examples of the familiarization and test stimuli used in Quinn and Schyns (2003) and
Quinn et al. (2006). If the infants can parse the circle from the familiar patterns in accord with good
continuation, then they should prefer the pacman shape over the circle during the test trials.
Reproduced from What goes up may come down: perceptual process and knowledge access in the organization
of complex visual patterns by young infants, Paul C. Quinn and Philippe G. Schyns, Cognitive Science, 27(6),
pp. 923–35, Copyright © 2003, Cognitive Science Society, Inc.
Fig. 34.12 Examples of the familiarization and test stimuli used in Quinn and Schyns (2003) and Quinn
et al. (2006). If the infants can extract the invariant pacman from the familiar patterns, then they should
prefer the circle shape over the pacman shape during the test trials.
Reproduced from What goes up may come down: perceptual process and knowledge access in the organization
of complex visual patterns by young infants, Paul C. Quinn and Philippe G. Schyns, Cognitive Science, 27(6),
pp. 923–35, Copyright © 2003, Cognitive Science Society, Inc.
708 Quinn and Bhatt
of perceptual units organized by good continuation. The bias set by good continuation can thus be
thought of as soft-wired. More generally, an individual’s history of categorization can affect their
subsequent organizational processes.
Conclusions
This chapter has reviewed evidence on the development of perceptual organization, described
against a backdrop of different theoretical views, including those that emphasize innate organiz-
ing principles and others that highlight perceptual learning. The studies clearly show that several
phenomena that have been taken as evidence of perceptual organization in adults, such as con-
figural superiority, global precedence, and subjective contours, can be demonstrated in infants.
The data also suggest that different organizational principles may become functional over dif-
ferent time courses of development, may be governed by different developmental determinants
(i.e. maturation versus experience), may have differential salience, and may not all be readily
deployed in the manner originally proposed by Gestalt theorists.
The principles were additionally shown to be flexible in their operation in terms of producing
units of processing that would transfer across different displays organized by the same principle
and also across different principles. In this sense, the units produced by the infant’s organizational
processes may be regarded as conceptual-like in their generalizability.
To comment further on the differences among grouping principles, there is evidence for early
functionality of classic organizational principles that include common motion, good continua-
tion, lightness similarity, and proximity, as well as for the modern organizational principles of
common region and connectedness. By contrast, form similarity was shown to develop later
and not be as readily deployed. However, form similarity was shown to be activated when young
infants were provided with multiple element contrasts, thereby suggesting a role for perceptual
learning in its emergence. Form similarity was also activated when pulled along by the already
functional principle of lightness similarity, thus demonstrating a perceptual scaffolding process by
which new organizational principles can be learned.
Overall, the evidence points to a hybrid model to explain the development of perceptual
organization. As contended by the Gestaltists (Wertheimer 1923/1958; Köhler 1929; Koffka 1935;
Metzger 1936/2006), as well as Zuckerman and Rock (1957), a number of grouping principles
are operational in the early months. However, as contended by Hebb (1949) and Brunswik and
Kamiya (1953), other principles may be learned through perceptual experience (Bhatt and Quinn
2011). The data actually lend support to the type of model proposed by Spelke (1982) in which
some start-up principles enable other principles to be bootstrapped onto them.
As we look to the future, there are a number of aspects of the development of perceptual
organization that are likely to be subject to further empirical inquiry. First, there are few stud-
ies of perceptual organization in newborns, with the majority of studies being conducted with
infants aged 3 months or older. Additional work on the functionality of the principles from
birth to 3 months of age has the potential to change our understanding of what competencies
are part of the infant’s initial endowment. Second, given evidence that the development of per-
ceptual organization continues into adolescence (e.g. Kovacs 2000; Kimchi et al. 2005; Hadad
and Kimchi 2006; Scherf et al. 2009; Hadad et al. 2010), we need to know more about how the
perceptual organizing abilities of infants are both continuous and discontinuous with those of
children and young adults.
A third issue centers on the mechanisms by which infants learn perceptual organization. In the
sections on Further Work on Perceptual Grouping . . . and Relations Among the Principles, we
reviewed studies showing that variability exposure and scaffolding based on already functional
Development of Perceptual Organization in Infancy 709
organizational principles facilitate the use of new organizational cues in infancy. Moreover, Bhatt
and Quinn (2011) have suggested attentional enhancement and unitization (Goldstone 2003) as
mechanisms that underlie perceptual learning in infancy. By attentional enhancement, we refer
to an increased weighting of global structure in situations that allow infants to be exposed to dif-
ferent element contrasts depicting a common organization. By unitization, we mean the process
by which elements are grouped via adherence to one organizational principle, and continue to be
combined in novel contexts organized by the same principle or even different principles, thereby
functioning as higher-order building blocks. Future research will need to address these and other
proposals (e.g. Johnson 2010) concerning the nature of learning that contributes to the develop-
ment of perceptual organization.
In addition, we know little of the cognitive neuroscience underlying development of perceptual
organization in infants (for an exception see Csibra et al. 2000). What neural correlates underlie
development of the different grouping principles? Also, given recent advances in our abilities to
track the eye movements of infants as they scan visual displays, what is the role of eye movements
in the establishment of perceptual organization? Although eye movements may not play quite the
defining role that was proposed by Hebb (1949), there is evidence of correlation between visual
scanning and perceptual completion for displays of partly occluded objects (Johnson et al. 2004).
Furthermore, while figure–ground segregation has been an area of investigation in the literature
on adult perceptual organization (e.g. Peterson 1994; Vecera et al. 2002), we know little about pro-
cesses of figure–ground segregation in infants. Finally, it will be interesting to learn how well the
grouping principles described here as being functional for a variety of two-dimensional displays
can scale up to organizing even more complex three-dimensional displays (e.g. Soska and Johnson
2008; Vrins et al. 2011). Continuing investigation on these and the other topic areas reviewed in
this chapter is likely to shed further light on the question of how we come to establish perceptual
organization in the domain of vision.
Acknowledgements
Preparation of this chapter was supported by grant HD-46526 from the National Institute of Child
Health and Human Development. We thank Johan Wagemans and two anonymous reviewers for
their comments. Correspondence should be sent to Paul C. Quinn, Department of Psychology,
University of Delaware, Newark, DE 19716, USA. E-mail: pquinn@udel.edu.
References
Behrmann, M. and R. Kimchi (2003). ‘What does visual agnosia tell us about perceptual organization and
its relationship to object perception?’. J Exp Psychol: Human Percept Perform 29: 19–42.
Ben-Av, M. B. and D. Sagi (1995). ‘Perceptual grouping by similarity and proximity: experimental results
can be predicted by autocorrelations’. Vision Res 35: 853–866.
Bhatt, R. S., A. Hayden, and P. C. Quinn (2007). ‘Perceptual organization based on common region in
infancy’. Infancy 12: 147–168.
Bhatt, R. S. and P. C. Quinn (2011). ‘How does learning impact development in infancy? The case of
perceptual organization’. Infancy 16: 2–38.
Bomba, P. C., P. D. Eimas, E. R. Siqueland, and J. L. Miller (1984). ‘Contextual effects in infant visual
perception’. Perception 13: 369–376.
Brunswik, E. and J. Kamiya (1953). ‘Ecological cue validity of ‘proximity’ and other gestalt factors’. Am
J Psychol 66: 20–32.
Colombo, J., C. A. Laurie, T. A. Martelli, and B. R. Hartig (1984). ‘Stimulus context and infant orientation
discrimination’. J Exp Child Psychol 37: 576–586.
Csibra, G., G. Davis, M. W. Spratling, and M. H. Johnson (2000). ‘Gamma oscillations and object
processing in the infant brain’. Science 290: 1582–1585.
Elder, J. H. and R. M. Goldberg (2002). ‘Ecological statistics of Gestalt laws from the perceptual
organization of contours’. J Vision 2: 324–353.
Fantz, R. L. (1964). ‘Visual experience in infants: decreased attention to familiar patterns relative to novel
ones’. Science 146: 668–670.
Farran, E. K. (2005). ‘Perceptual grouping in Williams syndrome: evidence for deviant patterns of
performance’. Neuropsychologia 43: 815–822.
Farroni, T., E. Valenza, F. Simion, and C. Umilta (2000). ‘Configural processing at birth: evidence of
perceptual organization’. Perception 29: 355–372.
Frick, J. E., J. Colombo, and J. R. Allen (2000). ‘Temporal sequence of global-local processing in
3-month-old infants’. Infancy 1: 375–386.
Ghim, H. (1990). ‘Evidence for perceptual organization in infants: perception of subjective contours by
young infants’. Infant Behav Dev 13: 221–248.
Ghim, H. R. and P. D. Eimas (1988). ‘Global and local processing in 3- and 4-month-old infants’. Percept
Psychophys 43: 165–171.
Gilchrist, I. D., G. W. Humphreys, M. J. Riddoch, and H. Neumann (1997). ‘Luminance and edge
information in grouping: a study using visual search’. J Exp Psychol: Human Percept Perform 23: 464–480.
Goldstone, R. L. (2003). ‘Learning to perceive while perceiving to learn’. In Perceptual Organization in
Vision: Behavioral and Neural Perspectives, edited by R. Kimchi, M. Behrmann, and C. R. Olson, pp.
223–278 (Mahwah, NJ: Erlbaum).
Hadad, B. and R. Kimchi (2006). ‘Developmental trends in utilizing closure for grouping of shape: Effects
of spatial proximity and collinearity’. Percept Psychophys 68: 1264–1273.
Hadad, B. S., D. Maurer, and T. L. Lewis (2010). ‘The development of contour interpolation’. J Exp Child
Psychol 106: 163–176.
Han, S., G. W. Humphreys, and L. Chen (1999). ‘Uniform connectedness and classical Gestalt principles of
perceptual grouping’. Percept Psychophys 61: 661–674.
Hayden, A., R. S. Bhatt, and P. C. Quinn (2006). ‘Infants’ sensitivity to uniform connectedness as a cue for
perceptual organization’. Psychon Bull Rev 13: 257–271.
Hayden, A., R. S. Bhatt, and P. C. Quinn (2008). ‘Perceptual organization based on illusory regions in
infancy’. Psychon Bull Rev 15: 443–447.
Hayden, A., R. S. Bhatt, and P. C. Quinn (2009). ‘Relations between uniform connectedness, luminance,
and shape similarity as perceptual organizational cues in infancy’. Attention, Percept Psychophys
71: 52–63.
Hebb, D. O. (1949). The Organization of Behavior (New York: Wiley).
Humphreys, G. W. (2003). ‘Binding in vision is a multistage process’. In Perceptual Organization in
Vision: Behavioral and Neural Perspectives, edited by R. Kimchi, M. Behrmann, and C. R. Olson,
pp. 377–399 (Mahwah, NJ: Erlbaum).
Johnson, S. P. (ed.) (2010). Neoconstructivism: the New Science of Cognitive Development (New York: Oxford
University Press).
Johnson, S. P. and R. N. Aslin (1998). ‘Young infants’ perception of illusory contours in dynamic displays’.
Perception 27: 341–353.
Johnson, S. P., J. A. Slemmer, and D. Amso (2004). ‘Where infants look determines how they see: eye
movements and object perception performance in 3-month-olds’. Infancy 6: 185–201.
Kangas, A., N. Zieber, A. Hayden, P. C. Quinn, and R. S. Bhatt (2011). ‘Transfer of associative grouping to
novel perceptual contexts in infancy’. Attention, Percept Psychophys 73: 2657–2667.
Kanizsa, G. (1955). ‘Margini quasi-percettivi in campi con stimolazione omogenea’. Riv Psicologia 49: 7–30.
Kavsek, M. J. (2002). ‘The perception of static subjective contours in infancy’. Child Dev 73: 331–344.
Kellman, P. J. and E. S. Spelke (1983). ‘Perception of partly occluded objects in infancy’. Cogn Psychol
15: 483–524.
Kimchi, R., B. Hadad, M. Behrmann, and S. Palmer (2005). ‘Microgenesis and ontogenesis of perceptual
organization: evidence from global and local processing of hierarchical patterns’. Psychol Sci
16: 282–290.
Koffka, K. (1935). Principles of Gestalt Psychology (New York: Harcourt, Brace and World).
Köhler, W. (1929). Gestalt Psychology (New York: Horace Liveright).
Kovacs, I. (2000). ‘Human development of perceptual organization’. Vision Res 40: 1301–1310.
Metzger, W. (1936/2006). The Laws of Seeing, translated by L. Spillmann (Cambridge, MA: MIT Press).
Navon, D. (1977). ‘Forest before trees: the precedence of global features in visual perception’. Cogn Psychol
9: 353–383.
Palmer, S. E. (1992). ‘Common region: a new principle of perceptual grouping’. Cogn Psychol 24: 436–447.
Palmer, S. E. and I. Rock (1994). ‘Rethinking perceptual organization: the role of uniform connectedness’.
Psychon Bull Rev 1: 29–55.
Peterson, M. A. (1994). ‘Shape recognition can and does occur before figure-ground organization’. Curr
Direct Psychol Sci 3: 105–111.
Pomerantz, J. R. (1981). ‘Perceptual organization in information processing’. In Perceptual Organization,
edited by M. Kubovy and J. R. Pomerantz, pp. 141–180 (Hillsdale, NJ: Erlbaum).
Pomerantz, J. R., L. C. Sager, and R. J. Stoever (1977). ‘Perception of wholes and of their component
parts: some configural superiority effects’. J Exp Psychol: Human Percept Perform 3: 422–435.
Prinzmetal, W. and W. P. Banks (1977). ‘Good continuation affects visual detection’. Percept Psychophys
21: 389–395.
Quinn, P. C. and R. S. Bhatt (2005a). ‘Good continuation affects discrimination of visual pattern
information in young infants’. Percept Psychophys 67: 1171–1176.
Quinn, P. C. and R. S. Bhatt (2005b). ‘Learning perceptual organization in infancy’. Psychol Sci 16: 511–515.
Quinn, P. C. and R. S. Bhatt (2006). ‘Are some Gestalt principles deployed more readily than others during
early development? The case of lightness versus form similarity’. J Exp Psychol: Human Percept Perform
32: 1221–1230.
Quinn, P. C. and R. S. Bhatt (2009). ‘Transfer and scaffolding of perceptual grouping occurs across
organizing principles in 3- to 7-month-old infants’. Psychol Sci 20: 933–938.
Quinn, P. C. and P. D. Eimas (1986). ‘Pattern–line effects and units of visual processing in infants’. Infant
Behav Dev 9: 57–70.
Quinn, P. C. and P. G. Schyns (2003). ‘What goes up may come down: perceptual process and knowledge
access in the organization of complex visual patterns by young infants’. Cogn Sci 27: 923–935.
Quinn, P. C., S. Burke, and A. Rush (1993). ‘Part–whole perception in early infancy: evidence for
perceptual grouping produced by lightness similarity’. Infant Behav Dev 16: 19–42.
Quinn, P. C., C. R. Brown, and M. L. Streppa (1997). ‘Perceptual organization of complex visual
configurations by young infants’. Infant Behav Dev 20: 35–46.
Quinn, P. C., R. S. Bhatt, D. Brush, A. Grimes, and H. Sharpnack (2002). ‘Development of form similarity
as a Gestalt grouping principle in infancy’. Psychol Sci 13: 320–328.
Quinn, P. C., P. G. Schyns, and R. L. Goldstone (2006). ‘The interplay between perceptual organization and
categorization in the representation of complex visual patterns by young infants’. J Exp Child Psychol
95: 117–127.
Quinn, P. C., R. S. Bhatt, and A. Hayden (2008a). ‘What goes with what? Development of perceptual
grouping in infancy’. In Psychology of Learning and Motivation, Vol. 49, edited by B. H. Ross, pp. 105–
146 (San Diego: Elsevier).
Quinn, P. C., R. S. Bhatt, and A. Hayden (2008b). ‘Young infants readily use proximity to organize visual
pattern information’. Acta Psychol 127: 289–298.
Rock, I. and S. Palmer (1990). ‘The legacy of Gestalt psychology’. Sci Am 263: 84–90.
Salapatek, P. (1975). ‘Pattern perception in early infancy’. In Infant Perception: From Sensation to
Cognition: Vol. 1 Basic Visual Processes, edited by L. B. Cohen and P. Salapatek, pp. 133–248
(New York: Academic Press).
Scherf, K. S., M. Behrmann, R. Kimchi, and B. Luna (2009). ‘Emergence of global shape processing
continues through adolescence’. Child Dev 80: 162–177.
Schyns, P. G., R. L. Goldstone, and J. P. Thibaut (1998). ‘The development of features in object concepts’.
Behav Brain Sci 21: 1–54.
Soska, K. C. and S. P. Johnson (2008). ‘Development of three-dimensional object completion in infancy’.
Child Dev 79: 1230–1236.
Spelke, E. S. (1982). ‘Perceptual knowledge of objects in infancy’. In Perspectives on Mental Representation,
edited by J. Mehler, M. Garrett, and E. Walker, pp. 409–430 (Hillsdale, NJ: Erlbaum).
Vecera, S. P., E. K. Vogel, and G. F. Woodman (2002). ‘Lower region: a new cue for figure–ground
segregation’. J Exp Psychol: Gen 131: 194–205.
Vrins, S., S. Hunnius, and R. van Lier (2011). ‘Volume completion in 4.5-month-old infants’. Acta Psychol
138: 92–99.
Wagemans, J., J. H. Elder, M. Kubovy, S. E. Palmer, M. A. Peterson, M. Singh, and R. von der Heydt
(2012). ‘A century of Gestalt psychology in visual perception: I. Perceptual grouping and figure-ground
organization’. Psychol Bull 138: 1172–1217.
Wertheimer, M. (1923/1958). ‘Principles of perceptual organization’. In Readings in Perception, edited by
D. C. Beardslee and M. Wertheimer, pp. 115–135 (Princeton, NJ: Van Nostrand). Translated from the
German by M. Wertheimer.
Zuckerman, C. B. and I. Rock (1957). ‘A reappraisal of the roles of past experience and innate organizing
processes in visual perception’. Psychol Bull 54: 269–296.
Chapter 35
global tasks could provide important pointers to common mechanisms or principles of perceptual
organization.
2010). This is not to imply that there is a fundamental discrepancy between object recogni-
tion and perceptual organization (see Biederman 1987; Feldman and Hock, this volume).
Rather this is a question of emphasis: whilst modern research often focuses on how objects
are recognized, Gestalt research focused on how visual input could be organized into distinct
objects. In reality these processes are surely intertwined: recognition influences grouping and
grouping influences recognition (Pelli et al. 2009; Peterson 1994).
Gestalt psychologists, inspired more by the experimental science of physics than the stamp col-
lecting of biology, also focused on identifying universal laws in organizing visual input such as
the minimum or simplicity principle (see van der Helm, this volume). Nevertheless, the stimuli
and paradigms developed within the Gestalt school have motivated some of the most significant
developments in the study of individual differences. One of the most important of these was the
Embedded Figures Test (EFT—Gottschaldt 1926, see Figure 35.1). In the EFT a target element is
Fig. 35.1 (a) Embedded Figures Test: the stimulus on the left has to be identified in the embedded
context to the right; (b) Navon Letters: the local and global letters are illustrated in direct conflict;
(c) Ebbinghaus Illusion: the perceived size of the central dot is influenced by its context; (d) Mooney
Figure and Original: a novel two-tone Mooney image is illustrated on the left based on the original to
the right; (e) Ponzo Illusion: the perceived size of the two dots is influenced by their context; (f) Block
Design: subjects have to replicate a simple pattern with 3-D cubes; critical for the local–global literature
is the difference between the standard version on the left and the segmented version on the right; (g)
Collinear Contour: the co-alignment of a string of Gabors creates the impression of a closed shape
(generated using GERT, Demeyer and Machilsen 2012); (h) Proximity Dot Lattice: the slight difference in
spacing between the dots creates the impression of a row of oriented lines.
Reproduced from Lee de-Wit, Stimuli used to study individual differences in local and global perceptual
organization. FigShare. http://dx.doi.org/10.6084/m9.figshare.707082 © 2013, The Authors. This work is licensed
under a Creative Commons Attribution 3.0 License.
716 de-Wit and Wagemans
literally embedded (often exploiting a range of grouping cues, including proximity, closure and good
continuation) in a new more complex pattern, and participants have to find this local element in the
more complex whole. Herman Witkin (1962) used performance on this task to help motivate the
constructs of ‘field independent’ (more local) and ‘field dependent’ (more global) processing styles.
Witkin exploited this task not out of an interest in visual perception per se, but rather because he
regarded it as providing a more objective test of what he argued was a general cognitive style.
One of the important strengths of Witkin’s work was that he did not rely solely on participants’ embedded
figures scores to measure their degree of perceptual bias: he also showed that performance on the EFT
was highly correlated with performance on the ‘rod-and-frame’ test (Witkin and Asch 1948). In this
test the orientation of a central rod has to be judged while that rod is surrounded by a larger oriented
frame. For some observers the judgment of the orientation of the individual rod is heavily influenced
by the surrounding context. Thus, analogous to the EFT, a judgment about a local part has to be made
whilst trying to ignore the influence of a more global whole. Witkin’s work, and in particular the use
of the EFT provided important groundwork for the study of individual differences in numerous con-
texts, from education (Goodenough 1976) to cultural differences (Witkin and Berry 1975), but prob-
ably most significantly for Uta Frith’s work with autism. Frith (1989) theorized that visual perception
in autism was altered in a manner that meant that parts were less likely to be integrated into coherent
wholes. This theory was motivated by the finding that across a number of tasks, including EFT, people
with autism were actually better at extracting or using local information.
The identification of changes in perceptual organization in autism has paralleled research in
schizophrenia. Already in 1952, Matussek postulated that schizophrenia involved an increased
perceptual disembedding of parts from wholes, which was integrally related to the disruption of
feeling meaningfully embedded as an agent in the world. Over the 1990s and 2000s a wide range
of evidence accumulated that schizophrenia, and in particular its disorganized symptoms
(Uhlhaas et al. 2006b), is associated with a reduction in the ability to use a range of Gestalt
grouping cues to integrate local signals into more global organized percepts
(Kurylo et al. 2007).
Since Witkin’s work, a wide range of stimuli and tasks have emerged as operationalizations of local
and global processing. A sample of these stimuli and tasks is illustrated in Figure 35.1. As will
become clear, however, these stimuli and tasks can be conceptualized as engaging very different
underlying processes. Some tasks are global in the sense of requiring a comparison of the relation
between two local elements (configural tasks), some tasks involve global judgments that critically
depend on the integration of local elements (Mooney 1957), some illusions test the perception of
a local element when spatially surrounded by contextual elements (Ebbinghaus and Ponzo illu-
sions), some tasks require the detection of a local element when spatially and structurally embed-
ded in a new context (EFT), some tasks attempt to explicitly put local and global responses in
conflict with each other (Navon 1977), other tasks look at the detection of changes in focal objects
in contrast to global scene contexts (Masuda and Nisbett 2006), and so forth. Other tasks do not
require perceptual judgments per se, but involve more complex responses, such that participants
have to draw a complex figure (Complex Figure of Rey) or have to reproduce a global pattern
using individual blocks (Block Design—WAIS). Given this wide range of ‘local global’ tasks it is
perhaps no surprise that the literature in this domain appears somewhat inconsistent, with some
authors reporting clear relationships between different local-global tasks and others finding that
distinct measures seem to be dominated by entirely unrelated sources of individual variance (see,
Construct validity: all that varies is not global, below).
A chapter reviewing local and global paradigms could provide a useful service to the field by
developing a taxonomy of local and global paradigms. Indeed one could argue that until such a
Individual Differences in Local and Global Perceptual Organization 717
clear taxonomy of tasks is defined there is no way to make progress. Often in psychology, however,
if one really wants to study interesting underlying processes, one cannot start from a predefined
concept. The terms local and global obviously have no meaning except with respect to a given
information processing system. From the perspective of perceptual organization, being able to
define the terms local and global in advance would require that we already have a definitive model
or explanation of how visual input is organized. This would be putting the cart before the horse.
The next section will therefore attempt to outline a range of theoretical perspectives that could
at least provide some candidate horses that could be pulling the clusters of correlated individual
differences in local-global tasks. This overview will only use the terms (and tasks pertaining to)
local and global when these have an inherently spatial component. The terms local and global are
sometimes used as synonymous with assumed levels of processing or levels of abstraction. For
example, view-invariant object recognition may be described as a more global task, whereas
recognizing the orientation of an object might be described as a more local task. In this chapter,
however, global tasks pertain only to tasks in which a percept integrates visual stimuli (local parts) over
space. This integration is likely to be a recurrent feature at many levels of visual processing. For
example, edges may become integrated into a longer line, this line may be integrated as the border
of a rectangle, this rectangle may be integrated as a part (screen) of a larger object (a laptop), and
that object may in turn be integrated into an (office) scene. The potentially recurrent nature of
local-to-global integration at different spatial scales may indeed overlap with the extraction of
more and more abstract (or higher-level) representations, but this overlap need not be assumed,
and we do not use it here as a defining feature of local–global tasks.
on an association learnt in the past (in one’s life, or over the course of evolution), namely that
these edges co-occur as part of the same line (see Elder and Goldberg 2002, for evidence that col-
linearity is a ‘likely’ feature of our input). The role of likelihood was contrasted with the notion
of simplicity: here interpretations were not based on likely associations but rather the inherent
simplicity of different perceptual interpretations (see Translating Gestalt simplicity into intrinsic
anatomical constraints, below).
The Gestalt focus on emergent properties based on the construct of simplicity may have led to
an unfortunate neglect of the possibility that many Gestalt laws could be learnt on the basis of
associations that are likely in the statistics of co-occurring features in the input to the visual sys-
tem. Assume, for example, that perception does come with certain building blocks (for luminance,
color, motion, simple edges) and that Hebbian learning causes these building blocks to become
associated over time, as neurons that fire together wire together. Under these assumptions
many Gestalt principles for integrating local signals could emerge from associations that are ‘likely’
in the input to the visual system. A sensitivity to common fate, for example, could reflect an
internalization of the statistical likelihood that, if one part of a rigid-bodied object is moving, so too are
the other parts of that object. A sensitivity to proximity could emerge based on the fact that two
input signals that are spatially close together are more likely to have similar properties than two
signals that are more distant from each other. Even good continuation could result from a statistical
regularity in the input the visual system receives: in many real-world scenes, any edge is more
likely to continue in the same direction than in a different direction (Elder and Goldberg 2002;
Geisler 2008), an association that a simple process of Hebbian learning could potentially be
sensitive to (for a potential implementation, see Prodöhl et al. 2003).
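To make this concrete, here is a toy sketch (our own illustration, not taken from the studies cited above) of how a plain Hebbian rule exposed to spatially correlated input can arrive at a proximity-like grouping bias: units that tend to be co-active because they are close together end up more strongly associated than distant units. All parameter values are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units = 20      # a 1-D array of input units
n_samples = 5000  # number of "scenes" the system experiences
sigma = 2.0       # spatial scale of correlations in the input

# Spatially correlated inputs: white noise smoothed with a Gaussian kernel,
# so units that are close together tend to be co-active (as nearby points
# in natural input tend to share properties).
positions = np.arange(n_units)
kernel = np.exp(-0.5 * ((positions[:, None] - positions[None, :]) / sigma) ** 2)
inputs = rng.standard_normal((n_samples, n_units)) @ kernel

# Plain Hebbian learning: strengthen the connection between every pair of
# co-active units (dW proportional to the outer product of the activations).
lr = 1e-4
W = np.zeros((n_units, n_units))
for x in inputs:
    W += lr * np.outer(x, x)
np.fill_diagonal(W, 0.0)  # ignore self-connections

# Compare learned association strength for near versus far unit pairs.
dist = np.abs(positions[:, None] - positions[None, :])
near = W[(dist > 0) & (dist <= 2)].mean()
far = W[dist >= 8].mean()
print(near > far)  # → True: nearby units are more strongly associated
```

Nothing in the learning rule itself mentions space; the proximity bias is inherited entirely from the statistics of the input, which is the essence of the likelihood-based account sketched above.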
How could individual differences emerge from this sensitivity to likely associations in the envi-
ronmental input? The primate (and human) visual system appears to be highly flexible in learning
contingencies in visual input (Cox et al. 2005; Li and DiCarlo 2010). Indeed algorithms based on
extracting contingencies in visual input that remain more stable over time result in representa-
tions that are not only useful for object recognition, but which also closely resemble the recep-
tive field sensitivities of early visual areas (Berkes and Wiskott 2005). There are therefore good
reasons to think that the nature of representations in the visual system could be shaped by one’s
experience: Given that individuals live in different environments (see Global priors and/or global
predictions, below), and may have different eye movement strategies to sample input from those
environments (particularly in patient populations), this provides a plausible cause for individual
differences. Critically here, whilst many low-level statistical properties may be equivalent across
different image contexts, the kinds of associations that might shape the mid-level vision processes
important for local or global biases are likely to differ. Collinearity is a good case in point, since it
seems logical that urban environments contain more collinearity (straight lines) than rural ones
(though this requires quantifying). If one’s sensitivity to collinearity is shaped by one’s input, then
one might expect that inhabitants of urban environments would show more global (integrated)
percepts (see Caparos et al. 2012 and Personality, mood, and culture, below), particularly in tasks
where collinearity is an important integration cue, like the Embedded Figures Test.
blocks (elementary sensations in this case) based on experience. Gestalt psychologists argued that
different perceptual interpretations were selected, not because they were probable, but because
those interpretations were inherently simpler. Defining what exactly makes a given perceptual
interpretation more ‘simple’ than another is by no means trivial (see van der Helm, this volume).
One way of thinking about this is in terms of the description length of a given perceptual inter-
pretation in a coding language (Chater 1996). Some Gestalt psychologists attempted to explain
simplicity in biophysical terms: They thought that visual stimulation generated electrical fields in
the brain, and that these electrical fields could more easily settle into certain formations based on
a minimization of energy, which determined the perceptual experience of the observer. The exact
biophysical implementation in terms of electrical fields is no longer tenable per se, but it should
inspire us to think about the ways in which intrinsic biophysical constraints could influence per-
ception (see Zucker, this volume).
A useful case in point is the heuristic rule to group input on the basis of proximity: The visual
system may well be organized into retinotopic maps, such that neighboring input will lead to
activation in neighboring neurons, but these neurons are physically separate entities (albeit con-
nected by synapses), and there is no a priori reason to assume that two neurons that are physi-
cally close to each other in the brain are any more likely to communicate or combine input than
two distant neurons. If however one adds some additional constraints, such that neurons that
are closer physically on a retinotopic map in the cortex share more connections, and that later-
ally communicated signals are delayed by the slower conduction rates of non-myelinated neu-
rons, then there are plausible (though not necessarily correct) reasons to think that these intrinsic
architectural constraints could shape how perceptual input is organized such that proximity
becomes a strong grouping cue (though see above for the alternative idea that the strength of local
connectivity could be learnt based on associations in the input). The possibility that such intrinsic
constraints could have a direct impact on local and global biases in perception is borne out by a
study by Schwarzkopf et al. (2011), who demonstrate that sensitivity to a number of contextual
size illusions is correlated with the functionally defined surface area of the primary visual cortex
of each individual. An intrinsic architectural constraint may therefore have a very direct influence
on how visual signals are integrated, and thus provide a source of variance that could be common
to a number of local and global paradigms. If this correlation is not caused by a common third
process, then we need to further identify how cortical size can be related to perceptual biases. For
example, a smaller V1 could be associated with a greater strength of lateral interactions, which
could in turn follow from the constraint that neural signals take longer to conduct over larger
areas of cortical tissue. In addition, cortical size could also influence the scale over which signals
at one level are pooled to drive signals at subsequent stages, an idea that could be tested by look-
ing at topographic relations between visual field maps (see Heinzle et al. 2011 and Harvey and
Dumoulin 2011).
cortical size may influence such larger scale cortical dynamics). There are a number of sources
of evidence that these larger scale cortical rhythms are associated with more global object per-
ception (Tallon-Baudry and Bertrand 1999) and individual differences in perceptual grouping
in particular (Nikolaev et al. 2010). Indeed, changes in the formation of more long-range corti-
cal oscillations have also been directly linked to changes in grouping sensitivity in schizophre-
nia (Spencer et al. 2003; Uhlhaas et al. 2006a). In autism, there is also evidence for changes in
functional connectivity (Barttfeld et al. 2011), both purely at a neural level when perceiving the
Kanizsa illusion (Brown et al. 2005) and in relation to behavioral performance in the detection of
Mooney figures (Sun et al. 2012).
There is also causal evidence that the entrainment of cortical rhythms, either via visual stimulation (Elliott and Müller 1998) or TMS (Romei et al. 2011), can directly influence perceptual organization. Indeed, Romei et al. used the Navon task to show that the entrainment of slower rhythms
(5 Hz) caused more global biases, whilst faster rhythms (20 Hz) induced more local biases. It is
tempting to speculate that slower rhythms facilitate global integration exactly because the global
percepts require the integration of signals separated by larger distances on cortical maps, and thus
require longer times (thus favoring slower rhythms) to achieve integration.
As a side-point to debates concerning how to describe simplicity, the approach outlined above
focuses on the relative constraints imposed by the biophysical implementation of integrative
information processing. This approach contrasts with the focus on interpreting the Gestalt energy
minimization principle in terms of the length or complexity of description of different visual
interpretations (Chater 1996; see also van der Helm, this volume). We would argue that a modern
revision of the Gestalt principle of simplicity may prove more valuable in understanding percep-
tual organization when framed in terms of the Relative-Simplicity of the biological constraints on
integrated signal processing rather than the Strong-Simplicity implied by (biologically implausi-
ble) coding languages. Indeed, an inherent feature of the Strong-Simplicity approach is that all
coding languages have a common description length (Chater 1996), leaving no immediate scope
to explain individual differences.
in their flexibility, with some being unable to switch to the most appropriate level for a given task.
Finally, it could be that the ability to read out from early areas versus the ability to read out from
higher areas are independent, such that an individual may be good at accessing information from early stages, but that is not predictive of whether they are good at computing or accessing information from higher stages. This may seem highly speculative, but it actually has important implications
in relation to a debate within the autism literature that enhanced local processing may exist without a
reduction in global perception (central coherence) per se (Mottron et al. 2006). Mottron et al. partly
motivated the idea that people with autism have an enhancement in local processing via demonstra-
tions of greater fMRI activity in sensory processing areas. There is however substantial evidence that
activation in early areas is dependent upon the interpretations formed in higher areas of the brain
(Muckli 2010). Of particular importance here are demonstrations that perceptually organizing input
into a global shape in higher areas can cause a reduction of activation in earlier areas (de-Wit et al.
2012; Fang et al. 2008; Murray et al. 2002). At the level of fMRI therefore it is sometimes not possible
to study representations at one stage of the system independently of how those representations interact with higher stages of the system. Indeed, this observation in fMRI is complemented by numerous
behavioral demonstrations of a direct interaction, such that global interpretations directly influence
the accessibility of local information (Chakravarthi and Pelli 2011; He et al. 2012; Poljac et al. 2012;
Sayim et al. 2010). Thus, returning to our discussion regarding how the ‘reading out’ of informa-
tion at different stages of the cortical hierarchy may provide a useful framework for thinking about
how a local or global perceptual bias could arise, this framework also needs to take into account the
dynamic interactions between levels of the hierarchy, which will sometimes mean that the accessibil-
ity of local and global interpretations will be interdependent.
useful for explicitly implementing approaches to perception, they do not provide any inherent
account for where the priors that bias the interpretation of sensory input actually come from (see
also Feldman, this volume). Thus, a Bayesian approach to explaining a global bias (or local in
autism) may ultimately have to operationalize changes to perceptual priors in terms of one of the
other factors outlined above.
to motivate the idea that dyslexia (and autism) is associated with a general magnocellular deficit.
Contrary to expectations however, Goodbourn et al. found no shared variance between these three
measures, despite demonstrating a wide range of variance in their sample that was stable over successive testing sessions. Thus, whilst magnocellular neurons may be critically needed in order to
perform the three tasks in question, this does not mean that variance in this neuron type provides
a primary (common) source of variance on these tasks. In many ways the Goodbourn et al. study
sets a benchmark standard for what research on individual differences in visual perception should
look like. Firstly, they tested a very large sample (over a thousand participants), and demonstrated
that levels of correlation do not differ for participants with different levels of performance, nor do
the correlations (in this instance) differ for a subsample of participants with a diagnosis of
dyslexia. They also included a control task (thought to measure a different function) to demonstrate that correlations with this task are as high as those between the other (‘magnocellular’) tasks.
Last but most definitely not least, they established the test-retest reliability of all of their measures
with a subsample of their participants on a different day, giving a baseline for correlations that can
be expected between tasks based on the consistency of individual differences within each task.
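The logic of treating test-retest reliability as a ceiling on between-task correlations can be made concrete with Spearman's classic correction for attenuation. The sketch below is purely illustrative; the numerical values are invented for the example, not taken from Goodbourn et al.

```python
import math

def disattenuated_r(r_xy, r_xx, r_yy):
    """Spearman's correction for attenuation: the correlation two tasks
    could show if both were measured without error, given the observed
    inter-task correlation r_xy and the test-retest reliabilities
    r_xx and r_yy of the two tasks."""
    return r_xy / math.sqrt(r_xx * r_yy)

# Invented example values: an observed inter-task correlation of .30,
# with test-retest reliabilities of .70 and .60. The reliabilities cap
# the correlation that could ever be observed at sqrt(.70 * .60), about .65.
print(round(disattenuated_r(0.30, 0.70, 0.60), 3))  # prints 0.463
```

On this logic, an observed correlation of .30 between two tasks with these reliabilities reflects a substantially stronger latent association, which is why a within-task reliability baseline is essential before interpreting between-task correlations.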
Returning to the critical issue of distinguishing between mechanisms that are critical for a given
task and mechanisms that are a primary source of variance for that task, it is important that future
research focuses not just on individual differences on one task, but focuses on the correlations
across tasks. If one assumes that variance on a given task relates to an underlying process, then it
is important to demonstrate that this task correlates with variance on another task (and the more
dissimilar the better) which is assumed to be dependent upon the same underlying mechanism.
This focus on a common factor underlying variance in multiple tasks would bring us back to the
original formulation of consistent individual biases identified in the correlation between the rod
and frame task and the EFT used by Witkin.
The subsequent literature on local and global biases since Witkin has also revealed some striking
correlations between very different tasks. For example, the difference score in the Block Design
task (specifically between the locally segmented and standard versions) has been found to corre-
late with a number of more basic perceptual tasks, even though the Block Design task requires a
very complicated attentional/saccadic sampling and motor reconstruction process. Indeed, there
is evidence of correlated biases in the ability to respond to local or global properties across differ-
ent modalities (Bouvet et al. 2011). Furthermore, there have been several replications of (at least)
a cluster of correlated tasks in the general population (Grinter et al. 2009; Milne and Szczerbinski
2009) and in patient groups (Bolte et al. 2007; Uhlhaas et al. 2006b). At the same time however,
it is also clear that many tasks operationalized as local or global measures do not share a
primary source of common variance (Milne and Szczerbinski 2009).
where a discrimination that could at face value appear to require a global analysis (based on the
integration of multiple spatially separated local signals) can sometimes be solved by picking up on
one local cue. An alternative concern is that whilst the integration of local signals into global ones
is critical to these tasks, it is possible that this aspect of the task is not the most prominent factor
in generating individual variance on these tasks (as already outlined above). The impact of this
problem is likely to be compounded by the fact that tests proposed as measures of local or global
bias have very different task demands, and very different output measures (e.g., the drawing of
the Complex Figure of Rey). This problem is also interrelated to the fact that there may not be
sufficient variability in the population selected for variance in a mechanism of interest to manifest
as a clear factor dominating the individual differences (especially if one only recruits Western
undergrads with a psychology major). The reality of this problem is potentially borne out by the
fact that correlations between local-global paradigms are often higher in patient groups—who
presumably have more variance on the continuum of interest (see sections on schizophrenia and
autism). A final possibility however, is that the integration of local signals into global representa-
tions is simply implemented differently for different stimuli and different task demands.
Differentiating between the idea that the brain is just a ‘bag of tricks’ and the idea that common
mechanisms are involved, but become hard to identify because individual variability is dominated by other factors for a given task, will be a major challenge for research which aims to use
individual differences as a means of unearthing underlying mechanisms. This problem can be
illustrated with two studies already discussed, in which a common mechanism may seem to
be implied, but a correlation is not found. First, in the study reported earlier, Schwarzkopf
et al. (2011) found that the size of the primary visual cortex influences the strength of two
contextual illusions, but that the strength of these illusions did not correlate with each other. In a
similarly intriguing example, Caparos et al. (2012) have found that although exposure to an urban
environment influences bias on the Navon task and sensitivity to a contextual illusion, perfor-
mance on these tasks did not correlate. However, interpreting these ‘null’ effects is limited by our
current focus on null hypothesis testing, which only enables one to report if the null hypothesis
can be rejected. In other words, although they highlight that there is an absence of evidence for a correlation, they do not actually provide evidence against the existence of a correlation. Hopefully a
greater emphasis on the power needed to find effects (Button et al. 2013) and an increasing adop-
tion of Bayesian statistical techniques will enable studies to more meaningfully quantify support
for, and against, the existence of a correlation.
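The power point can be illustrated with a short simulation (a hedged sketch: the effect size and sample sizes below are arbitrary choices for illustration, not values from any of the studies discussed). Even a true correlation of r = .3 is missed more often than not at the sample sizes typical of small perception studies, so a non-significant correlation on its own says little.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2013)

def power_for_r(true_r, n, n_sims=2000, alpha=0.05):
    """Monte Carlo estimate of the power of a two-sided Pearson test
    to detect a true population correlation `true_r` at sample size `n`."""
    cov = [[1.0, true_r], [true_r, 1.0]]
    hits = 0
    for _ in range(n_sims):
        # Draw a bivariate normal sample with the specified correlation
        x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
        if stats.pearsonr(x, y)[1] < alpha:
            hits += 1
    return hits / n_sims

# Power grows steeply with sample size for a true correlation of .3
for n in (20, 80, 320):
    print(n, power_for_r(0.3, n))
```

A Bayesian analysis of the same data would go further, quantifying evidence for the null as well as against it, which is what interpreting the ‘null’ results above actually requires.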
Clearly, there is more work to be done to establish when different local-global paradigms do and
do not correlate at an individual level. This will require larger scale studies that simultaneously
test many paradigms, and use statistical techniques that accumulate evidence both for and against
the existence of correlations. Ideally, these studies also need to test broad participant samples,
because, as discussed in the next section, it is often within patient samples that one sees clearer
correlations between tests.
Must et al. 2004; Kéri et al. 2005), and common fate (Chen et al. 2003). The reduced sensitivity to
common fate is evidenced by an increase in the number of coherent dots required to recognize
motion in one direction. As will become apparent, this deficit in global motion provides a very
direct parallel to that revealed in autism. Somewhat surprisingly however, direct theoretical or
empirical comparisons between the perceptual organizational differences in autism and schizophrenia are rare, though those comparisons that include both patient groups have found highly comparable changes (Bolte et al. 2007).
One of the reasons for the lack of direct comparisons between autism and schizophrenia may
result from a very explicit attempt by some researchers in schizophrenia to avoid some of the more
clinical or indirect measures of perceptual organization popular in autism research that poten-
tially include too many contributing factors (Kurylo et al. 2007). Interestingly, in their review
Silverstein and Keane (2011) explicitly exclude any discussion of what they call ‘global-local’ tasks
(one assumes they mean the Navon task), because they argue most of the variance induced in
these tasks is caused by attentional processes (which may indeed be valid; see Caparos et al. 2013).
Despite the emphasis on tests that look more directly at the sensitivity to different Gestalt
grouping principles, there are also interesting results in schizophrenia using slightly less con-
strained tests of perceptual organization. Johnson et al. (2005) for example report a clear local bias
in a version of the Navon task, in which they match the salience of the local and global targets such
that there is no ‘global precedence’ for control participants. Perhaps more interestingly, Uhlhaas
et al. (2006b) measure in parallel the ability of patients to detect a contour, group Gabor elements,
recognize a Mooney figure and assess size in contextual illusions. Uhlhaas et al. find a very con-
sistent change in perceptual organization across these very different tasks (towards what could be
described as a local bias). They also make clear that this change is more closely associated with
‘disorganized’ symptoms, although a differential sensitivity to different contextual effects may also
be evident for other symptoms (Yang et al. 2012). Uhlhaas et al. also highlight that performance
in all three of their tasks develops with age, something they raise to motivate a speculation that
the development of the ability to form long-range cortical synchronizations may be critical to all
of these tasks. The importance of long-range cortical synchronization in perceptual organization
in schizophrenia is also supported by work looking at Kanizsa figures (Spencer et al. 2003) and
Mooney figures (Uhlhaas et al. 2006a).
As in autism (see below) there is also some debate regarding whether the perceptual changes
are causal to the broader clinical syndrome. There are certainly interesting parallels between the
reductions in perceptual organization and the less organized world views of patients with schizophrenia
(Uhlhaas and Mishara 2007), and there are certainly reported correlations between perceptual
thresholds for form and motion coherence and deficits in Theory of Mind (Kelemen et al. 2005),
although this was studied with respect to the negative symptoms of schizophrenia. Also of signifi-
cant interest for this chapter, while correlations between different perceptual tasks appear to be
higher amongst patients with schizophrenia (Uhlhaas et al. 2006b), this does not appear to imply a fundamentally altered mode of perceptual organization. In contrast, there is evidence that the continuum
of symptoms associated with certain aspects of schizophrenia also correlates with impairments
in contour integration and a reduced sensitivity to context illusions for non-clinical participants
scoring high on both the Schizotypy and the Thought Disorder Index (Uhlhaas et al. 2004).
Autism
Numerous reviews on the perceptual abilities of people with autism have concluded that there
are differences, but how consistent these differences are, and how they should be characterized or
Individual Differences in Local and Global Perceptual Organization 727
explained is not yet clear (Dakin and Frith 2005). This section will not attempt to provide an additional review per se; rather, it will selectively focus on issues that might help to resolve
some of these inconsistencies, or are of general interest to other questions regarding cultural and
developmental differences in perceptual organization.
Frith (1989) initially launched the focus on local and global differences in autism with findings on
the Block Design test and the EFT. Whilst interesting in themselves, the exact conclusions that one
can draw from these findings regarding perceptual organization are complicated by the multiple
processes undoubtedly recruited in solving these tasks. Organizing perceptual input is certainly
a critical mechanism in these tests, but whether it is the main source of variance in all instances
is questionable. There does seem to be some evidence that these tasks are more closely related
to each other in autism (Bolte et al. 2007) than in the typical population (Pellicano et al. 2005),
which could suggest that the role of perceptual organization becomes more evident when it has a
larger influence on task performance. However, given the likely role of general task solving func-
tions and strategies in tasks such as these, and the known executive function problems in autism,
conclusions from these tasks certainly require careful consideration.
Many researchers have therefore attempted to use different paradigms, and in particular ones
that are more clearly motivated from vision science. One such task which provides some promise
here is the measurement of the threshold required for coherent global motion detection. This has
become one of the most replicated findings in autism (Davis et al. 2006; Milne et al. 2002; Spencer
et al. 2000; Tsermentseli et al. 2008), although the effect is more clearly seen with short presenta-
tion times (Robertson et al. 2012). Interestingly, there is also evidence for a negative correlation
between global motion thresholds and more complex tasks like the EFT for non-clinical samples
who score highly on Autistic traits (Grinter et al. 2009). Milne and Szczerbinski (2009) also find a
negative correlation between a ‘disembedding’ factor (based on performance on the Block Design
and EFT tasks) and global motion thresholds in a non-clinical sample.
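Global motion coherence thresholds of this kind are typically estimated adaptively. The toy sketch below is our illustration of the general logic, not the procedure of any study cited here; the simulated observer's threshold and psychometric slope are invented values. It shows a 1-up/2-down staircase converging on the ~71%-correct coherence level.

```python
import math
import random

random.seed(7)

def simulated_observer(coherence, threshold=0.25, slope=10.0):
    """Toy observer for a two-alternative direction judgement: accuracy
    rises from chance (0.5) with the proportion of coherently moving
    dots, following a logistic psychometric function."""
    p_correct = 0.5 + 0.5 / (1.0 + math.exp(-slope * (coherence - threshold)))
    return random.random() < p_correct

def staircase(n_trials=300, start=0.8, step=0.02):
    """1-up/2-down staircase: coherence is lowered after two consecutive
    correct trials and raised after every error, so it converges on the
    ~70.7%-correct point. The threshold estimate is the mean coherence
    at the last few reversals."""
    coherence, run, last_direction, reversals = start, 0, 0, []
    for _ in range(n_trials):
        if simulated_observer(coherence):
            run += 1
            if run == 2:  # two in a row correct: make the task harder
                run = 0
                coherence = max(0.0, coherence - step)
                if last_direction == +1:
                    reversals.append(coherence)
                last_direction = -1
        else:  # any error: make the task easier
            run = 0
            coherence = min(1.0, coherence + step)
            if last_direction == -1:
                reversals.append(coherence)
            last_direction = +1
    tail = reversals[-8:]
    return sum(tail) / len(tail)

print(round(staircase(), 2))  # converges near the simulated observer's threshold
```

A higher (worse) converged coherence level for one group than another is the operational sense in which global motion sensitivity is said to be reduced.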
However, the attempt to shift to simpler paradigms has not resolved the debate regarding the
existence and nature of perceptual changes in autism. In this regard it is striking that reviews in
the domain of schizophrenia have come to a much clearer conclusion that there is an impaired
or weakened use of grouping principles in perceptual organization. There are numerous reasons
why the picture may be more complicated in autism, the most obvious being that the nature of the
perceptual changes may be very different. For example, whilst a breakdown in contour integration
is one of the findings most consistently associated with the disorganized symptoms of schizophrenia, there are numerous indications that this process is not impaired in autism (Blake et al. 2003;
Del Viva et al. 2006). Another salient difference is that while schizophrenia patients are normally
diagnosed or at least studied in adulthood (or late adolescence), patients with autism are studied
from a much younger age. This significantly complicates the interpretation of studies on younger
samples with autism because the processes underlying the integration of local information into a
more global organization are known to continue to develop from childhood into adulthood. This
is well illustrated in a study by Scherf et al. (2008), who demonstrate that a difference between
participants with autism and typically developing children on the Navon task only emerged later
into adolescence as the typically developing children begin to adopt an increasingly global bias.
The role of development could also be important in contextual illusions. An initial study by
Happé (1996) showed a reduced sensitivity to a number of contextual illusions, but Ropar and
Mitchell (2001) did not find evidence for such a difference. This inconsistency is unfortunate in
terms of linking between different strands of research because these are versions of the same illu-
sions that are related to V1 size in adulthood, are biased in different cultures, and reveal weaker
contextual effects in patients with disorganized symptoms of schizophrenia. There is however
728 de-Wit and Wagemans
clear evidence that the sensitivity to these illusions also develops, and that the adult-like sensitiv-
ity is not apparent until later adolescence (Doherty et al. 2010; Káldy and Kovács 2003). To our
knowledge, studies that have compared autism and control samples at older ages have in fact
found evidence for differential sensitivities to these illusions (Bolte et al. 2007, also see Mitchell
et al. 2010), suggesting that participants with autism do perceive these illusions differently, but
that this difference is only clear once the perceptual processes underlying these illusions have
matured. This is not simply a methodological point. It also has important theoretical implications
regarding the causal role of differences in perceptual bias in autism. If the perceptual changes in
autism are more reliably discernible from the typical population only at older ages, then this sug-
gests that, if these perceptual biases are different, they may not have a causal role in generating
the broader syndrome, but rather emerge based on an underlying mechanism that impacts many
domains of processing.
Looking forward
Our aim in this chapter was to provide a global overview of a fragmented literature. Much of the
existing literature focuses only on specific tasks, one patient group, or one theoretical approach
or simply dismisses individual differences as a valid research tool. This chapter provides somewhat
more room to explore the space of theories, tasks, methods, patients, and populations of interest.
Hopefully, this outline will motivate larger-scale empirical research and will provide a broader
scope with which local and global tasks can be understood both as an intrinsic part of perceptual
organization and in terms of their relation to a domain-general challenge in combining local
signals into more abstract global wholes. However, rather than focusing on (premature) conclusions, this final section will focus on some reflections for moving this field of research forward.
the context of individual differences however, we think this is not an optimal choice, because the
test-retest reliability is quite low (Dale and Arnell 2013), it has an unclear relationship to other
measures of local and global bias (Milne and Szczerbinski 2009), and more critically, there are too
many sources of variance contributing to task performance. At the current time, we would regard
an advantage on the EFT and a reduced ability to detect coherent motion as good benchmarks
for a local perceptual bias (especially when used in combination). However, these tasks, and espe-
cially the EFT (White and Saldana 2011), are also not without their problems. Ideally, the field
needs to translate experimental paradigms into broad-scale test batteries that provide continuous
variability for multiple aspects of local to global integration with minimal variation in the execu-
tive task demands.
Acknowledgements
We would like to thank Sander Van de Cruys, Ruth Van der Hallen, Kris Evers, Cees van Leeuwen,
Marlene Behrmann, Sam Schwarzkopf, Karina Linnell, Roeland Verhallen, Pieter Moors, Jonas
Kubilius, Brian Keane, Steve Silverstein, and Peter van der Helm for providing valuable feedback
on a previous version of this chapter. The Navon and Mooney images for Figure 35.1 were pro-
vided by Sander Van de Cruys. This work was supported by long-term structural funding from
the Flemish Government to JW (METH/08/02) and a postdoctoral fellowship from the Research
Foundation—Flanders (FWO) to LdW.
References
Arcaro, M. J., McMains, S. A., Singer, B. D., and Kastner, S. (2009). Retinotopic organization
of human ventral visual cortex. The Journal of Neuroscience 29(34): 10638–52. doi:10.1523/
JNEUROSCI.2807-09.2009.
Barttfeld, P., Wicker, B., Cukier, S., Navarta, S., Lew, S., and Sigman, M. (2011). A big-world network
in ASD: Dynamical connectivity analysis reflects a deficit in long-range connections and an excess of
short-range connections. Neuropsychologia 49(2): 254–63. doi:10.1016/j.neuropsychologia.2010.11.024.
Behrmann, M., Avidan, G., Marotta, J. J., and Kimchi, R. (2005). Detailed exploration of face-related
processing in congenital prosopagnosia: 1. Behavioral findings. Journal of Cognitive Neuroscience
17(7): 1130–49. doi:10.1162/0898929054475154.
Berkes, P. and Wiskott, L. (2005). Slow feature analysis yields a rich repertoire of complex cell properties.
Journal of Vision 5(6). doi:10.1167/5.6.9.
Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological
Review 94(2): 115–47. doi:10.1037/0033-295X.94.2.115
Bölte, S., Holtmann, M., Poustka, F., Scheurich, A., and Schmidt, L. (2007). Gestalt perception and
local-global processing in high-functioning autism. Journal of Autism and Developmental Disorders
37(8): 1493–504. doi:10.1007/s10803-006-0231-x.
Blake, R., Turner, L. M., Smoski, M. J., Pozdol, S. L., and Stone, W. L. (2003). Visual recognition of
biological motion is impaired in children with autism. Psychological Science 14(2): 151–57.
doi:10.1111/1467-9280.01434.
Bouvet, L., Rousset, S., Valdois, S., and Donnadieu, S. (2011). Global precedence effect in audition and
vision: Evidence for similar cognitive styles across modalities. Acta Psychologica 138(2): 329–35.
doi:10.1016/j.actpsy.2011.08.004.
Brown, C., Gruber, T., Boucher, J., Rippon, G., and Brock, J. (2005). Gamma abnormalities during
perception of illusory figures in autism. Cortex 41(3): 364–76. doi:10.1016/S0010-9452(08)70273-9.
Busigny, T. and Rossion, B. (2011). Holistic processing impairment can be restricted to faces in acquired
prosopagnosia: Evidence from the global/local Navon effect. Journal of Neuropsychology 5(1): 1–14.
doi:10.1348/174866410X500116.
Busse, L., Ayaz, A., Dhruv, N. T., Katzner, S., Saleem, A. B., Schölvinck, M. L., et al. (2011). The detection
of visual contrast in the behaving mouse. The Journal of Neuroscience 31(31): 11351–61. doi:10.1523/
JNEUROSCI.6689-10.2011.
Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., and Munafò, M. R.
(2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews
Neuroscience 14(5): 365–76. doi:10.1038/nrn3475.
Caparos, S., Ahmed, L., Bremner, A. J., de Fockert, J. W., Linnell, K. J., and Davidoff, J. (2012). Exposure
to an urban environment alters the local bias of a remote culture. Cognition 122(1): 80–5. doi:10.1016/j.cognition.2011.08.013.
Caparos, S., Linnell, K. J., Bremner, A. J., de Fockert, J. W., and Davidoff, J. (2013). Do local and global
perceptual biases tell us anything about local and global selective attention? Psychological Science.
doi:10.1177/0956797612452569.
Chakravarthi, R. and Pelli, D. G. (2011). The same binding in contour integration and crowding. Journal of
Vision 11(8). doi:10.1167/11.8.10.
Chater, N. (1996). Reconciling simplicity and likelihood principles in perceptual organization. Psychological
Review 103(3): 566–81.
Chen, Y., Nakayama, K., Levy, D., Matthysse, S., and Holzman, P. (2003). Processing of global, but not
local, motion direction is deficient in schizophrenia. Schizophrenia Research 61(2-3): 215–27.
Cox, D. D., Meier, P., Oertelt, N., and DiCarlo, J. J. (2005). ‘Breaking’ position-invariant object recognition.
Nature Neuroscience 8(9): 1145–7. doi:10.1038/nn1519.
Cronbach, L. (1957). The two disciplines of scientific psychology. American Psychologist 12(11): 671–84.
Dakin, S. and Frith, U. (2005). Vagaries of visual perception in autism. Neuron 48(3): 497–507.
doi:10.1016/j.neuron.2005.10.018.
Dale, G. and Arnell, K. M. (2013). Investigating the stability of and relationships among global/
local processing measures. Attention, Perception and Psychophysics 75(3): 394–406. doi:10.3758/
s13414-012-0416-7.
Davis, R. A. O., Bockbrader, M. A., Murphy, R. R., Hetrick, W. P., and O’Donnell, B. F. (2006). Subjective
perceptual distortions and visual dysfunction in children with autism. Journal of Autism and
Developmental Disorders 36(2): 199–210. doi:10.1007/s10803-005-0055-0.
de-Wit, L. (2013). Stimuli used to study individual differences in local and global perceptual organization.
figshare. doi:10.6084/m9.figshare.707082.
de-Wit, L. H., Kubilius, J., Wagemans, J., and Op de Beeck, H. P. (2012). Bistable Gestalts reduce activity in
the whole of V1, not just the retinotopically predicted parts. Journal of Vision 12(11). doi:10.1167/12.11.12.
Del Viva, M. M., Igliozzi, R., Tancredi, R., and Brizzolara, D. (2006). Spatial and motion integration in
children with autism. Vision Research 46(8-9): 1242–52. doi:10.1016/j.visres.2005.10.018.
Demeyer, M. and Machilsen, B. (2012). The construction of perceptual grouping displays using GERT.
Behavior Research Methods 44(2): 439–46. doi:10.3758/s13428-011-0167-8.
Doherty, M. J., Campbell, N. M., Tsuji, H., and Phillips, W. A. (2010). The Ebbinghaus
illusion deceives adults but not young children. Developmental Science 13(5): 714–21.
doi:10.1111/j.1467-7687.2009.00931.x.
Driver, J. and Mattingley, J. B. (1998). Parietal neglect and visual awareness. Nature Neuroscience
1(1): 17–22. doi:10.1038/217.
Driver, J., Davis, G., Russell, C., Turatto, M., and Freeman, E. (2001). Segmentation, attention and
phenomenal visual objects. Cognition 80(1–2): 61–95.
Duncan, J. (2012). How Intelligence Happens. Yale University Press.
Elder, J. H. and Goldberg, R. M. (2002). Ecological statistics of Gestalt laws for the perceptual organization
of contours. Journal of Vision 2(4): 324–53. doi:10:1167/2.4.5.
Elliott, M. A. and Müller, H. J. (1998). Synchronous information presented in 40-Hz flicker enhances visual
feature binding. Psychological Science 9(4): 277–83. doi:10.1111/1467-9280.00055.
Fang, F., Kersten, D., and Murray, S. O. (2008). Perceptual grouping and inverse fMRI activity patterns in
human visual cortex. Journal of Vision 8(7). doi:10.1167/8.7.2.
Förster, J., & Higgins, E. (2005). How global versus local perception fits regulatory focus. Psychological
Science 16(8): 631–36. doi:10.1111/j.1467-9280.2005.01586.x
Friston, K. (2008). Hierarchical models in the brain. PLoS Computational Biology, 4(11), e1000211.
doi:10.1371/journal.pcbi.1000211.
Frith, U. (1989). Autism: Explaining the enigma. Oxford: Blackwell.
Gasper, K., & Clore, G. L. (2002). Attending to the big picture: mood and global versus local processing of
visual information. Psychological Science 13(1): 34–40.
Geisler, W. S. (2008). Visual perception and the statistical properties of natural scenes. Annual Review of
Psychology 59(1): 167–92. doi:10.1146/annurev.psych.58.110405.085632.
Gilaie-Dotan, S., Kanai, R., Bahrami, B., Rees, G., and Saygin, A. P. (2012). Neuroanatomical
correlates of biological motion detection. Neuropsychologia 51(3): 457–63. doi:10.1016/j.
neuropsychologia.2012.11.027.
Goodbourn, P. T., Bosten, J. M., Hogg, R. E., Bargary, G., Lawrance-Owen, A. J., and Mollon, J. D. (2012).
Do different ‘magnocellular tasks’ probe the same neural substrate? Proceedings of the Royal Society,
B. Biological sciences 279(1745): 4263–71. doi:10.1098/rspb.2012.1430.
Goodenough, D. R. (1976). The role of individual differences in field dependence as a factor in learning and
memory. Psychological Bulletin 83(4): 675–94.
732 de-Wit and Wagemans
Gottschaldt, K. (1926). Über den Einfluß der Erfahrung auf die Wahrnehmung von Figuren. I. Über den
Einfluß gehäufter Einprägung von Figuren auf ihre Sichtbarkeit in umfassenden Konfigurationen
[About the influence of experience on the perception of figures]. Psychologische Forschung 8: 261–317.
Grinter, E. J., Maybery, M. T., Van Beek, P. L., Pellicano, E., Badcock, J. C., and Badcock, D. R. (2009).
Global visual processing and self-rated autistic-like traits. Journal of Autism and Developmental
Disorders 39(9): 1278–90. doi:10.1007/s10803-009-0740-5.
Happé, F. G. (1996). Studying weak central coherence at low levels: children with autism do not succumb
to visual illusions. A research note. Journal of Child Psychology and Psychiatry, and its Allied Disciplines
37(7): 873–7.
Harvey, B. M. and Dumoulin, S. O. (2011). The relationship between cortical magnification factor and
population receptive field size in human visual cortex: Constancies in cortical architecture. The Journal
of Neuroscience 31(38): 13604–12. doi:10.1523/JNEUROSCI.2572-11.2011.
He, D., Kersten, D., and Fang, F. (2012). Opposite modulation of high—and low-level visual aftereffects by
perceptual grouping. Current Biology 22(11): 1040–5. doi:10.1016/j.cub.2012.04.026.
Heinzle, J., Kahnt, T., and Haynes, J.-D. (2011). Topographically specific functional connectivity
between visual field maps in the human brain. NeuroImage 56(3): 1426–36. doi:10.1016/j.
neuroimage.2011.02.077.
Hochstein, S. and Ahissar, M. (2002). View from the top: Hierarchies and reverse hierarchies in the visual
system. Neuron 36(5): 791–804. doi:10.1016/S0896-6273(02)01091-7.
Hubel, D. H. and Wiesel, T. N. (1959). Receptive fields of single neurones in the cat’s striate cortex. The
Journal of Physiology 148(3): 574–91.
Johnson, S. C., Lowery, N., Kohler, C., and Turetsky, B. I. (2005). Global-local visual processing in
schizophrenia: evidence for an early visual processing deficit. Biological Psychiatry 58(12): 937–46.
doi:10.1016/j.biopsych.2005.04.053.
Káldy, Z. and Kovács, I. (2003). Visual context integration is not fully developed in 4-year-old children.
Perception 32(6): 657–66. doi:10.1068/p3473.
Kanai, R. and Rees, G. (2011). The structural basis of inter-individual differences in human behaviour and
cognition. Nature Reviews Neuroscience 12(4): 231–42. doi:10.1038/nrn3000.
Kelemen, O., Erdélyi, R., Pataki, I., Benedek, G., Janka, Z., and Kéri, S. (2005). Theory of Mind and
motion perception in schizophrenia. Neuropsychology 19(4): 494–500. doi:10.1037/0894-4105.19.4.494.
Kéri, S., Kelemen, O., Benedek, G., and Janka, Z. (2005). Lateral interactions in the visual cortex of patients
with schizophrenia and bipolar disorder. Psychological Medicine 35(7): 1043–51.
Kourtzi, Z. and Kanwisher, N. (2001). Representation of perceived object shape by the human lateral
occipital complex. Science 293(5534): 1506–9. doi:10.1126/science.1061133.
Kurylo, D. D., Pasternak, R., Silipo, G., Javitt, D. C., and Butler, P. D. (2007). Perceptual organization
by proximity and similarity in schizophrenia. Schizophrenia Research 95(1-3): 205–14. doi:10.1016/j.
schres.2007.07.001.
Li, N. and DiCarlo, J. J. (2010). Unsupervised natural visual experience rapidly reshapes size-invariant
object representation in inferior temporal cortex. Neuron 67(6): 1062–75. doi:10.1016/j.
neuron.2010.08.029.
Masuda, T. and Nisbett, R. E. (2006). Culture and change blindness. Cognitive Science 30(2): 381–99.
doi:10.1207/s15516709cog0000_63.
Matussek, P. (1952). [Studies on delusional perception. I. Changes of the perceived external world in
incipient primary delusion]. Archiv für Psychiatrie und Nervenkrankheiten, vereinigt mit Zeitschrift für
die gesamte Neurologie und Psychiatrie, 189(4), 279–319; contd.
Milne, E. and Szczerbinski, M. (2009). Global and local perceptual style, field-independence, and central
coherence: An attempt at concept validation. Advances in Cognitive Psychology 5: 1–26. doi:10.2478/
v10053-008-0062-8.
Individual Differences in Local and Global Perceptual Organization 733
Milne, E., Swettenham, J., Hansen, P., Campbell, R., Jeffries, H., and Plaisted, K. (2002). High motion
coherence thresholds in children with autism. Journal of Child Psychology and Psychiatry, and its Allied
Disciplines 43(2): 255–63.
Mitchell, P., Mottron, L., Soulières, I., and Ropar, D. (2010). Susceptibility to the Shepard illusion in
participants with autism: Reduced top-down influences within perception? Autism Research 3(3):
113–19. doi:10.1002/aur.130.
Mooney, C. M. (1957). Age in the development of closure ability in children. Canadian Journal of
Psychology 11(4): 219–26.
Mottron, L., Dawson, M., Soulières, I., Hubert, B., & Burack, J. (2006). Enhanced perceptual functioning
in autism: an update, and eight principles of autistic perception. Journal of Autism and Developmental
Disorders 36(1): 27–43. doi:10.1007/s10803-005-0040-7
Muckli, L. (2010). What are we missing here? Brain imaging evidence for higher cognitive functions in
primary visual cortex V1. International Journal of Imaging Systems Technology 20(2): 131–9. doi:10.1002/
ima.v20:2.
Murray, S. O., Kersten, D., Olshausen, B. A., Schrater, P., and Woods, D. L. (2002). Shape perception
reduces activity in human primary visual cortex. Proceedings of the National Academy of Sciences
99(23): 15164–9. doi:10.1073/pnas.192579399.
Must, A., Janka, Z., Benedek, G., and Kéri, S. (2004). Reduced facilitation effect of collinear flankers on
contrast detection reveals impaired lateral connectivity in the visual cortex of schizophrenia patients.
Neuroscience Letters 357(2): 131–4. doi:10.1016/j.neulet.2003.12.046.
Navon, D. (1977). Forest before trees: The precedence of global features in visual perception. Cognitive
Psychology 9(3): 353–83. doi:10.1016/0010-0285(77)90012-3
Newell, A. (1973). You can’t play 20 questions with nature and win: Projective comments on the papers of this
symposium. In: W. G. Chase (ed.), Visual information processing, pp. 283–308. New York: Academic Press.
Nikolaev, A. R., Gepshtein, S., Gong, P., and van Leeuwen, C. (2010). Duration of coherence intervals in
electrical brain activity in perceptual organization. Cerebral Cortex 20(2): 365–82. doi:10.1093/cercor/bhp107.
Nisbett, R. E., and Miyamoto, Y. (2005). The influence of culture: holistic versus analytic perception. Trends
in Cognitive Sciences 9(10): 467–73. doi:10.1016/j.tics.2005.08.004.
Pelli, D. G., Majaj, N. J., Raizman, N., Christian, C. J., Kim, E., and Palomares, M. C. (2009). Grouping
in object recognition: The role of a Gestalt law in letter identification. Cognitive Neuropsychology 26(1):
36–49. doi:10.1080/13546800802550134.
Pellicano, E., & Burr, D. (2012). When the world becomes “too real”: a Bayesian explanation of autistic
perception. Trends in Cognitive Sciences 16(10): 504–10. doi:10.1016/j.tics.2012.08.009
Pellicano, E., Maybery, M., and Durkin, K. (2005). Central coherence in typically developing
preschoolers: Does it cohere and does it relate to mindreading and executive control?
Journal of Child Psychology and Psychiatry, and its Allied Disciplines 46(5): 533–47.
doi:10.1111/j.1469-7610.2004.00380.x.
Peterson, M. A. (1994). Object recognition processes can and do operate before figure–ground organization.
Current Directions in Psychological Science 3(4): 105–111. doi:10.1111/1467-8721.ep10770552.
Poljac, E., de-Wit, L., and Wagemans, J. (2012). Perceptual wholes can reduce the conscious accessibility of
their parts. Cognition 123(2): 308–12. doi:10.1016/j.cognition.2012.01.001
Prodöhl, C., Würtz, R. P., and von der Malsburg, C. (2003). Learning the Gestalt rule of collinearity from
object motion. Neural Computation 15(8): 1865–96. doi:10.1162/08997660360675071.
Ramachandran, V. S. (1985). Guest editorial: The neurobiology of perception. Perception 14: 127–34.
Rao, R. P. N. and Ballard, D. H. (1999). Predictive coding in the visual cortex: A functional interpretation
of some extra-classical receptive-field effects. Nature Neuroscience 2(1): 79–87. doi:10.1038/4580.
Robertson, C. E., Martin, A., Baker, C. I., & Baron-Cohen, S. (2012). Atypical Integration of Motion
Signals in Autism Spectrum Conditions. PLoS ONE 7(11): e48173. doi:10.1371/journal.pone.0048173
734 de-Wit and Wagemans
Robertson, C. E., Kravitz, D. J., Freyberg, J., Baron-Cohen, S., and Baker, C. I. (2013). Tunnel
vision: Sharper gradient of spatial attention in autism. The Journal of Neuroscience 33(16): 6776–81.
doi:10.1523/JNEUROSCI.5120-12.2013.
Romei, V., Driver, J., Schyns, P. G., and Thut, G. (2011). Rhythmic TMS over parietal cortex links distinct
brain frequencies to global versus local visual processing. Current Biology 21(4): 334–7. doi:10.1016/j.
cub.2011.01.035.
Ropar, D. and Mitchell, P. (2001). Susceptibility to illusions and performance on visuospatial tasks in
individuals with autism. The Journal of Child Psychology and Psychiatry and its Allied Disciplines
42(04): 539–49. doi:10.1017/S002196300100717X.
Sayim, B., Westheimer, G., and Herzog, M. H. (2010). Gestalt factors modulate basic spatial vision.
Psychological Science 21(5): 641–4. doi:10.1177/0956797610368811.
Scherf, K. S., Luna, B., Kimchi, R., Minshew, N., and Behrmann, M. (2008). Missing the big picture: Impaired
development of global shape processing in autism. Autism Research 1(2): 114–29. doi:10.1002/aur.17.
Scholl, B. J., Pylyshyn, Z. W., and Feldman, J. (2001). What is a visual object? Evidence from target
merging in multiple object tracking. Cognition 80(1–2): 159–77.
Schwarzkopf, D. S., Song, C., and Rees, G. (2011). The surface area of human V1 predicts the subjective
experience of object size. Nature Neuroscience 14(1): 28–30. doi:10.1038/nn.2706.
Schwarzkopf, D. S., Robertson, D. J., Song, C., Barnes, G. R., and Rees, G. (2012). The frequency of
visually induced gamma-band oscillations depends on the size of early human visual cortex. The Journal
of Neuroscience 32(4): 1507–12. doi:10.1523/JNEUROSCI.4771-11.2012.
Silverstein, S M, Kovács, I., Corry, R., and Valone, C. (2000). Perceptual organization, the disorganization
syndrome, and context processing in chronic schizophrenia. Schizophrenia Research 43(1): 11–20.
Silverstein, S. M., and Keane, B. P. (2011). Perceptual organization impairment in schizophrenia and
associated brain mechanisms: Review of research from 2005 to 2010. Schizophrenia Bulletin 37(4):
690–9. doi:10.1093/schbul/sbr052.
Spencer, J., O’Brien, J., Riggs, K., Braddick, O., Atkinson, J., and Wattam-Bell, J. (2000). Motion
processing in autism: Evidence for a dorsal stream deficiency. Neuroreport 11(12): 2765–7.
Spencer, K. M., Nestor, P. G., Niznikiewicz, M. A., Salisbury, D. F., Shenton, M. E., and McCarley, R. W.
(2003). Abnormal neural synchrony in schizophrenia. The Journal of Neuroscience 23(19): 7407–11.
Sumner, P., Edden, R. A. E., Bompas, A., Evans, C. J., and Singh, K. D. (2010). More GABA, less
distraction: A neurochemical predictor of motor decision speed. Nature Neuroscience 13(7): 825–7.
doi:10.1038/nn.2559.
Sun, L., Grützner, C., Bölte, S., Wibral, M., Tozman, T., Schlitt, S., . . . Uhlhaas, P. J. (2012).
Impaired gamma-band activity during perceptual organization in adults with Autism Spectrum
Disorders: Evidence for dysfunctional network activity in frontal-posterior cortices. The Journal of
Neuroscience 32(28): 9563–73. doi:10.1523/JNEUROSCI.1073-12.2012.
Tallon-Baudry and Bertrand. (1999). Oscillatory gamma activity in humans and its role in object
representation. Trends in Cognitive Sciences 3(4): 151–62.
Tsermentseli, S., O’Brien, J. M., and Spencer, J. V. (2008). Comparison of form and motion coherence
processing in autistic spectrum disorders and dyslexia. Journal of Autism and Developmental Disorders
38(7): 1201–10. doi:10.1007/s10803-007-0500-3.
Uhlhaas, P. J. and Mishara, A. L. (2007). Perceptual anomalies in schizophrenia: Integrating phenomenology
and cognitive neuroscience. Schizophrenia Bulletin 33(1): 142–56. doi:10.1093/schbul/sbl047.
Uhlhaas, P. J., Linden, D. E. J., Singer, W., Haenschel, C., Lindner, M., Maurer, K., and Rodriguez,
E. (2006a). Dysfunctional long-range coordination of neural activity during Gestalt perception in
schizophrenia. The Journal of Neuroscience 26(31): 8168–75. doi:10.1523/JNEUROSCI.2002-06.2006.
Uhlhaas, P. J., Phillips, W. A., Mitchell, G., and Silverstein, S. M. (2006b). Perceptual grouping in
disorganized schizophrenia. Psychiatry Research 145(2–3): 105–17. doi:10.1016/j.psychres.2005.10.016.
Individual Differences in Local and Global Perceptual Organization 735
Uhlhaas, P. J., Millard, I., Muetzelfeldt, L., Curran, H. V., and Morgan, C. J. A. (2007). Perceptual
organization in ketamine users: Preliminary evidence of deficits on night of drug use but not 3 days
later. Journal of Psychopharmacology 21(3): 347–52. doi:10.1177/0269881107077739.
Uhlhaas P. J., Silverstein S. M., Phillips W. A., Lovell P. G. (2004). Evidence for impaired visual
context processing in schizotypy with thought disorder. Schizophr. Res. 68: 249–260. doi:10.1016/
S0920-9964(03)00184-1.
Van de Cruys, S., de-Wit, L., Evers, K., Boets, B., & Wagemans, J. (2013). Weak priors versus overfitting of
predictions in autism: Reply to Pellicano and Burr (TICS, 2012). I-Perception, 4(2), 95–97. doi:10.1068/
i0580ic
Van Leeuwen, C., and Smit, D. J. A. (2012). Restless brains, wandering minds. In: S. Edelman, T. Fekete,
and N. Zach (eds.): Being in time: Dynamical models of phenomenal awareness. Advances in consciousness
research, pp. 121–47. Amsterdam: John Benjamins PC.
Van Loon, A. M., Knapen, T., Scholte, H. S., St. John-Saaltink, E., Donner, T. H., and Lamme, V. A. F.
(2013). GABA shapes the dynamics of bistable perception. Current Biology (in press). doi:10.1016/j.
cub.2013.03.067.
Vogel, E. K. and Awh, E. (2008). How to exploit diversity for scientific gain using individual
differences to constrain cognitive theory. Current Directions in Psychological Science 17(2): 171–6.
doi:10.1111/j.1467-8721.2008.00569.x.
Wagemans, J., Notebaert, W., and Boucart, M. (1998). Lorazepam but not diazepam impairs identification
of pictures on the basis of specific contour fragments. Psychopharmacology 138(3–4): 326–33.
Wagemans, J., Elder, J. H., Kubovy, M., Palmer, S. E., Peterson, M. A., Singh, M., and von der Heydt, R.
(2012a). A century of Gestalt psychology in visual perception: I. Perceptual grouping and figure-ground
organization. Psychological Bulletin 138(6): 1172–217. doi:10.1037/a0029333.
Wagemans, J., Feldman, J., Gepshtein, S., Kimchi, R., Pomerantz, J. R., van der Helm, P. A., and van
Leeuwen, C. (2012b). A century of Gestalt psychology in visual perception: II. Conceptual and
theoretical foundations. Psychological Bulletin 138(6): 1218–52. doi:10.1037/a0029334.
Wang, R., Li, J., Fang, H., Tian, M., and Liu, J. (2012). Individual differences in holistic processing predict
face recognition ability. Psychological Science 23(2): 169–77. doi:10.1177/0956797611420575.
White, S. J. and Saldaña, D. (2011). Performance of children with autism on the Embedded Figures
Test: A closer look at a popular task. Journal of Autism and Developmental Disorders 41(11): 1565–72.
doi:10.1007/s10803-011-1182-4.
Wichmann, F. A., Drewes, J., Rosas, P., and Gegenfurtner, K. R. (2010). Animal detection in natural
scenes: Critical features revisited. Journal of Vision 10(4). doi:10.1167/10.4.6.
Wilmer, J. B. (2008). How to use individual differences to isolate functional organization, biology, and
utility of visual functions; with illustrative proposals for stereopsis. Spatial Vision 21(6): 561–79.
doi:10.1163/156856808786451408.
Witkin, H. A. (1962). Psychological differentiation: studies of development. New York: Wiley.
Witkin, H. A. and Asch, S. E. (1948). Studies in space orientation: Further experiments on perception of
the upright with displaced visual fields. Journal of Experimental Psychology 38(6): 762–82.
Witkin, H. A. and Berry, J. W. (1975). Psychological differentiation in cross-cultural perspective. Journal of
Cross-Cultural Psychology 6(1): 4–87.
Yang, E., Tadin, D., Glasser, D. M., Hong, S. W., Blake, R., and Park, S. (2012). Visual context processing in
schizophrenia. Clinical Psychological Science. doi:10.1177/2167702612464618.
Yong, E. (2012). Replication studies: Bad copy. Nature 485(7398): 298–300. doi:10.1038/485298a.
Yovel, G. and Kanwisher, N. (2005). The neural basis of the behavioral face-inversion effect. Current Biology
15(24): 2256–62. doi:10.1016/j.cub.2005.10.072.
Zaretskaya, N., Anstis, S., and Bartels, A. (2013). Parietal cortex mediates conscious perception of illusory
Gestalt. The Journal of Neuroscience 33(2): 523–31. doi:10.1523/JNEUROSCI.2905-12.2013.
Chapter 36
Mutual interplay between perceptual organization and attention
1 Introduction
The visual system possesses the remarkable ability to rapidly group elements in a complex visual
environment based on a range of factors first elucidated by the Gestalt psychologists, including
proximity, similarity, and common fate (Wertheimer 1923). However, there is also competition
for neural representation, given constraints on neuronal tuning and the presence of large receptive
fields at higher levels of visual association cortex (Desimone and Duncan 1995). To deal with
the complexity of the environment, there must be processes that prioritize the
information most relevant to ongoing behavior.
both the selection of a fraction of the information that reaches our senses and the organization of
this information into coherent and meaningful elements.
In this chapter, we discuss the dynamic interplay between visual selective attention (on the
one hand) and perceptual organization (on the other), two important processes that allow us to
perceive a seamless, integrated world. In describing this interplay, we will draw on evidence from
neuropsychology, which provides striking examples where (i) perceptual organization appears to
operate despite a patient having a very poor ability to select visual information, and (ii) spatial
attention appears to operate even when perceptual organization is impaired. At least at first sight,
such evidence provides one of the strongest demonstrations that perceptual organization is
independent of visual attention; whether this conclusion is robust is something we will review.
Throughout, we will focus predominantly on perceptual grouping.
1.1 A neuropsychological example of the interplay of attention and perceptual organization
As we shall review, neuropsychology provides many striking examples of the interplay between
attention and perceptual organization. A case described by Alexander Luria in 1959 provides
a good illustration. Luria reported a patient with simultanagnosia after bilateral occipitopari-
etal brain injury—a major impairment in “seeing” more than one object at a time. The patient
was shown two versions of the Star of David, formed by two overlapping triangles. When the
triangles differed in color, the patient reported only a single triangle. However, when the triangles
were the same color, the patient immediately perceived the complete star. Similarly, when two
separate shapes were briefly exposed, only one was seen at a time. Nevertheless, when the shapes
were identical, or combined into a single structure through a connecting line, their perception
was facilitated (Luria 1959). This case study demonstrates how perceptual organization (notably
Mutual interplay between perceptual organization and attention 737
grouping by similarity or connectedness) can determine where attention is allocated and which
objects are accessible for explicit report.
The mutual interplay between perceptual grouping and attention can be assessed through dif-
ferent lenses, answering at least three distinct but related questions:
• Can perceptual grouping constrain visual attention, determining which objects will be selected
and be candidates for explicit report?
• Can perceptual grouping occur even without (focused) attention, or does perceptual grouping
fully depend on the availability of attentive resources?
• Can visual attention modulate perceptual grouping, determining how elements are grouped to
form meaningful wholes?
Note that evidence that perceptual grouping constrains attention, and that grouping can operate
without focused attention, might be taken to indicate that attention has no influence on grouping.
This would be an incorrect inference, however: evidence for grouping without attention does not
show that attention cannot modulate grouping under appropriate conditions. Indeed, that is the
conclusion we will reach.
In the next paragraphs, we will first define the concept of “visual attention,” distinguish it from
the concept of “awareness,” and describe the most common attentional neuropsychological defi-
cits after stroke. We will then tackle each of our questions, drawing on evidence from neuropsy-
chological studies in patients with attention deficits, along with evidence from behavioral and
neuroimaging studies in healthy volunteers. We will then outline a framework for the dynamic
modulation of perceptual grouping by attention. In particular, we will argue that perceptual
grouping is weakly constrained by visual attention, but that attention nevertheless can play a role
in dynamically altering the “weighting” of elements in any organized structure, especially under
conditions in which stored knowledge and learning cannot play a major role.
2 Visual attention
2.1 Assigning attentional priorities
Visual attention can be defined as the set of cognitive functions that prioritize visual information
according to our current task goals and expectations. Many models of selective attention posit
that processing resources are allocated to perceptual units on the basis of the dynamically evolv-
ing peak of activity in an “attentional priority map” (e.g., Bays et al. 2010; Bisley and Goldberg
2010; Bundesen 1990; Gillebert et al. 2012; Ipata et al. 2009; Mavritsaki et al. 2011; Ptak 2012;
Vandenberghe and Gillebert 2009; Vandenberghe et al. 2012). The attentional priority map pro-
vides an abstract, topographical representation of the environment in which each object (or loca-
tion) is “weighted” by its sensory characteristics and its current behavioral relevance. At any given
moment in time, attention is directed towards the object (or location) with the highest priority
(e.g., Koch and Ullman 1985; Treisman 1998). These models build on the concept of a salience
map, proposed by Koch and Ullman (1985) and elaborated by Itti and Koch (2000) to refer to a
map that encodes the local conspicuity (physical “saliency”) of the visual scene. The term
“priority map”, however, goes beyond this to posit the joint influence of bottom-up and top-down
factors, such as behavioral goals and expectations (Bisley and Goldberg 2010; Ptak 2012;
Vandenberghe and Gillebert 2009). The attentional priority
map is a key concept in the Theory of Visual Attention (TVA) (Bundesen 1990; Bundesen et al.
2005, 2011), a mathematical framework related to the biased competition account (Desimone
and Duncan 1995), to which we will return in detail. Evidence from single-unit studies,
738 Gillebert and Humphreys
functional neuroimaging, and lesion-symptom mapping in patients with brain damage suggests
that attentional priorities are encoded in a network of frontoparietal areas—the so-called dorsal
attention network—which includes the intraparietal sulcus and the frontal eye fields (Bisley and
Goldberg 2010; Corbetta and Shulman 2002; Gillebert et al. 2012; Gillebert et al. 2011; Ptak 2012;
Vandenberghe and Gillebert 2009).
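The shared arithmetic of these priority-map models can be caricatured in a few lines of Python. Everything below (the arrays, the weights, the function name) is illustrative rather than drawn from any published model: priority is a weighted combination of bottom-up salience and top-down relevance, and attention is directed to its peak.

```python
import numpy as np

def priority_map(salience, relevance, w_bu=0.5, w_td=0.5):
    """Combine bottom-up salience and top-down behavioral relevance
    into a single attentional priority value per location."""
    return w_bu * salience + w_td * relevance

# Toy scene with four locations.
salience  = np.array([0.9, 0.2, 0.4, 0.1])  # physical conspicuity
relevance = np.array([0.1, 0.2, 0.9, 0.1])  # match to current task goals

priority = priority_map(salience, relevance)
attended = int(np.argmax(priority))  # attention goes to the peak: location 2
```

A purely salience-driven model would attend location 0; weighting in task relevance shifts the peak to location 2, which is the distinction the term "priority map" is meant to capture.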
Visual extinction differs from hemispatial neglect as it is usually only detected with brief pres-
entations of at least two competing stimuli (Heilman et al. 1993). Patients with visual extinction
fail to detect a contralesional stimulus only when it is presented together with a competing
ipsilesional stimulus. In the conventional clinical test for visual extinction, the patient is
presented with a wiggling finger on either the left or the right side, or with two fingers
wiggling concurrently on both sides (Bender 1952; Humphreys et al. 2012). Patients
with visual extinction can detect a single stimulus on either side, but are impaired at detecting
the contralesional stimulus when two stimuli are presented simultaneously on opposite sides.
Visual extinction, primarily associated with damage to the right temporoparietal junction (e.g.,
Chechlacz et al. 2013; Ticini et al. 2010; Vossel et al. 2011), has typically been attributed to
the brain lesion biasing attentional selection, so that less attentional weight is allocated to the
contra- relative to the ipsilesional side of space. The weight assigned to the contralesional side
can be sufficient for a single contralesional item to be detected, but this item then loses any
competition for selection when a competing stimulus appears simultaneously on the ipsile-
sional side (Duncan et al. 1997).
Patients with simultanagnosia, typically induced by bilateral lesions of the occipitoparietal
cortex and underlying white matter (Chechlacz et al. 2012), show impaired report of two stimuli
relative to one, and are poor both at integrating multiple objects in a scene and at integrating
local elements into a coherent object (Bálint 1909; Rizzo and Vecera 2002). In other words,
simultanagnosic patients are biased towards selecting local shape representations (unless this is
counteracted by grouping between local elements) rather than more global stimuli (Shalev et al. 2004).
These deficits of visual attention may be a consequence of damage to or dysfunction of the
attentional priority map (Ptak and Fellrath 2013). For example, patients with hemispatial neglect
may fail to assign attentional priorities to events in the contralesional side of space—resulting in a
competitive advantage for ipsilesional events to be candidates for attentional orienting. In particu-
lar, visual attention deficits in patients with hemispatial neglect may be driven by impairment in
integrating bottom-up and top-down factors to compute attentional priorities (Dombrowe et al.
2012; Ptak and Fellrath 2013).
Grouping between the contra- and ipsilesional stimuli can reduce extinction. Recovery has been
reported when the items group by brightness (Gilchrist et al. 1996), collinearity (Boutsen and
Humphreys 2000; Gilchrist et al. 1996; Mattingley et al. 1997; Pavlovskaya et al. 2007), common
shape (Gilchrist et al. 1996; Humphreys 1998; Ptak and Schnider 2005), and common contrast
polarity (Gilchrist et al. 1996; Humphreys 1998).
Mattingley et al. (1997), for example, presented a patient with left-sided extinction with a
sequence of displays, consisting of four circles arranged to form a square (Figure 36.1a). On each
trial, quarter segments were briefly removed from the circles either from the left, from the right,
from both sides, or not at all. The patient’s task was to detect the side of the offsets. When the
segments were configured such that no grouping emerged, bilateral removal of quarter segments
[Figure 36.1 appears here. Panel (a): sample display sequence with the question “From which side were the segments removed?”; extinction: <20% left detections; no extinction: >80% left detections. Panel (b): number of two-item responses (out of 30) for the grouping factors baseline, brightness, collinearity, connectedness, and surroundness.]
Fig. 36.1 Perceptual grouping and recovery from extinction. (a) Example of a task requiring
discrimination between displays where segments were briefly removed from circles on the left, right,
both sides or on neither side. On bilateral trials, when segments were removed on the outer side
of the circle, extinction occurred. When segments were removed on the inner side of the circle,
inducing a Kanizsa figure, no extinction was observed. Adapted from Mattingley et al. (1997).
(b) Results of a detection task with two-item displays as a function of the grouping between the
contra- and ipsilesional items. The task required discrimination between displays containing no, one,
or two items.
Adapted from Glyn W. Humphreys, Neural representation of objects in space: a dual coding account, Philosophical
Transactions of The Royal Society B: Biological Sciences, 353 (1373), pp. 1341–1351, doi: 10.1098/rstb.1998.0288
Copyright © 1998, The Royal Society.
induced extinction: the patient made more errors detecting left-sided offsets when they were
accompanied by right-sided offsets than when the left-sided offsets were presented alone.
Extinction, however, was less severe when the stimulus configuration could be grouped to form a
Kanizsa square (see also Conci et al. 2009).
Several of these factors were investigated in GK, a patient who suffered bilateral lesions of the occipitoparietal and parietotemporal regions, resulting in Bálint's syndrome and in extinction of left-sided targets. Humphreys and colleagues (Gilchrist et al. 1996; Humphreys 1998) presented GK either with a single stimulus in the left or right visual field, or with two stimuli, one in the left and one in the right visual field. GK showed recovery from extinction if the elements had the same brightness (two white or two black circles), collinear edges (aligned squares), a connecting line (joining circles with opposite contrast polarities), or inside-outside relations (e.g., a left-field circle appearing within a surrounding rectangle) (Figure 36.1b). Grouping operated not only between items presented in the impaired and his "better" visual field, but also when both items were presented within the impaired visual field.
These data suggest that patients with visual attention deficits can explicitly report the contralesional stimulus if perceptual grouping allows it to be processed together with the ipsilesional stimulus. The benefit of perceptual grouping may result from attentional priorities being assigned to the perceptual group as a whole, rather than to the items constituting the group, thereby facilitating the selection of individual items within the group. In other words, the ability to compute attentional priority for one item in the display (e.g., the ipsilesional item in extinction) may spread this attentional priority to the item with which it is grouped.
used together (e.g., a bottle positioned as if pouring into a glass; Riddoch et al. 2010; Riddoch et al. 2006; Riddoch et al. 2002). Several factors appear to contribute to this result. The effect is stronger when the objects are frequently used together and are correctly positioned for the action (Riddoch et al. 2006), and it is eliminated if the objects are inverted (Riddoch et al. 2011). Such results suggest that the familiarity of the action as it is standardly seen (with the objects in their usual orientation for the interaction) is important for grouping the objects for selection. Riddoch et al. (2010) additionally suggest that it is the implied motion from one object to another that links the objects together so they are encoded as a single perceptual unit.
complex, are associated with agnosia, an impairment of object recognition that cannot be attributed to visual loss (see the chapter by Behrmann and colleagues, this volume, for a discussion of prosopagnosia, an impairment of face recognition). In apperceptive agnosia, the percept of the object is not fully constructed—hence these patients may have deficits in perceptual grouping. Double dissociations can indeed be found. In contrast to patients with neglect (Schindler et al. 2009), patients with agnosia can orient their attention normally to the contralesional visual field, but their allocation of attention is not influenced by objects (de-Wit et al. 2009; Vecera and Behrmann 1997). We conclude that perceptual organization can influence the distribution of attentional weights through representations in the ventral visual stream rather than in the parietal cortex. Nevertheless, the setting of spatial attentional weights can be dissociated from such ventral input, as in cases of agnosia (de-Wit et al. 2009; Vecera and Behrmann 1997).
similar response pattern in visual cortex (Martinez et al. 2007; Martinez et al. 2006), and there is neural activation of unattended items if they share a featural property with an attended item (Saenz et al. 2002). These studies suggest that attention has a tendency to spread throughout perceptual groups (Richard et al. 2008). In other words, attending to one element of a perceptual group can cause attention to spread to the other elements of the same group, thereby enhancing the sensory representation of these elements. Conversely, grouping between distracter elements can facilitate visual search because the distracters can be rejected together—a process termed spreading suppression (e.g., Dent et al. 2011; Donnelly et al. 1991; Duncan and Humphreys 1989; Gilchrist et al. 1997; Humphreys et al. 1989). Hence, the outcome of perceptual grouping constrains visual attention.
Not only can attention spread throughout a perceptual group, but a good perceptual group can in itself capture attention (Humphreys and Riddoch 2003; Humphreys et al. 1994; Kimchi et al. 2007; Yeshurun et al. 2009). Kimchi and colleagues (2007) presented participants with displays containing eight distracters and a target defined by its location relative to a cue. On some trials, a subset of the elements grouped to form a diamond based on the Gestalt principle of collinearity. Compared with the condition in which no perceptual group was present in the display, reaction times to the target were shorter when the cue appeared within the perceptual group and longer when the cue occurred outside it (Kimchi et al. 2007). Similarly, given two stimuli, simultanagnosic patients tend to perceive the stimulus whose parts group more strongly (Humphreys et al. 1994), even when the strong group is less complex than the competing weak group (Humphreys and Riddoch 2003). Furthermore, Humphreys and Riddoch (2003) showed that attention is drawn to the location of the strong group, facilitating the identification of a subsequently presented letter at that location.
Fig. 36.2 Perceptual grouping without awareness or attention. (a) Example of a display used in the
“inattention paradigm” developed by Mack et al. (1992). Participants were to judge which of the two
arms of the cross was longer. The elements in the background could be grouped by color similarity.
Participants were asked surprise questions about the background grouping. (b,c) Example of a type of
display used by Moore and Egeth (1997). Participants were to judge which of two horizontal lines was longer, while dots in the background formed configurations inducing the Ponzo (b) or Müller-Lyer (c) illusion.
Line judgments were influenced by the illusions.
Data from A. Mack, B. Tang, R. Tuma, S. Kahn, and I. Rock, Perceptual organization and attention, Cognitive
Psychology, 24(4), pp. 475–501, 1992.
However, the inability to explicitly report grouping, i.e. not being aware of it, when attention is engaged in a concurrent demanding task does not necessarily imply that perceptual grouping in itself requires attention. In studies of patients with blindsight, and also in normal observers with stimuli presented under masking conditions, there can be enhanced perceptual processing of stimuli that the observer is unaware of, indicating that attention to the location of an object does not necessarily imply awareness of that object (Kentridge et al. 1999); awareness can be dissociated from attention. In addition, limited explicit report/awareness of a stimulus may, for example, also reflect poor encoding of the item into memory. To counteract this criticism, Moore and Egeth (1997) used an implicit measure of perceptual grouping: observers were to judge the length of line segments presented along with background elements that were entirely task-irrelevant. The background elements were arranged so that, if perceptually grouped, they could induce optical illusions, such as the Ponzo illusion (Figure 36.2b) or the Müller-Lyer illusion (Figure 36.2c).
Although observers appeared unaware of the background elements when retrospectively questioned, the arrangement of the elements clearly modulated the line-length judgments. For example, when the background pattern could induce the Ponzo illusion (Figure 36.2b), the line closer to the converging end of the background pattern was judged to be longer than the line further away from the converging end. This suggests that perceptual grouping can occur without attention. Several other studies in healthy volunteers and in patients with hemispatial neglect support these findings (Chan and Chua 2003; Kimchi and Razpurker-Apfeld 2004; Lamy et al. 2006; Russell and Driver 2005; Shomstein et al. 2010). For example, Shomstein and colleagues (2010) investigated whether perceptual grouping in the poorly attended (contralesional) visual field of neglect patients affected performance on stimuli presented in the intact (ipsilesional) visual field. To assess this, they adapted a paradigm developed by Russell and Driver (2005): they asked patients with hemispatial neglect to perform a change detection task on complex target stimuli successively presented to the ipsilesional hemifield (Figure 36.3a). At the same time, irrelevant distracter elements appeared in the contralesional hemifield, either changing or retaining their perceptual grouping on successive displays. Changes in perceptual grouping of the contralesional distracters produced congruency effects on the attended (ipsilesional) target-change judgment—for example, the time taken to decide that two ipsilesional stimuli differed was speeded if the grouping relations in the contralesional field also changed. The magnitude of the effect was the same in neglect patients and control participants. Again it appears that perceptual grouping can take place in the absence of attention allocated to the elements forming the perceptual group.
746 Gillebert and Humphreys
Fig. 36.3 (a) Effect of irrelevant grouping in the contralesional hemifield on change detection in the ipsilesional hemifield.
There is converging evidence, too, from patients with simultanagnosia. Even though normal participants can show a bias towards global hierarchical shapes rather than towards their local constituents (Navon 1977) (see Figure 36.4a; see the chapter by Kimchi, this volume, for a detailed analysis of the processing of hierarchical figures), simultanagnosic patients tend to show a local bias—they may recognize the local elements whilst being poor at explicitly reporting the global shape (Huberle and Karnath 2006; Karnath et al. 2000). However, the same patients can be faster at naming the local letters when their identity is congruent with the global letter than when it is incongruent. These congruency effects again suggest that, even if the global shape is not available for explicit report, grouping based on the proximity of local elements can still occur in simultanagnosic patients.
In line bisection tasks, patients with hemispatial neglect are asked to indicate the midpoint of a horizontal line presented on a piece of paper in front of them. Deviation of the estimated midpoint towards the side of the brain damage is typically regarded as indicative of hemispatial neglect. Vuilleumier and colleagues (Vuilleumier and Landis 1998; Vuilleumier et al. 2001b) used Kanizsa-type illusory figures to examine whether patients with neglect would also deviate from the midpoint when marking the midpoint of illusory contours rather than real contours (Figure 36.4b). Bisection judgments in neglect patients were similar for Kanizsa stimuli with illusory contours and for connected stimuli with real contours, even though the patients could not explicitly detect the contralateral inducers. These results suggest that neglect patients can implicitly group the inducing elements prior to the stage at which the attentional bias towards the ipsilesional side of space arises. Interestingly, patients with lesions extending posteriorly to the lateral occipital complex did not show this systematic bisection pattern, suggesting that implicit grouping may depend on the integrity of lateral occipital areas (Vuilleumier et al. 2001b).
Other evidence that perceptual grouping can occur without observers paying attention to the constituent elements comes from fMRI studies in healthy volunteers. One line of work has exploited the visual suppression that occurs between simultaneously presented, proximal visual elements. These competitive interactions appear to occur automatically, without attention, in early visual cortex (Kastner et al. 1998; Reynolds et al. 1999). McMains and Kastner (2010) assessed whether the level of competitive interaction induced by task-irrelevant elements varied as a function of the strength of perceptual grouping between the elements. They found that competitive interactions in early visual cortex and V4 were reduced when the elements could be grouped on the basis of the Gestalt principles of collinearity, proximity, or illusory-contour formation, compared with when the same stimuli could not be grouped, even though the elements were task-irrelevant and observers performed a concurrent demanding task (McMains and Kastner 2010).
Whether or not perceptual grouping requires attentional resources may, however, also depend on the type of grouping involved (Han et al. 1999; Han et al. 2001; Han et al. 2002; Kimchi and Razpurker-Apfeld 2004). Kimchi and Razpurker-Apfeld (2004) used Russell and Driver's (2005) paradigm to study different forms of grouping under inattention. On each trial, participants were presented with two successive displays, each containing a central target matrix surrounded by task-irrelevant grouped background elements, and performed a demanding change detection task on the target matrix. Grouping between the background elements stayed the same or changed across the successive displays, independently of any change in the target matrix. Grouping of columns/rows by color similarity and grouping of shape by homogeneous elements affected performance on the central change detection task (Figure 36.3b). Grouping of shape by color similarity, however, did not produce congruency effects, suggesting that this form of grouping is contingent upon the availability of (sufficient) attentional resources. Whether or not attention is necessary for grouping to occur may not be an all-or-none matter. Kimchi and colleagues (Kimchi and Peterson 2008; Kimchi and Razpurker-Apfeld 2004) proposed that attentional requirements vary along a continuum as a function of the processes involved in different types of grouping. According to this view, grouping of shape by color similarity may be a weaker form of grouping that requires more attentional resources.
Evidence that attention can play a necessary role in grouping comes from both brain imaging and neuropsychological studies. These studies indicate that damage to the posterior parietal cortex, a brain region implicated in attentional control, disrupts grouping (e.g., Zaretskaya et al. 2013). Global pattern coding, for which local integration processes are not sufficient, also seems to depend on the integrity of brain areas controlling attention, such as the intraparietal cortex. Lestou et al. (2014) observed reduced activity to global radial and concentric Glass patterns in structurally preserved intermediate regions such as the lateral occipital complex after lesions of the intraparietal cortex. This suggests that the intraparietal cortex plays a critical role in modulating grouping in regions such as the lateral occipital cortex, which are typically thought to respond to perceptual groups. Furthermore, perceptual grouping in neglect patients may not be as efficient as in healthy volunteers. Han and Humphreys (2007) examined the role of the frontoparietal cortex in top-down modulation of perceptual grouping by recording ERPs from two patients with frontoparietal lesions and eight controls. In the controls, grouping by proximity and collinearity was indexed by short-latency activity over medial occipital cortex and long-latency activity over occipitoparietal areas. In the patients, however, both the short- and long-latency activities were eliminated or weakened.
We can conclude from the above studies that some types of perceptual grouping can occur without focused attention, although attentional resources appear to be necessary for the outputs of these grouping processes to become accessible for explicit report. In contrast, other forms of grouping cannot be accomplished optimally without focused attention (see also Kimchi 2009). Additional research is needed to investigate in more detail which forms of grouping require attentional resources.
stimulus are lower when it is flanked by collinear, oriented grating stimuli, but only when the flankers are attended. In a subsequent study, Freeman and colleagues (2003) showed that the attentional modulation persists even for high flanker contrasts, suggesting that attention acts by integrating the local elements into a global form, rather than by changing the local sensitivity to the flankers themselves. Goldsmith and Yeari (2003) demonstrated that effects of grouping are found under conditions of divided attention—allowing attention to spread across the visual field—but that grouping effects are reduced under conditions of focused attention. Effects of attention have also been observed for higher-level types of grouping. For example, Roberts and Humphreys (2011) showed that the benefit of positioning pairs of objects for action is reduced by cueing attention towards one of the objects. Converging evidence has been obtained by Han and colleagues using fMRI (Han et al. 2005a) and ERP techniques (Han et al. 2005b), showing that proximity grouping is modulated by whether stimuli fall within an attended region. Furthermore, de Haan and Rorden (2010) showed that similarity grouping can be modulated by whether or not the grouping mechanism is relevant to the task.
Other studies (McMains and Kastner 2011) tested the hypothesis that attentional modulation of cortical activity may vary as a function of the degree of perceptual grouping in the display. Participants were presented with a strong perceptual group (an illusory shape), a weak perceptual group (an illusory shape with ill-defined borders), or no perceptual group. McMains and Kastner observed that the amount of attentional modulation of competitive interactions in early visual cortex depended on the degree of competition left unresolved by bottom-up processes: attentional modulation was greatest for displays without perceptual groups—when neural competition was little influenced by bottom-up mechanisms—and smallest, although still significantly present, for displays containing a strong perceptual group. However, when observers paid attention to the elements forming the perceptual group, competitive interactions were similar for all levels of perceptual grouping, suggesting that bottom-up and top-down processes interact dynamically to maximally resolve neural competition.
Acknowledgements
We would like to thank Lee de-Wit and one anonymous reviewer for their valuable feedback on
this chapter. Preparation of this work was supported by an ERC Advanced Investigator Award to
GWH and a Sir Henry Wellcome Fellowship to CRG (grant number 098771/Z/12/Z).
References
Bálint, R. (1909). Seelenlähmung des “Schauens,” optische Ataxie, räumliche Störung der Aufmerksamkeit.
Monatschrift für Psychiatrie und Neurologie 25: 51–81.
Baylis, G. and Driver, J. (1992). Visual parsing and response competition: The effect of grouping factors.
Perception & Psychophysics 51(2): 145–62.
Baylis, G., Driver, J., and Rafal, R. (1993). Visual extinction and stimulus repetition. Journal of Cognitive
Neuroscience 5(4): 453–66.
Bays, P., Singh-Curry, V., Gorgoraptis, N., Driver, J., and Husain, M. (2010). Integration of goal- and
stimulus-related visual signals revealed by damage to human parietal cortex. The Journal of Neuroscience
30(17): 5968–78.
Behrmann, M. and Tipper, S. P. (1994). Object-based attentional mechanisms: Evidence from patients with
unilateral neglect. In: C. Umilta and M. Moscovitch (eds.), Attention and Performance XV: Conscious
and Nonconscious Processing and Cognitive Functioning, pp. 351–75. Cambridge: MIT Press.
Behrmann, M., Moscovitch, M., Black, S., and Mozer, M. (1990). Perceptual and conceptual mechanisms
in neglect dyslexia: Two contrasting case studies. Brain 113(4): 1163–83.
Behrmann, M., Zemel, R., and Mozer, M. (1998). Object-based attention and occlusion: Evidence from
normal participants and a computational model. Journal of Experimental Psychology. Human Perception
and Performance 24(4): 1011–36.
Ben-Av, M., Sagi, D., and Braun, J. (1992). Visual attention and perceptual grouping. Perception &
Psychophysics 52(3): 277–94.
Bender, M. B. (1952). Disorders in Perception. Springfield: Thomas Publisher.
Berti, A., Allport, A., Driver, J., Dienes, Z., Oxbury, J., and Oxbury, S. (1992). Levels of processing for
visual stimuli in an “extinguished” field. Neuropsychologia 30(5): 403–15.
Bisley, J. and Goldberg, M. (2010). Attention, intention, and priority in the parietal lobe. Annual Review of
Neuroscience 33: 1–21.
Boutsen, L. and Humphreys, G. (2000). Axis-based grouping reduces visual extinction. Neuropsychologia
38(6): 896–905.
Braet, W. and Humphreys, G. (2006). The “special effect” of case mixing on word
identification: Neuropsychological and transcranial magnetic stimulation studies dissociating case
mixing from contrast reduction. Journal of Cognitive Neuroscience 18(10): 1666–75.
Brunn, J. and Farah, M. (1991). The relation between spatial attention and reading: Evidence from the
neglect syndrome. Cognitive Neuropsychology 8(1): 59–75.
Bundesen, C. (1990). A theory of visual attention. Psychological Review 97(4): 523–47.
Bundesen, C., Habekost, T., and Kyllingsbæk, S. (2005). A neural theory of visual attention: Bridging
cognition and neurophysiology. Psychological Review 112(2): 291–328.
Bundesen, C., Habekost, T., and Kyllingsbæk, S. (2011). A neural theory of visual attention and short-term
memory (NTVA). Neuropsychologia 49(6): 1446–57.
Buxbaum, L. J. and Coslett, H. B. (1994). Neglect of chimeric figures: Two halves are better than a whole.
Neuropsychologia 32(3): 275–88.
Casco, C., Grieco, A., Campana, G., Corvino, M., and Caputo, G. (2005). Attention modulates
psychophysical and electrophysiological response to visual texture segmentation in humans. Vision
Research 45(18): 2384–96.
Chan, W. and Chua, F. (2003). Grouping with and without attention. Psychonomic Bulletin & Review
10(4): 932–8.
Chechlacz, M., Rotshtein, P., Bickerton, W. L., Hansen, P. C., Deb, S., and Humphreys, G. W. (2010).
Separating neural correlates of allocentric and egocentric neglect: Distinct cortical sites and common
white matter disconnections. Cognitive Neuropsychology 27(3): 277–303.
Chechlacz, M., Rotshtein, P., Hansen, P. C., Riddoch, J. M., Deb, S., and Humphreys, G. W. (2012). The
neural underpinnings of simultanagnosia: Disconnecting the visuospatial attention network. Journal of
Cognitive Neuroscience 24(3): 718–35.
Chechlacz, M., Rotshtein, P., Hansen, P. C., Deb, S., Riddoch, M. J., and Humphreys, G. W. (2013). The
central role of the temporo-parietal junction and the superior longitudinal fasciculus in supporting
multi-item competition: Evidence from lesion-symptom mapping of extinction. Cortex 49(2): 487–506.
Conci, M., Böbel, E., Matthias, E., Keller, I., Müller, H., and Finke, K. (2009). Preattentive surface and
contour grouping in Kanizsa figures: Evidence from parietal extinction. Neuropsychologia 47(3): 726–32.
Corbetta, M. and Shulman, G. L. (2002). Control of goal-directed and stimulus-driven attention in the
brain. Nature Reviews Neuroscience 3(3): 201–15.
Coslett, H., and Saffran, E. (1991). Simultanagnosia: To see but not two see. Brain 114(4): 1523–45.
de-Wit, L. H., Kentridge, R. W., and Milner, A. D. (2009). Object-based attention and visual area LO.
Neuropsychologia 47(6): 1483–90.
de Haan, B. and Rorden, C. (2010). Similarity grouping and repetition blindness are both influenced by
attention. Frontiers in Human Neuroscience 4: 20.
Dent, K., Humphreys, G. W., and Braithwaite, J. J. (2011). Spreading suppression and the guidance of search by
movement: Evidence from negative color carry-over effects. Psychonomic Bulletin & Review 18(4): 690–6.
Desimone, R. and Duncan, J. (1995). Neural mechanisms of selective visual attention. Annual Review of
Neuroscience 18: 193–222.
Dombrowe, I., Donk, M., Wright, H., Olivers, C. N., and Humphreys, G. W. (2012). The contribution
of stimulus-driven and goal-driven mechanisms to feature-based selection in patients with spatial
attention deficits. Cognitive Neuropsychology 29(3): 249–74.
Donnelly, N., Humphreys, G. W., and Riddoch, M. J. (1991). Parallel computation of primitive shape
descriptions. Journal of Experimental Psychology. Human Perception and Performance 17(2): 561–70.
Driver, J. and Baylis, G. (1989). Movement and visual attention: The spotlight metaphor breaks down.
Journal of Experimental Psychology. Human Perception and Performance 15(3): 448–56.
Driver, J. and Halligan, P. (1991). Can visual neglect operate in object-centred co-ordinates? An affirmative
single-case study. Cognitive Neuropsychology 8(6): 475–96.
Driver, J., Mattingley, J., Rorden, C., and Davis, G. (1997). Extinction as a paradigm measure of attentional
bias and restricted capacity following brain injury. In: P. Thier and H. O. Karnath (eds.), Parietal Lobe
Contributions to Orientation in 3D Space, pp. 401–29. Heidelberg: Springer-Verlag.
Duncan, J. (1984). Selective attention and the organization of visual information. Journal of Experimental
Psychology. General 113(4): 501–17.
Duncan, J. and Humphreys, G. W. (1989). Visual search and stimulus similarity. Psychological Review
96(3): 433–58.
Duncan, J., Humphreys, G., and Ward, R. (1997). Competitive brain activity in visual attention. Current
Opinion in Neurobiology 7(2): 255–61.
Egly, R., Driver, J., and Rafal, R. (1994). Shifting visual attention between objects and locations: Evidence
from normal and parietal lesion subjects. Journal of Experimental Psychology. General 123(2): 161–77.
Freeman, E., Sagi, D., and Driver, J. (2001). Lateral interactions between targets and flankers in low-level
vision depend on attention to the flankers. Nature Neuroscience 4(10): 1032–6.
Freeman, E., Driver, J., Sagi, D., and Zhaoping, L. (2003). Top-down modulation of lateral interactions
in early vision: Does attention affect integration of the whole or just perception of the parts? Current
Biology 13(11): 985–9.
Geng, J. and Behrmann, M. (2006). Competition between simultaneous stimuli modulated by location
probability in hemispatial neglect. Neuropsychologia 44(7): 1050–60.
Gilchrist, I., Humphreys, G. W., and Riddoch, M. (1996). Grouping and extinction: Evidence for low-level
modulation of visual selection. Cognitive Neuropsychology 13(8): 1223–49.
Gilchrist, I., Humphreys, G. W., Riddoch, M., and Neumann, H. (1997). Luminance and edge information
in grouping: A study using visual search. Journal of Experimental Psychology. Human Perception and
Performance 23(2): 464–80.
Gillebert, C. R., Mantini, D., Thijs, V., Sunaert, S., Dupont, P., and Vandenberghe, R. (2011). Lesion
evidence for the critical role of the intraparietal sulcus in spatial attention. Brain 134: 1694–709.
Gillebert, C. R., Dyrholm, M., Vangkilde, S., Kyllingsbæk, S., Peeters, R., and Vandenberghe, R.
(2012). Attentional priorities and access to short-term memory: Parietal interactions. NeuroImage
62(3): 1551–62.
Golay, L., Schnider, A., and Ptak, R. (2008). Cortical and subcortical anatomy of chronic spatial neglect
following vascular damage. Behavioral and Brain Functions 4: 43.
Goldsmith, M. and Yeari, M. (2003). Modulation of object-based attention by spatial focus under
endogenous and exogenous orienting. Journal of Experimental Psychology. Human Perception and
Performance 29(5): 897–918.
Green, C., and Hummel, J. (2006). Familiar interacting object pairs are perceptually grouped. Journal of
Experimental Psychology. Human Perception and Performance 32(5): 1107–19.
Han, S. and Humphreys, G. (2007). The fronto-parietal network and top-down modulation of perceptual
grouping. Neurocase 13(4): 278–89.
Han, S., Humphreys, G. W., and Chen, L. (1999). Parallel and competitive processes in hierarchical
analysis: Perceptual grouping and encoding of closure. Journal of Experimental Psychology. Human
Perception and Performance 25(5): 1411–32.
Han, S., Song, Y., Ding, Y., Yund, E., and Woods, D. (2001). Neural substrates for visual perceptual
grouping in humans. Psychophysiology 38(6): 926–35.
Han, S., Ding, Y., and Song, Y. (2002). Neural mechanisms of perceptual grouping in humans as revealed
by high density event related potentials. Neuroscience Letters 319(1): 29–32.
Han, S., Jiang, Y., Mao, L., Humphreys, G. W., and Gu, H. (2005a). Attentional modulation of perceptual
grouping in human visual cortex: Functional MRI studies. Human Brain Mapping 25(4): 424–32.
Han, S., Jiang, Y., Mao, L., Humphreys, G. W., and Qin, J. (2005b). Attentional modulation of perceptual
grouping in human visual cortex: ERP studies. Human Brain Mapping 26(3): 199–209.
Harms, L. and Bundesen, C. (1983). Color segregation and selective attention in a nonsearch task.
Perception & Psychophysics 33(1): 11–19.
Heilman, K., Watson, R., and Valenstein, E. (1993). Neglect and related disorders. In: K. Heilman and
E. Valenstein (eds.), Clinical Neuropsychology, pp. 279–336. New York: Oxford University Press.
Heinke, D. and Humphreys, G. W. (2003). Attention, spatial representation, and visual neglect: Simulating
emergent attention and spatial memory in the selective attention for identification model (SAIM).
Psychological Review 110(1): 29–87.
Hillis, A. E., Newhart, M., Heidler, J., Barker, P. B., Herskovits, E. H., and Degaonkar, M. (2005).
Anatomy of spatial attention: Insights from perfusion imaging and hemispatial neglect in acute stroke.
The Journal of Neuroscience 25(12): 3161–7.
Howe, P., Incledon, N., and Little, D. (2012). Can attention be confined to just part of a moving object?
Revisiting target-distractor merging in multiple object tracking. PloS One 7(7): e41491.
Huberle, E. and Karnath, H. (2006). Global shape recognition is modulated by the spatial distance of local
elements—Evidence from simultanagnosia. Neuropsychologia 44: 905–11.
Humphreys, G. W. (1998). Neural representation of objects in space: A dual coding account. Philosophical
Transactions of the Royal Society B: Biological Sciences 353(1373): 1341–51.
Koch, C. and Ullman, S. (1985). Shifts in selective visual attention: Towards the underlying neural circuitry.
Human Neurobiology 4(4): 219–27.
Kramer, A. and Jacobson, A. (1991). Perceptual organization and focused attention: the role of objects and
proximity in visual processing. Perception & Psychophysics, 50(3): 267–84.
Kumada, T. and Humphreys, G. (2001). Lexical recovery from extinction: Interactions between visual form
and stored knowledge modulate visual selection. Cognitive Neuropsychology 18(5): 465–78.
Lamy, D., Segal, H., and Ruderman, L. (2006). Grouping does not require attention. Perception &
Psychophysics 68(1): 17–31.
Lavie, N. and Driver, J. (1996). On the spatial extent of attention in object-based visual selection. Perception
& Psychophysics 58(8): 1238–51.
Lestou, V., Lam, J. M., Humphreys, K., Kourtzi, Z., and Humphreys, G. W. (2014). A dorsal visual route
necessary for global form perception: Evidence from neuropsychological fMRI. Journal of Cognitive
Neuroscience 26(3): 621–34.
Luria, A. (1959). Disorders of "simultaneous perception" in a case of bilateral occipitoparietal brain injury.
Brain 82: 437–49.
Mack, A. and Rock, I. (1998). Inattentional Blindness. Cambridge: MIT Press.
Mack, A., Tang, B., Tuma, R., Kahn, S., and Rock, I. (1992). Perceptual organization and attention.
Cognitive Psychology 24(4): 475–501.
Malhotra, P. A., Soto, D., Li, K., and Russell, C. (2013). Reward modulates spatial neglect. Journal of
Neurology Neurosurgery and Psychiatry 84(4): 366–9.
Marr, D. (1982). Vision. San Francisco: W. H. Freeman and Co.
Martinez, A., Teder-Salejarvi, W., and Hillyard, S. A. (2007). Spatial attention facilitates selection of
illusory objects: evidence from event-related brain potentials. Brain Research 1139: 143–52.
Martinez, A., Teder-Salejarvi, W., Vazquez, M., Molholm, S., Foxe, J. J., Javitt, D. C., et al. (2006). Objects
are highlighted by spatial attention. Journal of Cognitive Neuroscience 18(2): 298–310.
Mattingley, J., Davis, G., and Driver, J. (1997). Preattentive filling-in of visual surfaces in parietal
extinction. Science 275(5300): 671–4.
Mavritsaki, E., Heinke, D., Allen, H., Deco, G., and Humphreys, G. W. (2011). Bridging the gap between
physiology and behavior: evidence from the sSoTS model of human visual attention. Psychological
Review 118(1): 3–41.
McMains, S. and Kastner, S. (2010). Defining the units of competition: Influences of perceptual
organization on competitive interactions in human visual cortex. Journal of Cognitive Neuroscience
22(11): 2417–26.
McMains, S. and Kastner, S. (2011). Interactions of top-down and bottom-up mechanisms in human visual
cortex. The Journal of Neuroscience 31(2): 587–97.
Mesulam, M. M. (2000). Attentional networks, confusional states, and neglect syndromes. In: M.
M. Mesulam (ed.), Principles of Behavioral and Cognitive Neurology, 2nd edn., pp. 174–256.
New York: Oxford University Press.
Moore, C. and Egeth, H. (1997). Perception without attention: Evidence of grouping under conditions of
inattention. Journal of Experimental Psychology. Human Perception and Performance 23(2): 339–52.
Moore, C., Yantis, S., and Vaughan, B. (1998). Object-based visual selection: Evidence from perceptual
completion. Psychological Science 9(2): 104–10.
Navon, D. (1977). Forest before trees: The precedence of global features in visual perception. Cognitive
Psychology 9(3): 353–83.
Neisser, U. (1967). Cognitive Psychology. New York: Appleton-Century-Crofts.
Mutual interplay between perceptual organization and attention 755
Norman, L. J., Heywood, C. A., and Kentridge, R. W. (2013). Object-based attention without awareness.
Psychological Science, 24(6): 836–43.
Pavlovskaya, M., Soroker, N., and Bonneh, Y. (2007). Extinction is not a natural consequence of unilateral
spatial neglect: evidence from contrast detection experiments. Neuroscience Letters 420(3): 240–4.
Posner, M. I. (1994). Attention: The mechanisms of consciousness. Proceedings of the National Academy of
Sciences of the United States of America 91(16): 7398–403.
Prinz, J. J. (2011). Is attention necessary and sufficient for consciousness? In: C. Mole, D. Smithies, and
W. Wu (eds.), Attention: Philosophical and Psychological Essays, pp. 174–203. Oxford: Oxford University
Press.
Ptak, R. (2012). The frontoparietal attention network of the human brain: action, saliency, and a priority
map of the environment. The Neuroscientist 18(5): 502–15.
Ptak, R. and Fellrath, J. (2013). Spatial neglect and the neural coding of attentional priority. Neuroscience
and Biobehavioral Reviews 37(4): 705–22.
Ptak, R. and Schnider, A. (2005). Visual extinction of similar and dissimilar stimuli: Evidence for
level-dependent attentional competition. Cognitive Neuropsychology, 22(1): 111–27.
Ptak, R., Valenza, N., and Schnider, A. (2002). Expectation-based attentional modulation of visual
extinction in spatial neglect. Neuropsychologia 40(13): 2199–205.
Pylyshyn, Z. and Storm, R. (1988). Tracking multiple independent targets: Evidence for a parallel tracking
mechanism. Spatial Vision 3(3): 179–97.
Reynolds, J. H., Chelazzi, L., and Desimone, R. (1999). Competitive mechanisms subserve attention in
macaque areas V2 and V4. The Journal of Neuroscience 19(5): 1736–53.
Richard, A. M., Lee, H., and Vecera, S. P. (2008). Attentional spreading in object-based attention. Journal of
Experimental Psychology. Human Perception and Performance 34(4): 842–53.
Riddoch, M. and Humphreys, G. (1983). The effect of cueing on unilateral neglect. Neuropsychologia
21(6): 589–99.
Riddoch, M., Humphreys, G., Cleton, P., and Fery, P. (1990). Interaction of attentional and lexical
processes in neglect dyslexia. Cognitive Neuropsychology 7(5–6): 479–517.
Riddoch, M., Humphreys, G. W., Edwards, S., Baker, T., and Willson, K. (2002). Seeing the action:
Neuropsychological evidence for action-based effects on object selection. Nature Neuroscience 6(1): 82–9.
Riddoch, M., Humphreys, G., Hickman, M., Clift, J., Daly, A., and Colin, J. (2006). I can see what you are
doing: Action familiarity and affordance promote recovery from extinction. Cognitive Neuropsychology
23(4): 583–605.
Riddoch, M., Bodley Scott, S., and Humphreys, G. (2010). No direction home: Extinction is affected by
implicit motion. Cortex 46(5): 678–84.
Riddoch, M., Pippard, B., Booth, L., Rickell, J., Summers, J., Brownson, A., et al. (2011). Effects of
action relations on the configural coding between objects. Journal of Experimental Psychology. Human
Perception and Performance 37(2): 580–7.
Rizzo, M. and Vecera, S. P. (2002). Psychoanatomical substrates of Bálint’s syndrome. Journal of Neurology,
Neurosurgery, and Psychiatry 72(2): 162–78.
Roberts, K. and Humphreys, G. W. (2011). Action relations facilitate the identification of briefly-presented
objects. Attention, Perception & Psychophysics 73(2): 597–612.
Rock, I., Linnett, C., Grant, P., and Mack, A. (1992). Perception without attention: Results of a new
method. Cognitive Psychology 24(4): 502–34.
Rossetti, Y., Rode, G., Pisella, L., Farné, A., Li, L., Boisson, D., et al. (1998). Prism adaptation to a
rightward optical deviation rehabilitates left hemispatial neglect. Nature 395(6698): 166–9.
Russell, C. and Driver, J. (2005). New indirect measures of “inattentive” visual grouping in a
change-detection task. Perception & Psychophysics 67(4): 606–23.
756 Gillebert and Humphreys
Saenz, M., Buracas, G. T., and Boynton, G. M. (2002). Global effects of feature-based attention in human
visual cortex. Nature Neuroscience 5(7): 631–2.
Schindler, I., McIntosh, R. D., Cassidy, T. P., Birchall, D., Benson, V., Ietswaart, M., et al. (2009). The
disengage deficit in hemispatial neglect is restricted to between-object shifts and is abolished by prism
adaptation. Experimental Brain Research 192(3): 499–510.
Scholl, B., Pylyshyn, Z., and Feldman, J. (2001). What is a visual object? Evidence from target merging in
multiple object tracking. Cognition 80(1–2): 159–77.
Seron, X., Coyette, F., and Bruyer, R. (1989). Ipsilateral influences on contralateral processing in neglect
patients. Cognitive Neuropsychology 6(5): 475–98.
Shalev, L., Humphreys, G. W., and Mevorach, C. (2004). Global processing of compound letters in a patient
with Bálint’s syndrome. Cognitive Neuropsychology 22(6): 737–51.
Shomstein, S., Kimchi, R., Hammer, M., and Behrmann, M. (2010). Perceptual grouping operates
independently of attentional selection: evidence from hemispatial neglect. Attention, Perception &
Psychophysics 72(3): 607–18.
Sieroff, E., Pollatsek, A., and Posner, M. (1988). Recognition of visual letter strings following injury to the
posterior visual spatial attention system. Cognitive Neuropsychology 5(4): 427–49.
Stone, S., Halligan, P., and Greenwood, R. (1993). The incidence of neglect phenomena and related
disorders in patients with an acute right or left hemisphere stroke. Age and Ageing 22(1): 46–52.
Tian, Y. H., Huang, Y., Zhou, K., Humphreys, G. W., Riddoch, M. J., and Wang, K. (2011). When
connectedness increases hemispatial neglect. PloS One 6(9): e24760.
Ticini, L. F., de Haan, B., Klose, U., Nagele, T., and Karnath, H. O. (2010). The role of temporo-parietal
cortex in subcortical visual extinction. Journal of Cognitive Neuroscience 22(9): 2141–50.
Tipper, S. P. and Behrmann, M. (1996). Object-centered not scene-based visual neglect. Journal of
Experimental Psychology. Human Perception and Performance 22(5): 1261–78.
Treisman, A. (1982). Perceptual grouping and attention in visual search for features and for objects. Journal
of Experimental Psychology. Human Perception and Performance 8(2): 194–214.
Treisman, A. (1998). Feature binding, attention and object perception. Philosophical Transactions of the
Royal Society B: Biological Sciences 353(1373): 1295–306.
Vallar, G. and Perani, D. (1986). The anatomy of unilateral neglect after right-hemisphere stroke lesions.
A clinical/CT-scan correlation study in man. Neuropsychologia 24(5): 609–22.
Vandenberghe, R. and Gillebert, C. R. (2009). Parcellation of parietal cortex: Convergence between
lesion-symptom mapping and mapping of the intact functioning brain. Behavioural Brain Research
199(2): 171–82.
Vandenberghe, R., Molenberghs, P., and Gillebert, C. R. (2012). Spatial attention deficits in humans: The
critical role of superior compared to inferior parietal lesions. Neuropsychologia 50(6): 1092–103.
Vecera, S. and Behrmann, M. (1997). Spatial attention does not require preattentive grouping.
Neuropsychology 11(1): 30–43.
Vecera, S. and Farah, M. (1994). Does visual attention select objects or locations? Journal of Experimental
Psychology. General 123(2): 146–60.
Verdon, V., Schwartz, S., Lovblad, K. O., Hauert, C. A., and Vuilleumier, P. (2010). Neuroanatomy
of hemispatial neglect and its functional components: A study using voxel-based lesion-symptom
mapping. Brain 133(3): 880–94.
Vossel, S., Eschenbeck, P., Weiss, P. H., Weidner, R., Saliger, J., Karbe, H., et al. (2011). Visual extinction
in relation to visuospatial neglect after right-hemispheric stroke: Quantitative assessment and statistical
lesion-symptom mapping. Journal of Neurology, Neurosurgery and Psychiatry 82(8): 862–8.
Vuilleumier, P. (2000). Faces call for attention: Evidence from patients with visual extinction.
Neuropsychologia 38(5): 693–700.
Vuilleumier, P. and Landis, T. (1998). Illusory contours and spatial neglect. Neuroreport 9(11): 2481–4.
Mutual interplay between perceptual organization and attention 757
Vuilleumier, P. and Rafal, R. (1999). “Both” means more than “two”: Localizing and counting in patients
with visuospatial neglect. Nature Neuroscience 2(9): 783–4.
Vuilleumier, P. and Rafal, R. (2000). A systematic study of visual extinction. Between- and within-field
deficits of attention in hemispatial neglect. Brain 123: 1263–79.
Vuilleumier, P. and Sagiv, N. (2001). Two eyes make a pair: Facial organization and perceptual learning
reduce visual extinction. Neuropsychologia 39(11): 1144–9.
Vuilleumier, P., Sagiv, N., Hazeltine, E., Poldrack, R., Swick, D., Rafal, R., et al. (2001a). Neural fate
of seen and unseen faces in visuospatial neglect: A combined event-related functional MRI and
event-related potential study. Proceedings of the National Academy of Sciences of the United States of
America 98(6): 3495–500.
Vuilleumier, P., Valenza, N., and Landis, T. (2001b). Explicit and implicit perception of illusory contours in
unilateral spatial neglect: Behavioural and anatomical correlates of preattentive grouping mechanisms.
Neuropsychologia 39(6): 597–610.
Ward, R., Goodrich, S., and Driver, J. (1994). Grouping reduces visual extinction: Neuropsychological
evidence for weight-linkage in visual selection. Visual Cognition 1(1): 101–29.
Wertheimer, M. (1923). Untersuchungen zur Lehre von der Gestalt, II. Psychologische Forschung 4: 301–50.
Translated as “Investigations on Gestalt principles, II.”. In: L. Spillmann (ed.) (2012). On motion and
figure-ground organization, pp. 2127–82. Cambridge: MIT Press.
Wu, Y., Chen, J., and Han, S. (2005). Neural mechanisms of attentional modulation of perceptual grouping
by collinearity. Neuroreport 16(6): 567–70.
Wyart, V. and Tallon-Baudry, C. (2008). Neural dissociation between visual awareness and spatial
attention. The Journal of Neuroscience 28(10): 2667–79.
Yeshurun, Y., Kimchi, R., Sha’shoua, G., and Carmel, T. (2009). Perceptual objects capture attention. Vision
Research 49(10) 1329–35.
Young, A. W., Hellawell, D. J., and Welch, J. (1992). Neglect and visual recognition. Brain 115: 51–71.
Zaretskaya, N., Anstis, S., and Bartels, A. (2013). Parietal cortex mediates conscious perception of illusory
gestalt. The Journal of Neuroscience 33(2): 523–31.
Chapter 37
Unlike most objects, for which recognition at the category level is usually sufficient (e.g., ‘chair’;
Rosch et al. 1976), recognizing faces at the individual level (e.g., ‘Bob’ rather than ‘Joe’) is essential
in day-to-day interactions. But face recognition, as a perceptual process, is not trivial: in addition
to the fact that recognition must be accomplished rapidly and accurately, there is the added
perceptual burden that all faces consist of the same kinds of features (eyes, nose, and mouth)
appearing in the same configuration (eyes above nose, nose above mouth). Thus, an obvious
challenge associated with face recognition is the need to individuate a large number of visually
similar exemplars successfully while, at the same time, generalizing across perceptual features
that are not critical for the purpose of identification, such as differences in illumination or
viewpoint, or even in the age of the face and changes in hairstyle, amongst others. As is evident,
the cognitive demands of face perception differ from those of most other forms of non-face
object recognition. Unsurprisingly,
then, there are many instances where performance with faces differs from performance with other
categories of objects. For example, inversion of the input disrupts recognition for faces dispro-
portionately compared with other objects (Yin 1969), and changing the spatial relations between
features impairs face perception to a greater degree than is true for other objects (Tanaka and
Sengco 1997).
In light of these apparent distinctions, many have posited that faces are processed differently
from other objects, and that the representations and/or processes that mediate face perception are
qualitatively different from those supporting the recognition of other non-face object categories
(Farah et al. 1995; Farah et al. 1998; Tanaka and Farah 2003). Specifically, according to some
proponents, face processing requires encoding the face as a whole, or Gestalt, and this is necessary
to ensure that, during processing, the input matches a face template that enforces
the first-order configuration of parts (e.g., eyes above nose, nose above mouth). Such (holistic or
unified) representations are believed to facilitate the extraction of second-order configural infor-
mation (e.g., spacing between features) that is coded as deviations from the template prototype
(Diamond and Carey 1986). This second-order spatial or configural information is, according
to some researchers, particularly critical for distinguishing between objects that are structurally
very similar; the class of faces is a paradigmatic example of a collection of homogeneous exemplars
(for review, see Maurer et al. 2002). A possible corollary of the assumption that face representa-
tions are processed holistically is that the individual parts are not explicitly or independently
represented. In its extreme version, this view assumes that faces are not decomposed into parts at
all and, moreover, the parts themselves are especially difficult to access (Davidoff and Donnelly
1990). Consistent with this is the claim that the face template may have no internal part structure;
as stated, ‘the representation of a face used in face recognition is not composed of the faces’ parts’
Holistic Face Perception 759
(Tanaka and Farah 1993). On such an account, there is mandatory perceptual integration across
the entire face region (McKone 2008), or, similarly, mandatory interactive processing of all facial
information (Yovel and Kanwisher 2004) (and for a recent review of holistic processing in relation
to the development of face perception, see McKone et al. 2012). Note that the notion of a unified
face template bears similarity to the view espoused by Gestalt psychologists and the reader is
referred to other chapters in this volume that articulate this concept in greater depth (for example,
Koenderink, this volume) and also that offer empirical evidence for the use of such a Gestalt and
individual differences therein (for example, de-Wit and Wagemans, this volume).
In this chapter, we focus specifically on the viability of a unified face template as implicated in
face perception. We first review behavioral evidence suggesting that face recognition is indeed
holistic in nature (Part 1), and we draw on data from normal observers and patient groups to
support this point. In Part 2, we examine the nature of the mechanisms that give rise to holistic
face recognition. Specifically, we argue that holistic face processing is not necessarily based on
template-like, undifferentiated representations and, rather, we suggest that holistic processing can
also be accomplished by alternative mechanisms such as an automatic attentional strategy and/or
that it can emerge from the interactive processing of face configuration and features. We conclude
by claiming that holistic processing is engaged in face perception but that the underlying mecha-
nism is not likely to be that of a single, unified template.
1 Note that there are two versions of the composite task being used in the literature, and an ongoing debate over
which is more appropriate (e.g., Gauthier and Bukach 2007 vs. Robbins and McKone 2007). The interested
reader might also wish to consult the recent exchange by Rossion (2013) and by Richler and Gauthier (2013).
Details of this debate are beyond the scope of this chapter.
760 Behrmann, Richler, Avidan, and Kimchi
[Figure: two panels plotting Sensitivity (d′), axis values 1–2]
task-irrelevant face half (e.g., the bottom). The face stimuli were taken from the MPI face data-
base (Troje and Bülthoff 1996). Holistic processing is indexed by a failure to attend selectively
to just one half of the face: because faces are processed as wholes, the task-irrelevant face
half cannot be successfully ignored and, consequently, influences judgments on the target face
half. Thus, participants are more likely to produce a false alarm (say ‘different’) when the two
top halves are identical but their bottom halves differ than when both the top and the bottom
halves of the two faces are identical. Interference from the task-irrelevant half is reduced
when the normal face configuration is disrupted by misaligning the face halves (Hole 1994;
Richler et al. 2008), and, as one might expect from the holistic face view, is absent for non-face
objects (Farah et al. 1998; Richler et al. 2011d) (see Figure 37.2). Importantly, the magnitude of
holistic processing as indexed by the interference in the composite task is a significant predictor
of face recognition abilities more generally (DeGutis et al. 2013; McGugin et al. 2012; Richler
et al. 2011b), validating the presumed role of holistic processing as an important component of
face recognition.2
Prosopagnosia
Support for the claim that face processing is necessarily holistic (i.e., faces treated as an undiffer-
entiated whole) is also gleaned from the findings that individuals who suffer from prosopagnosia
and fail to recognize faces appear unable to process visual information in a holistic or configural
fashion. In one of the earliest case studies, Levine and Calvanio (1989) argued that patient LH
suffered from a deficit in configural processing, which they defined as ‘the ability to identify by
getting an overview of an item as a whole in a single glance’ (p. 160). This patient painstakingly
analysed a stimulus such as a face, detail by detail, over several visual fixations, noting the shapes
of the features and their spatial relationships. Consistent with the failure to represent the whole,
this patient was also impaired in the Gestalt completion tests of visual closure. Similar descrip-
tions abound for other cases. In his popular book The Man who Mistook his Wife for a Hat, Oliver
Sacks reports the following incident concerning his patient, Dr P.
Sacks noted that when Dr P. looked at him, he seemed to fixate on individual features of his face—an
eye, the right ear, his chin—instead of taking it in as a whole. The only faces he got right were those of
his brother—‘Ach, Paul! That square jaw, those big teeth; I would know Paul anywhere!’—and Einstein,
whom he also seemed to recognize from characteristic features: the signature hair and mustache.
Considerable empirical evidence supports such anecdotes, with the central claim being that a
breakdown in holistic processing or the ability to integrate the disparate local elements of a face
into a coherent unified representation is causally related to the impairment in face processing
(Barton 2009; Rivest et al. 2009). Indeed, it has been suggested that a key characteristic of patients
with acquired prosopagnosia (AP) is the inability to derive a unified perceptual representation
from the multiple features of an individual face (Ramon et al. 2010; Saumier et al. 2001). Similar
claims have been made about individuals with congenital prosopagnosia (CP). CP is a more
recently recognized deficit in face recognition that occurs in the absence of frank neurological
damage or of altered cognition or vision, and is apparently present throughout development.
The growing consensus is that CP individuals are also unable to rapidly process the whole of
the face (e.g., Avidan et al. 2011; Behrmann et al. 2006; Lobmaier et al. 2010; Palermo et al. 2011),
and it appears that the patterns of impairment in face perception are extremely similar across the
acquired and congenital groups of prosopagnosia (although performance in perceiving emotional
expression may differ across the groups, e.g. Humphreys et al. 2007).
We now consider the same sources of evidence gleaned from individuals with prosopagnosia
as we did for normal participants (the part-whole and composite paradigms), together with some
additional data from experiments that manipulate the spatial configuration between face parts
(spatial-relations sensitivity). Rather few studies have directly examined the part-whole effect
in prosopagnosia. In a variant of the standard part-whole task, two well-characterized APs showed
a slight part-over-whole face advantage for eyes trials, in contrast to whole-over-part advantage
found in controls, suggesting that these prosopagnosic individuals have severe holistic processing
2 Studies that have not found support for this relationship have been criticized for the measure of holistic
processing used (Konar et al. 2010) and for erroneous interpretation of a correlation based on difference scores
(Wang et al. 2012).
deficits, at least for the eye region (Busigny et al., 2010; Ramon et al. 2010). Similar findings were
obtained in a small group of congenital (or, as they define them, developmental) prosopagnos-
ics who showed a lack of a holistic advantage for both Korean and Caucasian faces (though the CPs’
overall holistic advantage for Caucasian faces was not significantly different from that of controls,
who did show a significant advantage) (DeGutis et al. 2011). Compatible with these findings is
the result of an incomplete part-whole task (no isolated parts trials) in which a single patient
was significantly worse at discriminating part changes in faces than controls, but not for houses
(de Gelder and Rouw, 2000a). These data support the claim that the prosopagnosic individual did
not benefit from the context of the face when making part judgments. A recent study has repli-
cated the lack of benefit from the whole in CP but it appears that this may be specific to the eyes
as trials in which the mouth was presented in context versus alone showed no differential perfor-
mance across CP and controls (DeGutis et al. 2012). The differential reliance on mouth versus eye
processing in prosopagnosia has been reported on several occasions (Barton et al. 2003; Bukach
et al. 2008; Caldara et al. 2005).
As has been the case with normal individuals (see above), the composite face paradigm has
been employed to explore the underlying processing in individuals with prosopagnosia. In con-
trast with normal individuals, in the context of a composite face paradigm, congenital prosopag-
nosic individuals performed equivalently with aligned and misaligned faces and were impervious
to (the normal) interference from the task-irrelevant bottom part of faces (Avidan et al. 2011).
Interestingly, the extent to which these individuals were impervious to the misalignment manipu-
lation was correlated with poorer performance on diagnostic face processing tasks (such as the
Cambridge Face Memory Test; Duchaine and Nakayama, 2006). Consistent with these results,
others have also shown that prosopagnosic (both AP and CP) individuals show reduced interfer-
ence from the unattended part of the face in the composite face paradigm (Busigny et al. 2010;
Ramon et al. 2010) (note, however, that again, not every individual with prosopagnosia evinces
the same profile and some appear to show the normal interference effects; Le Grand et al. 2006;
Susilo et al. 2010). In general, these findings have been taken as evidence to support the notion
that the severity of the face recognition impairment is directly related to the difficulty in attending
to multiple parts of the face in parallel.
Individuals with prosopagnosia also show reduced sensitivity to the spacing between the fea-
tures, implying a difficulty in representing the ‘second order’ relations between facial features.
For example, Ramon and Rossion (2010) reported that patient PS, who suffers from acquired
prosopagnosia, performed poorly on a task that required matching unfamiliar faces in which
the faces differed either with respect to local features or inter-feature distances, over the upper
and lower areas of the face. PS was impaired at matching when the relative distances between the
features differed and this was true even when the location of the features was held constant (and
uncertainty about their position was eliminated) (Caldara et al. 2005; Orban de Xivry et al. 2008).
Consistent with this, patients with prosopagnosia appear to adopt an analytical feature-by-feature
face processing style and focus only on a small spatial window at a time (Bukach et al. 2006).
The failure to focus on the eye region of the face (Bukach et al. 2006; Bukach et al. 2008; Caldara
et al. 2005; Rossion et al. 2009) as well as the relative distances between features (Barton and
Cherkasova 2005; Barton et al. 2002), as mentioned above, may be a direct consequence of defec-
tive holistic processing (Rivest et al. 2009). Also, in a paradigm in which the interocular distance,
the distance between the nose and mouth, or the relative distances between features were altered,
prosopagnosic patients performed more poorly when required to decide which of three faces was
‘odd’ (Barton et al. 2002).
Finally, we review studies that examine whether configural and/or featural processing
are affected in prosopagnosia. Some studies that directly examined featural versus configural
processing have found that CPs show face discrimination deficits for faces that differ only in
configural information (Lobmaier et al. 2010), whereas others report that CPs are impaired in
discriminating both faces that differ only in configural information and faces that differ only in
featural information (Barton et al. 2003; Duchaine et al. 2007; Yovel and Duchaine
2006). However, Le Grand et al. (2006) found that three of their eight developmental prosopag-
nosic individuals were impaired in discrimination of faces that differed in the shape of internal
features, four were impaired in discrimination of faces that differed in spacing, and one partici-
pant performed normally on both discrimination tasks. Taken together, these findings suggest
that CPs can be impaired in processing featural information, configural information, or both.
Whether the impairment in configural and featural processing versus configural processing alone
reflects heterogeneity in the population, or whether methodological differences across the various
paradigms elicit somewhat different patterns of performance, remains to be determined.
of the compound letter (Gao et al. 2011; Macrae and Lewis 2002). Similarly, Curby et al. (2012)
found that inducing a negative mood—a manipulation that is believed to promote a local process-
ing bias (Basso et al. 1996)—led to a decrease in holistic processing measured in the composite
task relative to inducing a positive or neutral mood. Thus, promoting global versus local
attentional biases can clearly influence holistic processing, but there is no simple explanation
for how such manipulations would alter the use of a face template or disrupt face representations.
For example, although it is conceivable that these global/local manipulations operate on a tem-
plate representation, such that a global bias enhances the Gestalt representation and a local bias
draws attention to features, it is unclear how the latter would work if the face features were not
independently represented in the first place. The key distinction then is between an underlying
holistic template, which serves as the representation of a face versus a mechanism that allows for
rapid processing of the disparately represented features in tandem.
Finally, according to the holistic representation view, inverted faces do not fit the face template
(first-order configuration is disrupted), and so should (and could) never be processed holistically
(e.g., Rossion and Boremanse 2008). Thus, the holistic representation view posits a qualitative pro-
cessing difference between upright and inverted faces. However, a growing body of work suggests
that performance differences between upright and inverted faces are quantitative, such that upright
faces and inverted faces are processed in qualitatively the same way, but that upright faces are pro-
cessed more efficiently than inverted faces (Loftus et al. 2004; Riesenhuber et al. 2004; Sekuler et al.
2004). Inversion effects (and their loss in patients with prosopagnosia) have also been documented
for non-face objects, especially those that have a canonical orientation (de Gelder et al. 1998; de
Gelder and Rouw 2000b). Consistent with this more graded account of inversion effects, results
from a composite task show that both upright and inverted faces are processed equally holistically,
but overall responses are more accurate and faster for upright faces (Richler et al. 2011c).3
One interesting consequence of the difference in processing efficiency for upright versus
inverted faces is that holistic effects require longer presentation times to be observed for inverted
faces (Richler et al. 2011c). Interference from task-irrelevant parts is observed for upright faces
presented for as little as 50ms (Richler et al. 2009b; Richler et al. 2011c), and the modulation
of this interference due to misalignment that characterizes holistic processing occurs with pres-
entation times of 183ms. In contrast, although performance is above chance for inverted faces
presented for 50ms and 183ms, there is no evidence for holistic processing of inverted faces until
presentation times of 800ms (Richler et al. 2011c).
The interaction between presentation time and holistic processing challenges the holistic repre-
sentation account for several reasons. First, the holistic representation account would not predict
that presentation time should influence holistic processing—faces either are or are not encoded
into the face template, and, consequently, holistic processing should be all or none. Second, the fact
that presentation time influences holistic processing suggests that parts are, in fact, being encoded
independently: above chance performance in the composite task only requires encoding of the
target part, whereas interference indicative of holistic processing in the composite task requires
that the irrelevant part be encoded as well. Accordingly, one interpretation of these results is that
at 50ms and 183ms only the target part of inverted faces could be encoded, resulting in successful
performance but no interference. Longer presentation times are required to encode both parts of
inverted faces, so more time is required to observe interference. In contrast, although they may
be encoded separately, both the target and distractor parts in upright faces can be encoded within
50ms (Curby and Gauthier 2009), leading to interference from holistic processing at the fastest
presentation times.
3 This study also shows that the results of studies that find reduced holistic processing of inverted faces are
driven by differences in response bias between upright and inverted faces. Interested readers are encouraged
to see Richler et al. (2012) and Richler and Gauthier (2013) for discussion of this issue.
Holistic Face Perception 765
While intriguing, the evidence for independent part representations based on the interaction
between holistic processing and presentation time in Richler et al. (2011c) is admittedly speculative. However,
other findings also suggest that individual face features can be used in face recognition (e.g.,
Cabeza and Kato 2000; Rhodes et al. 2006; Schwarzer and Massaro 2001), indicating that part
representations are accessible. Indeed, participants can recognize previously learned faces with
above chance accuracy when the face parts are presented in a scrambled configuration, a condi-
tion in which recognition must rely on feature information alone because configural informa-
tion has been removed. Although recognition performance is better in a blurred condition where
facial configuration is maintained but facial featural information is ‘blurred out’ compared to the
scrambled condition, above chance performance in the scrambled condition implies that feature
representations are available and can be used, as well (Schwaninger et al. 2009; see also Hayward
et al. 2008). In fact, at the extreme, face discrimination performance can be guided by a single
feature in the absence (or near absence) of configural variability (Amishav and Kimchi 2010).
In a recent study, Curby, Goldstein, and Blacker (2013) modified the composite task so that the
face parts were always presented in an aligned format. Square regions surrounding the two face
halves were either the same color and aligned, or different colors and misaligned. Remarkably,
this manipulation led to a decrease in holistic processing that was similar in magnitude to that
observed when face parts themselves are misaligned. In other words, discouraging the grouping
of face parts by disrupting the classic Gestalt cue of common region reduced holistic processing in the
same manner as physically misaligning the face parts.
[Figure 37.3 artwork: panel (a) shows the four face stimuli A–D; panels (b) and (c) plot response time (ms) and error rate (%) for configural and featural judgments under baseline and filtering conditions, for CP and matched control groups.]
Fig. 37.3 (a) The stimulus set used in Amishav and Kimchi (2010) and Kimchi et al. (2012). Faces
in each row (Faces A and B, and Faces C and D) vary in their configural information (inter-eyes and
nose-mouth distance) but have the same components (eyes, nose, and mouth). Faces in each column
(Faces A and C, and Faces B and D) vary in their components (eyes, nose, and mouth) but have the
same configural information (inter-eyes and nose-mouth distance).
Reproduced from Psychonomic Bulletin & Review, 17(5), pp. 743–748, Perceptual integrality of componential and
configural information in faces, Rama Amishav and Ruth Kimchi, doi: 10.3758/PBR.17.5.743. Copyright © 2010,
Springer-Verlag. With kind permission from Springer Science and Business Media.
The application of the same Garner paradigm to congenital prosopagnosics (Kimchi et al. 2012;
see Fig. 37.3) provides strong evidence that featural information and configural information
are perceptually separable and processed independently by individuals with congenital
prosopagnosia, implying that, in contrast with normal observers, these individuals do not
perceive faces holistically.
768 Behrmann, Richler, Avidan, and Kimchi
The finding that information about the parts and information about the configuration of a face
are both available is also supported by fMRI and electrophysiological recordings indicating the
existence of both whole- and part-based representations in face-selective regions of the human and
monkey brain (Harris and Aguirre 2008, 2010). These results suggest that part-based and holistic
neural tuning are both possible in face-selective regions such as the right fusiform gyrus, and that
such tuning is surprisingly flexible and dynamic. Similar findings have been uncovered in studies
with non-human primates (Freiwald et al. 2009).
Holistic processing is largely attenuated when only
high spatial frequencies are preserved in the stimulus (Goffaux 2009; Goffaux and Rossion 2006)
(but see Cheung et al. 2008, who found equal holistic processing for LSF and HSF faces). However,
a face rendered in high spatial frequencies is still readily detected as a face by observers, suggesting
again that detecting a face (and presumably activating the template representation of an upright
face) may not be enough to engage holistic processing. More recently, evidence has indicated that
holistic processing might depend on the availability of discriminative local feature information
(Goffaux et al. 2012).
Before we conclude, we draw some speculative observations about the mechanisms we have
considered and their possible generality. We have articulated a perspective in which face parts
are processed holistically and in which, over the course of experience, this integrated processing
becomes more automatized. Similar mechanisms may play out in other visual domains as well at
both lower and higher levels of the visual system where context (co-occurrence of other infor-
mation) is present. For example, similar discussion about holistic processing is present in the
literature about crowding and the need and difficulty to extract individual components from the
multiplicity of items; debates about the inability to attend to only a part and whether this affects
the perception of the whole are rife in that field too (Oliva and Torralba 2007). Finally, discus-
sions about context in scene perception have a similar flavor and so we tentatively suggest that
similar mechanisms in which higher-order statistics are derived from the input, especially with
greater experience, may be at play throughout the visual system (e.g., Bar and Aminoff 2003).
Conclusions
There is abundant behavioral evidence that face recognition is holistic, based on effects that are
observed for faces but not for non-face objects in normal observers, and that are absent in patient
groups characterized by face recognition deficits. But there remains disagreement about what
mechanisms are responsible. Of course, what it means for face recognition to be ‘holistic’ need not
be all-or-none. Here, we have argued against the holistic representation view that, in the extreme,
posits that faces are represented as undifferentiated wholes with no explicit representation of indi-
vidual features. However, ‘more-than-features’ can take on more graded meanings. For example,
spatial relations between face features may be explicitly represented and used in addition to infor-
mation about the features themselves.
It is also important to note that the alternatives to the extreme holistic representation view that
we have proposed here—automatic attentional strategy account and the interactive account—are
not mutually exclusive. For example, proponents of the view that holistic processing is the result
of interactivity between features and configuration often describe face features as being processed
in parallel (Kimchi and Amishav 2010; Macho and Leder 1998; see also Fific and Townsend 2010),
which may be consistent with the notion that attention is automatically deployed to the entire
face at the same time (Richler et al. 2011d). Importantly, certain aspects of these two accounts
need to be empirically reconciled. For example, the classic finding in the composite task (used to
support the automatic attentional strategy account) is that participants cannot selectively attend
to one face half (e.g., Richler et al. 2008), but in the Garner paradigm (used to support the inter-
active account) participants are able to make classification judgments based on one feature while
successfully ignoring other features (Amishav and Kimchi 2010). Moreover, the failures of selec-
tive attention documented in the composite task are also observed for inverted faces (Richler
et al. 2011c), but the interactivity of features and configuration assessed in the Garner paradigm is
specific to upright faces (Kimchi and Amishav 2010). Thus, the two paradigms lead to different
conclusions about whether processing differences between upright and inverted faces are qualita-
tive vs. quantitative. One potential reason for these discrepancies is that the coarse parts used in
the composite task (full face halves) contain not only feature changes (e.g., a different bottom part will
have a different mouth) but also subtle configural changes, whereas in the Garner paradigm used
by Amishav and Kimchi (2010) feature and configural information are fully isolated and manipu-
lated independently. An exciting avenue for future research is to explore how these two lines of
work and the theoretical accounts they support come together to explain normal face perception.
Acknowledgements
The preparation of this chapter and the associated research was supported by a grant from the
National Science Foundation to MB (BCS0923763), by a grant from the Temporal Dynamics of
Learning Center, SBE0542013 (G. Cottrell), and by a grant from the Israeli Science Foundation
(ISF, 384/10) to GA.
References
Amishav, R., and Kimchi, R. (2010). ‘Perceptual integrality of componential and configural information in
faces’. Psychon Bull Rev 17(5): 743–8.
Avidan, G., Tanzer, M., and Behrmann, M. (2011). ‘Impaired holistic processing in congenital
prosopagnosia’. Neuropsychologia 49(9): 2541–52. doi: 10.1016/j.neuropsychologia.2011.05.002.
Bar, M., and Aminoff, E. (2003). ‘Cortical analysis of visual context’. Neuron 38(2): 347–58.
Barton, J. J. S. (2009). ‘What is meant by impaired configural processing in acquired prosopagnosia?’
Perception 38(2): 242–60.
Barton, J. J. S., and Cherkasova, M. V. (2005). ‘Impaired spatial coding within objects but not between
objects in prosopagnosia’. Neurology 65(2): 270–4.
Barton, J. J. S., Press, D. Z., Keenan, J. P., and O’Connor, M. (2002). ‘Lesions of the fusiform face area
impair perception of facial configuration in prosopagnosia’. Neurology 58: 71–8.
Barton, J. J. S., Cherkasova, M. V., Press, D. Z., Intriligator, J. M., and O’Connor, M. (2003).
‘Developmental prosopagnosia: a study of three patients’. Brain Cogn 51(1): 12–30.
Basso, M. R., Schefft, B. K., Ris, M. D., and Dember, W. N. (1996). ‘Mood and global-local visual
processing’. Journal of the International Neuropsychological Society 2(3): 249–55.
Behrmann, M., Avidan, G., Leonard, G. L., Kimchi, R., Luna, B., Humphreys, K., and Minshew, N.
(2006). ‘Configural processing in autism and its relationship to face processing’. Neuropsychologia
44(1): 110–29.
Bukach, C. M., Bub, D. N., Gauthier, I., and Tarr, M. J. (2006). ‘Perceptual expertise effects are not all
or none: spatially limited perceptual expertise for faces in a case of prosopagnosia’. J Cogn Neurosci
18(1): 48–63.
Bukach, C. M., Le Grand, R., Kaiser, M. D., Bub, D. N., and Tanaka, J. W. (2008). ‘Preservation of mouth
region processing in two cases of prosopagnosia’. J Neuropsychol 2(Pt 1): 227–44.
Busigny, T., Joubert, S., Felician, O., Ceccaldi, M., and Rossion, B. (2010). ‘Holistic perception of
the individual face is specific and necessary: evidence from an extensive case study of acquired
prosopagnosia’. Neuropsychologia 48(14): 4057–92. doi: 10.1016/j.neuropsychologia.2010.09.017.
Cabeza, R., and Kato, T. (2000). ‘Features are also important: contributions of featural and configural
processing to face recognition’. Psychol Sci 11(5): 429–33.
Caldara, R., Schyns, P., Mayer, E., Smith, M. L., Gosselin, F., and Rossion, B. (2005). ‘Does Prosopagnosia
Take the Eyes Out of Face Representations? Evidence for a Defect in Representing Diagnostic Facial
Information following Brain Damage’. J Cogn Neurosci 17(10): 1652–66.
Cheung, O. S., Richler, J. J., Palmeri, T. J., and Gauthier, I. (2008). ‘Revisiting the Role of Spatial
Frequencies in the Holistic Processing of Faces’. Journal of Experimental Psychology: Human Perception
and Performance 34(6): 1327–36.
Curby, K. M., and Gauthier, I. (2009). ‘The temporal advantage for individuating objects of
expertise: perceptual expertise is an early riser’. J Vis 9(6): 7.1-13. doi: 10.1167/9.6.7.
Curby, K. M., Johnson, K. J., and Tyson, A. (2012). ‘Face to face with emotion: holistic
face processing is modulated by emotional state’. Cognition and Emotion 26(1): 93–102.
doi: 10.1080/02699931.2011.555752.
Curby, K. M., Goldstein, R. R., and Blacker, K. (2013). ‘Disrupting perceptual grouping of face parts
impairs holistic face processing’. Atten Percept Psychophys 75(1): 83–91. doi: 10.3758/s13414-012-0386-9.
Davidoff, J., and Donnelly, N. (1990). ‘Object superiority: A comparison of complete and part probes’. Acta
Psychologica 73: 225–43.
de Gelder, B., and Rouw, R. (2000a). ‘Configural face processes in acquired and developmental
prosopagnosia: evidence for two separate face systems?’ NeuroReport 11(14): 3145–50.
de Gelder, B., and Rouw, R. (2000b). ‘Paradoxical configuration effects for faces and objects in
prosopagnosia’. Neuropsychologia 38(9): 1271–9.
de Gelder, B., Bachoud-Levi, A. C., and Degos, J. D. (1998). ‘Inversion superiority in visual agnosia may be
common to a variety of orientation polarised objects besides faces’. Vision Research 38(18): 2855–61.
de-Wit, L., and Wagemans, J. (in press). ‘Individual differences in local and global perceptual organization’.
In Oxford Handbook of Perceptual Organization, edited by J. Wagemans. (Oxford: Oxford University Press).
DeGutis, J., DeNicola, C., Zink, T., McGlinchey, R., and Milberg, W. (2011). ‘Training with own-race
faces can improve processing of other-race faces: evidence from developmental prosopagnosia’.
Neuropsychologia 49(9): 2505–13. doi: 10.1016/j.neuropsychologia.2011.04.031.
DeGutis, J., Cohan, S., Mercado, R. J., Wilmer, J., and Nakayama, K. (2012). ‘Holistic processing of the
mouth but not the eyes in developmental prosopagnosia’. Cognitive Neuropsychology 29(5–6): 419–46.
doi: 10.1080/02643294.2012.754745.
DeGutis, J., Wilmer, J., Mercado, R. J., and Cohan, S. (2013). ‘Using regression to measure holistic face
processing reveals a strong link with face recognition ability’. Cognition, 126(1), 87–100. doi: 10.1016/j.
cognition.2012.09.004.
Diamond, R., and Carey, S. (1986). ‘Why faces are and are not special: An effect of expertise’. Journal of
Experimental Psychology: General 115: 107–17.
Duchaine, B., and Nakayama, K. (2006). ‘The Cambridge Face Memory Test: results for neurologically
intact individuals and an investigation of its validity using inverted face stimuli and prosopagnosic
participants’. Neuropsychologia 44(4): 576–85.
Duchaine, B., Yovel, G., and Nakayama, K. (2007). ‘No global processing deficit in the Navon task in 14
developmental prosopagnosics’. Soc Cogn Affect Neurosci 2(2): 104–13. doi: 10.1093/scan/nsm003.
Farah, M. J., Tanaka, J. W., and Drain, H. M. (1995). ‘What causes the face inversion effect?’ Journal of
Experimental Psychology: Human Perception and Performance 21(3): 628–34.
Farah, M. J., Wilson, K. D., Drain, M., and Tanaka, J. W. (1998). ‘What is “special” about face perception?’
Psychol Rev 105(3): 482–98.
Macho, S., and Leder, H. (1998). ‘Your eyes only? A test of interactive influence in the processing of facial
features’. Journal of Experimental Psychology: Human Perception and Performance 24(5): 1486–500.
Macrae, C. N., and Lewis, H. L. (2002). ‘Do I know you? Processing orientation and face recognition’.
Psychological Science 13(2): 194–6.
Maurer, D., Le Grand, R., and Mondloch, C. J. (2002). ‘The many faces of configural processing’. TRENDS
in Cognitive Sciences 6(6): 255–60.
McGugin, R. W., Richler, J. J., Herzmann, G., Speegle, M., and Gauthier, I. (2012). ‘The Vanderbilt
Expertise Test reveals domain-general and domain-specific sex effects in object recognition’. Vision
Research 69: 10–22. doi: 10.1016/j.visres.2012.07.014.
McKone, E. (2008). ‘Configural processing and face viewpoint’. Journal of Experimental Psychology: Human
Perception and Performance 34(2): 310–27. doi: 10.1037/0096-1523.34.2.310.
McKone, E., Crookes, K., Jeffery, L., and Dilks, D. D. (2012). ‘A critical review of the development of
face recognition: Experience is less important than previously believed’. Cognitive Neuropsychology.
doi: 10.1080/02643294.2012.660138.
Michel, C., Rossion, B., Han, J., Chung, C. S., and Caldara, R. (2006). ‘Holistic processing is finely tuned
for faces of one’s own race’. Psychological Science 17(7): 608–15. doi: 10.1111/j.1467-9280.2006.01752.x.
Mondloch, C. J., Elms, N., Maurer, D., Rhodes, G., Hayward, W. G., Tanaka, J. W., and Zhou, G.
(2010).‘Processes underlying the cross-race effect: an investigation of holistic, featural, and relational
processing of own-race versus other-race faces’. Perception 39(8): 1065–85.
Navon, D. (1977). ‘Forest before trees: The precedence of global features in visual perception’. Cognitive
Psychology 9(3): 353–83.
Oliva, A., and Torralba, A. (2007). ‘The role of context in object recognition’. TRENDS in Cognitive Sciences
11(12): 520–7. doi: 10.1016/j.tics.2007.09.009.
Orban de Xivry, J. J., Ramon, M., Lefevre, P., and Rossion, B. (2008). ‘Reduced fixation on the upper area
of personally familiar faces following acquired prosopagnosia’. J Neuropsychol 2(Pt 1): 245–68.
Palermo, R., Willis, M. L., Rivolta, D., McKone, E., Wilson, C. E., and Calder, A. J. (2011). ‘Impaired
holistic coding of facial expression and facial identity in congenital prosopagnosia’. Neuropsychologia
49(5): 1226–35. doi: 10.1016/j.neuropsychologia.2011.02.021.
Ramon, M., and Rossion, B. (2010). ‘Impaired processing of relative distances between features
and of the eye region in acquired prosopagnosia—two sides of the same holistic coin?’ Cortex
46(3): 374–89. doi: 10.1016/j.cortex.2009.06.001.
Ramon, M., Busigny, T., and Rossion, B. (2010). ‘Impaired holistic processing of unfamiliar
individual faces in acquired prosopagnosia’. Neuropsychologia 48(4): 933–44. doi: 10.1016/j.
neuropsychologia.2009.11.014.
Rhodes, G., Hayward, W. G., and Winkler, C. (2006). ‘Expert face coding: configural and component
coding of own-race and other-race faces’. Psychonomic Bulletin and Review 13(3): 499–505.
Richler, J. J., Tanaka, J. W., Brown, D. D., and Gauthier, I. (2008). ‘Why does selective attention to parts fail
in face processing?’ J Exp Psychol Learn Mem Cogn 34(6): 1356–68. doi: 10.1037/a0013080.
Richler, J. J., Bukach, C. M., and Gauthier, I. (2009a). ‘Context influences holistic processing of nonface
objects in the composite task’. Atten Percept Psychophys 71(3): 530–40. doi: 10.3758/APP.71.3.530.
Richler, J. J., Mack, M. L., Gauthier, I., and Palmeri, T. J. (2009b). ‘Holistic processing of faces happens at a
glance’. Vision Research 49(23): 2856–61. doi: 10.1016/j.visres.2009.08.025.
Richler, J. J., Cheung, O. S., and Gauthier, I. (2011a). ‘Beliefs alter holistic face processing . . . if response
bias is not taken into account’. J Vis 11(13): 17. doi: 10.1167/11.13.17.
Richler, J. J., Cheung, O. S., and Gauthier, I. (2011b). ‘Holistic processing predicts face recognition’.
Psychological Science 22(4): 464–71. doi: 10.1177/0956797611401753.
Richler, J. J., Mack, M. L., Palmeri, T. J., and Gauthier, I. (2011c). ‘Inverted faces are (eventually) processed
holistically’. Vision Research 51(3): 333–42. doi: 10.1016/j.visres.2010.11.014.
Richler, J. J., Wong, Y. K., and Gauthier, I. (2011d). ‘Perceptual Expertise as a Shift from Strategic
Interference to Automatic Holistic Processing’. Current Directions in Psychological Science 20(2): 129–34.
doi: 10.1177/0963721411402472.
Richler, J. J., Palmeri, T. J., and Gauthier, I. (2012). ‘Meanings, mechanisms, and measures of holistic
processing’. Front Psychol 3: 553. doi: 10.3389/fpsyg.2012.00553.
Richler, J. J., and Gauthier, I. (2013). ‘When intuition fails to align with data: A reply to Rossion (2013)’.
Visual Cognition 21(2): 254–76.
Riesenhuber, M., Jarudi, I., Gilad, S., and Sinha, P. (2004). ‘Face processing in humans is compatible with a
simple shape-based model of vision’. Proc Biol Sci 271 Suppl 6: S448–450. doi: 10.1098/rsbl.2004.0216.
Rivest, J., Moscovitch, M., and Black, S. (2009). ‘A comparative case study of face recognition: the
contribution of configural and part-based recognition systems, and their interaction’. Neuropsychologia
47(13): 2798–811. doi: 10.1016/j.neuropsychologia.2009.06.004.
Robbins, R., and McKone, E. (2003). ‘Can holistic processing be learned for inverted faces?’ Cognition
88: 79–107.
Robbins, R., and McKone, E. (2007). ‘No face-like processing for objects-of-expertise in three behavioural
tasks’. Cognition 103(1): 34–79. doi: 10.1016/j.cognition.2006.02.008.
Rosch, E. H., Mervis, C. B., Gray, W. D., Johnson, D. M., and Boyes-Braem, P. (1976). ‘Basic objects in
natural categories’. Cognitive Psychology 8: 382–439.
Rossion, B. (2013). ‘The composite face illusion: A whole window into our understanding of holistic face
perception’. Visual Cognition 21(2): 139–253.
Rossion, B., and Boremanse, A. (2008). ‘Nonlinear relationship between holistic processing of individual
faces and picture-plane rotation: evidence from the face composite illusion’. J Vis 8(4): 3.1–13.
doi: 10.1167/8.4.3.
Rossion, B., Kaiser, M. D., Bub, D., and Tanaka, J. W. (2009). ‘Is the loss of diagnosticity of the eye region
of the face a common aspect of acquired prosopagnosia?’ J Neuropsychol 3(Pt 1): 69–78.
Rossion, B., Prieto, E. A., Boremanse, A., Kuefner, D., and Van Belle, G. (2012). ‘A steady-state visual
evoked potential approach to individual face perception: Effect of inversion, contrast-reversal and
temporal dynamics’. NeuroImage 63(3): 1585–1600. doi: 10.1016/j.neuroimage.2012.08.033.
Saumier, D., Arguin, M., and Lassonde, M. (2001). ‘Prosopagnosia: a case study involving problems in
processing configural information’. Brain Cogn 46(1–2): 255–9.
Schwaninger, A., Lobmaier, J. S., Wallraven, C., and Collishaw, S. (2009). ‘Two routes to face
perception: evidence from psychophysics and computational modeling’. Cognitive Science 33(8): 1413–40.
doi: 10.1111/j.1551-6709.2009.01059.x.
Schwarzer, G., and Massaro, D. W. (2001). ‘Modeling face identification processing in children and adults’.
Journal of Experimental Child Psychology 79(2): 139–61. doi: 10.1006/jecp.2000.2574.
Sekuler, A. B., Gaspar, C. M., Gold, J. M., and Bennett, P. J. (2004). ‘Inversion leads to quantitative, not
qualitative, changes in face processing’. Curr Biol 14(5): 391–6.
Susilo, T., McKone, E., Dennett, H., Darke, H., Palermo, R., Hall, A., … Rhodes, G. (2010). ‘Face
recognition impairments despite normal holistic processing and face space coding: evidence from a case
of developmental prosopagnosia’. Cogn Neuropsychol 27(8): 636–64. doi: 10.1080/02643294.2011.613372.
Tanaka, J. W., and Farah, M. J. (1993). ‘Parts and wholes in face recognition’. Quarterly Journal of
Experimental Psychology 46A: 225–45.
Tanaka, J. W., and Farah, M. J. (2003). ‘The holistic representation of faces’. In Analytic and Holistic
Processes in Perception of Faces, Objects and Scenes, edited by G. Rhodes and M. A. Peterson.
(New York: Oxford University Press).
Tanaka, J. W., and Sengco, J. A. (1997). ‘Features and their configuration in face recognition’. Mem Cognit
25(5): 583–92.
Troje, N., and Bülthoff, H. H. (1996). ‘Face recognition under varying poses: The role of texture and shape’.
Vision Research 36: 1761–71.
Wang, R., Li, J., Fang, H., Tian, M., and Liu, J. (2012). ‘Individual differences in holistic processing predict
face recognition ability’. Psychological Science 23(2): 169–77. doi: 10.1177/0956797611420575.
Wenger, M. J., and Townsend, J. T. (2006). ‘On the costs and benefits of faces and words: process
characteristics of feature search in highly meaningful stimuli’. Journal of Experimental Psychology:
Human Perception and Performance 32(3): 755–79. doi: 10.1037/0096-1523.32.3.755.
Wong, Y. K., and Gauthier, I. (2010). ‘Holistic processing of musical notation: Dissociating failures of
selective attention in experts and novices’. Cognitive, affective and behavioral neuroscience 10(4): 541–51.
doi: 10.3758/CABN.10.4.541.
Yin, R. K. (1969). ‘Looking at upside-down faces’. Journal of Experimental Psychology 81: 141–5.
Young, A. W., Hellawell, D., and Hay, D. C. (1987). ‘Configurational information in face perception’.
Perception 16: 747–59.
Yovel, G., and Duchaine, B. (2006). ‘Specialized face perception mechanisms extract both part and spacing
information: evidence from developmental prosopagnosia’. Journal of Cognitive Neuroscience 18(4): 580–93.
doi: 10.1162/jocn.2006.18.4.580.
Yovel, G., and Kanwisher, N. (2004). ‘Face perception: domain specific, not process specific’. Neuron
44(5): 889–98.
Chapter 38
Binocular Rivalry and Perceptual Ambiguity
Alais and Blake
[Figure 38.1 artwork: panel (b) depicts a video monitor viewed through a mirror stereoscope or LCD shutters, with percept durations pooled into a gamma-shaped frequency distribution.]
Fig. 38.1 (a) Examples of perceptually ambiguous stimuli. Inspecting any of these figures will elicit
perceptual alternations between two roughly equally probable interpretations. The first two stimuli
are examples of ambiguous perspective that can arise when three-dimensional forms are rendered as
two-dimensional images, as commonly occurs in the retinal image of the external world. Over time,
the two perspectives or ‘view points’ alternate. The third example shows an instance of ambiguous
segregation between figure and ground. A vase is perceived when the white region is interpreted as
figure, or as two faces in profile when the black region is interpreted as figure. (b) Binocular rivalry is
a very actively researched form of ambiguous perception. Separate images are presented to the eyes,
usually by means of a mirror stereoscope. Any significant interocular difference in orientation, color,
texture, movement, etc. will suffice to trigger binocular rivalry, which is experienced as a series of
irregular perceptual alternations over time as first one image is perceived and then the other. While
one image is perceived, the other is suppressed from visual awareness. A given image therefore
undergoes periods of dominance and suppression. All forms of bistable perception produce a
skewed gamma-like distribution when the durations of many dominance periods are pooled. For
binocular rivalry, the peak of this distribution typically would be around 2–3 seconds, with occasional
longer dominance periods.
perception is broadly similar in that all involve exclusive alternations between the competing
perceptual interpretations. One common hallmark is the apparent randomness of the alter-
nations between competing interpretations, as evidenced by the gamma-like, skewed normal
frequency histograms of dominance durations (Fox and Herrmann 1967) (see Figure 38.1b).
Binocular Rivalry and Perceptual Ambiguity 777
Several studies have shown that diverse instances of perceptual rivalry all exhibit this pattern of
temporal dynamics (Carter and Pettigrew 2003; Long and Toppino 2004; Brascamp et al. 2005;
van Ee 2005; O’Shea et al. 2009), suggesting that it may be a general characteristic of bistable
perception.
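The skewed, gamma-like shape of pooled dominance durations is easy to reproduce in simulation. A minimal sketch, assuming illustrative (not fitted) shape and scale parameters chosen to put the mode near the 2–3 second range typical of binocular rivalry:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw simulated dominance durations from a gamma distribution.
# Shape 4 and scale 0.6 s are illustrative choices (not fitted values)
# that place the mode of the distribution near 2 s.
shape, scale = 4.0, 0.6
durations = rng.gamma(shape, scale, size=10_000)

mode_estimate = (shape - 1) * scale  # analytic mode of a gamma density
print(f"mean {durations.mean():.2f} s, mode ~{mode_estimate:.1f} s")

# The pooled histogram is right-skewed: the mean exceeds the median
# and the mode, with occasional long dominance periods in the tail.
assert durations.mean() > np.median(durations) > 0
```

The asymmetry between mean and mode is the quantitative counterpart of the "occasional longer dominance periods" noted in the Figure 38.1 caption.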
In this chapter we focus on the most widely studied form of bistable perception, binocular
rivalry (Blake 2001; Tong 2001; Blake and Logothetis 2002; Alais and Blake 2005). We begin by
describing the basic properties of binocular rivalry, and then review work on rivalry relating to
perceptual organization, including figure/ground segregation and perceptual grouping. The sec-
ond half of the chapter broadens the scope by discussing the role of attention in binocular rivalry
and considering the impact of top-down and contextual influences. Broader still, the final section
examines recent work studying binocular rivalry in a multisensory context.
Binocular Rivalry
Binocular rivalry is a compelling bistable phenomenon first systematically studied by
Wheatstone (1838) following his invention of the mirror stereoscope. Binocular rivalry occurs
when each eye views incompatible images at the same retinal location, where ‘incompatible’
means stimuli sufficiently different to prevent a binocular match. This can be easily achieved
in the laboratory using a mirror stereoscope to present a different image to each eye, as shown
in Figure 38.1b. Perceptually, binocular rivalry is experienced as seemingly random fluctua-
tions in dominance between one image and the other that continue as long as the dissimilar
images are viewed. For stimuli of similar salience, these stochastic fluctuations tend to even
out over time so that each image is seen equally often during extended viewing. Stimulus
salience in binocular rivalry is largely governed by low-level stimulus properties, such as con-
trast, luminance, and orientation, with a relatively small but demonstrable role for high-level
stimulus factors such as attention and context (reviewed later in the chapter). Generally, while
one image is dominant, little or no trace of the other image is perceived. Interest in binocular
rivalry has increased in recent decades, in part because rivalry allows systematic examination
of processes governing perceptual competition, neural dynamics and selection of the contents
of visual awareness.
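The tendency for stimuli of matched salience to even out over extended viewing can likewise be sketched by alternating dominance episodes whose durations are drawn from one and the same gamma distribution (parameters again illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate 2,000 alternating dominance episodes: even-indexed episodes
# belong to the left-eye image, odd-indexed to the right-eye image.
# Both eyes draw durations from the same gamma distribution, modelling
# stimuli of matched salience (shape/scale values are illustrative).
durations = rng.gamma(3.5, 0.7, size=2000)

left_total = durations[0::2].sum()   # total dominance time, left-eye image
right_total = durations[1::2].sum()  # total dominance time, right-eye image
predominance = left_total / (left_total + right_total)

print(f"left-eye predominance: {predominance:.3f}")  # close to 0.5
```

Although each individual episode is stochastic, the predominance ratio converges toward 0.5 as episodes accumulate; biasing one eye's durations upward (e.g., by raising its stimulus contrast) would shift this ratio away from equality.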
Although binocular rivalry has much in common with other forms of bistable perception,
some very important differences set binocular rivalry apart. First, binocular rivalry is unique
in presenting a different stimulus to each eye, whereas other bistable examples involve a single
stimulus viewed binocularly. This interocular conflict disrupts normal binocular vision and trig-
gers binocular rivalry, in part because the conflict interferes with the establishment of binocular
correspondence necessary for stereomatching. Second, the alternations in binocular rivalry are
generally mutually exclusive, such that when one image is perceived the other is completely sup-
pressed. Other forms of bistable perception involve a single stimulus that supports two interpre-
tations, and it is those interpretations that alternate over time while the stimulus itself remains
visible. The Necker cube, for example, elicits bistable alternations of perceived perspective without
any part of the cube disappearing from visual awareness. Third, binocular rivalry has a strong local
component, as revealed by the phenomenon of piecemeal rivalry in which large images tend to
alternate as a patchwork (O’Shea et al. 1997). By contrast, other bistable stimuli tend to alternate
globally and do not exhibit obvious ‘piecemeal’ states. There are, however, conditions under which
rivalry behaves globally, and this makes it useful as a tool for studying perceptual organization.
Accordingly, the following sections review basic features of binocular rivalry that illustrate its
links to the principles of perceptual organization.
778 Alais and Blake
(a)
(b)
Fig. 38.2 (a) Figure/ground segregation has not been widely investigated in binocular rivalry. In
this stimulus, the left- and right-eye stimuli contain clearly defined central ‘figure’ regions that are
mismatched in color and orientation, as is the surrounding ‘ground’ region, although with the
inverse arrangement. Perceptual organization prioritizing figure over ground should produce more
vigorous rivalry for the central region, which would manifest as a faster rivalry alternation rate and
stronger suppression of the unseen stimulus – both well-known consequences of increasing stimulus
strength. (b) Perceptual interpretation of the rivaling monocular images can also influence binocular
rivalry. The left image simulates a ground plane and the right image a ceiling plane. Both images
are identical except for a 180° rotation added to one of them; however, a ground plane has greater
ecological relevance in our interaction with the world. Consistent with the ground plane having
more salience, it tends to predominate over the ceiling plane in overall dominance duration and
returns to dominance more quickly than the ceiling plane when suppressed.
Surface properties such as natural boundary contours (Ooi and He 2006) and the coherence of
surfaces (Ooi and He 2003) influence dynamics and dominance durations in rivalry. For example,
continuous or homogeneous surfaces tend to dominate over discontinuous images (Ooi and
He 2003).
Fig. 38.3 Two examples of rivalry stimuli that engage in large-scale perceptual organization. (a) First
published by Diaz-Caneja in 1928, these two images show a tendency to alternate as globally coherent
patterns, switching between entirely red horizontal lines and entirely green concentric lines. Theories
explaining rivalry as a competition between monocular channels predict that the dominant percept
should never be globally coherent, since one or the other of the bipartite monocular stimuli should be
dominant at any given moment. The fact that the dominant percept may become grouped into a
coherent whole shows that perceptual organization can occur interocularly and combine independent
monocular views into perceptual wholes. (b) Dichoptically viewing the upper pair of images produces
rivalrous alternations between the left- and right-eye stimuli. The lower pair also produces left- vs.
right-eye rivalry, but in addition produces periods of rivalry between the coherent images (the monkey
face vs. the page of text), which requires grouping elements from each image simultaneously across
the eyes (Kovacs et al. 1996). These demonstrations show that coherent perceptual organization can
be imposed on conflicting monocular images when strong Gestalts are present. Because this requires
interocular grouping, it implies a binocular process overriding earlier interocular suppression.
Reproduced from E. Diaz-Caneja, Sur l’alternance binoculaire, Annales D’Oculistique, 165, pp. 721–31, Copyright
© 1928, The Author.
Reproduced from Ilona Kovács, Thomas V. Papathomas, Ming Yang, and Ákos Fehér, When the brain changes its
mind: Interocular grouping during binocular rivalry, Proceedings of the National Academy of Sciences, USA, 93
(26), pp. 15508–15511, Figures 1 a and b, Copyright (1996) National Academy of Sciences, U.S.A.
Binocular Rivalry and Perceptual Ambiguity 781
Spatial interactions
Fig. 38.4 When large stimuli engage in rivalry their perceptual alternations are not global but
piecemeal. Instead of coherent oscillations between one whole image and the other, a multitude of
local rivalry zones emerges, each appearing to alternate independently of the others. These local
zones of suppression may exhibit coordinated alternation dynamics, especially when adjacent zones
share collinear or near-collinear contours, as illustrated by the ‘association field’. This can be studied
using discrete orientation patches and varying relative orientation and distance. In continuous
stimuli, as shown in the annular stimuli on the right-hand side, these local interactions manifest as
travelling waves of dominance when the orientation is collinear or nearly so. Such a stimulus will first
emerge from suppression in a local region, with dominance then spreading smoothly along a wave
front travelling along the orientation. In an annulus with radial orientation, travelling
dominance waves are not generally observed and piecemeal rivalry is more likely.
adjacent in the same hemifield (therefore projecting to adjacent columns in the same cortical
hemisphere), and was still quite strong when the rivaling stimuli were placed on either side of fixa-
tion. The fact that grouping was still observed for grating patches placed on either side of fixation
suggests that callosal connections between hemispheres are able to establish the adjacency of the
grating patches in the visual field as well as their orientation relationship. Consistent with this sug-
gestion, a study of binocular rivalry in a split-brain observer found that coordinated dominance
between rivalry patches did not occur when those patches were located on either side of the midline
(O’Shea and Corballis 2005). The corpus callosum does indeed seem critical for perceptual group-
ing across the vertical midline.
Binocular rivalry is therefore a process occurring in local zones, but these can group together
into pairs or larger ensembles (Bonneh and Sagi 1999) according to the principle of the ‘asso-
ciation field’ (Field et al. 1993). This notion (see Figure 38.4) is similar to the Gestalt principle
of common fate or good continuation and posits that collinear orientations will tend to associ-
ate more strongly than oblique contours (Alais et al. 2006), and that the strength of association
declines with distance. The association field is thought to have a basis in the long-range horizontal
connections in V1 which are known to be longer and stronger for collinear orientations and to fall
off monotonically with angular difference (Kapadia et al. 1995). Related work shows that spatial
interactions influencing rivalry can arise outside regions of the visual field within which rivalry
is occurring. For instance, the predominance and strength of suppression of a patch of grating
engaged in rivalry are influenced by a surrounding grating that is not engaged in rivalry (Paffen
et al. 2004; Paffen et al. 2005). This interaction is thought to have a neural basis in center-surround
interactions between classical and extended receptive fields (e.g., Blakemore and Tobin 1972;
Fitzpatrick 2000).
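As an illustrative sketch only: the association field’s joint dependence on distance and relative orientation might be captured with exponential fall-offs. The functional form and scale parameters below are hypothetical choices for illustration, not quantities taken from Field et al. (1993) or Kapadia et al. (1995):

```python
import math

def association_strength(distance_deg, angle_diff_deg,
                         dist_scale=2.0, angle_scale=30.0):
    """Illustrative coupling weight between two rivalry zones.

    Hypothetical functional form: strength falls off exponentially with
    spatial separation (deg) and with the angular difference between
    contours (deg), mimicking the monotonic fall-off described for
    long-range horizontal connections in V1.
    """
    return (math.exp(-distance_deg / dist_scale)
            * math.exp(-abs(angle_diff_deg) / angle_scale))

# Nearby collinear patches couple strongly; distant oblique pairs weakly.
near_collinear = association_strength(1.0, 0.0)
far_oblique = association_strength(4.0, 45.0)
```

On this toy scheme, zones linked by strong weights would tend to share dominance states, while weakly linked zones alternate independently, producing piecemeal rivalry.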
Another line of work pointing to local grouping between rivalry zones comes from studies of
‘traveling waves’ of rivalry dominance (Wilson et al. 2001; Kang et al. 2010). These studies
examined the often-noted observation that when a large rivalry stimulus is suppressed, dominance
will often break through in a single small region and then spread like a wave, sweeping across the entire
stimulus until it is fully visible. Psychophysical observations have shown that traveling waves tend
to travel faster and further along collinear contours than non-collinear contours (see Figure 38.4),
in keeping with the association field hypothesis (Wilson et al. 2001; Kang et al. 2010). An fMRI
study (Lee et al. 2005) has shown that when a traveling wave is experienced in rivalry it produces
a concomitant wave of changing BOLD activity across the occipital cortex that is correlated spa-
tially and temporally with the perceived traveling wave. The speed of the wave in perception, in
other words, is tightly correlated with the spreading wave within neural tissue, as is the spatial
movement of the wave in the visual field and in retinotopic cortical areas (Lee et al. 2007).
Taken together, these findings are consistent with binocular rivalry being a local process with
lateral interactions capable of coordinating rivalry states across adjacent locations, thereby allow-
ing coherent states to emerge through perceptual grouping and synchronized transitions. Rivalry
thus exhibits grouping over both space and time. This grouping is made possible by cooperation
along collinear or near-collinear orientations and is likely mediated by lateral cortico-cortical
networks (Kapadia et al. 1995; Angelucci et al. 2002). For a full review of contour interactions, see
Hess et al. (this volume). Consistent with this reasoning, natural images—which contain locally
correlated orientations across spatial scales—tend to resist breaking into piecemeal zones and will
remain coherent at much larger image sizes than gratings will (Alais and Melcher 2007). Natural
images will also tend to predominate over non-natural images when the two are pitted against one
another in rivalry (Baker and Graf 2009).
et al. 2010). What underlies the temporal dynamics of binocular rivalry? This section will review
the factors governing rivalry dynamics, and in doing so will lay the groundwork for the subse-
quent sections discussing top-down and contextual influences on binocular rivalry.
Levelt (1965), one of the first to examine rivalry dynamics in detail, borrowed the idea of recip-
rocal inhibition from early neurophysiologists. He contended that when conflicting rival images
first activate respective neural populations, reciprocal inhibition would inevitably cause one
response to dominate the other. The reason is that a stronger response in one population, even a
slightly stronger one, leads to greater inhibition over the other population. Given any degree of
advantage, less inhibition is exerted back by the weaker population, freeing the stronger population
to respond even more strongly (and exert still further inhibition over the other). This process rapidly leads to
one population completely inhibiting the other so that only one image is visible. Most subsequent
models of binocular rivalry have employed reciprocal inhibition to account for rivalry suppres-
sion (Lehky 1988; Blake 1989; Mueller 1990; Laing and Chow 2002; Freeman 2005).
Reciprocal inhibition offers an explanation of the suppression of one image at rivalry onset,
but how does it explain the ensuing alternation of perceptual dominance? Simply adding neural
adaptation to the reciprocal inhibition process is sufficient to account for ongoing fluctuations in
dominance because it reverses the process. Adaptation gradually attenuates the responses within
the dominant population, progressively weakening its inhibitory hold over the suppressed popu-
lation. Concurrent with weakening inhibition, the suppressed neurons are also recovering from
adaptation incurred in their previous dominance phase and are thus gaining strength. Over time,
responses in the two populations converge towards a balance point where any minor change in
response can trigger a flip in perceptual dominance. The adapting reciprocal inhibition model of
binocular rivalry is sufficient to explain both suppression and alternation dynamics. Importantly,
the tipping point is somewhat variable, as it is influenced by external factors such as eye move-
ments or blinks, or by internal factors such as attentional shifts or neuronal noise in response
levels (Kim et al. 2006; Lankheet 2006; Moreno-Bote et al. 2007). These potential tipping fac-
tors assume increasing significance as the tipping point approaches and can trigger perceptual
shifts at irregular times, consistent with the fundamentally stochastic nature of rivalry dynamics
(Brascamp et al. 2006; Shpiro et al. 2009).
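A minimal numerical sketch of this adapting reciprocal-inhibition scheme follows. The equations, parameter values, and noise placement are illustrative choices for demonstration, not the fitted model of any cited study:

```python
import random

def simulate_rivalry(T=10.0, dt=0.001, I=1.0, beta=2.0, g=2.0,
                     tau=0.01, tau_a=1.0, sigma=0.02, seed=1):
    """Toy adapting reciprocal-inhibition model of rivalry.

    Two populations u1, u2 inhibit each other (strength beta); slow
    adaptation variables a1, a2 (gain g, time constant tau_a) erode the
    winner's inhibitory hold; small input noise (sigma) jitters switch
    times. Returns the number of dominance switches over T seconds.
    """
    random.seed(seed)
    u1, u2, a1, a2 = 0.6, 0.4, 0.0, 0.0
    relu = lambda x: max(x, 0.0)
    dominance = []                      # +1 when u1 dominant, -1 when u2
    for _ in range(int(T / dt)):
        n1 = sigma * random.gauss(0.0, 1.0)
        n2 = sigma * random.gauss(0.0, 1.0)
        du1 = (-u1 + relu(I - beta * u2 - g * a1 + n1)) / tau
        du2 = (-u2 + relu(I - beta * u1 - g * a2 + n2)) / tau
        da1 = (u1 - a1) / tau_a         # adaptation tracks the response
        da2 = (u2 - a2) / tau_a
        u1 += du1 * dt; u2 += du2 * dt
        a1 += da1 * dt; a2 += da2 * dt
        dominance.append(1 if u1 > u2 else -1)
    return sum(1 for i in range(1, len(dominance))
               if dominance[i] != dominance[i - 1])
```

With these parameters the winner’s adaptation eventually releases the suppressed population, so the simulation alternates irregularly, several times over a 10-second run, rather than locking into winner-take-all.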
The adapting reciprocal inhibition model of rivalry predicts that suppression strength should
weaken over a dominance period, reaching a minimum level just prior to a dominance switch.
Two studies testing this prediction found that sensitivity for detecting probes in the suppressed
eye was no better late in a suppression period than early in the period (Fox and Check 1968;
Norman et al. 2000), implying that inhibition was not weakening over time. However, two limi-
tations may explain their null finding. First, both studies used gratings as rival stimuli but meas-
ured sensitivity using completely different probes (letters or small spots of light) that would not
tap into the same neurons signaling (and adapting to) the suppressed grating. Second, the ‘late’
probes in these studies were presented at the median dominance duration so that no genuinely
late probes were measured. Recently, a new approach solved these problems (Alais et al. 2010a).
First, the probe was a contrast increment of the suppressed stimulus itself, meaning it directly
probed contrast sensitivity of the neurons encoding the suppressed stimulus. Second, in a new
‘reverse correlation’ approach, hundreds of probes were presented at random times and their
timing relative to suppression onset was later mapped onto observers’ rivalry alternation data. In
this design, probes could fall early or late in a rivalry phase with equal probability. Plotting probe
sensitivity within rivalry phases showed a striking reciprocity: dominance performance was ini-
tially stable but declined late in the period, and suppression performance was initially stable but
improved in a complementary fashion late in the period (Figure 38.5). The complementarity
of these curves confirms the reciprocity of the model, and their convergence late in the period
confirms the role of adaptation in rivalry dynamics.
Fig. 38.5 The classical model of rivalry is based on reciprocal inhibition between competing neural
representations of the images viewed by the left and right eyes. This model explains how a monocular
image becomes suppressed, and the ongoing alternation dynamics are attributed to adaptation
occurring within the currently dominant neurons and thus shifting the balance of inhibition. A key
prediction of this model is that suppression strength should weaken during a rivalry phase as
adaptation increases. This was confirmed in a recent study that had observers detect randomly timed
probe stimuli at random contrasts over many hundreds of trials to build up a picture of contrast
sensitivity over a rivalry phase (Alais et al. 2010a). Data from this method are illustrated here (probe
detection accuracy plotted against normalized phase duration) and show contrast sensitivity declining
over a period of dominance, with a corresponding reciprocal rise in sensitivity during suppression.
The two sensitivity curves converge just prior to a change in perceptual dominance.
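The mapping step of this reverse-correlation approach can be sketched as follows. The data format (parallel lists of probe times and hit/miss outcomes, plus tracked phase onsets and durations) and the bin count are illustrative assumptions, not the analysis code of the cited study:

```python
def bin_probes_by_phase(phase_onsets, phase_durations, probe_times,
                        probe_correct, n_bins=5):
    """Map randomly timed probes onto their normalized position within
    a rivalry phase (0 = phase onset, 1 = phase end), then average
    detection accuracy within each bin."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, correct in zip(probe_times, probe_correct):
        for onset, dur in zip(phase_onsets, phase_durations):
            if onset <= t < onset + dur:
                frac = (t - onset) / dur          # normalized phase time
                b = min(int(frac * n_bins), n_bins - 1)
                sums[b] += correct
                counts[b] += 1
                break
    # None marks bins that received no probes
    return [s / c if c else None for s, c in zip(sums, counts)]
```

Because probes are scattered uniformly in time, every normalized-phase bin is sampled with equal probability, including genuinely late probes that the earlier fixed-timing designs missed.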
A study by van Ee (2009) explored the role of noise in rivalry dynamics using a computational
model. A comparison was made between adding noise to the adapting representation of the
dominant stimulus or to the cross-inhibited neural activity. The intention was to clarify whether
the mutual inhibition process adapts, as has been suggested (Klink et al. 2010), or whether it is
the response to the dominant stimulus. Results showed that adding noise to the cross-inhibition
process did not produce typical rivalry dynamics, but adding noise to the dominant response
did. Van Ee suggested that this reflects differing time scales. Cross-inhibition is a fast process (millisecond
scale) and no amount of noise perturbation produces significant variations in dominance dura-
tions (typically lasting a second or so). However, noise added to the adaptation of the dominant
stimulus does produce typical rivalry dynamics, showing that noisy adaptation within a recipro-
cal inhibition framework can account for stochastic rivalry dynamics. This and related work by
others has seen noise and adaptation become key, interacting features in recent rivalry models
(Brascamp et al. 2006; Kim et al. 2006; Moreno-Bote et al. 2007; Kang and Blake 2011; Seely and
Chow 2011; Roumani and Moutoussis 2012).
Another key characteristic of rivalry dynamics is that phase durations are significantly affected
by stimulus contrast (Mueller and Blake 1989; Lankheet 2006). Rivalry alternation rate reliably
increases as the contrast of both stimuli increases, with each stimulus perceived for shorter peri-
ods on average. Within the reciprocal inhibition model, this is attributed to faster adaptation
arising from stronger neural responses to high-contrast stimuli. Interestingly, increasing the con-
trast of only one stimulus will also increase alternation rates but in a curious way: increasing
one image’s contrast can slightly increase its dominance duration, but the main consequence is
to decrease the dominance duration of the other image (Levelt 1965; Mueller and Blake 1989;
Bossink et al. 1993). This counterintuitive relationship is easily explained within the framework of
reciprocal inhibition where a given stimulus generates not an isolated response but one linked to
the response generated by the other, competing stimulus.
This underscores the distinction between overall rivalry alternation rate and the relative dura-
tions of the dominance and suppression phases making up a rivalry cycle, which is referred to
as ‘predominance’. Rivalry predominance is measured by tracking rivalry alternations and then
calculating the proportion of time each image was visible. Alternation rate relates to the period of
a full rivalry cycle (i.e., dominance plus suppression duration), whereas predominance effectively
measures the duty cycle (the proportion of each phase relative to the cycle period). Both measures
are important, as a change in predominance of one stimulus over the other (e.g., from 50:50 to
70:30) could go unnoticed if only alternation rate were measured. This is an important point for
the following sections where we discuss how perceptual organization, as manifest through a vari-
ety of contextual and top-down effects, influences rivalry dynamics. By way of preview, contextual
and top-down effects in rivalry generally affect the duration that a given rival target is dominant,
but less often when it is suppressed. This implies that perceptual organization’s influence during
rivalry operates primarily on the rival pattern already selected for conscious awareness.
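The distinction between the two measures can be made concrete with a small sketch. The helper below is hypothetical and assumes a tracking record of successive dominance durations for the two images:

```python
def rivalry_stats(durations_a, durations_b):
    """Compute both summary measures from tracked phase durations (s).

    Returns (alternation_rate, predominance_a): alternations per second
    over the tracking record, versus the fraction of total time image A
    was visible (its duty cycle).
    """
    total_a, total_b = sum(durations_a), sum(durations_b)
    total_time = total_a + total_b
    n_switches = len(durations_a) + len(durations_b) - 1
    return n_switches / total_time, total_a / total_time

# Two records with identical alternation rates but different duty
# cycles: symmetric 1 s phases (50:50) versus 1.4 s vs 0.6 s (70:30).
rate_even, pred_even = rivalry_stats([1.0, 1.0], [1.0, 1.0])
rate_skewed, pred_skewed = rivalry_stats([1.4, 1.4], [0.6, 0.6])
```

The second record illustrates the point made above: predominance shifts from 50:50 to 70:30 while the alternation rate is unchanged, so the shift would be invisible to a rate-only measure.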
eye, which would normally trigger a dominance switch, was less likely to cause a switch when
it occurred at the attended location, compared to the three unattended locations. Voluntary
attention can therefore help maintain the ‘selected’ image despite transient exogenous stimuli.
These authors also used a monocular pop-out cue flanking a suppressed image to show that
involuntary attention directed to a suppressed stimulus could cause it to become dominant. In
related work, Paffen and Van der Stigchel (2010) presented rivalry at two locations and added
an exogenous cue around one of them, finding that alternations occurred earlier and more
frequently at the cued location, linking rivalry dynamics to the spatio-temporal properties of
visual attention. In other words, drawing attention to a spatial location increases the rate of
perceptual alternation at that location.
Object-based attention can also bias which image dominates in binocular rivalry. In one study
(Mitchell et al. 2004), observers were first presented with two objects superimposed in transpar-
ency that were binocularly viewed for a brief period before shutter glasses activated and streamed
them separately to the two eyes to trigger rivalry. Just before the rivalry stage, one object was
exogenously cued with a transient movement. This caused the cued object to achieve perceptual
dominance at rivalry onset and showed that an object selection made during normal binocular
viewing is maintained despite a change to rivalrous dichoptic viewing. A subsequent study using
different techniques drew the same conclusions (Chong and Blake 2006). Endogenous cuing, too,
has been shown to produce a similar effect (Chong et al. 2005; Klink et al. 2008), although in
both cases the cue’s influence in determining image dominance is restricted to the early phase of
rivalry, after which normal alternation dynamics are observed. Studies with other kinds of per-
ceptually bistable stimuli show similar modulatory effects of attention (Struber and Stadler 1999;
van Ee 2005) in that attention can bias which percept tends to dominate, although several studies
have found that attentional control over rivalry is generally weaker than control over other forms
of bistability (Meng and Tong 2004; van Ee et al. 2005).
These studies manipulated attention by selecting one of the perceptual alternatives, either
endogenously or exogenously. An alternative approach involves directing attention away from
the rival stimuli towards a peripheral secondary task. Paffen et al. used this method to show that
removing attention from the stimuli causes rival alternations to slow. The slowing effect was
graded, being stronger for a more difficult secondary task (Paffen et al. 2006), with some evi-
dence that alternations cease altogether when attention is completely removed from rival stimuli
(Brascamp and Blake 2012). A similar paradigm was used to show that perceptual alternations
in bistable motion perception are also slowed by a difficult attentional distractor (Pastukhov and
Braun 2007). In a neuroimaging study examining the withdrawal of attention, Lee et al. (2007)
investigated rivalry between large images designed to produce a travelling wave of dominance fol-
lowing a path of ‘good continuation’ along locally similar orientations. With attention directed to
the rival images, the traveling waves of perceptual dominance produced corresponding waves of
activity sweeping across retinotopic areas V1, V2, and V3. However, when attention was diverted
to a letter monitoring task at the center of the display, activity in V2 and V3 no longer indicated a
travelling wave and rivalry-related activity was restricted to V1.
seemingly ‘high-level’ influences can govern the occurrence and dynamics of rivalry, as can feed-
back from mid-level vision (Alais and Blake 1998; Watson et al. 2004; Pearson and Clifford 2005;
van Boxtel et al. 2008). Top-down approaches to rivalry, in focusing on interpretation of ambigu-
ous retinal input, broaden the scope of potential influences on rivalry. We will focus here on
results implicating high-level influences operating during rivalry, for those results bear on the
role of perceptual organization in governing rivalry dynamics. We start by summarizing findings
from a growing list of studies showing that the meaning or emotional content of rivalry stimuli
can influence rivalry dynamics.
The question of cognitive and motivational influences on rivalry goes back to the middle of
the previous century (reviewed by Walker 1978). In early studies, rival stimuli with conflicting
emotional or symbolic content were presented to different groups and predominance was
measured. When Jewish and Catholic observers viewed the Star of David versus a Christian cross,
Jewish observers tended to see the star more than the cross, and vice versa for Catholic observers
(Losciuto and Hartley 1963). In a similar vein, figures a person had seen before tended to pre-
dominate in rivalry over figures never seen before (Goryo 1969). These results were interpreted
to mean that non-visual factors such as affective content and familiarity influence the resolution
of stimulus conflict during binocular rivalry (Walker 1978). Recently, interest in this question has
returned with several new papers addressing this topic (reviewed by Blake 2013). For example,
studies report that emotionally arousing pictures—whether positive or negative—produce
longer dominance durations than non-arousing pictures, even when both images have compara-
ble low-level image properties (Sheth and Pham 2008). Dominance durations are also longer for
emotional faces rivalling against neutral faces. An emotional face is also more likely to dominate
first at rivalry onset (Alpers and Pauli 2006). More remarkably, neutral looking faces dominate
significantly longer if they have previously been associated with negative social behaviors through
conditioning (‘threw a chair at a classmate’), relative to faces associated with positive or neutral
behaviors (Anderson et al. 2011). Even the simple act of imagining a given stimulus can sub-
sequently boost its dominance in rivalry, implying a boost in stimulus strength from the act of
imagining (Pearson et al. 2008).
Top-down influences such as these are not too surprising given our knowledge that attention
can modulate rivalry durations (Lack 1978; Paffen et al. 2006): familiar, imagined, or emotion-
ally charged stimuli may command greater attention and, hence, receive a boost in rivalry.
Accordingly, enhanced rivalry predominance could arise from lengthened dominance dura-
tions, for it is presumably the dominant stimulus that receives attention during rivalry. Is that
the sole basis of context’s modulation of rivalry? To answer this, we turn to recent work using a
new procedure that isolates context’s influence on suppression durations. These new studies all
employ continuous flash suppression (CFS: Figure 38.6), a robust form of binocular rivalry pro-
duced when one eye views a rapidly changing array of densely overlaid, high-contrast shapes
(the CFS inducer) and the other eye views a more conventional, static rival image (Tsuchiya and
Koch 2005). Because of the broadband spatio-temporal energy spectrum of the CFS inducer
(Yang and Blake 2012), it is always the initially dominant stimulus at rivalry onset, and it
remains dominant for an unusually long duration compared to rivalry produced by conven-
tional rival stimuli.
Exploiting the robustness of CFS, recent studies have used a variant whereby the CFS inducer is
initially presented to one eye and a probe stimulus is presented to the other eye shortly after. The
predominance of CFS at onset prevents observers from seeing the probe at first, but probe con-
trast is steadily increased until eventually the observer can indicate in which of four display quad-
rants the probe appeared. In some cases, contrast of the CFS inducer is also gradually decreased,
Fig. 38.6 An illustration of the continuous flash suppression paradigm, a new method of producing
interocular suppression. A sequence of independent Mondrian-like arrays is presented at a rate of
~10 Hz to one eye and causes the image in the other eye to be very deeply suppressed, and for far
longer periods (several tens of seconds) than is typical of binocular rivalry. Because the dynamic
inducing pattern has a broad and dense spatio-temporal energy spectrum it will always be dominant
over the static image at onset.
to ensure the probe will eventually be perceived. The dependent measure is the duration of sup-
pression, the period from probe onset until successful reporting of the probe’s location. Using
this approach, several recent studies have asked what stimulus properties empower an initially
suppressed probe to overcome the potent suppression from the CFS inducer. Whatever those
properties turn out to be, they cannot be due to a boost from attention because the identity and
location of the suppressed probe remains unknown to the observer until it emerges from suppres-
sion. Some examples of findings from these studies are:
• Upright faces emerge from suppression more quickly than inverted faces, as do words printed
in familiar script that can be read by an observer compared to words in unfamiliar script (Jiang
et al. 2007).
• Angry faces escape suppression faster than neutral or happy faces (Yang et al. 2007; Tsuchiya
et al. 2009).
• Faces implying direct eye contact break suppression faster than the same faces with gaze slightly
diverted (Stein et al. 2011).
• Scenes containing an object (e.g., a watermelon) in a bizarre context (a basketball game) are
freed from suppression faster than the same scenes with a contextually appropriate object (e.g.,
a basketball) (Mudrik et al. 2011).
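The trial structure described above (probe contrast ramped under CFS until the observer localizes it) can be sketched as follows. The observer callback, ramp rate, and timing values are hypothetical stand-ins, not the procedure of any particular study:

```python
def run_bcfs_trial(report_quadrant, ramp_rate=0.02, dt=0.1,
                   max_contrast=1.0):
    """Skeleton of a breaking-CFS (b-CFS) trial.

    `report_quadrant` stands in for the observer: called each frame
    with the current probe contrast, it returns a quadrant (0-3) once
    the probe has broken suppression, else None. The dependent measure
    is the suppression duration: time from probe onset to report.
    """
    contrast, t = 0.0, 0.0
    while contrast < max_contrast:
        response = report_quadrant(contrast)
        if response is not None:
            return t, response        # suppression duration, location
        contrast += ramp_rate         # steadily increase probe contrast
        t += dt
    return None, None                 # probe never broke suppression
```

Stimuli whose content speeds their escape from suppression (upright faces, readable words, and so on) would simply break through at earlier times, yielding shorter suppression durations on this measure.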
Based on this kind of speeded emergence from suppression, most (but not all) of these stud-
ies conclude that meaning, affective connotation and contextual relevance of suppressed stimuli
are still registered, despite being completely absent from visual awareness. At first glance, these
kinds of findings seem to rule out attention as the modulating factor in enhanced predominance
of certain stimuli engaged in rivalry. However, there are some reasons to take that conclusion
with a grain of salt. Two papers that used CFS together with emotional faces adopted a more
cautious tone by pointing to actual feature differences between faces that break suppression early
and those that do not (Yang et al. 2007; Gray et al. 2013). Also, the investigators who documented
gaze direction’s effect on dominance (Stein et al. 2011a) expressed doubt in another paper about
the adequacy of control measures typically employed to rule out alternative explanations (Stein
et al. 2011b).
promote that selection via feedback to early visual areas (Leopold and Logothetis 1999). Further
evidence for this view comes from studies showing frontal (Sterzer and Kleinschmidt 2007) and
parietal (Britz et al. 2011) activity preceding occipital activity associated with perceptual alterna-
tions, although these studies used ambiguous motion and Necker cubes—stimuli that are clearly
bistable but lack the interocular conflict that triggers rivalry. One study that did use binocular
rivalry confirmed fronto-parietal activation associated with perceptual alternations but a phase
analysis indicated the activity resulted from occipital sources (Kamphuisen et al. 2008). This study,
together with a subsequent one (Knapen et al. 2011), implies that fronto-parietal activations may
be a result of experiencing rivalry alternations rather than a cause of those alternations.
A recent TMS study implicated parietal cortex in mediating perceptual alternations (Carmel
et al. 2010), finding that TMS applied over right superior parietal cortex (SPL) shortened rivalry
dominance durations. Later, Kanai, Carmel, Bahrami and Rees (Kanai et al. 2011) reported that
disrupting right anterior SPL shortened dominance durations, while disrupting right posterior
SPL increased dominance durations. Contrasting results, however, were found in a similar study
that used TMS over anterior SPL and reported increased rivalry durations (Zaretskaya et al. 2010).
The reason for this discrepancy is not clear and more research will be needed to resolve it, but the
findings suffice to implicate parietal cortex in binocular rivalry dynamics.
A Bayesian View
As evidence has emerged for top-down and contextual processing in binocular rivalry, so have
new theoretical models of rivalry that formalize the interpretative aspect of perception and its
response to ambiguous input (e.g., Sterzer et al. 2009), including models based on a Bayesian
probabilistic framework (Dayan 1998; Hohwy et al. 2008; Sundareswara and Schrater 2008). On
the Bayesian view (see Feldman, this volume, for a full analysis of Bayesian models of perceptual
organization), the existence of incompatible monocular images precludes a single interpretation
of the visual environment. That is, there is a very low prior probability that both images could be
true simultaneously (two different objects cannot logically occupy the same visual location). If
the likelihoods of each image being true are roughly equal, the model is faced with two equivalent
solutions and perception alternates between the two competing percepts. On this view, binocular
rivalry is a consequence of the conflicting interpretations of the left- and right-eye images, rather
than of inhibitory connections between early feature-tuned neurons (Dayan 1998). This kind of
model can accommodate a good deal of the traditional low-level psychophysical data about bin-
ocular rivalry (reviewed in Hohwy et al. 2008). It is also well suited to describing how multisen-
sory interactions help resolve visual ambiguity. Where one visual image is correlated with signals
in another modality, that visual image will have a higher likelihood than the other and will receive
a higher weighting in alternation dynamics and therefore tend to dominate rivalry perception.
Through learning and experience, too, certain auditory, visual and tactile combinations will have
high prior probabilities and be favored when the visual stimuli alone may be ambiguous.
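The weighting scheme just described can be reduced to a toy computation: posterior weight is proportional to prior times likelihood. The sketch below is illustrative only, not the actual formulation of Dayan (1998) or Hohwy et al. (2008); the numerical priors, likelihoods, and the size of the crossmodal boost are arbitrary assumptions.

```python
# Toy posterior weighting for two rivalling interpretations.
# Illustrative only: the priors, likelihoods, and the size of the
# crossmodal "boost" are arbitrary assumptions, not values from the chapter.

def posterior_weights(likelihoods, priors):
    """Normalized posterior weight for each interpretation (prior x likelihood)."""
    unnormalized = [l * p for l, p in zip(likelihoods, priors)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Roughly equal likelihoods: the two solutions are equivalent, so
# perception is expected to alternate between the two percepts.
equal = posterior_weights(likelihoods=[0.5, 0.5], priors=[0.5, 0.5])

# A correlated signal in another modality raises the likelihood of the
# first image, which therefore receives a higher weighting and should
# tend to dominate rivalry.
boosted = posterior_weights(likelihoods=[0.8, 0.5], priors=[0.5, 0.5])
```

On this sketch, `equal` gives each percept a weight of 0.5, whereas `boosted` assigns the crossmodally supported image the larger weight; learned multisensory combinations would play the same role through higher priors rather than higher likelihoods.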
Conclusion
We began the chapter by mentioning a school of thought that sees perception as a process of infer-
ence and interpretation, a tradition that stretches back to Helmholtz in the late nineteenth century.
Although binocular rivalry has been an active field since those times, most rivalry research con-
ducted since Levelt’s seminal work in the 1960s has focused on basic stimulus features and early
cortical processing. Although low-level factors are undoubtedly important in binocular rivalry,
the chapter’s second half focused on more recent work showing the significance of top-down influences.
792 Alais and Blake
References
Alais, D. and Blake, R. (1998). ‘Interactions between global motion and local binocular rivalry’. Vision Res
38(5): 637–44.
Alais, D. and Blake, R. (1999). ‘Grouping visual features during binocular rivalry’. Vision Res
39(26): 4341–53.
Alais, D. and Blake, R. (2005). Binocular rivalry. (Cambridge: MIT Press).
Alais, D. and Melcher, D. (2007). ‘Strength and coherence of binocular rivalry depends on shared stimulus
complexity’. Vision Res 47(2): 269–79.
Alais, D., O’Shea, R. P. et al. (2000). ‘On binocular alternation’. Perception 29(12): 1437–45.
Alais, D., Lorenceau, J. et al. (2006). ‘Contour interactions between pairs of Gabors engaged in binocular
rivalry reveal a map of the association field’. Vision Res 46(8–9): 1473–87.
Alais, D., Cass, J. et al. (2010a). ‘Visual sensitivity underlying changes in visual consciousness’. Current
Biology 20: 1362–7.
Alais, D., Newell, F. N. et al. (2010b). ‘Multisensory processing in review: from physiology to behaviour’.
Seeing Perceiving 23(1): 3–38.
Alais, D., van Boxtel, J. J. et al. (2010c). ‘Attending to auditory signals slows visual alternations in binocular
rivalry’. Vision Res 50(10): 929–35.
Binocular Rivalry and Perceptual Ambiguity 793
Alexander, L. T. (1951). ‘The influence of figure-ground relationships in binocular rivalry’. J Exp Psychol
41(5): 376–81.
Alpers, G. W. and Pauli, P. (2006). ‘Emotional pictures predominate in binocular rivalry’. Cognition and
Emotion 20: 596–607.
Anderson, E., Siegel, E. H. et al. (2011). ‘The visual impact of gossip’. Science 332(6036): 1446–8.
Angelucci, A., Levitt, J. B. et al. (2002). ‘Circuits for local and global signal integration in primary visual
cortex’. J Neurosci 22(19): 8633–46.
Baker, D. H. and Graf, E. W. (2009). ‘Natural images dominate in binocular rivalry’. Proc Natl Acad Sci USA
106(13): 5436–41.
Bisley, J. W. (2011). ‘The neural basis of visual attention’. J Physiol 589(Pt 1): 49–57.
Blake, R. (1989). ‘A neural theory of binocular rivalry’. Psychol Rev 96(1): 145–67.
Blake, R. (2001). ‘A Primer on Binocular Rivalry, Including Current Controversies’. Brain and Mind 2: 5–38.
Blake, R. (2013). Binocular rivalry updated. In The New Visual Neurosciences, edited by J. S. Werner and L.
M. Chalupa. (Cambridge, MA: MIT Press).
Blake, R. and Logothetis, N. K. (2002). ‘Visual competition’. Nat Rev Neurosci 3(1): 13–21.
Blake, R., Sobel, K. V. et al. (2004). ‘Neural synergy between kinetic vision and touch.’ Psychol Sci
15(6): 397–402.
Blakemore, C. and Tobin, E. A. (1972). ‘Lateral inhibition between orientation detectors in the cat’s
visual cortex’. Experimental Brain Research 15(4): 439–40.
Bonneh, Y. and Sagi, D. (1999). ‘Configuration saliency revealed in short duration binocular rivalry’. Vision
Res 39(2): 271–81.
Bossink, C. J., Stalmeier, P. F. et al. (1993). ‘A test of Levelt’s second proposition for binocular rivalry’. Vision
Res 33(10): 1413–19.
Brancucci, A. and Tommasi, L. (2011). ‘ “Binaural rivalry”: Dichotic listening as a tool for the investigation
of the neural correlate of consciousness’. Brain Cogn 76(2): 7.
Brascamp, J. W., van Ee, R. et al. (2005). ‘Distributions of alternation rates in various forms of bistable
perception’. Journal of Vision 5(4): 287–98.
Brascamp, J. W., van Ee, R. et al. (2006). ‘The time course of binocular rivalry reveals a fundamental role of
noise’. Journal of Vision 6(11): 1244–56.
Brascamp, J. W., and Blake, R. (2012) ‘Inattention abolishes binocular rivalry: perceptual evidence’.
Psychological Science 23: 1159–67.
Britz, J., Pitts, M. A. et al. (2011). ‘Right parietal brain activity precedes perceptual alternation during
binocular rivalry’. Hum Brain Mapp 32(9): 1432–42.
Carmel, D., Walsh, V. et al. (2010). ‘Right parietal TMS shortens dominance durations in binocular rivalry’.
Curr Biol 20(18): R799–800.
Carter, O. L., Konkle, T. et al. (2008). ‘Tactile rivalry demonstrated with an ambiguous apparent-motion
quartet’. Curr Biol 18(14): 1050–4.
Carter, O. L. and Pettigrew, J. D. (2003). ‘A common oscillator for perceptual rivalries?’ Perception
32(3): 295–305.
Chen, Y. C., Yeh, S. L. et al. (2011). ‘Crossmodal constraints on human perceptual awareness: auditory
semantic modulation of binocular rivalry’. Front Psychol 2: 212.
Chong, S. C. and Blake, R. (2006). ‘Exogenous attention and endogenous attention influence initial
dominance in binocular rivalry’. Vision Res 46(11): 1794–803.
Chong, S. C., Tadin, D. et al. (2005). ‘Endogenous attention prolongs dominance durations in binocular
rivalry’. Journal of Vision 5(11): 1004–12.
Conrad, V., Bartels, A. et al. (2010). ‘Audiovisual interactions in binocular rivalry’. Journal of Vision
10(10): 27.
Cosmelli, D., David, O. et al. (2004). ‘Waves of consciousness: ongoing cortical patterns during binocular
rivalry’. Neuroimage 23(1): 128–40.
Dayan, P. (1998). ‘A hierarchical model of binocular rivalry’. Neural Comput 10(5): 1119–35.
Denham, S. L., & Winkler, I. (2014). ‘Auditory perceptual organization’. In J. Wagemans (Ed.), Oxford
Handbook of Perceptual Organization (in press). Oxford, U.K.: Oxford University Press.
Desimone, R. and Duncan, J. (1995). ‘Neural mechanisms of selective visual attention’. Annu Rev Neurosci
18: 193–222.
Dorrenhaus, W. (1975). ‘Pattern specific visual competition’. Naturwissenschaften 62(12): 578–9.
Ernst, M. O. and Bulthoff, H. H. (2004). ‘Merging the senses into a robust percept’. Trends Cogn Sci
8(4): 162–9.
Feldman, J. (2014). ‘Bayesian models of perceptual organization’. In J. Wagemans (Ed.), Oxford Handbook of
Perceptual Organization (in press). Oxford, U.K.: Oxford University Press.
Field, D. J., Hayes, A. et al. (1993). ‘Contour integration by the human visual system: evidence for a local
“association field”’. Vision Res 33(2): 173–93.
Fitzpatrick, D. (2000). ‘Seeing beyond the receptive field in primary visual cortex’. Curr Opin Neurobiol
10(4): 438–43.
Fox, R. and Check, R. (1968). ‘Detection of motion during binocular rivalry suppression’. J Exp Psychol
78(3): 388–95.
Fox, R. and Herrmann, J. (1967). ‘Stochastic properties of binocular rivalry alternations’. Perception &
Psychophysics 2: 432–6.
Freeman, A. W. (2005). ‘Multistage model for binocular rivalry’. J Neurophysiol 94(6): 4412–20.
Goryo, K. (1969). ‘The effect of past experience on binocular rivalry’. Japanese Psychological Research 11: 46–53.
Gray, K. L., Adams, W. J. et al. (2013). ‘Faces and awareness: Low-level, not emotional factors, determine
perceptual dominance’. Emotion, 13(3): 537–44, doi: 10.1037/a0031403.
Hess, R. F., May, K. A., & Dumoulin, S. O. (2014). ‘Contour integration: Psychophysical,
neurophysiological and computational perspectives’. In J. Wagemans (Ed.), Oxford Handbook of
Perceptual Organization (in press). Oxford, U.K.: Oxford University Press.
Hohwy, J., Roepstorff, A, et al. (2008). ‘Predictive coding explains binocular rivalry: an epistemological
review’. Cognition 108(3): 687–701.
Hupe, J. M. and Rubin, N., et al. (2003). ‘The dynamics of bi-stable alternation in ambiguous motion
displays: a fresh look at plaids’. Vision Res 43(5): 531–48.
Jiang, Y., Costello, P. et al. (2007). ‘Processing of invisible stimuli: advantage of upright faces and
recognizable words in overcoming interocular suppression’. Psychol Sci 18(4): 349–55.
Kamphuisen, A., Bauer, M. et al. (2008). ‘No evidence for widespread synchronized networks in binocular
rivalry: MEG frequency tagging entrains primarily early visual cortex’. Journal of Vision 8(5): 4, 1–8.
Kanai, R., Carmel, D. et al. (2011). ‘Structural and functional fractionation of right superior parietal cortex
in bistable perception’. Curr Biol 21(3): R106–7.
Kang, M. and Blake, R. (2005). ‘Perceptual synergy between seeing and hearing revealed during binocular
rivalry’. Psichologija 32: 7–15.
Kang, M. S. and Blake, R. (2011). ‘An integrated framework of spatiotemporal dynamics of binocular
rivalry’. Front Hum Neurosci 5: 88.
Kang, M.-S., Lee, S.-H. et al. (2010). ‘Modulation of spatiotemporal dynamics of binocular rivalry by
collinear facilitation and pattern-dependent adaptation’. Journal of Vision 10(11): 3.
Kapadia, M. K., Ito, M. et al. (1995). ‘Improvement in visual sensitivity by changes in local context: parallel
studies in human observers and in V1 of alert monkeys’. Neuron 15(4): 843–56.
Kappers, A. M. L., & Bergmann Tiest, W. M. (2014). ‘Tactile and haptic perceptual organization’. In
J. Wagemans (Ed.), Oxford Handbook of Perceptual Organization (in press). Oxford, U.K.: Oxford
University Press.
Kastner, S. and Ungerleider, L. G. (2000). ‘Mechanisms of visual attention in the human cortex’. Annu Rev
Neurosci 23: 315–41.
Kaufman, L. (1963). ‘On the Spread of Suppression and Binocular Rivalry’. Vision Res 3: 401–15.
Kim, C. Y. and Blake, R. (2007). ‘Illusory colors promote interocular grouping during binocular rivalry’.
Psychon Bull Rev 14(2): 356–62.
Kim, Y. J., Grabowecky, M. et al. (2006). ‘Stochastic resonance in binocular rivalry’. Vision Res 46(3): 392–406.
Klink, P. C., van Ee, R. et al. (2008). ‘Early interactions between neuronal adaptation and voluntary control
determine perceptual choices in bistable vision’. Journal of Vision 8(5): 16, 11–18.
Klink, P. C., Brascamp, J. W. et al. (2010). ‘Experience-driven plasticity in binocular vision’. Current Biology
20(16): 1464–9.
Knapen, T., Brascamp, J. et al. (2011). The role of frontal and parietal brain areas in bistable perception.
J Neurosci 31: 10293–301.
Koffka, K. (1935). Principles of Gestalt Psychology. (New York: Harcourt Brace).
Kogo, N., & van Ee, R. (2014). ‘Neural mechanisms of figure-ground organization: Border-ownership,
competition and perceptual switching’. In J. Wagemans (Ed.), Oxford Handbook of Perceptual
Organization (in press). Oxford, U.K.: Oxford University Press.
Kovacs, I., Papathomas, T. V. et al. (1996). ‘When the brain changes its mind: interocular grouping during
binocular rivalry’. Proc Natl Acad Sci USA 93(26): 15508–11.
Lack, L. C. (1978). Selective attention and the control of binocular rivalry. (The Hague: The Netherlands,
Mouton).
Laing, C. R. and Chow, C. C. (2002). ‘A spiking neuron model for binocular rivalry’. J Comput Neurosci
12(1): 39–53.
Lankheet, M. J. (2006). ‘Unraveling adaptation and mutual inhibition in perceptual rivalry’. Journal of
Vision 6(4): 304–10.
Lee, S. H. and Blake, R. (2004). ‘A fresh look at interocular grouping during binocular rivalry’. Vision Res
44(10): 983–91.
Lee, S.-H., Blake, R. et al. (2005). ‘Traveling waves of activity in primary visual cortex during binocular
rivalry’. Nat Neurosci 8(1): 22–3.
Lee, S. H., Blake, R. et al. (2007). ‘Hierarchy of cortical responses underlying binocular rivalry’. Nat
Neurosci 10(8): 1048–54.
Lehky, S. R. (1988). ‘An astable multivibrator model of binocular rivalry’. Perception 17(2): 215–28.
Leopold, D. A. and Logothetis, N. K. (1999). ‘Multistable phenomena: changing views in perception’.
Trends in Cognitive Sciences 3(7): 254–64.
Levelt, W. (1965). On Binocular Rivalry. (Soesterberg, The Netherlands: Institute for Perception).
Long, G. M. and Toppino, T. C. (2004). ‘Enduring interest in perceptual ambiguity: alternating views of
reversible figures’. Psychol Bull 130(5): 748–68.
Losciuto, L. A. and Hartley, E. L. (1963). ‘Religious Affiliation and Open-Mindedness in Binocular
Resolution’. Percept Mot Skills 17: 427–30.
Lumer, E. D., Friston, K. J. et al. (1998). ‘Neural correlates of perceptual rivalry in the human brain’. Science
280(5371): 1930–4.
Lumer, E. D. and Rees, G. (1999). ‘Covariation of activity in visual and prefrontal cortex associated with
subjective visual perception’. Proc Natl Acad Sci USA 96(4): 1669–73.
Lunghi, C. and Alais, D. (2013). ‘Touch Interacts with Vision during Binocular Rivalry with a Tight
Orientation Tuning’. PLoS ONE 8(3): e58754.
Lunghi, C., Binda, P. et al. (2010). ‘Touch disambiguates rivalrous perception at early stages of visual
analysis’. Current Biology 20(4): R143–R144.
Lunghi, C., Morrone, M. C. et al. (2014). ‘Auditory and tactile signals combine to influence vision during
binocular rivalry’. J Neurosci 34(3): 784–792.
Maruya, K., Yang, E. et al. (2007). ‘Voluntary action influences visual competition’. Psychol Sci
18(12): 1090–98.
Meng, M. and Tong, F. (2004). ‘Can attention selectively bias bistable perception? Differences between
binocular rivalry and ambiguous figures’. Journal of Vision 4(7): 539–51.
Miller, S. M., Liu, G. B. et al. (2000). ‘Interhemispheric switching mediates perceptual rivalry’. Curr Biol
10(7): 383–92.
Mitchell, J. F., Stoner, G. R. et al. (2004). ‘Object-based attention determines dominance in binocular
rivalry’. Nature 429(6990): 410–13.
Moreno-Bote, R., Rinzel, J. et al. (2007). ‘Noise-induced alternations in an attractor network model of
perceptual bistability’. J Neurophysiol 98(3): 1125–39.
Mudrik, L., Deouell, L. Y. et al. (2011). ‘Scene congruency biases Binocular Rivalry’. Conscious Cogn
20(3): 756–67.
Mueller, T. J. (1990). ‘A physiological model of binocular rivalry’. Vis Neurosci 4(1): 63–73.
Mueller, T. J. and Blake, R. (1989). ‘A fresh look at the temporal dynamics of binocular rivalry’. Biol Cybern
61(3): 223–32.
Nguyen, V. A., Freeman, A. W. et al. (2003). ‘Increasing depth of binocular rivalry suppression along two
visual pathways’. Vision Res 43(19): 2003–8.
Norman, H. F., Norman, J. F. et al. (2000). ‘The temporal course of suppression during binocular rivalry’.
Perception 29(7): 831–41.
Ooi, T. L. and He, Z. J. (1999). ‘Binocular rivalry and visual awareness: The role of attention’. Perception
28: 551–74.
Ooi, T. L. and He, Z. J. (2003). ‘A distributed intercortical processing of binocular rivalry: psychophysical
evidence’. Perception 32(2): 155–66.
Ooi, T. L. and He, Z. J. (2006). ‘Binocular rivalry and surface-boundary processing’. Perception
35(5): 581–603.
O’Shea, R. P. and Corballis, P. M. (2005). ‘Visual grouping on binocular rivalry in a split-brain observer’.
Vision Res 45(2): 247–61.
O’Shea, R. P., Sims, A. J. et al. (1997). ‘The effect of spatial frequency and field size on the spread of
exclusive visibility in binocular rivalry’. Vision Res 37(2): 175–83.
O’Shea, R. P., Parker, A. et al. (2009). ‘Monocular rivalry exhibits three hallmarks of binocular
rivalry: evidence for common processes’. Vision Res 49(7): 671–81.
Ozkan, K. and Braunstein, M. L. (2009). ‘Predominance of ground over ceiling surfaces in binocular
rivalry’. Atten Percept Psychophys 71(6): 1305–12.
Paffen, C. L. E., te Pas, S. F. et al. (2004). ‘Center-surround interactions in visual motion processing during
binocular rivalry’. Vision Research 44: 1635–9.
Paffen, C. L. E. and Van der Stigchel, S. (2010). ‘Shifting spatial attention makes you flip: Exogenous
visual attention triggers perceptual alternations during binocular rivalry’. Attention, Perception, &
Psychophysics 72(5): 1237–43.
Paffen, C. L. E., Alais, D. et al. (2005). ‘Center-surround inhibition deepens binocular rivalry suppression’.
Vision Res 45(20): 2642–9.
Paffen, C. L. E., Alais, D. et al. (2006). ‘Attention speeds binocular rivalry’. Psychological Science 17(9): 752–6.
Pastukhov, A. and Braun, J. (2007). ‘Perceptual reversals need no prompting by attention’. Journal of Vision
7(10): 5, 1–17.
Pearson, J. and Clifford, C. W. G. (2005). ‘When your brain decides what you see: grouping across
monocular, binocular, and stimulus rivalry’. Psychol Sci 16(7): 516–19.
Pearson, J., Clifford, C. W. et al. (2008). ‘The functional impact of mental imagery on conscious perception’.
Curr Biol 18(13): 982–6.
Pressnitzer, D. and Hupe, J. M. (2006). ‘Temporal dynamics of auditory and visual bistability reveal
common principles of perceptual organization’. Current Biology 16(13): 1351–7.
Roumani, D. and Moutoussis, K. (2012). ‘Binocular rivalry alternations and their relation to visual
adaptation’. Front Hum Neurosci 6: 35.
Seely, J. and Chow, C. C. (2011). ‘Role of mutual inhibition in binocular rivalry’. J Neurophysiol
106(5): 2136–50.
Sekuler, R., Sekuler, A. B. et al. (1997). ‘Sound alters visual motion perception’. Nature 385(6614): 308.
Sheth, B. R. and Pham, T. (2008). ‘How emotional arousal and valence influence access to awareness’. Vision
Res 48(23–24): 2415–24.
Shpiro, A., Moreno-Bote, R. et al. (2009). ‘Balance between noise and adaptation in competition models of
perceptual bistability’. J Comput Neurosci 27(1): 37–54.
Spence, C. (2011). ‘Crossmodal correspondences: a tutorial review’. Atten Percept Psychophys 73(4): 971–95.
Spence, C. (2014). ‘Cross-modal perceptual organization’. In J. Wagemans (Ed.), Oxford Handbook of
Perceptual Organization (in press). Oxford, U.K.: Oxford University Press.
Stein, T., Senju, A. et al. (2011a). ‘Eye contact facilitates awareness of faces during interocular suppression’.
Cognition 119(2): 307–11.
Stein, T., Hebart, M. N. et al. (2011b). ‘Breaking Continuous Flash Suppression: A New Measure of
Unconscious Processing during Interocular Suppression?’ Front Hum Neurosci 5: 167.
Sterzer, P. and Kleinschmidt, A. (2007). ‘A neural basis for inference in perceptual ambiguity’. Proc Natl
Acad Sci USA 104(1): 323–8.
Sterzer, P., Kleinschmidt, A. et al. (2009). ‘The neural bases of multistable perception’. Trends Cogn Sci
13(7): 310–18.
Sterzer, P. and Rees, G. (2008). ‘A neural basis for percept stabilization in binocular rivalry’. J Cogn Neurosci
20(3): 389–99.
Struber, D. and Stadler, M. (1999). ‘Differences in top-down influences on the reversal rate of different
categories of reversible figures’. Perception 28(10): 1185–96.
Sundareswara, R. and Schrater, P. R. (2008). ‘Perceptual multistability predicted by search model for
Bayesian decisions’. Journal of Vision 8(5): 12, 11–19.
Tong, F. (2001). ‘Competing Theories of Binocular Rivalry: A Possible Resolution’. Brain and Mind 2: 55–83.
Tsuchiya, N. and Koch, C. (2005). ‘Continuous flash suppression reduces negative afterimages’. Nat Neurosci
8(8): 1096–101.
Tsuchiya, N., Moradi, F. et al. (2009). ‘Intact rapid detection of fearful faces in the absence of the amygdala’.
Nat Neurosci 12(10): 1224–5.
van Boxtel, J. J. A., Alais, D. et al. (2008). ‘Retinotopic and non-retinotopic stimulus encoding in binocular
rivalry and the involvement of feedback’. Journal of Vision 8(5): 1–10.
van Ee, R. (2005). ‘Dynamics of perceptual bi-stability for stereoscopic slant rivalry and a comparison with
grating, house-face, and Necker cube rivalry’. Vision Res 45(1): 29–40.
van Ee, R. (2009). ‘Stochastic variations in sensory awareness are driven by noisy neuronal
adaptation: evidence from serial correlations in perceptual bistability’. J Opt Soc Am A Opt Image Sci Vis
26(12): 2612–22.
van Ee, R., Adams, W. J. et al. (2003). ‘Bayesian modeling of cue interaction: bistability in stereoscopic slant
perception’. J Opt Soc Am A Opt Image Sci Vis 20: 1398–406.
van Ee, R., van Dam, L. C. et al. (2005). ‘Voluntary control and the dynamics of perceptual bi-stability’.
Vision Res 45(1): 41–55.
van Ee, R., van Boxtel, J. J. et al. (2009). ‘Multisensory congruency as a mechanism for attentional control
over perceptual selection’. J Neurosci 29(37): 11641–9.
van Lier, R. and De Weert, C. M. M. (2003). ‘Intra- and interocular colour-specific activation during
dichoptic suppression’. Vision Res 43(10): 1111–6.
Vanrie, J., Dekeyser, M. et al. (2004). ‘Bistability and biasing effects in the perception of ambiguous
point-light walkers’. Perception 33: 547–60.
von Helmholtz, H. (1925). Treatise on physiological optics. (New York: Dover).
Walker, P. (1978). ‘Binocular rivalry: central or peripheral selective processes?’. Psychological Bulletin
85: 376–89.
Watson, T., Pearson, J. et al. (2004). ‘Perceptual grouping of biological motion promotes binocular rivalry’.
Current Biology 14(18): 1670–4.
Wertheimer, M. (1923). ‘Untersuchungen zur Lehre von der Gestalt, II’. Psychologische Forschung 4: 301–50.
Wheatstone, C. (1838). ‘Contributions to the Physiology of Vision. Part the first. On some remarkable, and
hitherto unobserved, phenomena of binocular vision’. Philosophical Transactions of the Royal Society of
London 128: 371–94.
Whittle, P., Bloor, D. C. et al. (1968). ‘Some experiments on figural effects in binocular rivalry’. Perception
& Psychophysics 4: 183–8.
Wilson, H. R., Blake, R. et al. (2001). ‘Dynamics of travelling waves in visual perception’. Nature
412(6850): 907–10.
Yang, E. and Blake, R. (2012). ‘Deconstructing continuous flash suppression’. Journal of Vision 12(3): 8.
Yang, E., Zald, D. H. et al. (2007). ‘Fearful expressions gain preferential access to awareness during
continuous flash suppression’. Emotion 7(4): 882–6.
Zaretskaya, N., Thielscher, A. et al. (2010). ‘Disrupting parietal function prolongs dominance durations in
binocular rivalry’. Curr Biol 20(23): 2106–11.
Zhou, W., Jiang, Y. et al. (2010). ‘Olfaction Modulates Visual Perception in Binocular Rivalry’. Curr Biol
20: 1356–58.
Chapter 39
Perceptual organization
and consciousness
D. Samuel Schwarzkopf and Geraint Rees
Introduction
All of our lives revolve around our conscious experience of the world we inhabit. In spite of that,
the questions of why we have consciousness in the first place and how much it influences our
perception and action remain largely unanswered. Is consciousness just an epiphenomenon, a
genetic quirk that arose in the course of evolution as a consequence of other processes in the
human brain, or does it have a teleological purpose? Whatever the answer, perception is at heart an act of interpretation. For vision, this interpretation depends not
only on the object or feature that is the current focus of attention, but also on the perceptual
context in which it is embedded. Yet, surprisingly little is currently understood about how per-
ceptual organization affects our consciousness, whether conscious awareness of sensory stimuli
is a prerequisite for interpreting them as coherent objects and scenes, and the underlying neural
processes in the human brain.
This chapter will review the state of research on how consciousness is entwined with the per-
ceptual organization of sensory input. The first section, ‘Access to Consciousness’, describes the
categorical nature of how our conscious perception is typically viewed and how this can be used to
make inferences about the neural correlates of consciousness. The following section, ‘Unconscious
Perceptual Organization’, goes into more depth on the interaction between awareness of a stimu-
lus and the brain’s interpretation of it. This also includes a discussion of studies trying to address
the question as to whether there is any information that requires conscious awareness of the stim-
ulus to be processed. The final section, ‘Phenomenological Contents of Consciousness’, describes
research going beyond the purely categorical aspects of our awareness but instead concentrating
on the mechanisms determining a person’s percept of the environment.
Access to Consciousness
We are all familiar with the ways in which our awareness and our perception interact. At any
point in time, our sense organs are bombarded by an overwhelming amount of input; however,
we are usually not aware of this information overload. Rather, we usually feel that we are only
really conscious of a particular part or aspect of the environment. Moreover, some aspects of our
sensorium are usually or almost always outside our awareness (James 1890). For example, we are
generally unaware of our heartbeat or of the workings of our internal organs even though there
are afferent nerves continuously transmitting signals to our brain. Only when something requires
our attention, for example when we are hungry or sick, do we usually feel anything about our bod-
ies, and even then it is merely a vague feeling, not a thorough awareness of all our affected bodily
functions. Thus, the focus of awareness constantly fluctuates, partly under our own volition and
partly for reasons that are mostly outside our control.
Studies investigating the neural events that determine whether a sensation reaches conscious-
ness and what kind of perceptual processing occurs unconsciously can take several forms. One
obvious approach is to manipulate directly whether the observer is aware of the sensory stimulus.
In the visual domain this is typically done through masking procedures of which there are numer-
ous variations. It is possible to mask a stimulus from being consciously perceived by presenting a
masking stimulus either directly after or before the onset of the stimulus. Among such methods,
meta-contrast masking (Breitmeyer and Ogmen 2000) presents, immediately after the stimulus of
interest, a masking stimulus whose contours are of opposite contrast polarity to that stimulus.
This method can render even bright stimuli invisible to the observer. An extension of this method
alternates the target and the mask repeatedly; this repetition produces a
‘standing wave of invisibility’ that can render a stimulus invisible for
prolonged periods (Macknik and Livingstone 1998). This methodology can show that informa-
tion about stimulus orientation is present in primary visual cortex (V1) even when the orienta-
tion does not reach awareness (Haynes and Rees 2005), consistent with behavioural experiments
showing that grating stimuli rendered invisible through various forms of masking can produce
contextual interactions or adaptation effects on contrast or orientation perception (Clifford and
Harris 2005; Falconbridge, Ware, and MacLeod 2010; Motoyoshi and Hayakawa 2010).
While such methods can be very effective in removing a stimulus from conscious access and
typically allow excellent experimental control over awareness, they share the caveat that they are
based on substantial perturbations of the stimulus and that it therefore becomes difficult to distin-
guish the effect of changes in the stimulus parameters from changes in consciousness. It is unsur-
prising that a stimulus presented in close temporal proximity to another stimulus will interfere
with the neuronal response to that stimulus (Macknik and Livingstone 1998). Nevertheless, this
approach can provide important insights into what distinguishes conscious and unconscious pro-
cessing as long as this stimulus confound is taken into account. In essence, if a stimulus can exert
unconscious effects when rendered invisible through masking (or any other stimulus manipula-
tion), this is sufficient evidence that it is processed even in the absence of awareness. However,
when no unconscious effects are observed, the interpretation is more complicated. The only direct
conclusion that can be made in this situation is that the processing of a stimulus is disrupted by
this stimulus manipulation. Further inference on the role of conscious awareness can only be
made through convergent evidence combining other masking procedures or different manipula-
tions of awareness.
Another popular approach to studying unconscious processing is therefore directly to exploit
the fluctuating focus of awareness. To do this, one can use multistable perception. Ambiguous
images, like those shown in Figure 39.1, can be interpreted in more than one way, but only one
interpretation is ever experienced at a time. The dynamics and behavioural studies of ambiguous
images are discussed in detail in the chapter by Alais and Blake (this volume). For example, the
Necker cube (Figure 39.1A) can be perceived such that the upper corner is either facing forward
or facing backward. Sometimes a third state is reported in which the impression of depth is lost
entirely—a two-dimensional collection of parallelograms. Critically, however, it is impossible to
see all of these interpretations simultaneously.
Under ideal conditions, comparing the variable percept evoked by such ambiguous images dis-
sociates the contents of awareness (which alternate) from physical stimulation (which remains
unchanging). Naturally, this is based on the assumption that peripheral processes in the individ-
ual perceiving these stimuli are constant between the different perceptual experiences. This may
Fig. 39.1 Examples of ambiguous stimuli showing both traditional examples (a, b) and stimuli that
become multistable because of changes in how the visual system interprets low-level information (c,
d). (a). The Necker Cube for which perception alternates between which face is interpreted as being
in front. (b). Binocular rivalry. When viewing this stimulus with red-blue anaglyph glasses perception
alternates between the two oblique grating patches (see the chapter by Alais and Blake for an
in-depth discussion and more examples). (c). Even though only the black bars are visible and physically
moving up and down (denoted by red arrows), perception can interpret this stimulus also as a black
diamond shape (implied by the dashed grey lines) viewed behind white, vertical occluding bars.
(Please refer to http://www.pnas.org/content/suppl/2002/10/26/192579399.DC1/5793Movie2Legend.html for a moving demonstration.) (d). Each of the four pairs of discs constantly circles around a hinge
point (denoted by red arrows). We can interpret this locally as four pairs of discs, but perception can
also be dominated by a global interpretation in which there are two groups of four dots arranged in
the squares implied by the dashed lines. Please refer to http://anstislab.ucsd.edu/2012/11/27/local-and-global-motion-with-juno-kim/ for a moving demonstration and a discussion of the parameters
determining whether the local or global interpretation predominates.
not be the case in all situations. For example, subtle eye movements may change the retinal projec-
tion of the Necker cube and favour one interpretation of the two-dimensional image over another
(Einhäuser, Martin, and König 2004). In this context it is also worth noting that eye movements
do not correspond with the perceived depth of a stimulus but instead reflect low-level attributes of the image
(Wismeijer et al. 2008, 2010). For ambiguous structure-from-motion stimuli that lead to percep-
tion of a three-dimensional shape spinning either clockwise or anti-clockwise, the percept may
depend on whether attention is directed to the dots drifting to the left or to the right. Moreover,
for many ambiguous stimuli one of the interpretations is more dominant. Thus, provided such
peripheral factors are controlled for adequately, this approach permits a stronger inference to
be made about the neural correlates of consciousness than manipulating the stimulus directly.
However, by using multistable stimuli one loses direct experimental control over the observer’s
conscious perceptual experience.
One particular form of bistable perception occurs when two different stimuli are presented
to separate paired sensory organs, so that the brain receives conflicting sensory inputs. This has
been studied most extensively with binocular rivalry, when each eye is presented with a different
image. Rather than seeing an incoherent mixture or blend of the two images, conscious percep-
tion typically alternates between each monocular percept just as with other types of ambiguous
stimuli. A third, piecemeal percept, in which the perceived image is a mosaic of the images seen by the
left and right eyes, can also occur. During the switches between alternate interpretations, perception
does not flip instantaneously from one state to another but changes rapidly from an initiating
location across the visual field, akin to a wave travelling across the image. Psychophysical studies
of binocular rivalry and such perceptual waves are discussed in greater detail in the chapter
by Alais and Blake (this volume).
Of course, the eyes are not the only sensory organs that come as a pair. Therefore, it is unsur-
prising that there are equivalents of binocular rivalry for other senses. In binaural rivalry, the two
ears hear different sequences of tones. The resulting percept alternates between the specific sensa-
tions rather than evoking a cacophony of mismatching sounds (van Ee et al. 2009; Brancucci and
Tommasi 2011). Even more surprisingly, in binaral rivalry two different odours are administered
separately to each of the nostrils and again the perceived smell switches back and forth between
the two (Zhou and Chen 2009). Unlike binocular rivalry, which occurs naturally under normal
viewing conditions outside Panum’s fusional area, binaural and binaral rivalry are sensory condi-
tions that must be artificially created in a laboratory. In the normal environment of an organism
it is not probable that each of the nostrils would receive conflicting smells or that completely dif-
ferent sounds would reach each of the ears without any crossover between the two. On the other
hand, in natural vision the images projected onto the two retinas are generally quite distinct and
there are frequent occurrences where two completely different images are seen at least by parts of
each retina: for example, the region blocked by the nose. Moreover, outside Panum’s fusional area
binocular fusion does not occur. Fusing the two retinal images in a meaningful way is the basis
of stereovision and thus important for judging depth and distance. Thus, binocular rivalry is an
extreme situation that reveals a mechanism associated with normal visual processing away from
fixation. Binaural and binaral rivalry, on the other hand, seem to be a purer demonstration of the
processes underlying the wavering focus of awareness. It is therefore of note that in spite of this,
the three forms of bisensory rivalry are phenomenologically very similar.
Perhaps the simplest form of bistable perception occurs when two stimuli are superimposed or
mixed. In the visual domain this is sometimes referred to as monocular rivalry, that is, when the
same picture contains two different images. Again, the focus of perception can alternate between
the two individual images. Even though this effect may not have the same potency as binocular
rivalry or other ambiguous images, it underlines that not all of the sensory input can be processed
simultaneously with equal processing resources. We can focus on one component image but only
perceive the other one as a distracting background blur or vice versa (O’Craven, Downing, and
Kanwisher 1999); alternatively, we may force vision to perceive both at the same time but this only
results in a messy, broken-up percept.
It should also be noted that the fact that perception can be multistable at all has implications for
our understanding of the perceptual apparatus. The reason that we are not conscious of all possible
interpretations of an ambiguous stimulus could be related to a limit in the capacity with which the
Perceptual Organization and Consciousness 803
brain can perceptually organize and interpret the overwhelming sensory input. However, if this is
true, this must mean that some information can only be processed with awareness of the stimulus.
Conversely, the fact that our percept does not simply stabilize into one of the possibilities is incon-
sistent with any account that the brain merely interprets the sensorium using the most probable
prior expectation. Instead, perhaps the continuous fluctuation in perception reflects the brain's
way of searching for an appropriate solution when faced with strongly ambiguous input. Reconciling
theories of prediction with rivalrous perception remains an important topic for future research.
What neural processes underlie the perceptual switches and periods of perceptual dominance
in multistable perception? The advent of modern neuroimaging techniques like positron emis-
sion tomography (PET), functional magnetic resonance imaging (fMRI), electroencephalography
(EEG), and magnetoencephalography (MEG) has made it possible to measure neural activity
throughout the human brain whilst measuring behavioural reports of the observer’s percep-
tual state in real time. Such experiments show that regions of superior parietal and prefrontal
cortex, which are also associated with attentional deployment, are active during the transitions
from one perceptual state to another (Lumer, Friston, and Rees 1998). Moreover, the structure
of such regions is related to the frequency of perceptual switches. Specifically, individual differ-
ences in the grey matter volume in right superior parietal cortex correlate with the switch rate
for a structure-from-motion stimulus (Kanai, Bahrami, and Rees 2010). Causally manipulating
neural activity in these regions using transcranial magnetic stimulation (TMS) with continuous
theta-burst stimulation decreases switch rate (Kanai et al. 2010), showing that these areas play a
causal role in generating perceptual switches. Moreover, applying TMS to a slightly more anterior
part of parietal cortex has the opposite effect on switch rates in binocular rivalry (Carmel et al.
2010; Zaretskaya et al. 2010). Taken together, these findings suggest a model in which parietal
(and perhaps prefrontal) cortices play a complex causal role in generating top-down signals that
ultimately resolve perceptual competition in ventral visual cortex (Kanai et al. 2011).
The link between brain structure and the switch rate in these forms of perceptual rivalry also
hints at the possibility that these processes are deeply rooted in human physiology. While grey
matter volume can change over the lifespan and there is some short-term experience-dependent
plasticity associated with learning motor tasks (Draganski et al. 2004), there is a strong herit-
able component to switch rate in multistable perception (Miller et al. 2010; Shannon et al. 2011).
Moreover, switch rate correlates with the occurrence and severity of bipolar disorder (Pettigrew
and Miller 1998; Miller et al. 2003; Krug et al. 2008; Nagamine et al. 2009). This obviously does
not imply that binocular rivalry, or bistable perception in general, causes psychiatric or neurological
conditions, but it does suggest that rivalry shares mechanisms that are affected in these conditions.
Recent studies have investigated the balance of excitatory and inhibitory signalling in visual cortex,
motivated by the assumption that this balance relates to the dynamics of perceptual rivalry (van
Loon et al. 2013), which could be altered in certain conditions (Aznar Casanova et al. 2013; Said
et al. 2013).
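The intuition probed by these studies, that rivalry dynamics arise from an interplay of mutual inhibition and slow adaptation, can be made concrete with a minimal computational sketch. The model below is not taken from this chapter; it follows the general form of standard reciprocal-inhibition models of rivalry (in the spirit of Laing and Chow 2002; Wilson 2003), and all parameter values are illustrative rather than fitted to empirical switch rates.

```python
import numpy as np

def simulate_rivalry(T=60.0, dt=0.001, beta=3.0, g=2.0, tau_a=1.0):
    """Minimal reciprocal-inhibition model of perceptual rivalry.

    Two competing populations (r1, r2) inhibit each other (strength beta)
    and accumulate slow self-adaptation (a1, a2; strength g, time constant
    tau_a in seconds). Adaptation of the dominant population eventually
    hands dominance to its rival, producing spontaneous alternations.
    All parameters are illustrative, not fitted to data.
    """
    tau_r = 0.01  # fast response time constant (s)
    f = lambda x: 1.0 / (1.0 + np.exp(-10.0 * (x - 0.2)))  # sigmoidal gain
    r1, r2, a1, a2 = 0.0, 0.0, 0.0, 0.1  # slight asymmetry breaks the tie
    dominant = np.empty(int(T / dt), dtype=int)
    for i in range(dominant.size):
        r1 += dt / tau_r * (-r1 + f(1.0 - beta * r2 - g * a1))
        r2 += dt / tau_r * (-r2 + f(1.0 - beta * r1 - g * a2))
        a1 += dt / tau_a * (r1 - a1)  # adaptation tracks dominance
        a2 += dt / tau_a * (r2 - a2)
        dominant[i] = 0 if r1 > r2 else 1
    return int(np.sum(dominant[1:] != dominant[:-1]))  # number of switches

print("switches in 60 s:", simulate_rivalry())
```

In models of this family, the alternation rate depends systematically on the inhibition and adaptation parameters, which is precisely the kind of relationship between excitatory and inhibitory signalling and rivalry dynamics that the studies cited above seek to test empirically.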
Naturally, the focus of awareness does not exist in isolation from wider perceptual process-
ing. While there is a strong stochastic element to how and when perceptual transitions occur
during multistable perception, the timing of such transitions is also strongly influenced by the
stimuli used and other factors, such as what stimuli had been presented previously or atten-
tional deployment. So it is possible to some degree to control perceptual alternations through
selectively attending to one particular interpretation (Ooi and He 1999; Hugrass and Crewther
2012), although binocular rivalry may be less susceptible to voluntary control than other forms
of multistability (Meng and Tong 2004). Moreover, when viewing of a binocular rivalry stimulus
is interrupted by a blank epoch, the first percept reported when the rivalrous stimulus returns is
frequently the same as the one last perceived before the blank epoch (Leopold et al. 2002). Even
more fundamentally, basic image statistics can influence bistable perception. During binocular
rivalry, sharp edges with high contrasts and sudden movement usually result in perceptual domi-
nance, while homogeneous regions of an image tend to be suppressed. Thus rivalrous images that
contain a large degree of heterogeneity in one eye but homogeneous regions in the other tend to
be dominated by the heterogeneous image. The sudden appearance of one monocular image can
substantially bias the percept to being dominated by that image, a process known as flash suppres-
sion (Wolfe 1984), perhaps because sudden appearance of a stimulus is particularly salient (Cole
et al. 2004).
This phenomenon can be exploited to sustain perceptual dominance of one eye for prolonged
periods. One eye views a dynamic stream of constantly changing patterns of high-contrast geo-
metric shapes (e.g. a Mondrian-like pattern) while the other views a low-contrast stimulus. Under
the right circumstances such continuous flash suppression (CFS) results in complete dominance
of perception for extended periods of time by the dynamic stimulus, thus suppressing the other
monocular stimulus from awareness (Tsuchiya and Koch 2005). It is however critical to keep in
mind that this suppression may differentially affect the low-level stimulus components, such as the
stimulus spatial frequency (Yang and Blake 2012) and the phase alignment of stimulus and mask
(Maehara et al. 2009). CFS has been used to study unconscious stimulus processing in numerous
studies and enjoys increasing popularity due to the ease of its use. In one variant of these experi-
ments, the contrast of the suppressed image is gradually increased and the critical parameter to
be measured is the ‘time to emergence’ when the suppressed stimulus breaks through the masking
stimulus in the other eye and reaches awareness. Comparing this parameter for different stimulus
conditions can reveal differences in the unconscious processing of the images (Jiang, Costello, and
He 2007). However, it is important to keep in mind that the time it takes a stimulus to break
interocular suppression may be determined not by the stimulus parameter of interest
but by other, low-level features of the suppressed image. Further, it is possible that
a faster time to emergence does not actually reflect unconscious processing but rather the speed
(or other dynamics) with which the stimulus breaks through suppression once it has passed the
threshold to conscious perception.
At an even more basic level, image statistics vie for perceptual dominance. When one eye views
white noise images, while the other views noise images filtered to fall within the 1/f spectrum typi-
cally observed in natural scenes (Field 1987; Simoncelli and Olshausen 2001), the latter dominate
perception for significantly longer periods than the white noise images (Baker and Graf 2009).
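The spectral manipulation behind such stimuli is straightforward to reproduce. The sketch below is a minimal illustration rather than the actual procedure of Baker and Graf (2009): it filters a white-noise image so that its amplitude spectrum falls off as 1/f, mimicking natural-scene statistics; the image size and normalization are arbitrary choices.

```python
import numpy as np

def one_over_f_image(size=256, alpha=1.0, seed=0):
    """Filter a white-noise image so its amplitude spectrum falls as
    1/f**alpha, approximating natural-scene statistics (alpha=0 returns
    plain white noise)."""
    rng = np.random.default_rng(seed)
    white = rng.standard_normal((size, size))
    fx = np.fft.fftfreq(size)
    f = np.sqrt(fx[None, :] ** 2 + fx[:, None] ** 2)  # spatial frequency
    f[0, 0] = 1.0  # avoid division by zero at the DC component
    filtered = np.real(np.fft.ifft2(np.fft.fft2(white) / f ** alpha))
    return (filtered - filtered.mean()) / filtered.std()  # zero mean, unit sd

pink = one_over_f_image(alpha=1.0)   # 1/f-filtered ("natural") noise
white = one_over_f_image(alpha=0.0)  # white noise, for the rival eye
```

Presenting the 1/f-filtered image to one eye and the white-noise image to the other dichoptically sets up a rivalry of the kind just described.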
This may suggest that the visual system selectively brings stimuli whose image statistics
conform to those of the natural world into the focus of awareness. However, the same may not apply
to higher-order image statistics, such as the collinearity or co-circularity of orientated segments
in the image. While some studies show that collinear gratings in a binocular rivalry stimulus tend
perceptually to transition as a group (Alais and Blake 1999), there have also been reports that
when a noisy field of grating patches of random orientations is paired with a field of varying levels
of co-circularity in the other eye, it is the incoherent, random pattern that dominates perception
(Hunt, Mattingley, and Goodhill 2012), even though the natural environment contains a high
degree of such co-circular regularities (Geisler et al. 2001; Geisler 2008). The reason for that may
be that the two monocular images in that study were not perfectly overlapping, so that individual
patches were not in direct rivalry with one another. Of particular relevance to the
question of how the visual system organizes stimulus elements into coherent objects, it is
interesting that interocular suppression spreads along contours and around angles, and even across
gaps in a contour, provided the gap is interpreted as arising from occlusion (Maruya and Blake
2009). It is evident that the same processes that are involved in organizing our perception into a
coherent representation of the environment have complex interactions with awareness.
Bistability of the contents of awareness can also be experienced with regard to how the brain
interprets information as a coherent whole. A stimulus like that shown in Figure 39.1C can
be perceived in different states, reflecting the way individual stimulus elements are regarded
as being independent or part of a larger object (Murray et al. 2002; Fang, Kersten, and Murray
2008). In the local state the two lines are perceived as drifting up or down, i.e. the veridical
interpretation. However, in the global state the observer instead reports the lines as the sides
of a square that is moving left and right behind several occluding rectangles. Which particular
interpretation currently dominates perception also influences the aftereffects from using these
stimuli as adaptors (He, Kersten, and Fang 2012). A similar stimulus is shown in Figure 39.1D.
There are four groups of stimuli, each comprising two discs circling around a central hinge point.
Under the local interpretation, each of these groups is perceived as an independent moving object
(perhaps akin to a binary star system). However, in the global state, discs from distant locations
are grouped into larger entities, resulting in the percept of two squares rotating around one
another. Neuroimaging experiments show that in the global state, neural responses in early vis-
ual cortex to such stimuli are reduced relative to the local interpretation (Zaretskaya, Anstis, and
Bartels 2013). Such a response pattern is a hallmark of coherent perceptual organization, pos-
sibly indicative of predictive coding by which areas higher up in the processing hierarchy send
feedback signals to early visual cortex that cancel out the neural activity that is ‘explained away’
by coherent objects (Rao and Ballard 1999; Murray et al. 2002; Joo, Boynton, and Murray 2012).
However, such an interpretation is complicated by the fact that while responses in early visual
cortex are reduced, this reduction is general to the whole region, rather than specific to the
location responding to the stimulus (de-Wit et al. 2012). Moreover, the neural representation
of the stimulus is enhanced (Kok, Jehee, and de Lange 2012), which could be related to the fact
that there is reduced variability in stimulus features (Dumoulin and Hess 2006) and thus reduced
lateral inhibition (which would appear as metabolic activity in neuroimaging measurements)
between adjacent neuronal populations with different tuning properties (Kinoshita, Gilbert, and
Das 2009). While such lower-level explanations cannot entirely account for findings supporting
the predictive coding hypothesis in the context of ambiguous stimuli, the underlying neural
mechanisms are probably more complicated than the predictive coding account proposes.
The beauty of these particular stimulus examples lies in the fact that, like all bistable images, the
stimuli themselves are physically constant and only perceptual organization alternates. However,
one problem with these particular forms of bistable perceptual organization is that our interpre-
tation is typically fairly biased towards one state. For instance, in the latter example the percept
becomes more predominantly local as the speed of rotation is increased, and, more critically, it
tends to become more global with prolonged exposure (Anstis and Kim 2011). This is also why it
is necessary to adapt stimulus parameters continuously to ensure relatively equal dominance of
each state (Zaretskaya et al. 2013), something that is typically less problematic for more classical
ambiguous stimuli like binocular rivalry or structure-from-motion displays that constantly switch
between perceptual states. Nevertheless, as these and other studies illustrate, stimuli like these can
be used successfully to reveal how grouping processes influence the contents of awareness.
One way to reveal neural correlates of consciousness and to understand what information is
processed in the absence of awareness is to rely entirely on whether a stimulus gains access to
conscious report or not. Multistability is not the only means of doing this. For example, there have
been demonstrations of priming effects exerted by stimuli that remained undetected in change
blindness paradigms (Silverman and Mack 2006; Yeh and Yang 2009). Interestingly, while previous
neuroimaging and TMS experiments implicate right parietal and dorsolateral prefrontal cortex in
signalling the presence of a change in the stimulus (Beck et al. 2001, 2006; Turatto, Sandrini,
and Miniussi 2004), there is also evidence to suggest that the memory trace of a stimulus can be
boosted by applying TMS to visual cortical areas encoding the stimulus (Schwarzkopf et al. 2010).
Research on the neural correlates of consciousness (Rees, Kreiman, and Koch 2002) has also
indicated that recurrent connectivity between brain regions in the sensory hierarchy is critically
important for conscious perception of a stimulus. The visibility of a visual stimulus under
meta-contrast masking correlates with effective connectivity between early visual areas and fusi-
form cortex, which seems to relate to activity in the region immediately surrounding the reti-
notopic representation of the stimulus in early visual cortex (Haynes, Driver, and Rees 2005).
Further, it has been proposed that feedback from higher regions into earlier areas is critical for
conscious perception (Roelfsema, Lamme, and Spekreijse 1998; Lamme and Roelfsema 2000;
Lamme 2006), although others have argued that at least for visual masking paradigms conscious-
ness varies due to disruptions in feed-forward processing (Tse et al. 2005; Dehaene et al. 2006;
Macknik and Martinez-Conde 2007).
Since consciousness itself remains very poorly understood, the role it plays in our interpretation of the environment is also difficult
to establish. Are there any perceptual functions that require conscious awareness of the stimulus?
Alternatively, could consciousness simply be a product of the mind but irrelevant for how the
brain analyses sensory information?
There have been numerous demonstrations of how unconscious stimuli can have complex and
powerful effects on behaviour. Images of emotional faces rendered invisible through masking
can influence behavioural performance (Yang, Zald, and Blake 2007; Faivre, Berthet, and Kouider
2012; Almeida et al. 2013) and produce brain activation in neuroimaging experiments linked to
emotional processing, like enhanced amygdala responses to fearful faces (Williams et al. 2004;
De Gelder et al. 2005). This suggests that the neural mechanisms required for detecting emo-
tional expressions operate even when we are not aware of the stimulus. Similar findings have been
made for social information in faces. For example, the time for a face to emerge from continuous
flash suppression (i.e. the time it takes for a low-contrast face stimulus to break through the
dichoptic mask) is influenced by its dominance or trustworthiness (Stewart et al. 2012). It has
been argued that the information about emotional valence, in particular fear responses, is con-
fined to low spatial frequencies and bypasses the high-resolution image analysis in early visual
cortex entirely (Vuilleumier et al. 2003; Winston, Vuilleumier, and Dolan 2003) through a subcor-
tical pathway. This would suggest that while perceptual analysis necessary for such primal emo-
tional responses is independent of awareness, conscious processing may nevertheless be required
for detailed perceptual organization.
However, even more complex information is processed in the absence of awareness. For exam-
ple, semantic information can be processed without awareness and break through binocular
suppression (Costello et al. 2009), although it is unclear how much semantic information can
be processed whilst undergoing dichoptic suppression (Zimba and Blake 1983). Organizing
local image features like lines and angles into letters, and subsequently letters into words, must
require fairly sophisticated processing. At least to some extent this process must be preserved in
the absence of conscious awareness. Whether or not an invisible stimulus exerts an influence on
perception probably also depends on what aspect of perception is measured: while a high-order
visual stimulus, like a spiral, may not produce adaptation (unlike simpler stimuli, like a grat-
ing) when masked from awareness, a complex, naturalistic image may still capture attentional
resources (Lin and He 2009). Further, as discussed earlier, one important aspect to consider is also
that the means by which a stimulus is rendered invisible may influence whether a stimulus can
have subliminal effects (Faivre et al. 2012; Yang and Blake 2012). A briefly presented stimulus fol-
lowed by a mask may be available to complete perceptual processing even though it is unavailable
to conscious report. On the other hand, presenting the same stimulus under conditions of binocu-
lar rivalry may eliminate its neural representation in higher brain regions where the information
about the stimulus eye of origin is lost.
In light of this problem, it is even more interesting that even the processing of complex natural
images appears to proceed under continuous flash suppression that renders the images invisible.
One study measured the time to emergence for visual scenes that were either congruent with the
natural world or contained some form of inconsistency, such as an archer using a tennis racket
instead of an arrow or basketball players using a watermelon instead of a ball (Mudrik et al. 2011).
Intriguingly, incongruent scenes broke through perceptual suppression faster than congruent
scenes. This may suggest that even the complex integration of objects in their semantic context
can occur in the absence of awareness. Even if we assume that this effect may in some way be
influenced by low-level image properties (an explanation which is somewhat unlikely due to the
diverse range of natural stimuli used in that study) and bypasses detailed visual analysis through
different pathways, it must require some complex processes to identify the out-of-place features.
This finding is in some way contrary to the aforementioned reports of a bias for more ‘natural’
stimuli to dominate in binocular rivalry (Baker and Graf 2009). However, as discussed in the pre-
vious section, it is also important to note that the measure used by this study, time to emergence
from CFS, may not truly reflect the processing that occurs under suppression but the detection of
incongruent scenes at the moment of transition between suppression and visibility, which in turn
results in their reaching perceptual dominance more quickly.
In contrast to this finding, the neural representation of complex visual stimuli may not be
the same in the absence of awareness as during conscious viewing. For example, one study
used multivariate pattern decoding techniques to decode distributed activations measured with
high-resolution functional MRI in higher ventral visual cortex to distinguish processing associ-
ated with viewing of face or house images (Sterzer, Haynes, and Rees 2008). While it was possible
to decode which of the two stimulus classes was being processed, regardless of whether or not they
were rendered invisible by continuous flash suppression, the results suggested that the nature of
the pattern information under awareness and invisibility was different. This is notably different
from the situation in early visual cortex, where the neural representation of invisible orientated
gratings is similar to that of visible stimuli (Haynes and Rees 2005). The overall visual response in higher
visual brain regions to stimuli rendered invisible through binocular fusion (when two comple-
mentary images are presented to each eye and perceived merely as a uniform blank) can be very
similar, albeit weaker, to that evoked by visible stimuli (Moutoussis and Zeki 2002; Schurger et al. 2010).
This suggests that there may be fundamental differences in terms of how information about the
visual stimulus is encoded during unconscious processing.
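The multivariate decoding logic used in these studies can be sketched in a toy simulation. Everything below is synthetic: the 'voxel patterns' are random vectors, the split-half correlation classifier is a generic MVPA approach (in the style of Haxby et al. 2001) rather than the specific analysis of Sterzer et al. (2008), and the 'invisible' condition is modelled simply as a noisier version of the 'visible' one, echoing the greater pattern variability reported by Schurger et al. (2010).

```python
import numpy as np

rng = np.random.default_rng(1)
n_trials, n_voxels = 80, 50

# Synthetic mean "voxel patterns" for the two stimulus classes.
face_mean = rng.standard_normal(n_voxels)
house_mean = rng.standard_normal(n_voxels)

def make_trials(mean_pattern, noise_sd):
    """Generate noisy single-trial activation patterns around a class mean."""
    return mean_pattern + noise_sd * rng.standard_normal((n_trials, n_voxels))

def decoding_accuracy(noise_sd):
    """Split-half correlation classifier: label each test trial by which
    training-half template (face vs house) it correlates with more."""
    faces = make_trials(face_mean, noise_sd)
    houses = make_trials(house_mean, noise_sd)
    half = n_trials // 2
    templates = [faces[:half].mean(axis=0), houses[:half].mean(axis=0)]
    correct = 0
    for true_label, test_set in [(0, faces[half:]), (1, houses[half:])]:
        for trial in test_set:
            r = [np.corrcoef(trial, t)[0, 1] for t in templates]
            correct += int(np.argmax(r) == true_label)
    return correct / n_trials  # 2 * (n_trials // 2) test trials in total

# Noisier patterns stand in for the less reliable "invisible" condition.
for condition, sd in [("visible", 1.0), ("invisible", 4.0)]:
    print(f"{condition}: decoding accuracy = {decoding_accuracy(sd):.2f}")
```

With identical class templates but noisier single-trial patterns, decoding remains above chance yet degrades, illustrating why reduced pattern reliability complicates comparisons between visible and invisible conditions.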
It has been argued that one neural correlate of awareness is the reliability of the visual response
to the stimulus (Schurger et al. 2010). Using functional MRI and multivariate decoding analysis
these authors showed that the pattern of activation produced by invisible stimuli is indeed more
variable compared to that for visible stimuli. However, it seems curious to regard this as a neural
correlate of consciousness: by definition variability must be determined over the course of multi-
ple or prolonged measurements. Consciousness, on the other hand, can vary from one moment
to the next. While it is certainly possible that one property granting neural representations access
to consciousness may be their temporal stability, the response patterns in functional MRI are
measured on a trial-by-trial basis, with each trial comprising slow haemodynamic measurements over
several seconds. It seems unlikely that response variability between such trials can explain the
absence (or presence) of awareness across all trials because awareness of a stimulus operates at
much faster time scales. More importantly, because this study employed a stimulus manipulation
(binocular fusion) to render the stimulus invisible, it exemplifies the problem with masking
methods discussed earlier: it is impossible to determine whether the reduced reliability of fMRI responses is
correlated with consciousness or is merely a result of differences in the stimulus. Only a design that
compares conscious and unconscious trials with identical stimulation can conclusively arbitrate
between those possibilities.
Nevertheless, the finding is interesting because it suggests that without awareness a stabiliz-
ing influence on the neural representation may be lost. This is also supported by psychophysical
experiments showing that without awareness, behavioural tuning to orientation is broader, con-
sistent with greater variability (Ling and Blake 2009). In that study, awareness was manipulated
by using binocular rivalry with flash suppression, and comparing identical stimulus conditions in
the presence and absence of awareness, rather than directly manipulating the stimulus to render
it invisible. This provides stronger evidence that the differences indeed relate to consciousness
rather than physical differences in visual input.
Another interesting aspect of Schurger and colleagues’ finding was that the brain regions where
the most diagnostic information about the visual images was encoded differed between visible
and invisible stimuli (Schurger et al. 2010). While the former selectively activated well-replicated
areas in ventral cortex known to respond preferentially to images of faces and houses, respectively,
invisible stimuli were decoded from more posterior regions in intermediate fusiform
cortex, presumably corresponding to areas V4 and the VO complex (Wandell, Dumoulin,
and Brewer 2007). While these regions are already sensitive to relatively complex visual infor-
mation, they are not as selective for object identity. It is therefore possible that in the absence of
awareness visual information is encoded in a more incoherent form, relying on more primitive
features rather than abstract classes. At least some perceptual organization, transforming geomet-
ric primitives into coherent and meaningful objects, may thus require consciousness.
To test this notion, in behavioural experiments we measured priming effects produced by
simple visual shapes that were either visible or rendered invisible by fast counter-phase flicker,
a method that seems to allow for at least low-level processing of visual information to occur
(Falconbridge et al. 2010). Shapes comprised sparse fragments and could either be defined by
the position or the orientation of the elements (Schwarzkopf and Rees 2010). We observed that
priming effects from invisible stimuli on the discrimination of shapes of the opposite feature only
occurred when the primes were defined by orientation. Moreover, this effect disappeared when
the discrimination targets were rescaled. This indicates that without awareness, oriented elements
are not integrated into an abstract representation of a shape but that some more local processes
involved in spatial integration, possibly confined to early retinotopic cortex, are nonetheless func-
tioning. Consciousness, it seems, is after all required for some more abstract analysis of the visual
environment.
This notion was also supported by an experiment in which we tested whether Kanizsa triangles
are formed when the inducers producing this type of illusory contour are rendered invisible by
continuous flash suppression, but a central region containing the illusory contour produced by
the stimulus configuration was available to conscious perception (Harris et al. 2011). Participants
were required to discriminate the orientation of the illusory contour. Without awareness, per-
formance was consistently at chance levels, indicating that participants could not perceive the
illusory contour. This contrasts with a control experiment where we showed that simultaneous
brightness contrast (Figure 39.2A), the contextual modulation of perceived brightness when a
stimulus is presented against a dark or light background, is preserved even when the background
is suppressed from awareness. This null finding for perception of illusory contours when the
inducers are suppressed from awareness cannot be explained by lack of statistical power, because
each participant performed a large number of trials and performance was extremely consistent
across the group. However, as previously discussed with any of these studies in which awareness
is manipulated by a change in the stimulus, it is possible that the dichoptic masking procedure,
rather than consciousness per se, interfered with the formation of the illusory contours. Others
have shown that illusory contours are not perceived when the inducers are suppressed
during binocular rivalry (Sobel and Blake 2003). There is evidence that illusory contours
are mediated by binocular neurons (Liu, Stevenson, and Schor 1994; Gillam and Nakayama 1999;
Häkkinen and Nyman 2001) that may have been affected by dichoptic masking. One argument
speaking against that is that Kanizsa triangles enhance the speed with which a stimulus breaks
through binocular suppression (Wang, Weng, and He 2012), although this is inconsistent with
the absence of any effect on dominance periods during binocular rivalry (Sobel and Blake 2003),
and it remains unclear to what degree the time to emergence from binocular suppression reflects
unconscious processing per se.
Fig. 39.2 Visual illusions. (a). Simultaneous brightness contrast. The luminance of the two circles
is identical. (b). Contrast suppression. The contrast in the two circular patches is identical. (c).
Ebbinghaus illusion. The size of the two light grey circles is identical. (d). Ponzo illusion. The length of the two horizontal lines is identical. (e). Mueller-Lyer illusion. The length of the horizontal section of the
two arrows is identical. (f). Shepard’s Tables. The surface area of the two tables is identical.
It is also likely that inferring illusory contours operates through a multi-stage process where
first the local stimulus features are segmented and grouped into objects, which then produces the
illusory percept, possibly mediated by hierarchically earlier stages of visual processing through feedback (Kogo et al. 2010; see also the chapter by Kogo and van Ee, this volume). This is consistent with the finding that stimuli that mimic the salience of Kanizsa figures but do not produce the percept of illusory contours evoke similar neural responses in lateral occipital cortex,
a region presumed to be involved in extracting surfaces and objects (Stanley and Rubin 2003).
It also agrees with recent findings that the perception of Kanizsa stimuli depends not only on
processing in early visual cortex but also on feedback from higher lateral occipital cortex (Wokke
et al. 2013). The arrangement of the inducers may attract attention to the Kanizsa stimulus with-
out producing an actual percept of illusory contours. This is not an unlikely explanation because
there is considerable evidence that, while related, attention is a process distinct from awareness
(Kentridge, Heywood, and Weiskrantz 1999; Lamme 2003; Koch and Tsuchiya 2007; Bahrami
et al. 2008a, 2008b; Zhaoping 2008). Further, the spread of attentional responses in V1 is determined by Gestalt principles (Wannig, Stanisor, and Roelfsema 2011). The extent to which processing of illusory contours occurs without awareness thus remains a question for future research. However, our results already indicate that illusory contours are formed at a stage of processing beyond that at which the signals from the two eyes are still separate.
Perceptual Organization and Consciousness 811
Also of interest in this context are findings from stroke patients with parietal extinction (where a stimulus on the side contralateral to a parietal lesion remains undetected if a simultaneous ipsilateral stimulus is presented). Grouping of stimuli that form Kanizsa figures
can alleviate the effects of extinction (Mattingley, Davis, and Driver 1997; Conci et al. 2009), sug-
gesting that these processes are not dependent on awareness of the stimulus. However, again in
this situation it is unclear which comes first: the production of illusory contours or the segmenta-
tion of stimuli into surfaces. This line of research is discussed in greater detail in the chapter by
Gillebert and Humphreys (this volume).
influence our judgment of object size by exploiting inherent assumptions about perspective.
Finally, some illusions, like the rotating snakes (http://www.ritsumei.ac.jp/~akitaoka/index-e.html), produce a percept of motion that is not physically present in the image. Similarly, in the percept of illusory contours and amodal completion in images like the aforementioned Kanizsa figures, or the extrapolation of edges from abutting line segments (see the chapter by Kogo and van Ee, this volume,
for an in-depth discussion of these processes), we perceive a faint luminance edge that can be of
remarkable clarity simply due to the presence of inducing image components that imply the pres-
ence of a figure or an edge even though there is no physical luminance contrast. Thus, even very
simple geometric stimulus features can influence and alter the contents of awareness, making us
experience things that are not actually there.
Naturally, this list is not exhaustive but meant to give an overview of the different types of visual
illusions. One thing that they all share is that they affect the contents of our awareness by letting us
see things that are at odds with physical reality. Many neuroimaging studies show that the neural
representation of our perceived environment can be found even at relatively early stages of corti-
cal visual processing. For example, activity produced by physically identical stimuli in primary
visual cortex (V1) reflects their perceived size (Murray, Boyaci, and Kersten 2006). Subsequent
work showed that this was not solely due to larger responses to stimuli perceived as larger and that
this effect required participants to attend to the stimulus (Fang, Boyaci, et al. 2008). More recently,
this effect was further corroborated by the finding that the perceived size of a retinal afterim-
age is also reflected by V1 activity (Sperandio, Chouinard, and Goodale 2012). Intriguingly, the
perceived size of afterimages is also susceptible to contextual size illusions (Sperandio, Lak, and
Goodale 2012).
Consistent with this, in our own experiments the Ebbinghaus illusion is reduced under
dichoptic presentation when inducers and target stimuli are presented to different eyes (Song,
Schwarzkopf, and Rees 2011). Such absent or weak interocular transfer of an effect indicates that
it must be at least partly mediated by early stages of visual processing where the information from
the two eyes has not been fully combined, such as V1. We therefore hypothesized that the cortical
surface area of V1, which varies quite considerably between individuals (Andrews, Halpern, and
Purves 1997; Dougherty et al. 2003), might co-vary with the strength of such size illusions. In
particular, we reasoned that if the circuits mediating these illusions (lateral connections, feedback
pathways) do not scale with V1 surface area, the strength of these illusions should be reduced
in individuals with a larger V1. We measured the surface area of V1 in thirty individuals using
functional MRI and retinotopic mapping procedures (Schwarzkopf, Song, and Rees 2011) and
compared that to the magnitude of the Ebbinghaus and a variant of the Ponzo illusion measured
behaviourally in a psychophysics lab. As predicted, illusion magnitude was negatively correlated
with V1 surface area. In subsequent experiments we further showed that this correlation is present for both components of the Ebbinghaus stimulus, that is, for contexts with small and for contexts with large inducers. Our results further support the interpretation that the cortical distance over which
the contextual interaction occurs is a major factor determining illusion strength (Schwarzkopf
and Rees 2013). While correlational studies like this cannot resolve the question of causality and
the specific circuits mediating the illusion remain to be identified, our findings suggest that the
surface area of V1 at least in part reflects the subjective awareness of object size.
All of the examples in this section thus far have been in the visual domain. As with perceptual
science in general, vision has received most attention. However, there are also perceptual illu-
sions in other sensory domains and it is important not to neglect these as of course all sensory
input contributes to our subjective experience of the world. One example is the Aristotle illusion
from the somatosensory modality that can occur when we cross our fingers (as when wishing
somebody luck, or hoping for our Nature manuscript to be accepted for publication) and then touch a single marble so that it is held between the two fingertips. One then has the experience (especially when moving the marble along the surface of a table or the floor) that there are
two marbles, each touching one finger (Aristotle 1924). This percept may arise because in our
interpretation of the somatosensory input the fingers are not normally crossed, and so under typi-
cal conditions the sensation caused by this finger configuration would truly reflect the presence
of two independent objects. Different sensory modalities may also interact to produce perceptual
illusions, such as in the flash-beep illusion where two sounds presented in brief
succession simultaneously with a single visual flash can produce the percept of two independent
flashes (Shams, Kamitani, and Shimojo 2000; Watkins et al. 2007). Interestingly, how prone an
individual is to this illusion correlates with grey matter volume in early visual cortex (De Haas
et al. 2012). Another example is the McGurk effect (McGurk and MacDonald 1976), which occurs
when an auditory vocalization of a syllable is presented together with an incongruent movie of
a face vocalizing a different syllable. The actual percept tends to be a mixture of the two modali-
ties. Interestingly, in the context of the topics discussed earlier, congruency between the visual
face stimulus and the auditory vocalization helps the face break through interocular suppression
(Alsius and Munhall 2013); however, face stimuli rendered invisible through CFS did not produce
the McGurk illusion, suggesting that in order for a stimulus to exert multimodal effects it must be
consciously perceived (Palmer and Ramsey 2012).
Conclusion
In this chapter, we outlined some of the ways in which consciousness interacts with the perceptual
organization of our sensory input. Not only does the brain’s interpretation of stimuli influence
whether or not they reach the focus of our awareness, but we can also regard the way a scene is
perceived to be a reflection of our subjective experience, the contents of awareness. We described
a number of experiments investigating the processes by which our percepts are shaped by the
brain and how to separate those functions that operate in the absence of awareness from those
that require conscious processing. What kinds of sensory information can be interpreted without
awareness remains unclear. The literature on this question is patchy, with several studies inves-
tigating small aspects of unconscious perceptual processing, but a general theory tying together
these findings is elusive.
It also remains unresolved how different means of removing a stimulus from conscious access relate in terms of their neural mechanisms, and thus to what extent they can be compared.
The best experimental manipulations to study consciousness are those that keep the stimulus con-
stant and instead rely on subjective differences in awareness to dissociate objective physical prop-
erties from subjective experience. This makes bistable stimuli and contextual illusions popular
targets for experimental investigations, but the approach is not suited to addressing all questions.
Therefore, a more comprehensive comparison of different masking techniques will be instrumen-
tal in advancing our understanding of the role consciousness plays in perceptual organization.
References
Alais, D. and R. Blake (1999). ‘Grouping Visual Features during Binocular Rivalry’. Vision Research
39: 4341–4353.
Almeida, J., P. E. Pajtas, B. Z. Mahon, K. Nakayama, and A. Caramazza (2013). ‘Affect of the
Unconscious: Visually Suppressed Angry Faces Modulate our Decisions’. Cognitive Affective &
Behavioral Neuroscience 13: 94–101.
Alsius, A. and K. G. Munhall (2013). ‘Detection of Audiovisual Speech Correspondences without Visual
Awareness’. Psychological Science 24: 423–431.
Andrews, T. J., S. D. Halpern, and D. Purves (1997). ‘Correlated Size Variations in Human Visual Cortex,
Lateral Geniculate Nucleus, and Optic Tract’. Journal of Neuroscience 17: 2859–2868.
Anstis, S. and J. Kim (2011). ‘Local versus Global Perception of Ambiguous Motion Displays’. Journal of
Vision 11 (3): 13.
Aristotle (1924). Metaphysics. Oxford: Oxford University Press.
Aznar Casanova, J. A., J. A. Amador Campos, M. Moreno Sánchez, and H. Supér (2013). ‘Onset Time
of Binocular Rivalry and Duration of Inter-dominance Periods as Psychophysical Markers of ADHD’.
Perception 42: 16–27.
Bahrami, B., D. Carmel, V. Walsh, G. Rees, and N. Lavie (2008a). ‘Spatial Attention Can Modulate
Unconscious Orientation Processing’. Perception 37: 1520–1528.
Bahrami, B., D. Carmel, V. Walsh, G. Rees, and N. Lavie (2008b). ‘Unconscious Orientation Processing
Depends on Perceptual Load’. Journal of Vision 8 (3): 12.
Baker, D. H. and E. W. Graf (2009). ‘Natural Images Dominate in Binocular Rivalry’. Proceedings of the
National Academy of Sciences USA 106: 5436–5441.
Beck, D. M., N. Muggleton, V. Walsh, and N. Lavie (2006). ‘Right Parietal Cortex Plays a Critical Role in
Change Blindness’. Cerebral Cortex 16: 712–717.
Beck, D. M., G. Rees, C. D. Frith, and N. Lavie (2001). ‘Neural Correlates of Change Detection and Change
Blindness’. Nature Neuroscience 4: 645–650.
Bonneh, Y. S., A. Cooperman, and D. Sagi (2001). ‘Motion-induced Blindness in Normal Observers’.
Nature 411: 798–801.
Brancucci, A. and L. Tommasi (2011). ‘“Binaural Rivalry”: Dichotic Listening as a Tool for the Investigation
of the Neural Correlate of Consciousness’. Brain and Cognition 76: 218–224.
Breitmeyer, B. G. and H. Ogmen (2000). ‘Recent Models and Findings in Visual Backward
Masking: A Comparison, Review, and Update’. Perception and Psychophysics 62: 1572–1595.
Carmel, D., V. Walsh, N. Lavie, and G. Rees (2010). ‘Right Parietal TMS Shortens Dominance Durations in
Binocular Rivalry’. Current Biology 20: R799–R800.
Clifford, C. W. G. and J. A. Harris (2005). ‘Contextual Modulation outside of Awareness’. Current Biology
15: 574–578.
Cole, G. G., R. W. Kentridge, and C. A. Heywood (2004). ‘Visual Salience in the Change
Detection Paradigm: The Special Role of Object Onset’. Journal of Experimental Psychology: Human
Perception and Performance 30: 464–477.
Conci, M., E. Böbel, E. Matthias, I. Keller, H. J. Müller, et al. (2009). ‘Preattentive Surface and
Contour Grouping in Kanizsa Figures: Evidence from Parietal Extinction’. Neuropsychologia
47: 726–732.
Costello, P., Y. Jiang, B. Baartman, K. McGlennen, and S. He (2009). ‘Semantic and Subword Priming
during Binocular Suppression’. Consciousness and Cognition 18: 375–382.
De Gelder, B., J. S. Morris, and R. J. Dolan (2005). ‘Unconscious Fear Influences Emotional Awareness of
Faces and Voices’. Proceedings of the National Academy of Sciences USA 102: 18682–18687.
De Haas, B., R. Kanai, L. Jalkanen, and G. Rees (2012). ‘Grey Matter Volume in Early Human Visual
Cortex Predicts Proneness to the Sound-induced Flash Illusion’. Proceedings of the Royal Society
B: Biological Sciences 279: 4955–4961.
Dehaene, S., J.-P. Changeux, L. Naccache, J. Sackur, and C. Sergent (2006). ‘Conscious, Preconscious, and Subliminal Processing: A Testable Taxonomy’. Trends in Cognitive Sciences 10: 204–211.
de-Wit, L. H., J. Kubilius, J. Wagemans, and H. P. Op de Beeck (2012). ‘Bistable Gestalts Reduce Activity in
the Whole of V1, not just the Retinotopically Predicted Parts’. Journal of Vision 12 (11): 12.
Donner, T. H., D. Sagi, Y. S. Bonneh, and D. J. Heeger (2008). ‘Opposite Neural Signatures of
Motion-induced Blindness in Human Dorsal and Ventral Visual Cortex’. Journal of Neuroscience
28: 10298–10310.
Dougherty, R. F., V. M. Koch, A. A. Brewer, B. Fischer, J. Modersitzki, et al. (2003). ‘Visual Field
Representations and Locations of Visual Areas V1/2/3 in Human Visual Cortex’. Journal of Vision 3 (10): 1.
Draganski, B., C. Gaser, V. Busch, G. Schuierer, U. Bogdahn, et al. (2004). ‘Neuroplasticity: Changes in
Grey Matter Induced by Training’. Nature 427: 311–312.
Dumoulin, S. O. and R. F. Hess (2006). ‘Modulation of V1 Activity by Shape: Image-statistics or
Shape-based Perception?’ Journal of Neurophysiology 95: 3654–3664.
Einhäuser, W., K. A. C. Martin, and P. König (2004). ‘Are Switches in Perception of the Necker Cube
Related to Eye Position?’ European Journal of Neuroscience 20: 2811–2818.
Faivre, N., V. Berthet, and S. Kouider (2012). ‘Nonconscious Influences from Emotional
Faces: A Comparison of Visual Crowding, Masking, and Continuous Flash Suppression’. Frontiers in
Psychology 3: 129.
Falconbridge, M., A. Ware, and D. I. A. MacLeod (2010). ‘Imperceptibly Rapid Contrast Modulations
Processed in Cortex: Evidence from Psychophysics’. Journal of Vision 10 (8): 21.
Fang, F., H. Boyaci, D. Kersten, and S. O. Murray (2008). ‘Attention-dependent Representation of a Size
Illusion in Human V1’. Current Biology 18: 1707–1712.
Fang, F., D. Kersten, and S. O. Murray (2008). ‘Perceptual Grouping and Inverse fMRI Activity Patterns in
Human Visual Cortex’. Journal of Vision 8 (7): 2.
Field, D. J. (1987). ‘Relations between the Statistics of Natural Images and the Response Properties of
Cortical Cells’. Journal of the Optical Society of America A 4: 2379–2394.
Geisler, W. S., J. S. Perry, B. J. Super, and D. P. Gallogly (2001). ‘Edge Co-occurrence in Natural Images
Predicts Contour Grouping Performance’. Vision Research 41: 711–724.
Geisler, W. S. (2008). ‘Visual Perception and the Statistical Properties of Natural Scenes’. Annual Review of
Psychology 59: 167–192.
Gillam, B. and K. Nakayama (1999). ‘Quantitative Depth for a Phantom Surface Can Be Based on
Cyclopean Occlusion Cues Alone’. Vision Research 39: 109–112.
Gregory, R. L. (2008). ‘Emmert’s Law and the Moon Illusion’. Spatial Vision 21: 407–420.
Häkkinen, J. and G. Nyman (2001). ‘Phantom Surface Captures Stereopsis’. Vision Research 41: 187–199.
Harris, J. J., D. S. Schwarzkopf, C. Song, B. Bahrami, and G. Rees (2011). ‘Contextual Illusions Reveal the
Limit of Unconscious Visual Processing’. Psychological Science 22: 399–405.
Haynes, J.-D., J. Driver, and G. Rees (2005). ‘Visibility Reflects Dynamic Changes of Effective Connectivity
between V1 and Fusiform Cortex’. Neuron 46: 811–821.
Haynes, J.-D. and G. Rees (2005). ‘Predicting the Orientation of Invisible Stimuli from Activity in Human
Primary Visual Cortex’. Nature Neuroscience 8: 686–691.
He, D., D. Kersten, and F. Fang (2012). ‘Opposite Modulation of High- and Low-level Visual Aftereffects by
Perceptual Grouping’. Current Biology 22: 1040–1045.
Howe, C. Q. and D. Purves (2004). ‘Size Contrast and Assimilation Explained by the Statistics of Natural
Scene Geometry’. Journal of Cognitive Neuroscience 16: 90–102.
Howe, C. Q. and D. Purves (2005). ‘The Müller-Lyer Illusion Explained by the Statistics of Image-source
Relationships’. Proceedings of the National Academy of Sciences USA 102: 1234–1239.
Hugrass, L. and D. Crewther (2012). ‘Willpower and Conscious Percept: Volitional Switching in Binocular
Rivalry’. PLoS ONE. 7: e35963.
Hunt, J. J., J. B. Mattingley, and G. J. Goodhill (2012). ‘Randomly Oriented Edge Arrangements Dominate
Naturalistic Arrangements in Binocular Rivalry’. Vision Research 64: 49–55.
James W. (1890). The Principles of Psychology. New York: Holt.
Mattingley, J. B., G. Davis, and J. Driver (1997). ‘Preattentive Filling-in of Visual Surfaces in Parietal
Extinction’. Science 275: 671–674.
Meng, M. and F. Tong (2004). ‘Can Attention Selectively Bias Bistable Perception? Differences between
Binocular Rivalry and Ambiguous Figures’. Journal of Vision 4 (7): 2.
Miller, S. M., B. D. Gynther, K. R. Heslop, G. B. Liu, P. B. Mitchell, et al. (2003). ‘Slow Binocular Rivalry in
Bipolar Disorder’. Psychological Medicine 33: 683–692.
Miller, S. M., N. K. Hansell, T. T. Ngo, G. B. Liu, J. D. Pettigrew, et al. (2010). ‘Genetic Contribution to
Individual Variation in Binocular Rivalry Rate’. Proceedings of the National Academy of Sciences USA
107: 2664–2668.
Motoyoshi, I. and S. Hayakawa (2010). ‘Adaptation-induced Blindness to Sluggish Stimuli’. Journal of Vision
10 (2): 16.
Moutoussis, K. and S. Zeki (2002). ‘The Relationship between Cortical Activation and Perception
Investigated with Invisible Stimuli’. Proceedings of the National Academy of Sciences USA 99: 9527–9532.
Mudrik, L., A. Breska, D. Lamy, and L. Y. Deouell (2011). ‘Integration without Awareness: Expanding the
Limits of Unconscious Processing’. Psychological Science 22: 764–770.
Murray, S. O., D. Kersten, B. A. Olshausen, P. Schrater, and D. L. Woods (2002). ‘Shape Perception
Reduces Activity in Human Primary Visual Cortex’. Proceedings of the National Academy of Sciences
USA 99: 15164–15169.
Murray, S. O., H. Boyaci, and D. Kersten (2006). ‘The Representation of Perceived Angular Size in Human
Primary Visual Cortex’. Nature Neuroscience 9: 429–434.
Nagamine, M., A. Yoshino, M. Miyazaki, Y. Takahashi, and S. Nomura (2009). ‘Difference in Binocular
Rivalry Rate between Patients with Bipolar I and Bipolar II Disorders’. Bipolar Disorders 11: 539–546.
O’Craven, K. M., P. E. Downing, and N. Kanwisher (1999). ‘fMRI Evidence for Objects as the Units of
Attentional Selection’. Nature 401: 584–587.
Ooi, T. L. and Z. J. He (1999). ‘Binocular Rivalry and Visual Awareness: The Role of Attention’. Perception
28: 551–574.
Palmer, T. D. and A. K. Ramsey (2012). ‘The Function of Consciousness in Multisensory Integration’.
Cognition. 125: 353–364.
Pettigrew, J. D. and S. M. Miller (1998). ‘A “Sticky” Interhemispheric Switch In Bipolar Disorder?’
Proceedings of the Royal Society B: Biological Sciences 265: 2141–2148.
Ramachandran, V. S. and R. L. Gregory (1991). ‘Perceptual Filling In of Artificially Induced Scotomas in
Human Vision’. Nature 350: 699–702.
Rao, R. P. and D. H. Ballard (1999). ‘Predictive Coding in the Visual Cortex: A Functional Interpretation of
Some Extra-classical Receptive-field Effects’. Nature Neuroscience 2: 79–87.
Rees, G., G. Kreiman, and C. Koch (2002). ‘Neural Correlates of Consciousness in Humans’. Nature
Reviews Neuroscience 3: 261–270.
Roberts, B., M. G. Harris, and T. A. Yates (2005). ‘The Roles of Inducer Size and Distance in the
Ebbinghaus Illusion (Titchener Circles)’. Perception 34: 847–856.
Roelfsema, P. R., V. A. Lamme, and H. Spekreijse (1998). ‘Object-based Attention in the Primary Visual
Cortex of the Macaque Monkey’. Nature 395: 376–381.
Said, C. P., R. D. Egan, N. J. Minshew, M. Behrmann, and D. J. Heeger (2013). ‘Normal Binocular Rivalry
in Autism: Implications for the Excitation/Inhibition Imbalance Hypothesis’. Vision Research 77: 59–66.
Schölvinck, M. L. and G. Rees (2009). ‘Attentional Influences on the Dynamics of Motion-induced
Blindness’. Journal of Vision 9 (1): 38.
Schölvinck, M. L. and G. Rees (2010). ‘Neural Correlates of Motion-induced Blindness in the Human
Brain’. Journal of Cognitive Neuroscience 22: 1235–1243.
Schurger, A., F. Pereira, A. Treisman, and J. D. Cohen (2010). ‘Reproducibility Distinguishes Conscious
from Nonconscious Neural Representations’. Science 327: 97–99.
Schwarzkopf, D. S. and G. Rees (2010). ‘Interpreting Local Visual Features as a Global Shape Requires
Awareness’. Proceedings of the Royal Society B: Biological Sciences. http://rspb.royalsocietypublishing.org/
content/early/2010/12/04/rspb.2010.1909.
Schwarzkopf, D. S., J. Silvanto, S. Gilaie-Dotan, and G. Rees (2010). ‘Investigating Object Representations
during Change Detection in Human Extrastriate Cortex’. European Journal of Neuroscience
32: 1780–1787.
Schwarzkopf, D. S., C. Song, and G. Rees (2011). ‘The Surface Area of Human V1 Predicts the Subjective
Experience of Object Size’. Nature Neuroscience 14: 28–30.
Schwarzkopf, D. S. and G. Rees (2013). ‘Subjective Size Perception Depends on Central Visual Cortical
Magnification in Human V1’. PLoS ONE 8: e60550.
Shams, L., Y. Kamitani, and S. Shimojo (2000). ‘Illusions. What You See Is What You Hear’. Nature
408: 788.
Shannon, R. W., C. J. Patrick, Y. Jiang, E. Bernat, and S. He (2011). ‘Genes Contribute to the Switching
Dynamics of Bistable Perception’. Journal of Vision 11 (3): 8.
Silverman, M. E. and A. Mack (2006). ‘Change Blindness and Priming: When it Does and Does Not Occur’.
Consciousness and Cognition 15: 409–422.
Simoncelli, E. P. and B. A. Olshausen (2001). ‘Natural Image Statistics and Neural Representation’. Annual
Review of Neuroscience 24: 1193–1216.
Sobel, K. V. and R. Blake (2003). ‘Subjective Contours and Binocular Rivalry Suppression’. Vision Research
43: 1533–1540.
Song, C., D. S. Schwarzkopf, and G. Rees (2011). ‘Interocular Induction of Illusory Size Perception’. BMC
Neuroscience 12: 27.
Sperandio, I., P. A. Chouinard, and M. A. Goodale (2012). ‘Retinotopic Activity in V1 Reflects the
Perceived and not the Retinal Size of an Afterimage’. Nature Neuroscience 15: 540–542.
Sperandio, I., A. Lak, and M. A. Goodale (2012). ‘Afterimage Size is Modulated by Size-contrast Illusions’.
Journal of Vision 12 (2): 18.
Stanley, D. A. and N. Rubin (2003). ‘fMRI Activation in Response to Illusory Contours and Salient Regions
in the Human Lateral Occipital Complex’. Neuron 37: 323–331.
Sterzer, P., J.-D. Haynes, and G. Rees (2008). ‘Fine-scale Activity Patterns in High-level Visual Areas
Encode the Category of Invisible Objects’. Journal of Vision 8 (15): 10.
Stewart, L. H., S. Ajina, S. Getov, B. Bahrami, A. Todorov, et al. (2012). ‘Unconscious Evaluation of Faces
on Social Dimensions’. Journal of Experimental Psychology: General 141: 715–727.
Tse, P. U., S. Martinez-Conde, A. A. Schlegel, and S. L. Macknik (2005). ‘Visibility, Visual Awareness, and
Visual Masking of Simple Unattended Targets are Confined to Areas in the Occipital Cortex beyond
Human V1/V2’. Proceedings of the National Academy of Sciences USA 102: 17178–17183.
Tsuchiya, N. and C. Koch (2005). ‘Continuous Flash Suppression Reduces Negative Afterimages’. Nature
Neuroscience 8: 1096–1101.
Turatto, M., M. Sandrini, and C. Miniussi (2004). ‘The Role of the Right Dorsolateral Prefrontal Cortex in
Visual Change Awareness’. NeuroReport 15: 2549–2552.
van Ee, R., J. J. A. van Boxtel, A. L. Parker, and D. Alais (2009). ‘Multisensory Congruency as a Mechanism
for Attentional Control over Perceptual Selection’. Journal of Neuroscience 29: 11641–11649.
van Loon, A. M., T. Knapen, H. S. Scholte, E. St John-Saaltink, T. H. Donner, et al. (2013). ‘GABA Shapes
the Dynamics of Bistable Perception’. Current Biology 23: 823–827.
Vuilleumier, P., J. L. Armony, J. Driver, and R. J. Dolan (2003). ‘Distinct Spatial Frequency Sensitivities for
Processing Faces and Emotional Expressions’. Nature Neuroscience 6: 624–631.
Wandell, B. A., S. O. Dumoulin, and A. A. Brewer (2007). ‘Visual Field Maps in Human Cortex’. Neuron
56: 366–383.
Wang, L., X. Weng, and S. He (2012). ‘Perceptual Grouping without Awareness: Superiority of Kanizsa
Triangle in Breaking Interocular Suppression’. PLoS ONE 7: e40106.
Wannig, A., L. Stanisor, and P. R. Roelfsema (2011). ‘Automatic Spread of Attentional Response
Modulation along Gestalt Criteria in Primary Visual Cortex’. Nature Neuroscience 14: 1243–1244.
Watkins, S., L. Shams, O. Josephs, and G. Rees (2007). ‘Activity in Human V1 Follows Multisensory
Perception’. Neuroimage 37: 572–578.
Weil, R. S., J. M. Kilner, J. D. Haynes, and G. Rees (2007). ‘Neural Correlates of Perceptual Filling-in of an
Artificial Scotoma in Humans’. Proceedings of the National Academy of Sciences USA 104: 5211–5216.
Weil, R. S., S. Watkins, and G. Rees (2008). ‘Neural Correlates of Perceptual Completion of an Artificial
Scotoma in Human Visual Cortex Measured Using Functional MRI’. Neuroimage 42: 1519–1528.
Williams, M. A., A. P. Morris, F. McGlone, D. F. Abbott, and J. B. Mattingley (2004). ‘Amygdala Responses
to Fearful and Happy Facial Expressions under Conditions of Binocular Suppression’. Journal of
Neuroscience 24: 2898–2904.
Winston, J. S., P. Vuilleumier, and R. J. Dolan (2003). ‘Effects of Low-spatial Frequency Components of
Fearful Faces on Fusiform Cortex Activity’. Current Biology 13: 1824–1829.
Wismeijer, D. A., R. van Ee, and C. J. Erkelens (2008). ‘Depth Cues, rather than Perceived Depth, Govern
Vergence’. Experimental Brain Research 184: 61–70.
Wismeijer, D. A., C. J. Erkelens, R. van Ee, and M. Wexler (2010). ‘Depth Cue Combination in
Spontaneous Eye Movements’. Journal of Vision 10 (6): 25.
Wokke, M. E., A. R. E. Vandenbroucke, H. S. Scholte, and V. A. F. Lamme (2013). ‘Confuse your
Illusion: Feedback to Early Visual Cortex Contributes to Perceptual Completion’. Psychological Science
24: 63–71.
Wolfe, J. M. (1984). ‘Reversing Ocular Dominance and Suppression in a Single Flash’. Vision Research
24: 471–478.
Yang, E., D. H. Zald, and R. Blake (2007). ‘Fearful Expressions Gain Preferential Access to Awareness
during Continuous Flash Suppression’. Emotion 7: 882–886.
Yang, E. and R. Blake (2012). ‘Deconstructing Continuous Flash Suppression’. Journal of Vision 12 (3): 8.
Yeh, Y.-Y. and C.-T. Yang (2009). ‘Is a Pre-change Object Representation Weakened under Correct
Detection of a Change?’ Consciousness and Cognition 18: 91–102.
Zaretskaya, N., S. Anstis, and A. Bartels (2013). ‘Parietal Cortex Mediates Conscious Perception of Illusory
Gestalt’. Journal of Neuroscience 33: 523–531.
Zaretskaya, N., A. Thielscher, N. K. Logothetis, and A. Bartels (2010). ‘Disrupting Parietal Function
Prolongs Dominance Durations in Binocular Rivalry’. Current Biology 20: 2106–2111.
Zhaoping, L. (2008). ‘Attention Capture by Eye of Origin Singletons even without Awareness: A Hallmark
of a Bottom-up Saliency Map in the Primary Visual Cortex’. Journal of Vision 8: 1.1–1.18.
Zhou, W. and D. Chen (2009). ‘Binaral Rivalry between the Nostrils and in the Cortex’. Current Biology
19: 1561–1565.
Zimba, L. D. and R. Blake (1983). ‘Binocular Rivalry and Semantic Processing: Out of Sight, Out of Mind’.
Journal of Experimental Psychology: Human Perception and Performance 9: 807–815.
Chapter 40
The Temporal Organization of Perception
Visual perception textbooks and handbooks customarily do not include sections devoted to the
topic of time perception (the exception is van de Grind, Grusser, and Lunkenheimer 1973). But
this may soon change, with this chapter a sign of the times. In journals, the literature on temporal factors has grown very rapidly, and reviews of time perception have proliferated
(Vroomen and Keetels 2010; Holcombe 2009; Wittmann 2011; Eagleman 2010; Grondin 2010;
Nishida and Johnston 2010; Spence and Parise 2010). In an attempt to restrict this review to fun-
damental issues, only simple judgments of temporal order will be considered. The rapidly growing
literature on duration judgments will not be discussed.
Interpreting experimental results requires assumptions. For temporal experience, it is tempting to think of it as forming a single timeline, with all sensations mapped to points
or extents on that timeline. This assumption is often implicit in the literature, together with another
assumption to allow for the experience of simultaneity: that sensations closer than a certain inter-
val, the duration of the ‘simultaneity window’, are perceived as simultaneous (Meredith et al. 1987).
Yet it is far from clear whether experience comprises a single ordered timeline. This chapter
will question this assumption and ultimately suggest that our experience is frequently the product
of organizational processes whose purpose is not to create an ordered timeline. Rather, simpler
grouping and segmentation processes can be more important, with ordering sometimes only a
byproduct or not occurring at all.
Similar matters have arisen in the study of spatial perception. Marr (1982) suggested that the
visual system delivered a representation of the ordered 3-D layout of all the objects and surfaces in
a scene. This is similar to the ordered timeline view of temporal experience. The evidence suggests
that visual representation may be more impoverished than what Marr envisioned (Koenderink,
Richards, and van Doorn 2012) but in the spatial domain can still provide ordered and metric
depth relations (van Doorn et al. 2011). Whether our timeline of experience achieves that level of
organization, a consistent ordering, remains unclear.
One alternative to a well-ordered timeline is that we sometimes experience objects and quali-
ties with undefined temporal relationships. That is, there may be some percepts for which we do
not have an experience of before or after, and where the explanation for this failure is not simply
that the two stimuli fall within the simultaneity window. A possible example is provided in the
animations showcased at http://www.psych.usyd.edu.au/staff/alexh/research/colorMotionSimple.
In those animations, a field of dots alternates between leftward motion and rightward motion.
In synchrony with the motion direction alternation, the dots’ colour alternates between red and
green. Yet at alternation rates above about six times per second, one is unable to judge the pairing
of motion and colour, for example whether the leftward motion is paired with red or with green
The Temporal Organization of Perception 821
(Arnold 2005; Holcombe and Clifford 2012). This rate is nonetheless slow enough that the successive colours and motions should not fall inside the same simultaneity window (Wittmann 2011).
A potentially related phenomenon was reported by William James in 1890. In Chapter 15 of his
Principles of Psychology, James claimed that
When many impressions follow in excessively rapid succession in time, although we may be distinctly
aware that they occupy some duration, and are not simultaneous, we may be quite at a loss to tell which
comes first and which last. (p. 610)
Unfortunately, James provided no examples, so we do not know to what he was referring. More
detailed descriptions of dissociations of temporal order judgments and asynchrony judgments
have been provided by Jaśkowski and others (Jaśkowski 1991; Allan 1975); however, these may be explainable by differences of a few tens of milliseconds in the decision criteria for the two tasks.
A temporal order deficit that seems less likely to be explained by decision criteria differences was
reported by Holcombe, Kanwisher, and Treisman (2001), and can be experienced here: http://
www.psych.usyd.edu.au/staff/alexh/research/MOD/demo.html. When four letters are presented
serially, each for about 200 ms, and the sequence repeats, observers are typically unable to report
their order. Yet if the sequence is presented just once, the order of the items is easily perceived (for
a possible auditory analogue, see Warren et al. 1969).
What are the implications of this phenomenon for the nature of temporal experience? It may
mean that temporal experience is less organized than spatial experience. Ordering seems more
integral to our representations of space, which benefit from the retinotopic organization of vis-
ual cortices. The positions of items on the retina are readily available thanks to this topography
(although determining their locations in external space is another matter, requiring more myste-
rious mechanisms). This organization also affords parallel processing of a large range of locations.
Orientation and boundary processing as well as local motion processing occur at many locations
simultaneously, providing some spatial relationships preattentively and continuously (e.g. Levi
1996; Forte, Hogben, and Ross 1999). At a larger scale, perception of certain global forms is based
on massively parallel processing (Clifford, Holcombe, and Pearson 2004), which may also be true
of perceiving the location of the centroid of a large array (Alvarez 2011).
The visual brain has retinotopy but does not seem to have chronotopy. That is, no brain area
seems to include an array of neurons that systematically respond to different times, arranged
in temporal order. A possible exception is neurons selective for temporal rank order in
movement-related areas of cortex (Berdyyeva and Olson 2010), but as far as we know these are
not involved in time perception. Our knowledge of the relative times of stimuli surely suffers for
lack of a chronotopic representation. Not only does the lack of chronotopy suggest the absence
of a readily available ordered temporal array, it may also mean less parallel processing of dis-
tinct times than of distinct locations. It is difficult to imagine that the brain gets by without any
parallel temporal processing, and without any sort of temporally structured buffer. Smithson
and Mollon (2006) and Smith et al. (2011) have provided some evidence for a temporally struc-
tured buffer in vision, but overall temporal processing seems less pre-organized than spatial
processing.
Retinotopy (or chronotopy) is not a full solution to the problem of perceiving spatial (or tem-
poral) relationships, even ignoring the complication of movements of the eyes and body. There
are aspects of spatial perception that are not achieved by specialized parallel processing, and those
solutions might also be used in temporal processing.
Two recent pieces of research suggest that some spatial relationships become available via serial,
one-by-one processing, through shifts of attention (Holcombe, Linares, and Vaziri-Pashkam
822 Holcombe
2011; Franconeri et al. 2011). With a moving spatial array, the Holcombe et al. (2011) study docu-
mented an inability to apprehend the spatial order of the items in the array when the items moved
faster than the speed limit on attentional tracking. This, together with a telling pattern of errors,
indicated that a time-consuming shift of spatial attention was necessary to determine the spatial
relationships among the stimuli. Converging evidence from Franconeri et al. (2011) suggests that
shifts of spatial attention are also involved in perceiving spatial relationships among static stimuli.
Attention may serve to select stimuli of interest for the limited-capacity processing that deter-
mines temporal and spatial relations.
Some aspects of the rich spatial layout we enjoy are thus a result of accumulated represen-
tations from multiple shifts of attention (see Cavanagh et al. 2010 for related ideas). In this
dependence on serial processing, spatial experience may be similar to temporal experience.
But even these attention-mediated aspects of spatial perception seem to capitalize on the par-
allel processing advantage of retinotopy. Shifting attention involves moving from activating
one set of location-labelled neurons to another set of location-labelled neurons (assuming
local sign has been set during the development of the organism; Lotze 1881). This may help
to calculate the vector of the attention shift, which then indicates the relative location of the
two regions.
Although it is limited by the absence of chronotopy, temporal processing does reap some ben-
efits from retinotopy. Thanks to retinotopy, motion detectors can operate in parallel across the
visual field. The motion direction they compute indicates the temporal order of stimuli.
It has also been suggested that retinotopy allows the visual system to compute in parallel
whether stimuli across the visual field change together (in synchrony) or not. Some investigators
suggested that this occurs not just for the luminance transients known to engage the motion
system, but also for direction and contrast changes (Usher and Donnelly 1998; Lee and Blake 1999).
Follow-up work, however, supported alternative explanations (Dakin and Bex 2002; Beaudot
2002; Farid and Adelson 2001; Farid 2002). The issue remains unsettled, but the continued
absence of good evidence for parallel temporal processing feeds the suspicion that perception of
relative timing is serial and possibly attention-mediated. Temporal processing may be restricted
to what can be processed serially in the short interval before it disappears from our sensory
buffer.
In some ways even better than chronotopy would be time-stamping of all stimuli. The time stamp might be provided by a dedicated internal clock comprising a pacemaker and counter (Treisman 1963; Ivry and Schlerf 2008) or a neural network with intrinsic
dynamics and an internal model of the network that translates the network state into the cur-
rent time (Karmarkar and Buonomano 2007). With time-stamping, relative timing of two events
is judged by simply comparing the time-stamps of the two events, just as is done by desktop
computers with files on a hard drive. If this were automatic and preattentive, then we might
have better-organized temporal experience than spatial experience. But there is little or no evi-
dence for extensive time-stamping. Instead the system may rely on less reliable information, like
the relative activation of different stimulus types. Because activation in cortex and presumably
short-term memory typically decreases over time, the most active item is likely to be the last one
presented, the second most active the item presented before, etc. This ‘recency’ scheme is sub-
ject to distortion as other factors like attention can affect which item is most active (Reeves and
Sperling 1986). The use of relative activation might also be thwarted with repeating displays that
result in saturation of the activation of multiple items.
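The recency scheme just described can be caricatured in a few lines of Python (a toy sketch with made-up decay parameters, not a model from the chapter): activation decays exponentially with time since presentation, ranking items by activation recovers their order after a single pass, and a ceiling on activation shows why a repeating display defeats the cue.

```python
import math

def infer_order_by_recency(presentation_times, probe_time, decay_rate=1.0):
    """Toy recency model: activation decays exponentially with time since
    presentation, so ranking items from least to most active recovers
    their presentation order."""
    activation = {item: math.exp(-decay_rate * (probe_time - t))
                  for item, t in presentation_times.items()}
    return sorted(activation, key=activation.get)

def activation_with_repeats(times, probe_time, decay_rate=1.0, ceiling=1.0):
    """With a repeating display, summed activation hits a ceiling,
    erasing the relative-activation cue to order."""
    raw = sum(math.exp(-decay_rate * (probe_time - t)) for t in times)
    return min(raw, ceiling)

# A single pass through the sequence: order is recoverable.
once = {'A': 0.0, 'B': 0.2, 'C': 0.4, 'D': 0.6}
print(infer_order_by_recency(once, probe_time=0.8))  # ['A', 'B', 'C', 'D']

# A cycling sequence: every item saturates to the ceiling, so all items
# look equally 'recent' and order is lost.
a_times = [0.0, 0.8, 1.6, 2.4, 3.2]
b_times = [0.2, 1.0, 1.8, 2.6, 3.4]
print(activation_with_repeats(a_times, 3.5),
      activation_with_repeats(b_times, 3.5))  # 1.0 1.0
```

The saturation case mirrors the repeating four-letter display: once every item is at ceiling, relative activation carries no order information.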
An earlier paragraph described the alternating-motion display for which one cannot deter-
mine which colour goes with which motion direction (http://www.psych.usyd.edu.au/staff/alexh/
research/colorMotionSimple). The repetition of this display may saturate the activation levels of the colours and motions in memory, preventing the use of relative activation levels to pair
the features. Another reason feature pairing may be difficult here is that pairing ordinarily
involves using salient temporal transients to temporally segment the dynamic scene (Holcombe
and Cavanagh 2008; Nishida and Johnston 2010; Nishida and Johnston 2002). The unusual unin-
terrupted motion of the alternating-motion display results in continual transients that swamp
registration of the transient associated with the colour change, and without other cues to rapidly
guide attention to the transients of interest (Holcombe and Cavanagh 2008), temporal experience
of the colour and motion remains poorly organized.
Only when the rate is slow can attention select an individual phase of the cycle, and that selec-
tion returns two features, indicating they occurred at the same time (Holcombe and Cavanagh
2008). This is like spatial visual search, for which Treisman and Gelade (1980) suggested that
attentional mediation is required to perceive that a colour and shape originate from the same
location. For time, strong luminance transients serve to engage the selective mechanism (perhaps
attention, or a ‘when’ pathway) that can make temporal relations explicit.
Thus determination of temporal order and simultaneity is best when just two punctate, discrete
events with strong transients are presented. In the remainder of this chapter we will set aside the
segmentation and processing capacity problems created by complex scenes. For the ideal situation
of two stimuli, we will examine how sophisticated visual temporal processing can be.
There is an important basic theoretical distinction between the time a percept is created and the time at which the observer experiences the event to have taken place. The analogous distinction
in spatial perception is uncontroversial, with the phrase ‘where an object is perceived’ taken to
mean ‘where an object is perceived to be’ rather than where in the brain the percept is created.
Yet if time is substituted for space and we write ‘when an object is perceived’, this will be inter-
preted by many as the time the percept was created rather than the time the percept refers to.
This is the issue of brain time versus event time: whether the brain processes events such that the time at which a percept arises differs from the time the event is experienced as having occurred (Dennett and Kinsbourne 1992).
Event time advocates have affirmed the distinction and moreover claimed that the system rou-
tinely considers the time of sensory signals together with other cues to infer the time of the cor-
responding stimuli in the external world. But this conclusion may be premature.
The alternative to brain time theory is that some property of signals other than when they arrive
affects when the associated events are perceived to have taken place. The brain may have adaptive
processes that result in perceived timing being closer to veridical than it would be otherwise.
But some question this supposition, among them Moutoussis, who writes that ‘the idea of the
perception of the time of a percept being different to the time that the actual percept is being
perceived, seems quite awkward’ (Moutoussis 2012: 4).
To other thinkers (e.g. Dennett and Kinsbourne 1992), this would be no more peculiar than
spatial illusions, wherein the perceived location of an object is dissociated from its retinal location
(e.g. Roelofs 1935; De Valois and De Valois 1991). Time perception may be as much a construc-
tive, interpretational process as is space perception. But to date, the evidence is that time percep-
tion does not adaptively take into account various cues to correct timing as comprehensively as
spatial perception uses spatial cues.
The perceptual correlate of the intensity-related neural delay also manifests in motion signal
processing. Roufs (1963) and Arden and Weale (1954) presented two flashes simultaneously and
side by side on a dark background. When one flash was brighter than the other, motion was per-
ceived from the brighter flash to the dimmer flash. Stromeyer and Martini (2003) documented a
similar effect for two gratings differing in contrast rather than luminance. Motion was perceived
in the direction from the higher-contrast grating to the lower-contrast grating, consistent with
physiological evidence for latency decreasing with contrast as well as with luminance (Shapley and
Victor 1978; Benardete and Kaplan 1999). A number of other motion illusions are also consistent
with the effect of luminance or contrast on latency (Purushothaman et al. 1998; Ogmen et al. 2004;
Lappe and Krekelberg 1998; White, Linares, and Holcombe 2008; Kitaoka and Ashida 2007).
An apparent concordance of physiological latency and percepts is also observed for stimuli
darker than the background vs stimuli brighter than the background. ON-centre ganglion cells
in primate retina respond ~5 ms faster than OFF-centre cells. Correspondingly, psychophysical
motion nulling experiments in humans indicate that dark dots have a processing latency about 3 ms shorter than that of bright dots (Del Viva, Gori, and Burr 2006).
Together these illusions indicate that brain time rules when it comes to neural latency differ-
ences caused by variations in luminance or contrast. Unfortunately we cannot exclude the pos-
sibility that the brain engages in partial compensation for the latency difference while consistently
falling short of full compensation. But the sizes of the effects are similar in human perceptual studies and in the latencies of physiological responses in nonhuman animals (Maunsell et al. 1999; Oram et al. 2002), so any neural compensation for latency differences must be woefully incomplete.
To explain these phenomena, defenders of the event time hypothesis may argue that they are
an exception, perhaps because these luminance-related latency differences are unimportant to the
organism. But this argument is less than compelling, as explained in the next section.
the perception→action mapping. That is, the error signal may not propagate to the deeper (sensa-
tion and perception) layers of the system because they are farther from the teaching feedback.
small for most events, during storms we sometimes experience a very large timing difference. A dis-
tant thunderclap is heard a few seconds after the light from the physically simultaneous lightning
bolt. Because we do not perceive distant thunder and lightning as simultaneous, clearly our brain
does not reconstruct the simultaneity of these events. This is unsurprising even for advocates of event
time reconstruction, because the nature of the event and its distance is not easily perceived. For much
closer events, however, from a few centimetres to a few dozen metres away, some have suggested that
neural processing does result in perceiving an associated sound and light as simultaneous.
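The magnitudes at issue are easy to compute (a back-of-the-envelope sketch; the figure of 343 m/s assumes air at about 20°C):

```python
SPEED_OF_SOUND = 343.0   # m/s in air at ~20 C
SPEED_OF_LIGHT = 3.0e8   # m/s; light's travel delay is negligible here

def sound_lag_ms(distance_m):
    """How much later the sound of an event arrives than its light."""
    return (distance_m / SPEED_OF_SOUND - distance_m / SPEED_OF_LIGHT) * 1000.0

for d in (1, 10, 30, 1000):
    print(f"{d:>5} m: {sound_lag_ms(d):7.1f} ms")
```

At a few metres the lag sits within typical simultaneity windows, whereas at the kilometre distances of a thunderstorm it reaches seconds, far beyond anything a compensation mechanism could plausibly absorb.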
Studies of the issue have generally presented a light and a sound at different distances and
different relative timings. According to the event time hypothesis, the point of subjective simul-
taneity for the sound and the light should shift with greater object distance. That is, for greater
object distances, larger sound delays should be considered simultaneous. However, different stud-
ies have yielded very different results. Keetels and Vroomen (2012) and Vroomen and Keetels
(2010) provide good reviews of the subject and consider various explanations for the discrepancy
between those that favour the hypothesis (Sugita and Suzuki 2003; Alais and Carlile 2005; Engel
and Dougherty 1971; Kopinska and Harris 2004) and those that do not (Arnold, Johnston, and
Nishida 2005; Heron et al. 2007; Lewald and Guski 2003; Stone et al. 2001). The issue is complex. First, negative findings can be blamed on the experimenters presenting the visual and auditory information in such a way that the observer perceives the distance to the sound inaccurately. Second, whether trials with different times and distances were blocked or mixed
can change the adaptation state of the observer, and as this can shift the simultaneity point (as
described below), it might explain some of the findings supporting latency compensation.
when presented with simultaneous stimulation to ankle and forehead, tended to report that the
forehead was stimulated first. More specifically, in those five participants the ankle had to be
touched 23 to 30 ms earlier than the forehead for the best chance of perceived simultaneity. In the
sixth observer, Klemm instead found evidence for simultaneity constancy, with the point of subjective
simultaneity being true physical simultaneity. It is hard to know what to conclude, and indeed
Klemm himself expressed some frustration. Klemm also noted that even when participants per-
formed the temporal task without a problem, some continued to report that, as described in the
previous paragraph, it felt artificial to categorize temporal order.
Halliday and Mingay (1964) performed a similar study, but unfortunately with only two partici-
pants. For both participants, Halliday and Mingay concluded that touches of more distal body parts
(toe vs index finger, in their case) were perceived to have occurred later. Harrar and Harris (2005)
followed with more experiments that yielded the same result, using temporal order judgments to
infer the time difference for subjective simultaneity. Quantitatively, pooling the data across their
six participants, they reported that the difference in perceived timing was approximately that pre-
dicted by the differences in simple reaction time to the body parts involved. Unfortunately, they
did not assess whether some participants were different from others, so we do not know if there
was the significant variation between participants that Klemm found. Bergenheim et al. (1996)
also investigated the issue, and like the others found evidence that stimulation of the more distal
body parts was perceived later than more proximal areas. However, Bergenheim et al. suggested
that the discrepancy they found between foot and arm (12 ms) was not as large as it should be for
the difference in conduction latency indicated by physiological studies.
In summary, all researchers found that on average, stimulation of distal areas of the skin was
perceived as occurring earlier in time than stimulation of more proximal areas. If there is any
compensation at all, it appears that the proportion of latency difference compensated for is small,
or the proportion of people who compensate for latency is small. Settling the issue will require
more studies of this topic using modern physiological methods, larger numbers of participants,
and enough data per participant to assess simultaneity constancy in each participant.
To evaluate whether the times at which signals are perceived reflect compensation for signal
processing latencies, we have reviewed the effects on perceptual latency of luminance, originat-
ing modality, the speed of sound, and the length of tactile fibers. The support in the literature for
adaptive compensation in these instances ranges from none to mixed.
Yet one class of studies provides strong evidence for limited compensation. These are the stud-
ies of adaptation to asynchrony. The phenomenon involved suggests a path to understanding the
imperfect and limited processing that can compensate for differential latency.
Machulla, and Ernst 2009). Compensation for a particular asynchrony has also been observed for
the temporal delay between actions and their sensory consequences (Cunningham, Billock, and
Tsou 2001; Stetson et al. 2006), and these shifts do not seem to be caused by shifting the physical
time of stimulus-evoked neural signals (Roach et al. 2010).
Not only do these results constitute evidence for event time reconstruction rather than reliance
on brain time, but they also indicate how latency differences might be known, through learning.
The rationale for these shifts may stem from the statistics of the natural environment, where the
distribution of the relative timing of stimulation by external events is likely to be centred on or
near zero (simultaneity). Processes that compensate for any consistent departure of the average from zero may therefore cause the adaptation effects.
These adaptation effects are analogous to after-effects for other aspects of perception such as
motion and orientation. Accordingly, to explain these effects researchers typically invoke similar
neural mechanisms as those that have been proposed to explain traditional adaptation effects.
Specifically, a typical suggestion is that neurons in the brain are selective for the adapted feature,
and that adaptation of these neurons causes the after-effect. In the case of the intersensory timing
shifts, both Roach et al. (2010) and Cai, Stetson, and Eagleman (2012) suggest that the responsi-
ble neurons are multimodal neurons tuned to different asynchronies between the modalities. In
the cat, there are indeed multimodal neurons that prefer different asynchronies (Meredith et al.
1987) and these also appear to exist in rhesus monkeys (Wallace, Wilkinson, and Stein 2012).
The relative timing perceived may reflect the differing activity of these neurons. Adaptation shifts
this activity difference in a manner that compensates for the asynchrony (Roach et al. 2010; Cai, Stetson, and Eagleman 2012).
extend the logic of explaining simple asynchrony adaptation with multimodal neurons by posit-
ing neurons that are jointly selective for actor and audiovisual timing. But this might lead to a
combinatorial explosion of neurons, as the contingency on ‘actor’ is unlikely to be the only pos-
sible contingency. A range of neurons would be needed for each kind of contingency. A process
with more flexibility should be considered.
The processing that shifts decision criteria may fit the bill of a suitably flexible process that can
accommodate different contingencies. In signal detection theory, the criterion is a threshold level
of the internal signal that the observer uses to decide which response to make. In the context of
a simultaneity judgment the relevant signal may be something like the difference in the internal
timing of the auditory response and the visual response. This signal is assumed to have a Gaussian
distribution. As the timing difference is signed (indicating whether auditory was before vs after
visual), two criteria may be involved: one for the positive side of the distribution (discriminating
simultaneous from auditory after visual) and one for the negative side (discriminating simultane-
ous from visual after auditory). See Yarrow et al. (2011) for discussion.
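This two-criterion scheme can be sketched as follows (a minimal illustration with arbitrary parameter values, not a fitted model): the internal signal is the signed auditory-minus-visual timing difference plus Gaussian noise, and the two criteria partition it into three responses.

```python
import random

def judge(signed_diff_ms, rng, lo=-40.0, hi=40.0, noise_sd=20.0):
    """Three-way simultaneity judgment on a noisy internal signal.

    signed_diff_ms is auditory-minus-visual timing; negative means the
    auditory stimulus led. The criteria lo and hi carve the signal axis
    into 'auditory first' / 'simultaneous' / 'visual first'.
    """
    internal = signed_diff_ms + rng.gauss(0.0, noise_sd)
    if internal < lo:
        return 'auditory first'
    if internal > hi:
        return 'visual first'
    return 'simultaneous'

rng = random.Random(1)
# Physically simultaneous stimuli mostly fall between the two criteria.
responses = [judge(0.0, rng) for _ in range(1000)]
print(responses.count('simultaneous'))  # roughly 950 of 1000
```

On this picture, asynchrony adaptation could amount to nothing more than translating lo and hi along the signal axis.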
Shifts of these decision criteria result in shifts in points of subjective simultaneity, from which
perceived timing is inferred. Repeated exposure to a particular asynchrony might cause the sys-
tem to shift the decision criteria in the direction of compensation. This account is in a different
spirit than those involving adaptation of a population of asynchrony-tuned neurons (Roach et al.
2009; Cai, Stetson, and Eagleman 2012). Among psychophysicists, criterion shifts are often con-
sidered uninteresting. The notion seems to be that a criterion shift is more likely to be caused by
observers taking a different attitude towards their percepts rather than perception itself changing.
In contrast, the asynchrony-tuned neuron account is firmly a theory of change of percepts, from
a shift in underlying neural populations. Fortunately, there is some hope of distinguishing these
accounts by experiment, although this has not yet been done. The asynchrony-tuned neuron code
account appears to predict that sensitivity will change, not just criterion.
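That dissociation can be made concrete with the standard psychometric-function formalism (a sketch with arbitrary parameters): shifting a criterion translates the function along the asynchrony axis, moving the point of subjective simultaneity, while leaving its slope, and hence sensitivity, unchanged; only a change in internal noise would alter the slope.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_report_visual_first(diff_ms, criterion_ms, noise_sd):
    """Probability that the noisy internal signal (mean diff_ms,
    sd noise_sd) exceeds the decision criterion."""
    return 1.0 - phi((criterion_ms - diff_ms) / noise_sd)

# A criterion shift moves the 50% point (the PSS) by the same amount...
assert p_report_visual_first(0.0, 0.0, 20.0) == 0.5
assert p_report_visual_first(30.0, 30.0, 20.0) == 0.5

# ...but leaves the slope (sensitivity) untouched: the rise over a fixed
# interval around the PSS is identical before and after the shift.
before = (p_report_visual_first(20.0, 0.0, 20.0)
          - p_report_visual_first(-20.0, 0.0, 20.0))
after = (p_report_visual_first(50.0, 30.0, 20.0)
         - p_report_visual_first(10.0, 30.0, 20.0))
print(abs(before - after) < 1e-12)  # True
```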
The evidence in the literature appears consistent with a shift in criteria (Fujisaki et al. 2004;
Vroomen et al. 2004; Yarrow et al. 2011; Hanson, Heron, and Whitaker 2008). Certainly, no one
has demonstrated that their result could not be explained by a shift in criteria or greater variability
in criteria (Roach et al. 2010; Yarrow et al. 2011).
But one should not dismiss lack of evidence for sensitivity change as implying that percepts
did not change. As Michael Morgan and colleagues have pointed out, even some indisputably
perceptual effects, like the motion after-effect, may be caused by criterion shifts (or ‘subtractive
adaptation’) rather than sensitivity changes (Morgan, Chubb, and Solomon 2011; Morgan and
Glennerster 1991; Morgan, Hole, and Glennerster 1990).
Thus an after-effect that manifests only as a criterion shift is not necessarily non-perceptual. To
get a fuller view of what needs to be explained, future investigations should document the scope
of contingencies adapted to. Perhaps, given an appropriate task and stimulus exposure protocol,
timing shifts could be accomplished for completely arbitrary stimulus pairings, with one pair of
criteria for pictures of Jennifer Aniston, another for pictures of pink koalas, and another for a
person whose face you didn’t encounter until the experiment began. For the brain to accomplish
such a feat, some process has to store these criteria and trot them out for the appropriate tasks and
stimuli. This topic is rarely discussed in the adaptation literature, but raises interesting issues that
may be widespread in the study of human cognition and learning.
While the Roseboom and Arnold (2011) result may herald an explosion of contingent timing
shifts, this may be restricted to situations of high temporal uncertainty regarding the time of sen-
sory signals. For rather than using a simple tone and flash as had been used in previous studies,
Roseboom and Arnold (2011) presented extended, time-varying video and auditory stimuli. The
video clip involved facial movements of the actor that extended over what appears to be (from
the supplementary clip provided in the paper) several hundred milliseconds, and the duration
of the auditory syllable signal was probably also at least a few hundred milliseconds. Both were
complex stimuli with multiple features occurring over their time-course, with differing durations
and without unambiguous discrete onsets and offsets. In such a situation, to determine whether
the stimuli were simultaneous, one must identify which stimulus features should go together.
The adaptation process may then be one of associating particular features of the extended video
signal that occur at certain times with particular features of the auditory train. This might be the
explanation of the results—after repeated experience hearing a particular part of the auditory
train presented simultaneously with a particular lip movement, one may learn that is the way that
particular speaker talks. Deviations from that learned timing for simultaneity are then perceived,
correctly, as temporally shifted from that speaker’s usual timing. This may thus be a criterion shift,
and one that does not generalize to cases where the auditory-visual matching is unambiguous.
The interpretation that the contingent asynchrony adaptation found by Roseboom and Arnold (2011) will not generalize to unambiguous audiovisual correspondence situations gets some support from the results of Heron et al. (2012). Like Roseboom and Arnold (2011), Heron et al. (2012)
tested whether intersensory asynchrony adaptation could be contingent on the identity of the stim-
ulus. Instead of using different actors paired with their respective voices, they used high spatial
frequency gratings with high-pitched tones and low spatial frequency gratings with low-pitched
tones. Other researchers have shown that observers tend to spontaneously associate these values
(Evans and Treisman 2010; Spence 2011), suggesting they are not entirely unnatural associations.
Yet unlike Roseboom and Arnold (2011), these authors found that the asynchrony adaptation did
not ‘stick’ to the identity of the stimulus, but was instead tied to the spatial location. Thus they
demonstrated adaptation to opposite asynchronies (visual before auditory and visual after audi-
tory) tied to distinct locations. This is compatible with mediation by a brain area like the superior
colliculus that is retinotopically organized and has neurons tuned to audiovisual asynchronies. The
accounts based on a population of neurons tuned to various asynchronies therefore remain viable.
We have considered whether the brain sets the perceived timing of sensory signals to com-
pensate for learned or imputed sensory latencies. In a limited way it does, but the scope of the
phenomenon and nature of the underlying processing remains obscure.
Summary
We do not yet know whether perception consistently represents event sequences as a timeline, in
the way that in the spatial domain we have a strong sense of the layout of a scene. It may be that
temporal experience is more impoverished.
When more than a few stimuli are presented, most of the temporal relations
may be unavailable or reliant on unreliable cues like relative strength of the items in short-term
memory (Reeves and Sperling 1986). When just two stimuli accompanied by strong transients are
presented, they are more likely to engage attention and result in a clear percept of temporal order
(Fujisaki and Nishida 2007).
Extracting certain spatial relationships also seems to require attentional mediation (Holcombe,
Linares, and Vaziri-Pashkam 2011; Franconeri et al. 2011). But aspects of spatial perception take
advantage of the brain’s topographic arrays to process information in parallel, whereas the visual
brain may lack a chronotopic bank of processors.
In recent years much of the literature has focused on deciding between the event time recon-
struction theory and brain time. But the reality may be a modest amount of event time reconstruc-
tion that emerges from a recalibration process that shifts cross-modal simultaneity points after
prolonged exposure to asynchrony. Operating in parallel with this recalibration may be organiza-
tional processes that create temporal illusions as a byproduct of Gestalt grouping (Benussi 1913).
In evolutionary history, success at event reconstruction has likely been a factor in selecting the
winning organisms over the now-extinct losers. But segmenting events and identifying them may
have been both more important for the organism and more feasible than determining exact event
timing. When absolute timing is critical, learning of sensorimotor mappings may be used for cor-
rect timing of behaviour rather than changes to perception.
Acknowledgments
I thank Lars T. Boenke, Colin Clifford, and Paolo Martini for discussions, and Lars T. Boenke,
Alex L. White, and Daniel Linares for comments on an earlier version of the manuscript. I thank
Alex L. White for the point that in snapping one’s fingers, it is not obvious which part of the
visual sequence generated the sound. Lars T. Boenke translated Klemm (1925) from German into
English. The writing of this chapter was supported by ARC grants DP110100432 and FT0990767.
834 Holcombe
References
Alais, D. and S. Carlile (2005). ‘Synchronizing to Real Events: Subjective Audiovisual Alignment Scales
with Perceived Auditory Depth and Speed of Sound’. Proceedings of the National Academy of Sciences of
the United States of America 102(6): 2244–2247.
Albertazzi, L. (1999). ‘The Time of Presentness. A Chapter in Positivistic and Descriptive Psychology’.
Axiomathes 10: 49–73.
Allan, L. G. (1975). ‘The Relationship between Judgments of Successiveness and Judgments of Order’.
Perception and Psychophysics 18: 29–36.
Allik, J. and K. Kreegipuu (1998). ‘Multiple Visual Latency’. Psychological Science 9: 135–138.
Alpern, M. (1954). ‘The Relation of Visual Latency to Intensity’. A.M.A. Archives of Ophthalmology
51: 369–374.
Alvarez, G. A. (2011). Representing multiple objects as an ensemble enhances visual cognition. Trends in
Cognitive Sciences 15(3): 122–131. doi:10.1016/j.tics.2011.01.003.
Arden, G. B. and R. A. Weale (1954). ‘Variations of the Latent Period of Vision’. Proceedings of the Royal
Society of London B 142: 258–267.
Arnold, D. H. (2005). ‘Perceptual Pairing of Colour and Motion’. Vision Research 45(24): 3015–3026.
Arnold, D. H., A. Johnston, and S. Nishida (2005). Timing sight and sound. Vision Research 45: 1275–1284.
doi:10.1016/j.visres.2004.11.014.
Beaudot, W. H. (2002). Role of onset asynchrony in contour integration. Vision Research, 42(1), 1–9.
Benardete, E. A. and E. Kaplan (1999). ‘The Dynamics of Primate M Retinal Ganglion Cells’. Visual
Neuroscience 16: 355–368.
Benussi, V. (1913). Psychologie der Zeitauffassung. Heidelberg: Winter.
Berdyyeva, T. K. and C. R. Olson (2010). Rank signals in four areas of macaque frontal cortex during
selection of actions and objects in serial order. Journal of Neurophysiology 104(1): 141–159.
Bergenheim, M., H. Johansson, B. Granlund, and J. Pedersen (1996). ‘Experimental Evidence for a
Sensory Synchronization of Sensory Information to Conscious Experience’. In Towards a Science of
Consciousness: The First Tucson Discussions and Debates, edited by S. R. Hameroff, A. W. Kaszniak, and
A. C. Scott, pp. 301–310. Cambridge, MA: MIT Press.
Cai, M., C. Stetson, and D. M. Eagleman. (2012). A Neural Model for Temporal Order Judgments and
their Active Recalibration: A Common Mechanism for Space and Time? Frontiers in Psychology
3(November): 1–11. doi:10.3389/fpsyg.2012.00470.
Cavanagh, P., A. R. Hunt, A. Afraz, and M. Rolfs (2010). ‘Visual Stability Based on Remapping of Attention
Pointers’. Trends in Cognitive Sciences 14(4): 147–153. doi:10.1016/j.tics.2010.01.007.
Clifford, C. W. G., A. O. Holcombe, and J. Pearson (2004). Rapid global form binding with loss of
associated colors. Journal of Vision 4: 1090–1101.
Cunningham, D. W., V. A. Billock, and B. H. Tsou (2001). ‘Sensorimotor Adaptation to Violations of
Temporal Contiguity’. Psychological Science 12: 532–5.
Dakin, S. C. and P. J. Bex (2002). ‘The Role of Synchrony in Contour Binding: Some Transient Doubts
Sustained’. Journal of the Optical Society of America A 19(4): 678–686.
De Valois, R. L. and K. K. De Valois (1991). ‘Vernier Acuity with Stationary Moving Gabors’. Vision
Research 31(9): 1619–1626.
Del Viva, M. M., M. Gori and D. C. Burr (2006). ‘Powerful Motion Illusion Caused by Temporal
Asymmetries in ON and OFF Visual Pathways’. Journal of Neurophysiology 95(6): 3928–32. doi:10.1152/
jn.01335.2005
Dennett, D. and M. Kinsbourne (1992). ‘Time and the Observer: The Where and When of Consciousness
in the Brain’. Behavioral and Brain Sciences 15: 1–35.
The Temporal Organization of Perception 835
Heron, J., J. V. M. Hanson, and D. Whitaker (2009). ‘Effect Before Cause: Supramodal Recalibration of
Sensorimotor Timing’. PLoS ONE 4: e7681. doi:10.1371/journal.pone. 0007681.
Heron, J., N. W. Roach, J. V. M. Hanson, P. V. McGraw, and D. Whitaker (2012). ‘Audiovisual Time
Perception is Spatially Specific’. Experimental Brain Research 218(3): 477–485. doi:10.1007/
s00221-012-3038-3.
Hess, C. V. (1904). ‘Untersuchungen über den Erregungsvorgang im Sehorgan der Katze bei kurz- und bei
länger dauernder Reizung’. Pflügers Archiv für die gesamte Physiologie 101: 226–262.
Holcombe, A. O. and P. Cavanagh (2008). ‘Independent, Synchronous Access to Color and Motion
Features’. Cognition 107(2): 552–580.
Holcombe, A. O. (2009). ‘Seeing Slow and Seeing Fast: Two Limits on Perception’. Trends in Cognitive
Sciences 13(5): 216–221.
Holcombe, A. O., D. L. Linares, and M. Vaziri-Pashkam (2011). ‘Perceiving Spatial Relationships via
Attentional Tracking and Shifting’. Current Biology 21: 1–5.
Holcombe, A. O. and C. W. Clifford (2012). ‘Failures to Bind Spatially Coincident Features: Comment on
Di Lollo’. Trends in Cognitive Sciences 16(8): 402.
Holcombe, A. O., N. Kanwisher, and A. Treisman (2001). ‘The Midstream Order Deficit’. Perception and
Psychophysics 63(2): 322–329.
Ivry, R. B. and J. E. Schlerf (2008). Dedicated and intrinsic models of time perception. Trends in Cognitive
Sciences 12(7): 273–280.
James, W. (1890). Principles of Psychology. Accessed from http://psychclassics.yorku.ca/James/Principles/
Jaśkowski, P. (1991). ‘Two-Stage Model for Order Discrimination’. Perception and Psychophysics 50: 76–82.
Kafaligonul, H. and G. R. Stoner (2010). ‘Auditory Modulation of Visual Apparent Motion with Short
Spatial and Temporal Interval’. Journal of Vision 10: 1–13. doi:10.1167/10.12.31.
Karmarkar, U. R. and D. V. Buonomano (2007). Timing in the absence of clocks: encoding time in neural
network states. Neuron, 53(3): 427–38.
Kitaoka, A. and H. Ashida (2007). A variant of the anomalous motion illusion based upon contrast and
visual latency. Perception, 36(7), 1019–1035. doi:10.1068/p5362
Kitaoka, A. and H. Ashida (2003). ‘Phenomenal Characteristics of the Peripheral Drift Illusion’. Vision
Research 15: 261–262.
Keetels, M. and J. Vroomen (2012). ‘Perception of Synchrony between the Senses’. In Frontiers in the Neural
Basis of Multisensory Processes, edited by M. T. Wallace and M. M. Murray, pp. 147–178. London:
CRC Press.
Klemm, O. (1925). ‘Über die Wirksamkeit kleinster Zeitunterschiede auf dem Gebiete des Tastsinns’. Archiv
fur die gesamte Psychologie 50: 205–220.
Koenderink, J., W. Richards, and A. van Doorn (2012). ‘Space-time Disarray and Visual Awareness’.
i-Perception 3(3): 159–162. doi:10.1068/i0490sas.
Köhler, W. (1947). Gestalt Psychology: An Introduction to New Concepts in Modern Psychology.
New York: Liveright.
Kopinska, A. and L. R. Harris. (2004). ‘Simultaneity Constancy’. Perception 33(9): 1049–1060.
Lappe, M. and B. Krekelberg (1998). ‘The Position of Moving Objects’. Perception 27(12): 1437–1449.
Lee, S. H., and R. Blake (1999). Visual form created solely from temporal structure. Science, 284(5417),
1165–1168.
Levi, D. (1996). ‘Pattern Perception at High Velocities’. Current Biology 6(8): 1020–1024.
Lewald, J. and R. Guski (2004). ‘Auditory–Visual Temporal Integration as a Function of Distance: No
Compensation for Sound-transmission Time in Human Perception’. Neuroscience Letters
357(2): 119–122.
Lotze, H. (1881). Grundzüge der Psychologie. Leipzig: Dictate aus den Vorlesungen S. Hirzel.
McBeath, M. K., J. G. Neuhoff, and D. J. Schiano (1993). ‘Familiar Suspended Objects Appear Smaller than
Actual Independent of Viewing Distance’. Paper presented at the Annual Convention of the American
Psychological Society, Chicago, IL.
Macefield, G., S. C. Gandevia, and D. Burke (1989). ‘Conduction Velocities of Muscle and Cutaneous
Afferents in the Upper and Lower Limbs of Human Subjects’. Brain 112(6): 1519–1532.
McLeod, P., C. McLaughlin, and I. Nimmo-Smith (1985). ‘Information Encapsulation and Automaticity
Evidence from the Visual Control of Finely Timed Actions’. In Attention and Performance XI, edited by
M. I. Posner and O. S. Marin. Hillsdale, NJ: Erlbaum.
McLeod, P. and S. Jenkins (1991). ‘Timing Accuracy and Decision Time in High-speed Ball Games’.
International Journal of Sport Psychology 22: 279–295.
Marr, D. (1982). Vision. San Francisco, CA: Freeman.
Maunsell, J. H., G. M. Ghose, J. A. Assad, C. J. McAdams, C. E. Boudreau, and B. D. Noerager (1999).
‘Visual Response Latencies of Magnocellular and Parvocellular LGN Neurons in Macaque Monkeys’.
Visual Neuroscience 16(1): 1–14.
Meredith, M. A., J. W. Nemitz, and B. E. Stein (1987). Determinants of multisensory integration in
superior colliculus neurons. I. Temporal factors. Journal of Neuroscience, 7(10): 3215–3229.
Morein-Zamir, S., S. Soto-Faraco, and A. Kingstone. (2003). ‘Auditory Capture of Vision: Examining
Temporal Ventriloquism’. Cognitive Brain Research 17(1): 154–163.
Morgan, M. J., G. J. Hole, and A. Glennerster (1990). ‘Biases and Sensitivities in Geometrical Illusions’.
Vision Research 30: 1793–1810.
Morgan, M. J. and A. Glennerster (1991). ‘Efficiency of Locating Centres of Dot-clusters by Human
Observers’. Vision Research 31: 2075–2083.
Morgan, M. J., C. Chubb, and J. A. Solomon (2011). ‘Evidence for a Subtractive Component in Motion
Adaptation’. Vision Research 51: 2312–2316.
Morgan, M., B. Dillenburger, S. Raphael, and J. A. Solomon (2012). ‘Observers Can Voluntarily Shift their
Psychometric Functions without Losing Sensitivity’. Attention, Perception and Psychophysics 74: 185–193.
Moutoussis, K. (2012). Asynchrony in Visual Consciousness and the Possible Involvement of Attention.
Frontiers in Psychology 3: 1–9.
Musacchia, G. and C. E. Schroeder (2009). ‘Neuronal Mechanisms, Response Dynamics and Perceptual
Functions of Multisensory Interactions in Auditory Cortex’. Hearing Research 258(1–2): 72–79.
doi:10.1016/j.heares.2009.06.018.
Nijhawan, R. (2008). ‘Visual Prediction: Psychophysics and Neurophysiology of Compensation for Time
Delays’. Behavioral and Brain Sciences 31: 179–239.
Nishida, S. and A. Johnston (2002). ‘Marker Correspondence, not Processing Latency, Determines
Temporal Binding of Visual Attributes’. Current Biology 12(5): 359–368.
Nishida, S. and A. Johnston (2010). ‘The Time Marker Account of Cross-channel Temporal Judgments’.
In Space and Time in Perception and Action, edited by R. Nijhawan and B. Khurana, pp. 278–300.
Cambridge: Cambridge University Press.
Ogmen, H., S.S. Patel, H.E. Bedell, and K. Camuz (2004). Differential latencies and the dynamics of the
position computation process for moving targets, assessed with the flash-lag effect. Vision Research
44: 2109–2128.
Oram, M. W., D. Xiao, B. Dritschel, and K. R. Payne (2002). ‘The Temporal Resolution of Neural
Codes: Does Response Latency Have a Unique Role?’ Philosophical Transactions of the Royal Society
B: Biological Sciences 357(1424): 987–1001.
Purushothaman, G., S. S. Patel, H. E. Bedell, and H. Ogmen (1998). Moving ahead through differential
visual latency. Nature 396(6710): 424. doi:10.1038/24766.
Reeves, A. and G. Sperling (1986). ‘Attention Gating in Short-term Visual Memory’. Psychological Review
93(2): 180–206.
Regan, D. (1989). Human Brain Electrophysiology: Evoked Potentials and Evoked Magnetic Fields in Science
and Medicine. New York: Elsevier.
Roach, N. W., J. Heron, D. Whitaker, and P. V. McGraw (2010). ‘Asynchrony Adaptation Reveals Neural
Population Code for Audio-visual Timing’. Proceedings of the Royal Society: Biological Sciences
278(1710): 1314–1322. doi:10.1098/rspb.2010.1737.
Roelofs, C. (1935). ‘Optische Localisation’. Archiv für Augenheilkunde 109: 395–415.
Roseboom, W. and D. H. Arnold (2011). Twice upon a time: multiple concurrent temporal recalibrations of
audiovisual speech. Psychological Science, 22(7): 872–7. doi:10.1177/0956797611413293.
Roseboom, W., S. Nishida, W. Fujisaki, and D. H. Arnold (2011). ‘Audio-visual Speech Timing Sensitivity
Is Enhanced in Cluttered Conditions’. PloS ONE 6(4): 1–8. doi:10.1371/journal.pone.0018309.
Roufs, J. A. J. (1963). ‘Perception Lag as a Function of Stimulus Luminance’. Vision Research 3: 81–91.
Schneider, K. A. and D. Bavelier (2003). ‘Components of Visual Prior Entry’. Cognitive Psychology
47(4): 333–366.
Shams, L., Y. Kamitani, and S. Shimojo (2002). ‘Visual Illusion Induced by Sound’. Cognitive Brain Research
14(1): 147–152.
Shams, L., Y. Kamitani, and S. Shimojo (2000). ‘Illusions. What You See Is What You Hear’. Nature
408(6814): 788.
Shapley, R. M. and J. D. Victor (1978). ‘The Effect of Contrast on the Transfer Properties of Cat Retinal
Ganglion Cells’. Journal of Physiology 285: 275–298.
Shore, D. I., E. Spry, and C. Spence (2002). ‘Confusing the Mind by Crossing the Hands’. Cognitive Brain
Research 14: 153–163.
Sinico, M. (1999). ‘Benussi and the History of Temporal Displacement’. Axiomathes 10: 75–93.
Smith, W. S., J. D. Mollon, R. Bhardwaj, and H. E. Smithson (2011). ‘Is There Brief Temporal Buffering of
Successive Visual Inputs?’ The Quarterly Journal of Experimental Psychology 64(4): 767–791.
Smithson, H. and J. Mollon (2006). ‘Do Masks Terminate the Icon?’ Quarterly Journal of Experimental
Psychology 59(1): 150–160.
Snowden, R., P. Thompson, and T. Troscianko (2006). Basic Vision. Oxford: Oxford University Press.
Spence, C. and C. Parise (2010). ‘Prior-entry: A Review’. Consciousness and Cognition 19(1): 364–79.
doi:10.1016/j.concog.2009.12.001.
Spence, C. (2011). ‘Crossmodal Correspondences: A Tutorial Review’. Attention, Perception, and
Psychophysics 73: 971–995.
Stetson, C., X. Cui, P. R. Montague, and D. M. Eagleman (2006). ‘Motor-sensory Recalibration Leads to an
Illusory Reversal of Action and Sensation’. Neuron 51: 651–659.
Stone, J. V., M. M. Hunkin, J. Porrill, R. Wood, V. Keeler, M. Beanland, M. Port, et al. (2001). ‘When is
Now? Perception of Simultaneity’. Proceedings of the Royal Society: Biological Sciences 268(1462): 31–8.
doi:10.1098/rspb.2000.1326.
Stromeyer, C. F. and P. Martini (2003). ‘Human Temporal Impulse Response Speeds Up with Increased
Stimulus Contrast’. Vision Research 43(3): 285–298.
Sugita, Y. and Y. Suzuki (2003). Audiovisual perception: Implicit estimation of sound-arrival time. Nature
421(6926): 911.
Tanji, J. (2001). ‘Sequential Organization of Multiple Movements: Involvement of Cortical Motor Areas’.
Annual Review of Neuroscience 24: 631–651.
Treisman, A. and G. Gelade (1980). A feature integration theory of attention. Cognitive Psychology
12: 97–136.
Treisman, M. (1963). Temporal discrimination and the indifference interval: Implications for a model of the
“internal clock”. Psychological Monographs General Applied 77(13): 1–31.
Usher, M. and N. Donnelly (1998). Visual synchrony affects binding and segmentation in perception.
Nature 394(9 July): 179–182.
Uttal, W. R. (1979). ‘Do Central Nonlinearities Exist?’ Behavioral and Brain Sciences 2: 286.
van Eijk, R. L., A. Kohlrausch, J. F. Juola, and S. van de Par (2008). ‘Audiovisual Synchrony and Temporal
Order Judgments: Effects of Experimental Method and Stimulus Type’. Perception and Psychophysics
70(6): 955–968.
Van de Grind, W. A., O.-J. Grüsser, and H. U. Lunkenheimer (1973). ‘Temporal Transfer Properties of the
Afferent Visual System: Psychophysical, Neurophysiological and Theoretical Investigations’. In Handbook of
Sensory Physiology, Vol. VII/3, edited by R. Jung, pp. 431–573. Berlin: Springer.
van Doorn, A. J., J. J. Koenderink, and J. Wagemans (2011). Rank order scaling of pictorial depth.
i-Perception (special issue on Art & Perception) 2: 724–744. doi:10.1068/i0432aap.
Vicario, G. B. (2003). ‘Temporal Displacement’. In The Nature of Time: Geometry, Physics, and Perception,
edited by R. Buccheri, M. Saniga, and M. S. Stuckey, pp. 53–66. Dordrecht: Kluwer Academic.
von der Malsburg, C. (1981). ‘The Correlation Theory of Brain Function’. In Models of Neural Networks II:
Temporal Aspects of Coding and Information Processing in Biological Systems, edited by E. Domany, J. L.
van Hemmen, and K. Schulten, pp. 95–119. New York: Springer-Verlag (reprinted in 1994).
Vroomen, J. and M. Keetels (2010). ‘Perception of Intersensory Synchrony: A Tutorial Review’. Attention,
Perception, and Psychophysics 72(4): 871–884. doi:10.3758/APP.
Vroomen, J., M. Keetels, B. de Gelder, and P. Bertelson (2004). ‘Recalibration of Temporal Order
Perception by Exposure to Audio-visual Asynchrony’. Cognitive Brain Research 22(1): 32–5.
doi:10.1016/j.cogbrainres.2004.07.003.
Wackermann, J. (2007). ‘Inner and Outer Horizons of Time Experience’. The Spanish Journal of Psychology
10(1): 20–32.
Wallace, M. T., L. K. Wilkinson, and B. E. Stein (1996). ‘Representation and Integration of Multiple Sensory
Inputs in Primate Superior Colliculus’. Journal of Neurophysiology 76: 1246–1266.
Warren, R. M., C. J. Obusek, R. M. Farmer, and R. P. Warren (1969). ‘Auditory Sequence: Confusion of
Patterns Other than Speech or Music’. Science 164: 586–587.
White, A. L., D. Linares, and A. O. Holcombe (2008). Visuomotor timing compensates for changes in
perceptual latency. Current Biology 18(20): R951–3.
Williams, J. M. and A. Lit (1983). ‘Luminance-dependent Visual Latency for the Hess Effect, the Pulfrich
Effect and Simple Reaction Time’. Vision Research 23: 171–179.
Wilson, J. A. and S. M. Anstis (1969). ‘Visual Delay as a Function of Luminance’. The American Journal of
Psychology 82(3): 350–358.
Wittmann, M. (2011). ‘Moments in Time’. Frontiers in Integrative Neuroscience 5(October): 1–9.
doi:10.3389/fnint.2011.00066.
Yarrow, K., N. Jahn, S. Durant, and D. H. Arnold (2011). ‘Shifts of Criteria or Neural Timing? The
Assumptions Underlying Timing Perception Studies’. Consciousness and Cognition 20(4): 1518–1531.
doi:10.1016/j.concog.2011.07.003.
Section 9
Applications of perceptual
organization
Chapter 41
Introduction
There is hardly a law of vision that is not found again serving camouflage.
(Metzger 1936, transl. Spillmann 2009, p. 85)
Animal camouflage is subtle and beautiful to the human eye, but it has evolved to deceive non-human adversaries. Multiple mechanisms are involved. For example, crypsis works by defeating
figure-ground segregation, whereas patterns that disguise the animal as a commonplace object or
lead to misclassification are known as masquerade and mimicry (Endler 1981; Ruxton, Speed, and
Sherratt 2004b; but see also Stevens and Merilaita 2009 for a discussion of these terms). Mimetic
patterns, which are often conspicuous, work by similarity to a different animal, typically one that
is avoided by the predator, whereas in masquerade the animal resembles a commonplace but val-
ueless object, such as a bird-dropping or plant thorn. Early Gestalt psychologists used examples
from animal camouflage to illustrate their principles of perception (Metzger 2009), which were,
in turn, used to explain deceptive coloration (Keen 1932). What was not appreciated, or was underestimated, in early studies of animal camouflage were the differences in vision between humans and other animals, even though it is these ‘other animals’ that have been the selective force in evolution (Endler 1978; Cuthill and Bennett 1993; Bennett, Cuthill, and Norris 1994). Conversely, there
has been a view that certain aspects of vision, such as object completion, may require mechanisms
specific to the neocortex, and so are not expected in animals without such a structure (Nieder
2002; Shapley, Rubin, and Ringach 2004; Zylinski, Darmaillacq, and Shashar 2012; van Lier and
Gerbino this volume). The fact that camouflage is effective against humans suggests that common
principles of perceptual organization apply across diverse visual environments, eye designs, and
types of brain. In any case, camouflage offers an approach to the vision of non-human animals
that is both more naturalistic and very different from standard methods, such as tests of associa-
tive learning.
Historically, biological camouflage was studied from about 1860 to 1940 as evidence for the the-
ory of natural selection and for military applications. Notable contributors included the American
artist Thayer (1896, 1909), who was fascinated by countershading and disruptive coloration, and
the English zoologist Cott whose beautifully illustrated book Adaptive coloration in animals (1940)
set out principles of camouflage such as ‘maximum disruptive contrast’ and ‘differential blending’
(Figure 41.2A). Cott’s view that these principles are attributable to the ‘optical properties’ of the
image, rather than being physiological or psychological phenomena, ignored the possible influence
844 Osorio and Cuthill
of differences in perception between animals. Cott could not have been aware of the diversity of
animal colour vision. A trichromatic bee (with ultraviolet, blue, and green photoreceptors), a tet-
rachromatic bird (with UV, blue, green, and red photoreceptors), and a trichromatic human will
process identical spectral radiance in different ways, but all these animals face common challenges,
such as figure-ground segmentation and colour constancy. Furthermore, for camouflage that has
evolved as concealment against multiple visual systems (e.g. a praying mantis in foliage, concealed
both to its insect prey and reptilian and avian predators), the common denominators will prevail
over viewer-specific solutions. As the ultimate common denominator is the physical world, one
might, for example, expect the colours of many camouflaged animals to be based on pigments
that have similar reflectances to natural backgrounds across a broad spectral range, even though
in principle a metamer might be effective against any one visual system (Wente and Phillips 2005;
Chiao et al. 2011).
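The point about differing receptor complements can be made numerically: one and the same spectral radiance produces different receptor ‘coordinates’ in a trichromatic bee, a tetrachromatic bird, and a trichromatic human. The sketch below is a deliberately simplified illustration; the Gaussian sensitivities, peak wavelengths, and the foliage-like spectrum are rough assumptions, not measured photoreceptor templates.

```python
import numpy as np

wl = np.arange(300, 701, 5, dtype=float)       # wavelength samples (nm)

def sensitivity(peak, width=40.0):
    """Idealized Gaussian receptor sensitivity (a toy stand-in for
    real photoreceptor templates)."""
    s = np.exp(-0.5 * ((wl - peak) / width) ** 2)
    return s / s.sum()

# hypothetical peak sensitivities, roughly in the published range
receptors = {
    "bee":   [350, 440, 540],          # UV, blue, green
    "bird":  [370, 445, 505, 565],     # UV, blue, green, red
    "human": [420, 530, 560],          # S, M, L cones
}

# one and the same spectral radiance: a greenish, foliage-like spectrum
radiance = np.exp(-0.5 * ((wl - 550) / 60.0) ** 2)

catches = {}
for animal, peaks in receptors.items():
    q = np.array([np.dot(sensitivity(p), radiance) for p in peaks])
    catches[animal] = q / q.sum()      # normalized receptor coordinates
    print(animal, np.round(catches[animal], 2))
```

Each viewer reduces the spectrum to a different low-dimensional code, which is why a match judged by human eyes need not be a match for a bee or a bird.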
In contrast to Cott, Metzger’s account of camouflage in The Laws of Seeing (2009), was explic-
itly cognitive, not optical, drawing attention to the Gestalt psychological principles of ‘belong-
ing’, ‘common fate’, and ‘good continuation’. Metzger also devotes a chapter to the obliteration of
3D form, by countershading. More recently Julesz’s (1971, 1981) influential work in vision was
motivated by the idea that image segregation by texture, depth, and motion evolved to break
camouflage. His lecture at the 1998 European Conference on Visual Perception was entitled ‘In
the Last Minutes of the Evolution of Life, Stereoscopic Depth Perception Captured the Input
Layer to the Visual Cortex to Break Camouflage’ (Frisby 2004). Julesz’s ideas remain relevant
to understanding texture matching, and also raise the question of whether any camouflage can
defeat the stereo-depth and motion-sensitive mechanisms that allow figure-ground segregation
in ‘random-dot’ images.
Recently, research on camouflage has been stimulated by the realization that direct evidence for
how particular types of camouflage exploit perceptual mechanisms was sparser than textbooks
might suggest. In addition, such evidence as did exist had been evaluated via human perception
of colour and pattern, not the evolutionarily relevant viewer. For example, the bright warning col-
ours of toxic insects such as ladybirds have evolved under the selective pressure exerted by, among
others, bird eyes and brains, and avian colour vision is tetrachromatic and extends into the ultra-
violet (Cuthill 2006). This has led to experimental tests, within the natural environment, of basic
camouflage principles such as disruptive coloration and countershading, informed by physiologi-
cally based models of non-human low-level vision (Cuthill et al. 2005; Stevens and Cuthill 2006).
Biologists also recognize that animal coloration patterns often serve multiple functions, including
sexual and warning signals, as well as non-visual purposes such as thermoregulation and mechanical
strengthening. Not only can animal colours be understood only in the light of trade-offs between
these functions (Ruxton et al. 2004b), but it is often difficult to be sure which function is relevant
(Stuart-Fox and Moussali 2009).
Other recent studies, which we describe here, have investigated animals that can change their
appearance, such as chameleons (Stuart-Fox and Moussali 2009), flatfish and especially cuttlefish
(Figure 41.1). Cuttlefish, like other cephalopod molluscs, control their appearance with extraordinary facility, which allows them to produce a vast range of camouflage patterns under visual
control. These patterns illustrate interesting and subtle features of camouflage design, includ-
ing disruptive and depth effects. However, the special feature of actively controlled camouflage
is that one can ask what visual features and image parameters the animals use to select colora-
tion patterns. This gives us remarkable insights into perceptual organization in these advanced
invertebrates.
Fig. 41.1 Images of (a) a flatfish, the plaice (Pleuronectes platessa) and (b) a cuttlefish (Sepia
officinalis), which vary their appearance to match the background. The plaice varies the level of
expression of two patterns, which we call blotches and spots. These can be expressed at low
levels (i), separately (ii, iii), or mixed (iv). The cuttlefish displays a great range of patterns. Here the
upper left panel illustrates an animal expressing a Disruptive type of pattern on a checkerboard
background, and the lower left a Mottle on the background with the same power spectrum
but randomized phase. The right-hand panel shows two animals on a more natural background
expressing patterns with both disruptive and mottle elements.
Adapted from Emma J. Kelman, Palap Tiptus and Daniel Osorio, Juvenile plaice (Pleuronectes platessa) produce
camouflage by flexibly combining two separate patterns, The Journal of Experimental Biology, 209 (17),
pp. 3288–3292, Figure 1, doi: 10.1242/jeb.02380 © 2006, The Company of Biologists.
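The manipulation shown in Figure 41.1, a background with the same power spectrum as a checkerboard but randomized phase, is straightforward to reproduce: keep the Fourier amplitude spectrum of an image and scramble its phases. A minimal sketch (the image size and check size are arbitrary):

```python
import numpy as np

def randomize_phase(img, rng):
    """Return an image with the same Fourier power spectrum as `img`
    but with scrambled phases, destroying the original edge structure."""
    f = np.fft.fft2(img)
    # borrow the (Hermitian-symmetric) phase of real white noise,
    # which guarantees a real-valued result
    noise_phase = np.angle(np.fft.fft2(rng.standard_normal(img.shape)))
    scrambled = np.abs(f) * np.exp(1j * noise_phase)
    scrambled[0, 0] = f[0, 0]          # keep the mean luminance
    return np.fft.ifft2(scrambled).real

rng = np.random.default_rng(1)
n = 64
y, x = np.mgrid[0:n, 0:n]
checker = ((x // 8 + y // 8) % 2).astype(float)   # checkerboard pattern

mottled = randomize_phase(checker, rng)

# the power spectra agree even though the images look nothing alike
same_power = np.allclose(np.abs(np.fft.fft2(mottled)),
                         np.abs(np.fft.fft2(checker)), atol=1e-8)
print(same_power, round(mottled.mean(), 3))
```

The two images share their spatial-frequency content but differ in edge structure, which is what makes the comparison in Figure 41.1 informative about what drives the animal's choice of pattern.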
Principles of Camouflage
A naive view is that camouflage ‘matches the background’, but the simplicity of the concept has
proved deceptive and led to controversies about definitions up to the present day (for instance
Stevens and Merilaita’s 2009 arguments about cryptic camouflage). An exact physical match, such
that the pattern on the animal and the substrate against which it is viewed are perceptually identical, is possible only with a uniform background, if only because differences in pattern phase at
the boundary between object and background, or 3D cues from shadowing on its surface, are
almost inevitable. A fascinating example of near-perfect background matching, in this very literal
sense, is produced by the scales of many fish that work as vertical mirrors. Ideally, such mirrors
reflect the ‘space-light’ of open water so that a viewer sees the same light as it would with an uninterrupted line of sight, making the fish invisible (Denton 1970; Jordan, Partridge, and Roberts 2012).
Accepting that invisibility through exact replication of the occluded background is rarely achiev-
able, in the biological literature ‘background matching’ (largely replacing earlier terms such as
‘general protective resemblance’) is taken to mean matching the visual texture of the background.
That texture may be a continuous patterned surface such as tree bark, or it may include discrete
3D objects, such as pebbles or leaves, that could in principle be segregated separately. Exactly how
best to match the background is a topic we return to in ‘The problem of multiple backgrounds’.
Logically distinct from crypsis is ‘masquerade’, where an animal mimics a specific background
object that is inedible or irrelevant (leaf-mimicking butterflies and bird’s-dropping-mimicking insect
pupae are classic examples; Skelhorn, Rowland, and Ruxton 2010a; Skelhorn et al. 2010b). Although a
stick insect benefits both from matching its generally stick-textured background and from looking like
a stick, the distinction can be made when such an animal is seen against a non-matching background.
Masquerading as a stick can be successful even when completely visible, whereas matching a sample
of the background texture ceases to be an effective defence when the animal is readily segmented
from the background. Masquerade depends on the mechanisms of object recognition and relative
abundance of model and mimic (frequency dependent selection), rather than perceptual organiza-
tion, so we say no more about it here but refer the reader to a recent review (Skelhorn et al. 2010a).
Historically (Cott 1940), two main camouflage strategies have been recognized: cryptic and dis-
ruptive camouflage. Cryptic camouflage relies on the body pattern in some sense matching its back-
ground. At present there is no simple way to predict whether two visual textures will match, yet
the quality of camouflage patterns is striking, especially considering the complexity of generating
naturalistic visual textures in computer graphics (Portilla and Simoncelli 2000; Peyré 2009; Allen
et al. 2011; Rosenholtz 2013). The lack of a simple theory for the classification of visual textures, as
envisaged by Julesz (1981, 1984; Kiltie, Fan, and Laine 1995), has limited progress in the understand-
ing of camouflage, which leaves this area open. However, the adaptive camouflage of flatfish and
cuttlefish offers an experimental approach to the question of what range of patterns is needed for one
type of natural background—namely seafloor habitats—and to test what local image parameters and
features are used by these marine animals to classify the substrates that they encounter.
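Although no simple statistic decides whether two visual textures match, even a crude descriptor separates textures of different granularity. The sketch below compares the fraction of spectral power in a few log-spaced frequency bands; the descriptor and distance measure are illustrative inventions, far simpler than the texture models of Portilla and Simoncelli (2000).

```python
import numpy as np

def band_energies(img, n_bands=4):
    """Toy texture descriptor: fraction of spectral power falling in
    log-spaced radial spatial-frequency bands."""
    power = np.abs(np.fft.fftshift(np.fft.fft2(img - img.mean()))) ** 2
    n = img.shape[0]
    y, x = np.mgrid[-n // 2:n // 2, -n // 2:n // 2]
    r = np.hypot(x, y)
    edges = np.geomspace(1.0, n / 2, n_bands + 1)
    e = np.array([power[(r >= lo) & (r < hi)].sum()
                  for lo, hi in zip(edges[:-1], edges[1:])])
    return e / e.sum()

def texture_distance(a, b):
    """City-block distance between band-energy profiles."""
    return float(np.abs(band_energies(a) - band_energies(b)).sum())

rng = np.random.default_rng(2)
n = 64
fine_a = rng.standard_normal((n, n))               # fine-grained noise
fine_b = rng.standard_normal((n, n))               # another fine sample
coarse = np.kron(rng.standard_normal((8, 8)),      # coarse, blocky noise
                 np.ones((8, 8)))

print(texture_distance(fine_a, fine_b), texture_distance(fine_a, coarse))
```

Two independent samples of the fine texture land close together in this feature space, while fine and coarse textures land far apart; a descriptor of this general kind is one way to ask what image parameters an animal uses to classify substrates.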
Disruptive camouflage ‘classically’ involves well-defined (e.g. high-contrast) visual features that
create false edges and hence interfere with figure-ground segregation (Figures 41.1–41.3; Cott
1940; Osorio and Srinivasan 1991; Cuthill et al. 2005). However, the idea can be generalized to
any mechanism that interferes with perceptual grouping of the object’s features. Hence disruptive
camouflage gives a more direct route to understanding principles of perceptual organization. It has
had more attention than cryptic camouflage, which works by matching the background, perhaps
because, in some sense, it appears to be more sophisticated, involving deceptions resembling opti-
cal illusions. A major impetus for recent research has been the realization that the effectiveness of
disruptive camouflage had been accepted for over a century without direct test (Merilaita 1998;
Cuthill et al. 2005). It may be that the widespread use of (allegedly) disruptive patterning in military
Camouflage and Perceptual Organization in the Animal Kingdom 847
Fig. 41.2 (a) Drawings adapted from the artwork by Hugh Cott illustrating coincident colours
that create false contours on the leg and body of the frog Rana temporaria. (b) The frog Limnodynastes tasmaniensis showing enhanced edges to the camouflage pattern. (c) Cott’s (1940, Figure 17) interpretation of the enhanced border on the wing of a butterfly as being
consistent with a surface discontinuity. It is an interesting question how often such intensity profiles
occur in nature.
Reproduced from H.B. Cott, Adaptive Coloration in Animals, Figure 21, Methuen, London, UK Copyright © 1940,
Methuen.
Reproduced from D. Osorio and M. V. Srinivasan, Camouflage by Edge Enhancement in Animal Coloration
Patterns and Its Implications for Visual Mechanisms, Proceedings of The Royal Society B, 244 (1310), pp. 81–85,
DOI: 10.1098/rspb.1991.0054 Copyright © 1991, The Royal Society.
Reproduced from H.B. Cott, Adaptive Coloration in Animals, Figure 17, Methuen, London, UK Copyright © 1940,
Methuen.
camouflage, where historically the early inspiration was often from nature (Behrens 2002, 2011),
reinforced its acceptance as ‘proven’ in biology. Given that crypsis depends upon matching the background, whereas disruptive effects depend upon creating false edges or surfaces, it is an interesting question how crypsis and disruptive coloration work in tandem—a topic we return to later.
We now outline experimental studies of camouflage relevant to four main aspects of perceptual
organization: first, cryptic coloration and background matching; second, the problem of obscuring
edges; third, the problem of obscuring 3D form; and fourth, the concealment of motion.
Fig. 41.3 Artificial targets, baited with mealworms, survived better under bird predation if the
contrasting colour patches intersected the ‘wing’ edges (bottom left) than targets bearing otherwise
similar oak-bark-like textures that did not intersect the edges (top left). High contrast edge-disrupting
patterns and differential blending with the background reduce the signal from the target’s outline
(right-hand panels: edge images from applying a Laplacian-of-Gaussian filter to similar targets).
Reproduced from Martin Stevens and Innes C Cuthill, Disruptive coloration, crypsis and edge detection in early
visual processing, Proceedings of The Royal Society B, 273 (1598), pp. 2433–38, DOI: 10.1098/rspb.2006.3556
Copyright © 2006, The Royal Society.
knowledge a small basis-set of spatial mechanisms analogous to cone fundamentals has not been
identified. Indeed, the principle of sparse coding argues for a large set of low-level mechanisms
(Simoncelli and Olshausen 2001). Similarly, systems for generating naturalistic visual textures in
computer graphics involve many free parameters (Portilla and Simoncelli 2000; Peyré 2009), but,
even so, graphics do not convincingly resemble natural surfaces. It is therefore intriguing that
cryptic camouflage often matches the background so well (Figure 41.1).
Hanlon (2007) has proposed that three main types of camouflage pattern—which he calls
Uniform, Mottle, and Disruptive—are widespread in both aquatic and terrestrial animals. This
classification often seems to work, but the number of distinguishable backgrounds and camou-
flage patterns is much greater than three. However, it is possible that a small basis-set of patterns
can generate cryptic camouflage for a wide range of backgrounds (Julesz 1984). Coloration pat-
terns are typically under genetic control and, at least in the wings of butterflies and moths, a small
number of developmental mechanisms underlie much diversity (Beldade and Brakefield 2002).
An animal lineage with a suitable ‘basis-set’ of genetically defined patterns would perhaps be able
to evolve camouflage for a range of natural backgrounds. Certainly, the coat pattern variation
in all living cat species does not seem to be heavily constrained by taxonomic similarity (Allen
et al. 2011). Instead, the colour variation, which could plausibly be generated by slight changes in
the reaction-diffusion equations underlying pattern development, has readily switched between
spots, stripes, and uniform fur in relation to habitat type.
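The link between reaction-diffusion parameters and pattern type can be illustrated with a minimal Gray-Scott simulation (an illustrative sketch, not the model analysed by Allen et al.; the grid size, diffusion rates, and feed/kill parameters below are our own choices): with everything else held fixed, a small change in a single rate constant is enough to shift the system between qualitatively different pattern regimes.

```python
import numpy as np

def gray_scott(f, k, n=64, steps=3000, du=0.16, dv=0.08, seed=0):
    """Minimal Gray-Scott reaction-diffusion on an n x n periodic grid."""
    rng = np.random.default_rng(seed)
    u = np.ones((n, n)) + 0.02 * rng.standard_normal((n, n))
    v = 0.02 * rng.standard_normal((n, n)) ** 2
    c = n // 2
    v[c - 8:c + 8, c - 8:c + 8] = 0.25   # seed patch so patterns can nucleate

    def lap(a):
        # 5-point Laplacian with periodic (wrap-around) boundaries
        return (np.roll(a, 1, 0) + np.roll(a, -1, 0) +
                np.roll(a, 1, 1) + np.roll(a, -1, 1) - 4 * a)

    for _ in range(steps):
        uvv = u * v * v
        u += du * lap(u) - uvv + f * (1 - u)
        v += dv * lap(v) + uvv - (f + k) * v
    return v

# Two parameter pairs that typically give different regimes (spot-like
# versus more labyrinthine patterns); only k differs between the calls.
spots = gray_scott(f=0.035, k=0.065)
stripes = gray_scott(f=0.035, k=0.060)
```

That only one parameter differs between the two calls echoes the idea that slight changes in the underlying developmental equations can switch a coat between spots, stripes, and uniform colour.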
Flatfish Patterns
Three studies have looked at how flatfish vary their visual appearance (Figure 41.1A). We encourage the
reader to view images of these animals via the internet. Saidel (1988) found that two North American
species, the southern flounder (Paralichthys lethostigma) and the winter flounder (Pseudopleuronectes
americanus), control the level of expression of a single pattern in response to varying backgrounds.
Both species control the contrast in a pattern of dark and light, somewhat blurred, spots roughly
10 mm across. In Paralichthys both the mean reflectance and the contrast of the background influence
the coloration, and the maximum contrast across the body ranged from 14% to 70% (Saidel 1988).
Another North Atlantic species, the plaice (Pleuronectes platessa; Figure 41.1A; Kelman, Tiptus, and
Osorio 2006), has an advantage over the southern and winter flounders in that it can add two patterns
to a fairly uniform ‘ground’ pattern. One of these patterns comprises predominantly about thirty
small (<5 mm diameter) dark and light spots in roughly equal numbers; the other is blurred dark
blotches, which form a low-frequency grating-like pattern. The fish mixes these two patterns freely,
changing appearance over the course of a few minutes according to the visual background.
The most elaborate adaptive coloration described in a fish is for the eyed flounder Bothus ocellatus.
When Ramachandran and co-workers (1996) analysed Fourier-transformed images of the
fish, they found that three principal components accounted for the range of patterns that the ani-
mals could display in their aquaria. The authors describe the components as composed of a ‘low vs
high’ spatial frequency channel, a medium spatial frequency channel, and a narrow-band channel
at eight cycles per fish. It is not easy to relate these principal components, defined in terms of
spatial frequency, directly to body patterns, but the eight-cycle-per-fish channel probably corresponds
to a regular pattern of dark blotches much like those on the plaice (Figure 41.1A; Ramachandran
et al. 1996, Figure 41.1C). Another pattern corresponds to the roughly 100 light annular (or ‘ocel-
lar’) features and a smaller number (about thirty) of dark annuli that give this fish its name. In
addition, the fish can display a finer-grained gravel-like texture. Apart from the evidence for three
principal components, the fish can apparently display isolated features, such as a single dark spot.
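The principal-components analysis reported by Ramachandran and co-workers can be sketched with synthetic data (a toy reconstruction, not their images; the three sinusoidal basis patterns are hypothetical stand-ins for the channels they describe): flatten each image to a vector, mean-centre the set, and read the variance explained by each component off the singular values.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "body patterns": every image is a mixture of three fixed basis
# patterns (coarse, medium, and fine gratings) plus pixel noise.
n, size = 200, 32
x = np.linspace(0, 2 * np.pi, size)
xx, yy = np.meshgrid(x, x)
basis = np.stack([np.sin(xx), np.sin(3 * xx + yy), np.sin(8 * xx)])

weights = rng.standard_normal((n, 3))
images = weights @ basis.reshape(3, -1) + 0.1 * rng.standard_normal((n, size * size))

# PCA via SVD of the mean-centred data matrix.
centred = images - images.mean(axis=0)
_, s, _ = np.linalg.svd(centred, full_matrices=False)
explained = s**2 / np.sum(s**2)
print(explained[:4].round(3))
```

Because the toy repertoire has three true generators, the first three components account for nearly all of the variance, which is the kind of signature reported for the flounder's displayed patterns.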
850 Osorio and Cuthill
Ramachandran and co-workers (1996) pointed out that the eyed flounder lives in shallow tropi-
cal water, which is relatively clear. They suggested that this could explain why it has a more elaborate
coloration system than the southern and winter flounders, which have only one degree of freedom
in their pattern: changing contrast. It is tempting to suggest—though without direct evidence—that
flatfish use one, two, or three basic patterns according to the visual environment in which they live.
Fish that live in clearer water of more varied habitats would benefit from a greater range of patterns.
Shohet and co-workers (2007) make a similar proposal for different cuttlefish species.
Cuttlefish
Although flatfish often have good camouflage, their adaptive coloration is much simpler than that
of cephalopod molluscs, especially octopuses and cuttlefish (Figure 41.1B). These animals change
their skin coloration under visual control in a fraction of a second, and can even produce moving
patterns of dark bands. Observation of cuttlefish coloration patterns, produced in response to
varying backgrounds, allows unique insights into the vision of these extraordinary molluscs—and
of their adversaries, especially teleost fish (Langridge, Boon, and Osorio 2007).
European cuttlefish (Sepia officinalis) body patterns are produced by the controlled expression of
about forty visual features known as behavioural components, and they can also control the physical
texture of their skin (Hanlon and Messenger 1988). The level of expression of each component can
be varied in a continuous manner (Kelman, Osorio, and Baddeley 2008). Our unpublished principal
components analysis of the coloration patterns displayed on a large range of natural backgrounds
indicates that there are at least six degrees of freedom in the range of cryptic patterns produced by
cuttlefish (see also Crook, Baddeley, and Osorio 2002). This is suggestive of great flexibility and inde-
pendent control of the separate pattern components, which must be matched by a corresponding
visual ability. At present, however, the way in which the expression of these patterns is coordinated,
and the full range of camouflage patterns produced in natural conditions, remains poorly studied.
Hanlon and Messenger (1988) suggested that five main body patterns are used for camouflage.
These were called: Uniform Light, Stipple, Light Mottle, Dark Mottle, and Disruptive. The reader
should note that the terms for body patterns are capitalized to distinguish them from camou-
flage mechanisms. In particular, it is not certain that the Disruptive pattern works as disrup-
tive rather than cryptic camouflage (Ruxton et al. 2004b; Zylinski and Osorio 2011). As we have
mentioned, Hanlon (2007) has identified three basic types of pattern in cephalopods and other
animals: Uniform, Mottle, and Disruptive. In experimental aquaria, most cuttlefish patterns can
indeed be classified by a combination of mottle and disruptive elements, which is comparable
to the two degrees of freedom seen in the plaice (Figure 41.1). The ‘disruptive’ pattern compo-
nents, defined by expert human observers, include about ten comparatively large well-defined
light and dark features, including a white square on the centre of the animal and a dark head
bar (Figure 41.1B; Hanlon and Messenger 1988; Chiao, Kelman, and Hanlon 2005). The mottle
pattern comprises less crisply defined features, and is comparable to the blotches used by flatfish
(Hanlon and Messenger 1988).
(Marshall and Messenger 1996; Zylinski and Osorio 2011), or local features such as edges, objects,
and depth cues (e.g. Chiao, Chubb, and Hanlon 2007; Zylinski et al. 2009a, 2009b). This work is
reviewed elsewhere (Kelman et al. 2008; Hanlon et al. 2011; Zylinski and Osorio 2011), but the
main conclusions are as follows. Regarding low-level image parameters, cuttlefish are sensitive
to mean reflectance, contrast, spatial frequency, and spatial phase (Kelman et al. 2008). They are
sensitive to orientation, but this affects the body and arm orientation rather than the pattern
displayed (Shohet et al. 2006; Barbosa et al. 2011). Cuttlefish are sensitive both to the presence
of local edges (Zylinski et al. 2009a, 2009b), and whether the spatial organization of local edge
fragments is consistent with the presence of objects (Zylinski et al. 2012). Cuttlefish are sensitive
to visual depth and pictorial cues consistent with visual depth (Kelman et al. 2008). Often the
contrast of the coloration patterns is varied to match approximately the contrast in the back-
ground (Kelman et al. 2008; Zylinski et al. 2009a). Despite their mastery of camouflage, cuttlefish
are colour-blind, having only one visual pigment (Marshall and Messenger 1996; Mäthger et al.
2006), but this deficiency seems to have little detriment for camouflage (Chiao et al. 2011), pre-
sumably because reflectance spectra of their natural backgrounds have a simple and predictable
form (yellows-through-browns), where reflectance increases monotonically with wavelength and,
as such, the colour is well predicted by luminance.
Many of the cuttlefish’s responses can be interpreted on the basis that the animals express the
Disruptive pattern on a background composed of discrete objects, whose size approximates that
of the ‘white square’ pattern component, and the Mottle on a textured surface (Figure 41.1B). It
is striking how many image parameters, local features, and higher-level information are used to
make this seemingly simple decision. This is reminiscent of the multiple mechanisms that
humans use for figure-ground segregation (Kelman et al. 2008; Zylinski and
Osorio 2011; Zylinski et al. 2012; see also Peterson this volume).
Symmetry
Almost all mobile animals have a clear plane of symmetry, usually bilateral—flatfish are an obvi-
ous exception—and symmetry of both the outline and the surface patterning is a known Gestalt cue
for perceptual organization (van der Helm this volume). The absence of simple planes of sym-
metry in most natural backgrounds is therefore a potential problem for cryptic animals. Indeed,
Cuthill and co-workers (Cuthill, Hiby, and Lloyd 2006; Cuthill et al. 2006) showed that birds
found symmetrically coloured camouflaged prey more rapidly than asymmetric patterned prey,
although not all symmetrical patterns are necessarily equally easy to detect (Merilaita and Lind
2006). This makes it rather perplexing that more animals have not evolved asymmetric patterning,
although, in insects at least, genetic or developmental constraints may make it hard to decouple
surface pattern from the underlying body plan. Selection experiments for
changed wing shape in butterflies produce tightly correlated changes in colour pattern (Monteiro,
Brakefield, and French 1997). Thus the genetic control of morphological symmetry, which is
probably constrained by locomotor requirements, seems tightly linked to surface patterning (see
discussion in Cuthill, Stevens, et al. 2006). Regularity could be expected to be another feature that
predators use to break camouflage, and blue tits find prey with spatially regular patterns more
rapidly (Dimitrova and Merilaita 2012).
camouflage as sampling the background was a major conceptual advance, but the question
arises: what sort of background sample is optimal? Endler (1978, 1984, 1991) proposed that
crypsis should be defined as coloration that represents a random sample of the background at the
place and time where predation risk is highest. Others have argued that a random sample is not
necessarily optimal (Merilaita, Tuomi, and Jormalainen 1999; Merilaita, Lyytinen, and Mappes
2001; Ruxton et al. 2004b), supported by experiments showing that not all random samples are
equally concealed (Merilaita et al. 1999). If the background is heterogeneous and a single sample
must be chosen (i.e. no colour change by an individual), what is the best sample? Natural selection
will favour the pattern with the minimum average detectability across all backgrounds against
which it may be viewed; such a pattern is the most likely sample of the background (in the sense of
statistical likelihood), not any random sample (Cuthill and Troscianko 2009). Defining such a maximum likelihood
sample is straightforward for a single perceptual dimension, but not for multiple dimensions
and not when low-level attributes such as colours, lines, and textures have been integrated into
features. However, if we accept that such a ‘most likely’ pattern can be defined, three evolutionary
outcomes can be imagined: selection for a single, ‘typical’, specialist colour pattern; negative fre-
quency dependent selection (i.e. the predation intensity on any one pattern—phenotype—varies
with the relative abundance of that phenotype, such that rare phenotypes have an advantage and
common phenotypes are at a disadvantage) for multiple patterns matching different, common,
backgrounds; or selection for a single, ‘compromise’, pattern that combines possible backgrounds
as a weighted average. The best strategy will depend on how relative discriminability varies across
the multiple backgrounds (Merilaita et al. 1999; Houston, Stevens, and Cuthill 2007). Loosely
speaking, similar backgrounds favour a compromise ‘average’ coloration, while the possibility of
being seen against rather different substrates favours a single specialist pattern, or divergent selec-
tion for multiple specialist patterns. In an ingenious experiment where captive blue jays searched
for computer-generated prey, whose coloration was controlled by a genetic algorithm and so
could evolve in response to the birds’ predation success, Bond and Kamil (2006) showed that a
fine-grained homogeneous background selected for a single prey colour whereas coarse-grained
heterogeneous backgrounds selected for polymorphism (multiple types). However, without a
metric for perceived contrast between different textures, which backgrounds count as ‘similar’
or ‘different’ has to be evaluated empirically on a case-by-case basis. This
is an important area for future research and relates directly to the need for a mechanism-rooted
theory of texture perception.
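The specialist-versus-compromise logic can be made concrete with a toy calculation (ours; the detectability function, which grows with pattern-background mismatch and then saturates at certain detection, is an assumption rather than a published model): for similar backgrounds the weighted-average pattern carries the lower average risk, while for very different backgrounds the specialist wins.

```python
import numpy as np

def detectability(pattern, background):
    """Assumed detection probability: grows with mismatch, saturates at 1."""
    return min((pattern - background) ** 2, 1.0)

def mean_risk(pattern, backgrounds):
    return np.mean([detectability(pattern, b) for b in backgrounds])

for gap in (1.0, 10.0):           # backgrounds at 0 and at `gap` on a 1-D feature axis
    backgrounds = [0.0, gap]
    compromise = mean_risk(gap / 2, backgrounds)   # average of the two backgrounds
    specialist = mean_risk(0.0, backgrounds)       # match one background exactly
    best = "compromise" if compromise < specialist else "specialist"
    print(f"gap={gap}: compromise risk={compromise:.2f}, "
          f"specialist risk={specialist:.2f} -> {best}")
```

The qualitative outcome depends on the shape of the assumed detectability curve, which is exactly why the text argues for a mechanism-rooted theory of texture discrimination.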
The similarity to the background is not the only factor affecting detectability of a target. The
complexity of the background also affects visual search; that is, locating the target depends not
only on target-distractor similarity but also on the amount of variation between background features
that are similar to the target (Duncan and Humphreys 1989). As a result, a camouflaged animal
may be better concealed in more complex habitats independent of its match to the background
(Merilaita et al. 2001; Merilaita 2003; Dimitrova and Merilaita 2010). In line with this, there is
recent evidence for animals choosing backgrounds that are not merely a good match to their own
patterns, but that are more visually complex (Kjernsmo and Merilaita 2012).
Obscuring Edges
The previous section has dealt with how visual textures in camouflage patterns match the back-
ground but, even when there is a close match, visual discontinuities at edges can reveal the outline
of an object or salient features within the object. The latter can include phase differences at the
conjunction of body parts (e.g. limbs against body) or features, such as eyes or their components,
with a contour unlike those in the background. One strategy to obscure edges, which is used by
flatfish and cuttlefish, is to have partially transparent marginal fins that also continue the body
pattern, and hence merge the body into the background (Figure 41.1); partial burying has a simi-
lar effect.
Much better known are disruptive patterns, where colour is used to disguise or distract atten-
tion from the true outline of the animal or salient body parts, and hence to defeat figure-ground
segregation. Thayer (1909) was the first to outline what Cott (1940, p. 47) said were ‘certainly
the most important set of principles relating to concealment’. Both Thayer and Cott were art-
ists, having an intuitive understanding of the use of shading to create false perceptions of shape,
form, and movement, and both were active in campaigning for the adoption of camouflage by
the military in, respectively, the First and Second World Wars (Behrens 2002, 2011). Cott greatly
refined Thayer’s original ideas, and he produced a battery of illustrations from across the animal
kingdom to explain how disruption could work and, plausibly, to illustrate its action in nature
(Figure 41.2A). However, as recent researchers have realized, the term ‘disruptive coloration’ actu-
ally comprises several mechanisms, and some of those discussed by Thayer and Cott as disruptive
are better classified differently (Stevens and Merilaita 2009). We discuss these in turn.
For Thayer (1909), the central thesis was a paradox: that apparently conspicuous colours could
be concealing. This included patterns we now regard as classic disruptive coloration (he used the
term ‘ruptive’), namely the use of adjacent high-contrast colours to break up shape and form, but
he also extended the principle to patterns that do not conceal but instead deceive in other ways.
For example, the idea that high-contrast patterns could interfere with motion perception and
otherwise confuse attackers is discussed later in the section on Concealing Motion.
‘True’ disruptive coloration, for concealment per se, works against object detection by percep-
tual grouping, but, as Merilaita (1998) clarified, it employs mechanisms above and beyond back-
ground matching. Indeed, in Cott’s (1940) original formulation, it is essential that some colour
patches do not resemble colour patches found in the background; in our own treatment of dis-
ruptive coloration we relax this constraint. For Cott, two components were vital and, although he
did not make the connection, they relate directly to principles of perception. First, some colour
patches must match the background; second, some colour patches must be strongly contrasting
from the first patch type(s) and, in Cott’s and Thayer’s views, also from the background. Cott
called this ‘differential blending’, and we can see this as working against perceptual grouping of
the target by colour similarity. The background matching of some patches creates a weak bound-
ary between the animal and its surround at these junctions. The high and sharp contrast between
other patches on the animal and these background-matching regions creates strong false edges
internal to the animal’s boundary. The effect is that, for the viewer, some colour patches on the ani-
mal are statistically more likely to belong to the background than they are to each other (Cuthill
and Troscianko 2009).
In order to disrupt the outline of the animal, the prediction is that the contrasting colour patches
should intersect the edge of the animal more often than expected if the animal’s pattern was sim-
ply a random sample of the background texture. That is, if the animal’s true outline is interrupted
by high contrast, ‘strong’ pseudo-edges that are perpendicular to the animal’s boundary, then the
viewer gets powerful conflicting evidence for edges that are not consistent with the continuous
outline of a prey item. Merilaita (1998) showed this to be true of the dark and light colour patches
on a marine isopod crustacean. More recently, the efficacy of disruptive patterning against birds
has been demonstrated by using simulated wing patterns on artificial moth-like baited targets
pinned to trees (Cuthill et al. 2005). This study showed that colour blocks that intersected the edge
of the ‘wing’ reduced the rate of attacks on the models compared to otherwise similar controls
with only internal patterning or that were uniformly coloured. A computer-based experiment
using the same sort of targets on pictures of tree bark replicated the results with humans (Fraser
et al. 2007), suggesting that the perceptual mechanisms being fooled are common across birds
and humans. The most plausible candidate is the continuity of strong edges, which suggests a bounding contour.
Consistent with this, it is striking that edges in camouflage patterns are often ‘enhanced’ with a
light margin to pale regions and a dark margin to dark regions (Figure 41.2B), a fact remarked
upon by Cott (1940). One possible interpretation (Osorio and Srinivasan 1991) is that such fea-
tures strongly excite edge detectors without unduly compromising cryptic camouflage. With this
in mind, Stevens and Cuthill (2006) analysed in situ photographs of the experimental targets used
in the bird predation experiments of Cuthill et al. (2005), appropriately calibrated for avian colour
vision. Using a straight-line detector from machine vision, the Hough transform, allied to a physi-
ologically plausible edge detector, the Marr-Hildreth Laplacian-of-Gaussian, Stevens and Cuthill
(2006) showed that edge-intersecting disruptive coloration defeated target detection, compared to
non-disruptive controls, in a pattern similar to the observed bird predation (Figure 41.3).
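The filtering step of such an analysis can be sketched in a few lines (a simplified stand-in for the Stevens and Cuthill pipeline: plain NumPy, arbitrary kernel size and sigma, and no Hough line-detection stage). A Laplacian-of-Gaussian filter gives no response on uniform regions, so for a plain target on a plain field all of the filter energy concentrates on the target's outline, which is exactly the signal that edge-intersecting colour patches are thought to corrupt.

```python
import numpy as np

def log_kernel(sigma, size):
    """Laplacian-of-Gaussian (Marr-Hildreth) kernel."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx**2 + yy**2
    k = (r2 - 2 * sigma**2) / sigma**4 * np.exp(-r2 / (2 * sigma**2))
    return k - k.mean()            # zero-sum: no response on flat regions

def convolve(img, kernel):
    """Circular convolution via FFT (adequate for this demo)."""
    kp = np.zeros_like(img)
    s = kernel.shape[0]
    kp[:s, :s] = kernel
    kp = np.roll(kp, (-(s // 2), -(s // 2)), axis=(0, 1))  # centre the kernel
    return np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(kp)))

# A uniform dark 'moth' on a uniform field: the only edge energy available
# to a predator's edge detectors lies on the target's outline.
img = np.zeros((64, 64))
img[24:40, 24:40] = 1.0
response = np.abs(convolve(img, log_kernel(sigma=1.5, size=11)))

outline = np.zeros_like(img, dtype=bool)
outline[23:41, 23:41] = True
outline[26:38, 26:38] = False      # a band straddling the target's boundary
print(response[outline].mean(), response[~outline].mean())
```

In the published analysis this edge map fed a Hough transform that looked for long straight contours; disruptive, edge-intersecting patches weaken precisely the outline band probed here.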
A camouflaged animal’s outline is not the only potentially revealing feature; mismatches in
the phases of patterns on adjacent body parts, or the distinctive colour and shape of an eye are
also salient features for a predator. Cott (1940) illustrated species, from birds to fish, that have eye
stripes that match the colour of the pupil or iris, effectively forming a background with which the
eye blends. He also noted species with stripes bisecting the eye, using disruption to break up the
circular shape. Similarly, he illustrated frogs whose complex body patterns matched seamlessly on
different parts of the folded leg when sitting hunched up (Figure 41.2A). He called this coincident
disruptive coloration, the adjacency of strong contrasts creating false bounding contours span-
ning different body parts. Recently the effectiveness of coincident disruptive coloration in con-
cealing separate body regions has been experimentally verified in the field, using artificial targets
under bird predation (Cuthill and Székely 2009).
The resurgence in interest in Cott’s theories has focused mainly on concealment of the body’s
edge through peripherally placed disruptive colour patches. As we have discussed, the effects can
be explained as exploiting low-level visual processes, namely edge detection and contour inte-
gration. However, Cott’s and subsequent accounts make frequent reference to disruptive colora-
tion distracting attention from the body’s edge, through internally placed coherent ‘false shapes’
that contrast strongly with the surrounding body coloration. Cott called this ‘surface disruption’
and Stevens and others (2009) showed that this can be as effective as, or more effective than, edge disruption
against avian predators. It is not clear whether the mechanism is actually diversion of attention, or
a lower-level process such as simultaneous contrast masking nearby (true) edges. Indeed, Cott’s
suggestion that small, highly conspicuous ‘distraction marks’ could decrease predation by dis-
tracting attention has rather equivocal support. One might imagine that if the marks are both
conspicuous and uniquely borne by prey, predators would learn to use these cues to detect prey.
This is what has been found in field experiments on birds searching for artificial prey (Stevens,
Graham et al. 2008). However, in laboratory experiments on birds where trials were intermixed
and there was a correspondingly reduced potential to learn that a mark was a perfect predictor of
prey presence, distraction marks reduced detection (Dimitrova et al. 2009).
There are a number of open questions about disruptive camouflage. Disruptive coloration
is sometimes discussed as if it were a strict alternative to background matching. It is certainly
true that seemingly disruptive camouflage patterns have a high visual contrast, and Cott (1940)
argued for a principle of ‘maximum disruptive contrast’, in which, subject to some patches match-
ing the background (‘differential blending’), the remaining colour patches should be maximally
contrasting from these, and unlike background colours. However, in principle there is no reason
why features that distract from the natural outline of an animal should not present the same level
of contrast as background objects, as is probably the case for the cuttlefish Disruptive pattern
(Mäthger et al. 2006; Kelman et al. 2008; Zylinski et al. 2009a); indeed all military camouflage
patterns described as ‘disruptive’ consist of colours found in natural backgrounds. Stevens and
co-workers (2006), again using artificial moth-like prey in the field, found that bird predation was
lowest for disruptive patterns where the contrast between adjacent patches was high, but all col-
ours were within the background range. Disruptive patterns where some elements had yet higher
contrast, but were rare in the background, had increased predation, although they still fared better
than similarly coloured targets without outline-disrupting elements. In much the same way, when
humans search for similar targets on computer screens, if some prey patch colours are not found
in the background, detectability increases regardless of high internal contrast (Fraser et al. 2007).
The conclusion is that high contrast between adjacent patches is beneficial for the creation of false
bounding contours, but that, contrary to Cott’s suggestion, contrast is constrained by the need to
match common background colours.
Obscuring 3D Form
Both cryptic and disruptive camouflage are often studied from the point of view of 2D image
segregation. However, it is perfectly plausible that animals may benefit from cryptic patterns that
match the light and shade of naturally illuminated scenes, especially when the animal is larger
than the objects that make up the background. The intensity difference between objects in shadow
compared to directly illuminated surfaces can be very much larger than that between reflective
surfaces under uniform illumination, but to our knowledge no one has attempted to establish how
the dynamic range of camouflage patterns matches the intensity range of surfaces such as leaves
or stones or their shadows.
Although there are few if any direct studies, it seems plausible that some camouflage patterns
produce a disruptive effect whereby a continuous body surface is seen as lying in different depth
planes. For example matte black spots or patches can appear as holes in a surface, and white fea-
tures as glossy highlights. Figure 41.2C illustrates Cott’s (1940) interpretation of the enhanced
borders as a 3D effect. A charming example of a false 3D effect is produced by cuttlefish, which
shadow the white square on their mantle to create the effect of a pebble (Langridge 2006).
Countershading
Countershading, like disruptive coloration, is a principle of camouflage that was ‘discovered’ in
the late nineteenth century (Poulton 1890; Thayer 1896), found military application in the early
twentieth century, and has recently been a subject of direct experimental study. Many animals
have a dark upper surface and a pale lower surface separated by an intensity gradient. This type of
pattern counters the effect of natural illumination gradients on the 3D body, which may benefit
camouflage. Thus when cuttlefish rotate from the usual orientation, they move their dark and light
regions so they remain on the top and bottom body surfaces, respectively (Ferguson, Messenger,
and Budelmann 1994). Historically, the taxonomic ubiquity of such dorso-ventral gradients in
coloration was seen as evidence of the adaptive benefits of concealment of 3D form. However,
there are many adaptive reasons to have such a gradient, some of which see the colour only as an
incidental by-product of the pigment gradient: for example, protection from UV light, or resist-
ance to abrasion—because melanin toughens biological tissues (Kiltie 1988; Ruxton, Speed, and
Kelly 2004a; Rowland 2009). In fact, recent experimental studies on model ‘caterpillars’ coloured
Concealing Motion
The term ‘motion camouflage’ can be discussed in two contexts: crypsis, when the background
itself moves, and concealment while the animal itself is in motion. To take the first, many back-
grounds have moving elements—leaves in the wind, seaweed in the tide—and an otherwise
background-matching, but static, animal may be revealed by its failure to match the motion sta-
tistics of the background. The swaying, stop-start motion of a chameleon or praying mantis seems
to mimic the rocking of leaves and twigs in the breeze, and the lack of consistent linear motion
towards the prey may itself reduce salience. Analysis of the movements of an Australian lizard, the
jacky dragon Amphibolurus muricatus, shows that when it signals to other members of its species,
its motion statistics move well outside the background distribution, but when not signalling, its
own distribution falls within that of the background (Peters and Evans 2003; Peters, Hemmi, and
Zeil 2007). Cuttlefish reduce the contrast in their body patterns during motion (Zylinski, Osorio,
and Shohet 2009c), perhaps because the high contrast edges seen in disruptive patterning are
more easily detected in motion.
The second issue is whether a moving animal can remain concealed. Many facts point to the
conclusion that motion breaks camouflage. Correlated motion is a strong cue to grouping, so
that an otherwise highly camouflaged object is readily segregated from the background because
its pattern elements share a common fate absent in otherwise identical background elements.
Experiments on the detection of targets on complex backgrounds indicate that, for single targets,
neither background matching nor disruptive camouflage offers any benefit (Hall et al. 2013). This
would explain why big cats stalking prey, and soldiers moving across open ground, move in a
combination of stealthy motion interspersed with frequent pauses.
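The common-fate argument can be illustrated with a toy frame-differencing sketch (ours, not from the cited experiments): a patch drawn from the same texture distribution as its background is statistically invisible in a single frame, yet a one-pixel displacement between frames concentrates all of the inter-frame change at the patch's location.

```python
import numpy as np

rng = np.random.default_rng(42)

background = rng.random((64, 64))
prey = rng.random((10, 10))        # prey texture drawn from the same distribution

def render(col):
    """One frame: the prey overlaid at rows 20:30, columns col:col+10."""
    frame = background.copy()
    frame[20:30, col:col + 10] = prey
    return frame

frame_a, frame_b = render(20), render(21)   # prey steps one pixel to the right

# In a single frame the prey's statistics match the surround: well hidden.
hidden = abs(frame_a[20:30, 20:30].mean() - background.mean()) < 0.1

# Frame difference ('common fate'): every changed pixel sits at the prey.
diff = np.abs(frame_b - frame_a)
moving = np.zeros_like(diff, dtype=bool)
moving[20:30, 20:31] = True        # union of the old and new positions
```

Outside the prey's old and new positions the difference image is exactly zero, so even a perfectly background-matched texture is segregated the moment it moves.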
If the need for motion precludes concealment, other means of defence must be used (e.g. capac-
ity for flight, defensive spines, or toxins), some of which involve the use of colour. Warning colours
associated with unpalatability, or mimicry of such patterns, fall outside the remit of this chapter
(instead see, e.g., Ruxton et al. 2004b), but coloration designed to confuse or deceive has histori-
cally, although erroneously, been bracketed within disruptive coloration and so we discuss it briefly
here. For example, the idea that high-contrast patterns could interfere with judgment of velocity
and otherwise confuse attackers goes back to Thayer (1909); the tactic became known as ‘dazzle’
coloration when deployed on ships during both World Wars (see Williams 2001; Behrens 2002).
Part of the alleged success was attributed to interference with the optical range-finding used on
U-boats, but the difficulty of judging speed and trajectory has also been cited
(Williams 2001; Behrens 2002). The mechanism(s) by which such patterns have their effects is
less clear, because perception of speed is affected by many factors, notably size, contrast, and
texture orientation (see Scott-Samuel et al. 2011). Dazzle patterning may work through any or
all of such factors. Recent research shows that high-contrast stripes can significantly distort
perceived speed (Scott-Samuel et al. 2011) and can affect capture success (Stevens, Yule, and
Ruxton 2008). This can be added to the (long) list of proposed evolution-
ary explanations for zebra stripes (see, e.g., Cloudsley-Thompson 1999; Caro 2011). Thayer (1909)
argued that the stripes matched the vertical patterning created by savannah grasses, and so func-
tion through background matching, but Godfrey, Lythgoe, and Rumball (1987), through Fourier
analysis, showed that zebra stripes, unlike tiger stripes, were a poor match to the background.
Alternatively, given that zebras live in herds, the stripes could serve both a background-matching
and disruptive function, if the background is considered to be other zebras. Ironically, given their
frequent occurrence in discussions on camouflage, the only function for zebra stripes that has
been experimentally tested is their effectiveness in repelling biting flies (Waage 1981; Egri et al.
2012; Caro et al. 2014).
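The logic of the Fourier comparison used by Godfrey, Lythgoe, and Rumball can be illustrated with a toy sketch: compute the amplitude spectrum of a one-dimensional luminance profile and ask where its energy peaks. A pattern is a plausible background match only if its dominant spatial frequencies resemble those of the background. The naive DFT and the synthetic ‘stripe’ and ‘grass’ profiles below are assumptions for illustration, not the published analysis.

```python
import cmath

def amplitude_spectrum(signal):
    """Naive 1-D discrete Fourier transform; amplitudes for k = 0..N//2."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [s - mean for s in signal]  # remove the DC component
    amps = []
    for k in range(n // 2 + 1):
        coeff = sum(centred[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))
        amps.append(abs(coeff))
    return amps

def peak_frequency(signal):
    """Index of the dominant non-DC frequency bin (cycles per window)."""
    amps = amplitude_spectrum(signal)
    return max(range(1, len(amps)), key=lambda k: amps[k])

n = 64
stripes = [1.0 if (t // 4) % 2 == 0 else 0.0 for t in range(n)]  # period 8: coarse bars
grasses = [1.0 if t % 2 == 0 else 0.0 for t in range(n)]         # period 2: fine texture

print(peak_frequency(stripes))  # 8 cycles per window
print(peak_frequency(grasses))  # 32 cycles per window
```

The two synthetic textures peak at very different frequencies, which is the sense in which coarse stripes can be a poor spectral match to a fine-grained grassy background.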
Conclusions
The scientific study of animal camouflage and the development of Gestalt psychology drew
heavily from each other in the first half of the twentieth century. Nature provides compelling
examples of the sort of problems a visual system has to solve in separating figure from ground
and in identifying relevant objects for attention. To explain the form of animal camouflage,
it remains essential to understand not only the photoreceptors of the animal from which the
target seeks concealment (photoreceptors which may be very different in number and tuning
from our own), but also the cognitive processes behind perception. It is clear that features
such as disruptive coloration and edge enhancement, coincidence of colour patches across
adjacent body parts, and gradients in shading that counter illumination gradients, to name
but a few, are adaptations against the Gestalt principles used in object segregation. In turn, we
believe that animal camouflage offers an excellent model system in which to test the general-
ity of these principles beyond Homo sapiens.
References
Allen, W. L., R. Baddeley, I. C. Cuthill, and N. E. Scott-Samuel (2012). ‘A Quantitative Test of the
Predicted Relationship between Countershading and Lighting Environment’. Am. Nat. 180: 762–776.
Allen, W. L., I. C. Cuthill, N. E. Scott-Samuel, and R. Baddeley (2011). ‘Why the Leopard Got Its
Spots: Relating Pattern Development to Ecology in Felids’. Proc. R. Soc. B 278: 1373–1380.
Barbosa A., L. M. Mäthger, K. C. Buresch, J. Kelly, C. Chubb, et al. (2008). ‘Cuttlefish Camouflage: The
Effects of Substrate Contrast and Size in Evoking Uniform, Mottle or Disruptive Body Patterns’. Vision
Res. 48: 1242–1253.
Barbosa, A., J. J. Allen, L. M. Mäthger, and R. T. Hanlon (2011). ‘Cuttlefish Use Visual Cues to Determine
Arm Postures for Camouflage’. Proc. R. Soc. B 279: 84–90.
Behrens, R. R. (2002). False Colors: Art, Design and Modern Camouflage. Dysart, IA: Bobolink Books.
Behrens, R. R. (2011). ‘Nature’s Artistry: Abbott H. Thayer’s Assertions about Camouflage in Art, War and
Nature’. In Animal Camouflage: Mechanisms and Function, edited by M. Stevens and S. Merilaita, pp.
87–100. Cambridge: Cambridge University Press.
858 Osorio and Cuthill
Beldade, P. and P. M. Brakefield (2002). ‘The Genetics and Evo-Devo of Butterfly Wing Patterns’. Nature
Reviews Genetics 3: 442–452.
Bennett, A. T. D., I. C. Cuthill, and K. Norris (1994). ‘Sexual Selection and the Mismeasure of Color’.
Am. Nat. 144: 848–860.
Bond, A. B. and A. C. Kamil (2006). ‘Spatial Heterogeneity, Predator Cognition, and the Evolution of Color
Polymorphism in Virtual Prey’. Proc. Nat Acad. Sci. USA 103: 3214–3219.
Caro, T. (2011). ‘The Functions of Black-and-White Colouration in Mammals’. In Animal
Camouflage: Mechanisms and Function, edited by M. Stevens and S. Merilaita, pp. 298–329.
Cambridge: Cambridge University Press.
Caro, T., A. Izzo, R. C. Reiner, H. Walker, and T. Stankowich. (2014). ‘The Function of Zebra Stripes’.
Nat. Commun. 5: 3535.
Chiao, C.-C. and R. T. Hanlon (2001). ‘Cuttlefish Camouflage: Visual Perception of Size, Contrast
and Number of White Squares on Artificial Substrata Initiates Disruptive Coloration’. J. Exp. Biol.
204: 2119–2125.
Chiao, C.-C., E. J. Kelman, and R. T. Hanlon (2005). ‘Disruptive Body Patterning of Cuttlefish (Sepia
officinalis) Requires Visual Information Regarding Edges and Contrast of Objects in Natural Substrate
Backgrounds’. Biological Bulletin 208: 7–11.
Chiao C.-C., C. Chubb, and R. T. Hanlon (2007). ‘Interactive Effects of Size, Contrast, Intensity and
Configuration of Background Objects in Evoking Disruptive Camouflage in Cuttlefish’. Vision Res.
47: 2223–2235.
Chiao, C.-C, J. K. Wickiser, J. J. Allen, B. Genter, and R. T. Hanlon (2011). ‘Hyperspectral Imaging of
Cuttlefish Camouflage Indicates Good Color Match in the Eyes of Fish Predators’. Proc. Nat. Acad. Sci.
USA 108: 9148–9153.
Cloudsley-Thompson, J. L. (1999). ‘Multiple Factors in the Evolution of Animal Coloration’. Naturwiss.
86: 123–132.
Cott, H. B. (1940). Adaptive Coloration in Animals. London: Methuen.
Crook, A. C., R. J. Baddeley, and D. Osorio (2002). ‘Identifying the Structure in Cuttlefish Visual Signals’.
Phil. Trans. R. Soc. Lond. B 357: 1617–1624.
Cuthill, I. C. and A. T. D. Bennett (1993). ‘Mimicry and the Eye of the Beholder’. Proc. R. Soc. B
253: 203–204.
Cuthill, I. C., M. Stevens, J. Sheppard, T. Maddocks, C. A. Parraga, et al. (2005). ‘Disruptive Coloration
and Background Pattern Matching’. Nature 434: 72–74.
Cuthill, I. C. (2006). ‘Color Perception’. In Bird Coloration. Vol. 1: Mechanisms and Measurement, edited by
G. E. Hill and K. J. McGraw, pp. 3–40. Cambridge, MA: Harvard University Press.
Cuthill, I. C., E. Hiby, and E. Lloyd (2006a). ‘The Predation Costs of Symmetrical Cryptic Coloration’. Proc.
R. Soc. B 273: 1267–1271.
Cuthill, I. C., M. Stevens, A. M. M. Windsor, and H. J. Walker (2006b). ‘The Effects of Pattern Symmetry
on Detection of Disruptive and Background Matching Coloration’. Behav. Ecol. 17: 828–832.
Cuthill I. C. and A. Székely (2009). ‘Coincident Disruptive Coloration’. Phil. Trans. R. Soc. B 364: 489–496.
Cuthill, I. C. and T. S. Troscianko (2009). ‘Animal Camouflage: Biology Meets Psychology, Computer
Science and Art’. Int. J. Des. Nat. Ecodyn. 4(3): 183–202.
Denton, E. J. (1970). ‘On the Organization of Reflecting Surfaces in Some Marine Animals’. Phil. Trans.
R. Soc. B 258: 285–313.
Dimitrova, M., N. Stobbe, H. M. Schaefer, and S. Merilaita (2009). ‘Concealed by
Conspicuousness: Distractive Prey Markings and Backgrounds’. Proc. R. Soc. B 276: 1905–1910.
Dimitrova, M. and S. Merilaita (2010). ‘Prey Concealment: Visual Background Complexity and Prey
Contrast Distribution’. Behav. Ecol. 21: 176–181.
Camouflage and Perceptual Organization in the Animal Kingdom 859
Dimitrova, M. and S. Merilaita (2012). ‘Prey Pattern Regularity and Background Complexity Affect
Detectability of Background-Matching Prey’. Behav. Ecol. 23: 384–390.
Duncan, J. and G. W. Humphreys (1989). ‘Visual Search and Stimulus Similarity’. Psych. Rev. 96: 433–458.
Egri, A., M. Blahó, G. Kriska, R. Farkas, M. Gyurkovszky, S. Åkesson, and G. Horváth (2012).
‘Polarotactic Tabanids Find Striped Patterns with Brightness and/or Polarization Modulation Least
Attractive: An Advantage of Zebra Stripes’. J. Exp. Biol. 215: 736–745.
Endler, J. A. (1978). ‘A Predator’s View of Animal Color Patterns’. Evol. Biol. 11: 319–364.
Endler, J. A. (1981). ‘An Overview of the Relationships between Mimicry and Crypsis’. Biol. J. Linn. Soc.
16: 25–31.
Endler, J. A. (1984). ‘Progressive Background Matching in Moths, and a Quantitative Measure of Crypsis’.
Biol. J. Linn. Soc. 22: 187–231.
Endler, J. A. (1991). ‘Interactions between Predators and Prey’. In Behavioural Ecology: An Evolutionary
Approach. 3rd edn, edited by J. R. Krebs and N. B. Davis, pp. 169–196. Oxford: Blackwell.
Ferguson, G., J. Messenger, and B. Budelmann (1994). ‘Gravity and Light Influence the Countershading
Reflexes of the Cuttlefish Sepia officinalis’. J. Exp. Biol. 191: 247–256.
Fraser, S., A. Callahan, D. Klassen, and T. N. Sherratt (2007). ‘Empirical Tests of the Role of Disruptive
Coloration in Reducing Detectability’. Proc. Roy. Soc. B 274: 1325–1331.
Frisby, J. (2004). ‘Bela Julesz 1928—2003: A Personal Tribute’. Perception 33: 633–637.
Godfrey, D., J. N. Lythgoe, and D. A. Rumball (1987). ‘Zebra Stripes and Tiger Stripes: The Spatial
Frequency Distribution of the Pattern Compared to that of the Background is Significant in Display and
Crypsis’. Biol. J. Linn. Soc. 32: 427–433.
Hall, J. R., I. C. Cuthill, R. Baddeley, A. J. Shohet, and N. E. Scott-Samuel (2013). ‘Camouflage, Detection
and Identification of Moving Targets’. Proc. R. Soc. B 280(1758): 20130064.
Hanlon, R. T. and J. B. Messenger (1988). ‘Adaptive Coloration in Young Cuttlefish (Sepia officinalis
L.): The Morphology and Development of Body Patterns and their Relation to Behaviour’. Phil. Trans.
R. Soc. B 320: 437–487.
Hanlon, R. T., J. W. Forsythe, and D. E. Joneschild (1999). ‘Crypsis, Conspicuousness, Mimicry and
Polyphenism as Antipredator Defences of Foraging Octopuses on Indo-Pacific Coral Reefs, with a
Method of Quantifying Crypsis from Video Tapes’. Biol. J. Linn. Soc. 66: 1–22.
Hanlon, R. T. (2007). ‘Cephalopod Dynamic Camouflage’. Curr. Biol. 17: 400–404.
Hanlon, R. T., C.-C. Chiao, L. M. Mäthger, K. C. Buresch, A. Barbosa, J. J. Allen, L. Siemann, and C.
Chubb (2011). ‘Rapid Adaptive Camouflage in Cephalopods’. In Animal Camouflage: Mechanisms and
Functions, edited by M. Stevens, and S. Merilaita, pp. 145–163. Cambridge: Cambridge University Press.
Houston, A. I., M. Stevens, and I. C. Cuthill (2007). ‘Animal Camouflage: Compromise or Specialize in a
2 Patch-Type Environment?’ Behav. Ecol. 18: 769–775.
Jordan, T. M., J. C. Partridge, and N. W. Roberts (2012). ‘Non-Polarizing Broadband Multilayer Reflectors
in Fish’. Nature Photonics 6: 759–763.
Julesz, B. (1971). Foundations of Cyclopean Perception. Chicago: University of Chicago Press.
Julesz, B. (1981). ‘Textons, the Elements of Texture Perception, and their Interactions’. Nature 290: 91–97.
Julesz, B. (1984). ‘A Brief Outline of the Texton Theory of Human Vision’. Trends Neurosci. 7: 41–45.
Keen, A. M. (1932). ‘Protective Coloration in the Light of Gestalt Theory’. J. Gen. Psychol. 6: 200–203.
Kelman, E. J., P. Tiptus, and D. Osorio (2006). ‘Juvenile Plaice (Pleuronectes platessa) Produce Camouflage
by Flexibly Combining two Separate Patterns’. J. Exp. Biol. 209: 3288–3292.
Kelman E. J., D. Osorio, and R. J. Baddeley (2008). ‘A Review of Cuttlefish Camouflage and Object
Recognition and Evidence for Depth Perception’. J. Exp. Biol. 211: 1757–1763.
Kiltie, R. A. (1988). ‘Countershading: Universally Deceptive or Deceptively Universal?’ Trends Ecol. Evol.
3: 21–23.
Kiltie, R. A., J. Fan, and A. F. Laine (1995). ‘A Wavelet-Based Metric for Visual Texture Discrimination with
Applications in Evolutionary Ecology’. Math. Biosci. 126: 21–39.
Kjernsmo, K. and S. Merilaita (2012). ‘Background Choice as an Anti-Predator Strategy: The Roles of
Background Matching and Visual Complexity in the Habitat Choice of the Least Killifish’. Proc. R. Soc.
B. 279: 4192–4198.
Landy, M. S. and N. Graham (2004). ‘Visual Perception of Texture’. In The Visual Neurosciences, edited by
L. M. Chalupa and J. S. Werner, pp. 1106–1118. Cambridge, MA: MIT Press.
Langridge, K. V. (2006). ‘Symmetrical Crypsis and Asymmetrical Signalling in the Cuttlefish Sepia
officinalis’. Proc. R. Soc. B. 273: 959–967.
Langridge, K. V., M. Broom, and D. Osorio (2007). ‘Selective Signalling by Cuttlefish to Predators’. Current
Biology 17: R1044–R1045.
Marshall, N. J. and J. B. Messenger (1996). ‘Colour-Blind Camouflage’. Nature 382: 408–409.
Mäthger, L., A. Barbosa, S. Miner, and R. T. Hanlon (2006). ‘Color Blindness and Contrast Perception in
Cuttlefish (Sepia officinalis) Determined by a Visual Sensorimotor Assay’. Vis. Res. 46: 1746–1753.
Merilaita, S. (1998). ‘Crypsis through Disruptive Coloration in an Isopod’. Proc. Roy. Soc. B. 265:
1059–1064.
Merilaita, S., J. Tuomi, and V. Jormalainen (1999). ‘Optimization of Cryptic Coloration in Heterogeneous
Habitats’. Biol. J. Linn. Soc. 67: 151–161.
Merilaita, S., A. Lyytinen, and J. Mappes (2001). ‘Selection for Cryptic Coloration in a Visually
Heterogeneous Habitat’. Proc R. Soc. Lond. B 268: 1925–1929.
Merilaita, S. (2003). ‘Visual Background Complexity Facilitates the Evolution of Camouflage’. Evolution
57: 1248–1254.
Merilaita, S. and J. Lind (2006). ‘Great Tits (Parus major) Searching for Artificial Prey: Implications for
Cryptic Coloration and Symmetry’. Behav. Ecol. 17: 84–87.
Metzger, W. (2009). Laws of Seeing, trans. by L. Spillman and S. Lehar. Cambridge, MA: MIT Press.
(Originally published 1936. Gesetze des Sehens. Frankfurt: Kramer.)
Monteiro, A., P. M. Brakefield, and V. French (1997). ‘The Relationship between Eyespot Shape and
Wing Shape in the Butterfly Bicyclus anynana: A Genetic and Morphometrical Approach’. J. Evol. Biol.
10: 787–802.
Nieder A. (2002). ‘Seeing More than Meets the Eye: Processing of Illusory Contours in Animals’. J. Comp.
Physiol. A 188: 249–260.
Osorio, D. and M. V. Srinivasan (1991). ‘Camouflage by Edge Enhancement in Animal Coloration
Patterns and its Implications for Visual Mechanisms’. Proc. R. Soc. Lond. B 244: 81–85.
Peters, R. A. and C. S. Evans (2003). ‘Design of the Jacky Dragon Visual Display: Signal and Noise
Characteristics in a Complex Visual Environment’. J. Comp. Physiol. A 189: 447–459.
Peters, R. A., J. M. Hemmi, and J. Zeil (2007). ‘Signalling against the Wind: Modifying Motion Signal
Structure in Response to Increased Noise’. Curr. Biol. 17: 1231–1234.
Peyré, G. (2009). ‘Sparse Modeling of Textures’. J. Mathematical Imaging and Vision 34: 17–31.
Portilla, J. and E. P. Simoncelli (2000). ‘A Parametric Texture Model Based on Joint Statistics of Complex
Wavelet Coefficients’. Int. J. Computer Vision: 40: 49–70.
Poulton, E. B. (1890). The Colours of Animals: Their Meaning and Use. Especially Considered in the Case of
Insects. 2nd edn. London: Kegan Paul, Trench, Trübner and Co.
Ramachandran, V. S., C. W. Tyler, R. L. Gregory, D. Rogers-Ramachandran, S. Duensing, C. Pillsbury,
and C. Ramachandran (1996). ‘Rapid Adaptive Camouflage in Tropical Flounders’. Nature
379: 815–818.
Rowland, H. M., M. P. Speed, G. D. Ruxton, M. Edmunds, M. Stevens, and I. F. Harvey (2007).
‘Countershading Enhances Cryptic Protection: An Experiment with Wild Birds and Artificial Prey’.
Anim. Behav. 74: 1249–1258.
Rowland, H. M., I. C. Cuthill, I. F. Harvey, M. P. Speed, and G. D. Ruxton (2008). ‘Can’t Tell the
Caterpillars from the Trees: Countershading Enhances Survival in a Woodland’. Proc. R. Soc. B
275: 2539–2545.
Rowland, H. M. (2009). ‘From Abbott Thayer to the Present Day: What Have We Learned about the
Function of Countershading?’ Phil. Trans. R. Soc. B 364: 519–527.
Ruxton, G. D., M. P. Speed, and D. Kelly (2004a). ‘What, if Anything, is the Adaptive Function of
Countershading?’ Anim. Behav. 68: 445–451.
Ruxton, G., M. Speed, and T. Sherratt (2004b). Avoiding Attack: The Evolutionary Ecology of Crypsis,
Warning Signals and Mimicry. Oxford: Oxford University Press.
Saidel, W. M. (1988). ‘How to Be Unseen: An Essay in Obscurity’. In Sensory Biology of Aquatic Animals,
edited by J. Atema, R. Fay, A. N. Popper, and W. Tavolga, pp. 487–513. New York: Springer.
Scott-Samuel, N. E., R. Baddeley, C. E. Palmer, and I. C. Cuthill (2011). ‘Dazzle Camouflage Affects Speed
Perception’. PLoS One 6(6): e20233.
Shapley, R. M., N. Rubin, and D. Ringach (2004). ‘Visual Segmentation and Illusory Contours’. In The
Visual Neurosciences, edited by L. M. Chalupa and J. S. Werner, pp. 1119–1128. Chicago: MIT Press.
Shohet A. J., R. J. Baddeley, J. C. Anderson, E. J. Kelman, and D. Osorio (2006). ‘Cuttlefish Response
to Visual Orientation of Substrates, Water Flow and a Model of Motion Camouflage’. J. Exp. Biol.
209: 4717–4723.
Shohet, A., R. J. Baddeley, J. Anderson, and D. Osorio (2007). ‘Cuttlefish Camouflage: A Quantitative
Study of Patterning’. Biol. J. Linn. Soc. 92: 335–345.
Simoncelli, E. P. and B. A. Olshausen (2001). ‘Natural Image Statistics and Neural Representation’. Ann.
Rev. Neurosci. 24: 1193–1216.
Skelhorn, J., H. M. Rowland, and G. D. Ruxton (2010a). ‘The Evolution and Ecology of Masquerade’.
Bio. J. Linn. Soc. 99: 1–8.
Skelhorn, J., H. M. Rowland, M. P. Speed, and G. D. Ruxton (2010b). ‘Masquerade: Camouflage Without
Crypsis’. Science 327: 51.
Stevens, M. and I. C. Cuthill (2006). ‘Disruptive Coloration, Crypsis and Edge Detection in Early Visual
Processing’. Proc. R. Soc. B 273: 2141–2147.
Stevens, M., I. C. Cuthill, A. M. M. Windsor, and H. J. Walker (2006). ‘Disruptive Contrast in Animal
Camouflage’. Proc. R. Soc. B 273: 2433–2438.
Stevens, M., J. Graham, I. S. Winney, and A. Cantor (2008). ‘Testing Thayer’s Hypothesis: Can Camouflage
Work by Distraction?’ Biol. Lett. 4: 648–650.
Stevens, M., D. H. Yule, and G. D. Ruxton (2008). ‘Dazzle Coloration and Prey Movement’. Proc. R. Soc. B
275: 2639–2643.
Stevens, M. and S. Merilaita (2009). ‘Animal Camouflage: Current Issues and New Perspectives’. Phil.
Trans. R. Soc. B 364: 423–427.
Stevens, M., I. S. Winney, A. Cantor, and J. Graham (2009). ‘Object Outline and Surface Disruption in
Animal Camouflage’. Proc. R. Soc. B 276: 781–786.
Stuart-Fox D. and A. Moussalli (2009). ‘Camouflage, Communication and Thermoregulation: Lessons from
Colour Changing Organisms’. Phil. Trans. R. Soc. B 364: 463–470.
Thayer, A. H. (1896). ‘The Law Which Underlies Protective Coloration’. Auk 13: 477–482.
Thayer, G. H. (1909). Concealing-Coloration in the Animal Kingdom: An Exposition of the Laws of Disguise
through Color and Pattern: Being a Summary of Abbott H. Thayer’s Discoveries. New York: Macmillan.
Waage, J. (1981). ‘How the Zebra Got its Stripes—Biting Flies as Selective Agents in the Evolution of Zebra
Coloration’. J. Ent. Soc. S. Afr. 44: 351–358.
Wente, W. H. and J. B. Phillips (2005). ‘Microhabitat Selection by the Pacific Treefrog, Hyla regilla’. Animal
Behaviour 70: 279–287.
Williams, D. (2001). Naval Camouflage 1914–1945. Barnsley: Pen and Sword Books.
Zylinski, S., D. Osorio, and A. J. Shohet (2009a). ‘Edge Detection and Texture Classification by Cuttlefish’. J.
Vision 9: 1–10.
Zylinski, S., D. Osorio, and A. J. Shohet (2009b). ‘Perception of Edges and Visual Texture in the
Camouflage of the Common Cuttlefish, Sepia officinalis’. Phil. Trans. R. Soc. B 364: 439–448.
Zylinski, S., D. Osorio, and A. J. Shohet (2009c). ‘Cuttlefish Camouflage: Context-Dependent Body Pattern
Use during Motion’. Proc. R. Soc. B 276: 3963–3969.
Zylinski, S. and D. Osorio (2011). ‘What Can Camouflage Tell us about Non-Human Visual Perception?
A Case Study of Multiple Cue Use in the Cuttlefish’. In Animal Camouflage: Mechanisms and Function,
edited by M. Stevens and S. Merilaita, pp. 164–185. Cambridge: Cambridge University Press.
Zylinski, S., A. S. Darmaillacq, and N. Shashar (2012). ‘Visual Interpolation for Contour Completion
by the European Cuttlefish (Sepia officinalis) and its Use in Dynamic Camouflage’. Proc. R. Soc. B
279: 2386–2390.
Chapter 42
Design Insights
A different visual aesthetic results when considerations about functional utility of the designed
item far outweigh those regarding the accommodation of a human user. Craftsmanship is the
art of combining qualitatively and aesthetically rich user interfaces with a high degree of func-
tional utility. Perception is not infallible: some designs are intentionally made with a high degree
of visual appeal, but handling of such an object should swiftly expose discrepancies between its
visual ‘promise’ of functionality and its actual frustrating performance. Design can even deliber-
ately counter the perceptual tendency to match form with function. Cartoons by Heath Robinson
(1872–1944) and Rube Goldberg (1883–1970) depict machines that accomplish simple tasks
through absurdly complex means, to the point of rendering them useless in practical terms.
Nature can be considered the evolutionary cradle for perception. While it is likely that all sen-
tient entities experience their own version of ‘reality’ (von Uexküll 1926), human-made designs
can alter, enhance, or antagonize mechanisms of perceptual organization that originally evolved
to deal with a natural environment unfettered by human hands. Of particular interest in this chap-
ter, therefore, are applied examples where human design aims to recreate some idealized aspect
of nature. The first section will be devoted to the intuitive insight captured by instances in which
classical Japanese designs emphasize the relation between human perception and natural form.
The same perceptual factors implicit in the centuries-old gardening manuals of Japan are partly
incorporated in ideas put forth by the Gestalt school of psychology, the Bauhaus and other move-
ments, nearly a millennium later, as will be discussed in the second section. We will also demon-
strate how Japanese design principles more directly influenced Bauhaus design.
In the third section, we discuss how naturalistic structure shares principles with the visual
patterns emphasized in Japanese design, Gestalt and Bauhaus approaches, thus serving as their
potential common denominator.
The appendix at the end of the chapter revisits a few recent general frameworks for thinking
about visual perception of designed structure.
or other qualities. Hongatte—the way in which the design layout guides the gaze (Kuitert
2002)—refers to visual balance, asymmetry, and incompleteness in the visible parts. Mitate—
literally ‘setting up the eye’—relates to techniques for bringing a new visual awareness to a
familiar object through the creation of visual allegories. For example, re-using the foundation
stone from a pillar of a temple as a stone washbasin not only introduces novel, interesting
stone shapes into a new context, but creates metaphorical narratives, for example by linking
the foundations of a place of spiritual practice with a fountain, a life-giving source of purifica-
tion. Shin-Gyō-Sō concerns the degree of formality in applying light, shadow, asymmetry, and
irregularity (Keane 1996, p. 77). In the design of a stone path, for example, at the most formal
level—Shin—stone shapes will be regular, angular, with little or no variation in colour, shape,
and size, arranged in a regular tessellation in a straight path with a straight border. Individual
stones and the path as a whole will tend to occupy fully rectangular frames. However, the
stones are not usually smoothly polished, as this is thought to rob them of their simple, natu-
ral materiality—a powerful Japanese design aspect referred to as Wabi-Sabi (Yanagi 1972). At
the most informal level—Sō—stones of varied shapes are spaced at more irregular intervals,
with small stones interspersed with large, light stones with dark, regular with irregular, rough
with smooth, the entire path winding within a loosely defined, jagged border, as if acciden-
tally stumbled upon in nature.
In actual designs, the combination of the three levels gives rise to very complex variations on
the theme. A formal path may intersect an informal one going in a different direction, creating
the impression that the paths overlap transparently. A path with a formal border may have a more
informal placement of stones within the border, and so forth. Such differences in levels of formal-
ity are found in many design cultures. Their formalized expression in Japanese garden design,
however, turned the concept into a universally useful design aid, ubiquitous across the Japanese arts.
There is no simple recipe for design. Concepts like Nōtan, Hongatte, Mitate, Shin-Gyō-Sō, and
Wabi-Sabi directly relate to the appearance of a design, intuitively conveying qualitative relations
between part and whole. Their greatest utility is as mental tools for increasing the designer’s
awareness of, and ability to respond sensitively to, the various perceived visual aspects of a design
as it is created.
Fig. 42.1 A glimpse of the exterior and interior of a small tea hut at Nobutsu-An in Kyoto, Japan.
Note the many contrasts between light and dark, small and large, and regular pattern set off against
irregular pattern. Intersecting lines are carefully avoided, while clearly demarcated T-junctions
enhance spaciousness.
Gilded panels reflect ambient light back onto surfaces, brightening the room, clearly delineating
shape silhouettes, and appearing as transparent layers beyond which space continues. Sometimes
applied in gilded parallelogram shapes (Naito and Nishikawa 1977, colour plate 91), it gives a
shimmering impression of spacious floorboards continuing around corners. Coloured panels are
traditionally painted with ink and a mixture of powdered seashell, ground semi-precious stone,
and nikawa—a gelatinous glue. This matte pigment results in nearly equiluminant coloured
regions, confounding judgments of distance, perceived size, and the flatness or shape of the
surrounding walls (Akino 2012). Woven tatami mats reflect light back from the floor, whereas
strips of white washi, Japanese mulberry-bark paper, are pasted low along walls at locations where
various small tasks, such as mixing tea, require better visibility (Figure 42.1).
Straight lines are carefully placed so that repetitive sequences are contrasted with irregular pat-
terns. This is beautifully demonstrated in the irregular bundling together of built-in bamboo lat-
tices that act as window meshing (e.g. the far left of Figure 42.1). Appearing as a regular lattice
from a distance, its inherent irregularity dominates when viewed up close, creating the impression
of different meshes overlaid—another interesting depth and grouping effect.
Where two wooden frame lines intersect, the thinner line will deliberately be misaligned on
both sides of the thicker line to reduce the degree of smooth continuation. Discontinuity across
a visual junction is configured into two adjacent T-junctions, thus implying a greater number of
occluding elements than if there were merely a crossing of two straight lines. This enhances the
perception of spatiality, not necessarily veridical depth. In traditional construction, the layout of
sliding panels in their frames results in three nested T-junctions overlaid at each corner. In modern
design, where such simple details are often neglected, this kind of spatial articulation is easily lost.
Nōtan is thus expressed through light and dark paper, different hues of clay walls, with wooden
beams, gilded wall panels, and windows carefully arranged into an irregular, balanced pattern
with a subtle interplay of light and dark. Combining these devices culminates in an open-ended,
underspecified visual space of many scales and amassed layers of potential occlusion, from which
perception constructs an experience of a rich depth articulation and expansiveness in the sur-
rounding space. This perceptually inferred space would be, at some level, physically implausible if
its visual clues were interpreted as literal ecological cues.
Occluding layers—the rich variety of sliding windows, in particular—hint at the spatial con-
tinuation of whatever is occluded. Traditional architects and gardeners are well aware that a small
section of a garden outside, viewed through layered frames, appears much enlarged, filled with
a greater number of components, and that shapes seen within the frame appear more beautiful
Design Insights 867
Fig. 42.2 (a) Exposed bedrock, eroded by wind, sun, ice and rain, remains as irregularly overlapping
heaps, facing upwards against gravity, with similar triangular shapes appearing at many spatial
scales. (b) The most visually dominant rock cluster in the garden at Ryoanji temple, Kyoto. Note the
many instances of triangularity in whole shapes and surface texture markings, with individual rocks
leaning towards each other. (c) The Ryoanji garden emulates a sparse naturalistic rock outcrop.
Japanese gardeners today still use the metaphor that a good design will show its ‘skin, flesh and
bones’ (Ogawa 2011) in one glance, meaning that the overall structural backbone, the shapes of
clusters, individual rocks, and their textures must all be visible. A rock should be placed in the
original orientation in which it was found in the wilderness so as not to ‘anger its inhabiting spirit’.
This taboo is a way of preserving the visual integrity between the shape of the rock as a whole,
and the directionality of its smaller facets and surface textures as chiselled out by erosion, so that
the impression of an entire rocky ridge can be conveyed with a single design component. Rocks
should be buried deeply enough that the visual junction with the ground plane lends the appear-
ance of continuing as solid bedrock underground (Slawson 1987), instead of betraying the pres-
ence of a small, unconnected design component (Figure 42.2B). A similar practice prevails among
Western masons, who match the orientation of stone in construction to its original alignment in
the quarry. Many cultures also pay heed to the orientation of timber: matching the dry and wet
(north and south) sides of the wood to architectural conditions on site, and reserving timber from
trees that endured windy conditions for the components that must bear the greatest loads, increases
the durability of a wooden construction.
Using triadic rock groupings (Shingen 1466)—where each individual rock and rock cluster is
approached as a triangle—allows the design to be conceived of as a multiscale composition of
triangles knit into a whole (Figure 42.2B). Deliberately using a hierarchy of triangular templates
is thought to simplify the mental load for the designer (Arnheim 1966, 1969) when having simul-
taneously to deal with a lot of visual factors, such as asymmetry, proportion, and visual balance
(Slawson 1987).
Nearly a millennium later, medieval Japanese design influenced Jugendstil, Art Nouveau, the Vienna
Secession, and the Bauhaus to adopt a renewed sensitivity to irregularity, asymmetry, minimalism,
and other factors that characterize perceptual organization.
Developed in 1957 by Max Miedinger & Eduard Hoffmann in Switzerland, Helvetica was intended as a
neutral font without intrinsic meaning in the shape of letters. We are not so sure about that, but it does
read smoothly.
Fig. 42.3 Examples of page layout and font design. Top left: Section from an anonymous medieval
vellum manuscript. Courtesy of the National Library of Medicine. Top right: A section from a
seventeenth-century letter between friends, courtesy of Nobutsu-An, Kyoto. Bottom: Example of a
modern font based on Bauhaus ideals.
grey monoglyph that resists fluent reading. Its East Asian counterpart may be found in the love
letters of court nobility in classical Japan, where excessively fluid script renders text virtually unin-
telligible to all but the most accomplished among the initiated.
Typography designers at the Bauhaus, among others, sought the opposite effect: page format
with clearly articulated flow of text lines and paragraphs, with text and figures interspersed in
a more irregular, asymmetrical composition in an effort to improve readability. Improved font
design was another objective. A good font balances the salience of individual letters with that of
whole words. Overt spacing is important, but the shape of the extremities of individual letters
also influences the similarity, alignment, and spacing between parts, with a significant effect on
the perceptual grouping of letters into words (Figure 42.3 bottom). This is exploited in the technique
of ‘kerning’: letters with salient primitives, such as closed bubbles (‘a’), gaps (‘c’), junctions
(‘k’, ‘x’), and bilateral symmetry (‘w’), resist blending into a uniform texture, promoting legibility.
The ongoing debate over the legibility and readability of serif versus sans-serif fonts delves
further into this issue (Poole 2008).
The mantras of good design relating to principles of composition developed at the Bauhaus and
other contemporaneous movements bear testament to the importance of the perceptual effects of
sparse, irregular, and asymmetrically balanced patterns. ‘Ornament and crime’ (Loos 1908), ‘form
follows function’ (Sullivan 1896), and ‘less is more’ (attributed to Mies van der Rohe; see Schulze
and Windhorst 2012) conceivably refer to aspects of perceptual organization and more generally
to the notion of ‘good Gestalt’.
The notion of Gestalt was first articulated by Christian von Ehrenfels and later championed by Wertheimer (1938a), who was one of the
founders of the Berlin Gestalt movement. The central idea of gestalt perception was that the
perceptual whole transcends and modifies the properties of the parts. These ideas originate
in work by Brentano and his school, of which Ehrenfels, Wertheimer, and other figures in the
Gestalt movement were students (see Wagemans and Albertazzi for an overview of the origins
of Gestalt philosophy).
A significant contribution of the Gestalt movement was the derivation of a number of inter-
nal ‘laws’ that seemingly govern perceptual grouping. Every visual experience is perceptually
organized as a figure seen on a surrounding background, the visual qualities of the figure and
background (see Kogo and van Ee, this volume) unfolding even in the absence of clear visual
markings, such as when viewing a parabolic Ganzfeld screen (Metzger [1936] 2006). Here, the
perceptual figure appears to span the entire visual field, in the form of a thick bank of fog. In sim-
ple terms, perceptual organization is crystallized along structural constraints, such as smooth-
ness of alignment between parts, similarity or shared commonality in one or more visual aspect,
spatial proximity and density (on parts, see Singh, this volume), the degree of figural complete-
ness or closure in the arrangement of parts, and the degree of bilateral or higher-order symmetry
in the configuration of parts (Koffka 1935). Convex formations (see Bertamini and Casati, this
volume) appear more salient than concave configurations within the same set of parts (Rubin
1921), and the simplest potential configuration of parts arises as the dominant perceptual figure
(Wertheimer 1938b).
Arnheim (1966) presented a powerful vocabulary of higher-level qualities in perceptual organi-
zation, based on his interpretation of order and complexity. He defines order as ‘the degree and
kind of lawfulness governing the relations among the parts of an entity’, and complexity as ‘the
multiplicity of the relationships among the parts of an entity’. Order and complexity are antago-
nistic yet interdependent. Great design would display a high degree of both order and complexity.
Different kinds of structural order can be discerned. Homogeneity, at a minimum level of
complexity, is the application of a common quality to an entire pattern, whereas coordination,
of greater complexity, is the degree to which all parts constituting the whole have similar impor-
tance and carry similar weight. Parts constitute a hierarchy when distributed along a gradient of
importance with regards to the whole. Accident is highly defined, irrational, and not achieved by
an explicit principle.
Disorder could be thought of as the clash of uncoordinated orders among parts, and is only
possible when within each part there is a discernible order. Structural definition is the extent
to which a given order is carried through. A relation between parts is rational when it is
formed according to some simple principle such as straightness, exact repetition, or
symmetry.
Arnheim (1966, 1988) also discusses ‘directed tension’ between parts as a quality of gestalt.
A universal design strategy is to present a structural centre—analogous to the concept of percep-
tual figure—from which various tensions are directed to the other elements of a composition (see
also Alexander 2002). Depending on the perceived directionality of these tensions, different quali-
tative wholes are experienced. The tensions may be directed in obedience to some larger organiz-
ing principle, such as gravity. In triangular composition—a canon of many artistic traditions—the
triangle is a centre with a strong directed tension in itself. In a mandala, the overall tensions are
directed towards and away from a central middle point.
With this articulation of structural aspects discernible in design, Arnheim provided a vocab-
ulary that still inspires scientific experiments in the perception of design (e.g. Locher 2003;
McManus, Stoever, and Kim 2011).
872 van Tonder and Vishwanath
Fig. 42.4 Natural and handmade patterns of growth and decay. (a) The undulating branch of a
clover azalea. Notice how the thickest branch or spine undulates to and fro. (b) Splashing white
foam in a flowing stream. Where the flow decelerates, the foam changes direction and sends
small eddies swirling outwards, creating a spine of to and fro lines. On a much larger scale, such
patterns appear as Kármán Vortex Streets in the atmosphere, where clouds swirl around an isolated
mountainous island in an open ocean. (c) Tracing detail of swirls on a first-century-BC bronze Celtic
mirror, excavated at Dordrecht. The undulating spine coils over four spatial scales, branching out
at sudden changes in direction. (d) A gilded wooden swirl from an eighteenth-century Austrian
baroque palace, showing one complete cycle of piling (bottom spiral), acceleration (smooth middle
section), deceleration (curl on upper end), and directional change (outwards swirls at the top).
Analytical effort aimed at automating the design process promises to free human designers
from overwhelming repetitive details (Jupp and Gero 2006). In fact, there is such an enormous
amount of bad design in the world that one may wish for the coming of the great ‘design-bot’.
However, as authors with a passion for art and science, we would like to see greater scientific
understanding of design and its process, but not with the aim of removing the human designer
from the loop. It should be aimed at better equipping the coming generation of designers, rather
than planning their extinction.
Appendix
canonize the design shapes of different ancient civilizations, such as the instantly recognizable
proportions of an Egyptian sculpture or funerary mask. In the proportional systems used in
font design, or depicting the human body (Massironi 2002, pp. 35–43) by da Vinci, Dürer, Le
Corbusier, and many others, proportion refers to consistent spatial size relationships between
defined parts. In other stylistic effects, proportion can refer to the relative amount of colour to the
amount of luminance contrast, the salience of contours in relation to the salience of colours (think
of Monet’s impressionist painting style versus a cartoon by Hergé), or textures, or it can relate to
the degree to which contours are locally deformed, or even disconnected, while grouping globally
into a specified configuration. When applied to various objects in the same style, these effects
make the objects appear to belong together, a consequence of the shared fate of their underlying features.
Fig. 42.5 (a) Medial axis transformation of the empty space between two points: any point on the
medial axis is equidistant from the two points. (b) The set of centres of the largest included disks that
touch the boundary contours of this triangle trace out an inverted ‘Y’-shaped medial axis. (c) Medial
axis transformation of a human silhouette appears as a skeletal midline along the body and limbs.
Local maxima—or medial points—are emphasized in black.
Fig. 42.6 (a) Medial axes in the empty space at the Ryoanji dry rock garden form a four-level
dichotomous branching tree. Thin lines indicate the architectural layout of the temple before it was
destroyed in 1797. The intended viewing location is indicated by the letter ‘O’ inside the central hall.
(b) Note the relative size–distance relations between nearest rocks. Taller rocks are shaded darker.
Rocks in the leftmost cluster (c) and the whole set of clusters (d) do not line up, but are arranged
into irregular folding screen configurations facing the viewing location.
Going from the trunk to the tips of the tree, the lengths of the limbs increase logarithmically. Adding
to that a branching pattern at counterbalanced angles, the empty space resembles the branch-
ing structures ubiquitous throughout nature (Prusinkiewicz and Lindenmayer 1990). A similar
branching structure converges outward from the most conspicuous rock cluster on the left (Figure
42.6A, C). Adding or removing any element in the composition significantly disrupts the ordered
structure of the empty space. Even if dissimilar at a glance, baroque vista gardens can also
be represented as branching networks. This level of abstraction thus enables a more sophisticated
comparison of different landscaping traditions.
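The kind of branching structure invoked here (Prusinkiewicz and Lindenmayer 1990) is classically generated by an L-system, a string-rewriting grammar. The following is a minimal sketch of our own; the axiom and rule are a textbook example, not the garden's actual structure:

```python
def lsystem(axiom, rules, n):
    """Rewrite the axiom n times with the production rules; characters
    without a rule are copied unchanged."""
    s = axiom
    for _ in range(n):
        s = "".join(rules.get(c, c) for c in s)
    return s

# Bracketed L-system for a dichotomous branching tree: 'F' draws a limb,
# '[' and ']' open and close a branch, '+' and '-' turn left and right.
rules = {"F": "F[+F][-F]"}
print(lsystem("F", rules, 2))  # → F[+F][-F][+F[+F][-F]][-F[+F][-F]]
```

Each rewriting pass adds one level of dichotomous branching, so four passes yield a four-level tree of the kind traced in the empty space of Figure 42.6a.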
Medial axes designate information-rich loci where maximal amounts of shape boundary
surfaces can be encoded with minimal parameters (Leyton 1987). A practical consequence
in Ryoanji is that the surface facets from the entire set of rock clusters (approximating each
cluster with a convex hull envelope) are at their most surveyable at the most global medial
point Y. There are obvious evolutionary connotations with placing the viewer in a location
that affords high visual access to the surroundings. Strikingly, this point is near one of the
intended viewing points of the garden, the centre ‘O’ of the abbot’s hall in the original archi-
tectural layout. Classical illustrations depict the Ryoanji rock garden from this viewpoint
(Akisato 1799). Outlining the central loci of empty spaces, medial axes also map the paths of
least obstruction for spatial navigation.
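The medial-axis construction of Figure 42.5 can be made concrete with a small brute-force sketch (ours, not the chapter's; the coordinates, tolerance `tol`, and separation threshold `sep` are illustrative assumptions): a candidate point is medial when its nearest distance to the boundary is attained by at least two well-separated boundary samples.

```python
import math

def medial_points(boundary, candidates, tol=1e-6, sep=0.5):
    """Brute-force medial-axis test: a point is medial when its nearest
    distance to the boundary is attained by two (or more) boundary
    samples that are well separated from each other."""
    medial = []
    for p in candidates:
        dists = [math.dist(p, b) for b in boundary]
        dmin = min(dists)
        # boundary samples that (nearly) attain the minimum distance
        closest = [b for b, d in zip(boundary, dists) if d - dmin < tol]
        # medial if two of the nearest samples are mutually far apart
        if any(math.dist(a, b) > sep for a in closest for b in closest):
            medial.append(p)
    return medial

# Figure 42.5a: for two isolated boundary points, the medial axis of the
# empty space between them is their perpendicular bisector (here x == 0).
boundary = [(-1.0, 0.0), (1.0, 0.0)]
candidates = [(x / 10, y / 10) for x in range(-10, 11) for y in range(-10, 11)]
axis = medial_points(boundary, candidates)
print(all(abs(x) < 1e-9 for x, _ in axis))  # → True
```

For a closed contour one would sample the boundary densely and keep only candidates inside it; the local maxima of boundary distance along the resulting axis are the medial points emphasized in Figure 42.5c.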
The original intentions with the Ryoanji garden design are not exactly known, but the
probability of randomly stumbling upon this composition is sufficiently small (van Tonder
2006) to suggest that the perception of visual balance and other proportional relationships
may be particularly acute when a subject’s viewing location is physically aligned with the
medial loci of the viewed spatial layout, a perceptual consequence related to natural mapping
(see ‘Natural Mappings’).
Fig. 42.7 Isovists projected from the (a) corner and (b) side of a rectangular room, and their sight-line
graphs (c, d). Here, sight-lines are linearly scaled down away from the direction of gaze (centre red
bold line) to emphasize the influence of the viewing direction. The area under the isovist graph is
(c) larger for the corner projection than from the side (d), predicting that the room will look more
spacious from this viewpoint.
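Benedikt’s (1979) isovist area can be approximated by fanning sight-lines around the viewpoint and summing the thin triangles between neighbouring rays. The sketch below is our own illustration with assumed dimensions (a 6 × 4 room and hand-picked viewpoints):

```python
import math

def sight_line(px, py, theta, w, h):
    """Distance from viewpoint (px, py) to the wall of a w-by-h
    rectangular room along direction theta (corners (0,0) and (w,h))."""
    dx, dy = math.cos(theta), math.sin(theta)
    ts = []
    if dx > 1e-12:
        ts.append((w - px) / dx)
    elif dx < -1e-12:
        ts.append(-px / dx)
    if dy > 1e-12:
        ts.append((h - py) / dy)
    elif dy < -1e-12:
        ts.append(-py / dy)
    return min(ts)

def isovist_area(px, py, w, h, n=3600):
    """Approximate the isovist area with a fan of n sight-lines: sum the
    thin triangles spanned by neighbouring rays."""
    dtheta = 2 * math.pi / n
    r = [sight_line(px, py, i * dtheta, w, h) for i in range(n)]
    return sum(0.5 * r[i] * r[(i + 1) % n] * math.sin(dtheta)
               for i in range(n))

# In a convex room every interior point sees the entire floor, so the raw
# isovist area equals the room area from both the corner and the side.
corner = isovist_area(0.5, 0.5, 6.0, 4.0)
side = isovist_area(3.0, 0.5, 6.0, 4.0)
print(round(corner, 2), round(side, 2))  # → 24.0 24.0
```

Note that the raw area alone does not distinguish the two viewpoints of Figure 42.7; it is the linear scaling of sight-lines away from the gaze direction, described in the caption, that makes the corner projection come out larger.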
Natural Mappings
Natural mapping (Norman 1988) emphasizes the importance of resonance between form and
function. Specifically, natural mapping refers to a design methodology where the layout of
controls is intentionally arranged to resemble the spatial layout of the designed object or environ-
ment. Consider, for example, a gas stove top with four burners arranged into a square layout. By
aligning the control knobs for the burners in a straight line, it is not clear which knob maps to
which burner. Even after repeated use, users may still make mistakes, when all it takes to create a
flawless interface is to place the four knobs into a square pattern that visually matches the layout of
the burners. According to Norman, great designs require neither labels nor manuals, but are suf-
ficiently intuitive to be used on the fly. The alignment between the user, controls, and the design
itself is also important for fluent use. We know from experience how difficult it can be to navigate
from a map that is rotated relative to the actual surroundings, even if it is an accurate mapping
of the terrain. Through the use of an intentional viewing point, classic Japanese gardens place the
viewer within a natural mapping from which the visual balance and other features of the design
can be most acutely experienced—a form of natural mapping for aesthetic enhancement or map-
ping where the need for mental rotation is kept to a minimum.
On the scale of architecture, the new Seattle Central Library, by Koolhaas and Prince-Ramus
(Goldberger 2004) presents a natural mapping of the Roman alphabet. The entire floor space
in the building consists of one long alphabetically indexed walkway, coiled into a huge
helix. One can thus literally walk from book indices A to Z in one single stretch, a very efficient
design for both staff and users, although in this case the mapping is not directly perceptual but
requires cognitive knowledge of the relation between letters and organization. This type of heli-
cal structure is already exemplified in designs such as the Guggenheim Museum in Manhattan,
by Frank Lloyd Wright, although in the Seattle Central Library the helix is intentionally mapped
to another structure, the alphabet, and thus presents a clearer example of intentional functional
mapping between two structures. The design was received with mixed emotions, for reasons other
than the impact of the helical design (Cheek 2007).
Natural mapping can be extended to the structural mapping of the human body. The chair is an
example of a hugely successful design because it naturally maps to the body. The seat, arm rests,
opening for the legs, and rest for the back and head closely resemble the visual layout of the user’s
anatomy, resulting in an intuitively grasped design. Grasping a design this fluently can, however,
mask discrepancies between the perceived and the actual qualitative experience of physically
interacting with that design: some of the most beautifully designed chairs have delivered an
extremely uncomfortable sitting experience, to the surprise of both their makers and users.
Ba-ila villages and Tang dynasty cities represent large-scale examples of natural mappings with
bilateral symmetry along a central axis, and with a clearly directional head-and-tail assignment.
As with a chair, these design layouts are suggestive of the human body. In fact, in traditional
maps showing the layout of Zen temple complexes in Kyoto, the names of architectural gates,
paths, halls, and facilities within the temple complex are typically inscribed on a human silhouette
(Masuno 2008, p. 150), spread in the ‘Vitruvian man’ style, with the different facilities mapped
to specified body parts.
Self-similar urban layouts mapped to the body are doubly powerful. First, there is the mapping
with the familiar body. Second, grasping the mapping of urban organization at any spatial level
informs one’s knowledge of its organization at other scales.
Acknowledgements
The authors thank Johan Wagemans, Steve Palmer, and the anonymous reviewers for many help-
ful comments. Thanks also to Branka Spehar for re-discovering the 1940 essay on art and psychol-
ogy by Koffka.
References
Akino, A. (2012). Unpublished interview with the artist. Ai Akino is a classically trained Nihonga painter
from Kyoto, Japan.
Akisato, R. (1799). Miyako Rinsen Meishō Zue (Illustrated Guide to Famous Places In and Around the
Capital). 6 vols. Kyoto.
Albertazzi, L. (2010). ‘The Roots of Metaphorical Information’. In Perception Beyond Inference. The
Information Content of Perceptual Processes, edited by L. Albertazzi, G. van Tonder, and D. Vishwanath,
pp. 345–390. Cambridge MA: MIT Press.
Alexander, C. (2002). The Order of Nature. New York: Routledge.
Arnheim, R. (1966). ‘Order and Complexity in Landscape Design’. In Toward a Psychology of Art, pp. 123–
135. Berkeley: University of California Press.
Arnheim, R. (1969). Visual Thinking. Berkeley: University of California Press.
Arnheim, R. (1988). The Power of the Centre: A Study of Composition in the Visual Arts. Berkeley: University
of California Press.
Behrens, R. (2002). ‘How Form Functions: On Esthetics and Gestalt Theory’. Gestalt Theory 24: 317–325.
Benedikt, M. (1979). ‘To Take Hold of Space: Isovists and Isovist Fields’. Environment and Planning B
6: 47–65. doi: 10.1068/b060047
Blum, H. (1973). ‘Biological Shape and Visual Science (Part I)’. Journal of Theoretical Biology
38: 205–287.
Boudewijnse, G. (2012). ‘Gestalt Theory and Bauhaus—A Correspondence’. Gestalt Theory 34(1): 81–98.
Bovill, C. (1996). Fractal Geometry in Architecture and Design. Boston: Birkhäuser.
Cheek, L. (2007; updated 2012). On Architecture: How the New Central Library Really Stacks Up. Online.
http://www.seattlepi.com/ae/article/On-Architecture-How-the-new-Central-Library-1232303.
php?source=mypi. Accessed 15 August 2012.
Collins English Dictionary 11th Edition (2011; updated 2012). Collins. Online http://www.collinsdictionary.
com/dictionary/english. Accessed 30 November 2012.
Dutton, D. (2009). The Art Instinct. New York: Bloomsbury Press.
Eglash, R. (1999). African Fractals: Modern Computing and Indigenous Design. New Brunswick: Rutgers
University Press.
Fairbanks, M. S. and R. P. Taylor (2011). ‘Measuring the Spatial Properties of Temporal and Spatial
Patterns: From the Human Eye to the Foraging Albatross’. In Non-linear Dynamical Analysis for the
Behavioral Sciences Using Real Data. Boca Raton, FL: CRC Press, Taylor and Francis Group.
Francis, J. E. (2001). ‘Style and Classification’. In Handbook of Rock Art Research, edited by D. S. Whitley,
pp. 221–244. New York: Altamira Press.
Gibson, J. J. (1979). The Ecological Approach to Visual Perception. Boston: Houghton Mifflin.
Goldberger, P. (2004; updated 2012). ‘High-Tech Bibliophilia’. New Yorker 17 May. Online. http://www.
newyorker.com/critics/skyline/?040524crsk_skyline. Accessed 17 November 2012.
Gombrich, E. H. (1979). The Sense of Order: A Study in the Psychology of Decorative Art. Ithaca, NY: Cornell
University Press.
Graham, D. J. and D. J. Field (2007). ‘Statistical Regularities of Art Images and Natural Scenes: Spectra,
Sparseness and Nonlinearities’. Spatial Vision 21: 149–164. doi: 10.1163/156856807782753877
Graham, D. J. and C. Redies (2010). ‘Statistical Regularities In Art: Relations with Visual Coding and
Perception’. Vision Research 50: 1503–1509. doi: 10.1016/j.visres.2010.05.002
Hansmeyer, M. (2012). Building Unimaginable Shapes. TEDGlobal 2012. [Online]. http://www.ted.com/
talks/michael_hansmeyer_building_unimaginable_shapes.html. Accessed 14 December 2012.
Heider, F. and M. Simmel (1944). ‘An Experimental Study of Apparent Behavior’. American Journal of
Psychology 57: 243–259.
Poole, A. (2008). Which Are More Legible: Serif or Sans Serif Typefaces? Online. (Updated March 2012).
http://alexpoole.info/blog/which-are-more-legible-serif-or-sans-serif-typefaces/. Accessed on 18 March
2012.
Preston, S. D. and F. B. M. de Waal (2002). ‘Empathy: Its Ultimate and Proximate Bases’. Behavioural Brain
Science 25: 1–72.
Prusinkiewicz, P. and A. Lindenmayer (1990). The Algorithmic Beauty of Plants. Berlin: Springer.
Psotka, J. (1978). ‘Perceptual Processes that May Create Stick Figures and Balance’. Journal of Experimental
Psychology Human Perception and Performance 4: 101–111.
Ralph, P. and Y. Wand (2009). ‘A Proposal for a Formal Definition of the Design Concept’. In Design
Requirements Workshop (LNBIP 14), edited by K. Lyytinen, P. Loucopoulos, J. Mylopoulos, and
W. Robinson, pp. 103–136. New York: Springer. doi: 10.1007/978-3-540-92966-6_6
Rizzolatti, G. and L. Craighero (2004). ‘The Mirror-Neuron System’. Annual Review of Neuroscience
27: 169–192.
Rubin, E. (1921). Visuell Wahrgenommene Figuren. Copenhagen: Gyldendals.
Schulze, F. and E. Windhorst (2012). Mies Van Der Rohe, a Critical Biography (New and Revised Edition).
Chicago: University of Chicago Press.
Shimoyama, S. (1976). Translation of Sakuteiki: The Book of the Garden. Tokyo: Town and City Planners.
Shingen (1466). Senzui Narabi ni Yagyou no Zu (Illustrations for Designing Mountain, Water and Hillside
Field Landscapes). Sonkeikaku Library, Sonkeikaku Sōkan Series. Tokyo: Ikutoku Zaidan.
Shin-tsu Tai, S., S. Campbell Kuo, R. L. Wilson, and T. S. Michie (1998). Carved Paper: The Art of the
Japanese Stencil. New York and Tokyo: Santa Barbara Museum of Arts and Weatherhill Inc.
Slawson, D. A. (1987). Secret Teachings in the Art of Japanese Gardens. Tokyo: Kodansha.
Smith, J. T. (1797). Remarks on Rural Scenery with Twenty Etchings of Cottages, from Nature: And Some
Observations and Precepts Relative to the Picturesque. London: Joseph Downes.
Spehar, B., C. Clifford, B. Newell, and R. P. Taylor (2003). ‘Universal Aesthetics of Fractals’. Computers and
Graphics 27: 813–820. doi: 10.1016/S0097-8493(03)00154-7
Sullivan, L. H. (1896). ‘The Tall Office Building Artistically Considered’. Originally published in Lippincott’s
Magazine 57: 403–409.
Suzuki, T. (1979). 茶室と露地 (Tea Rooms and Tea Gardens). Tokyo: Sekai Bunkasha.
Synek, E. (1998). ‘Evolutionäre Ästhetik: Vergleich von prä—und postpubertären Landschaftspräferenzen
durch Einsatz von computergenerierten Bildern’. (Evolutionary Aesthetic: Comparison of Visual
Preference for Computer Generated Landscapes before and after Adolescence). Doctoral thesis,
University of Vienna.
Tanizaki, J. ([1933] 1977). In’ei Raisan. (In Praise of Shadows). Translated by E. Seidensticker and T. Harper.
Sedgwick, ME: Leete’s Island Books.
Taylor, R. P., A. Micolich, and D. Jonas (1999). ‘Fractal Analysis of Pollock’s Drip Paintings’. Nature
399: 422. doi: 10.1038/20833
Thompson, D. W. (1917). On Growth and Form: The New Edition. Cambridge: Cambridge University Press.
Also see On Growth and Form: The Complete Revised Edition (1992). New York: Dover Publications.
de la Torre, I. (2011). ‘The Origins of Stone Tool Technology in Africa: A Historical Perspective’.
Philosophical Transactions of the Royal Society B 366(1567): 1028–1037.
von Uexküll, J. (1926). Theoretical Biology. New York: Harcourt, Brace & Co.
van Tonder, G. J., M. J. Lyons, and Y. Ejima (2002). ‘Visual Structure of a Japanese Zen Garden’. Nature
419: 359–360. doi: 10.1038/419359a
van Tonder, G. J. and M. J. Lyons (2005). ‘Visual Perception in Japanese Rock Garden Design’. Axiomathes
Special Issue on Cognition and Design 15(3): 353–371. doi: 10.1007/s10516-004-5448-8
van Tonder, G. J. (2006). ‘Order and Complexity in Naturalistic Landscapes’. In Visual Thought: The
Depictive Space of Perception, edited by L. Albertazzi, pp. 257–301. Amsterdam: Benjamin Press.
Introduction
Definition of ‘visual art’
‘Art’ is not necessarily defined by an aesthetic dimension. A sunset may evoke aesthetic experi-
ences, so may flowers, or butterflies, but natural phenomena are not art. One might suppose that
art is necessarily of human manufacture. But if someone points out a sunset to you, what is the
difference from pointing at a urinal, as Duchamp famously did1? The sunset was certainly not
manufactured, but merely pointed out. So was the urinal. If the urinal is appreciated as an objet
trouvé2 (admitted as an objet d’art), then why not the sunset, the flower, or the butterfly? The single
common factor appears to be that art is intentional3, it implies an ‘artist’, who may, but need not, be
a manufacturer. This is indeed a necessary requirement, but it is not sufficient. I will first introduce
a few important distinctions.
‘Visual art’ is art that is meant to be looked at, instead of being heard, felt, etc. However, a
copy of The Brothers Karamazov is meant to be looked at too (one is supposed to read it), but it
is generally not reckoned to be ‘visual art’. Yet Fyodor Dostoyevsky4 was certainly an artist, and
his novel is ART. Likewise the famous Fountain (actually a ‘found’ urinal) displayed by Marcel
Duchamp in 1917, is art, but not ‘visual art’. It appeals to cognition and reflective thought, rather
than immediate visual awareness. Today, conceptual art5 holds the floor—this is indeed the polit-
ically correct thing in a democracy, because most people ‘see with their ears’ as my artist friends
say. However, this chapter is focused singularly on visual art, ignoring conceptual art.
1 Duchamp’s Fountain is one of the landmark objects of twentieth-century art. Virtually any book on ‘modern
3 in the world)’. For instance a thought is necessarily about something: you cannot have a thought that is about
nothing, although you may have thoughts about NOTHING. The term is usually traced to the teachings of
Franz Brentano (see also Albertazzi, this volume). Notice that ‘intention’ has nothing to do with the intentions
of anybody. A starting point is http://en.wikipedia.org/wiki/Intentionality. On Franz Brentano see http://
en.wikipedia.org/wiki/Franz_Brentano.
4 Fyodor Mikhailovich Dostoyevsky (1821–1881) was a Russian writer of novels, short stories, and essays. See
Although one should not fail to distinguish sharply between ‘visual art’ and ‘conceptual art’, this
may not always be easy because many paintings from western art fit into both categories. Raphael’s
Sistine Madonna6 (Figure 43.1 La Madonna di San Sisto, 1513/1514) is meant to be looked at, and
manages to strike an immediate visual impression. Yet it was commissioned as an altarpiece, and
has obvious religious connotations. It is art, both visual and conceptual. To someone coming from
a non-western culture the conceptual part may be non-existent; to such an observer the painting
is pure visual art. The same applies to the western appreciation of African tribal art as visual art,
when it was originally intended as conceptual.
As everyone knows from the newspapers, art has an important economic dimension, and
indeed one pragmatic definition of art is that it has a value on the art market. When a tin of shit
(Piero Manzoni’s Merda d’Artista7 Figure 43.2, 1961) sold for £97 250 at Sotheby’s in October 2008
(tin number 83 of 90; the cans were originally to be valued according to their weight in gold, or
$37 each in 1961), this marked it as a piece of Art. The value on the art market is important
for both visual and conceptual art. It is often considered a metric of artistic value, comparable to
the citation count in the case of scientific contributions, and making similar sense. This definition
places works of art in a single category with rare coins and postage stamps, which is evidently unfortunate.
What is lacking here is an ‘observer’. The investor is not an observer; in fact an investor is likely to
store the artwork in a vault. Here we identify another necessary condition for designating some
objects ‘art’.
This is perhaps best explained with an example; I use the case of pictures. What exactly is a
‘picture’, a painting say? It was famously discovered by Maurice Denis8 that a painting is (among
other things) a physical object:
It is well to remember that a picture before being a battle horse, a nude woman, or some anecdote, is
essentially a flat surface covered with colours assembled in a certain order.
However, used as a tea tray, such an object is certainly not a picture. In order to be a picture, there
should exist a double-sided intentionality, namely
the picture was intended by an artist to be looked at as a picture;
the picture is looked at as a picture, by an ‘observer’.
6 Raphael is the short name of Raffaello Sanzio da Urbino (1483–1520). Raphael was one of the best known
Italian painters and architects of the High Renaissance. There are many books on the man and his work; a
convenient starting point is http://en.wikipedia.org/wiki/Raphael. Raphael’s Sistine Madonna (La Madonna
di San Sisto, 1513/4) is the last painting he personally finished. It was completed ca. 1513–1514, as a commissioned
altarpiece. See http://en.wikipedia.org/wiki/Sistine_Madonna.
7 I use Piero Manzoni’s Merda d’Artista to illustrate what I think of ‘conceptual art’. Maybe you (the reader) think
it is a work of genius. That is fine, as long as my point that conceptual art is not visual art comes across. (Who
cares for visual art anyway? It is the concept that counts!) My (mis-)use of Manzoni is perhaps unfair. Read up
on this at http://en.wikipedia.org/wiki/Artist%27s_shit and http://en.wikipedia.org/wiki/Piero_Manzoni
8 Maurice Denis (1870–1943) was a French painter, a member of the Symbolist and Les Nabis movements. He
was something of a theorist too, and did quite a bit of writing. On his life see http://en.wikipedia.org/wiki/
Maurice_Denis. The quotation is from a Symbolist Manifesto of 1890: ‘Se rappeler qu’un tableau, avant
d’être un cheval de bataille, une femme nue ou une quelconque anecdote, est essentiellement une surface
plane recouverte de couleurs en un certain ordre assemblées’ (Définition du néo-traditionalisme, Revue Art
et Critique, 30 August 1890).
Fig. 43.1 La Madonna di San Sisto, or the Sistine Madonna by Raphael (Raffaello Sanzio da Urbino
1483–1520). It was finished only a few years before his death, c. 1513–1514, as a commissioned
altarpiece. It was his last painting.
Raphael (1483–1520): The Sistine Madonna, 1512–1513. Dresden Gemaeldegalerie Alte Meister, Staatliche
Kunstsammlungen. Photo: Elke Estel/Hans-Peter Klut. © 2015. Photo Scala, Florence/bpk, Bildagentur fuer Kunst,
Kultur und Geschichte, Berlin
Perceptual Organization in Visual Art 889
Fig. 43.2 Piero Manzoni (1933–1963), Merda d’artista, No. 4, 1961, Diameter 6.5 cm.
Manzoni, Piero (1933–1963): Merda d’artista (Artist’s Shit) No. 014. May, 1961. New York, Museum of Modern
Art (MoMA). Metal, paper, and ‘artist’s shit’, 1 7/8" (4.8 cm) x 2 1/2" (6.5 cm) in diameter. Gift of Jo Carole and
Ronald S. Lauder. Acc. n.: 4.1999. © 2015. The Museum of Modern Art, New York/Scala, Florence
‘Looked at as a picture’ implies looking ‘into’, and entering a ‘pictorial world’9. Consider these
examples:
An ancient stained wall is not a picture: even though it might beat a Jackson Pollock10 in attracting
visual interest it is not a picture, since the artist is lacking. No work of art comes into existence
as a cosmic accident. Designating the wall an objet trouvé2 might provide an artist’s intention3,
although this in no way changes the wall as a physical object. People have discovered striking
renderings of the face of Jesus in trees, old rags, cookies, and the wood grain of toilet
doors11 (see http://en.wikipedia.org/wiki/Holy_Face_of_Jesus or http://en.wikipedia.org/wiki/
Perceptions_of_religious_imagery_in_natural_phenomena). These are not to be counted as
works of art, since the artist’s intention is lacking.
9 See Koenderink, J., van Doorn, A. J., and Wagemans, J. (2011). Depth. i-Perception 2(6): 541–564.
10 Paul Jackson Pollock (1912–1956), known as Jackson Pollock, was an influential American painter and a
major figure in the abstract expressionist movement. Jackson Pollock was best known for his unique drip
painting, and was sometimes known as ‘Jack the Dripper’. See http://en.wikipedia.org/wiki/Jackson_Pollock.
(If you fail to ‘get’ the nickname see http://en.wikipedia.org/wiki/Jack_the_Ripper.)
11 The Holy Face of Jesus is one of the acheiropoieta relating to Christ. These have been reported throughout
the centuries. Devotions to the face of Jesus have been practiced throughout the ages. Devotions to the
Holy Face were approved by Pope Leo XIII in 1895 and Pope Pius XII in 1958. The Shroud of Turin is the
best known example. See http://en.wikipedia.org/wiki/Holy_Face_of_Jesus. On the face in the toilet door see
http://www.telegraph.co.uk/news/religion/6373674/Jesuss-facespotted-on-the-toilet-door-in-Ikea-Glasgow.html.
Another recent example is a face in a tree stump at Belfast cemetery
890 Koenderink
The observer’s intention is just as necessary. In a hilarious painting by Mark Tansey12, a cow is
forced by several earnest looking men to look at a painting by Paulus Potter (Figure 43.3).
The cow remains apparently unaware of the explicit erotic overtones of this work; thus one
concludes that in the bovine universe the painting is just another object, despite its lifelike
size and color. The observer is lacking, because the cow is looking ‘at’ instead of ‘into’ the
painting. In this setting Potter’s work is just an object.
Depending on the art-form, the physical object matters. Although no mere physical object is a
‘work of art’, it may provide ‘a link’ to it. Examples of this are Roman marble copies (mere
pieces of stone handiwork) of original Greek bronzes13. Without such a link, the work of art
(in the intention of the Greek authors) doesn’t exist anymore. Without the double intentional
significance14, the physical object is just junk.
The double-sided intentional nature thus explains the ontological status of ‘pictures’. The value
on the market is irrelevant. There is much that might well be considered ‘art’ that is either not
marketable or would bring merely some value typical of used goods. Examples are tattoos, ornaments
on teacups or weapons, facial makeup, and so forth.
In this chapter I take a broad view and consider ‘art’ (used as short for visual art) to be any
object, change applied to an object, happening, or expression, when it has double-sided
intentionality15. Art is designed to affect immediate visual awareness in some specific way.
A work of art presupposes a certain ‘visual literacy’ in order to be ‘read’. It is a hermeneutical
task15, in George Steiner’s16 terms ‘not a science, but an exact art’. Steiner’s ‘four movements’ indeed
(http://www.belfasttelegraph.co.uk/news/local-national/northern-ireland/face-of-jesus-christ-appears-on-tree-stump-at-belfast-cemetery-16195735.html),
which drew crowds of visitors.
12 Mark Tansey (born 1949) is an American painter born in San Jose, California. The Innocent Eye Test dates
from 1981. According to Tansey (quoted in Mark Tansey: Visions and Revisions, by Arthur C. Danto; and
see http://www.101bananas.com/art/innocent.html): ‘I think of the painted picture as an embodiment of the
very problem that we face with the notion “reality”. The problem or question is, which reality? In a painted
picture, is it the depicted reality, or the reality of the picture plane, or the multidimensional reality the artist and
viewer exist in? That all three are involved points to the fact that pictures are inherently problematic. This problem
is not one that can or ought to be eradicated by reductionist or purist solutions. We know that to successfully
achieve the real is to destroy the medium; there is more to be achieved by using it than through its destruction.’
13 Roman marble copies of original Greek bronzes: a well-known example is the famous Discobolus. See
http://en.wikipedia.org/wiki/Discobolus. The Greek original was completed towards the end of the Severe period,
c. 460–450 BC, but the original Greek bronze is lost. However, there exist numerous Roman copies, including
full-scale ones in marble. The first one found (in 1781) is the Palombara Discobolus. It was famously bought
by Adolf Hitler in 1937 (and put in the Munich Glyptothek), but was returned to Rome in 1948.
14 Edmund Husserl has a notion of ‘double-intentionality’ that is quite different from my meaning here.
In order to avoid problems I will speak of a ‘double-sided intentionality’ associated with works of art. In
Husserl’s view the Längsintentionalität runs along protention and retention in the living present, whereas
the Querintentionalität runs from the living present to the object of which consciousness is aware.
See http://www.iep.utm.edu/phe-time/#SH1e. On Husserl (Edmund Gustav Albrecht Husserl, 1859–1938)
see http://en.wikipedia.org/wiki/Edmund_Husserl.
15 Hermeneutics is (roughly speaking) the art and science of text interpretation. See http://en.wikipedia.org/wiki/Hermeneutics.
16 Francis George Steiner (born 1929) is an influential European-born American literary critic, essayist,
Fig. 43.3 (a) Mark Tansey’s (born 1949) The Innocent Eye Test, 1981. The cow is looking at Paulus
Potter’s (1625–1654) The Young Bull, 1647 (b). The cow remains apparently unaware of the explicit
erotic overtones of this work. One concludes that in the bovine universe the painting is just another
irrelevant object, despite its life size and lifelike color. (Keep in mind that this figure reproduces a
painting, rather than a ‘documentary photograph’!)
(a) Tansey, Mark (b. 1949): The Innocent Eye Test, 1981. New York, Metropolitan Museum of Art. Oil on canvas.
78 x 120 in. (198.1 x 304.8 cm). Gift of Jan Cowles and Charles Cowles, in honor of William S. Lieberman, 1988.
© 2015. Image copyright The Metropolitan Museum of Art/Art Resource/Scala, Florence (b) Potter, Paul (1625–1654):
Le jeune taureau. Un berger et son bétail, bélier, agneau, vache et taureau, 1647. The Hague, Mauritshuis.
© 2015. White Images/Scala, Florence
apply to art appreciation. First there is the blind trust to find something there, a step into the dark,
for better or for worse: to find nothing is experienced as a painful breach of trust. Then there is an
act of aggression, as the observer ‘conquers’ the work, followed by incorporation, as the observer
makes the work his or her own. Finally, there is retribution, wherein the observer (as indeed with
the initial trust) honors the artist’s intentions. The work is re-created in the observer, albeit in
novel form, for ‘to understand is to decipher; to see [orig. hear] significance is to translate’. Exact
re-creation is impossible; the artist’s meaning is always lost. Each observer sees only him- or herself.
My central interest will be modern western art (which involves the art of western Europe of
the late middle ages to the present, the art of the United States since the sixteenth century, etc.),
especially painting, sculpture, and architecture. I will also occasionally touch on non-western art
and other fields of endeavor such as photography, cinema, fashion, graphics design, and so forth.
Of course, the interest is merely visual organization; I ignore the conceptual, magical, religious,
and other connotations, even though these are often the very reason for the existence of the art.
I focus on Gestalt properties, that is on the nature of the organization of the work, to the extent
that it may be considered ‘visual’17. Although there are certainly works of art whose organization
is almost completely visual, in many cases there exists organization on many simultaneous levels.
I start by making some (minimal) distinctions.
17 Classic authors on the topic are Rudolf Arnheim (1904–2007; see http://en.wikipedia.org/wiki/Rudolf_ [. . .] entries/ingarden/.
None of these strata is necessarily present in any given instance, although they may all be
simultaneously relevant. The profile of weights that might be placed on the strata is a useful indicator
of style. It varies widely, as one notices when comparing works by Mondrian19, Pollock11,
Malevich20, Rubens21, and Botticelli22, for instance.
One may associate different aesthetic values, either positive or negative, with the strata. But
what is more important is that the strata are never seen in isolation, except for special cases like
art restoration work—but then the work is not a ‘picture’ in the sense used by me here. Pictures
are organic wholes, implying that the strata are mutually interdependent23. There appears to be a
two-way causal flow24. A superstratum contributes context to objects or processes in a substratum,
whereas a substratum contributes substantial qualities to objects of the superstratum. In
this way, paintings may be comparable to polyphonic harmonies. Notice that there is room for
both harmony and disharmony, a crucial point in aesthetic appreciation. Of course, this may be
more easily noticeable in a Rubens painting than in a work by Malevich, simply because of their
very different structural complexities.
19 Pieter Cornelis ‘Piet’ Mondriaan, after 1906 Mondrian (1872–1944), was a Dutch painter. He was an
important contributor to the De Stijl art movement and group. See http://en.wikipedia.org/wiki/Piet_Mondrian.
20 Kazimir Severinovich Malevich (1879–1935) was a Russian painter and art theoretician. He was a pioneer
of geometric abstract art and the originator of the avant-garde Suprematist movement. See
http://en.wikipedia.org/wiki/Kazimir_Malevich.
21 Sir Peter Paul Rubens (1577–1640) was a Flemish baroque painter, and a proponent of an extravagant baroque [. . .]
22 [. . .] painter of the early Renaissance. He belonged to the Florentine school under the patronage of Lorenzo de
Medici. See http://en.wikipedia.org/wiki/Sandro_Botticelli.
23 Riedl, R. (1978). Order in Living Organisms: A Systems Analysis of Evolution. New York: Wiley.
24 Riedl, R. (1984). Biology of Knowledge: The Evolutionary Basis of Reason. Chichester: John Wiley and Sons.
25 Gombrich, E. H. (1994). The Sense of Order: A Study in the Psychology of Decorative Art (The Wrightsman
occur in facial tattoos of the Maori30, African scarifications31 (Figure 43.4) and jewelry (earrings),
Navaho sand paintings32, Australian aboriginal art33, and Japanese family emblems34.
The spiral has a very simple organization, not much more complicated than a line.
However, it manages to cover an arbitrarily large area in a manner that is immediately visually
evident. One might say spirals render an area visible. Other ways to render areas are (usually
regular) stippling or (usually regular) hatching—also common and visually evident patterns.
The double and triple spirals are composite patterns, yet are immediately recognized as unified
designs. Unlike the single spiral, they cannot be arbitrarily extended. Thus, they naturally fit within
a circular outline. Concentric circles, ornamental knots, mazes and labyrinths fit into the same
overall family of visual organization. They are found as ornamentation on bodies, weapons, pot-
tery, jewelry, floors, and walls. They serve as family emblems, powerful symbols (the swastika of
the Third Reich falls in this class), etc.
Another important class of ornamentation that often has strong perceptual organization is that
of band patterns. These occur in Europe from the stone age on35, and are found worldwide in vir-
tually all cultures. They naturally occur at the boundaries of disks and as ‘bracelets’ on rotationally
symmetric objects like weapons, pots, and sticks. In the simplest cases one finds parallel lines,
often zig-zag or wavy. In more complicated cases one finds repeated localized configurations. The
repetition is often ‘with variations’, usually regular ones. Most typical are simple alternations, as
in the ‘egg and dart’ pattern36 found at the Erechtheion (c. 421 BCE37).
Formally, the organization is defined by the ‘frieze groups’38, which are the classes of infinite
discrete symmetry groups for patterns on a strip. There are seven different frieze groups. The
groups are built on translations and glide reflections; one may find additional reflections along
the translation axis as well as half-turns. These basic organizations are found in ornamental
borders of the most diverse origin (e.g., painted on or scratched in pottery, in basketry, in ‘barbed
wire’ tattoos, in tile borders), all over the world, in the most diverse cultures. Although the
repetition with variation is indeed visually salient, there is little indication that the taxonomy of the
frieze groups plays an important role in visual organization39. It is apparently not part of a ‘visual
grammar’.
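The construction behind the frieze groups can be made concrete with a short sketch. This illustration is mine, not the chapter’s: a unit cell for three of the seven groups is assembled from a small asymmetric motif, and repeating the cell horizontally produces the strip. The group labels (p1, p1m1, p11g) follow the standard crystallographic notation.

```python
# Illustrative sketch (not from the chapter): three of the seven frieze groups,
# built by repeating a unit cell along a strip.

MOTIF = ["#..",
         "##.",
         "#..",
         "..."]  # a small asymmetric motif (equal-length rows)

def mirror_lr(cell):
    """Reflect in a vertical axis."""
    return [row[::-1] for row in cell]

def mirror_ud(cell):
    """Reflect in the horizontal (translation) axis."""
    return cell[::-1]

def beside(a, b):
    """Place two cells side by side."""
    return [ra + rb for ra, rb in zip(a, b)]

def strip(cell, copies=4):
    """Horizontal repetition: the translation shared by all seven groups."""
    out = cell
    for _ in range(copies - 1):
        out = beside(out, cell)
    return out

# Unit cells for three of the groups:
p1   = MOTIF                            # translations only
p1m1 = beside(MOTIF, mirror_lr(MOTIF))  # adds vertical mirror lines
p11g = beside(MOTIF, mirror_ud(MOTIF))  # adds a glide reflection: horizontal
                                        # mirror combined with a half-period shift

for name, cell in [("p1", p1), ("p1m1", p1m1), ("p11g", p11g)]:
    print(name)
    print("\n".join(strip(cell)))
    print()
```

Printing the three strips makes the difference immediately visible: p1m1 reads as a row of mirrored pairs, p11g as alternating ‘footprints’. The remaining four groups (p11m, p2, p2mg, p2mm) are obtained in the same way from horizontal mirrors and half-turns.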
33 On indigenous Australian art (also known as Australian Aboriginal art) see
http://en.wikipedia.org/wiki/Indigenous_Australian_art.
34 On Japanese family emblems see http://en.wikipedia.org/wiki/Mon_%28emblem%29.
New York: John Wiley and Sons. See also Jablan, S. V. (1995) Theory of Symmetry and Ornament.
Mathematical Institute: Belgrade. (Electronic reprint available as: Symmetry and Ornament at http://www.
emis.de/monographs/jablan/index.html.)
39 On visual discrimination of the frieze (note 39) and wallpaper (note 41) groups see Landwehr, K. (2011).
Visual discrimination of the 17 plane symmetry groups. Symmetry 30(3): 207–219.
The patterns that are being repeated are necessarily ‘local’. They are often abstract geometrical
forms, like circles or crosses, that may also be used for their own sake. Indeed, starburst patterns,
circles (concentric or intertwined pairs or triples), and especially crosses, are found in all cultures.
Crosses are especially common, even in non-Christian (due to distance in space or time) civiliza-
tions. These simple configurations have frequently been given meaningful interpretations (circles
and starbursts standing for the sun, crosses for human copulation, etc.), but it would seem that
the visual salience preceded such meanings (which indeed can vary). The basic forms are also
found in the colorations of animals and plants, think of the ‘eyes’ found on butterfly wings. The
‘releasers’ that evoke standard action patterns in birds and fishes are often based on similar pat-
terns. In more advanced cultures one often encounters stylized images of floral motifs, animals,
and humans. However, such stylizations are frequently based upon one of the basic forms, which
appears to give them their impact.
It would seem that these forms are indeed part of a ‘visual grammar’. Their common prop-
erty appears to be simplicity (minimal structural information content) combined with high
non-accidentalness (see also van der Helm, this volume, on simplicity).
In two dimensions one obtains the so-called ‘wallpaper patterns’40. Again, their organization
can be fully formalized through the symmetry groups in the plane. There are 17 distinct groups,
as has been known since 189141. All were already used by the ancient Egyptians! Indeed, these
groups have been invented independently by many cultures worldwide. Fabulous examples
are found in the tilings of Islamic architecture. The Alhambra is the paradigmatic example
(Figure 43.5). I know of no comprehensive account of the visual perception of these patterns.
It seems unlikely that naive observers would spontaneously differentiate between the various
types. As with the frieze groups, there is little indication that the taxonomy of the wallpaper
groups plays an important role in visual organization. It is not a part of ‘visual grammar’.
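A wallpaper pattern can be sketched by hand in the same spirit. This is again my own illustration, under the assumption of a square asymmetric motif: the group p4 (one of the 17) is assembled by arranging four quarter-turns of the motif around a four-fold rotation centre, and the resulting unit cell tiles the plane.

```python
# Illustrative sketch (not from the chapter): the wallpaper group p4, one of
# the 17 plane symmetry groups, built from four 90-degree rotations of a motif.

MOTIF = ["#..",
         "##.",
         "#.#"]  # a square, asymmetric motif

def rot90(cell):
    """Rotate a square block of characters 90 degrees clockwise."""
    n = len(cell)
    return ["".join(cell[n - 1 - c][r] for c in range(n)) for r in range(n)]

def beside(a, b):
    return [ra + rb for ra, rb in zip(a, b)]

def stack(a, b):
    return a + b

# Four quarter-turns of the motif, arranged around a 4-fold rotation centre:
a = MOTIF
b = rot90(a)
c = rot90(b)
d = rot90(c)
cell = stack(beside(a, b), beside(d, c))  # invariant under quarter-turns

def tile(cell, nx=3, ny=2):
    """Tile a finite patch of the plane with the unit cell."""
    row = cell
    for _ in range(nx - 1):
        row = beside(row, cell)
    out = row
    for _ in range(ny - 1):
        out = stack(out, row)
    return out

print("\n".join(tile(cell)))
```

The design choice is the point: the whole pattern is generated from one motif plus a handful of symmetry operations, which is exactly why such tilings read as highly organized at a glance.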
A particularly simple manner to induce perceptually salient organization is by bilateral
symmetry about a vertical axis42 (see also van der Helm, this volume, on symmetry). This works
with virtually any pattern—witness the Rorschach inkblot figures43 (Figure 43.6). Such patterns
are localized and are easily fitted into various bilaterally symmetrical regions (coins, round
emblems, square tiles, heraldic patterns, vases, etc.). Although heraldic symmetry is often very
strict, e.g., spread eagles with two heads, one looking left, one looking right, heraldic trees are
often not quite bilaterally symmetric. They don’t need to be, because they ‘simply look it’ anyway
(Figure 43.6). With some degree of scrutiny you can make out the difference, but this has no
relevance to the Gestalt. ‘Just looking’ reveals a ‘visual symmetry’, even if (strictly speaking) it
isn’t there.
Bilateral symmetry about a vertical axis again combines minimization of structural information
content (a mere ‘etcetera’ suffices) with remarkable non-accidentalness.
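The economy of bilateral symmetry is easy to demonstrate. In this sketch (my own, not the chapter’s), a Rorschach-style pattern is produced by mirroring a random half-image about a vertical axis: only the left half carries structural information, and the right half is the ‘etcetera’.

```python
import random

# Illustrative sketch (not from the chapter): a Rorschach-style pattern obtained
# by mirroring a random half-image about a vertical axis. Only the left half
# carries structural information; the right half is pure repetition.

def inkblot(height=8, half_width=6, density=0.4, seed=1):
    random.seed(seed)  # fixed seed so the 'blot' is reproducible
    rows = []
    for _ in range(height):
        left = "".join("#" if random.random() < density else "."
                       for _ in range(half_width))
        rows.append(left + left[::-1])  # vertical-axis mirror
    return rows

for row in inkblot():
    print(row)
```

Any random half-image works: the mirroring step alone is enough to turn noise into a pattern that ‘simply looks’ organized, which is the non-accidentalness the text points to.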
40 On the ‘wallpaper groups’: Pólya, G. (1924). Über die Analogie der Kristallsymmetrie in der Ebene.
Z. Kristallogr. 60: 278–282.
41 Fedorov, E. (1891). Simmetrija na ploskosti [Symmetry in the plane]. Zapiski Imperatorskogo [. . .]
42 [. . .] vertical axis of bilateral symmetry in perception, see Mach, E. (1886). Die Analyse der Empfindungen
und das Verhältnis des Physischen zum Psychischen. The text is available at
http://www.uni-leipzig.de/~psycho/wundt/opera/mach/empfndng/AlysEmIn.htm.
43 On the Rorschach test see http://en.wikipedia.org/wiki/Rorschach_test.
Fig. 43.5 Example of a sophisticated tiling pattern from the Alhambra. The Alhambra is a treasure trove
of such tessellations of the plane. The reason is, no doubt, that Islam forbids the depiction of reality.
Thus artists either design all kinds of abstractions of Koranic writings or they move towards ornamental
patterns. Of course, mural tile work is perfectly suited for that.
© batarliah/istockphoto.com
Faces (as seen en face) are the most important instances of bilateral symmetry from a (human)
biological perspective. Given almost any bilaterally symmetric blob, human observers are likely
to ‘see’ a face in it44. This fact (though rarely acknowledged explicitly) is of the utmost importance
to the visual arts. Women in particular specialize in optimizing the ideal ‘face’ configuration
(see Behrmann et al., this volume). Ideal faces are perfectly bilaterally symmetric of course,
whereas no actual face really is. Bilateral symmetry is a visual organization that readily arises in
vision, even when the actual patterns are far from ‘ideal’. Apparently it has a marked template
character (see also Koenderink, this volume, on Gestalts as ecological templates).
Humbert de Superville45, in his Essai sur les Signes Inconditionnels dans l’Art (Leiden, 1827), lists
the most important visual organizations of the generic face. This is perhaps one of the more
interesting treatises from the perspective of experimental phenomenology.
Fashion
Human figures are easily the most important objects for a human observer. Virtually all humans
are ‘artists’ in that they intentionally shape and decorate their bodies such as to evoke certain
45 See http://digi.ub.uni-heidelberg.de/diglit/superville1827/0006?sid=dd31a03a096431e9277bcc612775728c.
Fig. 43.6 Card 2 of the Rorschach test. Some popular responses are ‘two humans’, ‘four-legged
animal’, (a) ‘animal: dog, elephant, bear’. The website adds: ‘The red details of card II are often seen as
blood, and are the most distinctive features. Responses to them can provide indications about how a
subject is likely to manage feelings of anger or physical harm. This card can induce a variety of sexual
responses’. (b), (c), and (d) Drawings by Alphonse Mucha (1860–1939). Notice the apparent symmetry.
This ‘symmetry’ does not survive scrutiny, or even a good look. Yet the symmetry is obvious at first
glance! Perhaps unfortunately, we don’t have much of a ‘psychophysics of the cursory glance’ today.
(a) © zmeel/istockphoto.com (b) ‘Awakening of Morning’, 1899. Chicago (IL), The Curt Teich Postcard Archives.
© 2015. Photo Curt Teich Postcard Archives/Heritage Images/Scala, Florence (c) Mucha, Alphonse (1860–1939):
Irises, 1898. Moscow, Pushkin Museum. © 2015. Photo Fine Art Images/Heritage Images/Scala, Florence (d)
Dance (From the series The Arts), 1898. Artist: Mucha, Alfons Marie (1860–1939). © 2015. Photo Fine Art Images/
Heritage Images/Scala
gut-level visual responses in others. Methods may aim at eternity (witness mummified Maori
heads), a lifetime (scarification, tattoo, skull deformation), a short period (seasonal fashion), a
mere occasion (make-up), or just a fleeting moment (intentional smile, slightly bending the arm
in order to de-emphasize the elbow joint by Victorian ladies, articulating the finger pattern). Most
of these methods immediately address the momentary visual awareness of others. Both faces and
bodies yield strong Gestalts. Paintings and sculptures can be seen as carrying on body display ‘by
other means’.
Most facial ‘make-up’ is aimed at evoking emotional responses, often of a sexual nature, in oth-
ers. This generally implies the accentuation of desirable ‘releaser’ patterns46 (Figure 43.7), that is to
say, accentuations of the natural countenance. Comparatively rare exceptions include the make-up
used by the military to merge visually into the environment (camouflage47) and tribal ‘war paints’
that are supposed to induce fear in opponents or, perhaps, to promote courage or recklessness in
the wearer. The camouflage techniques reverse the usual make-up techniques by de-emphasizing
the eyes and mouth, and even optically defragment the face. The dark eye-stripes48 encountered
with many prey animals similarly de-emphasize the eyes, which are otherwise salient indicators
of an animal’s presence. Apparently the laws of visual organization rule throughout the animal
kingdom (see also Cuthill & Osorio, this volume).
A steady component of female make-up is the accentuation of the eyes, usually by darkening
or coloring the eye sockets, evidently with the intention of drawing attention to them. It is
known from ancient Egyptian, Greek, and Roman remains. This sometimes includes taking a
drug (Atropa belladonna49) in order to dilate the pupils. Another steady component is overall face
color (white in the Japanese geisha, brownish in modern western women), hairline (shaving in
46 On releasers see http://en.wikipedia.org/wiki/Ethology.
48 [. . .] nl/2008/02/eye-stripe.html.
49 On Atropa belladonna see http://en.wikipedia.org/wiki/Atropa_belladonna.
the middle ages), hair silhouette (cutting, braiding, binding), and hair color (tinting). Usually the
mouth receives a strong accent (much like the eyes), involving lip color, shape, and size. These
components define the overall first impression. They cause the face to ‘read’ clearly, even at a
cursory glance. They also introduce a ‘style’ (e.g., compare the classical geisha, the ancient Egyptian
woman, the modern western young urban professional); thus they intentionally set out to trigger
specific visual organizations. More volatile fashions aim at the shape of the face (false shading to
accentuate bony structure, rouge to raise the cheeks, powder to kill a highlight on the nose, and so
forth). In some cases actual ornamentation may be added. All this is carefully orchestrated so as
to evoke a highly organized perception in immediate visual awareness.
That these facial Gestalts are to a large extent conventional becomes evident by widening the
scope beyond one’s daily social environment. Different cultures often use fully different methods,
even one’s own culture changes over time, both in the short and long terms. As one compares
painted portraits over the centuries one encounters remarkable uniformity over an era, but great
diversity over longer time spans. In more recent times we have photography and the cinema,
yielding detailed and veridical data. Of course, one has to ‘correct’ for various photographic
techniques here, as camera operators typically add their own ‘make-up’ in a purely optical
way. With only moderate experience one is able to date a face accurately, hardly being off by a
decade and usually getting it right within a few years. The ‘decade look’50 can be picked up at a
glance, and is mostly a matter of visual organization.
Theatrical make-up uses the same techniques51, but in a highly condensed manner. The face
should ‘read’ in the intended manner even from a great distance, and in all lights. Despite their
differences, the methods of stage make-up and glamour make-up are only quantitatively different.
Both aim at creating a strong visual Gestalt of some desired kind, say of age, character, or profession.
What goes for the face ipso facto holds for the body52. A person may control the visual impression
of the body by assuming certain (studied) poses, by moving in particular ways, and by accentuating
or hiding various features by way of appropriately chosen dress. If there is an ample layer of fat,
‘foundation’ (corsetry, bras, etc.) may work wonders ‘behind the scenes’ (optically, that is). These are
deployed so as to influence the immediate visual impression of others.
Again, going through western painting throughout the centuries (not to speak of non-western
cultures!) reveals an amazing variety over time, especially as concerns women. Men appear to
vary predominantly through different conventional clothing, whereas women actually appear to
vary in body shape, as is evident from the rendering of nudes. Yet this is evidently nonsense!
From a biological perspective, it is evident that women have (anatomically and physiologically)
not changed that much during historical time. Going through a selection of paintings forcefully
shows that the body image is a conventional Gestalt. It is of vital importance in society, and it also
pervades the visual arts, both in sculpture and in painting.
One might say (as is the case with the ornaments discussed above) that the body image is
a meme53. It is no different from (and closely related to) ‘fashion’ in clothes. Memes are
comparatively stable ‘mental images’ (or schemes) that are somehow ‘contagious’. They apparently
On the female body in art throughout the ages see Hollander, A. (1980). Seeing Through Clothes.
52
New York: Avon Books.
53 On memes see Blackmore, S. J. (1999). The Meme Machine. Oxford: Oxford University Press.
spread from person to person within a time-slice of culture, and soon become traditionalized.
One witnesses changes that seem comparatively fast compared with the lifetime of an established
meme. Almost by definition, all memes of interest to the present quest are especially
good Gestalts.
Here is a striking example of such a sudden ‘transition’. The female body image throughout
(visually) recorded history is roughly characterized as a vertical column with some conventional
modulation of the silhouette (accentuated belly and short legs in the western middle ages, flat belly,
narrow waist, and wide hips (‘36–24–36’) in modern times) with a structured upper part (breasts,
shoulders, and head). The columnar nature is emphasized in Egyptian, Greek (kore), and Roman
art, to be continued in the western middle ages all the way up to the twentieth century. The long
robe accentuates this strongly by hiding the legs, thus delineating the column rising
from the floor. Trousers came only recently.
In 1961 Marilyn Monroe54 wears jeans55 (and even a bikini—invented by Louis Réard in 194656)
in The Misfits57. Her penultimate act is an emotional solo performance. She intentionally keeps her
legs together, although she goes through emotional contortions, mainly bending at the hips and
knees. Michelangelo Antonioni’s Blow-Up58 dates from 1966, only 5 years later. One notices that
the photographer’s models are instructed to pose with legs widely apart, poses that are orthogonal
to the classical ideal. Jean Shrimpton59 (‘the Shrimp’) and Lesley Lawson60 (‘Twiggy’) set the scene
in the fashion world of that period, and introduced a novel image of the modern female. The
poses became angular, emphasizing knee and elbow joints, which tended to be played down in
the past. Fashion accentuated the effect through strategically constructed sleeves, and stockings,
striving for an androgynous effect. Designers often forced the models to wear caps, causing them
to look like young boys at an awkward age. Remarkably, this changeover occurred in just a few
years. Pre-1960s and post-1960s photographs of women are impossible to confuse. The fashion
(graphic) artists immediately followed suit. Soon modern visual artists did the same.
The particular revolution described above gave rise to major changes in the composition of
fashion photographs. This can be nicely monitored from Antonioni’s Blow-up photo sessions
mentioned in the last paragraph59. Instead of the composition involving the single figure (essen-
tially a Greek sculpture), or a small group (say the three Graces61), the composition involves an
arbitrary number of models that repeat (or play upon each other’s) awkward poses. If a single
model is photographed in the angular pose the pose is usually related to the picture frame, or
suitably arranged props. In this way one obtains again a well-organized perceptual organization,
57 The Misfits (1961) is a film drama directed by John Huston, starring Clark Gable, Marilyn Monroe,
60 Lesley Lawson (born Hornby, 1949), widely known by the nickname Twiggy, is an English model, actress,
and singer.
61 The three Graces (Charites) became a popular theme in western art. See http://en.wikipedia.org/wiki/Charites.
albeit of a completely different kind from the generic perceptual organizations from before the
transition. This illustrates that strong compositions are possible in any ‘style’. No photographer
could avoid the change, as a study of the work of well-known fashion photographers reveals
(take Richard Avedon62 as an example).
Sculpture
Sculpture is the art of composition in three dimensions. Here we mainly focus on the classical
bronze, stone, and wood sculptures, although the realm of ‘sculpture’ has been greatly
expanded in recent times. Moreover, we concentrate on simple works (busts, figures, putti,
single animals, etc.), and ignore most groups (like Rodin’s Burghers of Calais63), or extended
scenes (like Bernini’s St Theresa64). Some dyadic and even triadic topics are readily regarded as
‘simple’ though—think of ‘the three Graces’62, ‘mother and child’ (e.g., Isis with Horus, Mary
with the Infant Jesus), or ‘woman with male corpse’ (e.g., the Pietà), in one of the conventional
poses.
Sculpture is all about perceptual organization. Although one may display the plaster cast of an
object as a ‘sculpture’ (not uncommon in our era), this is evidently conceptual art, no different
from displaying a urinal. Sculpture proper is ‘architectonic’: it is about the composition of volumes
and surfaces. In 1893 the German sculptor Adolf von Hildebrand65 published a theory that
was ridiculed by some (but acclaimed by others) at the time. He was only interested in ‘naturalistic’ work.
He distinguished sharply between the Daseinsform and the Wirkungsform of volumetric objects.
The Daseinsform is what might be called the physical presence of an object. It enters awareness
through movements of the vantage point (binocular vision, moving around the object, or looking
at the manipulated object). Thus, it is not a thing of immediate visual awareness, but a cognitive
construction on the basis of many successive awarenesses. The Wirkungsform is an artistic
construction that works from a single viewpoint, immediately. This involves architectonic thinking
on the part of the artist. The artist has to understand microgenesis. The observer should appreciate
the view as ‘natural’, and be able to capture it in immediate visual awareness. As Hildebrand
observes, children’s drawings work immediately. He concludes that the Wirkungsform should
include what makes children’s drawings work. Thus, sculpting is not about copying nature. It is
about affecting human visual awareness. He mentions the ‘Grecian nose’66 as an example (‘ . . . it is
not as if the Greeks had noses like that. . . . ’).
Most western sculpture made before World War I is ‘volumetric’, and can be largely understood
in terms of an overall composition based on a small number of simple (ovoid, cubical, or
cylindrical) major forms, smoothed together and elaborated by way of surface relief. Here ‘surface’
62 Richard Avedon (1923–2004), born Richard Avonda, was an American fashion and portrait photographer. See http://en.wikipedia.org/wiki/Richard_Avedon.
63 The Burghers of Calais is one of Rodin's major works. See http://en.wikipedia.org/wiki/The_Burghers_of_Calais.
64 Saint Teresa in Ecstasy is a sculptural group in the Cornaro Chapel, Santa Maria della Vittoria, Rome. It was designed by Gian Lorenzo Bernini. It is a major work of the high Roman baroque.
65 Adolf von Hildebrand was the author of an important book, Das Problem der Form (1893). One can find a …
66 … cosmetics (Harriet Hubbard Ayer's Book of Health and Beauty) of 1902 the author describes the Greek nose as 'perfect'. This seems to have been the general opinion throughout the nineteenth century.
Perceptual Organization in Visual Art 903
should be understood in a very broad sense. Thus—for visual purposes—a cube can be under-
stood as essentially a sphere (a compact volumetric object with aspect ratios of roughly 1:1:1),
with a superficial ‘dressing’ of corners and edges. The overall composition is due to the mutual
relation of the major forms, and is retained when the sculpture suffers through weathering, and
so forth, as is often seen in old unrestored works. Even the overall configuration usually yields a
strong cylindrical, ovoid, or block-like impression67 (Figure 43.8). Exceptions (e.g., horse rider,
boy with dolphin, etc.) are usually seen as ‘groups’ of pieces that might exist as individuals. The
relations between group members are of a higher order than the relations between the subvolumes
of a single member.
An interesting instance of variations on a single basic shape is provided by the 'character heads' made by
the Austrian sculptor Franz Xaver Messerschmidt68 (Figure 43.9). By all accounts Messerschmidt
was mentally ill when he produced his 64 studies of his own head assuming the
most incredible grimaces. There is no doubt a system in this madness, although we remain in
the dark as to Messerschmidt's formal design. What is of interest here is that the basic form,
Messerschmidt's skull, remains constant over the series, whereas the muscular/fatty/skinny cladding
varies widely. The set is well documented, and makes a fascinating body of work for the study
of (sculptural) form.
Later developments in mainstream western sculpture involve extreme non-convexities. These
may take the form of holes (see also Bertamini & Casati, this volume) or are due to the bending of
elongated volumes. Such work still retains the overall volumetric character though. Constructivism
changed that by introducing non-volumetric elements like wires, rods, and plates. Such work may
lead to completely different perceptual organizations, in which the overall, mostly empty space,
dominates over volumetric, filled space. If the classical organization is like a rock, the new one is
like a leafless tree in the winter. The introduction of non-rigidly connected parts in arbitrary move-
ments destroyed even this static spatial organization. The perceptual organization may be similar
to that of a flock of birds. The visual organization changes when you walk around a work, very
differently for open and closed sculpture, the reason being that you look through open structures
(Figure 43.10). The Constructivists introduced transparent material for much the same reasons.
Painting
By ‘painting’ I refer to any type of essentially ‘planar’ art, be it drawing, embroidery, map making,
intarsia, sand painting, you name it. I limit the discussion mainly to works of human or slightly
smaller size, confined to some visually obvious 'frame'. The frame may be implicitly
defined by the size of the paper or explicitly by an actual frame around a canvas, etc. In most
cases the frame, in whatever form, is an important part of the composition. Paintings as physical
objects are arrangements of colors on a planar surface of limited extent. Paintings as artworks may
or may not succeed in evoking, in observers, varieties of visual awareness that suit the intention of
the artist. Success or failure depends upon the distribution of colors, at least for observers
in the artist's intended target group. Thus 'composition' is everything69.
Of course, the range of possible visual awarenesses that the artist might want to evoke is virtu-
ally unlimited. To complicate matters, artists often had, and have, secret agendas. Apart from the
Fig. 43.8 The Egyptian piece in (a) is almost a cubical chunk of stone (man called Ay, Second
Prophet of Amun and High Priest of the goddess Mut at Thebes; limestone, XVIII Dynasty, 1336–
1327 BCE, Brooklyn Museum, New York). (b) Peplos Kore from Paros (c. 530 BCE, Acropolis Museum,
Athens). (c) The Venus de Milo, Greek Hellenistic, c. 100 BCE, Louvre, Paris. Notice that so-called
'abstraction' comes first and so-called 'naturalism' only in later stages. This is entirely typical. Art
does not arise from a need for mimesis; it derives from an urge to create something that should
hold itself against nature. Naturalism only becomes possible when the artist has 'conquered
nature'.
(a) Block Statue of Ay, ca. 1336–1327 B.C.E. Limestone, 18 9/16 x 10 x 12 1/4in. (47.1 x 25.4 x 31.1cm). Brooklyn
Museum, Charles Edwin Wilbour Fund, 66.174.1. Creative Commons-BY Accession Number: 66.174.1 (b) Peplos
Kore, c. 530 b.C., from Athens. Athens, Acropolis Museum. Marble. h 4 ft. (m 1.21).- © 2015. Marie Mauzy/Scala,
Florence (c) Greek civilization, 2nd century b.C. Statue of Aphrodite known as Venus of Milos, circa 100 b.C. From
the Island of Milos, Cyclades, Greece. Paris, Louvre. Marble, height 202 cm.© 2015. DeAgostini Picture Library/Scala,
Florence
Fig. 43.9 Three 'character heads' by Franz Xaver Messerschmidt (1736–1783). At one point in his
career Messerschmidt became mentally ill, and started on a project of 64 representations of his own
head in various states of grimace. The set (most have been kept) is worth close study because these
(mutually very different) shapes are all based on a single template, namely the sculptor’s own skull.
(a) Messerschmidt, Franz Xaver (1736–1783): The Yawner, after 1770. Budapest, Museum of Fine Arts Budapest
(Szepmueveszeti Muzeum). Photo: Jozsa Denes © 2015. The Museum of Fine Arts Budapest/Scala, Florence.
(b) Messerschmidt, Franz Xaver (1736–1783): A Hypocrite and Slanderer, Bust, Austrian, Made in: Austria, ca.
1770–1783. New York, Metropolitan Museum of Art. © 2015. Image copyright The Metropolitan Museum of
Art/Art Resource/Scala, Florence. (c) Messerschmidt, Franz Xaver (1736-1783): A Hypocrite and Slanderer, Bust,
Austrian, Made in: Austria, ca. 1770–1783. New York, Metropolitan Museum of Art. © 2015. Image copyright
The Metropolitan Museum of Art/Art Resource/Scala, Florence
906 Koenderink
Fig. 43.10 Naum Gabo (1890–1977) Constructed Head No. 2 (1916, original lost). The Gabo is
constructed from planar sheets. Compare the Egyptian piece in figure 43.8(a), which is compact,
like a pebble.
Artist: Gabo, Naum Caption: Head No. 2 ,1916, enlarged version 1964 Classification: sculpture Medium: Steel
Dimensions: object: 1753 x 1340 x 1226 mm © Tate, London 2015. The Work of Naum Gabo © Nina & Graham Williams
urge to evoke visual awareness in their intended audience, they often have pedagogic or idealistic
objectives (this includes propaganda and advertisement). Here we only consider visual aware-
ness proper. The best illustrators and propagandists are invariably good artists. They have to be,
otherwise their ‘messages’ would not be driven home. For all we care, ‘pure art’ is a nonentity.
I simply concentrate on the perceptual organization, and ignore the ‘message’. This may be hard if
the cognitive message is very loud. A thoroughly detached attitude is of the foremost importance.
Experimental phenomenology should proceed in the same way as a physician performing an
autopsy. In studying visual awareness one should be ‘all eye’.
The first impact upon the eye is the composition. The composition is often not consciously noticed
by the observer, but it is always an important part of the artist's trade. The composition
is why certain images are remembered forever and others are forgotten after barely a
glance.
An example of a memorable image is the photograph taken by Joe Rosenthal on 23 February 1945
on Iwo Jima, generally known as Raising the Flag on Iwo Jima70 (Figure 43.11). It depicts five marines
Fig. 43.11 (a) Original photograph of the raising of the flag at Iwo Jima. (b) the first stamp.
(c) a recent parody.
(a) © MPVHistory / Alamy. (b) © Zoonar GmbH / Alamy.
and a US Navy corpsman raising the US flag atop Mount Suribachi. Three of the five did not survive
the battle. The photograph won a Pulitzer Prize in the same year, and in 1954 it was used as the
theme of the Marine Corps War Memorial (by Felix de Weldon) at Arlington National Cemetery. By
public demand it was printed on a postage stamp 5 months after the event, selling over 137 million
copies (the biggest-selling stamp issued by the US Post Office). The photograph has been re-enacted,
published, painted, sculpted, cartooned, tattooed, etc., countless times. It is a true public image.
Another example is the painting American Gothic71 (Figure 43.12) by Grant Wood (1930). Although
the painting initially raised huge controversy, it soon became a public image. There exist numerous
copies (including sculptures), and countless parodies. A postage stamp was issued in 1998.
Why do these images command such public interest, even among people with scant interest in the
arts, and even many years after their first publication? It is not just their conceptual meaning, although
that evidently plays a role too. It is their immediate visual impact, as shown by the many parodies,
many of which are just visual puns that only roughly reflect the gist of the image. Apparently these images 'have
something’ that other pictures lack. The ‘something’ evidently has to do with the perceptual organization
evoked by them. The images have a Gestalt quality that easily survives reduction to postage stamp size.
The first visual impression is largely based upon the overall ‘gist’72. This gist is retained even
in a thumbnail reduction to a dozen by a dozen pixels. Art directors73 who have to select pic-
tures for magazines often look at reduced images (by printing proof sheets, using a reducing
glass, and so on). It is generally agreed that if an image doesn’t survive such minified viewing
it will certainly fail to have ‘impact’, even when printed large at high resolution in some glossy
magazine. Of course, in cases of iconic images, images for use in signs, etc., the gist may be all
there is (Figure 43.13).
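The effect of such minified viewing is easy to simulate. The sketch below is a pure-Python illustration, not from the chapter (a real workflow would use an image library such as Pillow): it shrinks a synthetic 48 × 48 'horizon' image to a 12 × 12 thumbnail by block averaging, and the coarse light–dark organization, the gist, survives the reduction.

```python
# Block-average downsampling: a crude stand-in for the art director's
# "reducing glass". Pure Python, no external libraries needed.

def downsample(img, k):
    """Average non-overlapping k x k blocks of a 2D grayscale image."""
    h, w = len(img), len(img[0])
    return [[sum(img[y*k + i][x*k + j] for i in range(k) for j in range(k)) / k**2
             for x in range(w // k)]
            for y in range(h // k)]

# A 48x48 test image: dark upper half, bright lower half -- the "gist"
# of a horizon. The coarse structure survives a 4x reduction intact.
img = [[0] * 48 for _ in range(24)] + [[255] * 48 for _ in range(24)]
thumb = downsample(img, 4)
print(len(thumb), len(thumb[0]))   # 12 12
print(thumb[0][0], thumb[11][0])   # 0.0 255.0
```

Fine detail at scales below the block size is averaged away, which is exactly why an image that depends on such detail for its impact fails the thumbnail test.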
Artists use various kinds of preliminary depictions74. The croquis is a gestural drawing of the live
model. It is done fast, and captures the essentials. The croquis (usually a number of croquis) are
used by the artist to design the final composition. The croquis is sought by the connoisseur because
of its sprezzatura75. The esquisse75 is a first sketch. The esquisse is intended to be used by the artist,
and is sought by the connoisseur because it allows a rare insight into the artist's mind-set. The esquisse
is often a stronger statement than the finished work. Several (or many) may be made, in order to
explore the range of possibilities of a project. The croquis and esquisse are usually small in size. The
ébauche75 is the underpainting for a painting; it is not intended to be seen, or used as such, since
its fate is to be overpainted. It is the size of the final painting. Since it is painted in a much broader
style, the ébauche may well be more indicative of the artist's intentions than the final work though.
Famously, the Impressionists were accused of passing off their ébauches as final paintings.
Thus, the exploration of the gist is usually an important part of the evolution of a work. All these
exploratory or summary statements are of considerable interest to the study of visual organiza-
tion as it applies to the visual arts. In many cases they may be of more immediate interest than
75 ' . . . a certain nonchalance, so as to conceal all art and make whatever one does or says appear to be without effort and almost without any thought about it . . . ' (Castiglione, The Book of the Courtier, available at http://archive.org/details/bookofcourtier00castuoft).
Fig. 43.12 (a) Grant Wood's (1891–1942) American Gothic (1930, Art Institute of Chicago). (b) a
Department of Agriculture Food Bank Debit Card. (c) one of the many parodies [the Web message
said: 'Paris Hilton, left, and Nicole Richie pose with Tinkerbelle in this undated publicity photo. The
friends star in Fox's new reality series "The Simple Life", in which Hilton and Richie try to survive on
a camp.']. Notice how such parodies can (pictorially) be far off (e.g., the left figure is higher than the
right one, both figures are female and much younger, the clothes are very different, also in color, the
background is fully different, and so forth), yet are immediately recognized for what they are. There
seems to be no explicit 'reasoning' involved. Apparently the 'gist' is very generic in such cases.
(a) Wood, Grant (1892–1942): American Gothic (American Gothic), 1930. Chicago (IL), Art Institute of Chicago.
oil on panel, 78 x 65 cm © 2015. DeAgostini Picture Library/Scala, Florence (b) © GarRobMil (c) © REX/Snap Stills
Fig. 43.13 Isotypes (International System of Typographic Picture Education) were promoted by Otto
Neurath (1882–1945), an Austrian philosopher and member of the Wiener Kreis, in about 1935.
They were designed by an artist, Gerd Arntz (German-Dutch, 1900–1988; see http://www.gerdarntz.
org/). Such pictograms are still widely used all over the world. Most can be ‘read’ at a glance,
without any prior instruction.
© DACS 2015.
the study of completed works. It is hard to say to what extent the artistic development of a work
parallels microgenesis of visual perception76—cases where it apparently does and cases where it
clearly does not are not hard to find.
The impact of an image starts with the gist, but most images, except perhaps gestural sketches,
esquisses made in preparation for final works, and so forth, have relevant structures at other scales
that will be revealed under continued observation. Even comparatively simple paintings usually
require a ‘good glance’ involving a dozen fixations in order to obtain a preliminary impression.
This is not yet full scrutiny, but it certainly moves part of the way to visual cognition. Many of the
parts will still be in mere visual awareness though. Their impact on the whole is pre-cognitive and
depends upon Gestalt factors rather than cognitive factors. Most images one sees have many lay-
ers of scale, and even after scrutiny there is usually quite a bit of ‘mystery’ left; there are structural
76 On microgenesis see Brown, J.W. (1999). Microgenesis and Buddhism: the Concept of Momentariness.
elements that remain on the pre-cognitive level although one is well aware of them. An under-
standing of this spectrum that ranges from pure awareness, over cognitive stages to pure reflective
thought, is largely lacking.
A fact that is often forgotten, or certainly highly underestimated, is that virtually all images are
instances from an extremely huge number of possibilities. Consider a low-quality image from the
internet: it is likely to have a file size of 4 kb, implying that it is one of a set of 8^4000, a huge number.
The image is a member of a set of more than 2 × 10^3612 possible images. No one has a feel for numbers
like that. You have at most 10^5 hairs on your head. The number of particles in the universe is
estimated at 10^80, again, much smaller. Remember that is for just a low-quality image! Thus, the
number of possible images is for all practical purposes infinite. Of course, most of these images
'look like nothing', that is to say they look like 'noise patterns', which all look the same. The ones
that 'look like something' are only a tiny fraction, though still an essentially infinite set. There is
no way one could ever see them all.
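These magnitudes are easy to verify, since Python integers are exact. The snippet below is an illustrative aside, not part of the chapter; it counts the decimal digits of 8^4000 and compares the result with the estimates quoted above.

```python
# Back-of-the-envelope check of the image-space numbers in the text.
# Python integers are exact, so we can simply count decimal digits.

n = 8 ** 4000          # the set size quoted for a 4 kb image
print(len(str(n)))     # 3613 digits, i.e. n is on the order of 10^3612
print(str(n)[0])       # leading digit 2, hence "more than 2 x 10^3612"

# The comparisons made in the text:
hairs = 10 ** 5        # hairs on a head (upper bound)
particles = 10 ** 80   # particles in the universe (estimate)
print(n > particles)   # True: incomparably larger
```

The point of the exercise is only that the number dwarfs any physical count one cares to name.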
The ‘space of images’ as explored here is merely the space of physical images, or as Maurice
Denis put it ‘essentially a flat surface covered with colors assembled in a certain order’. What is
really of interest in the present investigation is, of course, the space of visual presentations of a
human observer. This is much more difficult to describe, it is a virtual space. This is the space of
real interest. The discussion that follows focuses on this visual space, although I will use the space
of physical images to indicate rough ballpark estimates.
One can identify the style of a painting at a glance and immediately identify an artist from a
work one has never seen before; a ‘fake van Gogh’ can be spotted at first sight, and so forth. It is a
priori likely that the set of images that are striking at first sight is also huge, but no doubt one will
not have encountered more than a vanishingly small fraction yet, no matter what one’s age. There
is still ample room for further development in the arts, so to speak. Perhaps the amazing thing
is that ‘visual organization’ works as well as it apparently does. However, it seems quite possible,
perhaps even likely, that the ability of human observers to deal with images enables them to deal
with only a small, singular subset.
From the perspective of experimental phenomenology, it is evidently of interest to attempt
to attain an overview of the boundaries of human visual microgenesis. This is far more diffi-
cult a problem than might be expected. Throughout the history of western art there have been
‘paradigm shifts’, not only of a mild character (a style change) but also of a cataclysmic nature.
Although hardly imaginable now, the paintings of the early Impressionists were considered dangerous
enough that pregnant women were kept away from the Salon des Refusés for fear of miscarriages77.
The Cubist movement, and the work of 'Jack the Dripper'78, perhaps fall into a similar
category. Such occasions can be seen as the conquest of a novel area, previously terra incognita,
of the space of images. In the case of the globe one at least had a notion that there was a ‘white
area’ somewhere, it could be marked hic sunt dracones79. This is not really possible with the space
of images. The new area discovered by Jackson Pollock must have felt more like the fear of early
sailors that they would fall off the edge of the (thought to be flat) earth.
Many of these cataclysmic changes had to do with attacks on our trust in the structure of the
generic terrestrial environment. This involves the ground plane, the existence of mutually disjunct
78 Jack the Dripper (Paul Jackson Pollock 1912–1956, known as Jackson Pollock) was an influential American painter and a major figure in the abstract expressionist movement.
79 On hic sunt dracones ('here be dragons') see http://en.wikipedia.org/wiki/Here_be_dragons.
Fig. 43.14 (a) Ingres (1780–1867) La Source (begun 1820, completed 1856, Musée d’Orsay, Paris).
(b) Pollock (1912–1956) Echo No. 25 (1951, Pollock-Krasner Foundation/Artists Rights Society (ARS),
New York). Compare the spatial structure. The figure in the Ingres is a solid form that stands in front
of a background; there is space behind the body. In the Pollock there is only a faint, fleeting, and
changing impression of objects and environment. The pictorial surface dominates over any classical
‘pictorial space’.
(a) Ingres, Jean Auguste Dominique (1780-1867): La source. Paris, Musee d’Orsay. peinture. © 2015. White
Images/Scala, Florence (b) Pollock, Jackson (1912-1956): Echo (Number 25, 1951). New York, Museum
of Modern Art (MoMA). Enamel on unprimed canvas, 7’ 7 7/8’ x 7’ 2’ (233.4 x 218.4 cm). Acquired through the
Lillie P. Bliss Bequest and the Mr. and Mrs. David Rockefeller Fund. 241.1969 © 2015. The Museum of Modern
Art, New York/Scala, Florence
solid bodies, optical properties like the opaqueness and diffuse scattering of material surfaces, and
so forth. Impressionism80 destroyed part of that by dissolving the picture of the environment into
a chromatic, misty space. Cubism81 merged solid bodies with the background, and began their
fragmentation. Pollock completely sacrificed solid bodies (Figure 43.14). The observer finally lost
the ground under his or her feet. Meanwhile, movements like Surrealism and Dadaism attacked
from the other side, so to speak, and destroyed the relationships an observer silently expects to
find in the generic terrestrial scene82.
An analysis in terms of experimental phenomenology suggests a first rough inventory of the
part of the space of images that might be open to the human visual observer. One criterion is
82 On generic terrestrial scenes see Clark, K. (1949) Landscape into Art (available for download at http://archive.org/details/landscapeintoart000630mbp).
whether microgenesis arrives at some fixed point after prolonged looking. Such fixed points
appear to occur in one of the following three cases:
• a more or less uniform image;
• a highly structured image that is statistically uniform even in its small parts;
• a 'classical' scene.
In the first case one sees nothing remarkable, and it is evident that this will never
change, for want of structure. The blue sky is an instance, as are many modern minimalist
paintings83. In the second case microgenesis 'gives up' in the face of complexity. The image is
summarized as ‘texture’. The film grain in the sky of a 1950s monochrome photograph is an
example84. One doesn’t even try to ‘see anything’ in such a sky, although the texture is noted.
The third case is that of the nineteenth-century still life, landscape, or genre painting. One
simply sees what is there, and that is it. The proviso here is that images are rarely exhausted
at one ontic level. The genre scene may well offer interesting ‘mystery’ in the background, in
the rendering of structure, and so forth. After all, no painter is going to paint all the individual
blades of grass, yet the image of a meadow can hardly be painted a uniform (dead) green.
These three categories serve for a first parceling of the space of images, a bit like the distinc-
tion between the oceans and continents of the globe. Of course, the boundaries cannot be sharp.
Given any image, it is always possible to construct a huge number of images that are essentially
look-alikes. Thus, an image is not like a point, but like an open environment85 in image space. Such
open environments will be different for a glance, a good look, or under scrutiny. Under a glance
the environment of look-alikes may well have a complicated structure, since the observer is likely
to ‘miss’ parts that would be easily ‘got’ at another glance.
Perhaps more interesting are the images for which microgenesis fails to immediately arrive at a
(single) fixed point. One may distinguish (at least):
• spontaneous jumps from one fixed point to another;
• spontaneous fluctuations between a limited number of fixed points;
• endless, chaotic fluctuations of visual presentation.
In the first case the observer notices that visual awareness suddenly changes, and it is hard
to regain the previous presentation. An example is the well-known 'Dalmatian dog' picture86. At
first blush it looks like a pattern of blotches. Once you’ve seen the dog, it will stubbornly stay. In
the second case visual awareness jumps back and forth between a number of fairly obvious presentations.
A well-known case is Jastrow's duck-rabbit:87 you never see anything like a 'duck-rabbit',
but either a duck or a rabbit. Moreover, these presentations spontaneously flip. The third case
is perhaps the most interesting, both from an artistic and a scientific perspective. It is the case
famously described by Leonardo da Vinci, in which the observer never ceases to 'hallucinate', as
when gazing at the stains on an old wall88.
86 The 'Dalmatian dog' picture can be seen at www.illusionworks.com/html/camouflage.html.
87 Jastrow's duck-rabbit can be seen at http://en.wikipedia.org/wiki/File:Duck-Rabbit_illusion.jpg.
Fig. 43.15 (a) Rapid East by Suzanne Unrein, Courtesy of the artist (b) Robert Pepperell, Succulus
(2005) Oil on panel, 123 x 123 cm. Notice how Unrein paints in a 'post-neo-baroque' style. She
writes: 'I started with Rubens, Correggio and Raphael, then branched out to less likely combinations
of Poussin and Bouguereau. Now it's the animaliers of the 17th & 18th centuries, the boar hunts
and dogfights. By combining the hounds from these genres with the figures from more epic scenes
the dogs become a dysfunctional Greek chorus further confusing the summarizing of a scene.
I am less interested in the narrative than the elements and forms that inspire the abstraction, and
movement, with a larger range of color combinations. By combining figures from a variety of artists
in a range of eras, I want to transport them from their original meaning into the contemporary
Of course, the same thing happens when you look at (or into) a painting. John Ruskin is special
because he saw that one doesn’t need any ancient stained wall. Every vision suffices if you only
tune into the presence of ‘mystery’ in everything. Nothing is absolutely clear. You cannot count
the grains of sand beneath your feet, nor the leaves on the tree before you. What the painter paints
is not the leaves, but a leafy, ‘mysterious’ texture90. Therein lies the art.
There is a huge realm of the visual arts that exploits the pleasure experienced by observers due
to Ruskin's mystery. It has merely come to the surface more bluntly in modern times. Like all pictorial
structure, mystery occurs at all ontic levels. Much of surrealism occurred at the level of the repre-
sented entities. This is the level where René Magritte91 worked. In a sense, it is the least ‘visual’ of
these manifestations. The level of the ‘leafy texture’ is the level of the smallest relevant constitu-
ents. It is purely visual, and interesting, although only mildly so. It is to be expected in virtually
any serious painting (Magritte intentionally tried to avoid it). The most interesting levels from a
conceptual point of view are the levels of the simple meaningful units and the salient Gestalts. Some
of the more interesting work of Salvador Dali92 plays on the latter level, but the former is perhaps
the more interesting from the viewpoint of experimental phenomenology. Artists who address
88 Leonardo’s observations on what one might see in an old wall can be found at http://www.mirabilissimein-
venzioni.com/ing_treatiseonpainting_ing.html.
John Ruskin’s mystery is discussed in his Elements of Drawing, which can be downloaded from http://www.
89
gutenberg.org/files/30325/30325-h/30325-h.html.
On background texture (leafiness) see http://www.artsconnected.org/toolkit/encyc_texturetypes.html.
90
Good descriptions can be found in John Ruskin’s Modern Painters, an electronic version of which is avail-
able at http://www.lancs.ac.uk/fass/ruskin/empi/index.htm.
René François Ghislain Magritte (1898–1967) was a Belgian surrealist artist. See http://en.wikipedia.org/
91
wiki/René_Magritte.
Salvador Domingo Felipe Jacinto Dalí i Domènech, 1st Marqués de Dalí de Pubol (1904–1989), known as
92
domain and the challenge of newer interpretations’. Pepperell’s painting is ambiguous on purpose,
he writes ‘ . . . paintings and drawings are the result of intensive experimentation in materials
and methods designed to evoke a very specific, though elusive, state of mind. The works induce
a disrupted perceptual condition in which what we see cannot be matched with what we know.
Instead of a recognizable depiction the viewer is presented with—what the art historian Dario
Gamboni has called—a ‘potential image’, that is, a complex multiplicity of possible images, none of
which ever finally resolves’.
this level (for instance, Robert Pepperell93, or Suzanne Unrein94) play on the sentiments described
by Leonardo (Figure 43.15).
Conclusion
The topic is virtually boundless. I have only touched on a few conceptually interesting issues
here, fully ignoring extensive fields of endeavor like architecture, photography, cinema, or mime.
Moreover, I did not touch on the tangencies with music, poetry, and so forth. Each subtopic could
easily be extended into a book, or a lifetime of research.
My main objective in this chapter has been to offer some general background for thought, and
to indicate potentially profitable openings for future research in the experimental phenomenol-
ogy of the visual arts.
93 Robert Pepperell (born 1963) is an artist and professor of fine art at the Cardiff School of Art and Design.
Theoretical approaches
Chapter 44
Hierarchical organization by
and-or tree
Jungseock Joo, Shuo Wang, and Song-Chun Zhu
Introduction
A natural scene is composed of many components. See the example beach scene in
Figure 44.2. When we look at this image, our visual systems perform a series of tasks in order to
understand the whole scene. These tasks include decomposing the whole scene into parts, grouping
them to form larger and larger parts, and organizing the discovered parts in a certain way. Mimicking
these procedures with machine vision systems has been a fundamental problem in computer vision.
However, this is a very challenging task due to the huge complexity arising from an enormous
number of distinct scene configurations, which are composed of a variety of objects and regions
of varying shapes in different layouts.
In this chapter we will introduce a general model for scene or object categories that can represent
varying configurations effectively. The desired properties of such a model can be summarized
as follows:
1 It should incorporate generic grouping rules among image primitives at the low-to-middle level
of interpretation (i.e., Gestalt laws) as well as category-specific production rules for parts at the high
level (i.e., image grammar).
2 Compositionality is required, as it ensures that the model can be expressive enough to deal with
hugely varying configurations of many components using a relatively small dictionary.
3 The structural representation should be flexible so that it can adaptively capture the unique
configuration of each instance at multiple scales, as opposed to fixed representations.
4 Finally, the learned models should be unambiguous and allow only one interpretation of each
instance of a given scene or object.
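Requirements 2 and 4 can be made concrete with a toy And-Or tree. In the illustrative Python sketch below (the node encoding and the 'beach' dictionary are invented for this example, not the chapter's notation), an Or-node selects exactly one of its children and an And-node composes all of them; a handful of terminal 'tiles' already generates a combinatorial number of distinct configurations, each corresponding to exactly one parse.

```python
# Minimal And-Or tree sketch. Nodes are tuples: ("leaf", name),
# ("or", child, ...) -- choose exactly one child -- or
# ("and", child, ...) -- compose every child.

def num_configs(node):
    """Count the distinct configurations the tree can generate."""
    kind, children = node[0], node[1:]
    if kind == "leaf":
        return 1
    if kind == "or":                      # alternatives add up
        return sum(num_configs(c) for c in children)
    if kind == "and":                     # compositions multiply
        total = 1
        for c in children:
            total *= num_configs(c)
        return total

# A toy "beach scene" grammar: a scene is sky AND ground, and each
# part offers alternative appearance templates (Or-branches).
sky = ("or", ("leaf", "clear"), ("leaf", "cloudy"), ("leaf", "sunset"))
ground = ("or",
          ("and", ("leaf", "sand"),
                  ("or", ("leaf", "calm-sea"), ("leaf", "waves"))),
          ("leaf", "grass"))
scene = ("and", sky, ground)

print(num_configs(scene))  # 3 * ((1*2) + 1) = 9
```

Even this tiny dictionary of seven terminals yields nine distinct scene configurations; the multiplicative effect of And-nodes over Or-branches is what lets a small dictionary cover a huge configuration space.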
In order to fulfil these requirements, the proposed model will be a hierarchical compositional
model based on a tiling method. Tiling, as shown in Figure 44.1, can be seen as a process
of composing complex shapes by assembling smaller and simpler parts. Figure 44.1(a) shows a
tiling puzzle, an ancient Chinese invention called the 'Tangram'. While it is composed of a small set
of very simple pieces, one can compose an enormous variety of complex shapes by
assembling them. The same intuition can also be found in real-world examples such as tessellated
street pavement and ceramic tile flooring. In such cases, one can observe complex high-order
patterns emerging from one or a few types of tiles according to specific configurations, namely,
organizations of tiles.
Inspired by these examples, each individual component of a scene or object is treated as a tile
in the proposed model, whose visual dictionary is a collection of all observable tiles. Each tile
is treated as a template that explains a specific part of the image. Then, the task of understanding
920 Joo, Wang, and Zhu
Fig. 44.2 A natural scene (top) as well as an object (bottom) contains a number of components and
their subcomponents. We can completely understand the image by decomposing the whole into
parts and organizing them.
the whole scene simply becomes tiling, that is, identifying the proper tiles and assembling them.
Given the nature of tiling, we consider the assembly of tiles in 2D space in this chapter, in contrast to
another class of models that cope with the 3D arrangement of parts or primitives.
Our framework, which utilizes image parts (tiles) and their relations, is closely related
to a series of theories of part-based object recognition in human vision, for example,
'Recognition-by-Components' (Biederman 1987). According to these models, humans perceive
given scenes as 'structural descriptions' built from a limited set of known components in
memory, while huge flexibility is achieved through combinations of those components. On the
Hierarchical Organization by And-Or Tree 921
other hand, another class of theories, ‘image-based’ models (Edelman and Bülthoff 1992; Tarr
and Bülthoff 1998), suggest that our brains store many viewpoint-specific images of the same
object. By analogy, our model also incorporates multiple templates, each of which explains an
aspect specific to viewpoint or appearance type. Such treatment allows us to deal with complex
and non-rigid parts of real-world objects such as humans. In contrast to image-based models, we
define the set of templates at the part level (rather than at the entire image level) and parse the
image into the parts with selected templates where the relations among the parts are also captured
by the model structure. Therefore, our proposed model can be seen as a combined approach that
can benefit from both classes of models.
Background Review
In this section, related research on perceptual organization will be briefly reviewed.
In particular, we will consider two different dimensions: (1) whether their grouping rules and
parts are generic or category-specific (see ‘Grouping rules: generic vs category-specific’), and
(2) whether their representations are built on a flat layer or in hierarchy (see ‘Organization: flat
vs. hierarchical’).
scene. The corresponding configurations can also capture unique structure or relations of parts.
For example, a human and a dog have different sets of parts and different configurations, and none
of them can be identified by generic rules without domain knowledge.
Organization: Flat vs Hierarchical
The generic grouping rules, such as Gestalt laws, have been often posed as relational constraints
on the parts, which are modelled in a flat layer. For example, Zhu (1999) proposed a mathematical
framework based on Markov Random Field (MRF) whose neighbourhood structures captured
relationship between line segments. Through the structures, Gestalt laws were explicitly modelled
as pairwise features so that they could act as constraints posed on shape elements. Porway, Wang,
and Zhu (2010) also employed MRFs for aerial image parsing where the common elements of
aerial images such as parking lots, roads, etc. were defined on the graph. Subsequently, the statisti-
cal constraints such as relative position were added between objects.
However, certain relations or groupings can be better organized and expressed in a hierarchy of
different levels of abstraction. A fractal pattern is a good example, in which one can observe the law of
symmetry recursively at infinitely many scales. Let us also recall the beach example in Figure 44.2, which contains
many components and their subcomponents. One can easily imagine the huge complexity that would
be generated by modelling all components and their relations together in a flat representation.
The use of hierarchical representations for image modelling dates back to the 1970s and Fu's early
work (Fu 1974): syntactic approaches in which pattern structures and sub-pattern relations
were modelled as symbolic tokens and production rules, by analogy to natural languages.
Dickinson, Pentland, and Rosenfeld (1992) adopted a hierarchical Bayesian network for 3D
object recognition, where layers of short boundaries, object faces, and aspects were linked hier-
archically. Sarkar and Boyer (1994) also used the Bayesian network for grouping primitives into
hierarchical structures in aerial images. In both models, groupings were governed by condi-
tional probabilities defined over layers in the hierarchy. More recently, Geman and collaborators
(Bienenstock, Geman, and Potter 1997) presented grammatical and compositional frameworks
with applications such as vehicle licence plate recognition (Jin and Geman 2006). Zhu and
Mumford (2006) also proposed a general framework for image grammar named the And-Or
Graph, which we adopt in our model and will discuss in detail in 'Hierarchical Organization
by AOT’.
The key advantage of these approaches is that they can represent an enormous number of distinct
configurations by composing a relatively small number of elements, instead of enumerating
all possible configurations. In addition, hierarchical structures further allow us to limit local
complexity at each scale. As discussed at the beginning of this chapter, these are critical aspects in
modelling highly complex and versatile scene or object classes.
Again, the remaining question is how to learn image parts and their relations. In the rest of this
chapter, we will introduce a hierarchical compositional model based on ‘Hierarchical Tiling’. In
this model, the grouping rules will be defined by region-based recursive decomposition and each
subregion will correspond to an atomic element in the dictionary (see ‘Hierarchical Organization
by AOT’). Then the learning problem can be posed as a node pruning and parameter estimation
problem (see ‘Structure Learning by Parameter Estimation in AOT’).
Hierarchical Organization by AOT
The And-Or Tree (AOT) has been used for modelling objects and scenes in the literature of
computer vision (Zhu, Chen, and Yuille 2009). An AOT, as a stochastic image grammar,
represents the hierarchical decomposition of elements and produces a number of varying
configurations by alternating sub-components, subject to probabilistic distributions defined
over nodes and edges.
Each node in the AOT plays a distinct role according to its node type. As Figure 44.3 illustrates,
an AOT has three types of node: AND nodes, OR nodes, and Terminal nodes. Note that all nodes
are associated with specific subregions, and the root node corresponds to the whole region of
the image. Each type can be characterized as follows.
1 An AND node represents the composition of two subregions. For instance, ‘upper-body’ = ‘head’
∪ ‘torso’. By the definition of hierarchical tiling, AND nodes always have two child nodes.
2 An OR node contains several alternative ways to decompose the current region. It is a switch
indicating how and where to partition the current region.
3 A Terminal node corresponds to the most elementary region that is not decomposed further.
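These three node types can be sketched as simple data structures. This is an illustrative sketch; the class names and the (col, row, width, height) region encoding are assumptions, not the authors' implementation:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Region = Tuple[int, int, int, int]  # (col, row, width, height) on the grid

@dataclass
class Terminal:
    region: Region            # an elementary tile; never decomposed further

@dataclass
class And:
    region: Region            # composition of exactly two child subregions
    left: object
    right: object

@dataclass
class Or:
    region: Region
    children: List[object] = field(default_factory=list)  # alternative splits
    weights: List[float] = field(default_factory=list)    # branching frequencies

# The chapter's example: 'upper-body' = 'head' ∪ 'torso'
head, torso = Terminal((0, 0, 2, 1)), Terminal((0, 1, 2, 2))
upper_body = And((0, 0, 2, 3), left=head, right=torso)
```

An OR node would hold several alternative `And`/`Terminal` children for the same region, with its `weights` playing the role of the branching frequencies Θ.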
Note that an AOT is a ‘whole’ representation of the entire scene class, in the sense that all possible
decompositions of all subregions are integrated in this AOT. In order to represent a particular
image, one needs to make choices at OR nodes to select specific decompositions. We call this
process parsing, which yields corresponding representations as follows:
1 A Parse Tree is an image-specific instance drawn from AOT. This is a set of selected nodes
including terminal and non-terminal nodes.
2 A Configuration means a spatial layout of elementary regions in a parse tree. In other words, it
is a set of terminal nodes in a parse tree which does not reflect hierarchical relationship.
3 A whole AOT can be seen as an entire collection of all possible parse trees and
configurations.
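Under these definitions, extracting a configuration from a parse tree is just collecting its terminal leaves. A minimal sketch, with a hypothetical parse tree encoded as nested tuples:

```python
# A parse tree as nested tuples: ("AND", left, right) or ("T", region).
# OR choices have already been made, so OR nodes do not appear in a parse tree.
def configuration(pt):
    """Collect the terminal regions of a parse tree, ignoring hierarchy."""
    if pt[0] == "T":
        return [pt[1]]
    _, left, right = pt
    return configuration(left) + configuration(right)

pt = ("AND",
      ("T", (0, 0, 2, 1)),             # 'head' tile
      ("AND",
       ("T", (0, 1, 2, 1)),            # 'torso' tile
       ("T", (0, 2, 2, 1))))           # 'legs' tile
print(configuration(pt))               # the three terminal regions, flattened
```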
One important benefit of this representation is the flexibility that is required to account for
varying scene components and configurations. When built on an 8 × 8 grid, the AOT can
generate more than 4 × 10³¹ different parse trees. This flexibility comes from only 1296 rectangular
building blocks that are reconfigurable. The efficiency of this model partly relies on the
Fig. 44.3 During the learning process, a number of invalid configurations are eliminated from the
initial model. This results in a huge drop in the complexity of the model and the final model only
contains a compact set of meaningful configurations which can be frequently observed in the
training images.
fact that smaller subregions (nodes in the lower layers of the AOT) can be shared by multiple
parent regions of a higher order. However, such huge flexibility also introduces a counter-effect
of increased complexity and ambiguity. We will discuss this issue in detail in the following
section.
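Both counts quoted above can be checked with a short computation. The 1296 building blocks are simply all axis-aligned rectangles on an 8 × 8 grid, C(9,2)² = 36². The parse-tree recursion below assumes a grammar in which any rectangle either terminates as a tile or splits once, vertically or horizontally, at any grid line; it is a sketch of that assumption, not the authors' exact enumeration:

```python
from functools import lru_cache
from math import comb

def n_rectangles(n):
    """All axis-aligned rectangles on an n x n grid: pick 2 of n+1 lines per axis."""
    return comb(n + 1, 2) ** 2

@lru_cache(maxsize=None)
def n_parse_trees(w, h):
    """Parse trees of a w x h region: terminate, or split at any internal line."""
    total = 1  # the region itself as a terminal tile
    for k in range(1, w):                       # vertical splits
        total += n_parse_trees(k, h) * n_parse_trees(w - k, h)
    for k in range(1, h):                       # horizontal splits
        total += n_parse_trees(w, k) * n_parse_trees(w, h - k)
    return total

print(n_rectangles(8))      # 1296
print(n_parse_trees(8, 8))  # combinatorially large, far beyond 10**25
```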
Mathematical Formalism
In this subsection, we define notation and introduce the mathematical formalism. Given a set of N
training images {I_i}, our objective is to learn an AOT with a visual dictionary and associated parameters.
Let us define the AOT as follows:

AOT = (S, V, Θ, ∆), (1)

where S is the start symbol at the root, i.e. the whole region, and V is the set of nodes in the AOT.
A node v_i ∈ V has one of three types, {AND, OR, Terminal}, as introduced above. Θ is the set of
model parameters, which control the frequencies with which decomposition rules are activated at
OR nodes. The tiling dictionary of the scenes is denoted by ∆, which is also the set of terminal
nodes in V.
From the AOT, the learning problem can be formulated as maximum likelihood estimation
(MLE).
(∆, Θ)* = arg max_{∆, Θ} ∑_{i=1}^{N} log p(I_i; ∆, Θ). (2)
In the AOT model, each image I is generated by a hidden parse tree, pt. Then the data likelihood
in Eq. (2) can be marginalized over parse trees and further factorized as follows:

p(I; ∆, Θ) = ∑_{pt} p(I, pt; ∆, Θ) (3)
           = ∑_{pt} p(I | pt; ∆) p(pt; Θ). (4)
For a certain parse tree, pt, the first factor of the product in Eq. (4) represents the likelihood
of an image given the parse tree. In other words, it measures how well the parse tree and its
corresponding configuration explain the given image. The second factor, p(pt; Θ),
is the prior probability of the parse tree, which measures how commonly this parse tree would be
used. This part is not affected by the choice of image.
Therefore, our learning procedure follows the exact same strategy as humans do. The algorithm
takes as input a set of training images and infers the most probable interpretations of them, i.e.
parse trees and configurations. Next, it can evaluate what kinds of configuration are the most
common and how frequent each one is. Such information is stored as parameters of the learned
model, and eventually, can be used for analysing a new image.
On the other hand, the main difficulty in many structure-learning algorithms comes from the
fact that there are too many different ways to decompose the scene into parts, i.e. ambiguity.
This difficulty can be alleviated here by constraining the feasible set of structures by definition
of the hierarchical tiling described in the previous section. The hierarchical tiling AOT contains
a number of rectangles on the grid as basic building blocks as well as rules of decomposition.
In this representation, the original continuous geometric space is quantized at the resolution of
the grid, and moreover, factorized into the local forms of three regions: one parent region at an
AND node and two subsequent subregions. Therefore, the complexity is locally limited, and this
makes the model manageable in learning. Note that, despite this constraint, it can still represent
a combinatorial number of parse trees, which provides enough flexibility for modelling a variety
of configurations.
Figure 44.3 illustrates the key idea of the learning procedure which can be seen as a shrink-
ing process. It first establishes a very ‘fat’ and highly over-complete initial model. This model
can generate an exponential number of different configurations. Some of these configurations
are useful (they correspond to real examples of natural scenes); however, most of the other
configurations do not make any sense and are unable to capture the meaningful structure of
any natural scenes. These meaningless configurations will be gradually eliminated from the
initial model during the learning procedure. Eventually, the learned model can generate a
much more compact set of configurations and parse trees, which one can commonly observe
in real images.
Iterative Learning
In our formulation, a parse tree is a latent variable that is not observable. One common algorithm
used for maximum likelihood estimates with latent variables is the expectation-maximization
(EM) algorithm (Dempster, Laird, and Rubin 1977). This is an iterative algorithm and alternates
between evaluating the posterior distribution of latent variable and updating model parameters,
based on the current estimates at each iteration. Our learning algorithm follows a similar iterative
strategy which alternates between inference of the optimal parse trees and updating parameters.
The details of each step can be summarized as follows.
1 Inference. Inference is the task of evaluating the most probable parse tree which can be
considered as the best interpretation of a given image under the current parameters of
AOT. We obtain the optimal parse tree for each image by dynamic programming (DP) in a
bottom-up process. For a given image I_i, the optimal parse tree, pt_i*, maximizes the following
probability:

p(pt; Θ^t) = ∏_{v ∈ V^OR ∩ pt} Θ^t_{(v, v_ch)}, (6)

where Θ^t_{(v, v_ch)} is the branching frequency from an OR node v to its child node, v_ch.
2 Activation frequency update. After obtaining the optimal parse trees, the parameters of the
model are updated. These parameters include the activation frequencies, Θ, which indicate the
frequencies of the decomposition rules:

Θ^{t+1}_{(v, v_ch)} = ∑_i 1[v, v_ch ∈ pt_i*] / ∑_i 1[v ∈ pt_i*]. (7)
3 Node pruning. According to the updated frequency, the dictionary is compressed by pruning
nodes which have never or rarely been activated.
∆^{t+1} = ∆^t \ {v : f(v) < ε, v ∈ ∆^t}, (8)

where

f(v) = (1/M) ∑_i 1[v ∈ pt_i*]. (9)
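The three steps above can be sketched as one loop. The inference step is left as a caller-supplied function, since it depends on the image likelihood; all names and the parse-tree encoding are illustrative assumptions, not the authors' code:

```python
from collections import Counter

def learn_aot(images, infer_parse_tree, dictionary, theta, eps=0.01, n_iters=10):
    """Alternate (1) parse-tree inference, (2) frequency update, (3) pruning."""
    for _ in range(n_iters):
        # 1. Inference: most probable parse tree per image under current model.
        parse_trees = [infer_parse_tree(im, dictionary, theta) for im in images]
        # 2. Activation-frequency update (cf. Eq. 7): relative branching counts.
        branch, visit = Counter(), Counter()
        for pt in parse_trees:
            for v, v_ch in pt["or_branches"]:
                branch[(v, v_ch)] += 1
                visit[v] += 1
        theta = {(v, v_ch): c / visit[v] for (v, v_ch), c in branch.items()}
        # 3. Node pruning (cf. Eqs. 8-9): drop rarely activated dictionary tiles.
        used = Counter(v for pt in parse_trees for v in pt["terminals"])
        dictionary = {v for v in dictionary if used[v] / len(images) >= eps}
    return dictionary, theta
```

Here `infer_parse_tree` stands in for the bottom-up dynamic-programming step; in the chapter it would return the pt_i* that is optimal under the current Θ.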
These steps are repeated until the model converges. At the beginning, the initial AOT contains a huge
number of decomposition rules and a large dictionary, and there is very high ambiguity in parsing
images. As the iterations proceed, the model parameters are progressively refined and the size of the
dictionary becomes smaller. A series of relevant experimental results is presented in the following
sections, with applications to scenes and the human body.
Qualitative Results
Table 44.1 shows statistics on the complexity of the AOT. The size of the parsing space defined by
the initial AOT is combinatorial: it contains a huge number of region decomposition rules and can
generate an enormous number of distinct parse trees. This also implies high ambiguity.
Through the iterative learning procedure, the admissible parsing space quickly shrinks by pruning
Table 44.1 The shrinkage of AOT for a ‘street’ scene at each iteration round
[Table columns: Round, |V^AND|, |V^OR|, |V^T|, |Ω_pt|; the table body is not reproduced here.]
Fig. 44.4 Parse an image into scene configuration. (a) Input image. (b) Segmentations in different
layers. (c) The optimal parse tree of given image. (d) Scene configuration. (e) Scene configuration
with localized parts.
many infrequent parsing rules and nodes. After convergence, the learned AOT only contains a
compact set of common parsing paths and nodes.
ground-truth segmentations (label maps) of the training images are provided in this dataset, and we
used them for fair comparisons with the other methods.
We compare the performance of our model with prior works including: (1) a holistic ‘gist’
feature-based method (Oliva and Torralba 2001); (2) a BoW based method (Li and Perona 2005);
(3) the spatial pyramid matching (SPM) method (Lazebnik et al. 2006); (4) the locality-constrained
linear coding (LLC) (Wang et al. 2010), and (5) the tangram model (Tgm) (Zhu et al. 2012).
Figure 44.5 shows the average precision (AP) of different methods, where our method outper-
forms the others.
This is strong evidence supporting the need for flexible and hierarchical models in understanding
the scene. Without such a hierarchy, one can still identify some common visual words (BoW),
but one loses the spatial information and the relationships between parts, and fails to capture the
context of the entire scene. Although some uniformly predefined configurations have been used
in SPM, it still results in poor performance. One possible explanation is that their configura-
tions, regular grids at multi-resolution, are not coherent with real images of scenes. Therefore, by
[Figure 44.5 panels: posterior probability over configuration index for scene categories including 'Coast', 'Open country', 'Forest', and 'Inside city'; methods compared: Gist, BoW, SPM, LLC, Tgm, Ours.]
Fig. 44.5 Scene classification based on the categorical typical configurations. (a) The learned
configuration distributions where the horizontal axis is the index of configuration and the vertical
axis is the posterior probability. (b)–(i) The categorical typical configurations for each scene category.
The performance of scene classification is shown in the bottom table.
pursuing meaningful spatial layout from training data, the hierarchical tiling model can improve
classification performance.
Conclusion
In this chapter, a hierarchical representation of images and its learning algorithm were discussed.
The And-Or Tree (AOT) was adopted as the main framework modelling the hierarchy of image
structure. An algorithm to learn the parameters and dictionary of the AOT was suggested with
mathematical formalisms. Finally, to demonstrate the introduced model and learning method,
two concrete cases, for natural scenes and human bodies, were presented, with various
experimental results.

Fig. 44.8 (Left) The optimal configurations pooled from the AOT at each iteration. At the beginning,
the ambiguity is very high. As the learning proceeds, the optimal configuration becomes more
meaningful, and finally captures the correct parts of human bodies. (Right) Some popular elements
in the dictionary of the AOT after learning.
Acknowledgements
This work was supported by NSF CNS 1028381, DARPA MSEE grant FA 8650-11-1-7149 and
MURI grant from ONR N00014-10-1-0933. We would like to thank Johan Wagemans and two
anonymous reviewers for their valuable comments.
References
Biederman, I. (1987). ‘Recognition-by-Components: A Theory of Human Image Understanding’.
Psychological Review 94: 115–147.
Bienenstock, E., S. Geman, and D. Potter (1997). ‘Compositionality, MDL Priors, and Object Recognition’.
In Advances in Neural Information Processing Systems, edited by M. C. Mozer, M. I. Jordan, and
T. Petsche, pp. 838–844. Cambridge, MA: MIT Press.
Dempster, A. P., N. M. Laird, and D. B. Rubin (1977). ‘Maximum Likelihood from Incomplete Data via the
EM Algorithm’. Journal of the Royal Statistical Society, Series B 39: 1–38.
Dickinson, S. J., A. P. Pentland, and A. Rosenfeld. (1992). ‘From Volumes to Views: An Approach to
3-D Object Recognition’. Computer Vision, Graphics, and Image Processing: Image Understanding
55(2): 130–154.
Edelman, S. and H. H. Bülthoff (1992). ‘Orientation Dependence in the Recognition of Familiar and Novel
Views of Three-Dimensional Objects’. Vision Research 32: 2385–2400.
Felzenszwalb, P. F. and D. P. Huttenlocher (2004). ‘Efficient Graph-Based Image Segmentation’.
International Journal of Computer Vision 59(2): 167–181.
Felzenszwalb, P. F., R. B. Girshick, D. A. McAllester, and D. Ramanan (2010). ‘Object Detection with
Discriminatively Trained Part-based Models’. IEEE Transactions on Pattern Analysis and Machine
Intelligence 32(9): 1627–1645.
Probabilistic models of perceptual features
Jacob Feldman
Features
A ubiquitous element in perceptual theory is that of a feature, meaning a measurable attribute
of an object, such as its color, form, orientation, or motion. Features are a routine part of the
description of experimental stimuli, and an essential component of verbal descriptions of every-
day visual experience (the black pen is on the square table). Features play a wide variety of roles
in perceptual theory. Features such as convexity and symmetry are thought to influence figure/
ground interpretation (Kanizsa and Gerbino, 1976), helping to form initial representations of
objects (see Peterson, this volume). Later on, each object's features are bound together to form
complex object representations (Treisman and Gelade 1980; Ashby et al. 1996). Still later, each
object's features are used to classify it into larger categories (Feldman 2000; Lee and Navarro
2002; Ullman et al. 2002).
But behind the simple idea of a 'feature' lurk some deep theoretical issues and controversies,
involving how features are defined and what motivates the choice of a particular feature
vocabulary (Jepson and Richards 1992; Koenderink 1993). This brief chapter centers on the
ongoing evolution of the feature concept from a 'classical' view, in which features are deterministic
attributes of objects, to a more probabilistic view, in which features are probabilistic estimates
of attributes inferred from image data. The newer view has grown in prominence in conjunc-
tion with a broader probabilistic conception of perceptual inference more generally (Knill and
Richards 1996).
It is useful first to distinguish among certain commonly used terms and different notions of 'feature'
that are occasionally conflated. The terms feature, property, dimension, and attribute are all
commonly used to refer to an image characteristic that varies among visual objects. Each of these
terms is sometimes used to indicate the characteristic that can vary (e.g. size), or a particular
value that it can take (e.g. large). Thus some authors refer to color as a feature, while others use
the term to refer to specific values such as red, white, or blue; and so forth. Some authors reserve
one term (e.g. feature) for the variable, another for the value (e.g. property), but such usage does
not seem to be consistent across the literature. The term feature is sometimes reserved for discrete
qualities, meaning those that can take one of a finite number of distinct values, including discre-
tizations of what are normally continuous-valued properties: examples include red vs. green (two
discrete cases of the continuous parameter color) or vertical vs. horizontal (discrete cases of the
continuous parameter orientation), and so forth (Aitkin 2009). Features with exactly two values,
often referred to as binary or Boolean, can be understood to involve the presence or absence of
some attribute (e.g. red vs. non-red).
A more subtle distinction particular to the term feature is that it is sometimes used to refer
to localizable elements within an image, such as the facial ‘features’—eyes, nose, and mouth—
located at various positions on a face. Researchers in stereopsis, for example, refer to correspond-
ence between features in the left and right visual images, meaning local elements of the image with
well-defined locations (Poggio and Poggio, 1984). Any visual function that involves searching for,
counting, or measuring distances among features presumes this sense of the word. In contrast
many ‘features’, such as shape or color, are not localizable, but are characteristic of whole objects.
The distinction between these two senses of feature breaks down a bit when spatially localizable
elements are described in terms of their characteristics. For example, a T-junction is a spatially
localizable element of a line drawing, but is also a characteristic that some line junctions have and
others do not. In this review I will focus on the first sense of feature, as a characteristic that varies
among objects, although the issue of localizability becomes central later when we consider local
vs. global support for features.
Fig. 45.1 Schematic illustrating the difference between (a) classical features, which divide the
measurement space (m) into clean-cut classes; and (b) probabilistic features, which are based on
potentially overlapping probability distributions.
This issue parallels a famous debate in the literature on cognitive categories, which, following the
seminal papers of Posner and Keele (1968) and Rosch (1973), evolved from a 'classical' conception
based on necessary and sufficient features (see Smith and Medin 1981) to a graded and 'fuzzy'
view based on prototypes (Posner and Keele 1968; Reed 1972), exemplars (Medin and Schaffer
1978; Nosofsky 1986), or both (Nosofsky et al. 1994; Anderson and Betz 2001), in order to account
for the observation that some category instances seem to be better examples of the category than
others. In recent years the modern view has been expressed via probabilistic models in which
conceptual representations are probabilistic estimates of underlying generating classes (Anderson
1991; Ashby and Alfonso-Reese 1995; Goodman et al. 2008; Briscoe and Feldman 2011). In a few
famous cases, perceptual processes do seem to impose relatively hard boundaries at thresholds
along continuous parameters, a phenomenon known as categorical perception (see Harnad 1987).
However such cases are exceptional, and in any case even they seem to involve gradations in the
vicinity of the threshold.
2 Arbitrariness. In the classical view a feature like between 600 and 601 meters in height is
perfectly well-defined, even though it captures no natural kind, and may not distinguish in any
useful way between objects that satisfy it and those that do not. Such features are arbitrary in
that they fail to relate to real classes actually extant in the environment. A desirable property of a
feature vocabulary is that it be well-tuned to the classes it is used to describe, a desideratum the
classical model in no way guarantees.
3 Insensitivity to context. A feature like has a 6-cylinder engine is perfectly well-defined for
cars, but makes no sense when applied to trees, and vice versa for evergreen. Such features
make meaningful distinctions only within a single narrow context. Indeed human subjects are
known to employ different features depending on context (Blair and Homa 2005; Schyns et al.
1998; Goldstone and Steyvers 2001) and can learn new features in new contexts (De Baene
et al. 2008; Stilp et al. 2010). But a classical feature vocabulary does not in any way constrain
the context in which features are applied, since their definitions make reference only to image
conditions satisfied or not. As with arbitrariness, the problem is that classical features allow no
connection between their definitions and the properties of the environment.
The sections that follow outline a modern probabilistic conception of features that avoids each
of the above defects. Probabilistic conceptions of features are certainly not new, but have grown
over several decades (from roots in signal detection theory; see Green and Swets 1966). The recent
explosion in probabilistic conceptions of perception (see Kersten 2004 or Feldman, this volume)
has introduced a natural mathematical language for expressing many probabilistic ideas, includ-
ing that of a perceptual feature. In what follows I attempt to lay out the basic modern idea of
probabilistic features in a simple and general way.
Probabilistic features
From a probabilistic viewpoint, features are attributes of objects that are estimated from image
measurements, rather than measurements of image properties per se. The assumption is that
measurable image properties derive from both fixed distal properties of objects and random
noise, and that useful features attempt to extract the signal from the noise (see Figure 45.1). To
formalize this, we assume that an object feature f involves a likelihood distribution over image
measurements m,
p(m | f) ∼ µ + e (1)
where µ is some mean value of m conditioned on the presence of the feature, and e is an error
drawn from a noise distribution with mean 0, such as a Gaussian
e ∼ N(0, σ²) (2)
The probabilistic assignment of feature values to image structures then proceeds by Bayes’ rule: an
object with measurement m is assigned feature f in proportion to the posterior
p(f | m) ∝ p(m | f) p(f), (3)

where p(f) is a prior distribution over feature values. The prior may
(though need not) be uniform (e.g. p ( f ) = p(¬f ) in the case of a Boolean feature) in which case
the posterior is proportional to the likelihood. The likelihood model p(m | f ) is sometimes called
a generative model because in effect it is a model of how the image was generated, describing how
observables (m) are generated stochastically from the distal reality (f). The Gaussian model given
above is only an example; other functional forms may be assumed, so long as they define a distri-
bution p(m | f ).
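The generative model of Eqs (1)-(2) can be simulated directly; the observed measurement is the distal mean plus Gaussian noise, so the sample average recovers µ. A sketch with arbitrary parameter values:

```python
import random

def generate_measurement(mu, sigma, rng):
    """m ~ mu + e, with e ~ N(0, sigma^2): distal signal plus image noise."""
    return mu + rng.gauss(0.0, sigma)

rng = random.Random(0)                     # fixed seed for reproducibility
samples = [generate_measurement(3.0, 1.0, rng) for _ in range(10_000)]
mean = sum(samples) / len(samples)
print(round(mean, 1))                      # close to the distal mean of 3.0
```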
For example, the feature large might classically have been defined via a range of permissible
object sizes. But probabilistically it would be defined via a mean size µ, say 3 cm, plus some error
distribution, say normal with standard deviation 1 cm. (The mean µ itself might be conditioned
on other aspects of context, allowing large to mean different things in different contexts; see
below.) In contrast with the classical view, this means that largeness is a graded quality, with some
objects more likely to be regarded as large (namely, those closer to 3 cm) and others less likely.
This also means that the category of large objects can actually overlap with that of small objects
(see Figure 45.1). That is, a given object can be described by two contradictory features, although
generally with different probabilities.
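With the chapter's example numbers (large: µ = 3 cm, σ = 1 cm) and, as an added assumption, a competing 'small' class at µ = 1 cm with the same spread and a uniform prior, Bayes' rule yields exactly the graded, overlapping assignment described above:

```python
from math import exp, pi, sqrt

def gaussian(m, mu, sigma):
    """Density of N(mu, sigma^2) at m."""
    return exp(-0.5 * ((m - mu) / sigma) ** 2) / (sigma * sqrt(2.0 * pi))

def p_large_given_m(m, mu_large=3.0, mu_small=1.0, sigma=1.0, prior_large=0.5):
    """Posterior p(large | m) with two overlapping Gaussian likelihoods."""
    a = gaussian(m, mu_large, sigma) * prior_large
    b = gaussian(m, mu_small, sigma) * (1.0 - prior_large)
    return a / (a + b)

# An intermediate object is genuinely ambiguous rather than misclassified:
print(round(p_large_given_m(2.0), 3))  # 0.5: equidistant from both means
print(round(p_large_given_m(3.0), 3))  # 0.881: clearly, but not certainly, large
```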
Non-accidental features
Non-accidental features are an important class of perceptual feature that have received somewhat
more careful mathematical attention. As originally defined by Binford (1981) and Lowe (1987)
non-accidental features are properties of 2D configurations (e.g., cotermination of line segments
in the image) that reliably occur in the presence of associated 3D configurations (cotermination of
3D line segments in the world) but are very unlikely otherwise; that is, they are unlikely to occur
‘by accident’. Other examples include collinearity, parallelism, and skew symmetry (Wagemans
1993). More generally, a non-accidental feature is one that has high probability if certain distal
conditions are satisfied, but low probability otherwise. There is substantial, though not unalloyed,
empirical evidence that the visual system is particularly sensitive to non-accidental features1
(Wagemans 1992; Vogels et al. 2001; Feldman 2007; Amir et al. 2012), and they play an impor-
tant role in Biederman’s influential (1987) Recognition by Components (RBC) account of object
recognition.
Formally, a discrete image feature M (corresponding, say, to a fixed range of some measurement
m) is non-accidental with respect to a distal feature f if M has high probability in the
presence of f, i.e. p(M | f) ≈ 1, but low probability otherwise, p(M | ¬f) ≈ 0. Jepson and Richards
(1992) showed that another condition is required in order for M to reliably indicate the presence
of f, namely that the prior on f be elevated relative to alternatives. That is, f must be a condition
that occurs with elevated probability in the world; it must be a recurring regularity (see also
Feldman 2009).2 As in the ubiquitous illustration of Bayesian inference in a medical context—in
which reliable inference of a disease based on a positive test requires not only an accurate (sen-
sitive and specific) test but also a high prior (e.g. see Gigerenzer and Hoffrage 1995)—it is not
sufficient that a measurement class M be likely conditioned on a world state f; the world state
f must itself have a high prior.
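The role of the prior is easy to check numerically. The sketch below (plain Python; all numbers are purely illustrative) applies Bayes' rule to a non-accidental feature M with a small 'accident probability' ε: when the prior p(f) is no larger than ε the posterior stays at chance, while an elevated prior makes M a reliable signal of f.

```python
def posterior(p_M_given_f, p_M_given_not_f, p_f):
    """Bayes' rule: posterior probability of the distal condition f
    given that the image feature M was observed."""
    num = p_M_given_f * p_f
    den = num + p_M_given_not_f * (1.0 - p_f)
    return num / den

eps = 0.01  # small 'accident' probability (illustrative)

# Low prior p(f) = eps: observing M leaves f at chance (1/2).
print(posterior(1 - eps, eps, eps))

# Elevated prior p(f): the same measurement now reliably signals f.
print(posterior(1 - eps, eps, 0.3))
```

This is exactly the structure of the medical-test analogy: sensitivity and specificity alone (the first two arguments) cannot rescue an inference whose prior is vanishingly small.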
An example of a non-accidental feature is collinearity, extensively studied in the literature on
contour integration and completion (Hess et al., this volume; Field et al. 1993; Uttal et al. 1970;
Elder and Goldberg 2002; Geisler et al. 2001). In classical definitions, collinearity is defined via a
criterion on the orientation difference between successive edges in a chain. In probabilistic for-
mulations (Feldman 1995, 1997; Feldman and Singh 2005), collinearity is defined by a probabil-
ity distribution over turning angles (usually a normal or von Mises distribution) centered on
0◦ (straight continuation). This distribution gives a formal definition of the graded quality the
Gestaltists called ‘good continuation’, with perfectly straight being the ‘best’ and deviations from
straight constituting progressively ‘worse’ instances. In the probabilistic conception there is no
such thing as a turning angle that is definitely collinear or definitely not; any turning angle might
be an instance of the class (i.e., have been generated from a smooth contour process), though
straighter ones are more likely to be. Moreover, collinearity understood this way satisfies the
1 More precisely, there is very strong evidence that qualitative features such as non-accidental ones have spe-
cial salience relative to ‘metric’ or quantitative features (see references in text). But it is not completely clear
whether non-accidentalness is the correct mathematical characterization of ‘qualitative’ features.
2 To see why, assume that we express the condition p( M | f ) ≈ 1 as p( M | f ) = 1 − ε (with ε some low nonze-
ro probability), and similarly p ( M | ¬f ) = ε . Similarly assume f has low prior compared to alternatives,
e.g. p ( f ) = ε and p ( ¬f ) = 1 − ε (meaning that f occurring a priori is just as unlikely an accident as M occurring
without f). With these assumptions the posterior on f when M holds will be
p( f | M ) = p( M | f ) p( f ) / [ p( M | f ) p( f ) + p( M | ¬f ) p( ¬f ) ] = (1 − ε)(ε) / [ (1 − ε)(ε) + (ε)(1 − ε) ] = 1/2.
That is, the probability of f in the presence of M (1/2) is no greater than the probability of ¬f (also 1/2). That
is, if f has low prior, then even though M is non-accidental in the standard sense, observing M does not actu-
ally indicate that f is particularly likely. As Jepson & Richards showed, a small “accident probability” of ε (i.e.,
non-accidentalness) only leads to reliable inference if the feature prior p( f ) is substantially greater than ε.
938 Feldman
requirement of elevated prior needed to guarantee statistical reliability. Collinear turning angles,
generated approximately from the von Mises distribution, occur along smooth contours, but rela-
tively rarely otherwise (only ‘by accident’). Smooth contours themselves are ubiquitous in the
world because they occur along the boundaries of many objects (Ren et al. 2008). Because of this
elevated probability, image conditions suggestive of collinearity generally do reliably signal col-
linearity in the world. Like a positive test for a disease that does have a high prior, observed col-
linearity reliably signals common physical origins.
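In the spirit of the probabilistic formulations cited above, graded collinearity can be scored by evaluating a turning angle under a von Mises density centered on 0° (straight continuation). The sketch below is only illustrative: the concentration parameter kappa is an arbitrary choice, not a value fitted to contour statistics, and the Bessel-function normalizer is computed by its power series.

```python
import math

def vonmises_pdf(theta, kappa):
    """Density of a von Mises distribution over turning angle theta
    (radians), centered on 0 (straight continuation)."""
    # I0(kappa): modified Bessel function of the first kind, order 0,
    # evaluated by its power series.
    i0 = sum((kappa / 2) ** (2 * k) / math.factorial(k) ** 2 for k in range(30))
    return math.exp(kappa * math.cos(theta)) / (2 * math.pi * i0)

kappa = 4.0  # illustrative concentration: higher = stronger preference for straight
for deg in (0, 15, 45, 90):
    print(deg, round(vonmises_pdf(math.radians(deg), kappa), 4))
# Straight continuation (0 degrees) is the most likely turning angle;
# larger turns are progressively less likely but never impossible.
```

With such a density, 'good continuation' becomes a graded likelihood rather than a hard criterion on orientation difference: no turning angle is ruled out, straighter ones are simply more probable.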
But the distinction between multipart and single-part shapes is notoriously difficult to model
because the decomposition of shapes into component parts does not rely on any simple attribute
but instead involves a large set of non-local shape cues (Singh and Hoffman, 2001; de Winter and
Wagemans 2006). Classically, one would need to find some parameter reflecting two-partedness,
and set a threshold above which a shape is considered to have two parts rather than one. But such
a parameter is difficult to identify, and any threshold along it would be arbitrary. One can define
a spectrum of shapes (see abscissa of Figure 45.2) that vary smoothly from shapes clearly having
one part (left of figure) to those clearly having two (right of figure). Exactly where along this spec-
trum the boundary between one and two lies is unclear.
Alternatively, one can understand this shape feature probabilistically by defining distinct gen-
erative models for one- and two-part shapes. In the framework of Feldman and Singh (2006), a
one-part model would have a single axis (see Figure 45.2) from which the shape grows laterally;
this tends to yield simple elliptical shapes with random variations. Similarly, a two-part model
would have two axes, one branching off the other (see Figure 45.2), thus tending to generate shapes
with two distinct parts. (The recursively branching aspect of this generative model makes it hier-
archical; see Goldstone et al. 1991; Sanocki 1999; Geisler and Super 2000 for diverse discussions
of hierarchy in perceptual representations.) Each model can generate shapes anywhere along the
spectrum, but with different probabilities; the distributions overlap. Figure 45.2 illustrates how
the relative probability of the two models (more specifically, their posterior ratio) varies from one
end of the shape space to the other, with clearly one-part shapes (left) having higher probability
under the one-axis model, and clearly two-part shapes (right) having higher probability under
Fig. 45.2 The shape feature two-parts vs. one-part, viewed probabilistically. The figure shows a
spectrum of shapes ranging from a single part (left) to two parts (right). Towards the left, shapes are
well fit by a one-part model and poorly fit by a two-part model; at the right, vice versa. (Models are
shown with ribs; likelihood is diminished by variance in the lengths and directions of the ribs along
with several other factors.) The figure illustrates how the relative probability (posterior ratio) of the two
models shifts from favoring the one-part model on the left to favoring the two-part model on the right.
the two-axis model, and intermediate shapes lying in between. (In the Feldman and Singh (2006)
model, variance in the lengths and angles of the ‘ribs’ [correspondences between axis points and
shape points, shown in the figure] entail poor fit between the model and shape and thus dimin-
ish likelihood. One can see by looking at the ribs in the figure how, for example, variance among
the rib lengths increases as the fit between the shape and the model degrades.) Briscoe (2008; see
Feldman et al. 2013) found empirical evidence for an exaggerated perceptual division between
one-part and two-part shapes at about the point where the posterior ratio shifts from favoring one
model to favoring the other.
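The Feldman and Singh (2006) model scores full axial skeletons; as a deliberately simplified one-dimensional stand-in, one can give each model a hypothetical Gaussian likelihood over a scalar 'two-partedness' measurement and watch the posterior ratio cross over along the shape spectrum. Every number below is an assumption of the sketch, not a parameter of the published model.

```python
import math

def normal_pdf(x, mu, sigma):
    """Gaussian density, used as a stand-in likelihood for each model."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def posterior_ratio(x, prior_A=0.5):
    """Posterior ratio p(A|x)/p(B|x) for a hypothetical one-part model A
    (mass at low x) vs. two-part model B (mass at high x)."""
    like_A = normal_pdf(x, mu=0.2, sigma=0.15)
    like_B = normal_pdf(x, mu=0.8, sigma=0.15)
    return (like_A * prior_A) / (like_B * (1 - prior_A))

for x in (0.1, 0.5, 0.9):
    print(x, posterior_ratio(x))
# The ratio is large at the one-part end, equal to 1 at the crossover
# (x = 0.5 with equal priors and these symmetric models), and small
# at the two-part end.
```

The crossover point of the ratio plays the role of the perceptual category boundary, but it is derived from the two generative models rather than stipulated as an arbitrary threshold.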
Figures 45.3 and 45.4 illustrate two other shape features, respectively straight vs. bent
(Figure 45.3) and circular vs. elliptical (Figure 45.4). Again, both these shape spaces involve
smoothly varying aspects of shape that, in a classical view, would require an arbitrary division
between shape categories, but which are more elegantly described as varying probabilistically.
Incidentally, both of these shape features (in their classical guises) are invoked in distinctions
between geons in RBC (Biederman 1987).
Fig. 45.3 The shape feature bent vs. straight viewed probabilistically. Straighter shapes (left) are well
fit by a straight-axis model and poorly fit by a bent-axis model, while more bent shapes (right) are
better fit by the bent-axis model. (Models are shown with ribs; likelihood is diminished by variance in
the lengths and directions of the ribs along with several other factors.) The figure illustrates how the
relative probability (posterior ratio) of the two models shifts from favoring the straight-axis model on
the left to favoring the bent-axis model on the right.
on image parameters. While classical features may lump together highly dissimilar objects, or
exaggerate small differences among highly similar objects, probabilistic features make categorical
distinctions only in accord with the statistical evidence.
2 Non-arbitrariness. More subtly, probabilistic features also solve the problems of
arbitrariness and context insensitivity. One of the main benefits of the probabilistic approach is
that it allows us to understand and formalize the connection between the feature lexicon—the set
of features used by the observer—and the statistical structure of the world (Barlow 1961; Shepard
1994). The world has predictable probabilistic structure: forms, scenes, and spatial relations tend
to occur in systematic, reliably recurring ways. A useful feature vocabulary is one that effectively
describes the probabilistic terrain.
One way to characterize the probabilistic structure in the world is by describing its ‘modes’,
meaning statistical peaks in the probability distribution that describes the world. A simple
example is the mean-plus-error definition of feature f = µ + e given above, which defines a mode
p(m|f) in the measurement space m. A simple assumption is that image structure contains a
set of such modes, each corresponding to a distinct naturally occurring class; in this case the
underlying distribution is the union of such modes, called a mixture distribution (see McLachlan
and Basford 1988). An effective feature, then, would be one that distinguishes ‘natural modes’
Fig. 45.4 The shape feature circular vs. elliptical viewed probabilistically. More circular shapes (left)
are well fit by a point-axis model and poorly fit by a straight-axis model, while more elliptical shapes
(right) are better fit by the straight-axis model. (Models are shown with ribs; likelihood is diminished
by variance in the lengths and directions of the ribs along with several other factors.) The figure
illustrates how the relative probability (posterior ratio) of the two models shifts from favoring the
point-axis model on the left to favoring the straight-axis model on the right.
(Richards and Bobick 1988; Feldman 2012). Just as a single probabilistic feature separates one
modal distribution from another (see again Figure 45.1), a set of features is useful when it dis-
tinguishes the variety of modes extant in the world from each other. That is, a feature set is
meaningful when it ‘carves nature at its joints’—and the probabilistic formulation allows us to
specify where the joints are. Probabilistic features viewed this way are both non-arbitrary and
context-dependent.
Probabilistic features are non-arbitrary because their utility depends on the statistical structure
of the world they are used to describe, and a model of this statistical structure is part of the theory
supporting them. Classical features, by contrast, are defined ex nihilo; their definitions need not
in any way relate to the world. A classical definition of large/small might adopt an arbitrary size
cutoff; a probabilistic definition hinges on modal size categories in the world, and thus would be
Fig. 45.5 Context-sensitivity in probabilistic features. Because of the shape of the joint distribution
p(m1, m2) (shown in inset and as contour plot in main figure), feature f2 is well-defined for one value
of f1 (where it distinguishes mode A from mode B) but not for the other value of f1, which has only
one mode (C).
different for spoons (one mode about 10 cm, the other about 12 cm, say) vs. cars (one mode about
4 m, the other about 5 m).
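This context dependence is naturally expressed with a mixture of modes. In the sketch below, the mode locations follow the chapter's illustrative spoon and car values; the spreads and the decision rule are assumptions of the example, not part of any fitted model.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Each context supplies its own pair of natural modes (metres).
CONTEXT_MODES = {
    "spoon": (0.10, 0.12),  # 'small' vs 'large' spoons
    "car":   (4.0, 5.0),    # 'small' vs 'large' cars
}

def is_large(size, context, rel_sigma=0.05):
    """Classify 'large' by which mode better explains the measurement;
    rel_sigma (spread relative to each mode) is an illustrative choice."""
    small_mu, large_mu = CONTEXT_MODES[context]
    like_small = normal_pdf(size, small_mu, rel_sigma * small_mu)
    like_large = normal_pdf(size, large_mu, rel_sigma * large_mu)
    return like_large > like_small

print(is_large(0.115, "spoon"))  # the same feature name...
print(is_large(4.2, "car"))      # ...picks out different sizes per context
```

The point of the sketch is that 'large' carries no fixed cutoff: the same predicate, applied through context-specific modes, classifies an 11.5 cm spoon as large and a 4.2 m car as small.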
Gestalt perceptual features, like proximity, good continuation, and closure, are infamous for
their vague definitions. The probabilistic formulation suggests that these features are difficult to
define because they mean different things in different contexts; a rich probabilistic description of
the world is required to specify exactly what they mean in the diverse situations in which they are
used. Creating such generative models is, of course, a substantial scientific challenge that has not
yet been met in many cases.
Conclusion
Perceptual features are involved in virtually all aspects of vision science, but are still treated in a
variety of divergent ways. Behavioral experiments still often use features defined by intuitively
simple criteria. At the same time, an enormous neuroscientific literature has established sophis-
ticated feature concepts based on the response properties of cells in visual cortical areas. Early
in the processing stream, these include such well-established properties as orientation, motion,
and stereoscopic disparity. Later in the stream, these include increasingly non-local properties
such as contour curvature (Pasupathy and Connor 2002), medial axis structure (Hung et al. 2012;
Lescroart and Biederman 2013), aspects of 3D shape (Yamane et al. 2008), and other less easily
verbalized aspects of global shape (Op de Beeck et al. 2001; David et al. 2006; Cadieu et al.
2007). An important common theme to many modern proposals is that the visual system’s choice
of features is in some way optimized to the statistical structure of the visual world (Field 1987;
Olshausen 2003; Geisler et al. 2009). Indeed, there is a growing consensus that the underlying
neural code is inherently probabilistic (Rieke et al. 1996; Yang and Shadlen 2007). However, a fully
developed probabilistic model of visual features, in particular one that extends beyond early rep-
resentations to incorporate non-local features such as form, shape, and spatial relations, does not
yet exist. Such a model must be considered one of the main goals of the next decade of research
in the visual sciences.
Acknowledgment
I am grateful to Irv Biederman, Manish Singh, Wolf Vanpaemel, Johan Wagemans, and an anony-
mous reviewer for helpful comments. Preparation of this article was supported by NIH EY021494.
Please direct correspondence to the author at jacob@ruccs.rutgers.edu.
References
Aitkin, C. (2009). Discretization of continuous features by human learners. Unpublished doctoral
dissertation, Rutgers University.
Amir, O., Biederman, I., and Hayworth, K. J. (2012). ‘Sensitivity to nonaccidental properties across various
shape dimensions’. Vision Research 62: 35–43.
Anderson, J. R. (1991). ‘The adaptive nature of human categorization’. Psychological Review 98(3): 409–29.
Anderson, J. R., and Betz, J. (2001). ‘A hybrid model of categorization’. Psychonomic Bulletin and Review
8(4): 629–47.
Angelucci, A., and Bullier, J. (2003). ‘Reaching beyond the classical receptive field of V1
neurons: horizontal or feedback axons?’ Journal of Physiology Paris 97(2–3): 141–54.
Ashby, F. G., and Alfonso-Reese, L. A. (1995). ‘Categorization as probability density estimation’. Journal of
Mathematical Psychology 39: 216–33.
Ashby, F. G., Prinzmetal, W., Ivry, R., and Maddox, W. T. (1996). ‘A formal theory of feature binding in
object perception’. Psychological Review 103: 165–92.
Barlow, H. B. (1961). ‘Possible principles underlying the transformation of sensory messages’. In Sensory
communication, edited by W. A. Rosenblith, pp. 217–234 (Cambridge: M.I.T. Press).
Biederman, I. (1987). ‘Recognition by components: a theory of human image understanding’. Psychological
Review 94: 115–47.
Biederman, I., and Shiffrar, M. (1987). ‘Sexing day-old chicks’. Journal of Experimental
Psychology: Learning, Memory, and Cognition 13: 640–5.
Binford, T. (1981). ‘Inferring surfaces from images’. Artificial Intelligence 17: 205–44.
Blair, M., and Homa, D. L. (2005). ‘Integrating novel dimensions to eliminate category exceptions: when
more is less’. Journal of Experimental Psychology: Learning, Memory and Cognition 31(2): 258–71.
Briscoe, E. (2008). Shape skeletons and shape similarity. Unpublished doctoral dissertation, Rutgers
University.
Briscoe, E., and Feldman, J. (2011). ‘Conceptual complexity and the bias/variance tradeoff ’. Cognition
118: 2–16.
Cadieu, C., Kouh, M., Pasupathy, A., Connor, C. E., Riesenhuber, M., and Poggio, T. (2007). ‘A model of
V4 shape selectivity and invariance’. Journal of Neurophysiology 98: 1733–50.
Craft, E., Schutze, H., Niebur, E., and von der Heydt, R. (2007). ‘A neural model of figure-ground
organization’. Journal of Neurophysiology 97(6): 4310–26.
David, S. V., Hayden, B. Y., and Gallant, J. L. (2006). ‘Spectral receptive field properties explain shape
selectivity in area V4’. Journal of Neurophysiology 96: 3492–505.
De Baene, W., Ons, B., Wagemans, J., and Vogels, R. (2008). ‘Effects of category learning on the stimulus
selectivity of macaque inferior temporal neurons’. Learning and Memory 15: 717–27.
De Winter, J., and Wagemans, J. (2006). ‘Segmentation of object outlines into parts: A large-scale
integrative study’. Cognition 99(3): 275–325.
Elder, J. H., and Goldberg, R. M. (2002). ‘Ecological statistics of Gestalt laws for the perceptual
organization of contours’. Journal of Vision 2(4): 324–53.
Feldman, J. (1995). ‘Perceptual models of small dot clusters’. In Partitioning data sets, edited by I. J. Cox,
P. Hansen, and B. Julesz, pp. 331–357 DIMACS Series in Discrete Mathematics and Theoretical
Computer Science, vol. 19.
Feldman, J. (1997). ‘Curvilinearity, covariance, and regularity in perceptual groups’. Vision Research
37(20): 2835–48.
Feldman, J. (2000). ‘Minimization of Boolean complexity in human concept learning’. Nature 407: 630–3.
Feldman, J. (2007). ‘Formation of visual ‘objects’ in the early computation of spatial relations’. Perception
and Psychophysics 69(5): 816–27.
Feldman, J. (2009). ‘Bayes and the simplicity principle in perception’. Psychological Review 116(4): 875–87.
Feldman, J. (2012). ‘Symbolic representation of probabilistic worlds’. Cognition 123: 61–83.
Feldman, J. (this volume). ‘Bayesian models of perceptual organization’. In Oxford Handbook of Perceptual
Organization, edited by J. Wagemans. (Oxford: Oxford University Press).
Feldman, J., and Singh, M. (2005). ‘Information along contours and object boundaries’. Psychological
Review 112(1): 243–52.
Feldman, J., and Singh, M. (2006). ‘Bayesian estimation of the shape skeleton’. Proceedings of the National
Academy of Science 103(47): 18014–19.
Feldman, J., Singh, M., Briscoe, E., Froyen, V., Kim, S., and Wilder, J. D. (2013). ‘An integrated Bayesian
approach to shape representation and perceptual organization’. In Shape perception in human and
computer vision: an interdisciplinary perspective, edited by S. Dickinson and Z. Pizlo, pp 55–70.
(New York: Springer).
Field, D. J. (1987). ‘Relations between the statistics of natural images and the response properties of cortical
cells’. Journal of the Optical Society of America A 4(12): 2379–94.
Field, D. J., Hayes, A., and Hess, R. F. (1993). ‘Contour integration by the human visual system: Evidence
for a local “association field”’. Vision Research 33(2): 173–93.
Fitzpatrick, D. (2000). ‘Seeing beyond the receptive field in primary visual cortex’. Current Opinion in
Neurobiology 10: 438–43.
Geisler, W. S., and Super, B. J. (2000). ‘Perceptual organization of two-dimensional patterns’. Psychological
Review 107(4): 677–708.
Geisler, W. S., Perry, J. S., Super, B. J., and Gallogly, D. P. (2001). ‘Edge co-occurrence in natural images
predicts contour grouping performance’. Vision Research 41: 711–24.
Geisler, W. S., Najemnik, J., and Ing, A. D. (2009). ‘Optimal stimulus encoders for natural tasks’. Journal of
Vision 9(13): 1–16.
Gigerenzer, G., and Hoffrage, U. (1995). ‘How to improve Bayesian reasoning without
instruction: Frequency formats’. Psychological Review 102(4): 684–704.
Gilchrist, A. L. (1977). ‘Perceived lightness depends on perceived spatial arrangement’. Science 195: 185–87.
Goldstone, R. L., and Steyvers, M. (2001). ‘The sensitization and differentiation of dimensions during
category learning’. Journal of Experimental Psychology 130(1): 116–39.
Goldstone, R. L., Medin, D. L., and Gentner, D. (1991). ‘Relational similarity and the nonindependence of
features in similarity judgments’. Cognitive Psychology 23: 222–62.
Goodman, N. D., Tenenbaum, J. B., Feldman, J., and Griffiths, T. L. (2008). ‘A rational analysis of
rule-based concept learning’. Cognitive Science 32(1): 108–54.
Green, D. M., and Swets, J. A. (1966). Signal detection theory and psychophysics. (New York: Wiley).
Harnad, S. (1987). Categorical perception: the groundwork of cognition. (Cambridge: Cambridge
University Press).
Harrison, S., and Feldman, J. (2009). ‘Influence of shape and medial axis structure on texture perception’.
Journal of Vision 9(6): 1–21.
Hess, R. F., May, K. A., and Dumoulin, S. O. (this volume). ‘Contour integration: Psychophysical,
neurophysiological and computational perspectives’. In Oxford Handbook of Perceptual Organization,
edited by J. Wagemans. (Oxford: Oxford University Press).
Hung, C. C., Carlson, E. T., and Connor, C. E. (2012). ‘Medial axis shape coding in macaque
inferotemporal cortex’. Neuron 74(6): 1099–113.
Jepson, A., and Richards, W. A. (1992). ‘What makes a good feature?’ In Spatial vision in humans and
robots, edited by L. Harris and M. Jenkin, pp. 89–125 (Cambridge: Cambridge University Press).
Kanizsa, G., and Gerbino, W. (1976). ‘Convexity and symmetry in figure-ground organization’. In Vision
and artifact, edited by M. Henle, pp. 25–32. (New York: Springer).
Kellman, P. J., and Shipley, T. F. (1991). ‘A theory of visual interpolation in object perception’. Cognitive
Psychology 23: 141–221.
Kersten, D., Mamassian, P., and Yuille, A. (2004). ‘Object perception as Bayesian inference’. Annual Review
of Psychology 55: 271–304.
Kim, S.-H., and Feldman, J. (2009). ‘Globally inconsistent figure/ground relations induced by a negative
part’. Journal of Vision 9(10): 1–13.
Knill, D. C., and Richards, W. (Eds.). (1996). Perception as Bayesian inference. (Cambridge: Cambridge
University Press).
Koenderink, J. J. (1993). ‘What is a “feature”?’ Journal of Intelligent Systems 3(1): 49–82.
Kogo, N., and van Ee, R. (this volume). ‘Neural mechanisms of figure-ground organization: Border-
ownership, competition and perceptual switching’. In Oxford Handbook of Perceptual Organization,
edited by J. Wagemans. (Oxford: Oxford University Press).
Lee, M. D., and Navarro, D. J. (2002). ‘Extending the ALCOVE model of category learning to featural
stimulus domains’. Psychonomic Bulletin and Review 9(1): 43–58.
Lescroart, M. D., and Biederman, I. (2013). ‘Cortical representation of medial axis structure’. Cerebral
Cortex 23: 629–37. doi: 10.1093/cercor/bhs046
Lowe, D. G. (1987). ‘Three-dimensional object recognition from single two-dimensional images’. Artificial
Intelligence 31: 355–95.
Lowe, D. G. (2004). ‘Distinctive image features from scale-invariant keypoints’. International Journal of
Computer Vision 60(2): 91–110.
McLachlan, G. J., and Basford, K. E. (1988). Mixture models: inference and applications to clustering.
(New York: Marcel Dekker).
Medin, D. L., and Schaffer, M. M. (1978). ‘Context model of classification learning’. Psychological Review
85: 207–38.
Nosofsky, R. M. (1986). ‘Attention, similarity, and the identification-categorization relationship’. Journal of
Experimental Psychology: General 115(1): 39–61.
Nosofsky, R. M., Palmeri, T. J., and McKinley, S. C. (1994). ‘Rule-plus-exception model of classification
learning’. Psychological Review 101(1): 53–79.
Olshausen, B. (2003). ‘Principles of image representation in visual cortex’. In The Visual Neurosciences,
edited by L. M. Chalupa and J. S. Werner, pp. 1603–15 (Cambridge: M.I.T. Press).
Op de Beeck, H., Wagemans, J., and Vogels, R. (2001). ‘Inferotemporal neurons represent low-dimensional
configurations of parameterized shapes’. Nature Neuroscience 4(12): 1244–52.
Pasupathy, A., and Connor, C. E. (2002). ‘Population coding of shape in area V4’. Nature Neuroscience
5(12): 1332–8.
Peterson, M. (this volume). ‘Low-level and high-level contributions to figure-ground organization’. In
Oxford Handbook of Perceptual Organization, edited by J. Wagemans. (Oxford: Oxford University
Press).
Pizlo, Z., Salach-Golyska, M., and Rosenfeld, A. (1997). ‘Curve detection in a noisy image’. Vision Research
37(9): 1217–41.
Poggio, G. F., and Poggio, T. (1984). ‘The analysis of stereopsis’. Annual Review of Neuroscience 7: 379–412.
Pomerantz, J. R., and Pristach, E. A. (1989). ‘Emergent features, attention, and perceptual glue in visual
form perception’. Journal of Experimental Psychology: Human Perception and Performance 15(4): 635–49.
Posner, M. I., and Keele, S. W. (1968). ‘On the genesis of abstract ideas’. Journal of Experimental Psychology
77(3): 353–63.
Reed, S. K. (1972). ‘Pattern recognition and categorization’. Cognitive Psychology 3: 382–407.
Ren, X., Fowlkes, C. C., and Malik, J. (2008). ‘Learning probabilistic models for contour completion in
natural images’. International Journal of Computer Vision 77: 47–63.
Richards, W. A., and Bobick, A. (1988). ‘Playing twenty questions with nature’. In Computational processes
in human vision: An interdisciplinary perspective, edited by Z. Pylyshyn, pp. 3–26 (Norwood, NJ: Ablex
Publishing Corporation).
Rieke, F., Warland, D., de Ruyter van Steveninck, R., and Bialek, W. (1996). Spikes: exploring the neural
code. (Cambridge: M.I.T. Press).
Riesenhuber, M., and Poggio, T. (1999). ‘Hierarchical models of object recognition in cortex’. Nature
Neuroscience 2: 1019–25.
Rosch, E. H. (1973). ‘Natural categories’. Cognitive Psychology 4: 328–50.
Rosenholtz, R. (this volume). ‘Texture perception’. In Oxford Handbook of Perceptual Organization, edited
by J. Wagemans. (Oxford: Oxford University Press).
Sanocki, T. (1999). ‘Constructing structural descriptions’. Visual Cognition 6(3/4): 299–318.
Schyns, P. G., Goldstone, R. L., and Thibaut, J.-P. (1998). ‘The development of features in object concepts’.
Behavioral and Brain Sciences 21: 1–54.
Shepard, R. N. (1994). ‘Perceptual-cognitive universals as reflections of the world’. Psychonomic Bulletin and
Review 1(1): 2–28.
Singh, M., and Hoffman, D. D. (2001). ‘Part-based representations of visual shape and implications
for visual cognition’. In From fragments to objects: segmentation and grouping in vision, advances in
psychology, edited by T. Shipley and P. Kellman, vol. 130, pp. 401–59. (New York: Elsevier).
Smith, E., and Medin, D. (1981). Categories and concepts. (Cambridge, MA: Harvard University Press).
Stilp, C. E., Rogers, T. T., and Kluender, K. R. (2010). ‘Rapid efficient coding of correlated complex acoustic
properties’. Proceedings of the National Academy of Sciences 107(50): 21914–19.
Treisman, A., and Gelade, G. (1980). ‘A feature-integration theory of attention’. Cognitive Psychology
12: 97–136.
Treisman, A., and Paterson, R. (1984). ‘Emergent features, attention, and object perception’. Journal of
Experimental Psychology: Human Perception and Performance 10(1): 12–31.
Ullman, S. (1979). The Interpretation of Visual Motion. (Cambridge, MA: M.I.T. Press).
Ullman, S., Vidal-Naquet, M., and Sali, E. (2002). ‘Visual features of intermediate complexity and their use
in classification’. Nature Neuroscience 5(7): 682–7.
Uttal, W. R., Bunnell, L. M., and Corwin, S. (1970). ‘On the detectability of straight lines in visual noise: an
extension of French’s paradigm into the millisecond domain’. Perception and Psychophysics 8(6): 385–8.
Vogels, R., Biederman, I., Bar, M., and Lorincz, A. (2001). ‘Inferior temporal neurons show greater
sensitivity to nonaccidental than to metric shape differences’. Journal of Cognitive Neuroscience
13(4): 444–53.
Wagemans, J. (1992). ‘Perceptual use of non-accidental properties’. Canadian Journal of Psychology
46(2): 236–79.
Wagemans, J. (1993). ‘Skewed symmetry: a nonaccidental property used to perceive visual forms’. Journal of
Experimental Psychology: Human Perception and Performance 19(2): 364–80.
Wagemans, J., van Gool, L., Swinnen, V., and van Horebeek, J. (1993). ‘Higher-order structure in regularity
detection’. Vision Research 33(8): 1067–88.
Wilder, J., Feldman, J., and Singh, M. (2011). ‘Superordinate shape classification using natural shape
statistics’. Cognition 119: 325–40.
Yamane, Y., Carlson, E. T., Bowman, K. C., Wang, Z., and Connor, C. E. (2008). ‘A neural code
for three-dimensional object shape in macaque inferotemporal cortex’. Nature Neuroscience
11(11): 1352–60.
Yang, T., and Shadlen, M. N. (2007). ‘Probabilistic reasoning by neurons’. Nature 447: 1075–82.
Zhang, N., and von der Heydt, R. (2010). ‘Analysis of the context integration mechanisms underlying
figure-ground organization in the visual cortex’. Journal of Neuroscience 30(19): 6482–96.
Chapter 46
On the Dynamic Perceptual Characteristics of Gestalten
Introduction
A major historical event transpired in 2012, marking the centennial anniversary of the year in
which Wertheimer published his famous monograph, ‘Experimental Studies of the Perception
of Movement’. Many published reviews of progress, experimental and theoretical studies, and
stock-taking essays marked this signal year. Over the intervening century there has been
inspiring growth in the corpus of data related to Gestalt phenomena and in suggestions as to
operational definitions of holism.
The very existence of the present volume on perceptual organization is a testament to the
importance and new vitality of many interlocked themes within this fold. Especially recom-
mended for readers of this chapter are collateral chapters by Bertamini and Casati, Feldman,
Kimchi, Pomerantz and Cragin, Behrmann, and van Leeuwen.
With certain exceptions, it seems fair to make the following observations about this body of
work: first, there is a noticeable absence of a generally accepted, unified theory of Gestalt phe-
nomena. Second, aside from a few quite specific models of performance in some particular
sphere, rigorous definitions and quantitative models are scarce. Third, in the realm of quantitative
dynamic information-processing characteristics, definitions, proposed explanations, and deriva-
tions regarding concepts of holistic vs non-holistic objects are rare if extant at all. Our focus is on
the third of these.
Our primary goal is the establishment of a mathematical language within which the prop-
erties of strategic concepts that describe and purport to distinguish configural as opposed to
non-configural perception can be elucidated. A secondary goal is to propose what seem to
be reasonable specifications, within this language, of configural vs non-configural percep-
tion. The first goal is theoretically noncommittal, and should be relatively uncontroversial. The
second amounts to stating hypotheses (we call them ‘working axioms’) about how configural
vs non-configural processing may take place. However, it is important to point out that this
approach in no way pretends to be a computational model of configural perception. Rather,
it should be viewed as a meta-theoretical set of methodologies that are capable of assessing
a number of critical mechanisms associated with configural vs non-configural perception,
and hypotheses about them. As such, their application should aid in guiding the construc-
tion of principled, parameterized, computational models of configural and non-configural
perception.
On the Dynamic Perceptual Characteristics of Gestalten 949
1 With a nod to linguistic refinement, we will follow German usage in using leading capitals when Gestalt appears in noun form but lower case when employed adjectively. Also, Gestalten with the added en will follow standard German to indicate the plural.
950 Townsend and Wenger
[Figure 46.1: a 2 × 2 array crossing Architecture (serial, parallel) with Stopping rule (self-terminating/minimum time, exhaustive/maximum time); each panel plots the time course of processes A and B from the start to the stop of processing.]
Fig. 46.1 Schematic representation of the critical distinctions with respect to processing architecture
and stopping-rule. In these examples, two processes (A and B) execute either serially (sequentially)
or in parallel. Once begun, processing continues until either the first (or fastest) or last (or slowest)
process completes.
processing is defined by a set of discrete items or subsystems (e.g. channels) being worked on
simultaneously. In Figure 46.1, this can be understood in terms of the temporal arrangement of
the two processes (A and B). Formally, this distinction is captured in the form of the probability
distribution for overall finishing time (the externally observable reaction time, in the terms of an
experiment), which is composed from the probability distributions on the (usually unobservable)
finishing times of the two internal processes. General forms for the four possibilities considered in
Figure 46.1 can be found in Appendix A of Townsend and Nozawa (1995, pp. 351–354).
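The four cells of Figure 46.1 can also be illustrated by simulation. The following sketch is ours, not part of the original treatment, and assumes independent, exponentially distributed durations for processes A and B purely for illustration:

```python
import random

random.seed(1)

def sample_rts(n=20000, rate_a=1.0, rate_b=1.0):
    """Simulate overall finishing times for the four architecture-by-
    stopping-rule cells of Fig. 46.1, using independent exponentially
    distributed durations for processes A and B."""
    rts = {"serial_min": [], "serial_max": [],
           "parallel_min": [], "parallel_max": []}
    for _ in range(n):
        a = random.expovariate(rate_a)
        b = random.expovariate(rate_b)
        rts["serial_min"].append(a)            # serial, self-terminating: A alone suffices
        rts["serial_max"].append(a + b)        # serial, exhaustive: durations add
        rts["parallel_min"].append(min(a, b))  # parallel, minimum time (first to finish)
        rts["parallel_max"].append(max(a, b))  # parallel, maximum time (last to finish)
    return rts

rts = sample_rts()
mean_rt = {name: sum(times) / len(times) for name, times in rts.items()}
```

With unit-rate exponentials the mean finishing times order as parallel minimum time < serial self-terminating < parallel maximum time < serial exhaustive, one way of seeing why architecture and stopping rule must be assessed jointly rather than separately.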
Of course, there are many kinds of architectures other than serial and parallel, although these
have received the bulk of the attention of the cognitive community. For instance, hybrid models
could be a mixture of serial and parallel models, or more complex network models of which parallel
and serial networks comprise a special case (Schweickert 1978; Schweickert and Townsend 1989).
Another important type of system is constituted by a sequence of processes but with overlap of the
processing times, unlike true serial processing (e.g. Taylor 1976). When the next stage can start at the
same time as the previous one, we have the concept of continuous flow (e.g. Ashby 1982; McClelland
1979; Schweickert and Mounts 1998). These models are of great value, but they currently lie outside
the scope of methodologies that can test them against ordinary parallel or serial systems.
Not quite so paramount is the notion of the decisional stopping rule, or ‘stopping rule’ for
short. Suppose, as in many experiments and real-life situations, that a subset of features is suf-
ficient to make a correct response. In that case, a reasonable question is whether all the features
are processed even if they need not be. In the psychological literature, there are three cases of
interest:
1 Exhaustive or maximum-time processing. All aspects (e.g. features) are processed. In the case of
two elements, this can be represented by the Boolean AND operator.
2 Race or minimum-time processing. Processing ceases as soon as a single aspect is processed. In
the case of two elements, this can be represented by the Boolean OR operator.
3 Single-target self-termination. There is only one aspect in an object that is capable of
determining the correct response, and the system stops when and only when that aspect is
completed.
Since we typically think of a Gestalt as being a total unity, one axiom or part of a definition of
Gestalt processing might be that a Gestalt is perceived as a unity, which would imply exhaustive
processing of all features, even though a correct decision could be made on the basis of only a
subset of the features.
Finally, the concept of workload capacity turns out to be pivotal in our working definition of
Gestalt processing. This issue concerns how increasing workload (for instance, objects or faces made up of fewer or more aspects) affects processing efficiency. A traditional approach might be to use mean Reaction Time (RT). However, we have developed an
instrument which takes into account the entire distribution of RTs in greater vs lesser workload
conditions (Townsend and Nozawa 1995; Townsend and Wenger 2004b; Wenger et al. 2010). As
will become increasingly apparent throughout this chapter, capacity will serve as a prime gauge of
configural superiority, introduced above as a potential marker of Gestalt perception.
The benchmark for workload capacity consists of the predictions of a standard parallel model, which assumes parallel processing, stochastic independence, and unlimited capacity. This model provides the yardstick for measuring capacity in arbitrary systems: the capacity statistic C(t) measures the speed of channels acting together in comparison with the speed predicted by the standard parallel model.
Stochastic independence implies that each channel’s processing time is independent of all oth-
ers. Unlimited capacity stipulates that the marginal processing time distribution of each channel is
invariant across any changes in workload. Informally, unlimited capacity implies that the average
processing time of any channel is unaffected by the overall workload on the system. It is critical
to observe that in processing a finite number of items, the decisional stopping rule will affect the
overall decision time. For instance, minimum-time (OR) processing requires that only a single
item be finished. On the other hand, maximum-time (AND) processing demands that all items
be completed. Therefore, we must derive capacity measures that take the appropriate stopping
rule into account. This can be accomplished for any logical stopping rule (e.g. find the one target
among five distractors), but we will focus on the most commonly studied in the literature so far
and these are the OR and AND decision rules.
If at any point in time, C(t) = 1, the system is said to be of unlimited capacity at that time
point. Overall, the system is acting just as efficiently as the standard parallel model but not more
efficiently. If at time t, C(t) > 1, we call the system super-capacity. In super-capacity systems, the
individual channels are running faster than when they were working alone. Finally, if C(t) < 1 at
time t, the system is said to be of limited capacity at that time point.
Thus, the bound separating super-capacity from limited capacity is simply C(t) = 1. In addition,
it can be seen that C(t) permits observation and predictions of workload capacity over an entire
range of time. For instance, we have observed that in some tasks, people can be super-capacity
early on, but reveal limited capacity later in time. In contrast, most modern conceptions take
capacity as a non-dynamic, single number.
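To make this concrete, C(t) for an OR (minimum-time) design can be estimated from empirical response-time distributions via cumulative hazard functions, H(t) = −log(1 − F(t)), as C(t) = H_AB(t)/(H_A(t) + H_B(t)) (Townsend and Nozawa 1995). The sketch below is illustrative only; the exponential channels and sample sizes are our assumptions:

```python
import math
import random

random.seed(2)

def ecdf(sample, t):
    """Empirical cumulative distribution function evaluated at t."""
    return sum(x <= t for x in sample) / len(sample)

def cum_hazard(sample, t):
    """Cumulative hazard H(t) = -log(1 - F(t)) from the empirical CDF."""
    return -math.log(1.0 - ecdf(sample, t))

def capacity_or(rt_ab, rt_a, rt_b, t):
    """OR-design capacity coefficient C(t) = H_AB(t) / (H_A(t) + H_B(t))."""
    return cum_hazard(rt_ab, t) / (cum_hazard(rt_a, t) + cum_hazard(rt_b, t))

# Benchmark case: unlimited-capacity, stochastically independent parallel
# channels; the double-target RT is the faster of two independent channels.
n = 50000
rt_a = [random.expovariate(1.0) for _ in range(n)]
rt_b = [random.expovariate(1.0) for _ in range(n)]
rt_ab = [min(random.expovariate(1.0), random.expovariate(1.0)) for _ in range(n)]

c_half = capacity_or(rt_ab, rt_a, rt_b, 0.5)  # near 1 for this benchmark
```

Because the benchmark model satisfies S_AB(t) = S_A(t)S_B(t), its cumulative hazards add and the estimate hovers near 1; facilitatory channels would push it above 1, and inhibitory or limited-capacity channels below.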
circumstances where Gestalt organization can proceed in a more or less sequential manner; see,
e.g. Roelfsema and Houtkamp 2011).
Although parallel processing is an obvious choice with regard to the architecture associated
with configural processing, a question immediately arises as to the stochastic independence of the
parallel channels. For instance, the classic parallel race model assumes stochastically independ-
ent parallel channels (e.g. Egeth 1966; Smith 2000; Townsend 1974). Furthermore, the channels
could actually prove to be negatively (i.e. mutually inhibitory) interactive, which seems far from
the sense of configurality. Hence, we posit that in many tasks, a positive interaction will lead to
workload capacity results that are super-capacity. Parallel models possessing facilitatory chan-
nels readily produce super-capacity while mutually inhibitory channels evoke limited capacity
(e.g. Egeth 1966; Smith 2000; Townsend 1974).
Super-capacity processing obviously exceeds standard parallel processing in efficiency and is a
palpable example of configural superiority, as intimated in Working Axiom 1.3. The triad of par-
allelism, positive interactions, and super-capacity seems to be compatible with certain stochastic
versions of Hebbian learning. Thus, a stochastic Hebbian model advanced in a dissertation by
Blaha (2010) captures many aspects of a dramatic improvement of performance by observers in a
configural learning experiment.
The intent of Working Axiom 1.4 is to capture the oft-heard claim that ‘holistic face percep-
tion is obligatory’, and presumably this admonition might also refer to any Gestalt (although
see, for example, Plomp and van Leeuwen 2006; Stins and van Leeuwen 1993; van Leeuwen and
Lachmann 2004, also see Behrmann, Richler, Avidan, & Kimchi, this volume; Koenderink on
Gestalt templates, this volume). Although there may be more than one meaning to this statement,
at least one appears to be that if one part of a face is gazed at, all parts are perceived. Moreover,
it is also motivated by the notion of a Gestalt existing as a unity. If a unity is processed, no part
should be omitted.
Working Axiom 2 supplements the original list of Wenger and Townsend (2001; see also
O’Toole et al., 2001), since the latter focused on configural superiority. To encompass phenomena associated with configural inferiority, more facets are needed.
At the risk of oversimplification, Garner’s major operational specifications can be divided into two
major types:
1 Integrality of dimensions can hurt performance when the task involves attention to one
dimension and other dimensions, with which the attended one is integral, are present and
varied (usually more or less randomly) in the trial-to-trial presentations. This operationalization yields Garner filtering tasks and, if inferior performance emerges, the phenomenon of Garner interference.
2 Integrality of dimensions can help performance if perception of any of two or more dimensions
is redundant with regard to specifying the correct response.
In carrying out either 1 or 2, it can make a difference as to whether, say, the studied dimension or
item is, in a control condition, accompanied by nothing else (e.g. a blank), or whether a neutral
distractor is used. In any case, it is clear that in 1 having the full Gestalt present, when the observer
is supposed to focus on only one of the dimensions, may be deleterious.
This phenomenon is clearly a type of configural inferiority. The assistance provided by the pres-
ence of the Gestalt-interactive pair (or more) of redundant dimensions (as opposed to their sim-
ple additivity) is a kind of configural superiority. However, the latter term constitutes a very broad
spectrum of potential mechanisms and empirical consequences as opposed to the narrower focus
of a redundant targets effect.
Yet we do need to observe that while both of these Garnerian concepts intuitively capture
themes of Gestalten, they are by no means logically related to one another. Experiments could
logically find any combination of outcomes regarding them. Likewise, qualitative and quantita-
tive models of perception could well predict any particular combination of them. Of course, they
could be linked in any particular system.
From this standpoint, we learn that point 2 (the redundancy facilitation effect) must be
mildly modified to be theoretically sound. Thus, in the case of accuracy, even when the dimen-
sions are stochastically independent, their redundancy leads to performance superior to a single
dimension by itself (a prediction known as probability summation). A completely analogous pre-
diction appears with RTs in the sense that independent redundant dimensions predict superiority
(i.e. faster RTs or improved accuracy) over single dimensions at least in the presence of parallel
processing. Thus, redundant superiority per se need not be associated with integrality or any par-
ticular form of Gestalt behaviour. We can view this state of affairs through our workload capacity
statistic. As a prime example using RTs within a redundant target design, assume for simplicity
that both of the two channels operate equally quickly. Then, if C(t) > 1/2, a redundancy gain will
occur (performance will exceed that of either of the channels stimulated alone). When C(t) = 1,
the standard parallel model prediction, the benefits of redundancy are reasonably dramatic.
Accordingly, a straightforward tactic to strengthen the Garner redundancy concept to rule out
the increase in speed due to redundancy alone in non-configural systems is to inspect data to see
if performance exceeds that expected from such systems. This concept of performance contrasted with what can be predicted from ordinary parallel processing has some history (e.g. Raab 1962;
Townsend and Nozawa 1995). However, historically, it took some time for notions such as co-
activation (e.g. Colonius and Townsend 1997; Miller 1982) and super-capacity (e.g. Townsend
and Nozawa 1995; Townsend and Wenger 2004b) to develop.
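The redundancy predictions of a wholly non-configural system can be made concrete with a small numerical sketch; the accuracies and channel rates below are assumed purely for illustration:

```python
import random

random.seed(3)

# Accuracy: probability summation for two stochastically independent,
# redundant dimensions (single-dimension accuracies are assumed values).
p_a, p_b = 0.70, 0.60
p_redundant = 1 - (1 - p_a) * (1 - p_b)  # beats either dimension alone

# Response time: an independent race (parallel OR processing) is faster
# than a single channel, with no cross-channel interaction whatsoever.
n = 20000
single = [random.expovariate(1.0) for _ in range(n)]
race = [min(random.expovariate(1.0), random.expovariate(1.0)) for _ in range(n)]
mean_single = sum(single) / n
mean_race = sum(race) / n
```

Both gains arise without any cross-channel interaction, which is why redundant superiority per se cannot diagnose configurality.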
Now recall that the typical Garner filtering or interference experiment assays performance on a single target, within both the control (fixed distractor dimension) and the experimental (varying-value distractor dimension) conditions. There is no way to use any kind of redundancy to improve
performance. However, just as in the case of superiority, the causal mechanism for interference
could exist at one or more of several levels, from a relatively low perceptual echelon to a higher
order attentional level.
The covariance matrix of the bivariate Gaussian perceptual distribution for stimulus i is

\( \Sigma_i = \begin{pmatrix} \sigma_A^2 & \rho_i\,\sigma_A\sigma_B \\ \rho_i\,\sigma_A\sigma_B & \sigma_B^2 \end{pmatrix} \)
To make the theoretical characterization complete, we need only add decision bounds, to ‘carve
up’ the representational space into response regions. For simplicity only, we will assume that
these decision bounds are continuous and linear, though more complex types can be easily
accommodated (e.g. Maddox and Bogdanov 2000; Maddox 2001; Maddox and Bohil 2003).
With these as the elements of our theoretical language, we can develop theory-based characteri-
zations of any given hypothesized Gestalten that allow for immediate predictions for observable
behaviour.
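As a sketch of this theoretical language (with all numerical values assumed for illustration), each stimulus in a 2 × 2 design is represented by a bivariate Gaussian with covariance matrix Σ_i, and separable linear bounds map percepts to identification responses:

```python
import math
import random

random.seed(4)

def sample_percept(mu_a, mu_b, sd_a, sd_b, rho):
    """Draw one bivariate-Gaussian perceptual effect (evidence on
    dimensions A and B) with correlation rho; rho != 0 violates PI."""
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    x_a = mu_a + sd_a * z1
    x_b = mu_b + sd_b * (rho * z1 + math.sqrt(1 - rho ** 2) * z2)
    return x_a, x_b

def respond(x_a, x_b, bound_a=0.0, bound_b=0.0):
    """Separable linear decision bounds: each dimension's evidence is
    classified against a fixed criterion (decisional separability holds)."""
    return ("high" if x_a > bound_a else "low",
            "high" if x_b > bound_b else "low")

# Identification-response counts for one stimulus of the 2 x 2 design:
# the (high A, high B) stimulus, with a PI-violating correlation rho = 0.6.
counts = {}
for _ in range(10000):
    resp = respond(*sample_percept(1.0, 1.0, 1.0, 1.0, rho=0.6))
    counts[resp] = counts.get(resp, 0) + 1
```

Allowing ρ to differ from zero within a stimulus, shifting the perceptual means across levels of the other dimension, or letting the bounds move would instantiate violations of PI, PS, and DS, respectively.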
We begin with the natural ‘null hypothesis’ for a percept that is neither a Gestalt, in which parts
interact positively, nor one whose parts interact negatively: complete independence and separabil-
ity everywhere. We now define the pertinent concepts more formally.
A first possibility for a Gestalt is one in which the integrality exists in the manner in which a
response decision is made. This type of Gestalt can be represented by allowing the decision
bounds to vary in their location across the levels of one or both of the dimensions, and is referred
to as a violation of decisional separability (DS). A second possibility is one in which the perceptual
distributions change, in their location, variability, or both, as a function of the level of each of the
two dimensions. This is referred to as a violation of perceptual separability (PS). Each of these two
possibilities is a type of Gestalt that is defined across stimuli.
The third possibility is one that is defined within stimuli and is thus closest to the vernacular con-
ception of a Gestalt (see O’Toole et al. 2001). In this possibility, the ‘amount’ of perceptual evidence
for one of the dimensions reliably co-varies in some way with the ‘amount’ of perceptual evidence
for the other dimension. One way to represent this in our simple Gaussian example is to allow any
or all of the ρi to be non-zero. This is referred to as a violation of perceptual independence (PI).
The experimental methodology that follows from the theoretical requirements of GRT is
known as the complete identification paradigm, and the experimental design implemented in this
paradigm is the feature-complete factorial design. In this design, each level of each dimension is
presented with equal frequency, and the observer is required to give a response (or sequence of
responses) that provides explicit evidence of the observer’s perceptual and decisional state with
respect to each dimension on each trial. The paradigm and design are flexible enough to address
both configural superiority and configural inferiority effects.
Within the assumptions of this paradigm and design, we can add information from GRT to our
working axioms:
[Figure 46.2 appears here: panel (a) shows bivariate distributions over the dimensions internal feature and facial surround, each upright (Upr) or inverted (Inv); panels (b)–(e) show the corresponding contours of equal likelihood described in the caption.]
Fig. 46.2 Example GRT representations of the hypothetical sources of configurality in the Thatcher
illusion: (a) Bivariate distributions of perceptual evidence given stimuli in which the facial surround
and the internal features are presented either upright (Upr) or inverted (Inv). The vertical planes
(outlined in red) represent the decision bounds which divide the representational space into four
response regions. (b) Contours of equal likelihood given preservation of PI, PS, and DS. (c) Contours of
equal likelihood for the situation in which PI is violated in upright but not inverted stimuli. (d) Contours
of equal likelihood for the situation in which PS is violated for the upright but not inverted stimuli.
(e) Contours of equal likelihood for the situation in which PI and PS are preserved and DS is violated.
In each of these three examples, the variations in the stimulus change the pattern of behav-
ioural responses that are predicted. In each case, there is the potential for the Thatcher
manipulation (inversion of the internal features relative to the facial surround) to be best
detected when the facial surround is upright rather than inverted. This would be the behav-
ioural ‘signature’ of the Thatcher illusion as a Gestalt effect. However, a critical point to note
here is that only one of the hypotheses just considered applies to the perception of an indi-
vidual stimulus on a within-trial basis, and that is the violation of PI. Violations of either PS or
DS pertain to the perception of sets of stimuli. This raises an interesting ‘disconnect’ between
the general state of theorizing (or, more accurately, operationalizing) about Gestalten and the
experimental methods that are typically used to assess the presence or absence of Gestalt states.
In general, the vernacular conception of Gestalten within the scientific community is most
consistent with a violation of PI. That is, the Gestalt state is assumed to exist for the observer
within the perception of an individual stimulus (see Cornes et al. 2011 for a discussion specific
to the Thatcher illusion). Unfortunately, the overwhelming majority of experimental studies that
have probed Gestalt perception have used tasks (including tasks used in the Garnerian approach)
in which it is possible to glean information about the observer’s state with respect to only one of
the stimulus dimensions on each trial. Thus, these tasks cannot provide the data needed to assess
potential violations of PI, meaning that it becomes difficult if not impossible to connect the exper-
imental evidence with the theoretical construct at the level at which investigators are postulating
the Gestalt state. The exceptions are studies that implement the feature-complete factorial design
and use a complete identification response task. We will have more to say about data from these
designs in the final section of this chapter.
to identify architecture, stopping rules and, less directly, channel interactions across the studied
configural features. Finally, we used not only new facial contexts but also feature-alone displays, without any facial context at all.
First, in both the OR and the AND conditions, observers were faster with the familiar face stimuli than with the new-face or features-alone conditions. Next, in the OR conditions, all observers indicated strong parallel processing along with a ‘stop as soon as the first target feature is completed’ (i.e. minimum time) stopping rule, both in the familiar face context and the new
face context. However, some observers proved to be serial, minimum time, although only in the
features-alone conditions. The combination of ordinary parallel or serial processing, for instance,
not co-active or parallel interactive, provides strong support for analytic processing even though
the learned contexts aided efficiency.
In contrast, within the AND experiment and when presented with familiar faces, observers
appeared to mix an ordinary exhaustive (note: more holistic!) parallel processing strategy with a
decided tendency towards facilitatory interactive channels (see, e.g., Eidels et al. 2011). There was
also some interaction present in the new face and features-alone conditions though not much.
Analysis of the learning phases of the experiment also supports this account.
Overall, these results point to a graded notion of Gestalt perception, namely that significant
parallel interactions can appear under certain circumstances, such as when exhaustive processing
of facial features is obligatory. However, when experimental conditions afford the opportunity
to be analytic and stop as soon as sufficient information is accrued to make a correct response,
observers will do so. Even when interactive parallel processing is found, the parts do not reveal a
perfect correlation (i.e. starting and finishing at the same moments, indicating the whole is processed as a complete unit). Supplementing the above précis with other studies in the literature, we summarize the provisional findings through SFT. Theoretical and empirical results accrued over
the past fifteen years or so have thoroughly verified the parallel nature of within-object feature and
dimensional perception. In a number of experiments with well-organized figures like faces, a type
of parallel processing called co-activation has been discovered. Co-activation entails summation
across channels or possibly positive channel interaction.
Interestingly, even objects such as realistic faces, which are prime candidates for Gestalten, do
not inevitably evoke super-capacity perception. Sometimes even moderately limited capacity is
found in such circumstances, especially if early termination (i.e. non-exhaustive processing) of
features is allowed. On the other hand, when a task calls for processing of all the featural informa-
tion contained in Gestalt items (exhaustive processing), the investigator tends to witness higher
degrees of super-capacity. Moreover, when people learn to glue together meaningless features into
patterns, again within tasks which demand exhaustive featural processing of the target category,
rather extraordinary magnitudes of super-capacity are witnessed, implying efficiency far exceed-
ing ordinary parallel processing (as per Blaha and Townsend 2004).
The most recent applications of GRT to questions of configurality have come in the context of
studies of the perception of and memory for faces, although we should also note that we have done
the same with respect to perceptual organization of hierarchical forms (Copeland and Wenger
2006). Specifically, we have applied the constructs and methods of GRT to the holistic encoding
hypothesis (Wenger and Ingvalson 2002, 2003), the composite face effect (Richler et al. 2008), the
Thatcher illusion (Cornes et al. 2011), and face inversion (Mestry et al. 2012). An intriguing regu-
larity from these studies is the consistent lack of evidence (or at best weak evidence) for violations
of PI. Instead, these studies have revealed that the empirical regularities that are commonly taken
as the ‘signatures’ of Gestalten do not produce compelling evidence for the state—a violation of
PI—that is most consistent with the vernacular conception of Gestalten.
One intriguing possibility here is that the non-parametric quantitative methods that have to date been most widely used for supporting inferences regarding PI, PS, and DS may
actually be overly conservative with respect to detecting violations of PI. This observation has
come from ongoing work by Menneer and colleagues (e.g. Menneer et al. 2009; Menneer, Blaha,
and Wenger 2012) examining alternative statistical methods for supporting inferences regarding PI, PS, and DS. One particular aspect of this work is the application of probit regression models to GRT data, as first suggested by DeCarlo (e.g. 2003). Preliminary results suggest that probit models
are capable of detecting true violations of PI that can be missed by other methods. The following
paragraphs attempt to encapsulate the recent contributions arrived at through GRT.
Perceptual independence
Recall that perceptual independence (PI) is defined as the stochastic independence occurring on
a within-trial basis among features or dimensions. We have previously suggested that, in a sense,
violations of PI could be considered the strongest type of non-independence possibly indicative
of Gestalt perception. It has not often been detected in our data, even for respectable Gestalten. It
is not clear why this is the case, as featural inter-channel dependencies, for example in a Hebbian
sense, stand as one of the most natural ways to bring about configural superiority. In addition,
cross-channel interactions provide the best explanation in a number of response-time experi-
ments where configural superiority effects are found (a few of which are Eidels, Townsend, and
Pomerantz 2008; Fific and Townsend 2010; Townsend and Houpt 2012; Eidels et al. 2008).
Perceptual separability
Violations of perceptual separability (PS) occur when a change on one feature, across trials,
for example, causes perceptual effects on a distinct feature. Although violations of PS could be
brought about through a failure of perceptual independence, dynamic systems have been devel-
oped which evince non-separability even though perceptual independence is intact. Perceptual
non-separability in the form of what Garner called integrality has been found with Gestalten more
frequently than positive perceptual dependencies, but less often than decisional non-separability
of a type that would be associated with Gestalt-like decision making.
Decisional separability
Intriguingly, when viewing Gestalten such as realistic faces, a failure of decisional separability
(DS) has been experimentally diagnosed more frequently than either of the other two types of
‘independence’. Investigators working in the area of visual object perception have sometimes
recoiled from these findings apparently because it is felt that a decisional influence is not suf-
ficiently perceptual. Our view is that such influences are also perceptual. For instance, when, as
we have sometimes discovered with Gestalten (e.g. faces), decisional criteria apparently tend to be
lowered or raised on the constituent features together, is this not a perceptual effect? For example,
in a recent GRT study of facial race-feature perception and adaptation, it was discovered that
adaptation to racial physiognomy or skin tone led to dramatic alterations in both perceptual separability and decisional criteria (Blaha, Silbert, and Townsend 2011).
Wenger data. Further research on this issue is called for. There were other less critical findings
that have to be neglected here.
In contrast, Amishav and Kimchi (2010) used the Garner interference (therefore, configural
inferiority) design to investigate this issue. In contrast to the Ingvalson and Wenger (2005) find-
ings, they determined processing to be highly integral (i.e. non-independent and non-separable),
possibly indicating strong cross-talk across the two types of informational channels. It is logically,
mathematically, and scientifically possible that in attentional sharing (or divided attention) exper-
iments, relative independence or even positive facilitation might be found, but that in a configural
inferiority design, attention cannot be confined to a single source without a cost. Although this is
not the place for a detailed review of the literature, we suggest that any such literature evaluation
should first parse the studies into the types of methodology used. If there is sufficient regularity
after that, perhaps general inference drawing can advance.
Our approach is, like that of Garner and colleagues (see Pomerantz and Cragin, this volume), oriented toward an information-processing perspective. However, it seems clear that ultimately,
topology and geometry must be brought into the picture (see Bertamini and Casati, this volume
for a related discussion). A very brief overview of these topics is now in order. First we need
quickly to note that topology is the branch of mathematics where qualitative, but not quantitative,
relationships among points matter. In fact, any deformation of an object which does not tear it is
a perfectly good topological transformation. The legendary statement that ‘topologists are defined
by the characteristic that they can’t tell the difference between a tea cup and a doughnut’ is due to
this aspect of topology. Geometry, on the other hand, is devoted to the study of shape, size, rela-
tive position of figures, and certain quantifiable properties of space. In general, geometries assume
that a distance between points and things like angles exist—properties that are meaningless in
topology. Euclidean geometry can be characterized in a number of ways, but the presence of the
famous Euclidean metric in which the distance between points A and B in an n-dimensional
space is
\( d(A, B) = \sqrt{\sum_{i=1}^{n} (B_i - A_i)^2} \)
is the best-known property. It took centuries for mathematicians to discover the existence of
non-Euclidean geometries. Considerable effort has been devoted by psychometricians and mathematical psychologists to investigating at least some non-Euclidean geometries in the context of human perception (e.g. Shepard 1964).
Chen (2005) has discussed the relationship of certain topological notions, such as the presence
of holes, to Gestalt perception. Eidels and colleagues (2008) showed how similarity concepts asso-
ciated with Chen’s efforts could be merged with systems factorial technology in studying Gestalt
processing.
Though quantitatively rigorous, our approach is at a substantially more macroscopic level than
those which attempt to capture neuro-anatomical structure and process. One apposite example is
the feed-forward model provided by Poggio and colleagues (e.g. Riesenhuber and Poggio 1999;
Serre, Oliva, and Poggio 2007). This model rests on a hierarchical ascending network of computa-
tions based on summation and max-rule decisions, which capture some of the elemental increas-
ing invariance of feature processing in the afferent, ventral pathways. It is unknown whether such
models could be extended to make predictions corresponding to the relatively larger-scale aspects
treated here but it would seem valuable to do so.
Another contender with regard to object vs face perception is defined by Biederman and his
colleagues. For instance, Biederman and Kalocsai (1997) introduced a theory based on earlier
ideas stemming from von der Malsburg’s laboratory. The key elements envision an early layer of
hypercolumn pattern of representation for objects as well as faces. Subsequently, several types of
relational variables are instituted among the parts (typically Biederman’s geons; see, e.g., Hummel
and Biederman 1992) that permit discrimination and generalization among objects. However, the
system associated with face perception is strikingly different and contains two sub-tracks. One
of these tracks preserves spatial relationships and stores the information in hypercolumn-like
lattices which can later be matched against probe stimuli. These lattices are permitted to undergo
a certain degree of distortion to maximize closeness of match. In addition, a second track centres
each column of filters on a particular facial feature. The latter apparently allows selectivity of the
input into a holistic representation, thus avoiding such artefacts as unrelated object occlusion.
This bipartite structure is able to encompass a number of phenomena associated with face and object perception (and their contrast), including certain configural properties in face processing. Although
inspired by visual neurophysiology, much of the data guiding this model, as well as that of Poggio and colleagues, are behavioural in nature. Thus, it does not seem too outlandish to suggest that extensions or special analyses might engender predictions concerning the architecture (presumably
heavily parallel, though with sequential hierarchies), workload capacity, stopping rule, and inde-
pendence, for example, of various types of parts (e.g. geons).
One of the most prominent and exciting developments, with respect to the focus of this chap-
ter, must be the theoretical unification of SFT and GRT. This effort has begun on several fronts.
For instance, we have recently formulated a new mathematical workload capacity function
which combines information based on RTs (part of the SFT toolbox) with information assessing accuracy
(Townsend and Altieri 2012). However, this new statistic has not yet been employed in the study
of Gestalt perception. Similarly, Townsend, Houpt, and Silbert (2012) offer an extended GRT
which includes parallel architectures and permits a strengthened methodology based both on
RT as well as accuracy. Nonetheless, the RT-based methodologies which afford identification of
architecture (e.g. serial vs parallel processing; Townsend and Wenger 2004a) have not yet been
unified with GRT and accuracy in general.
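Townsend and Altieri’s (2012) combined RT-and-accuracy statistic is beyond a short illustration, but the purely RT-based workload capacity coefficient of the SFT toolbox (the OR measure of Townsend and Nozawa 1995) can be sketched compactly. The following is a minimal, illustrative estimate, not the authors’ code: the empirical integrated-hazard estimator is standard, while the exponential channel model and every numerical value are invented for demonstration.

```python
import numpy as np

def integrated_hazard(rts, t):
    """Empirical integrated hazard H(t) = -log S(t), where S(t) is the
    proportion of response times that exceed t."""
    return -np.log(np.mean(np.asarray(rts) > t))

def capacity_or(rt_ab, rt_a, rt_b, t):
    """OR (first-terminating) workload capacity coefficient,
    C(t) = H_AB(t) / (H_A(t) + H_B(t)); C(t) = 1 is the benchmark of
    unlimited-capacity, independent, parallel processing."""
    return integrated_hazard(rt_ab, t) / (
        integrated_hazard(rt_a, t) + integrated_hazard(rt_b, t))

# Invented data: two independent exponential channels raced under an OR rule,
# which should sit at the C(t) = 1 benchmark.
rng = np.random.default_rng(1)
n = 20000
rt_a = rng.exponential(0.40, n)              # RTs on single-target A trials (s)
rt_b = rng.exponential(0.50, n)              # RTs on single-target B trials (s)
rt_ab = np.minimum(rng.exponential(0.40, n),
                   rng.exponential(0.50, n)) # RTs on double-target trials
print(round(capacity_or(rt_ab, rt_a, rt_b, 0.3), 1))  # 1.0 (benchmark value)
```

In applications of this toolbox, C(t) < 1 over a range of t indicates limited capacity, while C(t) > 1 (supercapacity) has been linked to configural, Gestalt-like processing.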
Finally, theoretical work on the applied mathematics associated with model analysis and prob-
ing of failures of the different types of dependence is proceeding at a lively pace, both on GRT as
well as SFT. It could turn out that, say, perceptual independence may be more subject to Type II
errors than the other two types of independence. Only further theoretical and experimental prob-
ing will tell the tale. We think the next decade or so should see a growing comprehension of the
underpinning process machinery which handles Gestalt perception.
References
Amishav, R. and R. Kimchi (2010). ‘Perceptual Integrality of Componential and Configural Information in
Faces’. Psychonomic Bulletin & Review 17(5): 743–748.
Ashby, F. G. (1982). ‘Deriving Exact Predictions from the Cascade Model’. Psychological Review
89: 599–607.
Ashby, F. G. and J. T. Townsend (1986). ‘Varieties of Perceptual Independence’. Psychological Review
93: 154–179.
Ashby, F. G. and W. W. Lee (1991). ‘Predicting Similarity and Categorization from Identification’. Journal of
Experimental Psychology: General 120: 150–172.
Ashby, F. G. and W. W. Lee (1993). ‘Perceptual Variability as a Fundamental Axiom of Perceptual Science’.
In Foundations of Perceptual Theory, edited by S. C. Masin, pp. 369–399. Amsterdam: Elsevier.
On the Dynamic Perceptual Characteristics of Gestalten 965
Ashby, F. G. and W. T. Maddox (1993). ‘Relations between Prototype, Exemplar, and Decision Bound
Models of Categorization’. Journal of Mathematical Psychology 37: 372–400.
Ashby, F. G., G. Boynton, and W. W. Lee (1994). ‘Categorization Response Time with Multidimensional
Stimuli’. Perception & Psychophysics 55: 11–27.
Ashby, F. G. and W. T. Maddox (1994). ‘A Response Time Theory of Separability and Integrality in Speeded
Classification’. Journal of Mathematical Psychology 38: 423–466.
Ashby, F. G., E. M. Waldron, W. W. Lee, and A. Berkman (2001). ‘Suboptimality in Human Categorization
and Identification’. Journal of Experimental Psychology: General 130: 77–96.
Biederman, I. and P. Kalocsai (1997). ‘Neurocomputational Bases of Object and Face Recognition’.
Philosophical Transactions of the Royal Society of London B: Biological Sciences 352: 1203–1219.
Blaha, L. M. and J. T. Townsend (2004). ‘From Nonsense to Gestalt: The Influence of Configural Learning
on Processing Capacity’. Paper presented at the 2004 Meeting of the Society for Mathematical
Psychology, July, Ann Arbor, MI.
Blaha, L. M. (2010). ‘A Dynamic Hebbian-style Model of Configural Learning’. Dissertation submitted
in partial fulfilment of the requirements for the degree doctor of philosophy, Indiana University,
Bloomington.
Blaha, L. M., N. Silbert, and J. T. Townsend (2011). ‘A General Recognition Theory of Race Gestalten
Adaptation’. Paper presented at the annual meeting of the Vision Sciences Society, May, Naples, FL.
Chen, L. (2005). ‘The Topological Approach to Perceptual Organization’. Visual Cognition 12: 553–637.
Colonius, H. and J. T. Townsend (1997). ‘Activation-state Representation of Models for the
Redundant-signals-effect’. In Choice, Decision, and Measurement: Essays in Honor of R. Duncan Luce,
edited by A. A. J. Marley, pp. 245–254. Hillsdale, NJ: Erlbaum.
Copeland, A. M. and M. J. Wenger (2006). ‘An Investigation of Perceptual and Decisional Influences on the
Perception of Hierarchical Forms’. Perception 35: 511–529.
Cornes, K., N. Donnelly, H. Godwin, and M. J. Wenger (2011). ‘Perceptual and Decisional Factors
Affecting the Detection of the Thatcher Illusion’. Journal of Experimental Psychology: Human Perception
and Performance 37: 645–668.
DeCarlo, L. T. (2003). ‘Using the PLUM Procedure of SPSS to Fit Unequal Variance and Generalized Signal
Detection Models’. Behavior Research Methods, Instruments, and Computers 35: 49–56.
Egeth, H. (1966). ‘Parallel versus Serial Processes in Multidimensional Stimulus Discrimination’. Perception
and Psychophysics 1: 245–252.
Eidels, A., J. T. Townsend, and J. R. Pomerantz (2008). ‘Where Similarity Beats Redundancy: The
Importance of Context, Higher Order Similarity, and Response Assignment’. Journal of Experimental
Psychology: Human Perception and Performance 34(6): 1441–1463.
Eidels, A., J. W. Houpt, N. Altieri, L. Pei, and J. T. Townsend (2011). ‘Nice Guys Finish Fast and Bad Guys
Finish Last: Facilitatory vs Inhibitory Interaction in Parallel Systems’. Journal of Mathematical Psychology
55: 176–190.
Fific, M., R. M. Nosofsky, and J. T. Townsend (2008). ‘Information-processing Architectures in
Multidimensional Classification: A Validation Test of the Systems Factorial Technology’. Journal of
Experimental Psychology: Human Perception and Performance 34(2): 356–375.
Fific, M. and J. T. Townsend (2010). ‘Information-processing Alternatives to Holistic
Perception: Identifying the Mechanisms of Secondary-level Holism within a Categorization Paradigm’.
Journal of Experimental Psychology: Learning, Memory, and Cognition 36(5): 1290–1313.
Garner, W. R. (1974). The Processing of Information and Structure. New York: Wiley.
Green, D. M. and J. A. Swets (1966). Signal Detection Theory and Psychophysics. New York: Wiley.
Hummel, J. E. and I. Biederman (1992). ‘Dynamic Binding in a Neural Network for Shape Recognition’.
Psychological Review 99: 480–517.
Ingvalson, E. M. and M. J. Wenger (2005). ‘A Strong Test of the Dual Mode Hypothesis’. Perception and
Psychophysics 67: 14–35.
The visual hierarchy achieves integral representation through convergence. Whereas LGN neu-
rons are not selective for orientation, to obtain this feature in V1 requires the output of several
LGN neurons to converge on V1 simple cells. Besides simple cells, complex cells were distinguished,
whose receptive fields are larger and more distinctive; Hubel and Wiesel proposed them to
be the result of output from several simple cells converging on a complex cell. Convergence is
understood to continue along the ventral stream (Kastner et al. 2001), leading to receptive field
properties not available at lower levels (Hubel and Wiesel 1998): e.g. a representation in V4 is based
on convex and concave curvature (Carlson et al. 2011). Correspondingly, these representations
become increasingly abstract; e.g. curvature representations in macaque V4 are invariant
to color changes (Bushnell and Pasupathy 2011). Also, the populations of neurons that carry
the representations become increasingly sparse (Carlson et al. 2011).
The higher up, the more the representations become integral and abstract, i.e. invariant under
perturbations such as location or viewpoint changes (Nakatani et al. 2002) or occlusion (e.g.
Plomp et al. 2006). Among individual neurons of macaque inferotemporal cortex (Tanaka et al. 1991),
some cells respond specifically to whole, structured objects such as faces or
hands, but most are more responsive to simplified objects. These cells provide higher-order
features with more or less position- and orientation-invariant representation. The ‘more or less’
is added because the classes of stimuli these neurons respond to vary widely; some are orienta-
tion invariant, some are not; some are invariant with respect to contrast polarity, some are not.
Collectively, neurons in temporal areas represent objects by using a variety of combinations of
active and inactive columns for individual features (Tsunoda et al. 2001). These neurons are organized in
spots, also known as columns, whose members are activated by the same stimulus. Some researchers proposed
that these columns constitute a map, the dimensions of which represent some abstract parameters
of object space (Op de Beeck et al. 2001). Whether or not this proposal holds, it remains true
that realistic objects at this level are coded in a sparse and distributed population (Quiroga et al.
2008; Young and Yamane 1992).
In the psychological literature, the hierarchical approach to the visual system has found a func-
tional expression early on in the influential work of Neisser (1967), who identified the hierarchical
levels with stages of processing. Although Neisser retracted much of these views in subsequent
work (Neisser 1976), these early ideas have remained remarkably persistent amongst psychologists.
Most today acknowledge hierarchical stages in perception, albeit ones that are ordered as cascades
rather than strictly sequentially. Neisser (1967) regarded the early stages of perception as automatic
and the later ones as attentional. This notion has been elaborated by Anne Treisman, mostly in
visual search experiments. Treisman and Gelade (1980) showed that visual detection of target ele-
ments in a field of distracters is easy when the target is distinguished by a single basic feature.
When, however, a conjunction of features is needed to identify a target, search is slow and difficult.
Presumably, this is because attention is deployed to the spatial location of each item, one item
at a time. Treisman concluded that spatially selective attention is needed for feature integration.
However, regardless of whether a basic feature identifies the target, the ease of finding it
amongst non-targets depends on their homogeneity (Duncan and Humphreys 1989); search for
conjunctions of basic features need not involve spatial selection, as long as these conjunctions
result in the emergence of a higher-order, integral feature that is salient enough (Nakayama and
Silverman 1986; Treisman and Sato 1990; Wolfe et al. 1988). We will come back to this notion
shortly. For now we may consider salience as the product of competition amongst target and dis-
tracter features, positively biased for relevant target features (Desimone and Duncan 1995) and/
or negatively biased for nontarget features, including the target’s own components (Rensink and
Enns 1995).
Hierarchical Stages or Emergence in Perceptual Integration? 971
Fig. 47.1 Popping out or popping in? Seeing is not always believing. <http://illutionista.blogspot.
be/2011/07/eating-hand-illusion-punching-face.html.>
972 van Leeuwen
to say that the event is salient because it is unlikely. Recall, however, that we are then drawing on
precisely the kind of knowledge and inferences that would prevent us from seeing what we are
actually seeing here. We might say the event is salient because mid-level vision is producing an
unusual output. This requires conscious awareness to have access to the mid-level representations,
in which, according to Wolfe and Cave (1999), targets and non-targets consist of loosely bundled
collections of features. But as far as this level is concerned, there is nothing unusual about the
scene; it is just a few bundles of surfaces, some of which are partially occluded. The event is salient
because it seems a fist is being swallowed. This illusion, therefore, is taking the notion of popping
out to the extreme: what is supposedly popping out is actually popping in.
All things considered, perhaps perception scientists have focused too exclusively on the hier-
archical approach. In fact, from a neuroscience point of view the hierarchical picture is not that
clear-cut. On the one hand, hierarchy seems not always necessary: single cells in V1 have been
found that code for grouping, e.g. by being sensitive to occlusion information (Sugita 1999). On
the other hand, neurons selective for specific grouping configurations, irrespective of the sensory
characteristics of their components, occur outside of the ventral stream hierarchy, in macaque
lateral intraparietal sulcus (LIP) (Yokoi and Komatsu 2009). The LIP belongs to the dorsal stream
or ‘where’ system, for processing location and/or action-relevant information (Ungerleider and
Mishkin 1982; Goodale and Milner 1992), and is associated with attention and saccade targeting.
Using fMRI, areas of both the ventral and dorsal stream showed object selectivity; in intermediate
processing areas these representations were viewpoint and size specific, whereas in higher areas
they were viewpoint-independent (Konen and Kastner 2008). Generally speaking, it is not surpris-
ing that the ‘where’ system is involved in perceptual grouping. Consider, for instance, grouping by
proximity, which is primarily an issue of ‘where’ the components are localized in space (Gepshtein
and Kubovy 2007; Kubovy et al. 1998; Nikolaev et al. 2008). These observations might suggest that
hierarchy does not adequately characterize the distribution of labor in visual processing areas.
Predictive Coding
According to Murray (2008), we must take care to distinguish effects of attention that are
pattern-specific from non-specific shifts in the baseline firing rates of neurons. Baseline shifts can
strengthen or weaken a given lower-level signal and can selectively affect a certain brain region,
independently of what is represented there; they modulate the firing rates of neurons even when
no stimulus is present in the receptive field (Luck et al. 1997).
Moreover, reductions in activity have also been reported as a result of attention allocation
(Corthout and Supèr 2004). Possibly, this top-down effect could be understood as predictive cod-
ing: this notion proposes that inferences of high-level areas are compared with incoming sensory
information in lower areas through cortical feedback, and the error between them is minimized by
modifying the neural activities (Rao and Ballard 1999). Using fMRI, Murray et al. (2002) found
that when elements are grouped into objects, as opposed to randomly arranged, activity increases
in higher areas, in particular the lateral occipital complex, while a reduction of activity occurs in
primary visual cortex. This observation suggests that activity in early visual areas may be reduced
as a result of grouping processes in higher areas. Reduced activity in early visual areas, as measured
by fMRI, was shown to indicate a reduction of visual sensitivity (Hesselmann et al. 2010),
presumably due to these processes.
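The error-minimization loop at the heart of this notion can be made concrete in a toy sketch. This is a minimal gradient scheme in the spirit of Rao and Ballard (1999), not their model; the dimensions, weights W, estimate r, step size, and random input are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented dimensions: a 16-"pixel" input and 4 latent causes.
n_in, n_lat = 16, 4
W = rng.normal(scale=0.1, size=(n_in, n_lat))  # generative (top-down) weights
x = rng.normal(size=n_in)                      # sensory input in the lower area

r = np.zeros(n_lat)   # higher-area estimate of the causes
lr = 0.1              # illustrative step size
for _ in range(200):
    prediction = W @ r       # top-down prediction fed back to the lower area
    error = x - prediction   # prediction error computed in the lower area
    r += lr * (W.T @ error)  # adjust the estimate to reduce the error

# Whatever error remains is the part of the input the model cannot explain.
residual = np.linalg.norm(x - W @ r)
print(residual < np.linalg.norm(x))  # True: the loop strictly reduces the error
```

After settling, the higher area’s estimate r explains part of the input through the prediction W @ r; the residual is the prediction error that, on this account, continues to drive lower-level activity.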
Reduction of activity has also been claimed to have the opposite effect: Kok et al. (2012) found
that the reduction corresponded to a sharpening of sensory representations. Sharpening is under-
stood as top-down suppression of neural responses that are inconsistent with the current expec-
tations. These results suggest an active pruning of neural representations; in other words, active
expectation makes representations increasingly sparse. Then again, multi-unit recording stud-
ies in ferrets and rats have provided evidence against such active sparsification in visual cortex
(Berkes et al. 2009).
Overall, we may conclude that top-down effects on early visual perception are both ubiquitous
and varied, sufficiently so to accommodate contradictory theories; top-down effects may
selectively or aselectively increase or decrease firing rates, or change the tuning properties of
neurons, including receptive field locations and sizes. Some of these effects may be predictive; per-
ception does not begin when the light hits the retina. None of these mechanisms, however, are
fast enough to enable the rapid detection of complex object properties that configural superior-
ity requires.
Contextual modulation
Neurons in primary visual cortex (V1) respond differently to a simple visual element when it is
presented in isolation from when it is embedded within a complex image (Das and Gilbert 1995).
Beyond their classical receptive field, there is a surround region; its diameter is estimated to be
at least 2–5 times larger than the classical receptive field (Fitzpatrick 2000). Stimulation of this
region can cause both inhibition and facilitation of a cell’s responses: modification of its RF
(Blakemore and Tobin 1972), spatial summation of low-contrast stimuli (Kapadia et al. 1995),
and cross-orientation modulation (Das and Gilbert 1999; Khoe et al. 2004). Khoe et al. (2004)
studied detection thresholds for low-contrast Gabor patches, in combination with event-related
potentials (ERP) analyses of brain activity. Detection sensitivity increases for such stimuli when
flanked by other patches in collinear orientation, as compared to ones in the orthogonal orienta-
tion. Collinear stimuli gave rise to an increased ERP response between 80 and 140 ms from stimulus
onset, centered on the midline occipital scalp, which could be traced to primary visual cortex.
Such interactions are thought to depend on local excitatory connections between cells in V1
(Kapadia et al. 1995; Polat et al. 1998).
Das and Gilbert (1999) showed that the strength of these connections declines gradually with
cortical distance in a manner that is largely radially symmetrical and relatively independent of
orientation preferences. Contextual influence of flanking visual stimuli varies systematically with
a neuron’s position within the cortical orientation map. The spread of connections could provide
neurons with a graded specialization for processing angular visual features such as corners and
T-junctions. This means that complex features can be detected already at the level of V1. In par-
ticular, T-junctions are an important clue that an object is partially hidden behind an occluder,
in accordance with the observation that occlusion is detected early in perception (see Kogo and
van Ee, this volume). According to Das and Gilbert (1999), these features could have their own
higher-order maps in V1, linked with the orientation map. In other words, higher-order maps
thought to belong to mid-level may be found already in early visual areas.
In V1, and more predominantly in the adjacent area V2, Zhou et al. (2000) and Qiu and von der
Heydt (2005) observed, in macaque, neurons sensitive to boundary assignment. One neuron will
fire if the figure is on one side of an edge, but will remain silent and another will fire instead if the
figure is on the other side of the edge. These distinctions are made as early as 30 ms after stimulus
onset. Thus, even receptive fields in early areas such as V1 are sensitive to context almost instan-
taneously after a stimulus onset.
In the input layers (4C) of V1, neurons reach a peak in orientation selectivity with a latency of
30–45 ms, persisting for 40–85 ms (macaque). The output layers (layers 2, 3, 4B, 5, or 6),
however, show a development in selectivity, in which neurons often show several different peaks.
This could be understood in terms of the wide-range lateral inhibition needed for a high level of
orientation selectivity in V1 (Ringach et al. 1997) but also, I should add, as a result of modulation from
long-range connections within V1. Along with the architecture of neural connectivity, the dynamics
provides the machinery for early holism, through spreading of activity within the early visual areas.
Due to activation spreading, the time course of activity in cells, regions, and systems shows
increasing context-dependency in early visual areas. Around 60 ms from stimulus onset,
the activity of neurons in V1 becomes dependent on that of their neighbors through horizontal
connections (in the same neuronal layer), for instance the interactions of oriented contour seg-
ments through local association fields (Kapadia et al. 1995; Polat et al. 1998; Bauer and Heinze
2002). These effects can be observed in human scalp EEG: the earliest ERP component C1—which
peaks at 60–90 ms after stimulus onset—is not affected by attention (Clark et al. 2004; Martinez
et al. 1999; Di Russo et al. 2003), although the later portion of this component may reflect contribu-
tions from visual areas other than V1 (Foxe and Simpson 2002). The earliest attentional processes
in EEG reflect spatial attention. ERP studies (reviewed by Hillyard et al. 1998) showed that spatial
attention affects ERP components not earlier than about 90 ms after stimulus onset. The 80–100
ms latency is generally understood to be the earliest moment where attentional feedback kicks in.
that the former can be done on the basis of low-spatial resolution information, whereas the latter
required a combination of low and high spatial resolution aspects of the stimuli. When low
spatial frequency information was eliminated from the stimuli, left-hemisphere activity became dominant.
Even though for proximity, the locus of these effects seems early, the time course of perceptual
grouping might seem to confirm that it is attentionally driven. By varying the task, requiring spa-
tial attention to be narrowly or widely focused, it is possible to observe differences in perceptual
integration (Stins and van Leeuwen 1993). Han et al. (2005) varied the task by requiring detection
of a target color either in the center of the stimulus or distributed across it. They
measured the effects of this manipulation on evoked potentials. Han et al. (2005) found that all the
grouping-related evoked activity not only started later than 100 ms, but also depended on the task.
There are, however, earlier correlates of grouping in neural activity than the ones observed by
Han et al (2001, 2005). In the dot-lattice display of Figure 47.2, Nikolaev et al. (2008) studied
[Figure 47.2 panels: (a) lattice groupings a, b, c, d, with aspect ratio AR = |b| / |a|; (b) lattices with AR = 1.0, 1.1, 1.2, and 1.3.]
Fig. 47.2 Dot lattices. The dots appear to group into strips. (a) The four most likely groupings are
labeled a, b, c, and d, with the inter-dot distance increasing from a to d. Perception of lattices
depends on their aspect ratio (AR), which is the ratio of two shortest inter-dot distances: along a
(the shortest) and b. When AR = 1.0, the organizations parallel to a and b are equally likely. When
AR > 1.0, the organization parallel to a is more likely than the organization parallel to b. These
phenomena are manifestations of grouping by proximity. (b) Dot lattices of four aspect ratios.
Reproduced from Experimental Brain Research, 186(1), pp. 107–122, Dissociation of early evoked cortical activity in
perceptual grouping, Andrey R. Nikolaev, Sergei Gepshtein, Michael Kubovy, and Cees van Leeuwen, DOI: 10.1007/
s00221-007-1214-7 Copyright (c) 2008, Springer-Verlag. With kind permission from Springer Science and Business Media.
grouping by proximity using a design based on a parametrized grouping strength. They found
an effect of proximity, more precisely of aspect ratio (AR, see Figure 47.2) on C1 in the medial
occipital region starting from 55 ms after onset of the stimulus. As mentioned, C1 is considered
the earliest evoked response of the primary visual cortex; it is usually registered in the central
occipital area 45–100 ms after stimulus presentation. This result suggests that C1 activity reflects
early spatial grouping. The early activity was higher in the right than the left hemisphere, consistent
with Han et al.’s (2001) observation that low spatial frequencies are processed more in the right
than the left hemisphere. Therefore, proximity grouping at this stage depends more on the low than the high
spatial frequency content of visual stimuli.
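The quantitative grip this paradigm affords can be illustrated with the pure distance law for dot lattices (Kubovy et al. 1998, cited above), under which the odds of organizing the lattice along b rather than a fall off exponentially with aspect ratio. A minimal sketch, in which the sensitivity parameter s is invented for illustration:

```python
import math

def grouping_odds(aspect_ratio, s=3.0):
    """Odds p(b)/p(a) of organizing a dot lattice along b rather than a,
    under a pure distance law: exp(-s * (AR - 1)). The sensitivity s is an
    invented, observer-specific parameter."""
    return math.exp(-s * (aspect_ratio - 1.0))

for ar in (1.0, 1.1, 1.2, 1.3):
    print(f"AR={ar:.1f}  p(b)/p(a)={grouping_odds(ar):.2f}")
# At AR = 1.0 both organizations are equally likely (odds 1.00);
# the odds then fall off exponentially as AR grows.
```

Fitting s per observer gives exactly the kind of individual grouping-sensitivity measure that Nikolaev et al. (2008) related to the C1 and P1 amplitudes.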
One of the reasons this result was not observed in Han et al. (2001) may have been that their
task never involved reporting grouping. In this respect it is interesting that in Nikolaev et al.
(2008) the amplitude of C1 depended on individual sensitivity to subtle differences in AR. The
more sensitive an observer, the better AR predicted the amplitude of C1. The absence of an effect
of AR on C1 in low grouping sensitivity observers was compensated by an effect on the next
peak. This is the P1 in posterior lateral occipital areas (without a clear asymmetry), having its
earliest effect of proximity (AR) at 108 ms from stimulus onset, i.e. right at the onset of atten-
tional feedback activity. The effect is present in all observers, but the trend is opposite to that of
C1, in that the lower the proximity sensitivity, the larger its effect on P1 amplitude. Thus, the two
events represent different aspects of perceptual grouping, with the transition between the two
taking place on the interval from 55 to 108 ms after stimulus onset. Perceptual grouping, there-
fore, may be regarded as a multistage process, which consists of early attention-independent
processes and later processes that depend on attention, where the latter may compensate for the
former if needed.
1991; van Leeuwen and Bakker 1995; Patching and Quinlan 2002) and Garner effects (Garner
1974, 1976, 1988), have had a crucial role in detecting feature integration in behavioral studies.
Incongruence effects involve the deterioration of a response to a target feature resulting from
one or more incongruent but irrelevant other features presented on the same trial, as compared to
a congruent feature. They belong to the family that also includes the classical Stroop task (Stroop
1935) in which naming the ink color of a color-word is delayed if the color-word is different
(incongruent) from the color of the ink which has to be named (e.g. the word red printed in
green ink), as well as auditory versions (Hamers and Lambert 1972), the Eriksen flanker paradigm
(Eriksen and Eriksen 1974), tasks using individual faces and names (Egner and Hirsch 2005),
numerical values and physical sizes (Algom et al. 1996), names of countries and their capitals
(Dishon-Berkovits and Algom 2000), and versions employing object- or shape-based stimuli
(Pomerantz et al. 1989; for a review: Marks 2004). These effects, therefore, are generic to different
levels of processing. Different Stroop-like tasks will involve a mixture of partially overlapping, and
partially distinct brain mechanisms (see, for instance, a recent meta-analysis in Nee et al. 2007).
Consider the stimuli in Figure 47.3. According to their contours the stimuli on one diagonal
are congruent and the ones on the other incongruent. Participants responding to whether the
concave contour has a rectangular or triangular shape show an effect of congruency of the outer
contour on response latencies and EEG. These effects imply that concave and surrounding con-
tour shapes have somehow become related in the representation of the figure.
Garner interference was named by Pomerantz (1983) after the work of Garner (1974, 1976, and
1988). Stimulus dimensions, such as brightness or saturation, are assumed to describe a stimulus
in a ‘feature space’ (Garner 1976). Dimensions are called separable if variation along the irrelevant
dimension results in the same performance as without variation. An example of separable dimensions
is circle size and radius inclination (Garner and Felfoldy 1970). When variation of the stimuli along
an irrelevant dimension of this space slows the response to the target compared to when the irrelevant
dimension is held constant, Garner called such dimensions integral, which means that they have been
integrated perceptually. Brightness and saturation are typically integral dimensions (Garner 1976).
[Stimulus layout: G3L3, G3L4 (top row); G4L3, G4L4 (bottom row).]
Fig. 47.3 Stimuli composed of a larger outer contour (global feature G) and a smaller inner contour
(local feature L), which were either triangular or rectangular in shape, yielding the congruent stimuli
G3L3, G4L4 and the incongruent ones: G3L4, G4L3.
Participants classified the figures as triangular or rectangular according to the shape of the inner contour. Reprinted
from NeuroImage, 45(4), Lars T. Boenke, Frank W. Ohl, Andrey R. Nikolaev, Thomas Lachmann, and Cees van
Leeuwen, Different time courses of Stroop and Garner effects in perception — An Event-Related Potentials Study, pp.
1272–1288, doi: 10.1016/j.neuroimage.2009.01.019 Copyright (c) 2009, with permission from Elsevier.
In one of his studies, for instance, Garner (1988) used the dimensions ‘letters’ and ‘color’. Letters
C and O were presented in green or red ink color. The task was to name the ink color, which varied
randomly in both letter conditions. Here, the irrelevant feature was associated with the ‘letters’
dimension. In the baseline condition, the letters ‘O’ or ‘C’ would occur in separate blocks; in the
filtering conditions they would be randomly intermixed. Irrelevant variation of the letters had
an impact on the response to the color dimension, which implies that letter identity and color are
integral dimensions.
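The logic of the baseline versus filtering comparison can be sketched as a toy simulation; the RT model and every number below are invented solely to show the design’s arithmetic, not to model real data:

```python
import random

random.seed(0)

def trial_rt(integral, varied):
    """Invented RT model: irrelevant variation adds a 40 ms cost, but only
    when the two dimensions are integral (perceptually integrated)."""
    return 450 + random.gauss(0, 30) + (40 if (integral and varied) else 0)

def mean_rt(integral, varied, n=2000):
    return sum(trial_rt(integral, varied) for _ in range(n)) / n

for integral in (False, True):
    baseline = mean_rt(integral, varied=False)   # irrelevant dimension fixed per block
    filtering = mean_rt(integral, varied=True)   # irrelevant dimension varies randomly
    interference = filtering - baseline          # near 0 ms if separable, near 40 ms if integral
    print(f"integral={integral}: interference = {interference:.0f} ms")
```

Garner interference is thus simply the filtering-minus-baseline difference; a reliably positive difference is taken as evidence that the dimensions are integral.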
As independent factors in one single experiment, incongruence and Garner effects occurred
either jointly (Pomerantz 1983; Pomerantz et al. 1989; Marks 2004) or mutually exclusively
(Melara and Mounts 1993; Patching and Quinlan 2002, van Leeuwen and Bakker 1995). These
effects might thus be considered as belonging to different mechanisms. But perhaps better, they
could be regarded as the same mechanism operating on two different time scales. In both cases,
the principle is that attentional selection has failed because task-irrelevant information had already
been included with the target information. Their difference may then be considered in terms of
the time it takes this irrelevant information to become connected with the target. Incongruence
effects occur when conflicting information is presented within a narrow time window (Flowers
1990). Thus, memory involvement is minimal. The Garner effect, on the other hand, is a conflict
operating between presentations, and thus involves episodic memory. Incongruence and Garner
effects, therefore, differ considerably in the width of their scope and that of their feedback cycle,
the latter drawing upon a much wider feedback cycle than the former.
As a result, their time course will differ. Boenke et al. (2009) used ERP analyses to observe
the time course of incongruence and Garner effects. In accordance with Kasai’s (2010) effects
of spreading of attention, they found incongruence effects on N1 and N2. The first interval was
observed on N1, 172–216 ms after stimulus onset, with a maximum at 200 ms, located
in the parieto-occipital areas, more predominantly on the right. The amplitude was larger in the
incongruent than the congruent condition. The second interval occurred 268–360 ms after
stimulus onset and included the negative component N2 and the rising part of the P3 component,
predominantly in the fronto-central region of the scalp.
Garner effects in Boenke et al. (2009) started off later, the earliest one 328–400 ms
after stimulus onset. This interval corresponded to the rising part of the positive component
P3 and was observed predominantly above the fronto-central areas. The first maximum in
the Garner effect almost coincided with the second maximum in the incongruence effect. This
moment (336 ms) was also the maximum of the interaction with the Garner effect, observed over left
frontal, central, temporal, and parietal areas. This result implies that Stroop and Garner effects
occur in cascaded stages, resolving the longstanding question about their interdependence. We
may conclude that the time course of Garner effects follows the principle of spreading attention;
with Garner effects depending on information from the preceding episode, they depend on a
wider feedback cycle than incongruence effects, and thus the rise time of the former is longer, and
their latency larger, than that of the latter.
activation cycles operating at multiple scales. These cycles work in parallel (e.g. between the ventral and
dorsal streams), but where the onset of their evoked activity differs, they operate as cascaded stages.
According to a principle I have been peddling since the late eighties (e.g. van Leeuwen et al.
1997), early holism is realized through diffusive coupling via lateral and large-scale intrinsic
connections, prior to the deployment of attentional feedback. The coupling results in spreading
activity at, respectively, the circuit scale (Gong and van Leeuwen 2009), the area scale (Alexander et al.
2011), and the whole-head scale of traveling wave activity (Alexander et al. 2013).
Starting from approximately 100 ms after onset of a stimulus, attentional feedback also begins
to spread, but cannot separate what earlier processes have already joined together. Early-onset
attentional feedback processes have been shown to extend to congruency of proximal information
in the visual display; later ones to extend to information in episodic memory (Boenke et al. 2009).
This is because the onset latency of the effect is determined by the width of the feedback cycle,
which determines the time it takes for the contextual modulation to arrive: short for features close
by within the pattern or long for episodic memory.
Open issues
In this chapter, I sketched a perspective on visual processing based on intrinsic holism, as established
through the dynamic spreading of signals via short- and long-range lateral, as well as top-down
feedback connections. Since the mechanism is essentially indifferent with respect to pre-attentional
and attentional processes in perception, we might consider a unified theoretical framework in
which processes are distinguished based on the scale at which these interactions take place.
The exact layout of the theory will depend on a precise, empirical study of the way spreading activ-
ity can achieve coherence in the brain. The next chapter will provide some of the results that could
offer the groundwork for such a theory.
Acknowledgments
The author is supported by an Odysseus research grant from the Flemish Organization for
Science (FWO) and wishes to thank Lee de-Wit, Pieter Roelfsema, and Andrey Nikolaev for use-
ful comments.
Hierarchical Stages or Emergence in Perceptual Integration? 983
References
Alexander, D.M. and van Leeuwen, C. (2010). Mapping of contextual modulation in the population
response of primary visual cortex. Cognitive Neurodynamics 4: 1–24.
Alexander, D.M., Trengove, C., Sheridan, P., and van Leeuwen, C. (2011). Generalization of learning by
synchronous waves: from perceptual organization to invariant organization. Cognitive Neurodynamics
5: 113–32.
Alexander, D.M., Jurica, P., Trengove, C., Nikolaev, A.R., Gepshtein, S., Zviagyntsev, M., Mathiak,
K., Schulze-Bonhage, A., Rüscher, J., Ball, T., and van Leeuwen, C. (2013). Traveling waves and
trial averaging: the nature of single-trial and averaged brain responses in large-scale cortical signals.
NeuroImage doi: 10.1016/j.neuroimage.2013.01.016.
Algom, D., Dekel, A., and Pansky, A. (1996). The perception of number from the separability of the
stimulus: the Stroop effect revisited. Memory & Cognition 24: 557–72.
Amedi, A., Jacobson, G., Hendler, T., Malach, R., and Zohary, E. (2002). Convergence of visual and tactile
shape processing in the human lateral occipital complex. Cerebral Cortex 12: 1202–12.
Bauer, R. and Heinze, S. (2002). Contour integration in striate cortex. Experimental Brain Research
147: 145–52.
Baylis, G.C. and Driver, J. (1992). Visual parsing and response competition: The effect of grouping factors.
Perception & Psychophysics 51: 145–62.
Ben-Shahar, O. and Zucker, S.W. (2004). Sensitivity to curvatures in orientation-based texture
segmentation. Vision Research 44: 257–77.
Ben-Shahar, O., Huggins, P.S., Izo, T., and Zucker, S.W. (2003). Cortical connections and early visual
function: intra- and inter-columnar processing. Journal of Physiology Paris 97: 191–208.
Berkes, P., White, B.L., and Fiser, J. (2009). No evidence for active sparsification in the visual cortex. Paper
presented at NIPS 22. <http://books.nips.cc/papers/files/nips22/NIPS2009_0145.pdf>
Blakemore, C. and Tobin, E.A. (1972). Lateral inhibition between orientation detectors in the cat’s visual
cortex. Experimental Brain Research 15: 439–40.
Boenke, L.T., Ohl, F., Nikolaev, A.R., Lachmann, T., and van Leeuwen, C. (2009). Stroop and Garner
interference dissociated in the time course of perception, an event-related potentials study. NeuroImage
45: 1272–88.
Bushnell, B.N. and Pasupathy, A. (2011). Shape encoding consistency across colors in primate V4. Journal
of Neurophysiology 108: 1299–308.
Carlson, E.T., Rasquinha, R.J., Zhang, K., and Connor, C.E. (2011). A sparse object coding scheme in area
V4. Current Biology 21: 288–93.
Clark, V.P., Fan, S., and Hillyard, S.A. (1994). Identification of early visual evoked potential generators by
retinotopic and topographic analyses. Human Brain Mapping 2(3): 170–87.
Corthout, E. and Supèr, H. (2004). Contextual modulation in V1: the Rossi-Zipser controversy.
Experimental Brain Research 156: 118–23.
Das, A. and Gilbert, C.D. (1995). Long-range horizontal connections and their role in cortical
reorganization revealed by optical recording of cat primary visual cortex. Nature 375: 780–4.
Das, A. and Gilbert, C.D. (1999). Topography of contextual modulations mediated by short-range
interactions in primary visual cortex. Nature 399: 655–61.
Davis, G. and Driver, J. (1997). Spreading of visual attention to modally versus amodally completed
regions. Psychological Science 8(4): 275–81.
Dehaene, S., Changeux, J.P., Naccache, L., Sackur, J., and Sergent, C. (2006). Conscious, preconscious, and
subliminal processing: a testable taxonomy. Trends in Cognitive Sciences 10: 204–11.
Di Russo, F., Martínez, A., and Hillyard, S.A. (2003). Source analysis of event-related cortical activity
during visuo-spatial attention. Cerebral Cortex 13(5): 486–99.
984 van Leeuwen
Desimone, R. and Duncan, J. (1995). Neural mechanisms of selective visual attention. Annual Review of
Neuroscience 18(1): 193–222.
Dishon-Berkovits, M. and Algom, D. (2000). The Stroop effect: it is not the robust phenomenon that you have
thought it to be. Memory & Cognition 28: 1437–49.
Duncan, J. and Humphreys, G.W. (1989). Visual search and stimulus similarity. Psychological Review
96: 433–58.
Egeth, H.E. and Yantis, S. (1997). Visual attention: control, representation, and time course. Annual Review
of Psychology 48(1): 269–97.
Eimer, M. (1996). The N2pc component as an indicator of attention selectivity. Electroencephalography and
Clinical Neurophysiology 99: 225–34.
Egner, T. and Hirsch, J. (2005). Cognitive control mechanisms resolve conflict through cortical
amplification of task-relevant information. Nature Neuroscience 8: 1784–90.
Enns, J.T. and Rensink, R.A. (1990). Sensitivity to three-dimensional orientation in visual search.
Psychological Science 1(5): 323–6.
Eriksen, B.A. and Eriksen, C.W. (1974). Effects of noise letters upon the identification of a target letter in a
nonsearch task. Perception & Psychophysics 16: 143–9.
Feldman, J. and Singh, M. (2005). Information along contours and object boundaries. Psychological Review
112: 243–52.
Felleman, D.J. and Van Essen, D.C. (1991). Distributed hierarchical processing in the primate cerebral
cortex. Cerebral Cortex 1: 1–47.
Fiorani, M., Rosa, M.G., Gattass, R., and Rocha-Miranda, C.E. (1992). Dynamic surrounds of receptive
fields in primate striate cortex: a physiological basis for perceptual completion? Proceedings of the
National Academy of Sciences USA 89: 8547–51.
Fitzpatrick, D. (2000). Seeing beyond the receptive field in primary visual cortex. Current Opinion in
Neurobiology 10: 438–43.
Flowers, J.H. (1990). Priming effects in perceptual classification. Perception & Psychophysics 47:
135–48.
Foxe, J.J. and Simpson, G.V. (2002). Flow of activation from V1 to frontal cortex in humans. Experimental
Brain Research 142(1): 139–50.
Freeman, W.J. (1991). Insights into processes of visual perception from studies in the olfactory system.
In: L. Squire, N.M. Weinberger, G. Lynch, and J.L. McGaugh (eds.), Memory: Organization and Locus of
Change, pp. 35–48. New York: Oxford University Press.
Freeman, W.J. and van Dijk, B.W. (1987). Spatial patterns of visual cortical fast EEG during conditioned
reflex in a rhesus monkey. Brain Research 422(2): 267–76.
Garner, W.R. (1974). The Processing of Information and Structure. Potomac: Erlbaum Publishers.
Garner, W.R. (1976). Interaction of stimulus dimensions in concept and choice processes. Cognitive
Psychology 8: 98–123.
Garner, W.R. (1988). Facilitation and interference with a separable redundant dimension in stimulus
comparison. Perception & Psychophysics: 44: 321–30.
Garner, W.R. and Felfoldy, G.L. (1970). Integrality of stimulus dimensions in various types of information
processing. Cognitive Psychology 1: 225–41.
Gepshtein, S. and Kubovy, M. (2007). The lawful perception of apparent motion. Journal of Vision
7(8):9: 1–15.
Gilaie-Dotan, S., Perry, A., Bonneh, Y., Malach, R., and Bentin, S. (2009). Seeing with profoundly
deactivated mid-level visual areas: nonhierarchical functioning in the human visual cortex. Cerebral
Cortex 19: 1687–703.
Gong, P. and van Leeuwen, C. (2009). Distributed dynamical computation in neural circuits with
propagating coherent activity patterns. PloS Computational Biology 5(12): e1000611.
Goodale, M.A., and Milner, A.D. (1992). Separate visual pathways for perception and action. Trends in
Neuroscience 15: 20–5.
Grosof, D.H., Shapley, R.M., and Hawken, M.J. (1993). Macaque V1 neurons can signal ‘illusory contours’.
Nature 365: 550–2.
Hamers, J.F. and Lambert, W.E. (1972). Bilingual interdependencies in auditory perception. Journal of
Verbal Learning and Verbal Behaviour 11: 303–10.
Han, S., Song, Y., Ding, Y., Yund, E.W., and Woods, D.L. (2001). Neural substrates for visual perceptual
grouping in humans. Psychophysiology 38: 926–35.
Han, S., Jiang, Y., Mao, L., Humphreys, G.W., and Qin, J. (2005). Attentional modulation of perceptual
grouping in human visual cortex: ERP studies. Human Brain Mapping 26: 199–209.
Hesselmann, G., Sadaghiani, S., Friston, K.J., and Kleinschmidt, A. (2010) Predictive coding or evidence
accumulation? False inference and neuronal fluctuations. PloS ONE 5(3), e9926: doi:10.1371/journal.
pone.0009926.
Hillyard, S.A., Vogel, E.K., and Luck, S.J. (1998). Sensory gain control (amplification) as a mechanism of
selective attention: electrophysiological and neuroimaging evidence. Philosophical Transactions of the
Royal Society of London. Series B: Biological Sciences 353: 1257–70.
Hochstein, S. and Ahissar, M. (2002). View from the top: hierarchies and reverse hierarchies in the visual
system. Neuron 36(5): 791–804.
Hubel, D.H. and Wiesel, T.N. (1959). Receptive fields of single neurones in the cat’s striate cortex. Journal of
Physiology 148: 574–91.
Hubel, D.H. and Wiesel, T.N. (1974). Sequence regularity and geometry of orientation columns in the
monkey striate cortex. Journal of Comparative Neurology 158: 267–94.
Hubel, D.H. and Wiesel, T.N. (1998). Early exploration of the visual cortex. Neuron 20: 401–12.
Kahneman, D. and Henik, A. (1981). Perceptual organization and attention. In: M. Kubovy and
J.R. Pomerantz (eds), Perceptual Organization, pp. 181–211. Hillsdale: Erlbaum.
Kanizsa, G. (1994). Gestalt theory has been misinterpreted, but has also had some real conceptual
difficulties. Philosophical Psychology 7: 149–62.
Kapadia, M.K., Ito, M., Gilbert, C.D., and Westheimer, G. (1995). Improvement in visual sensitivity by
changes in local context: parallel studies in human observers and in V1 of alert monkeys. Neuron
15: 843–56.
Kasai, T. (2010). Attention-spreading based on hierarchical spatial representations for connected objects.
Journal of Cognitive Neuroscience 22: 12–22.
Kasai, T. and Kondo, M. (2007). Electrophysiological correlates of attention-spreading in visual grouping.
Neuroreport 18: 93–8.
Kastner, S., De Weerd, P., Pinsk, M.A., Elizondo, M.I., Desimone, R., and Ungerleider, L.G. (2001).
Modulation of sensory suppression: implications for receptive field sizes in the human visual cortex.
Journal of Neurophysiology 86: 1398–411.
Kenemans, J.L., Baas, J.M., Mangun, G.R., Lijffijt, M., and Verbaten, M.N. (2000). On the processing of
spatial frequencies as revealed by evoked-potential source modeling. Clinical Neurophysiology
111: 1113–23.
Khoe, W., Freeman, E., Woldorff, M.G., and Mangun, G.R. (2004). Electrophysiological correlates of
lateral interactions in human visual cortex. Vision Research 44: 1659–73.
Kimchi, R. and Bloch, B. (1998). Dominance of configural properties in visual form perception.
Psychonomic Bulletin & Review 5: 135–9.
Kitterle, F.L., Hellige, J.B., and Christman, S. (1992). Visual hemispheric asymmetries depend on which
spatial frequencies are task relevant. Brain and Cognition 20: 308–14.
Kok, P., Jehee, J.F.M., and de Lange, F.P. (2012). Less is more: expectation sharpens representations in the
primary visual cortex. Neuron 75: 265–70.
Konen, Ch. and Kastner, S. (2008). Two hierarchically organized neural systems for object information in
human visual cortex. Nature Neuroscience 11: 224–31.
Kubovy, M., Holcombe, A.O., and Wagemans, J. (1998). On the lawfulness of grouping by proximity.
Cognitive Psychology 35: 71–98.
Lamme, V.A., Supèr, H., and Spekreijse, H. (1998). Feedforward, horizontal, and feedback processing in
the visual cortex. Current Opinion in Neurobiology 8: 529–35.
Lee, T.S. and Mumford, D. (2003). Hierarchical Bayesian inference in the visual cortex. Journal of the
Optical Society of America A 20(7): 1434–48.
Livingstone, M. and Hubel, D. (1988). Segregation of form, color, movement, and depth: anatomy,
physiology, and perception. Science 240: 740–9.
Lörincz, A., Szirtes, G., Takács, B., Biederman, I., and Vogels, R. (2002). Relating priming and repetition
suppression. International Journal of Neural Systems 12: 187–201.
Luck, S.J., Heinze, H.J., Mangun, G.R., and Hillyard, S.A. (1990). Visual event-related potentials
index focused attention within bilateral stimulus arrays: II. Functional dissociations of P1 and N1
components. Electroencephalography and Clinical Neurophysiology 75: 528–42.
Luck, S.J. and Hillyard, S.A. (1994). Spatial filtering during visual search: evidence from human
electrophysiology. Journal of Experimental Psychology: Human Perception and Performance 20: 1000–14.
Luck, S.J., Chelazzi, L., Hillyard, S.A., and Desimone, R. (1997). Neural mechanisms of spatial selective
attention in areas V1, V2, and V4 of macaque visual cortex. Journal of Neurophysiology 77: 24–42.
Lund, J.S., Yoshioka, T., and Levitt, J.B. (1993). Comparison of intrinsic connectivity in different areas of
macaque monkey cerebral cortex. Cerebral Cortex 3: 148–62.
MacLeod, C.M. (1991). Half a century of research on the Stroop effect: an integrative review. Psychological
Bulletin 109: 163–203.
Malach, R., Amir, Y., Harel, M., and Grinvald, A. (1993). Relationship between intrinsic connections and
functional architecture revealed by optical imaging and in vivo targeted biocytin injections in primate
striate cortex. Proceedings of the National Academy of Sciences USA 90: 10469–73.
Mangun, G.R., Hillyard, S.A., and Luck, S.J. (1993). Electrocortical substrates of visual selective attention.
In: Meyer, D. and Kornblum, S. (eds.), Attention and Performance XIV, pp. 219–43. Cambridge,
MA: MIT Press.
Marks, L.E. (2004). Cross-modal interactions in speeded classification. In: G. Calvert, C. Spence and B.E.
Stein (eds.), The Handbook of Multisensory Processes, pp. 85–106. Cambridge, MA: MIT Press.
Martinez, A., Anllo-Vento, L., Sereno, M.I., Frank, L.R., Buxton, R.B., Dubowitz, D.J. et al. (1999).
Involvement of striate and extrastriate visual cortical areas in spatial attention. Nature Neuroscience
2: 364–9.
Melara, R.D. and Mounts, J.R. (1993). Selective attention to Stroop dimensions: effects of baseline
discriminability, response mode, and practice. Memory & Cognition 21: 627–45.
Melcher, D. and Colby, C.L. (2008). Trans-saccadic perception. Trends in Cognitive Science 12: 466–73.
Mounts, J.R. and Tomaselli, R.G. (2005). Competition for representation is mediated by relative attentional
salience. Acta Psychologica 118: 261–75.
Murray, S.O. (2008). The effects of spatial attention in early human visual cortex are stimulus independent.
Journal of Vision 8(10).
Murray, S.O., Kersten, D., Olshausen, B.A., Schrater, P., and Woods, D.L. (2002). Shape perception
reduces activity in human primary visual cortex. Proceedings of the National Academy of Sciences USA
99: 15164–9.
Nakatani, C., Pollatsek, A., and Johnson, S.H. (2002). Viewpoint-dependent recognition of scenes. The
Quarterly Journal of Experimental Psychology: Section A 55(1): 115–39.
Nakayama, K. and Silverman, G.H. (1986). Serial and parallel processing of visual feature conjunctions.
Nature 320: 264–5.
Nauhaus, I., Nielsen, K.J., Disney, A.A., and Callaway, E.M. (2012). Orthogonal micro-organization of
orientation and spatial frequency in primate primary visual cortex. Nature Neuroscience 15: doi:10.1038/
nn.3255.
Nee, D.E., Wager, T.D., and Jonides, J. (2007). Interference resolution: insights from a meta-analysis of
neuroimaging tasks. Cognitive, Affective, & Behavioral Neuroscience 7: 1–17.
Neisser, U. (1967). Cognitive Psychology. East Norwalk: Appleton-Century-Crofts.
Neisser, U. (1976). Cognition and Reality: Principles and Implications of Cognitive Psychology. New York,
NY: W.H. Freeman.
Nikolaev, A.R. and van Leeuwen, C. (2004). Flexibility in spatial and non-spatial feature grouping: an
Event-Related Potentials study. Cognitive Brain Research 22: 13–25.
Nikolaev, A.R., Gepshtein, S., Kubovy, M., and van Leeuwen, C. (2008). Dissociation of early evoked
cortical activity in perceptual grouping. Experimental Brain Research 186: 107–22.
Op de Beeck, H., Wagemans, J., and Vogels, R. (2001). Inferotemporal neurons represent low-dimensional
configurations of parameterized shapes. Nature Neuroscience 4: 1244–52.
Patching, G.R. and Quinlan, P.T. (2002). Garner and congruence effects in the speeded classification of
bimodal signals. Journal of Experimental Psychology: Human Perception and Performance 28: 755–75.
Plomp, G., Liu, L., van Leeuwen, C., and Ioannides, A.A. (2006). The mosaic stage in amodal completion
as characterized by magnetoencephalography responses. Journal of Cognitive Neuroscience 18: 1394–405.
Polat, U., Mizobe, K., Pettet, M.W., Kasamatsu, T., and Norcia, A.M. (1998). Collinear stimuli regulate
visual responses depending on cell’s contrast threshold. Nature 391: 580–4.
Pomerantz, J.R. (1983). Global and local precedence: selective attention in form and motion perception.
Journal of Experimental Psychology, General 112: 516–40.
Pomerantz, J.R. and Lockhead, G.R. (1991). Perception of structure: an overview. In: G.R. Lockhead and
J.R. Pomerantz (eds.), The Perception of Structure, pp. 1–20. Washington, DC: American Psychological
Association.
Pomerantz, J.R., Sager, L.C., and Stoever, R.J. (1977). Perception of wholes and of their component
parts: some configural superiority effects. Journal of Experimental Psychology: Human Perception &
Performance 3(3): 422.
Pomerantz, J.R., Pristach, E.A., and Carson, C.E. (1989). Attention and object perception. In: B. Shepp,
and S. Ballesteros (eds.), Object Perception: Structure and Process, pp. 53–89. Hillsdale: Erlbaum.
Qiu, F.T. and von der Heydt, R. (2005). Figure and ground in the visual cortex: V2 combines stereoscopic
cues with Gestalt rules. Neuron 47(1): 155.
Quiroga, R.Q., Kreiman, G., Koch, C., and Fried, I. (2008). Sparse but not ‘grandmother-cell’ coding in the
medial temporal lobe. Trends in Cognitive Science 12: 87–91.
Rao, R.P. and Ballard, D.H. (1999). Predictive coding in the visual cortex: a functional interpretation of
some extra-classical receptive-field effects. Nature Neuroscience 2: 79–87.
Rensink, R.A. and Enns, J.T. (1995). Preemption effects in visual search: Evidence for low-level grouping.
Psychological Review 102: 101–30.
Ringach, D., Hawken, M., and Shapley, R. (1997). The dynamics of orientation tuning in the macaque
monkey striate cortex. Nature 387: 281–4.
Roelfsema, P.R. (2006). Cortical algorithms for perceptual grouping. Annual Review of Neuroscience
29: 203–27.
Roelfsema, P.R., Lamme, V.A., and Spekreijse, H. (1998). Object-based attention in the primary visual
cortex of the macaque monkey. Nature 395(6700): 376–81.
Sergent, J. (1982). The cerebral balance of power: Confrontation or cooperation? Journal of Experimental
Psychology: Human Perception & Performance 8: 253–72.
Skarda, C.A. and Freeman, W.J. (1987). How brains make chaos in order to make sense of the world.
Behavioral and Brain Sciences 10: 161–95.
Stins, J. and van Leeuwen, C. (1993). Context influence on the perception of figures as conditional upon
perceptual organization strategies. Perception & Psychophysics 53: 34–42.
Stroop, J.R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology
18: 643–62.
Sugita, Y. (1999). Grouping of image fragments in primary visual cortex. Nature 401: 269–72.
Tanaka, K., Saito, H., Fukada, Y., and Moriya, M. (1991). Coding visual images of objects in the
inferotemporal cortex of the macaque monkey. Journal of Neurophysiology 66: 170–89.
Treisman, A. and Gelade, G. (1980). A feature integration theory of attention. Cognitive Psychology
12: 97–136.
Treisman, A. and Sato, S. (1990). Conjunction search revisited. Journal of Experimental Psychology: Human
Perception & Performance 16: 459–78.
Tsunoda, K., Yamane, Y., Nishizaki, M., and Tanifuji, M. (2001). Complex objects are represented
in macaque inferotemporal cortex by the combination of feature columns. Nature Neuroscience
4(8): 832–8.
Ungerleider, L.G. and Mishkin, M. (1982). Two cortical visual systems. In: D.J. Ingle, M.A. Goodale, and
R.J.W. Mansfield (eds.), Analysis of Visual Behavior, pp. 549–80. Cambridge: MIT Press.
von der Heydt, R. and Peterhans, E. (1989). Mechanisms of contour perception in monkey visual cortex.
I. Lines of pattern discontinuity. Journal of Neuroscience 9: 1731–48.
van Leeuwen, C. and Bakker, L. (1995). Stroop can occur without Garner interference: Strategic and
mandatory influences in multidimensional stimuli. Perception & Psychophysics 57: 379–92.
van Leeuwen, C., Steyvers, M., and Nooter, M. (1997). Stability and intermittency in large-scale coupled
oscillator models for perceptual segmentation. Journal of Mathematical Psychology 41: 319–44.
Wannig, A., Stanisor, L., and Roelfsema, P.R. (2011). Automatic spread of attentional response
modulation along Gestalt criteria in primary visual cortex. Nature Neuroscience 14: 1243–4.
Wolfe, J.M. and Cave, K.R. (1999). Psychophysical evidence for a binding problem in human vision. Neuron
24: 11–17.
Wolfe, J.M., Cave, K.R., and Franzel, S.L. (1988). Guided search: An alternative to the feature integration
model for visual search. Journal of Experimental Psychology: Human Perception & Performance
15: 419–33.
Yokoi, I. and Komatsu, H. (2009). Relationship between neural responses and visual grouping in the
monkey parietal cortex. Journal of Neuroscience 29: 13210–21.
Young, M.P. and Yamane, S. (1992). Sparse population coding of faces in the inferotemporal cortex. Science
256: 1327–31.
Zipser, K., Lamme, V.A., and Schiller, P.H. (1996). Contextual modulation in primary visual cortex. The
Journal of Neuroscience 16: 7376–89.
Zhou, H., Friedman, H.S., and von der Heydt, R. (2000). Coding of border ownership in monkey visual
cortex. The Journal of Neuroscience 20(17): 6594–611.
Chapter 48
Fig. 48.1 Four occluded figures (right side of each panel) and their possible local, global, and mosaic
interpretations.
Part (a): Adapted from R.J. van Lier, P.A. van der Helm, and E.L.J. Leeuwenberg, Competing global and local
completions in visual occlusion, Journal of Experimental Psychology: Human Perception and Performance, 21(3),
pp. 571–583. http://dx.doi.org/10.1037/0096-1523.21.3.571 (c) 1995, American Psychological Association.
Parts (b)–(d): Reproduced from G. Plomp, C. Nakatani, V. Bonnardel, and C. van Leeuwen, Amodal completion as
reflected by gaze durations, Perception, 33(10), pp. 1185–1200, doi: 10.1068/p5342x Copyright © 2004, Pion.
With kind permission from Pion Ltd, London www.pion.co.uk and www.envplan.com.
In line with the hierarchical account of perception described in the previous chapter, Sekuler and
Palmer (1992) proposed that the mosaic interpretation is actually computed first. In behavioral studies,
priming with short stimulus onset asynchronies (SOAs; the latency between the onset of the prime
and the target stimulus) facilitated the mosaic figure, whereas long SOAs facilitated the occlusion
interpretation. More recent studies of facilitation by the prime, using MEG measurement, showed no
such processing order. Indeed, in the period of 50–300 ms after stimulus onset, priming facilitated
both the mosaic and different occluded interpretations. This effect was found in occipitotemporal areas,
in particular in the right fusiform cortex, which therefore acts as a hub for different occluded-figure
interpretations at this stage of perception (Liu et al. 2006). Thus, for at least this time period, this
part of the visual system keeps active multiple alternative representations of a pattern, including the
Cortical Dynamics and Oscillations 991
mosaic, and thus leaves the choice between several alternative options open. Surrounding context (Bruno
et al. 1997; Dinnerstein and Wertheimer 1957; Rauschenberger et al. 2004) or preceding context,
including primes (Plomp et al. 2006; Plomp and van Leeuwen 2006), can bias the choice between
these interpretations during this interval. Occlusion, therefore, provides a key example of the visual
system keeping multiple representations of the same object active at the same time.
Since the visual system compiles and maintains different representations in parallel, even of the
same pattern, neural networks that allow only one pattern to be processed at a time will not do. Since
each of these representations is determined, to varying extents, by shared information from ‘what’
and ‘where’ visual functions, as well as by episodic and semantic memory, a study of isolated areas,
regions, or activity sources alone will not do either. We need to consider the coexistence of these repre-
sentations, their interaction, and the mechanisms by which these interactions are effectuated.
Fig. 48.2 Adaptive rewiring leads from an initial random network (left) to a modular small-world
structure (right) in small iterative steps. Coupled chaotic oscillators at the nodes synchronize and
desynchronize their activity spontaneously. Over time, pairs of synchronized units that are not connected
receive a connection, and where connected units are not synchronized, connections are removed.
During this process, a modular, small-world structure emerges from an initially random configuration.
Reproduced from Daan van den Berg, Pulin Gong, Michael Breakspear, and Cees van Leeuwen, Fragmentation:
loss of global coherence or breakdown of modularity in functional brain architecture?, Frontiers in Systems
Neuroscience, 6, p. 20, doi: 10.3389/fnsys.2012.00020 (c) 2012, Frontiers Media S.A. This work is licensed under
a Creative Commons Attribution 3.0 License.
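The rewiring rule summarized in Fig. 48.2 can be illustrated with a toy simulation. The sketch below is not the published model of van den Berg et al. (2012); it uses coupled chaotic maps as stand-in oscillators, and the function name and parameter values are illustrative assumptions.

```python
import numpy as np

def adaptive_rewiring(n=50, k=4, steps=2000, eps=0.5, a=1.7, seed=0):
    """Toy adaptive rewiring: chaotic maps coupled on a random graph.
    Each step, one edge of a random pivot node is moved from its least
    synchronized neighbor to its most synchronized non-neighbor."""
    rng = np.random.default_rng(seed)
    A = np.zeros((n, n), dtype=bool)          # symmetric adjacency matrix
    while A.sum() < n * k:                    # ~n*k/2 undirected edges
        i, j = rng.integers(n, size=2)
        if i != j:
            A[i, j] = A[j, i] = True
    x = rng.random(n)                         # oscillator states
    f = lambda s: 1 - a * s**2                # chaotic logistic-type map
    for _ in range(steps):
        # coupled-map update: mix each unit's drive with its neighbors' mean
        deg = np.maximum(A.sum(axis=1), 1)
        x = (1 - eps) * f(x) + eps * (A.astype(float) @ f(x)) / deg
        d = np.abs(x[:, None] - x[None, :])   # state distance ~ desynchrony
        i = rng.integers(n)                   # random pivot node
        nbrs = np.where(A[i])[0]
        others = np.where(~A[i])[0]
        others = others[others != i]
        if len(nbrs) and len(others):
            worst = nbrs[np.argmax(d[i, nbrs])]     # desynchronized edge
            best = others[np.argmin(d[i, others])]  # synchronized non-edge
            A[i, worst] = A[worst, i] = False       # remove the former
            A[i, best] = A[best, i] = True          # connect the latter
    return A
```

Because each step removes one edge and adds one, the number of connections is conserved while the wiring gradually reorganizes toward the clustered, modular structure shown on the right of the figure.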
structure with increasing sparseness means that the network tends, to some degree, to resemble
a random structure (van den Berg et al. 2012). In the real brain, this may have dramatic conse-
quences. Because of the randomness, the system will have difficulty tracing the origin of signals
in the brain, which means that the observer cannot distinguish perception from hallucination. In
random networks, global connections are relatively predominant (Rubinov et al. 2009b; van den
Berg et al. 2012). The consequence is that patients who suffer connectivity loss, e.g. in beginning
schizophrenia, will have difficulty directing their attention towards local structures (Bellgrove
et al. 2003; Coleman et al. 2009).
Sleep deprivation is another way in which excess randomness is introduced to the network. Our
wakeful experiences continually modify brain connectivity, in a manner that can be considered
random as far as large-scale structure is concerned. One of the functions of sleep, therefore, is to
restore the small-world network structure (Koenis et al. 2011). Indeed, whereas REM-sleep
deprivation selectively affects only basic visual discrimination tasks (Karni et al. 1994), general
sleep deprivation (but not, for instance, physical exercise) leads to weakened perceptual organiza-
tion performance on the hidden-figures task (Lybrand et al. 1954). In non-REM sleep, we observe
wave-like activity similar to that of the immature brain, and we may speculate on its role in
restoring the network connectivity structure.
I mentioned the importance of brain connectivity and its pathologies. But structural connec-
tivity is only relevant insofar as it leads to co-activation of brain circuits and regions. Studies using
fMRI have shown large-scale, distributed patterns of spontaneous activity in the brain (Cordes
et al. 2000; Lowe et al. 1998). These patterns reflect brain connectivity structure (Achard et al.
2006; Bassett and Bullmore 2006; Stam 2004). Correlated patterns in spontaneous fMRI activity
predict which brain regions are likely to respond together during a task (Fox and Raichle 2007).
Pre-stimulus activity could therefore be a way of anticipating the incoming sensory information
by dynamically established coordination of active circuits (Hesselmann et al. 2008). These authors
briefly presented Rubin’s ambiguous face/vase stimuli and observed that when pre-stimulus
activity in the fusiform area, a cortical region preferentially responding to faces, was high, observ-
ers were likely to subsequently perceive the stimulus as a face instead of a vase. Correlated activity
in brain circuits and regions should enable transient coalitions of distributed brain regions, which
jointly represent the information available to the system.
It is possible, therefore, to extract a ‘functional network’ from the activity patterns (for reviews,
see Bassett and Bullmore 2006; Bullmore and Sporns 2009). In addition to small worlds, functional
networks extracted from fMRI (Eguiluz et al. 2005) and EEG (Linkenkaer-Hansen et al. 2001 for
amplitude; Gong et al. 2003 for coherence interval durations) have the property of scale invari-
ance. This means that their characteristics are preserved if the measurement scale is increased
or decreased. Scale invariance is a necessary condition for criticality, and hence for dynamically
assembled complexity and long-term memory in brain activity (Linkenkaer-Hansen et al. 2001).
Networks that have both scale invariance and modular small-world properties can arise as a prod-
uct of network rewiring to spontaneous activity, if we assume that new units are recruited at random
into the network (Gong and van Leeuwen 2003). Thus, the properties of functional connectivity
networks may be the product of adaptation of the system to its own spontaneous activity patterns.
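The extraction of a functional network from activity patterns can be sketched minimally as thresholded pairwise correlations between channel time courses. This is an illustrative simplification (function name and threshold are assumptions); the studies cited above use more sophisticated connectivity measures.

```python
import numpy as np

def functional_network(signals, threshold=0.5):
    """Sketch: join every pair of channels/regions whose activity time
    courses correlate above a threshold.  signals: (channels, timepoints)."""
    C = np.corrcoef(signals)          # pairwise Pearson correlations
    A = np.abs(C) > threshold         # binarize into an adjacency matrix
    np.fill_diagonal(A, False)        # drop self-connections
    return A

# toy data: channels 0 and 1 share a common source; channel 2 is independent
rng = np.random.default_rng(1)
base = rng.standard_normal(500)
signals = np.vstack([base + 0.1 * rng.standard_normal(500),
                     base + 0.1 * rng.standard_normal(500),
                     rng.standard_normal(500)])
A = functional_network(signals)
```

In this toy example, only the two channels driven by the shared source end up connected, mirroring how correlated spontaneous activity identifies regions that respond together.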
Oscillatory Activity
Coordination of brain regions across a range of scales should be flexible, in a manner that hard-
wired connectivity alone could not provide. One way in which this could be achieved is through
control of excitability. Simultaneous activity between neurons, or regions, is an effective means of
enhancing signal efficacy (Fries 2005).
Let us therefore consider which properties of brain activity are useful in this respect. Activity that
is bounded and cyclical is called oscillatory or, in the continuous case, wave activity. Periodic
and aperiodic oscillators have a natural tendency to synchronize, either completely (Yamada and
Fujisaka 1983; Pecora and Carroll 1990) or in phase only (Rosenblum et al. 1996).
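This tendency can be illustrated with the textbook Kuramoto model of coupled phase oscillators. This is a generic sketch, not a model used in the studies cited here; the population size, coupling strengths, frequency spread, and step size below are arbitrary choices:

```python
import math
import cmath
import random

def kuramoto(n=50, k=2.0, dt=0.01, steps=2000, seed=1):
    """Simulate n coupled phase oscillators and return the final order
    parameter r = |mean(exp(i*theta))|: r is near 0 for incoherent
    phases and approaches 1 at full phase synchronization."""
    rng = random.Random(seed)
    omega = [rng.gauss(0.0, 0.5) for _ in range(n)]        # natural frequencies
    theta = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    for _ in range(steps):
        z = sum(cmath.exp(1j * t) for t in theta) / n      # mean field
        r, psi = abs(z), cmath.phase(z)
        # Euler step: each phase is pulled toward the mean-field phase
        # with a force proportional to the coupling k and coherence r.
        theta = [t + dt * (omega[i] + k * r * math.sin(psi - t))
                 for i, t in enumerate(theta)]
    z = sum(cmath.exp(1j * t) for t in theta) / n
    return abs(z)

# Strong coupling drives the population toward phase synchrony;
# weak coupling leaves it incoherent.
print(kuramoto(k=4.0), kuramoto(k=0.1))
```

With coupling well above the critical value the oscillators phase-lock despite their different natural frequencies, which is the sense in which synchronization is a 'natural tendency' of coupled oscillators.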
In 1929, Hans Berger first reported the oscillatory properties of the EEG. Tallon-Baudry and
Bertrand (1999) argued that synchrony is always the result of a mixture of internal states and
external events. The effects of spontaneous activity on perception can be explained by the fact that
it continues during task performance: evoked activity shows a similar neuroanatomical distribution
to that observed at rest (Arieli et al. 1996). This property of brain activity may have been
recruited for coordinating activity, and for enabling multiple patterns of activity simultaneously
(evidence reviewed in Thut et al. 2012). According to an influential point of view, synchronization
of oscillatory activity binds together distributed representations (Milner 1974; von der Malsburg
1985). Unlike in classical neural networks, synchronous oscillations allow multiple distributed
patterns to be processed in parallel, as they can be separated in phase.
Episodes of oscillatory brain activity are typically decomposed into an array of band-passed signals.
We distinguish delta, theta, alpha, beta and gamma frequency bands. Distinct cognitive and perceptual
functions have traditionally been associated with each of these bands. EEG and MEG signals provide
us with a picture of how phase and amplitude evolve over time within bands at different locations
of the scalp. We can study couplings between amplitudes and/or phases at different locations within
frequency bands or between phases and amplitudes of different frequency bands. This includes, for
instance, the coupling of phase (phase synchrony) at two different locations at the scalp or the coupling
between theta phase and gamma amplitude at a certain location (phase-amplitude coupling).
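As an illustration, the last-mentioned phase-amplitude coupling can be quantified with a mean-vector-length index in the style of Canolty et al. (2006). The sketch below applies it to a synthetic signal; the frequencies, sampling rate, and modulation depth are arbitrary, and a real analysis would first extract phase and amplitude by band-pass filtering and a Hilbert transform:

```python
import math
import cmath

def mean_vector_length(phase, amp):
    """Phase-amplitude coupling index: |mean(A(t) * exp(i*phi(t)))|
    is near zero when amplitude is unrelated to phase, and grows when
    amplitude systematically peaks at a preferred phase."""
    z = sum(a * cmath.exp(1j * p) for p, a in zip(phase, amp))
    return abs(z / len(phase))

# Synthetic example: a 6 Hz 'theta' phase; 'gamma' amplitude either
# locked to that phase (coupled) or constant (uncoupled).
fs, duration = 500, 4.0
times = [i / fs for i in range(int(fs * duration))]
theta_phase = [(2 * math.pi * 6 * t) % (2 * math.pi) for t in times]
coupled_amp = [1.0 + 0.8 * math.cos(p) for p in theta_phase]
flat_amp = [1.0] * len(times)

print(mean_vector_length(theta_phase, coupled_amp))  # clearly above zero
print(mean_vector_length(theta_phase, flat_amp))     # near zero
```

The index is large only in the coupled case, where gamma amplitude rides on a preferred theta phase.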
Alpha Activity
Generally, large-scale wave patterns in activity, below eight Hz, are uncommon in healthy adults
when awake. Without stimulation and when the observer is relaxed spontaneous activity is
994 van Leeuwen
dominated by eight to twelve Hz, i.e. alpha activity. Alpha activity is a ‘far from unitary phenome-
non’ (Foxe and Snyder 2011, p. 10). It arises from cortico-thalamic or cortico-cortical loops. Alpha
frequency increases during execution of difficult tasks compared with simpler ones (complex
addition and mental rotation vs. simple addition and visual imagery). The increase is largest in
the hemisphere that is dominant for the task, i.e. arithmetical tasks for the left, and visuo-spatial
tasks for the right hemisphere (Osaka 1984). Peak alpha frequency correlates positively with
specific verbal and non-verbal abilities (Anokhin and Vogel 1996; Jausovec and Jausovec 2000;
Shaw 2004) and memory performance (Klimesch et al. 1990), and is a reliable individual characteristic.
In perceptual organization, the peak alpha frequency has implications for whether a perceiver is
likely to integrate a target with its surrounding context (i.e. field dependence) or to perceive it
as isolated from that context (field independence); see van Leeuwen and Smit (2012). This individual difference
has consequences for whether a pattern is perceived as a consistent whole, or as a loose collection
of object features. According to some authors (Peterson and Hochberg 1983; Peterson and Gibson
1991), objects are predominantly perceived in a ‘piecemeal fashion’. That is, they are seen as a loose
collection of features. This, however, may be a consequence of presenting objects in isolation.
When objects are seen in a surrounding context, they tend overall to be seen as integral wholes.
However, this happens to different degrees, depending on the perceiver's peak alpha frequency.
Alpha activity, thus, is an important modulator of whether perception is predominantly local or
global.
This observation is in accordance with the understanding that alpha activity is involved in sup-
pressing neurons responsible for processing stimuli outside of the focus of attention (Lopes da
Silva 1991). Alpha oscillations impose a rhythmic ‘pulsed inhibition’ (Mathewson et al. 2011)
on attentional processes. In the previous chapter, we have seen that attention spreads over
time (e.g. Roelfsema 2006). When this spreading is periodically inhibited at a relatively fast
rate, perceptual integration will remain within a restricted region.
Presentation of a stimulus affects the ongoing alpha EEG/MEG. This effect takes the form of an
event-related amplitude decrease (called event-related desynchronization or ERD, based on the
assumption that amplitude is the result of large numbers of neurons firing in unison) and
subsequent resynchronization (ERS). A visual input results in the desynchronization of occipital alpha
rhythms (Pfurtscheller and Lopes da Silva 1999). The alpha ERD can be understood as a sign that
the area is engaged in processing.
Besides periods where wave activity dominates the brain, there are episodes where the activity appears
more disorganized.
The alternation of irregular and regular episodes is a fundamental property of brain activity
(Gong et al. 2007; Kitzbichler et al. 2009). These episodes emerge, hold, and dissipate across a
range of temporal scales (Freeman and Baird 1987; Friston 2000; Gong et al. 2003; Leopold and
Logothetis 2003). Ito et al. (2007) characterized the short- and long-term behavior of these
patterns. The system had a tendency to dwell in certain patterns, or to return to patterns visited
earlier, within hundreds of milliseconds up to a time scale of several to ten seconds. The
transitions were irregular in the short term but showed systematic preferences in the long-term
dynamics. This kind of
wandering behavior is called chaotic itinerancy (Kaneko and Tsuda 2001). Chaotic itinerancy is a
mechanism that enables a system to visit a broad variety of synchronized states, and to dwell near
them without becoming trapped in any of them. Chaotic itinerancy offers a theoretical basis for
the transient character of brain dynamics and suggests the flexibility that is essential for effective
brain functioning. Thus, the dynamical properties of spontaneous activity provide the brain with
flexibility: an openness to respond to a great variety of stimuli.
This kind of dynamics may play a role in perceptual organization. First, perceptual organization
is a process that needs to be achieved rapidly; too much stability of any preceding state will
hamper that. Second, dynamic flexibility is needed in order not to settle on a given
interpretation. We can observe spontaneous changes of interpretation in ambiguous figures, such
as the Necker cube. The same mechanism may be at work, when it comes to detecting a hidden
perceptual structure. This will never work if the system settles on a given interpretation of an
object and stays there, until perturbed by new incoming stimulation. Some spontaneous wander-
ing should characterize perceptual organization.
on distal dendrites of pyramidal neurons (Markram et al. 2004). Beta oscillations may therefore
facilitate information transfer between areas (Livanov 1977). Wrobel (2000) showed in cats that
during attentive visual behavior, 300 ms to one second long bursts of beta frequency activity oper-
ate within the cortico-geniculate feedback cycle to enhance visual information transmission from
the LGN. Beta bursts spread to other visual centers, including the lateral posterior and pulvinar
complex and higher cortical areas. These bursts coincide in time with gamma oscillations.
Accordingly, the model of Vierling-Claassen et al. (2010) produced substantial gamma along with the
beta activity. Across various cognitive tasks, beta and gamma power show similar scalp distribu-
tions (Fitzgibbon et al. 2004). According to Siegel et al. (2012), whereas gamma activity reflects
the emergence of a percept, it is likely that beta oscillations reflect maintenance of perceptual
information. Combined with the previous observations about the role of beta in transmission of
information, this implies that maintenance of visual stimuli occurs through interactions between
areas (Simione et al. 2012).
Gross et al. (2004), using MEG, demonstrated a role for beta oscillations in maintenance of
information under attentional blink conditions. The attentional blink involves the presentation of
several visual stimuli in rapid succession (at a rate of approximately one item per 100 ms); two
targets are embedded in
the presentation sequence. Whereas the first one is usually detected easily, the second one is often
missed, in particular if the temporal separation (lag) equals 300 ms. Gross et al. (2004) showed
that detection in these conditions was accompanied by enhanced beta coherence between sources
in temporal cortex, DLPFC, and PPC. In the same task, Nakatani et al. (2005) demonstrated the
role of gamma synchrony prior to the onset of the target, which was increased when the target
was successfully detected, as compared to when the target was missed. Taken together, the results
of Gross et al. and Nakatani et al. support the proposal of Siegel et al. (2012) about the
complementary roles of the beta and gamma frequencies.
Synchrony in the gamma band, therefore, may be related to the emergence of the percept rather
than to its maintenance. Nakatani and van Leeuwen (2006) studied the relationship between
long-distance transient phase synchronization in EEG and perceptual switching in the Necker
cube. Transient periods of response-related synchrony between parietal and frontal areas were
observed. These periods start 800 to 600 ms prior to the switch response and are sometimes accompanied by
transient alpha band activity in the occipital area. The results indicate that perceptual switching
processes involve parietal and frontal areas; these are the ones that are normally associated with
visual attention and decision-making.
information is propagated to other brain areas. The differences in the time it takes for such
information to reach its multiple destinations are accommodated by keeping the window open for a
while, e.g. up to 200 ms (van Wassenhove et al. 2007).
The regular episodes thus provide a mechanism for global broadcasting of results in informa-
tion processing that are needed for conscious access to visual information (Baars 1988, 2002). In
the previous chapter, we have seen how, traditionally, conscious access is centered upon
convergence zones: areas where the information from many regions comes together. Rather than
convergence, we see these areas as hubs, or relay stations, in the communication between brain regions,
based on principles of synchrony. As a result, conscious access functions belong to organized
brain activity, rather than specific local regions. The activity is not tied to any region in particular,
as it travels along the cortex; it may, however, visit the hub regions more consistently than others
(see Alexander et al. 2013).
During these intervals, the informational content remains unchanged. As a result, the con-
tent of perceptual experience is fixed in an extended psychological present (cf. Stroud 1955).
The duration of coherence intervals was estimated at 50–300 ms (Bressler et al. 1993; Dennett
and Kinsbourne 1991; Varela 1995). In the rest condition the durations of the patterns have a
power-law distribution (Gong et al. 2003; Kitzbichler et al. 2009) which indicates that the system
is in a state of dynamical criticality (Kitzbichler et al. 2009). When the system is perturbed by
a stimulus, the scale-free distribution is suppressed and changes into a characteristic distribu-
tion (Nikolaev et al. 2010; Nikolaev et al. 2005). The new distribution often turns out to be an
extreme-value distribution (Nikolaev et al. 2010). Indeed, the interval reflects the propagation
of information, which takes place in parallel across multiple channels. The extreme-value
distribution of these intervals then means that the length of the interval is determined by the slowest
channel (cf. Pöppel 1970). Since the slowest channel determines the durations of episodes of syn-
chronous activity, their averages may reflect information-processing demands of the task at hand.
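The slowest-channel argument can be made concrete with a toy simulation. The exponential channel-time distribution, mean channel time, and channel counts below are illustrative assumptions, not part of the cited analyses:

```python
import random
import statistics

def interval_duration(n_channels, rng, mean_ms=50.0):
    """A coherence interval ends only once the slowest of n parallel
    channels has delivered its information: its duration is the
    maximum of the individual channel times."""
    return max(rng.expovariate(1.0 / mean_ms) for _ in range(n_channels))

rng = random.Random(0)
few = [interval_duration(2, rng) for _ in range(5000)]
many = [interval_duration(16, rng) for _ in range(5000)]

# Maxima over parallel channels follow an extreme-value (Gumbel-type)
# distribution; recruiting more channels lengthens the typical interval,
# as the text predicts for higher information-processing demands.
print(statistics.mean(few), statistics.mean(many))
```

With exponential channel times the expected maximum grows with the number of channels, so average interval duration tracks how much parallel processing the task demands.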
We tested this prediction by studying the patterns of quasi-stable synchrony over small regions
on the human scalp with an electrode spacing of two cm (Nikolaev et al. 2005). We selected
electrode chains over the scalp region with maximal ERP activity following presentation of the
stimuli. To obtain the intervals of quasi-stable synchrony we measured the variability of phase
synchronization indices within electrode chains. Then the duration of the intervals in which the
variability fell below the threshold was computed. The comparison of durations showed that in the
beta EEG frequency range the intervals were longer when observers were engaged in a perceptual
task than when they were stimulated without a task. This result was interpreted as evidence that
more information was transferred across brain areas in ‘task’ than ‘no-task’ conditions.
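The interval-extraction step can be sketched as follows. This is a simplified stand-in for the procedure of Nikolaev et al. (2005); the sliding-window variability measure, window length, threshold, and toy data are all illustrative assumptions:

```python
import random
import statistics

def stable_intervals(sync_index, threshold, win=5):
    """Durations (in samples) of runs in which the local variability of
    a phase-synchronization index stays below threshold. Variability is
    taken as the standard deviation over a sliding window of win samples."""
    variability = [statistics.pstdev(sync_index[i:i + win])
                   for i in range(len(sync_index) - win + 1)]
    durations, run = [], 0
    for v in variability:
        if v < threshold:
            run += 1            # still inside a quasi-stable episode
        elif run:
            durations.append(run)
            run = 0
    if run:
        durations.append(run)
    return durations

# Toy index: 40 samples of quasi-stable synchrony, then 40 noisy samples.
rng = random.Random(3)
stable_part = [0.8 + rng.gauss(0.0, 0.005) for _ in range(40)]
noisy_part = [rng.uniform(0.0, 1.0) for _ in range(40)]
print(stable_intervals(stable_part + noisy_part, threshold=0.05))
```

The quasi-stable stretch shows up as one long interval, whereas the noisy stretch contributes at most short ones; comparing such duration distributions between conditions is the logic of the analysis described above.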
durations of synchronized intervals in relation to the aspect ratio of the dot lattice. We found a
simple, linear relation of aspect ratio with coherence interval duration. This means that the more
information contained in the stimulus, the longer the coherence intervals in the evoked activity.
In individuals, the duration of the coherence intervals was found to be strongly correlated to
grouping sensitivity. Thus, coherence intervals directly reflect the amount of stimulus information
processed, rather than the amount available in the physical stimulus.
We concluded that the intervals of synchronized activity may reflect the time needed for prom-
ulgation of the stimulus information from the visual system to the rest of the brain. The coherence
intervals, thus, represent global broadcasting of visual information. Global broadcasting has been
associated with visual conscious awareness and the emergence of visual experience (Dehaene
et al. 2006).
Global broadcasting takes center stage in global workspace theories and models of visual
information processing. These models are increasingly successful in dealing with a wide range of
phenomena in visual experience, such as the limited capacity of visual working memory, visual
persistence, and the attentional blink (e.g. Simione et al. 2012). Large-scale dynamics provides a
mechanism for coordinating the information processing which endows these models with greater
neural plausibility.
Delta band activity matches the frequency of the P3 ERP component, which has been taken to signal
the emergence of global workspace activity (Sergent et al. 2005). Delta activity is observed as a
solitary, high-amplitude brain wave with an oscillation frequency between zero and four hertz. Delta phase
has been related to top-down modulation of sensory signal strength (Lakatos et al. 2005, 2009).
Nakatani et al. (in press) found coupling between the phase of access control-related slow
oscillatory activity and the amplitude of fast oscillations encoding perceptual contents for
conscious access in a cognitive task. This coupling increased in strength during practice of the
task, corresponding with an increase in correct target recognition under attentional blink (AB)
conditions.
Acknowledgments
The author is supported by an Odysseus research grant from the Flemish Organization for Science
(FWO) and wishes to thank Lee de-Wit, Michael Herzog, and Naoki Kogo for useful comments.
References
Achard, S., Salvador, R., Whitcher, B., Suckling, J., and Bullmore, E. (2006). A resilient, low-frequency,
small-world human brain functional network with highly connected association cortical hubs. Journal
of Neuroscience 26: 63–72.
Alexander, D.A., Jurica, P., Trengove, C., Nikolaev, A.R., Gepshtein, S., Zviagyntsev, M., Mathiak, K.,
Schulze-Bonhage, A., Rüscher, J., Ball, T., and van Leeuwen, C. (2013). Traveling waves and trial
averaging: the nature of single-trial and averaged brain responses in large-scale cortical signals. NeuroImage
73: 95–112.
Anokhin, A.P. and Vogel, F. (1996). EEG alpha rhythm frequency and intelligence in normal adults.
Intelligence 23: 1–14.
Arieli, A., Sterkin, A., Grinvald, A., and Aertsen, A. (1996). Dynamics of ongoing activity: explanation of
the large variability in evoked cortical responses. Science 273(5283): 1868–71.
Baars, B. J. (1988). A Cognitive Theory of Consciousness. Cambridge: Cambridge University Press.
Baars, B. J. (2002). The conscious access hypothesis: origins and recent evidence. Trends in Cognitive
Sciences 6: 47–52.
Baars, B. J. and Franklin, S. (2003). How conscious experience and working memory interact. Trends in
Cognitive Sciences 7: 166–72.
Bassett, D. and Bullmore, E. (2006). Small-world brain networks. Neuroscientist 12: 512–23.
Bellgrove, M.A., Vance, A., and Bradshaw, J.L. (2003). Local-global processing in early-onset
schizophrenia: evidence for an impairment in shifting the spatial scale of attention. Brain and Cognition
51: 48–65.
Block, N. (2001). Paradox and cross purposes in recent work on consciousness. Cognition 79(1–2): 197–219.
Bressler, S. L. and Menon, V. (2010). Large-scale brain networks in cognition: emerging methods and
principles. Trends in Cognitive Sciences 14: 277–90.
Bressler, S. L., Coppola, R., and Nakamura, R. (1993). Episodic multiregional cortical coherence at
multiple frequencies during visual task performance. Nature 366(6451): 153–6.
Bruno, N., Bertamini, M., and Domini, F. (1997). Amodal completion of partly occluded surfaces: is there
a mosaic stage? Journal of Experimental Psychology: Human Perception & Performance 23: 1412–26.
Buffart, H., Leeuwenberg, E., and Restle, F. (1983). Analysis of ambiguity in visual pattern completion.
Journal of Experimental Psychology: Human Perception & Performance 9: 980–1000.
Bullmore, E. and Sporns, O. (2009). Complex brain networks: graph theoretical analysis of structural and
functional systems. Nature Reviews Neuroscience 10: 186–98.
Buzsaki, G. and Draguhn, A. (2004). Neuronal oscillations in cortical networks. Science 304 (5679):
1926–9.
Canolty, R. T., Edwards, E., Dalal, S. S., Soltani, M., Nagarajan, S. S., Kirsch, H. E., Berger, M. S., Barbaro,
N. M., and Knight, R. T. (2006) High gamma power is phase-locked to theta oscillations in human
neocortex. Science 313: 1626–8.
Coleman, M.J., Cestnick, L., Krastoshevsky, O., Krause, V., Huang, Z., Mendell, N.R., and Levy, D.L.
(2009). Schizophrenia patients show deficits in shifts of attention to different levels of
global-local stimuli: evidence for magnocellular dysfunction. Schizophrenia Bulletin
35: 1108–16.
Cordes, D., Haughton, V. M., Arfanakis, K., Wendt, G. J., Turski, P. A., Moritz, C. H., . . . Meyerand, M. E.
(2000). Mapping functionally related regions of brain with functional connectivity MR imaging. American
Journal of Neuroradiology 21(9): 1636–44.
Dehaene, S., Kerszberg, M., and Changeux, J. P. (1998). A neuronal model of a global workspace in
effortful cognitive tasks. Proceedings of the National Academy of Science, USA 95(24): 14529–34.
Dehaene, S., Changeux, J. P., Naccache, L., Sackur, J., and Sergent, C. (2006). Conscious, preconscious, and
subliminal processing: a testable taxonomy. Trends in Cognitive Sciences 10(5): 204–11.
Dennett, D. and Kinsbourne, M. (1991). Time and the observer: the where and when of time in the brain.
Behavioral and Brain Sciences 15: 183–247.
Dinnerstein, D. and Wertheimer, M. (1957). Some determinants of phenomenal overlapping. American
Journal of Psychology 70: 21–37.
Eckhorn, R., Bauer, R., Jordan, W., Brosch, M., Kruse, W., Munk, M., and Reitboeck, H. J. (1988).
Coherent oscillations: A mechanism of feature linking in the visual cortex. Biological Cybernetics
60: 121–30.
Eguiluz, V. M., Chialvo, D. R., Cecchi, G. A., Baliki, M., and Apkarian, A. V. (2005). Scale-free brain
functional networks. Physical Review Letters 94: 18102.
Engel, A.K. and Singer, W. (2001). Temporal binding and the neural correlates of sensory awareness. Trends
in Cognitive Sciences 5: 16–25.
Fell, J., Fernandez, G., Klaver, P., Elger, C. E., and Fries, P. (2003). Is synchronized neuronal gamma
activity relevant for selective attention? Brain Research: Brain Research Review 42: 265–72.
Fitzgibbon, S. P., Pope, K. J., Mackenzie, L., Clark, C. R., and Willoughby, J. O. (2004). Cognitive tasks
augment gamma EEG power. Clinical Neurophysiology 115: 1802–9.
Fox, M. D. and Raichle, M. E. (2007). Spontaneous fluctuations in brain activity observed with functional
magnetic resonance imaging. Nature Reviews Neuroscience 8(9): 700–11.
Foxe, J.J. and Snyder, A.C. (2011). The role of alpha-band brain oscillations as a sensory suppression
mechanism during selective attention. Frontiers in Psychology 2: art. 154.
Freeman, W. J. (2005). Origin, structure, and role of background EEG activity. Part 3. Neural frame
classification. Clinical Neurophysiology 116(5): 1118–29.
Freeman, W. J. and Baird, B. (1987). Relation of olfactory EEG to behavior: spatial analysis. Behavioral
Neuroscience 101(3): 393–408.
Freeman, W. J., Burke, B. C., and Holmes, M. D. (2003). Aperiodic phase re-setting in scalp EEG
of beta-gamma oscillations by state transitions at alpha-theta rates. Human Brain Mapping
19(4): 248–72.
Fries, P. (2005). A mechanism for cognitive dynamics: neuronal communication through neuronal
coherence. Trends in Cognitive Sciences 9: 474–80.
Friston, K. J. (2000). The labile brain. I. Neuronal transients and nonlinear coupling. Philosophical
Transactions of the Royal Society of London. Series B: Biological Sciences 355(1394): 215–36.
Gaillard, R., Dehaene, S., Adam, C., Clemenceau, S., Hasboun, D., Baulac, M., et al. (2009). Converging
intracranial markers of conscious access. PLoS Biology 7: e61.
Gong, P. and van Leeuwen, C. (2003). Emergence of scale-free network with chaotic units. Physica A,
Statistical mechanics and its applications 321: 679–88.
Gong, P. and van Leeuwen, C. (2004). Evolution to a small-world network with chaotic units. Europhysics
Letters 67: 328–33.
Gong, P., Nikolaev, A. R., and van Leeuwen, C. (2003). Scale-invariant fluctuations of the dynamical
synchronization in human brain electrical activity. Neuroscience Letters 336: 33–6.
Gong, P., Nikolaev, A.R., and van Leeuwen, C. (2007). Dynamics of collective phase synchronization in
human electrocortical activity. Physical Review E 76: art. 011904.
Gray, C. M., König, P., Engel, A. K., and Singer, W. (1989). Oscillatory responses in cat visual cortex exhibit
intercolumnar synchronization which reflects global stimulus properties. Nature 338: 334–7.
Gross, J., Schmitz, F., Schnitzler, I., Kessler, K., Shapiro, K., Hommel, B., and Schnitzler, A. (2004).
Modulation of long-range neuronal synchrony reflects temporal limitations of visual attention in
humans. Proceedings of the National Academy of Science, USA 101: 13050–5.
He, Y., Chen, Z.J., and Evans, A.C. (2007). Small-world anatomical networks in the human brain revealed by
cortical thickness from MRI. Cerebral Cortex 17: 2407–19.
Hernández, A., Nácher, V., Luna, R., Zainos, A., Lemus, L., et al. (2010). Decoding a perceptual decision
process across cortex. Neuron 66: 300–14.
Hesselmann, G., Kell, C.A., Eger, E., and Kleinschmidt, A. (2008). Spontaneous local variations in ongoing
neural activity bias perceptual decisions. Proceedings of the National Academy of Sciences, USA
105: 10984–9.
Hulleman, J. and Humphreys, G.W. (2004). A new cue to figure-ground coding: top-bottom polarity.
Vision Research 44: 2779–91.
Ito, J., Nikolaev, A. R., and van Leeuwen, C. (2005). Spatial and temporal structure of phase synchronization of
spontaneous alpha EEG activity. Biological Cybernetics 92(1): 54–60.
Ito, J., Nikolaev, A. R., and van Leeuwen, C. (2007). Dynamics of spontaneous transitions between global
brain states. Human Brain Mapping 28(9): 904–13.
Iturria-Medina, Y., Canales-Rodríguez, E.J., Melie-García, L., Valdés-Hernández, P.A., Martínez-Montes, E.,
Alemán-Gómez, Y., and Sánchez-Bornot, J.M. (2007). Characterizing brain anatomical connections using
diffusion weighted MRI and graph theory. NeuroImage 36: 645–60.
Jaušovec, N. and Jaušovec, K. (2000). Correlations between ERP parameters and intelligence: A reconsideration.
Biological Psychology 55: 137–54.
Kaneko, K. and Tsuda, I. (2001). Complex systems: chaos and beyond. A constructive approach with
applications in life sciences. Berlin: Springer Verlag.
Karni, A., Tanne, D., Rubenstein, B.S., Askenasy, J.J.M., and Sagi, D. (1994). Dependence on REM sleep of
overnight improvement of a perceptual skill. Science 265: 679–82.
Kitzbichler, M. G., Smith, M. L., Christensen, S. R., and Bullmore, E. (2009). Broadband criticality of
human brain network synchronization. PLoS computational biology 5(3): e1000314.
Klimesch, W., Schimke, H., Ladurner, G., and Pfurtscheller, G. (1990). Alpha frequency and memory
performance. Journal of Psychophysiology 4: 381–90.
Koenis, M. M.G., Romeijn, N., Piantoni, G., Verweij, I., Van der Werf, Y. D., Van Someren, E. J.W., and
Stam, C. J. (2011). Does sleep restore the topology of functional brain networks? Human Brain Mapping
doi: 10.1002/hbm.21455.
Konen, Ch. and Kastner, S. (2008). Two hierarchically organized neural systems for object information in
human visual cortex. Nature Neuroscience 11: 224–31.
Kranczioch, C., Debener, S., Maye, A., and Engel, A. K. (2007). Temporal dynamics of access to
consciousness in the attentional blink. Neuroimage 37: 947–55.
Kubovy, M., Holcombe, A. O., and Wagemans, J. (1998). On the lawfulness of grouping by proximity.
Cognitive Psychology 35(1): 71–98.
Kwok, H.F., Jurica, P., Raffone, A., and van Leeuwen, C. (2007). Robust emergence of small-world structure
in networks of spiking neurons. Cognitive Neurodynamics 1: 39–51.
Lakatos, P., Shah, A. S., Knuth, K. H., Ulbert, I., Karmos, G., and Schroeder, C. E. (2005). An oscillatory
hierarchy controlling neuronal excitability and stimulus processing in the auditory cortex. Journal of
Neurophysiology 94: 1904–11.
Lakatos, P., O’Connell, M. N., Barczak, A., Mills, A., Javitt, D. C., and Schroeder, C. E. (2009). The leading
sense: supramodal control of neurophysiological context by attention. Neuron 64: 419–30.
Latora, V. and Marchiori, M. (2001). Efficient behavior of small-world networks. Physical Review Letters
87: 198701.
Lee, K. H., Williams, L. M., Breakspear, M., and Gordon, E. (2003). Synchronous gamma activity:
A review and contribution to an integrative neuroscience model of schizophrenia. Brain Research: Brain
Research Review 41: 57–78.
Lehmann, D., Ozaki, H., and Pal, I. (1987). EEG alpha map series: brain micro-states by space-oriented
adaptive segmentation. Electroencephalography and Clinical Neurophysiology 67(3): 271–88.
Leopold, D. A. and Logothetis, N. K. (2003). Spatial patterns of spontaneous local field activity in the
monkey visual cortex. Reviews in the Neurosciences 14(1–2): 195–205.
Linkenkaer-Hansen, K., Nikouline, V.V., Palva, J.M. and Ilmoniemi, R.J. (2001). Long-range temporal
correlations and scaling behavior in human brain oscillations. Journal of Neuroscience 21: 1370–7.
Lisman, J. E. and Idiart, M. A. (1995). Storage of 7 +/- 2 short-term memories in oscillatory subcycles.
Science 267: 1512–15.
Liu, L. and Ioannides, A.A.(1996). A correlation study of averaged and single trial MEG signals: the average
describes multiple histories each in a different set of single trials. Brain Topography 8: 385–96.
Liu, L., Plomp, G., van Leeuwen, C., and Ioannides, A.A. (2006). Neural correlates of priming on occluded
figure interpretation in human fusiform cortex. Neuroscience 141: 1585–97.
Livanov, M. N. (1977). Spatial Organization of Cerebral Processes. New York: John Wiley and Sons.
Lowe, M. J., Mock, B. J., and Sorenson, J. A. (1998). Functional connectivity in single and multislice
echoplanar imaging using resting-state fluctuations. Neuroimage 7(2): 119–32.
Lopes da Silva, F.H. (1991). Neural mechanisms underlying brain waves: from neural membranes to
networks. Electroencephalography and Clinical Neurophysiology 79: 81–93.
Lopes da Silva, F.H., van Rotterdam, A., Storm van Leeuwen, W., and Tielen, A.M. (1970). Dynamic
characteristics of visual evoked potentials in the dog. II. Beta frequency selectivity in evoked potentials
and background activity. Electroencephalography and Clinical Neurophysiology 29: 260–8.
Luck, S. J. and Vogel, E. K. (1997). The capacity of visual working memory for features and conjunctions.
Nature 390: 279–81.
Lutz, A., Lachaux, J. P., Martinerie, J., and Varela, F. J. (2002). Guiding the study of brain dynamics by
using first-person data: Synchrony patterns correlate with ongoing conscious states during a simple
visual task. Proceedings of the National Academy of Sciences, USA 99: 1586–91.
Lybrand, W.A., Andrews, T.G., and Ross, S. (1954). Systemic fatigue and perceptual organization. The
American Journal of Psychology 67: 704–7.
Maia, T.V. and Cleeremans, A. (2005). Consciousness: Converging insights from connectionist modeling
and neuroscience. Trends in Cognitive Sciences 9: 397–404.
Makeig, S., Westerfield, M., Jung, T. P., Enghoff, S., Townsend, J., Courchesne, E., and Sejnowski, T. J.
(2002). Dynamic brain sources of visual evoked responses. Science 295(5555): 690–4.
Markram, H., Toledo-Rodriguez, M., Wang, Y., Gupta, A., Silberberg, G., and Wu, C. (2004).
Interneurons of the neocortical inhibitory system. Nature Reviews Neuroscience 5: 793–807.
Mathewson, K.E., Lleras, A., Beck, D.M., Fabiani, M., Ro, T., and Gratton, G. (2011). Pulsed out of
awareness: EEG alpha oscillations represent a pulsed-inhibition of ongoing cortical processing. Frontiers
in Psychology 2: 99. doi: 10.3389/fpsyg.2011.00099.
Milner, P. M. (1974). A model for visual shape recognition. Psychological Review 81: 521–35.
Murthy, V.N. and Fetz, E.E. (1992). Coherent 25–35 Hz oscillations in the sensorimotor cortex of awake
behaving monkeys. Proceedings of the National Academy of Science, USA, 89: 5670–4.
Nakatani, C., Ito, J., Nikolaev, A. R., Gong, P., and van Leeuwen, C. (2005). Phase synchronization analysis
of EEG during attentional blink. Journal of Cognitive Neuroscience 12: 343–54.
Nakatani, C., Raffone, A., and van Leeuwen, C. (in press). Increased efficiency of conscious access with
enhanced coupling of slow and fast neural oscillations. Journal of Cognitive Neuroscience.
Nakatani, H., Khalilov, I., Gong, P., and van Leeuwen, C. (2003). Nonlinearity in giant depolarizing
potentials. Physics Letters A, 319: 167–72.
Nakatani, H. and van Leeuwen, C. (2006). Transient synchrony of distant brain areas and perceptual
switching. Biological Cybernetics 94: 445–57.
Nikolaev, A. R., Gepshtein, S., Kubovy, M., and van Leeuwen, C. (2008). Dissociation of early evoked
cortical activity in perceptual grouping. Experimental Brain Research 186(1): 107–22.
Nikolaev, A. R., Gepshtein, S., Gong, P., and van Leeuwen, C. (2010). Duration of coherence intervals in
electrical brain activity in perceptual organization. Cerebral Cortex 20(2): 365–82.
Nikolaev, A. R., Gong, P., and van Leeuwen, C. (2005). Evoked phase synchronization between adjacent
high-density electrodes in human scalp EEG: duration and time course related to behavior. Clinical
Neurophysiology 116(10): 2403–19.
Osaka, M. (1984). Peak alpha frequency of EEG during a mental task: task difficulty and hemispheric
differences. Psychophysiology 21(1): 101–5.
Pecora, L.M. and Carroll, T. L. (1990). Synchronization in chaotic systems. Physical Review Letters 64: 821–4.
Peterson, M.A. and Gibson, B. S. (1991). Directing spatial attention within an object: Altering the
functional equivalence of shape description. Journal of Experimental Psychology: Human Perception and
Performance, 17: 170–82.
Peterson, M. A. and Hochberg, J. (1983), Opposed-Set Measurement Procedure: A Quantitative Analysis
of the Role of Local Cues and Intention in Form Perception. Journal of Experimental Psychology: Human
Perception and Performance 9: 183–93.
Peterson, M. A. and Skow-Grant, E. (2003). Memory and learning in figure-ground perception. In: B. Ross
and D. Irwin (eds.), Cognitive Vision: Psychology of Learning and Motivation, Vol. 42, pp. 1–34. San
Diego: Academic Press.
Cortical Dynamics and Oscillations 1005
Steriade, M. (2001). Impact of network activities on neuronal properties in corticothalamic systems. Journal
of Neurophysiology 86(1): 1–39.
Stroud, J. M. (1955). The fine structure of psychological time. In: H. Quasten (ed.), Information Theory in
Psychology, pp. 174–207. Glencoe, Illinois: Free Press.
Supp, G. G., Schlogl, A., Fiebach, C. J., Gunter, T. C., Vigliocco, G., Pfurtscheller, G., and Petsche, H.
(2005). Semantic memory retrieval: cortical couplings in object recognition in the N400 window.
European Journal of Neuroscience 21: 1139–43.
Tallon-Baudry, C. and Bertrand, O. (1999). Oscillatory gamma activity in humans and its role in object
representation. Trends in Cognitive Sciences 3: 151–62.
Tallon-Baudry, C., Bertrand, O., Delpuech, C., and Permier, J. (1997). Oscillatory g-band (30–70 Hz)
activity induced by a visual search task in humans. Journal of Neuroscience 17: 722–34.
Tallon-Baudry, C., Bertrand, O., Peronnet, F., and Pernier, J. (1998). Induced gamma-band activity during
the delay of a visual short-term memory task in humans. Journal of Neuroscience 18: 4244–54.
Tallon-Baudry, C., Bertrand, O., and Fischer, C. (2001). Oscillatory synchrony between human extrastriate
areas during visual short-term memory maintenance. Journal of Neuroscience 21: RC177.
Tononi, G. and Edelman, G. M. (1998). Consciousness and complexity. Science 282: 1846–51.
Thut, G., Miniussi, C., and Gross, J. (2012). The functional importance of rhytmic activity in the brain.
Current Biology 22: R658–R663.
van den Berg, D. and van Leeuwen, C. (2004). Adaptive rewiring in chaotic networks renders small-world
connectivity with consistent clusters. Europhysics Letters 65: 459–64.
van den Berg, D., Gong, P., Breakspear, M., and van Leeuwen, C. (2012). Fragmentation: Loss of
global coherence or breakdown of modularity in functional brain architecture? Frontiers in Systems
Neuroscience 6: 20.
van Leeuwen, C. (2007). What needs to emerge to make you conscious? Journal of Consciousness Studies
14(1–2): 115–36.
van Leeuwen, C. and Bakker, L. (1995). Stroop can occur without Garner interference: strategic and
mandatory influences in multidimensional stimuli. Perception and Psychophysics 57(3): 379–92.
van Leeuwen, C. and Smit, D.J.A. (2012). Restless brains, wandering minds. In: S. Edelman, T. Fekete, and
N. Zach (eds.), Being in Time: Dynamical Models of Phenomenal Awareness. Advances in Consciousness
Research, pp. 121–47. Amsterdam: John Benjamins PC.
van Leeuwen, C. and van den Hof, M. (1991). What has happened to Prägnanz? Coding, stability, or
resonance. Perception & Psychophysics 50(5): 435–48.
van Lier, R. J., van der Helm, P. A., and Leeuwenberg, E. L. J. (1995). Competing global and local
completions in visual occlusion. Journal of Experimental Psychology: Human Perception and Performance
21: 571–83.
van Wassenhove, V., Grant, K. W., and Poeppel, D. (2007). Temporal window of integration in
auditory-visual speech perception. Neuropsychologia 45: 598–607. doi: S0028-3932(06)00011-X [pii]
10.1016/j.neuropsychologia.2006.01.001.
Varela, F. J. (1995). Resonant cell assemblies: a new approach to cognitive functions and neuronal
synchrony. Biological Research 28(1): 81–95.
Varela, F., Lachaux, J.P., Rodriguez, E., and Martinerie, J. (2001). The brainweb: phase synchronization and
large-scale integration. Nature Reviews Neuroscience 2: 229–39.
Vecera, S. P., Vogel, E. K., and Woodman, G. F. (2002). Lower region: A new cue for figure-ground
assignment. Journal of Experimental Psychology: General 131: 194–205.
Vierling-Claassen, D., Cardin, J.A., Moore, C.I., and Jones S. R. (2010). Computational modeling of
distinct neocortical oscillations driven by cell-type selective optogenetic drive: separable resonant
circuits controlled by low-threshold spiking and fast-spiking interneurons. Frontiers in Human
Neuroscience 4:198. doi: 10.3389/fnhum.2010.00198.
Cortical Dynamics and Oscillations 1007
von der Malsburg, C. (1985). Nervous structures with dynamical links. Berichte der Bunsengesellschaft für
physikalische Chemie 89: 703–10.
von Stein, A., Rappelsberger, P., Sarnthein, J., and Petsche, H. (1999). Synchronization between temporal
and parietal cortex during multimodal object processing in man. Cerebral Cortex 9: 137–50.
Watts, D. and Strogatz, S. (1998). Collective dynamics of ‘small-world’ networks. Nature 393: 440–2.
Wrobel, A. (2000). Beta activity: A carrier for visual attention. Acta Neurobiologica Experimentalis
60: 247–60.
Yamada, T. and Fujisaka, H. (1983). Stability theory of synchronized motion in coupled oscillator systems.
Progress in Theoretical Physics 70: 1240–8.
Zylberberg, A., Dehaene, S., Roelfsema, P. R., and Sigman, M. (2011). The human Turing machine: a
neural framework for mental programs. Trends in Cognitive Sciences 15: 293–300.
Chapter 49
Bayesian Models of Perceptual Organization
Jacob Feldman
Inference in Perception
One of the central ideas in the study of perception is that the proximal stimulus—the pattern of
energy that impinges on sensory receptors, such as the visual image—is not sufficient to specify the
actual state of the world outside (the distal stimulus). That is, while the image of your grandmother
on your retina might look like your grandmother, it also looks like an infinity of other arrange-
ments of matter, each having a different combination of three-dimensional structures, surface
properties, colour properties, etc., so that they happen to look just like your grandmother from a
particular viewpoint. Naturally, the brain generally does not perceive these far-fetched alternatives,
but rapidly converges on a single solution, which is what we consciously perceive. A shape on the
retina might be a large object that is far away, or a smaller one that is closer, or anything in between.
A mid-grey region on the retina might be a bright white object in dim light, a dark object in bright
light, or anything in between. An elliptical shape on the retina might be an elliptical object face-on,
a circular object slanted back in depth, or anything in between. Every proximal stimulus is consist-
ent with an infinite family of possible scenes, only one of which is perceived.
The central problem for the perceptual system is to quickly and reliably decide among all these
alternatives, and the central problem for visual science is to figure out what rules, principles, or
mechanisms the brain uses to do so. This process was called unconscious inference by Helmholtz,
perhaps the first scientist to appreciate the problem, and is sometimes called inverse optics to
convey the idea that the brain must in a sense invert the process of optical projection—to take the
image and recover the world that gave rise to it.
The modern history of visual science contains a wealth of proposals for how exactly this process
works, far too numerous to review here. Some are very broad, like the Gestalt idea of Prägnanz
(infer the simplest or most reasonable scene consistent with the image). Many others are nar-
rowly addressed to specific aspects of the problem, like the inference of shape or surface colour.
But historically, the vast majority of these proposals suffer from one or both of the following two
problems. First, many (like Prägnanz and many other older suggestions) are too vague to be real-
ized as computational mechanisms. They rest on central ideas, like the Gestalt term ‘goodness of
form’, that are subjectively defined and cannot be implemented algorithmically without a host
of additional assumptions. Second, many proposed rules are arbitrary or unmotivated, meaning
that it is unclear exactly why the brain would choose them rather than an infinity of other equally
effective ones. Of course, it cannot be taken for granted that mental processes are principled in
this sense, and some have argued for a view of the brain as a ‘bag of tricks’ (Ramachandran 1985).
Nevertheless, to many theorists, a mental function as central and evolutionarily ancient as percep-
tual inference seems to demand a more coherent and principled explanation.
Bayesian Models of Perceptual Organization 1009
The probability of a proposition A given another proposition B, written p(A|B), is the ratio of the probability that A and B are both true divided by the probability that B is true:

p(A|B) = p(A and B) / p(B)    (1)
Similarly, the probability of B given A is the ratio of the probability that B and A are both true
divided by the probability that A is true, hence
p(B|A) = p(B and A) / p(A)    (2)
It was the Reverend Thomas Bayes (1763) who first noticed that these mathematically simple
observations can be combined to yield a formula1 for the conditional probability p(A|B) (A given
B) in terms of the inverse conditional probability p(B|A) (B given A)
p(A|B) = p(B|A) p(A) / p(B)    (3)
This formula is now called Bayes’ theorem or Bayes’ rule.2 Before Bayes, the mathematics of prob-
ability had been used exclusively to calculate the chances of a particular random outcome of a
stochastic process, like the chance of getting ten consecutive heads in ten flips of a fair coin [p(ten
heads|fair coin)]. Bayes realized that his rule allowed us to invert this inference and calculate the
probability of the conditions that gave rise to the observed outcome—here, the probability, having
observed ten consecutive heads, that the coin was fair in the first place [p(fair coin|10 heads)].
Of course, to determine this, you need to assume that there is some other hypothesis we might
entertain about the state of the coin, such as that it is biased towards heads. Bayes’ logic, often
called inverse probability, allows us to evaluate the plausibility of various hypotheses about the
state of the world (the nature of the coin) on the basis of what we have observed (the sequence of
flips). For example, it allows us to quantify the degree to which observing ten heads in a row might
persuade us that the coin is biased towards heads.
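This inversion can be made concrete in a short numerical sketch. The prior of 0.5 on each hypothesis and the biased coin's heads probability of 0.9 are assumptions chosen here for illustration; the chapter fixes no particular numbers.

```python
# Inverse probability for the coin example. All numbers are assumed
# for illustration: a 50/50 prior over {fair, biased}, and a biased
# coin that lands heads with probability 0.9.
p_fair, p_biased = 0.5, 0.5            # prior probabilities of the two hypotheses
like_fair = 0.5 ** 10                  # p(ten heads | fair coin)
like_biased = 0.9 ** 10                # p(ten heads | biased coin)
p_data = like_fair * p_fair + like_biased * p_biased   # total probability of the data
post_fair = like_fair * p_fair / p_data                # p(fair coin | ten heads)
post_biased = like_biased * p_biased / p_data          # p(biased coin | ten heads)
```

Under these assumed numbers the posterior probability of the fair coin falls below one per cent: ten consecutive heads is strong, though not conclusive, evidence of bias.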
1 More specifically, note that p(B and A) = p(A and B) (conjunction is commutative). Substituting one for the other shows that p(A|B)p(B) and p(B|A)p(A) are both equal to p(A and B), and thus to each other. Divide both sides of p(A|B)p(B) = p(B|A)p(A) by p(B) to yield Bayes' rule.
2 The rule does not actually appear in this form in Bayes' essay. But Bayes' focus was indeed on the underlying problem of inverse inference, and he deserves credit for the main insight (see Stigler 1983).
1010 Feldman
Bayes and his followers, especially the visionary French mathematician Laplace, saw how
inverse probability could form the basis of a full-fledged theory of inductive inference (see Stigler
1986). As David Hume had pointed out only a few decades previously, much of what we believe in
real life—including all generalizations from experience—cannot be proved with logical certainty,
but instead merely seems intuitively plausible on the basis of our knowledge and observations. To
philosophers seeking a deductive basis for our beliefs, this argument was devastating. But Laplace
realized that Bayes’ rule allowed us to quantify belief—to precisely gauge the plausibility of induc-
tive hypotheses.
By Bayes’ rule, given any data D which has a variety of possible hypothetical causes H1, H2, etc.,
each cause Hi is plausible in proportion to the product of two numbers: the probability of the data
if the hypothesis is true p(D|Hi), called the likelihood, and the prior probability of the hypothesis,
p(Hi), that is, how probable the hypothesis was in the first place. If the various hypotheses are all
mutually exclusive, then the probability of the data D is the sum of its probability under all the
various hypotheses:
p(D) = ∑i p(D|Hi) p(Hi)

Plugging this into Bayes' rule (with Hi playing the role of A, and D playing the role of B), this means
that the probability of hypothesis Hi given data D, called the posterior probability, p(Hi|D), is

p(Hi|D) = p(D|Hi) p(Hi) / ∑j p(D|Hj) p(Hj)

or in words

posterior = (prior × likelihood) / (sum of prior × likelihood over all hypotheses).
The posterior probability p(Hi|D) quantifies how much we should believe Hi after considering the
data. It is simply the ratio of the probability of the evidence under Hi (the product of its prior and
likelihood) relative to the total probability of the evidence arising under all hypotheses (the sum
of the prior–likelihood products for all the hypotheses). This ratio measures how plausible Hi is
relative to all the other hypotheses under consideration.
But Laplace’s ambitious account was followed by a century of intense controversy about the use
of inverse probability (see Howie 2004). In modern retellings, the critics’ objections to Bayesian
inference are often reduced to the idea that to use Bayes’ rule we need to know the prior probabil-
ity of each of the hypotheses (for example, the probability that the coin was fair in the first place),
and that we often don’t have this information. But their criticism was far more fundamental, and
relates to the meaning of probability itself. They argued that many propositions—those whose
truth value is fixed but unknown—can’t be assigned probabilities at all, in which case the use of
inverse probability to assign them probabilities would be nonsensical. This criticism reflects a
conception of probability, often called frequentism, in which probability refers exclusively to rela-
tive frequency in a repeatable chance situation. Thus, in their view, you can calculate the probabil-
ity of a string of heads for a fair coin because this is a random event that occurs on some fraction
Bayesian Models of Perceptual Organization 1011
of trials; but you can't calculate a probability of a non-repeatable state of nature, like 'this coin is fair' or 'the Higgs boson exists', because such hypotheses are either definitely true or definitely false,
and are not ‘random’. The frequentist objection was not just that we don’t know the prior for many
hypotheses, but that most hypotheses don’t have priors—or posteriors, or any probabilities at all.
But, in contrast, Bayesians generally thought of probability as quantifying the degree of belief,
and were perfectly content to apply it to any proposition at all, including non-repeatable ones. To
Bayesians, the probability of any proposition is simply a characterization of our state of knowl-
edge about it, and can freely be applied to any proposition as a way of quantifying how strongly
we believe it. This conception of probability, sometimes called subjectivist (or epistemic or some-
times just Bayesian), is thus essential to the Bayesian programme. Without it, one cannot calculate
the posterior probability of a non-repeatable proposition because such propositions simply don’t
have probabilities—and this would rule out most uses of Bayes’ rule to perform induction. But to
subjectivists, Bayesian inverse probability can be used to determine the posterior probability, and
thus the strength of belief, for any hypothesis at all.3
3 This philosophical disagreement underlies the recent debate between traditional statistics centred on null hypothesis significance testing (NHST) and Bayesian inference (see Lee and Wagenmakers 2005). NHST was invented by fervent frequentists (Fisher, Neyman, and Pearson) who insisted that scientific hypotheses, being non-repeatable, cannot have probabilities. This position rules out the application of Bayes' rule to estimate the posterior probability of a hypothesis, leading them to propose alternative ways of evaluating hypotheses such as 'rejecting the null'.
we would like to infer) plus some uncertainty introduced in the process of image formation (which
we would like to disregard). Bayesian inference allows us to estimate the stable properties of the
world conditioned on the image data. The aptness of Bayesian inference as a model of perceptual
inference was first noticed in the 1980s by a number of authors, and brought to wider attention by
the collection of papers in Knill and Richards (1996). Since then the applications of Bayesian infer-
ence to perception have multiplied and evolved, while always retaining the core idea of associating
perceptual belief with the posterior probability as given by Bayes’ rule. Several excellent introduc-
tions are already available (e.g. Bülthoff and Yuille 1991; Kersten et al. 2004) each with a slightly
different emphasis or slant. The current chapter is intended as an introduction to the main ideas
of Bayesian inference as applied to human perception and perceptual organization. The emphasis
will be on central principles rather than on mathematical details or recent technical advances.
4 Students are often warned that the likelihood function is not a probability distribution, a remark that in my experience tends to cause confusion. In traditional terminology, likelihood is a property of the model or hypothesis, not the data, and one refers, for example, to the likelihood of H (and not the likelihood of the data under H). This is because the term 'likelihood' was introduced by frequentists (specifically Fisher 1925),
who insisted that hypotheses did not have probabilities, and sought a word other than ‘probability’ to express
the degree of support given by the data to the hypothesis in question. To Bayesians, however, the distinction
is unimportant, since both data and hypotheses can have probabilities. So Bayesians have tended (especially
recently) to refer to the likelihood of the data under the hypothesis, or the likelihood of the hypothesis, in both
cases meaning the probability p(D|H). In this sense, likelihoods are indeed probabilities. However, note that
the likelihoods of the various hypotheses do not have to sum to one; for example, it is perfectly possible for
many hypotheses to have likelihood near one given a dataset that they all fit well. In this sense, the distribution
of likelihood over hypotheses (models) is certainly not a probability distribution. But the distribution of likeli-
hood over the data for a single fixed model is, in fact, a probability distribution and sums to one.
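The footnote's two claims can be checked numerically. A binomial model with n = 3 tosses is assumed here purely as a toy case.

```python
from math import comb

# Binomial likelihood for a toy model with n = 3 tosses (assumed example).
n = 3
def likelihood(k, theta):
    # p(k heads in n tosses | heads-probability theta)
    return comb(n, k) * theta ** k * (1 - theta) ** (n - k)

# For one fixed hypothesis (theta = 0.7), the likelihood summed over all
# possible data is a probability distribution and sums to one...
over_data = sum(likelihood(k, 0.7) for k in range(n + 1))

# ...but summed over a range of hypotheses for one fixed dataset (k = 2),
# it need not sum to one.
thetas = [0.1 * i for i in range(11)]
over_models = sum(likelihood(2, t) for t in thetas)
```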
Reducing the posterior distribution to a single ‘winner’ discards useful information, and it should
be kept in mind that only the full posterior distribution expresses the totality of our posterior beliefs.
Fig. 49.1 Two edges can be interpreted as part of the same smooth contour (hypothesis A, top) or
as two distinct contours (hypothesis B, bottom). Each hypothesis has a likelihood (right) that is a
function of the turning angle α, with p(α|A) sharply peaked at 0° and p(α|B) flat.
grouping. Applying this simple formulation more broadly to all the image edge pairs allows the
image to be divided up into a discrete collection of ‘smooth’ contours—that is, contours made up
of elements which Bayes’ rule says all belong to the same contour. The resulting parse of the image
into contours agrees closely with human judgments (Feldman 2001). Related models have been
applied to contour completion and extrapolation (Singh and Fulvio 2005).
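This two-hypothesis comparison can be sketched in a few lines. The Gaussian width of 30°, the uniform alternative, and the 0.5 prior are all assumptions made here for illustration, not the fitted values of Feldman (2001).

```python
from math import exp, pi, sqrt

# p(alpha | one contour): peaked at collinearity (width is an assumption)
def like_one(alpha_deg, sigma=30.0):
    return exp(-0.5 * (alpha_deg / sigma) ** 2) / (sigma * sqrt(2 * pi))

# p(alpha | two contours): flat over turning angles in (-180, 180]
def like_two(_alpha_deg):
    return 1.0 / 360.0

# posterior probability that the two edges belong to the same contour
def p_same_contour(alpha_deg, prior_one=0.5):
    num = like_one(alpha_deg) * prior_one
    den = num + like_two(alpha_deg) * (1 - prior_one)
    return num / den
```

Small turning angles favour grouping into a single smooth contour; large ones favour the two-contour interpretation.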
Fig. 49.2 Generative model for shape from Feldman and Singh (2006), giving: (a) prior over skeletons,
(b) likelihood function, (c) MAP skeleton, the maximum posterior skeleton for the given shape, and (d)
examples of the MAP skeleton.
Adapted from Jacob Feldman and Manish Singh, Bayesian estimation of the shape skeleton, Proceedings of
the National Academy of Sciences, USA, 103(47), pp. 18014–18019, Figures 1, 2a, and 5e, doi: 10.1073/
pnas.0608811103 Copyright (2006) National Academy of Sciences, U.S.A.
Each of these problems requires its own unique approach, but broadly speaking a Bayesian
framework for any problem in perceptual organization flows from a generative model for image
configurations (Feldman et al. 2013). Perceptual organization is based on the idea that the visual
image is generated by regular processes that tend to create visual structures with varying prob-
ability, which can be used to define likelihood functions. The challenge of Bayesian perceptual
grouping is to discover psychologically reasonable generative models of visual structure.
For example, Feldman and Singh (2006) proposed a Bayesian approach to shape representation
based on the idea that shapes are generated from axial structures (skeletons) from which the shape
contour is understood to have ‘grown’ laterally. Each skeleton consists of a hierarchically organ-
ized collection of axes, and generates a shape via a probabilistic process that defines a probability
distribution over shapes (Fig. 49.2). This allows a prior over skeletons to be defined, along with
a likelihood function that determines the probability of any given contour shape conditioned on
the skeleton. This in turn allows the visual system to determine the MAP skeleton (the skeleton
most likely to have generated the observed shape) or, more broadly, a posterior distribution over
skeletons. The estimated skeleton in turn determines the perceived decomposition into parts,
with each section of the contour identified with a distinct generating axis perceived as a distinct
‘part’. This shape model is certainly oversimplified relative to the myriad factors that influence real
shapes, but the basic framework can be augmented with a more elaborate generative model, and
tuned to the properties of natural shapes (Wilder et al. 2011). Because the framework is Bayesian,
the resulting representation of shape is, in the sense discussed above, optimal given the assump-
tions specified in the generative model.
Discussion
This section raises issues that often arise when Bayesian models of cognitive processes are
considered.
Bayesian updating
Bayesian inference is sometimes referred to as Bayesian updating because of the inherently pro-
gressive way that the arrival of new data leads the observer’s belief to evolve from the prior towards
the ultimate posterior. The initial prior represents the observer’s beliefs before any data have been
encountered. When data arrive, belief in all hypotheses is modified to reflect them: the likelihood
of each hypothesis is multiplied by its prior (Bayes’ rule) to yield a new, updated posterior belief
distribution. From there on, the state of belief continues to evolve as new data are acquired, with
the posterior at each step becoming the prior for the next step. In this way, belief is gradually
pushed by the data away from the initial prior and towards beliefs that better reflect the data.
More specifically, because of the way the mathematics works, the posterior distribution tends to
get narrower and narrower (more and more sharply peaked) as more and more data come in. That
is, belief typically evolves from a broad prior distribution (representing uncertainty about the state of
the world) towards a progressively narrower posterior distribution (representing increasingly well-
informed belief). In this sense, the influence of the prior gradually diminishes over the course of
inference—in a Bayesian cliché, the ‘likelihood swamps the prior’. Partly for this reason, though the
source of the prior can be controversial (see Where do the priors come from?), in many situations
(though not all) its exact form is not too important, because the likelihood eventually dominates it.
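Posterior-becomes-prior updating can be sketched on a discrete grid of hypotheses about a coin's heads-probability. The grid size, the toss sequence, and the uniform initial prior are all assumed here for illustration.

```python
# Sequential Bayesian updating on a grid of candidate heads-probabilities.
grid = [i / 20 for i in range(21)]     # hypotheses: 0.0, 0.05, ..., 1.0
prior = [1 / 21] * 21                  # broad (uniform) initial prior

def update(belief, heads):
    # multiply each hypothesis's belief by its likelihood for one toss,
    # then renormalize; the posterior becomes the prior for the next toss
    like = [(p if heads else 1 - p) for p in grid]
    post = [b * l for b, l in zip(belief, like)]
    z = sum(post)
    return [p / z for p in post]

belief = prior
for toss in [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]:   # 8 heads, 2 tails
    belief = update(belief, toss)

best = grid[belief.index(max(belief))]  # the posterior peaks near the observed rate (0.8)
```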
unique aspects of the particular image). However, as already discussed, Bayesian inference is not
really limited to such situations if (as is traditional for Bayesians) probabilities are treated simply
as quantifications of belief. In this view, priors do not represent the relative frequency with which
conditions in the world obtain, but rather the observer’s uncertainty (prior to receiving the data in
question) about the hypotheses under consideration.
There are many ways of boiling this uncertainty down to a specific prior. Many descend from
Laplace’s principle of insufficient reason (sometimes called the principle of indifference), which
holds that a set of hypotheses, none of which one has any reason to favour, should be assigned equal
priors. The simplest example of this is the assignment of uniform priors over symmetric options,
such as the two sides of a coin or the six sides of a die. More elaborate mathematical arguments
can be used to derive specific priors from more generalized symmetry arguments. One is Jeffreys’
prior, which allows more generalized equivalences between interchangeable hypotheses (Jeffreys
1939/1961). Another is the maximum entropy prior (Jaynes 1982), which prescribes the prior that
introduces the least information (in the technical sense of Shannon) beyond what is known.
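As a quick numerical check of the entropy reasoning: among priors over six structureless outcomes (die faces, in this assumed toy example), the indifferent uniform assignment carries maximum Shannon entropy.

```python
from math import log

# Shannon entropy in nats; zero-probability terms contribute nothing.
def entropy(p):
    return -sum(q * log(q) for q in p if q > 0)

uniform = [1 / 6] * 6                       # indifferent prior over six faces
skewed = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]     # prior that favours one face
```

The uniform prior's entropy equals log 6, the maximum attainable over six outcomes; any departure from indifference lowers it.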
Bayesians often favour so-called uninformative priors, meaning priors that are as ‘neutral’ as
possible; this allows the data (via the likelihood) to be the primary influence on posterior belief.
Exactly how to choose an uninformative prior can, however, be problematic. For example, to
estimate the probability of success of a binomial process, like the probability of heads in a coin
toss, it is tempting to adopt a uniform prior over success probability (i.e. equal over the range 0
to 100 per cent).5 But mathematical arguments suggest that a truly uninformative prior should be
relatively peaked at 0 and 100 per cent (the beta(0,0) distribution, sometimes called the Haldane
prior; see Lee 2004). But recall that as data accumulate, the likelihood tends to swamp the prior,
and the influence of the prior progressively diminishes. Hence while the choice of prior may be
philosophically controversial, in some real situations the actual choice is moot.
More specifically, certain types of simple priors occur over and over again in Bayesian accounts.
When a particular parameter x is believed to fall around some value µ, but with some uncertainty
that is approximately symmetric about µ, Bayesians routinely assume a Gaussian (normal) prior
distribution for x, i.e. p(x) = N(µ, σ²). Again, this is simply a formal way of expressing what is
known about the value of x (that it falls somewhere near µ) in as neutral a manner as possible
(technically, this is the maximum entropy prior with mean µ and variance σ²). Gaussian error
is often a reasonable assumption because random variations from independent sources, when
summed, tend to yield a normal distribution (the central limit theorem).6 But it should be kept
in mind that an assumption of normal error along x does not entail an affirmative assertion that
repeated samples of x would be normally distributed—indeed in many situations (such as where
x is a fixed quantity of the world, like a physical constant) this interpretation does not even make
sense. Such simple assumptions work surprisingly well in practice and are often the basis for
robust inference. Another common assumption is that priors for different parameters that have
no obvious relationship are independent (that is, knowing the value of one conveys no informa-
tion about the value of the other). Bayesian models that assume independence among parameters
5 Bayes himself suggested this prior, now sometimes called Bayes' postulate, but he was apparently uncertain of its validity, which may have contributed to his reluctance to publish his essay (which was eventually published posthumously; see Stigler 1983).
6 More technically, the central limit theorem says that the sum of random variables with finite variances tends towards normality in the limit. In practice this means that if x is really the sum of a number of component variables, each of which is random though not necessarily normal itself, then x tends to be normally distributed.
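The central-limit intuition is easy to check by simulation (the sample sizes and seed below are arbitrary choices): the sum of twelve independent uniform variables has mean 6 and variance 1, and its distribution is close to normal.

```python
import random

# Simulate many sums of 12 independent Uniform(0, 1) variables.
random.seed(0)
n, trials = 12, 10_000
sums = [sum(random.random() for _ in range(n)) for _ in range(trials)]

mean = sum(sums) / trials                           # theoretical value: n/2 = 6
var = sum((s - mean) ** 2 for s in sums) / trials   # theoretical value: n/12 = 1
```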
whose relationship is unknown are sometimes called naïve Bayesian models. Again, an assump-
tion of independence does not reflect an affirmative empirical assertion about the real-world rela-
tionship between the parameters, but rather is an expression of ignorance about their relationship.
In the context of perception, there are several ways to think of the source of the prior. Of course,
perceptual data arrive in a continuous stream from the moment of birth (or before). So in one sense
the prior represents belief prior to experience—that is, the innate knowledge about the environ-
ment with which evolution has endowed our brains. But in another sense it simply represents belief
prior to a given perceptual act, in which case it must also reflect the updated beliefs stemming from
learning over the course of life. Of course, there is a long history of controversy about the magni-
tude and specificity of innate knowledge (Elman et al. 1996; Carruthers et al. 2005). Bayesian the-
ory does not intrinsically take a position on this issue, easily accommodating either very broad or
uninformative ‘blank slate’ priors, more narrowly tuned ‘nativist’ priors representing more specific
knowledge about the environment, or anything in between. In any case because adult perceivers
benefit from both innate knowledge and experience, priors estimated by experimental techniques
(e.g. Girshick et al. 2011) must be assumed to reflect both evolution and learning in combination.
7 This should not be confused with what statisticians call the likelihood principle, a completely different idea.
The statistical likelihood principle asserts that the data should influence our belief in a hypothesis only via
the probability of those data conditioned on the hypothesis (i.e. the likelihood). This principle is universally
accepted by Bayesians; indeed the likelihood is the only term in Bayes’ rule that involves the data. But it is
violated by classical statistics, where, for example, the significance of a finding depends in part on the prob-
ability of data that did not actually occur in the experiment. For example, when one integrates the tail of a
sampling distribution, one is adding up the probability of many events that did not actually occur.
More recently, Chater (1996) has argued that simplicity and likelihood are two sides of the same
coin, for several reasons that stem from Bayesian arguments. First, basic considerations from infor-
mation theory suggest that more likely propositions are automatically simpler in that they can be
expressed in more compact codes. Specifically, Shannon (1948) showed that an optimal code—
meaning one that has minimum expected code length—should express each proposition A in a
code of length proportional to the negative log probability of A, i.e. −log p(A). This quantity is some-
times referred to as the surprisal, because it quantifies how ‘surprising’ the message is (larger values
indicate less probable outcomes), or as the description length (DL), because it also quantifies how
many symbols it occupies in an optimal code (longer codes for more unusual messages). Just as in
Morse code (or for that matter approximately in English) more frequently used concepts should
be assigned shorter expressions, so that the total length of expressions is minimized on average.
Because the proposition with maximum posterior probability (the MAP) also has minimum nega-
tive log posterior probability, the MAP hypothesis is also the minimum DL (MDL) hypothesis. More
specifically, while in Bayesian inference the MAP hypothesis is the one that maximizes the product of
the prior and the likelihood p(H)p(D|H), in MDL the winning hypothesis is the one that minimizes
the sum of the DL of the model plus the DL of the data as encoded via the model [−log p(H) − log
p(D|H), a sum of logs having replaced a product]. In this sense the simplest interpretation is neces-
sarily also the most probable—though it must be kept in mind that this easy identification rests on
the perhaps tenuous assumption that the underlying coding language is optimal.
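The MAP–MDL equivalence can be verified numerically. In this sketch the hypotheses and probabilities are invented for illustration; the point is only that the hypothesis maximizing the posterior product p(H)p(D|H) is the same one minimizing the two-part description length −log p(H) − log p(D|H):

```python
import math

# Hypothetical priors p(H) and likelihoods p(D|H) for three interpretations.
priors = {'H1': 0.6, 'H2': 0.3, 'H3': 0.1}
likelihoods = {'H1': 0.2, 'H2': 0.5, 'H3': 0.9}

# Bayesian score: unnormalized posterior p(H) * p(D|H).
posterior = {h: priors[h] * likelihoods[h] for h in priors}

# MDL score in bits: DL(H) + DL(D|H) = -log2 p(H) - log2 p(D|H).
dl = {h: -math.log2(priors[h]) - math.log2(likelihoods[h]) for h in priors}

map_h = min(dl, key=dl.get)  # placeholder, overwritten below for clarity
map_h = max(posterior, key=posterior.get)
mdl_h = min(dl, key=dl.get)
assert map_h == mdl_h  # the MAP hypothesis is the minimum-DL hypothesis
```

The negative log turns the product of prior and likelihood into a sum of code lengths, so maximizing one is minimizing the other.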
More broadly, Bayesian inference tends to favour simple hypotheses even without any assump-
tions about the optimality of the coding language.8 This tendency, sometimes called ‘Bayes Occam’
(after Occam’s razor, a traditional term for the preference for simplicity), reflects fundamental
considerations about the way prior probability is distributed over hypotheses (see MacKay 2003).
Assuming that the hypotheses Hi are mutually exclusive, then their total prior necessarily equals
one (∑i p(Hi) = 1), meaning simply that the observer believes that one of them must be correct.
This in turn means that models with more parameters must distribute the same total prior over
a larger set of specific models (combinations of parameter settings), inevitably requiring each
model (on average) to be assigned a smaller prior. That is, more highly parametrized models—
models that can express a wider variety of states of nature—necessarily assign lower priors to each
individual hypothesis. Hence in this sense Bayesian inference automatically assigns lower priors
to more complex models and higher priors to simple ones, thus enforcing a simplicity metric
without any mechanisms designed especially for the purpose. This is really an instance of the
ubiquitous bias–variance tradeoff, that is, the tradeoff between the fit to the data (which benefits
from more complex hypotheses) and generalization to future data (which is impaired by more
complex hypotheses; see Hastie et al. 2001). Bayesians argue that Bayes’ rule provides an ideal
solution to this dilemma because it determines the optimal combination of data fit (reflected in
the likelihood) and bias (reflected in the prior).
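The prior-dilution argument behind ‘Bayes Occam’ can be sketched with invented numbers: if two model families share the total prior mass equally, the family with more parameter settings must spread its share more thinly over its specific hypotheses:

```python
# 'Bayes Occam' sketch with invented numbers: a simple model family with 2
# parameter settings and a complex family with 10 settings share the total
# prior mass equally between the families.
simple_settings, complex_settings = 2, 10
model_prior = 0.5  # each family receives half of the prior mass

prior_per_simple = model_prior / simple_settings    # 0.25 per hypothesis
prior_per_complex = model_prior / complex_settings  # 0.05 per hypothesis

# A specific hypothesis of the complex family starts with a lower prior, so
# it needs a likelihood advantage of at least this factor to overtake a
# specific simple hypothesis in the posterior.
likelihood_ratio_needed = prior_per_simple / prior_per_complex
assert prior_per_simple > prior_per_complex
```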
Indeed the link between probability and complexity is fundamental to information theory, and
also leads to an alternative ‘subjectivist’ method for constructing priors. Kolmogorov (1965) and
Chaitin (1966) introduced a universal measure of complexity (now usually called Kolmogorov
complexity) which in a technical sense is invariant to differences in the language used to express
messages (see Li and Vitányi 1997). This means that just as DL can be thought of as −log p(H),
p(H) can be defined as (proportional to) 2−K(H) where K(H) is the Kolmogorov complexity of the
hypothesis H (see Cover and Thomas 1991). Solomonoff (1964) first observed that this defines a
8 ‘The simplest law is chosen because it is most likely to give correct predictions’ (Jeffreys 1939/1961, p. 4).
1020 Feldman
‘universal prior’, assigning high priors to simple hypotheses and low priors to complex ones in a
way that is internally consistent and invariant to coding language—another way in which simplic-
ity and Bayesian inference are intertwined (see Chater 1996).
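The form of the universal prior can be sketched as follows. True Kolmogorov complexity K(H) is uncomputable, so the complexity values below are stand-ins chosen for illustration; only the shape of the construction, p(H) ∝ 2^(−K(H)), is the point:

```python
# Toy 'universal prior' sketch with stand-in complexity values (bits);
# the real K(H) is uncomputable. p(H) is proportional to 2^(-K(H)).
K = {'H_simple': 2, 'H_medium': 5, 'H_complex': 10}
weights = {h: 2.0 ** -k for h, k in K.items()}
Z = sum(weights.values())                      # normalizing constant
prior = {h: w / Z for h, w in weights.items()}

# Simpler hypotheses receive higher prior probability.
assert prior['H_simple'] > prior['H_medium'] > prior['H_complex']
```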
Though the close relationship between simplicity and Bayesian inference is widely recognized,
the exact nature of the relationship is more controversial. Bayesians regard the calculation of the
Bayesian posterior as fundamental, and the simplicity principle as merely a heuristic whose value
derives from its correspondence to Bayes’ rule. The originators of MDL and information-theoretic
statistics (e.g. Akaike 1974; Rissanen 1978; Wallace 2004) take the opposite view, regarding the
minimization of complexity (DL or related measures) as the more fundamental principle and
dismissing as naïve some of the assumptions underlying Bayesian inference (see Burnham and
Anderson 2002; Grünwald 2005). This debate roughly parallels the controversy in the perception
literature over simplicity and likelihood (see Feldman 2009; van der Helm 2013).
probability, referred to by Bayesians as sampling from the posterior. Again, only zero–one loss
would require rational subjects to choose the MAP response on every trial, so probability match-
ing generally rules out zero–one loss (but obviously does not rule out Bayesian models more
generally). The choice of loss functions in real situations probably depend on details of the task,
and remains a subject of research.
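The contrast between MAP responding and probability matching can be sketched directly (the posterior values are invented): under zero–one loss a rational subject gives the MAP response on every trial, whereas a probability matcher samples responses from the posterior:

```python
import random

# Illustrative posterior over three candidate interpretations.
posterior = {'A': 0.7, 'B': 0.2, 'C': 0.1}

def map_response():
    # Zero-one loss: always answer with the MAP hypothesis.
    return max(posterior, key=posterior.get)

def matched_response(rng):
    # Probability matching: sample a response from the posterior.
    return rng.choices(list(posterior), weights=list(posterior.values()))[0]

rng = random.Random(0)
samples = [matched_response(rng) for _ in range(10000)]
# MAP responding picks 'A' on 100% of trials; matching picks 'A' on ~70%.
assert map_response() == 'A'
assert 0.65 < samples.count('A') / len(samples) < 0.75
```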
Loss functions in naturalistic behavioural situations can be arbitrarily complex, and it is not
generally understood either how they are apprehended or how human decision making takes
them into account. Trommershauser et al. (2003) explored this problem by imposing a moderately
complex loss function on their subjects in a simple motor task; they asked their subjects to touch a
target on a screen that was surrounded by several different penalty zones structured so that misses
in one direction cost more than misses in the other direction. Their subjects were surprisingly
adept at modulating their taps so that expected loss (penalty) was minimized, implying a detailed
knowledge of the noise in their own arm motions and a quick apprehension of the geometry of
the imposed utility function (see also Trommershauser et al. 2008).
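The logic of the Trommershauser et al. task can be sketched in one dimension with invented numbers: a reward zone, an adjacent penalty zone, and Gaussian motor noise. The expected-gain-maximizing aim point shifts away from the penalty zone, which is the behaviour their subjects approximated:

```python
import math

# One-dimensional sketch of a Trommershauser-style task (invented numbers):
# target zone x in [0, 1] worth +100, penalty zone x in [-1, 0) worth -500,
# Gaussian motor noise with standard deviation 0.4 around the aim point.

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def expected_gain(aim, sd=0.4):
    p_target = phi((1 - aim) / sd) - phi((0 - aim) / sd)
    p_penalty = phi((0 - aim) / sd) - phi((-1 - aim) / sd)
    return 100 * p_target - 500 * p_penalty

# Search aim points on a grid; the optimum lies to the right of the target
# centre (0.5), shifted away from the costly penalty zone.
best_aim = max((a / 100 for a in range(101)), key=expected_gain)
assert best_aim > 0.5
```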
the system might arrive at those beliefs. In this sense, Bayesian inference is a competence theory
(Chomsky’s term) or a theory of the computation (Marr’s term), meaning it is an abstract specifica-
tion of the function to be computed rather than the means to compute it. Many theorists, concur-
ring with Marr and Chomsky, argue that competence theories play a necessary role in cognitive
theory, parallel to but distinct from that of process accounts. Competence theories by their nature
abstract away from details of implementation and help connect the computations that experi-
ments uncover with the underlying problem those computations help solve. Conversely, some
psychologists denigrate competence theories as abstractions that are irrelevant to real psychologi-
cal processes (Rumelhart et al. 1986), and indeed Bayesian models have been criticized on these
grounds (McClelland et al. 2010; Jones and Love 2011).
But to those sympathetic to competence accounts, rational models have an appealingly ‘explan-
atory’ quality precisely because of their optimality. Bayesian inference is, in a well-defined sense,
the best way to solve whatever decision problem the brain is faced with. Natural selection pushes
organisms to adopt the most effective solutions available, so evolution should tend to favour
Bayes-optimal solutions whenever possible (see Geisler and Diehl 2002). For this reason, any
phenomenon that can be understood as part of a Bayesian model automatically inherits an evo-
lutionary rationale.
Conclusions
In a sense, perception and Bayesian inference are perfectly matched. Perception is the process by
which the mind forms beliefs about the outside world on the basis of sense data combined with
prior knowledge. Bayesian inference is a system for determining what to believe on the basis of
data and prior knowledge. Moreover, the rationality of Bayesian inference means that perceptual
beliefs that follow the Bayesian posterior are, in a well-defined sense, optimal given the infor-
mation available. This optimality has been argued to provide a selective advantage in evolution
(Geisler and Diehl 2002), driving our ancestors towards Bayes-optimal percepts. Furthermore,
optimality helps explain why the perceptual system, notwithstanding its many apparent quirks and
special rules, works the way it does: these rules approximate the Bayesian posterior.
In addition, the comprehensive nature of the Bayesian framework allows it to be applied to any
problem that can be expressed probabilistically. All these advantages have led to a tremendous
increase in interest in Bayesian accounts of perception in the last decade.
Still, a number of reservations and difficulties must be noted. First, to some researchers a
commitment to a Bayesian framework seems to involve a dubious assumption that the brain is
rational. Many psychologists regard the perceptual system as a hodge-podge of hacks, dictated
by accidents of evolutionary history and constrained by the exigencies of neural hardware. While
to its advocates the rationality of Bayesian inference is one of its main attractions, to sceptics the
hypothesis of rationality inherent in the Bayesian framework seems at best empirically
implausible and at worst naïve.
Second, more specifically, the essential role of the prior poses a puzzle in the context of percep-
tion, where the role of prior knowledge and expectations (traditionally called ‘top-down’ influ-
ences) has been debated for decades. Indeed there is a great deal of evidence (see Pylyshyn 1999)
that perception is singularly uninfluenced by certain kinds of knowledge, which at the very least
suggests that the Bayesian model must be limited in scope to an encapsulated perception module
walled off from information that an all-embracing Bayesian account would deem relevant.
Finally, many researchers wonder if the Bayesian framework is too flexible to be taken seriously,
potentially encompassing any conceivable empirical finding. However while Bayesian accounts
are indeed quite adaptable, any specific set of assumptions about priors, likelihoods, and loss
functions provides a wealth of highly specific empirical predictions, which in many
perceptual domains have been validated experimentally.
Hence notwithstanding all of these concerns, to its proponents Bayesian inference provides
something that perceptual theory has never really had before: a ‘paradigm’ in the sense of Kuhn
(1962)—that is, an integrated, systematic, and mathematically coherent framework in which
to pose basic scientific questions and evaluate potential answers. Whether or not the Bayesian
approach turns out to be as comprehensive or empirically successful as its advocates hope, this
represents a huge step forward in the study of perception.
Acknowledgments
I am grateful to Lee de-Wit, Vicky Froyen, Manish Singh, Johan Wagemans, and an anonymous
reviewer for helpful comments. Preparation of this article was supported by NIH EY0211494.
Please correspond directly with the author at jacob@ruccs.rutgers.edu.
References
Akaike, H. (1974). ‘A new look at the statistical model identification’. IEEE Trans Automat Contr
19(6): 716–723.
Bayes, T. (1763). ‘An essay towards solving a problem in the doctrine of chances’. Phil Trans R Soc. Lond
53: 370–418.
Bülthoff, H. H. and A. L. Yuille (1991). ‘Bayesian models for seeing shapes and depth’. Comm Theor Biol
2(4): 283–314.
Burge, J., C. C. Fowlkes, and M. S. Banks (2010). ‘Natural-scene statistics predict how the figure-ground
cue of convexity affects human depth perception’. J Neurosci 30(21): 7269–7280.
Burnham, K. P. and D. R. Anderson (2002). Model Selection and Multi-model Inference: a Practical
Information-theoretic Approach (New York: Springer).
Carruthers, P., S. Laurence, and S. Stich (2005). The Innate Mind: Structure and Contents (Oxford: Oxford
University Press).
Chaitin, G. (1966). ‘On the length of programs for computing finite binary sequences’. J Assoc Comput
Machin 13(4): 547–569.
Chater, N. (1996). ‘Reconciling simplicity and likelihood principles in perceptual organization’. Psychol Rev
103(3): 566–581.
Claessens, P. M. E. and J. Wagemans (2008). ‘A Bayesian framework for cue integration in multistable
grouping: proximity, collinearity, and orientation priors in zigzag lattices’. J Vision 8(7): 1–23.
Cohen, E. H., M. Singh, and L. T. Maloney (2008). ‘Perceptual segmentation and the perceived orientation
of dot clusters: the role of robust statistics’. J Vision 8(7): 1–13.
Compton, B. J. and G. D. Logan (1993). ‘Evaluating a computational model of perceptual grouping by
proximity’. Percept Psychophys 53(4): 403–421.
Cover, T. M. and J. A. Thomas (1991). Elements of Information Theory (New York: John Wiley).
Cox, R. T. (1961). The Algebra of Probable Inference (Oxford: Oxford University Press).
Dakin, S. (2013). ‘Statistical regularities’. In Handbook of Perceptual Organization, edited by J. Wagemans.
(This volume, forthcoming.)
de Finetti, B. (1970/1974). Teoria delle Probabilita 1 (Turin: Giulio Einaudi). [Translated by A. Machi and
A. Smith, 1990 as Theory of Probability 1 (Chichester: John Wiley and Sons).]
De Winter, J. and J. Wagemans (2006). ‘Segmentation of object outlines into parts: a large-scale integrative
study’. Cognition, 99(3): 275–325.
Earman, J. (1992). Bayes or Bust?: a Critical Examination of Bayesian Confirmation Theory (Cambridge,
MA: MIT Press).
Elder, J. (2013). ‘Contour grouping’. In Handbook of Perceptual Organization, edited by J. Wagemans. (This
volume, forthcoming.)
Elman, J., A. Karmiloff-Smith, E. Bates, M. Johnson, D. Parisi, and K. Plunkett (1996). Rethinking
Innateness: a Connectionist Perspective on Development (Cambridge, MA: MIT Press).
Feldman, J. (1997). ‘Curvilinearity, covariance, and regularity in perceptual groups’. Vision Res
37(20): 2835–2848.
Feldman, J. (2001). ‘Bayesian contour integration’. Percept Psychophys 63(7): 1171–1182.
Feldman, J. (2009). ‘Bayes and the simplicity principle in perception’. Psychol Rev 116(4): 875–887.
Feldman, J. (2013). ‘Tuning your priors to the world’. Top Cogn Sci 5(1): 13–34.
Feldman, J. and M. Singh (2006). ‘Bayesian estimation of the shape skeleton’. Proc Natl Acad Sci USA
103(47): 18014–18019.
Feldman, J., Singh, M., and Froyen, V. (2013). ‘Perceptual grouping as Bayesian mixture estimation’. In
Oxford Handbook of Computational Perceptual Organization edited by Gepshtein, Maloney and Singh,
forthcoming.
Fisher, R. (1925). Statistical Methods for Research Workers (Edinburgh: Oliver and Boyd).
Geisler, W. S. and R. L. Diehl (2002). ‘Bayesian natural selection and the evolution of perceptual systems’.
Phil Trans R Soc Lond B 357: 419–448.
Geisler, W. S., J. S. Perry, B. J. Super, and D. P. Gallogly (2001). ‘Edge co-occurrence in natural images
predicts contour grouping performance’. Vision Res 41: 711–724.
Girshick, A. R., M. S. Landy, and E. P. Simoncelli (2011). ‘Cardinal rules: visual orientation perception
reflects knowledge of environmental statistics’. Nat Neurosci 14(7): 926–932.
Gregory, R. (2006). ‘Editorial essay’. Perception 35: 143–144.
Griffiths, T. L. and A. L. Yuille (2006). ‘A primer on probabilistic inference’. Trends Cogn Sci 10(7).
Supplement to special issue on Probabilistic Models of Cognition. Available at: <http://www.stat.ucla.edu/~yuille/pubs/ucla/A204_tgriffiths_chater2007.pdf>
Grünwald, P. D. (2005). ‘A tutorial introduction to the minimum description length principle’. In Advances
in Minimum Description Length: Theory and Applications, edited by P. D. Grünwald, I. J. Myung, and
M. Pitt (Cambridge, MA: MIT Press).
Hastie, T., R. Tibshirani, and J. Friedman (2001). The Elements of Statistical Learning: Data Mining,
Inference, and Prediction (New York: Springer).
Hatfield, G. and W. Epstein (1985). ‘The status of the minimum principle in the theoretical analysis of
visual perception’. Psychol Bull 97(2): 155–186.
Hochberg, J. and E. McAlister (1953). ‘A quantitative approach to figural “goodness” ’. J Exp Psychol
46: 361–364.
Hoffman, D. D. (2009). ‘The user-interface theory of perception: natural selection drives true perception to
swift extinction’. In Object Categorization: Computer and Human Vision Perspectives, edited by
S. Dickinson, M. Tarr, A. Leonardis, and B. Schiele (Cambridge: Cambridge University Press).
Hoffman, D. D. and M. Singh (2012). ‘Computational evolutionary perception’. Perception 41: 1073–1091.
Howie, D. (2004). Interpreting Probability: Controversies and Developments in the Early Twentieth Century
(Cambridge: Cambridge University Press).
Jaynes, E. T. (1982). ‘On the rationale of maximum-entropy methods’. Proc IEEE 70(9): 939–952.
Jaynes, E. T. (2003). Probability Theory: the Logic of Science (Cambridge: Cambridge University Press).
Jeffreys, H. (1939/1961). Theory of Probability, 3rd edn (Oxford: Clarendon Press).
Jones, M. and B. C. Love (2011). ‘Bayesian fundamentalism or enlightenment? On the explanatory status
and theoretical contributions of Bayesian models of cognition’. Behav Brain Sci 34: 169–188.
Juni, M. Z., M. Singh, and L. T. Maloney (2010). ‘Robust visual estimation as source separation’. J Vision
10(14): 2; doi: 10.1167/10.14.2.
Kersten, D., P. Mamassian, and A. Yuille (2004). ‘Object perception as Bayesian inference’. Ann Rev Psychol
55: 271–304.
Knill, D. C. and W. Richards (eds) (1996). Perception as Bayesian Inference (Cambridge: Cambridge
University Press).
Kolmogorov, A. N. (1965). ‘Three approaches to the quantitative definition of information’. Prob Inform
Transmission 1(1): 1–7.
Kuhn, T. S. (1962). The Structure of Scientific Revolutions (Chicago: University of Chicago Press).
Lee, M. D. and E.-J. Wagenmakers (2005). ‘Bayesian statistical inference in psychology: comment on
Trafimow (2003)’. Psychol Rev 112(3): 662–668.
Lee, P. (2004). Bayesian Statistics: an Introduction, 3rd edn (Chichester: Wiley).
Leeuwenberg, E. L. J. and F. Boselie (1988). ‘Against the likelihood principle in visual form perception’.
Psychol Rev 95: 485–491.
Li, M. and P. Vitányi (1997). An Introduction to Kolmogorov Complexity and its Applications
(New York: Springer).
McClelland, J. L., M. M. Botvinick, D. C. Noelle, D. C. Plaut, T. T. Rogers, M. S. Seidenberg, et al. (2010).
‘Letting structure emerge: connectionist and dynamical systems approaches to understanding cognition’.
Trends Cogn Sci 14: 348–356.
MacKay, D. J. C. (2003). Information Theory, Inference, and Learning Algorithms (Cambridge: Cambridge
University Press).
Maloney, L. T. (2002). ‘Statistical decision theory and biological vision’. In Perception and the Physical
World: Psychological and Philosophical Issues in Perception, edited by D. Heyer and R. Mausfeld,
pp. 145–189 (New York: Wiley).
Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference (San Mateo,
CA: Morgan Kauffman).
Perkins, D. (1976). ‘How good a bet is good form?’ Perception 5: 393–406.
Pylyshyn, Z. (1999). ‘Is vision continuous with cognition? The case for cognitive impenetrability of visual
perception’. Behav Brain Sci 22(3): 341–365.
Ramachandran, V. S. (1985). ‘The neurobiology of perception’. Perception 14: 97–103.
Rissanen, J. (1978). ‘Modeling by shortest data description’. Automatica 14: 465–471.
Rumelhart, D. E., J. L. McClelland, and G. E. Hinton (1986). Parallel Distributed Processing: Explorations in
the Microstructure of Cognition (Cambridge, MA: MIT Press).
Shannon, C. (1948). ‘A mathematical theory of communication’. Bell Syst Tech J 27: 379–423.
Singh, M. (2013). ‘Visual representation of contour geometry’. In Handbook of Perceptual Organization,
edited by J. Wagemans. (This volume, forthcoming.)
Singh, M. and J. M. Fulvio (2005). ‘Visual extrapolation of contour geometry’. Proc Natl Acad Sci USA
102(3): 939–944.
Singh, M. and D. D. Hoffman (2001). ‘Part-based representations of visual shape and implications for visual
cognition’. In From Fragments to Objects: Segmentation and Grouping in Vision, Advances in Psychology
Vol. 130, edited by T. Shipley and P. Kellman, pp. 401–459 (New York: Elsevier).
Solomonoff, R. (1964). ‘A formal theory of inductive inference: part II’. Inform Control 7: 224–254.
Stigler, S. M. (1983). ‘Who discovered Bayes’s theorem?’ Am Statistician 37(4): 290–296.
Stigler S. M. (1986). The History of Statistics: the Measurement of Uncertainty Before 1900 (Cambridge,
MA: Harvard University Press).
Trommershauser, J., L. T. Maloney, and M. S. Landy (2003). ‘Statistical decision theory and the selection of
rapid, goal-directed movements’. J Opt Soc Am A: Opt Image Sci Vis 20(7): 1419–1433.
Trommershauser, J., L. T. Maloney, and M. S. Landy (2008). ‘Decision making, movement planning and
statistical decision theory’. Trends Cogn Sci 12(8): 291–297.
van der Helm, P. (2013). ‘Simplicity in perceptual organization’. In Handbook of Perceptual Organization,
edited by J. Wagemans. (This volume, forthcoming.)
Wallace, C. S. (2004). Statistical and Inductive Inference by Minimum Message Length (New York: Springer).
Weiss, Y., E. P. Simoncelli, and E. H. Adelson (2002). ‘Motion illusions as optimal percepts’. Nat Neurosci
5(6): 598–604.
Wilder, J., J. Feldman, and M. Singh (2011). ‘Superordinate shape classification using natural shape
statistics’. Cognition 119: 325–340.
Zucker, S. W., K. A. Stevens, and P. Sander (1983). ‘The relation between proximity and brightness
similarity in dot patterns’. Percept Psychophys 34(6): 513–522.
Chapter 50
Simplicity in Perceptual Organization
1 Introduction
Perceptual organization is the neuro-cognitive process that takes the light in our eyes as input
and that enables us to interpret scenes as structured wholes consisting of objects arranged in
space—wholes which, moreover, usually are sufficiently veridical to guide action. This auto-
matic process may seem to occur effortlessly, but by all accounts, it must be very complex and
yet very flexible. To organize meaningless patches of light into meaningfully structured wholes
within (literally) the blink of an eye, it must combine a high combinatorial capacity with a
high speed (notice that a recognition model that tests previously stored templates against the
visual input might avoid the combinatorics but would not achieve the required speed). To give
a gist (following Gray 1999, but many others have argued similarly), multiple sets of features
at multiple, sometimes overlapping, locations in a stimulus must be grouped simultaneously.
This implies that the process must cope with a large number of possible combinations in paral-
lel, which also suggests that these possible combinations are engaged in a stimulus-dependent
competition between grouping criteria. Hence, the combinatorial capacity of the perceptual
organization process must be very high. This, together with its high speed (it completes in
the range of 100–300 ms), reveals the truly impressive nature of the perceptual organization
process.
One of the great mysteries of perception is how the human visual system manages to do all this.
An intriguing idea in this context is that, from among all possible interpretations of a stimulus,
the visual system selects the one defined by a minimum number of parameters. This simplicity
principle has gained empirical support but is also controversial. Indeed, simplicity is obviously an
appealing property in many settings, but can it be the guiding principle of the intricate process
sketched above? To review this idea, this chapter focuses on underlying theoretical issues which
may be introduced by way of a brief history of this principle.
applies to model selection and, more generally, to inductive inference (Solomonoff 1964a, 1964b).
It proposes a trade-off between the complexity of hypotheses as such and their explanatory
power, as follows:
The best hypothesis to explain given data is the one that minimizes the sum of
(a) the information needed to describe the hypothesis; and
(b) the information needed to describe the data with the help of the hypothesis.
For instance, in physics, Einstein’s theory as such is more complex than that of Newton, but
because it explains much more data, it is nevertheless considered to be better. Applied to percep-
tual organization, the two amounts of information above can be taken to refer to, respectively, the
view-independent complexity of hypothesized distal stimuli as such and their view-dependent
degree of consistency with the proximal stimulus at hand. The MDL principle then suggests that,
in the absence of further knowledge, the best interpretation of a stimulus is the one that mini-
mizes the sum of these two amounts of information.
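The two-part trade-off can be sketched with invented bit counts (no real coding scheme is implied): a hypothesis that is simple but fits poorly, one that fits well but is complex, and one in between:

```python
# Two-part MDL sketch with illustrative bit counts:
# total = DL(hypothesis) + DL(data | hypothesis).
candidates = {
    # hypothesis: (bits to describe it, bits to describe the data given it)
    'simple-but-poor-fit': (4, 30),
    'complex-but-good-fit': (25, 2),
    'balanced': (10, 8),
}
total = {h: dl_h + dl_d for h, (dl_h, dl_d) in candidates.items()}
best = min(total, key=total.get)
assert best == 'balanced'  # 18 bits beats 34 and 27
```

Neither the simplest hypothesis nor the best-fitting one wins; the minimum of the sum does, which is exactly the Einstein-versus-Newton point above.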
Another predecessor of the simplicity principle is the law of Prägnanz. The early twentieth-century
Gestalt psychologists Wertheimer (1912, 1923), Köhler (1920), and Koffka (1935) proposed that this
law underlies perceptual groupings based on properties such as symmetry and similarity. It was
inspired by the minimum principle in physics, which holds that dynamic physical systems tend to
settle into relatively stable states defined by minimum energy loads. Applied to perceptual organiza-
tion, the law of Prägnanz suggests that, when faced with a stimulus, the human visual system tends
to settle into relatively stable neural states reflecting cognitive properties such as symmetry and sim-
plicity. This idea does not exclude the influence of knowledge represented at higher cognitive levels,
but it takes this influence to be subordinate to stimulus-driven mechanisms of a largely autonomous
visual system.
Nowadays, the neural side of the law of Prägnanz finds elaboration in connectionist and
dynamic-systems approaches to cognition. In the spirit of Marr’s (1982/2010) levels of descrip-
tion, these two kinds of approaches are complementary in that connectionism usually focuses on
the internal mechanisms of information processing systems, while dynamic systems theory (DST)
usually focuses on the physical development over time of whole systems. Also complementary,
though usually focusing on the nature of the outcomes of information processes, is representational
theory, in which the cognitive side of the law of Prägnanz finds elaboration. This may be specified
as follows. For perceptual organization, Koffka formulated the law of Prägnanz as holding
‘of several geometrically possible organizations that one will actually occur which possesses the best, the
most stable shape’ (1935: 138),
and Hochberg and McAlister put this in information-theoretic terms by
‘the less the amount of information needed to define a given organization as compared to the other alterna-
tives, the more likely that the figure will be so perceived’ (1953: 361),
specifying descriptive information loads, or complexities, as
‘the number of different items we must be given, in order to specify or reproduce a given pattern’ (1953: 361).
Hochberg and McAlister termed this information-theoretic idea the descriptive minimum principle, and
nowadays it is also known as the simplicity principle.
Hence, just like the MDL principle in AIT, the simplicity principle in perception promotes
simplest codes as specifying the outcomes of an inference process based on descriptive codes of
things. Such descriptive codes are much like computer codes, that is, representations that can
be seen as reproduction recipes for things and whose internal structures are therefore enforced
by the internal structures of those things. Both the MDL principle and the simplicity principle
Simplicity in Perceptual Organization 1029
reflect modern information-theoretic approaches which contrast with Shannon’s (1948) classical
selective-information approach in communication theory. Shannon’s approach promotes optimal
codes, that is, nominalistic label codes (as in the Morse code) that minimize the long-time aver-
age burden on communication channels—assuming the transmission probabilities of codes are
known. The simplicity principle further contrasts with von Helmholtz’s (1909/1962) likelihood
principle. The latter holds that the internal neuro-cognitive process of perceptual organization
is guided by veridicality and yields interpretations most likely to be true in the external world—
assuming such probabilities are known. Shannon’s and von Helmholtz’s approaches are appealing
but suffer from the problem that, in many situations, the required probabilities are unknown if not
unknowable. A main objective of modern descriptive-information theory is to circumvent this
problem, that is, to make inferences without having to know the real probabilities.
An initial problem for modern information theory was that complexities depend on the
chosen descriptive coding language. However, both theoretical findings in AIT (Chaitin 1969;
Kolmogorov 1965; Solomonoff 1964a, 1964b) and empirical findings in perception (Simon 1972)
provided evidence that, regarding complexity rankings, it does not matter much which descrip-
tive coding language is employed. This evidence is not solid proof, but does suggest that descrip-
tive simplicity is a fairly stable concept.
The simplicity principle in perception agrees with ideas by Attneave (1954, 1982) and Garner
(1962, 1974), for instance, and it has been promoted most prominently in Leeuwenberg’s
(1968, 1969, 1971) structural information theory (SIT). SIT was developed independently of
AIT, but in hindsight, its current implementation of the simplicity principle can be seen as a
perception-tailored version of the MDL principle in AIT (van der Helm 2000). A notable dif-
ference, though, is that the MDL principle postulates that simplest interpretations are the best
ones (without qualifying what ‘best’ means), whereas the simplicity principle postulates that
they are the ones most likely to result from the internal neuro-cognitive process of perceptual
organization—which may not be interpretations most likely to be true in the external world.
This historical overview raises three questions which, below, are discussed in more detail.
The first question is whether the human visual system indeed organizes stimuli in the simplest
way; this is basically an empirical question, but because it has been plagued by unclarities, it is
addressed by looking at operationalizations of simplicity. The second question is whether simplest
stimulus organizations are sufficiently veridical; this is a theoretical question which is addressed
by using AIT findings in a comparison between the simplicity and likelihood principles. The third
question is whether the simplicity principle agrees with the putative high combinatorial capacity
and speed of perceptual organization; this is a tractability question which is addressed by relating
SIT to DST and connectionism to assess how the simplicity principle might be neurally realized.
3 Operationalizations of simplicity
Hochberg and McAlister (1953) introduced the simplicity principle in an article entitled A quan-
titative approach to figural ‘goodness’. Figural goodness is an intuitive Gestalt notion and the idea
behind the association between descriptive simplicity and goodness is that simplicity entails both
accuracy and parsimony. For instance, a square can be represented as if it were a rectangle, but
representing it as a square is both more accurate and more efficient in terms of memory resources
as it requires fewer descriptive parameters. Assuming that patterns are represented in the simplest
way, simpler patterns are thus expected to be better in the sense that they can be remembered or
reproduced more easily.
1030 van der Helm
Hence, the motto here is ‘what is simple, is easy to learn’. Notice that this is the inverse of the
motto ‘what has been learned, is simple’, which expresses that patterns that have been seen often
are familiar so that they are experienced as being simple. The latter motto agrees with the likeli-
hood principle rather than with the simplicity principle, but it shows that simplicity has different
connotations which may be relevant in different settings (see also Sober 2002). Therefore, this
section first addresses this issue.
Fig. 50.1 Objects that are simple because they have a highly regular internal structure consisting
of a superstructure (visualized by thick dashes) that determines the positions of many identical
subordinate structures (visualized by thin dashes). The hierarchy in (a) is the inverse of that in (b),
and in both cases, the objects are presumably classified on the basis of primarily the perceptually
dominant superstructure.
that classifications of different stimuli may be assessed on the basis of these hierarchical code
structures (see Figure 50.1; for more examples, see Leeuwenberg and van der Helm 2014).
These different ideas about simplicity are also reflected in the following. In classical informa-
tion theory, the length of an optimal code for an individual pattern is determined by the size of
the set of all actually occurring identical patterns. In modern information theory, conversely, the
length of the simplest descriptive code for an individual pattern determines the size of the set
of all theoretically possible equally complex patterns (as in AIT, which focuses on the algorith-
mically relevant complexities of simplest descriptive codes) or the set of all theoretically pos-
sible equally structured patterns (as in SIT, which focuses on the perceptually relevant structural
classes implied by simplest descriptive codes). The fact that descriptively simpler patterns belong
to smaller structural classes (Collard and Buffart 1983) agrees with Garner’s (1962, 1970) idea
of inferred subsets and his motto of ‘good patterns have few alternatives’. For instance, the set of
all imaginable squares is smaller than the set of all imaginable rectangles. In fact, in perception,
the structural class to which a pattern belongs is considered to be more relevant than its precise
metrical details (MacKay 1950), so that one could say that this class constitutes the generic repre-
sentation of the pattern (e.g. the mental representation of a particular square primarily represents
‘a square’ and its precise size is secondary). This suggests that a pattern should not be treated in
isolation, but in reference to its structural class (Lachmann and van Leeuwen 2005a, 2005b).
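As a toy numerical illustration of this point (parameter counting only, not SIT's actual coding model; the parameter counts and value ranges are invented): a description that leaves fewer metrical parameters free picks out a smaller structural class.

```python
# Toy illustration (not SIT's coding model): simpler descriptions leave
# fewer free metrical parameters, so they pick out smaller structural classes.
def class_size(n_params: int, values_per_param: int) -> int:
    """Number of distinct patterns in a class whose description leaves
    n_params metrical parameters free, each with values_per_param settings."""
    return values_per_param ** n_params

# Side lengths drawn from a discrete range of 10 possible values.
squares = class_size(n_params=1, values_per_param=10)     # one free side
rectangles = class_size(n_params=2, values_per_param=10)  # two free sides

print(squares, rectangles)  # -> 10 100: fewer parameters, smaller class
assert squares < rectangles
```

This mirrors Garner's motto: the square, having the shorter description, has fewer alternatives.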
Hence, all in all, it is true that Shannon’s optimal codes have a flavour of simplicity. They are
shorter for more frequently occurring things, and thereby, minimize the long-term average length
of nominalistic label codes over many identical and different things. However, it is crucial to
distinguish this from the simplicity principle which minimizes the length of descriptive codes
for individual things. Furthermore, notice that the foregoing deals with view-independent prop-
erties only. Indeed, initially, both the simplicity principle and likelihood principle focused on
view-independent properties of hypothesized distal objects to predict the most likely outcome
of the perceptual organization process—that is, ignoring how well hypotheses fit the proximal
data. The latter issue is about view-dependencies, and as discussed next, the inclusion of this issue
boosted research on perceptual organization.
3.2 View-dependencies
Because descriptive simplicity is a fairly stable concept (see above), the assessment of complexities
of hypothesized distal objects (i.e. objects as hypothesized in candidate interpretations) as such
is not a big problem for the simplicity principle. For the likelihood principle, however, the assess-
ment of their probabilities is a problem. It predicts that the most likely outcome of the perceptual
organization process is the one that is also objectively most likely to be true in the world. However,
despite suggestions (Brunswick 1956), such objective probabilities in the world are unknown, if
not unknowable. This does not exclude that perception is guided by the likelihood principle, but
it does mean that this may not be verifiable (Leeuwenberg and Boselie 1988).
Be that as it may, in the 1980s, proponents of the likelihood principle switched to view-depend-
ent properties, that is, to properties that determine the degree of consistency between a candi-
date interpretation and the proximal stimulus (see, e.g., Gregory 1980). For these properties, fair
approximations of their objective probabilities in the world can be assessed better. This led to a
debate in which advocates of one principle presented phenomena that were claimed to be explained
by this principle but not by the other principle—however, advocates of the other principle were
generally able to counter such arguments (see, e.g., Boselie and Leeuwenberg's (1986) reaction to
Rock (1983) and to Pomerantz and Kubovy (1986); Sutherland's (1988) reaction to Leeuwenberg and
Boselie (1988); Leeuwenberg, van der Helm, and van Lier's (1994) reaction to Biederman (1987)). The
crux of this debate is illustrated by Figure 50.2, for which both principles—as formulated at the
time—would make the correct amodal-completion prediction. That is, the simplicity principle
could say that the preferred interpretation is the one in which, view-independently, the com-
pleted shape is the simplest one. The likelihood principle, conversely, could say that it is the one
without unlikely view-dependent coincidences of edges and junctions of the two shapes.
Both arguments seemed to be valid, and in both the simplicity paradigm and the likelihood
paradigm, the result of this debate was the insight that perceptual organization requires an inte-
grated account of both view-independent and view-dependent factors (see, e.g., Gigerenzer and
Murray 1987; Knill and Richards 1996; Tarr and Bülthoff 1998; van der Helm 2000; van Lier, van
der Helm, and Leeuwenberg 1994, 1995; van Lier 1999). For the simplicity principle, such an
integration implies compliance with the MDL principle in AIT (see above), and no matter which
underlying principle one adopts, it concurs with an integration of information from the ven-
tral and dorsal streams in the brain (Ungerleider and Mishkin 1982). These streams are believed
to be dedicated to object perception and spatial perception, respectively, and an integration of
view-independent and view-dependent factors can thus be said to reflect an interaction between
these streams, to go from percepts of objects as such to percepts of objects arranged in space.
Hence, the past few decades showed a convergence of ideas about the factors to be included in
perceptual organization. This convergence, however, does not mean that the two principles agree
on how these factors are to be quantified. As explicated next in Bayesian terms, the latter issue is
not just a matter of complexities vs probabilities.
Fig. 50.2 The pattern in (a) is readily interpreted as a parallelogram partly occluding the shape in
(b) rather than the shape in (c). In this case, this preference could be claimed to occur either because,
unlike the shape in (b), the shape in (c) would have to take a rather coincidental position to yield the
pattern in (a), or because the shape in (b) is simpler than the shape in (c). In general, however, both
factors seem to play a role.
conditional probabilities. Notice, however, that Bayes’ rule does not prescribe where the prior and
conditional probabilities come from (cf. Watanabe 1969). The failure to recognize this crucial
point has led to overly strong claims (see also Bowers and Davis 2012a, 2012b). For instance,
Chater (1996) claimed that the simplicity and likelihood principles in perception are equiva-
lent, but this claim assumed implicitly—and incorrectly—that any Bayesian model automatically
implies compliance with the Helmholtzian likelihood principle (van der Helm 2000, 2011a). This
may be clarified further as follows.
In Bayesian terms, the above-mentioned convergence of ideas about the factors to be included
in perceptual organization means that both the likelihood paradigm and the simplicity para-
digm nowadays promote an integration of priors and conditionals—where the priors refer to
view-independent factors of candidate interpretations as such, while the conditionals refer to their
view-dependent degree of consistency with proximal stimuli. Hence, Bayes’ rule can be employed
to predict the most likely outcome of the human perceptual organization process. However, for a
modeller, the key question then is: where do I get the priors and conditionals from? If one wants
to model perceptual organization rather than explaining it, one might subjectively choose certain
probabilities, whether or not backed up by compelling arguments (for fine examples, see Knill and
Richards 1996). This is customary in Bayesian approaches, but notice that compliance with either
one of the explanatory simplicity and likelihood principles requires more specific probabilities.
The natural way to model the likelihood principle, on the one hand, is to use Bayes’ rule. After
all, this principle assumes that objective probabilities in the world (pw) determine the outcome of
the perceptual organization process. That is, for proximal stimulus D, the likelihood principle can
be formalized in Bayesian terms by:
Select the hypothesis H that maximizes pw(H|D) ∝ pw(H) * pw(D|H)
where pw(H) is the prior probability of hypothesis H, while pw(D|H) is the probability that the
proximal stimulus D arises if the real distal stimulus is as hypothesized in H.
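As a sketch of this selection rule (the numbers below are invented for illustration; the chapter's point is precisely that the true pw are unknown):

```python
# Sketch of the Bayesian selection rule with made-up probabilities pw.
# Hypotheses and values are hypothetical; only the selection logic matters.
def select_hypothesis(hypotheses, prior, conditional, data):
    """Return the hypothesis H maximizing pw(H) * pw(D|H)."""
    return max(hypotheses, key=lambda H: prior[H] * conditional[(data, H)])

hypotheses = ['occlusion', 'mosaic']        # two candidate interpretations
prior = {'occlusion': 0.6, 'mosaic': 0.4}   # view-independent factors
conditional = {                             # view-dependent fit to D
    ('D', 'occlusion'): 0.9,
    ('D', 'mosaic'): 0.2,                   # mosaic needs a coincidental view
}
print(select_hypothesis(hypotheses, prior, conditional, 'D'))  # -> occlusion
```

The normalizer pw(D) is the same for all hypotheses and so can be dropped from the maximization.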
The natural way to model the simplicity principle, on the other hand, is to minimize the sum of
prior and conditional complexities (just as specified for the MDL principle in AIT). However, one
may also convert descriptive complexities C into artificial probabilities pa = 2^−C; these are called
algorithmic probabilities in AIT (Li and Vitányi 1997) and precisals in SIT (van der Helm 2000).
Under this conversion, minimizing the sum of prior and conditional complexities C is equivalent
to maximizing the product of prior and conditional probabilities pa. Normalization then is irrel-
evant, and these artificial probabilities thus imply that also the simplicity principle can be formal-
ized in Bayesian terms, namely, by:
Select the hypothesis H that maximizes pa(H|D) ∝ pa(H) * pa(D|H)
Thus, both principles can be formalized in Bayesian terms to predict the most likely outcome of
the perceptual organization process. The crucial difference, however, remains that the likelihood
principle employs probabilities pw based on the frequency of occurrence of things in the
world whereas the simplicity principle employs probabilities pa derived from the descriptive com-
plexity of individual things.
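A quick numerical check of this equivalence, with invented complexities: the interpretation that minimizes the summed complexities is the same one that maximizes the product of precisals, since 2^−(C1+C2) = 2^−C1 * 2^−C2 and 2^−C is decreasing in C.

```python
# Verify numerically that minimizing C_prior + C_conditional is the same
# selection rule as maximizing pa(H) * pa(D|H) with pa = 2**(-C).
# The complexity values are illustrative, not derived from SIT's coding model.
candidates = {          # H: (prior complexity, conditional complexity)
    'H1': (5, 0),
    'H2': (2, 2),
    'H3': (4, 2),
}

def precisal(c: float) -> float:
    """Artificial probability pa = 2^-C assigned to a complexity C."""
    return 2.0 ** -c

by_complexity = min(candidates, key=lambda h: sum(candidates[h]))
by_precisal = max(candidates,
                  key=lambda h: precisal(candidates[h][0]) * precisal(candidates[h][1]))

assert by_complexity == by_precisal  # the same winner under both rules
print(by_complexity)                 # -> H2
```

Because the winner is unchanged by any common rescaling, normalizing the precisals would indeed be irrelevant, as noted above.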
Hence, to determine if the Bayesian formulation of the simplicity principle complies with the
likelihood principle, one should assess how close the latter’s objective probabilities pw and the
former’s artifical probabilities pa might be (van der Helm 2000, 2011a). This is discussed further
in the next section, but notice that a proof of equivalence of the principles is out of the question,
simply because the pw are unknown. The next two examples may illustrate various things dis-
cussed so far.
3.5 Example 2: T-junctions
Each of the four configurations in Figure 50.3 can, in principle, be interpreted as consisting of
one object or as consisting of two objects. Going from left to right, however, the two-objects
interpretation (definitely preferred in a) gradually loses strength in favour of the one-object
interpretation (definitely preferred in d). By way of a clever experiment involving twelve such
configurations, Feldman (2007) provided strong evidence for this. For instance, he found that,
just as the configuration in a, the T-junction in b is perceived as two objects, and that, just as the
configuration in d, the hook in c is perceived as one object.
T-junctions are particularly interesting because, in many models of amodal completion, they
are considered to be cues for occlusion (e.g. Boselie 1994; see also van Lier and Gerbino, this vol-
ume). That is, if the proximal stimulus contains a T-junction, this is taken as a strong cue that the
distal scene comprises one surface partly occluded by another (see, e.g., Figure 50.2). However,
before the visual system can infer this occlusion, it first has to segment the proximal stimulus into
the visible parts of those two surfaces, and Feldman’s (2007) data in fact suggest that T-junctions
are cues for segmentation rather than for occlusion. That is, they trigger segmentation even when
occlusion is not at hand.
To explain this, one may invoke van Lier, van der Helm, and Leeuwenberg’s (1994) empirically
successful amodal-completion model. It quantifies prior complexities of interpretations using
SIT’s coding model, and it quantifies conditional complexities under the same motto, namely,
that complexity reflects the effort to construct things. Thus, for an interpretation, the prior
complexity reflects the effort to construct the hypothesized distal objects, and the conditional
complexity reflects the effort to bring these objects in the relative position given in the proximal
stimulus. Notice that these conditional complexities are quantitatively equal to what Feldman
(2007, 2009) called co-dimensions—with the difference that Feldman (who assumed uniform priors) took a high co-dimension to be an asset of an interpretation, whereas van Lier, van der Helm,
and Leeuwenberg (who assumed non-uniform priors) took a high conditional complexity to be a
liability. The latter agrees with the simplicity principle, and implies the following for Figure 50.3.
Going from left to right, the one-object interpretation has prior complexities of 5, 4, 3, and 1
(reflecting the number of line segments and angles needed to describe each configuration as one
object) and a conditional complexity of 0 in each case (i.e. no degree of positional freedom to be
removed to arrive at the proximal configurations). Likewise, the two-objects interpretation has a
prior complexity of 2 in each case (i.e. just two separate line segments to be described) and con-
ditional complexities of 0, 1, 2, and 3 (reflecting the degrees of positional freedom to be removed
to arrive at the proximal configurations). Hence, the one-object interpretation has posterior com-
plexities of 5, 4, 3, and 1, respectively, and the two-objects interpretation has posterior complexi-
ties of 2, 3, 4, and 5, respectively. This explains Feldman’s (2007) data that the hook is preferably
interpreted as one object whereas the T-junction is preferably interpreted as two objects (see also
van der Helm 2011a).
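The bookkeeping in the preceding paragraph can be transcribed directly; the complexity values are exactly those given in the text, and the preference follows by comparing posterior complexities:

```python
# Posterior complexity = prior + conditional complexity, for the four
# configurations (a-d) of Figure 50.3, using the values given in the text.
one_object_prior  = [5, 4, 3, 1]  # segments/angles to describe one object
one_object_cond   = [0, 0, 0, 0]  # already in the proximal position
two_objects_prior = [2, 2, 2, 2]  # just two separate line segments
two_objects_cond  = [0, 1, 2, 3]  # positional freedom to be removed

for label, p1, c1, p2, c2 in zip('abcd', one_object_prior, one_object_cond,
                                 two_objects_prior, two_objects_cond):
    post_one, post_two = p1 + c1, p2 + c2
    preferred = 'one object' if post_one < post_two else 'two objects'
    print(label, post_one, post_two, preferred)
# a 5 2 two objects   (the junction-free configuration)
# b 4 3 two objects   (the T-junction)
# c 3 4 one object    (the hook)
# d 1 5 one object
```

The printed preferences reproduce Feldman's (2007) finding: two objects for the T-junction, one object for the hook.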
Hence, both examples stress the relevance of an interplay between non-uniform priors and
non-uniform conditionals. Notice that this still stands apart from the difference between the sim-
plicity and likelihood principles. This difference returns in the next section.
cannot be answered, but the second question found an answer in AIT’s Fundamental Inequality
(Li and Vitányi 1997) which, in my words, holds:
For any enumerable probability distribution P over things x with Kolmogorov complexities K(x), the
difference between the real probabilities p(x) and the artificial probabilities 2^−K(x) is maximally equal to
the complexity K(P) of the distribution P.
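Read literally, with the difference taken on the logarithmic scale on which code lengths live, this statement can be rendered as follows (the O(1) term is the usual additive slack in AIT results):

```latex
% The Fundamental Inequality (Li and Vitányi 1997), rendered from the
% wording above: the logarithmic difference between p(x) and 2^{-K(x)}
% is bounded by the complexity K(P) of the distribution P.
\[
  \bigl|\, -\log_2 p(x) \;-\; K(x) \,\bigr| \;\le\; K(P) + O(1)
\]
```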
would count intuitively. Hence, taking high conditional complexities to be a liability (as the simplicity
principle does) agrees with Rock’s (1983) avoidance-of-coincidences principle which is in line with
the general viewpoint assumption as put forward in the likelihood paradigm (see previous section).
Thus, in sum, the simplicity principle’s priors are probably not veridical, but its conditionals
probably are. On the one hand, this suggests that attempts to assess if the human visual system
is guided by simplicity or by likelihood should focus on the priors, because the conditionals do
not seem to be decisive in this respect. On the other hand, the simplicity principle's difference in
veridicality between priors and conditionals might explain the experience that scenes look weird at
first glance, but less so at subsequent glances. That is, by way of co-evolution, seeing organisms
can usually move as well, and this allows them to get different views of the same scene to infer better
what the scene entails. This inference process can be modelled neatly by a recursive application of
Bayes’ rule, which means that posteriors obtained for one glance are taken as priors for the next
glance. This implies that the effect of the first priors fades away and that the conditionals become
the decisive entities. Hence, although the simplicity principle’s priors probably are not veridical,
the fact that its conditionals probably are veridical seems sufficient to reliably guide actions in
everyday situations. In other words, a visual system that aims at internal efficiency seems to yield,
as a side-effect, an evolutionarily sufficient degree of veridicality in the external world.
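A minimal sketch of this recursive updating, with invented numbers: a misleading first prior is overwhelmed after a few glances whose conditionals all favour the other reading of the scene.

```python
# Recursive Bayes over successive glances: the posterior after one glance
# becomes the prior for the next. All numbers are illustrative only.
def update(prior, likelihoods):
    """One glance: multiply the prior by the glance's conditionals, renormalize."""
    post = {h: prior[h] * likelihoods[h] for h in prior}
    z = sum(post.values())
    return {h: p / z for h, p in post.items()}

# A misleading first prior favouring the 'weird' reading of the scene...
belief = {'weird': 0.9, 'plain': 0.1}
# ...while every glance yields conditionals favouring the 'plain' reading.
glance = {'weird': 0.2, 'plain': 0.8}

for _ in range(6):
    belief = update(belief, glance)
print(round(belief['plain'], 3))  # -> 0.998: the first prior's effect has faded
```

After a handful of glances the conditionals dominate, which is the sense in which they, rather than the first priors, become the decisive entities.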
Roelfsema 2000; Lamme, Supèr, and Spekreijse 1998; Peterson 1994), the latter subprocess is taken
to be a predominantly exogenous subprocess within the visual hierarchy (Gray 1999; Pylyshyn
1999). Currently more relevant, those feature constellations are thought to be the result of the
exogenous subprocess of horizontal binding of similar features coded within visual areas. This
subprocess seems to be mediated by transient neural assemblies which also have been implicated
in the phenomenon of neuronal synchronization (Gilbert 1992). This phenomenon is discussed
next in more detail.
Neuronal synchronization is the phenomenon that neurons, in transient assemblies, temporar-
ily synchronize their activity. Not to be confused with neuroplasticity which involves changes in
connectivity, such assemblies are thought to arise when neurons shift their allegiance to different
groups by altering connection strengths (Edelman 1987), which may also imply a shift in the spec-
ificity and function of neurons (Gilbert 1992). Both theoretically (Milner 1974; von der Malsburg
1981) and empirically (e.g. Eckhorn et al. 1988, 2001; Finkel, Yen, and Menschik 1998; Fries 2005;
Gray and Singer 1989; Salinas and Sejnowski 2001), neuronal synchronization has been associated
with cortical integration and, more generally, with cognitive processing. Synchronization in the
gamma-band (30–70 Hz), in particular, has been associated with feature binding in perceptual
organization.
It is true that these associations are indicative of what neuronal synchronization is involved in,
but notice that they are not indicative of the nature of the underlying process. For instance, not
only inside but also outside connectionism, the neural network in the brain is taken to perform
parallel distributed processing (PDP). PDP, however, neither requires nor automatically implies
synchronization which, therefore, is likely to subserve a form of neuro-cognitive processing that
is more special than standard PDP. The question then is what this special form of processing
might be.
The neural side of this question has been investigated in DST. That is, by varying system param-
eters, DST has yielded valuable insights into the physical conditions under which networks may
exhibit synchronization (e.g. Buzsáki and Draguhn 2004; Campbell, Wang, and Jayaprakash 1999;
Hummel and Holyoak 2003, 2005; van Leeuwen, Steyvers, and Nooter 1997). The point now is
that SIT’s simplicity approach provides complementary insights, namely, into the cognitive side of
synchronization. To set the stage, the next subsection ventures briefly into the prospected applica-
tion of quantum physics in computing.
mind hypothesis does not seem tenable, because quantum-physical phenomena do not seem to
last long enough to be useful for neuro-cognitive processing (Chalmers 1995, 1997; Searle 1997;
Seife 2000; Stenger 1992; Tegmark 2000). However, a cognitive form of superposition still seems
needed to account for perceptual organization (see also Townsend, Wenger, and Khodadadi,
this volume, and Townsend and Nozawa's (1995) similar call for what they coined a coactive
architecture yielding supercapacity). As discussed next, SIT provides such a cognitive option; it
is perhaps somewhat speculative and technical, but it is also mathematically sound and neurally
plausible.
6 Conclusions
It remains to be seen if human perceptual organization is indeed guided by the Occamian simplicity principle, which aims at internal efficiency, but this chapter shows that this principle is a serious rival to the Helmholtzian likelihood principle, which aims at external veridicality. The
controversy between these principles is plagued by unclarities, but as reviewed, these unclarities
can be resolved—enabling a clear view on their fundamental differences. One insight then is that
empirical attempts to distinguish between them should focus on view-independent aspects of can-
didate stimulus interpretations, because view-dependent aspects do not seem to be decisive in this
respect. Their functional equivalence regarding view-dependent aspects, in turn, suggests that the
simplicity principle also has evolutionary survival value in that it yields sufficient veridicality in
everyday situations. Furthermore, the simplicity principle’s stance—that internal neuro-cognitive
mechanisms tend to yield parsimonious percepts—is not only in line with Gestalt psychology but
is also sustained by the computational explanation of neuronal synchronization as being a mani-
festation of transparallel feature processing. This explanation suggests that the simplicity principle
is neurally realized by way of a flexible cognitive architecture implemented in the relatively rigid
neural architecture of the brain.
Acknowledgment
Preparation of this chapter was supported by Methusalem grant METH/08/02 awarded to Johan
Wagemans (www.gestaltrevision.be).
References
Allen, G. (1879). ‘The origin of the sense of symmetry’. Mind 4: 301–316.
Atmanspacher, H. (2011). ‘Quantum approaches to consciousness’. In The Stanford Encyclopedia of
Philosophy, edited by E. N. Zalta. Retrieved from http://plato.stanford.edu.
Attneave, F. (1954). ‘Some informational aspects of visual perception’. Psychological Review 61: 183–193.
Attneave, F. (1982). ‘Prägnanz and soap-bubble systems: A Theoretical Exploration’. In Organization and
Representation in Perception, edited by J. Beck, pp. 11–29. Hillsdale, NJ: Erlbaum.
Bayes, T. (1958). ‘Studies in the history of probability and statistics: IX. Thomas Bayes’ (1763) Essay “Towards
Solving a Problem in the Doctrine of Chances” (in modernized notation)’. Biometrika 45: 296–315.
Biederman, I. (1987). ‘Recognition-by-components: A theory of human image understanding’. Psychological
Review 94: 115–147.
Binford, T. (1981). ‘Inferring surfaces from images’. Artificial Intelligence 17: 205–244.
Boselie, F. (1994). ‘Local and global factors in visual occlusion’. Perception 23: 517–528.
Boselie, F. and E. L. J. Leeuwenberg (1986). ‘A test of the minimum principle requires a perceptual coding
system’. Perception 15: 331–354.
Bowers, J. S. and C. J. Davis (2012a). ‘Bayesian just-so stories in psychology and neuroscience’. Psychological
Bulletin 3: 389–414.
Bowers, J. S. and C. J. Davis (2012b). ‘Is that what Bayesians believe? Reply to Griffiths, Chater, Norris, and
Pouget (2012)’. Psychological Bulletin 3: 423–426.
Brunswick, E. (1956). Perception and the Representative Design of Psychological Experiments. Berkeley,
CA: University of California Press.
Buzsáki, G. and A. Draguhn (2004). ‘Neuronal oscillations in cortical networks’. Science 304: 1926–1929.
Campbell, S. R., D. L. Wang, and C. Jayaprakash (1999). ‘Synchrony and desynchrony in integrate-and-fire
oscillators’. Neural Computation 11: 1595–1619.
Chaitin, G. J. (1969). ‘On the length of programs for computing finite binary sequences: Statistical
considerations’. Journal of the Association for Computing Machinery 16: 145–159.
Chalmers, D. J. (1995). ‘Facing up to the problem of consciousness’. Journal of Consciousness Studies 2:
200–219.
Chalmers, D. J. (1997). The Conscious Mind: In Search of a Fundamental Theory. Oxford: Oxford University
Press.
Chater, N. (1996). ‘Reconciling simplicity and likelihood principles in perceptual organization’.
Psychological Review 103: 566–581.
Collard, R. F. A. and H. F. J. M. Buffart (1983). ‘Minimization of structural information: A set-theoretical
approach’. Pattern Recognition 16: 231–242.
Dijkstra, E. W. (1959). ‘A note on two problems in connexion with graphs’. Numerische Mathematik 1:
269–271.
Eckhorn, R., R. Bauer, W. Jordan, M. Brosch, W. Kruse, M. Munk, and H. J. Reitboeck (1988). ‘Coherent
oscillations: A mechanism of feature linking in the visual cortex?’ Biological Cybernetics 60: 121–130.
Eckhorn, R., A. Bruns, M. Saam, A. Gail, A. Gabriel, and H. J. Brinksmeyer (2001). ‘Flexible cortical
gamma-band correlations suggest neural principles of visual processing’. Visual Cognition 8: 519–530.
Edelman, G. M. (1987). Neural Darwinism: The Theory of Neuronal Group Selection. New York: Basic Books.
Feldman, J. (2007). ‘Formation of visual “objects” in the early computation of spatial relations’. Perception
and Psychophysics 69: 816–827.
Feldman, J. (2009). ‘Bayes and the simplicity principle in perception’. Psychological Review 116: 875–887.
Feynman, R. (1982). ‘Simulating physics with computers’. International Journal of Theoretical Physics 21:
467–488.
Finkel, L. H., S.-C. Yen, and E. D. Menschik (1998). ‘Synchronization: The computational currency of
cognition’. In ICANN 98, Proceedings of the 8th International Conference on Artificial Neural Networks
(Skövde, Sweden: 2–4 September 1998), edited by L. Niklasson, M. Boden, and T. Ziemke. New York:
Springer-Verlag.
Fries, P. (2005). ‘A mechanism for cognitive dynamics: Neuronal communication through neuronal
coherence’. Trends in Cognitive Sciences 9: 474–480.
Garner, W. R. (1962). Uncertainty and Structure as Psychological Concepts. New York: Wiley.
Garner, W. R. (1970). ‘Good patterns have few alternatives’. American Scientist 58: 34–42.
Garner, W. R. (1974). The Processing of Information and Structure. Potomac, MD: Erlbaum.
Gigerenzer, G. and Murray, D. J. (1987). Cognition as Intuitive Statistics. Hillsdale, NJ: Erlbaum.
Gilbert, C. D. (1992). ‘Horizontal integration and cortical dynamics’. Neuron 9: 1–13.
Gray, C. M. (1999). ‘The temporal correlation hypothesis of visual feature integration: Still alive and well’.
Neuron 24: 31–47.
Gray, C. M. and W. Singer (1989). ‘Stimulus-specific neuronal oscillations in orientation columns of cat
visual cortex’. Proceedings of the National Academy of Sciences USA 86: 1698–1702.
Gregory, R. L. (1980). ‘Perceptions as hypotheses’. Philosophical Transactions of the Royal Society of London
B 290: 181–197.
Hagar, A. (2011). ‘Quantum computing’. In The Stanford Encyclopedia of Philosophy, edited by E. N. Zalta.
Retrieved from http://plato.stanford.edu.
Hatfield, G. C. and W. Epstein (1985). ‘The status of the minimum principle in the theoretical analysis of
visual perception’. Psychological Bulletin 97: 155–186.
Hochberg, J. E. and E. McAlister (1953). ‘A quantitative approach to figural “goodness” ’. Journal of
Experimental Psychology 46: 361–364.
Hoffman, D. D. (1998). Visual Intelligence. New York: Norton.
Howe, C. Q. and D. Purves (2004). ‘Size contrast and assimilation explained by the statistics of natural
scene geometry’. Journal of Cognitive Neuroscience 16: 90–102.
Howe, C. Q. and D. Purves (2005). ‘Natural-scene geometry predicts the perception of angles and line
orientation’. Proceedings of the National Academy of Sciences USA 102: 1228–1233.
Hummel, J. E. and K. J. Holyoak (2003). ‘A symbolic-connectionist theory of relational inference and
generalization’. Psychological Review 110: 220–264.
Hummel, J. E. and K. J. Holyoak (2005). ‘Relational reasoning in a neurally-plausible cognitive architecture:
An overview of the LISA project’. Current Directions in Cognitive Science 14: 153–157.
Jilk, D. J., C. Lebiere, C. O’Reilly, and J. R. Anderson (2008). ‘SAL: An explicitly pluralistic cognitive
architecture’. Journal of Experimental and Theoretical Artificial Intelligence 20: 197–218.
Knill, D. C. and W. Richards (eds) (1996). Perception as Bayesian Inference. Cambridge: Cambridge
University Press.
Koffka, K. (1935). Principles of Gestalt Psychology. London: Routledge and Kegan Paul.
Köhler, W. (1920). Die physischen Gestalten in Ruhe und im stationären Zustand [Static and stationary
physical shapes]. Braunschweig: Vieweg.
Kolmogorov, A. N. (1965). ‘Three approaches to the quantitative definition of information’. Problems in
Information Transmission 1: 1–7.
Lachmann, T. and C. van Leeuwen (2005a). ‘Individual pattern representations are context-independent,
but their collective representation is context-dependent’. Quarterly Journal of Experimental Psychology:
Human Experimental Psychology 58: 1265–1294.
Lachmann, T. and C. van Leeuwen (2005b). ‘Task-invariant aspects of goodness in perceptual
representation’. Quarterly Journal of Experimental Psychology: Human Experimental Psychology 58:
1295–1310.
Lamme, V. A. F. and P. R. Roelfsema (2000). ‘The distinct modes of vision offered by feedforward and
recurrent processing’. Trends in Neuroscience 23: 571–579.
Lamme, V. A. F., H. Supèr, and H. Spekreijse (1998). ‘Feedforward, horizontal, and feedback processing in
the visual cortex’. Current Opinion in Neurobiology 8: 529–535.
Leeuwenberg, E. L. J. (1968). Structural Information of Visual Patterns: An Efficient Coding System in
Perception. The Hague: Mouton and Co.
Leeuwenberg, E. L. J. (1969). ‘Quantitative specification of information in sequential patterns’. Psychological
Review 76: 216–220.
Leeuwenberg, E. L. J. (1971). ‘A perceptual coding language for visual and auditory patterns’. American
Journal of Psychology 84: 307–349.
Leeuwenberg, E. L. J. and F. Boselie (1988). ‘Against the likelihood principle in visual form perception’.
Psychological Review 95: 485–491.
Leeuwenberg, E. L. J. and P. A. van der Helm (2013). Structural Information Theory: The Simplicity of Visual
Form. Cambridge: Cambridge University Press.
Leeuwenberg, E. L. J., P. A. van der Helm, and R. J. van Lier (1994). ‘From geons to structure: A note on
object classification’. Perception 23: 505–515.
Li, M. and P. Vitányi (1997). An Introduction to Kolmogorov Complexity and its Applications (2nd edn).
New York: Springer-Verlag.
Mach, E. (1959). The Analysis of Sensations and the Relation of the Physical to the Psychical.
New York: Dover. (Originally published 1922.)
MacKay, D. (1950). ‘Quantal aspects of scientific information’. Philosophical Magazine 41: 289–301.
Chapter 51
Gestalts as Ecological Templates
Visual Awareness
Open your eyes in bright daylight: what happens? Typically you will be immediately aware of
the scene in front of you. There is nothing you can do about it; the ‘presentations’ simply happen
to you. The presentations follow each other at a rate of about a dozen a second (Brown 1996;
VanRullen and Koch 2003). Typically each one is similar to the immediately preceding one,
though occasionally sudden changes occur. Changes appear to be of both endogenous and
exogenous origin1.
You have no control over the presentations, except by way of voluntary eye movements, and
so forth. But many of the eye fixations are generated endogenously, rather than voluntarily. They
also ‘happen to you’, although you won’t notice. They are part of what I propose to call your
‘zombie nature’2. Apart from immediate awareness you have a stream of cognitions and reflective
thoughts. The latter are your doing; you largely have control over your thoughts, although a
minority ‘simply occur’ to you. In cases where you know you are experiencing an ‘illusion’, you
usually can’t ‘correct’ your awareness3.
Your awareness is your reality in the sense that it is simply given to you4. Introspectively, a ‘cor-
rected illusion’ in reflective thought is much ‘less real’ than the illusion in your immediate visual
awareness. Thoughts may be right or wrong (your rational mind knows that), but awareness is
beyond this or that, right or wrong (your gut feelings depend on that).
1 The generic example is the depth flips of a ‘Necker cube’ (Necker 1832).
2 The reference is to ‘philosophical zombies’. See the entry on philosophical zombies in the Stanford Encyclopedia of Philosophy at <http://plato.stanford.edu/entries/zombies/>.
3 See <http://en.wikipedia.org/wiki/Optical_illusion> on ‘Optical Illusions’.
4 Notice that my use of ‘reality’ is phenomenological, and different from what is often called ‘physical reality’. The German distinction between Realität and Wirklichkeit does not seem to have an equivalent in English.
5 Mistaking a rope for a snake refers, of course, to the generic example of illusion from the Vedanta. See <http://en.wikipedia.org/wiki/Vedanta>.
Gestalts as Ecological Templates 1047
quality like ‘redness’6, except that it carries an emotional load that pure qualities lack. But it is only
a matter of degree; it is not that redness is devoid of emotional charge.
7 See <http://en.wikipedia.org/wiki/God%27s_eye_view>.
8 The ‘being’ of an object such as a ‘round square’ is famously discussed in Alexius Meinong’s theory of objects (Meinong 1899).
9 On the Higgs boson, see <http://en.wikipedia.org/wiki/Higgs_boson>.
1048 Koenderink
• Limited anatomical or physiological resources are the causes of illusion and error (your cat or
dog has only a limited view of the world).
• Modern western man comes close(st) to seeing things as they really are (as widespread practices
like rain dancing, black magic, and so forth, suggest).
The core concept is objective reality. In western philosophy, Kant’s Copernican revolution10
replaced it with transcendental idealism. However, this never really influenced the attitude of
mainstream science in a serious way, nor that of generally accepted common sense. Holding such
convictions by default (perhaps unfortunately, few people question them or even think about
them) leads to numerous further misunderstandings. Fortunately, there are (and have always
been) thinkers who expressly reject the God’s Eye View (e.g.11). However, it is perhaps fair to say
that they represent a marginal stream of thought.
The causal theory of perception is an idea that purports to bridge the ontological chasm between
the two realms of the physical and the mental. I will call it a ‘bridging hypothesis’. This particular
bridging hypothesis is based on the God’s Eye View; I used it only to introduce the concept. But
the concept of ‘bridging hypothesis’ as such is something one cannot do without.
A number of distinct bridging hypotheses have been proposed. I mention only a few. A common
one is the causal theory of perception. Another is the notion of the Gestalt school that Gestalts
in visual awareness are isomorphic with certain brain activities (Kohler 1920). Eliminative mate-
rialists hold the notion that ‘pain’ is really nothing but the ‘firing of C-fibres’12. The notion that
consciousness is due to activity of a NCC (‘Neural Centre of Consciousness’) is hardly a bridg-
ing hypothesis, but more like a theory held by one of Molière’s physicians in the farce Le Malade imaginaire (e.g. opium induces sleep due to its virtus dormitiva). One of the few bridging
hypotheses that makes any sense to me is the one proposed by Erwin Schrödinger (of quantum
mechanical fame).
Schrödinger proposed that awareness is generated when the organism learns (Schrödinger
1958). All learning is necessarily by mistake, that is, by the falsification of expectations
through actual experience. This is an idea that finds wide acceptance in biology, psychology, and
philosophy. But Schrödinger gives it a novel twist: you ‘meet the world’ when your expectation is
suddenly exposed as wrong, thereby initiating a spark of enlightenment so to speak. Awareness
can be understood as a series of such micro-enlightenments. I find Schrödinger’s proposal attrac-
tive, although there is no way to prove it in the framework of the sciences because it is a pure
bridging hypothesis. However, it leads to interesting consequences, thus it has its value as a heuris-
tic device. Moreover, the alternatives (the NCC and so forth13) seem to me just silly in comparison.
I’ll refer to it as the Schrödinger principle.
10 See Kant’s Preface to the second edition of the Critique of Pure Reason (1787, a serious revision of the first edition of 1781).
11 Alfred Korzybski, ‘Science & Sanity’, available at <http://esgs.free.fr/uk/art/sands.htm>.
12 ‘Type physicalism’ proposes that mental event types (such as pain in an individual) are identical with specific event types in the brain. In this case the ‘C-fibre firings’ in the individual. Of course, this extends to all sentient beings and all times.
13 E.g. Francis Crick: ‘You, your joys and your sorrows, your memories and your ambitions, your sense of personal identity and free will, are in fact no more than the behaviour of a vast assembly of nerve cells and their associated molecules’. See <http://www.consciousentities.com/crick.htm>.
15 ‘Twenty questions’ is a spoken parlor game. It originated in the US, where it became very popular in the late 1940s (through a radio quiz program). The game spread through Europe and was popular till (at least) the 1990s. An online version can be found at <http://20q.net/>.
plot. In this case the plot is template-like, a bit like von Uexküll’s16 ‘seek image’ (Suchbild) in animal
behaviour. Did you lose completely? No, you collected a long list of places where not to look, pos-
sibly a great time saver. More importantly, you detected the need for a paradigm shift.
The Sherlock model centres upon the framing of questions. Notice that the meaning of an answer
is defined by the question, for questions imply a set of acceptable answers. That is why a discarded
cigarette butt—otherwise an irrelevant object—may bring the butler to the gallows. Questions are
like computer formats that define whether a certain sequence of key presses will be interpreted
as a number, a password, a command, or what have you. The meaning is not in the sequence of
key presses, but in the currently active format. This is how awareness is generated (here I use
Schrödinger’s bridging hypothesis!), and how awareness becomes composed of meanings.
I will refer to this important principle as Sherlock’s principle: ‘The meaning of an answer is in
the question; questions derive from a plot.’
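The format analogy can be made concrete in a few lines of code. This is only an illustrative sketch; the three ‘formats’ below (`as_number`, `as_password`, `as_command`) are hypothetical parsers of my own invention, not anything from the chapter:

```python
# Sketch: one raw key-press sequence, three 'formats'. All three parsers
# are hypothetical illustrations.

raw = "1234"  # a fixed sequence of key presses

def as_number(keys: str) -> int:
    """Format 'number': the keys denote a quantity."""
    return int(keys)

def as_password(keys: str) -> str:
    """Format 'password': the keys are a credential to be checked."""
    return "accepted" if keys == "1234" else "rejected"

def as_command(keys: str) -> str:
    """Format 'command': the keys index a command table."""
    table = {"1234": "OPEN_DOOR"}
    return table.get(keys, "UNKNOWN")

# The 'meaning' resides in the currently active format, not in the keys:
for fmt in (as_number, as_password, as_command):
    print(fmt.__name__, "->", fmt(raw))
```

The same four key presses yield a quantity, a credential, or a command; nothing in the raw sequence itself decides which.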
‘Meaning’ cannot be computed from mere structure, as the causal theory of perception implies.
Algorithms (of the ‘inverse optics’ kind in vision) merely transform meaningless structure into
equally meaningless structure. In most computer applications the meaning is provided by a user,
the computer simply computing a sequence of symbols or an array of pixels. In the case of sentient
beings, the meaning has to be intrinsic, that is to say, imposed by the agent’s intentionality. This does
not imply that the meaning is a mere arbitrary hallucination. It will be confronted with the structure
currently in the perceptual front-ends. Such ‘reality checks’ keep the system from free-wheeling.
‘Controlled hallucination’ is like ‘analysis by synthesis’, and very different from brute hallucination.
Although it is clear how meaning might be transferred so to speak, it remains unclear how the
agent might get at its plots. One a priori principle that appears rational is that any plot should ulti-
mately be due to repeated, uncontradicted experience. ‘Plots’ are similar to Searle’s (1983) ‘local
background’, Rumelhart’s (1980) ‘schemata’, or Minsky’s (1975) ‘frames’. The alternative would be
that plots might be present at birth, or might be revealed by some supernatural power. The lat-
ter possibility should be reserved to religion, as it certainly lies outside the sciences. The former
one is more interesting. It is certain that organisms are not born without structure, anatomically
and physiologically. No doubt certain abilities involving (even extensive) brain activity are pre-
sent prior to actual experiences. However, to hold that such actions would be accompanied by
immediate awareness would be to fall back on revelation. I will consider them part of the zombie
nature. Of course, one may (eventually) become aware of one’s actions, even automatic ones after
the fact. After all, the body and its movements are just another part of the physical world.
or medicine) in 197317. A most important immediate forerunner was Jakob von Uexküll16, whose
marks are abundantly present in conceptual biology, psychology, and philosophy.
Important instances of animal behaviour over a wide range of species are Fixed Action Patterns
(FAPs), and Releasers18. These might be said to make up most of ‘instinctive behaviour’. What is
striking about the FAPs is that they occur even when the circumstances are not appropriate.
For instance, birds have been observed feeding fish19, apparently because they ‘mistake the open
mouths of the fish for the open beaks of their chicks’. However, such an interpretation is no doubt
too anthropomorphic. Geese that roll eggs to their nest appear to act rationally20. When they do
the same with a potato one may suspect low visual acuity, and perhaps defective spectral resolu-
tion. However, the geese keep ‘rolling’ even when you remove the egg. Apparently, they can’t help
‘rolling’ once locked into the action pattern. The action can also be triggered by a brick placed in
the vicinity of the nest. The attempts of the bird to ‘roll the egg (brick) to the nest’ appear comi-
cal to the human observer. In many cases the ‘releasers’ trigger behaviour that even threatens the
survival of the species. A spectacular example involves male Australian Jewel beetles, which mate with beer bottles left about the roads to the point of exhaustion21. This in spite of the fact that the optical system of the
beetle easily resolves the difference between a beer bottle and a female. Such ‘mistakes’ can actu-
ally be quite useful, for instance, certain dairying ants appear to milk aphids (plant lice), yet the
ants ‘really’ mistake the rear ends of the lice for the heads of their fellow ants22.
What’s in these animals’ minds? Do they have any? Or is ‘mind’ synonymous with ‘human mind’
or even ‘my mind’? A major reference suggests that humans are unique (Genesis 1:26–27: ‘And God
said, Let us make man in our image. So God created man in his own image, in the image of God
created he him . . . ’23). However, the generic knowledge of medical doctors and veterinarians is
pretty much identical.
Such animal examples remind one of the fact that reflective thought often ‘knows’ that certain
spectacular ‘visual illusions’ in awareness are indeed illusory, whereas awareness cannot be
‘corrected’ at all. One says that ‘vision is cognitively impenetrable’. Thus the ‘fixed action patterns’
and ‘releasers’ of ethology have many features in common with the Gestalts in human vision in
that they appear to be prepackaged responses that cannot be circumvented by the animal. On
the level of immediate awareness humans are not that different from what ethology reveals in
animal behaviour. I give some striking examples of template-like phenomena in human percep-
tion below. Although the emphasis tends to be on the illusory character of such phenomena, the
positive side is their adaptive significance. All well-adapted user interfaces have to be illusory
(also below).
17 See <http://www.nobelprize.org/nobel_prizes/medicine/laureates/1973/>.
watch?v=vUNZv-ByPkU>.
21 See ‘University of Toronto Mississauga professor wins Ig Nobel Prize for beer, sex research’, at <http://www.eurekalert.org/pub_releases/2011-09/uot-uot092911.php>.
22 Video clips at <http://www.youtube.com/watch?v=tE7UL2pAaL0>, <http://www.youtube.com/watch?v=IcdAgvroj5w>, <http://www.youtube.com/watch?v=NybgIxjlAGQ>, and <http://www.youtube.com/watch?v=43id_NRajDo>.
23 King James Bible (completed 1611). Available online at the Electronic Text Center of the University of Virginia: <http://www.rasmusen.org/special/USEFUL/CHILDREN/Things-to-do-words/literature-kids/Childrens.bible.book/bible_kjv/kjv/etext.lib.virginia.edu/kjv.browse.html>.
Gibson also hardly recognizes the Leibnizian harmony, which always has been a major source
of wonder to many researchers of the animal world. This appears to reflect the well-known dif-
ference in perspective between the Anglo-Saxon and continental European traditions in general.
An account in contemporary terms might run as follows. The physical world (Welt) is per-
haps the least clearly defined entity. For our (biologically inspired) purposes we certainly
don’t need reference to quarks26, Dirac’s equation27, and so forth. The ‘physical world’ is the
26 On quarks <http://en.wikipedia.org/wiki/Quark>.
27 On Dirac’s equation <http://en.wikipedia.org/wiki/Dirac_equation>.
everyday world as described by the applied sciences on scales relevant to humans. Although
very vague, one may simply consider a huge chunk with respect to a large variety of scales.
An overdose doesn’t hurt, because the physical world as such is irrelevant to the organism
(Turvey, Shaw, Reed, and Mace 1981). The Umwelt is a subset of the physical world that might
conceivably involve the organism, because it might act on the organism’s body (in the
widest sense), or might be the target of the organism’s own actions. Thus the Umwelt is different from
a mere geographical niche (Umgebung), which is Gibson’s use. The body itself is part of the
Umwelt.
We need additional distinctions. The ‘sense world’ (Merkwelt) is a subset of the Umwelt that
might causally affect the organism’s sense organs. The ‘act world’ (Wirkwelt) is a subset of the
Umwelt that might be causally affected by the organism’s effectors. Sense world and act world
allow of dual descriptions. One is in terms of the causal nexus (mainly physics) of the Umwelt, the
other is in terms of neural activity in the body of the organism. In the latter case one thinks of the
act world as the ‘motor system’ (in the most general sense, including the glandular system, etc.),
and of the sense world as the ‘sensoria’ with their associated neural ‘front-ends’. All these above
distinctions in what is usually simply called ‘world’ are basic in discussing organisms, and are
commonly introduced in modern accounts (MacIver 2009).
In virtually all organisms one encounters closed loops of sensorimotor behaviour. An action in
the act world causes an action in the sense world, the chain being closed in the Umwelt. Activity in
the sense world causes actions in the act world, the chain being closed in the brain. Umwelt, sense
world, brain and act world are nodes in a single closed loop. The brain may complicate this loop in
numerous ways. For instance, an intended motor action yields an expectation of consequent sen-
sor activity, the so-called reafference signal (von Uexküll’s ‘new loop’ (Uexküll 1926), now usually
associated with von Holst and Mittelstaedt 1950). The reafference is an expectation that may, or
may not, happen to successfully predict sensory effects. Mismatches are informative, because the
organism ‘meets the Umwelt’ in the mismatch, thus this may again lead to awareness according to
Schrödinger’s principle.
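The loop just described can be caricatured in a few lines. Everything here is an illustrative assumption of mine (the Umwelt as a simple function, the expectation as a single parameter), not the chapter’s own formalism; the point is only that awareness ‘sparks’ exactly where the reafference fails:

```python
# Toy functional loop with a reafference check; all names and numbers are
# illustrative assumptions.

def umwelt(act: float) -> float:
    """The Umwelt closes the loop: an act returns a sensory consequence."""
    return 2.0 * act  # a hidden regularity the agent has yet to learn

class Agent:
    def __init__(self) -> None:
        self.gain = 1.0  # current expectation: sensed = gain * act

    def step(self, act: float) -> bool:
        expected = self.gain * act       # reafference: predicted sensation
        sensed = umwelt(act)             # actual sensation via the Umwelt
        mismatch = sensed - expected
        if abs(mismatch) > 1e-6:         # expectation falsified ...
            self.gain += mismatch / act  # ... so the agent learns, and
            return True                  # a 'spark' of awareness occurs
        return False                     # uncontradicted: zombie mode

agent = Agent()
sparks = [agent.step(1.0) for _ in range(5)]
print(sparks)  # [True, False, False, False, False]
```

Once the expectation goes uncontradicted, the loop keeps running but nothing further ‘happens to’ the agent: Schrödinger’s principle in miniature.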
In these functional loops (Funktionskreisen) certain invariants eventually obtain a ‘functional
tone’, an envelope based on frequent uncontradicted experience. Since an invariant may occur
in many intertwined functional loops, such functional tones may acquire multiple degrees of
freedom. Eventually they become carriers of meaning. When traced to the Umwelt they are like
Gibson’s affordances, although that strips them from their roots in the functional loops, and
moves them from their proper ontological level.
One important point is that the functional tone derives from uncontradicted experience. I will
refer to this as von Uexküll’s principle: ‘The form of awareness reflects prior experience. There is
no awareness from “revelation” ’. Of course, this also involves Schrödinger’s bridging hypothesis
again. I return to this point later.
The ‘inner world’ (Innenwelt) of the organism can be thought of as a ‘projection’ of the func-
tional organization (as implemented by the whole body, including the brain) on the Umwelt. It
is the implementation of intentionality. Without the organism inner world and Umwelt disap-
pear, and one is left with the meaningless chaos that is the physical world. This is a revolutionary
notion with, for many, perhaps shocking consequences. It implies, for instance, that even space
and time—as you know them—are your constructions, not pre-existing entities that you happen
to find yourself immersed in. There are indeed many instances of animals that lack space and/or
time as judged from the structure of their sense and act worlds. Humans also appear to construct
their own space-times (Koenderink, Richards, and van Doorn 2012).
As von Uexküll remarks, the inner world of an organism must forever remain a closed book
to us. It can only be experienced from within, and cannot possibly be revealed by external
observation. This recommended him to the behaviourist movement28 in the United States of the
early twentieth century. The inner world is mental. It is ‘what it is like’ to be a certain being. He
recognizes that we will never be able to enter the inner world of other beings. This is echoed by
Thomas Nagel in his famous paper ‘What is it like to be a bat?’ (Nagel 1974).
Notice that von Uexküll’s account (and the consequent account from ethology) suggests logical
and mutually complementary tasks for anatomy, physiology, brain science, ethology, behavior-
istic psychology, and cognitive science. It also treats phenomenological research as beyond the
realm of the sciences. Of course, this assumes that the various disciplines ‘play fair’, and stick to
their assigned areas of endeavor and discourse. Perhaps unfortunately, brain scientists engaging in
‘mind talk’, and psychologists engaging in ‘brain talk’, are commonly overstepping their bounda-
ries (Manzotti and Moderato 2010).
In my view phenomenological research is not altogether ruled out as a science, as von Uexküll
implies, because it applies singularly to homo sapiens, whereas he considers general, typically alien
phenomenology. In the human case a ‘shared subjectivity’ is possible due to the fact that individu-
als cannot be pried loose from their embedding in a social structure. This enables an empathic
or ‘silent’ understanding between individuals, a ‘pointing to the moon’29. Successful pointing, as
a silent communication device, implies empathic understanding (Montag, Gallinat, and Heinz
2008; Stein 1917). When ‘pointing to the moon’, your dog will look at your finger, and so do young
children. However, dogs will never ‘get it’, whereas little children soon will.
An example would be a ‘visual proof ’, as frequently used by the Gestaltists. ‘Kanizsa’s triangle’
(Kanizsa 1955) tells us something, ‘we know not what’, but we all agree. Is it a scientific fact? That
is a matter of definition, but it is definitely a fact of experimental phenomenology, because the
triangle belongs to the ‘inner world’. When neuro-cognition purports to explore it, it oversteps its
boundaries. There is a place for experimental phenomenology because we are humans. Neither
behaviourism nor cognitive science—by design—addresses the inner world.
the-finger-pointing-to-the-moon/>.
30 Elizabeth S. Spelke’s website at the Department of Psychology of Harvard University has a good list of important publications: <http://www.wjh.harvard.edu/~lds/index.html?spelke.html>.
31 Giorgio Vallortigara’s website at the Center for Mind/Brain Sciences of the University of Trento has a useful list of publications: <http://www.unitn.it/en/cimec/11761/giorgio-vallortigara>.
32 See <http://en.wikipedia.org/wiki/Animal_cognition>.
33 See <http://en.wikipedia.org/wiki/Will_to_power>.
for specific resistance, it is akin to questioning the world. Eventually this leads to ‘presentations’,
that is, awareness (Schopenhauer’s Die Welt als Wille (poking) und Vorstellung (presentation)
(Schopenhauer 1818/1819)). In humans this evolves into a nexus of qualities, meanings, and
emotions.
The process is systolic, the microgenesis of the next presentation going on, even as one experi-
ences the previous one. The timescale is largely limited by the fact that the perceptual front-end
buffers are continually being overwritten, thus there is only so much time for a reality check.
A natural termination is enforced when the volley of threads launched by the microgenesis has
been tried against front-end activity. Then a next systole is required in which some threads are
killed, others diversified (split into several independent threads). Thus the process is much like
any variety of genetic algorithm—for instance, ‘harmony search’ (Geem, Kim, and Loganathan
2001). One imagines the individual threads to be fairly simple, because any single presentation
cannot be very complicated. The general gist will be kept, and the focal structure is probably lim-
ited by the magical number seven plus or minus two.
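As a caricature, the systolic volley-of-threads process can be sketched as a tiny generate-and-test loop in the spirit of a genetic algorithm. Every particular below (the scoring function, the mutation size, the survivor count) is an illustrative assumption of mine, not anything proposed in the chapter:

```python
# The 'systole' as a generate-and-test loop in the spirit of a genetic
# algorithm; scoring function, mutation size, and survivor count are
# all illustrative assumptions.
import random

random.seed(1)

TARGET = 0.7     # stands in for the current front-end buffer activity
FOCAL_LIMIT = 7  # the 'magical number seven' cap on focal structure

def reality_check(thread: float) -> float:
    """Score a hypothesis ('thread') against front-end activity."""
    return -abs(thread - TARGET)

threads = [random.random() for _ in range(FOCAL_LIMIT)]
for systole in range(20):
    threads.sort(key=reality_check, reverse=True)
    survivors = threads[: (FOCAL_LIMIT + 1) // 2]    # kill the worst threads
    offspring = [t + random.gauss(0.0, 0.05) for t in survivors]  # diversify
    threads = (survivors + offspring)[:FOCAL_LIMIT]  # the next presentation

best = max(threads, key=reality_check)
```

Each pass kills the threads that fail the reality check and diversifies the survivors, so the ‘presentation’ drifts toward whatever the front-end buffers currently support, while the population never exceeds the focal limit.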
The cognitive processes are distinct from this as they have their own agenda of plots. These
plots may be injected to bias microgenesis, the resulting awareness again making its way to the
input of cognitive processes. Apart from triggering plots, the cognitive processes generate con-
cepts that may enter reflective thought. In a way, the world on and in which cognition works is
awareness.
Reflective thought, finally, may be expected to launch novel cognitive processes. It works on the
conceptual level to confabulate ‘stories’ that account for the sequence of good looks. The world on
and in which reflective thought works is cognition.
Thus the various levels are intertwined in complicated ways. To try to understand vision in
simple terms as ‘bottom up this—top down that’ is far too simplistic to have much explanatory
power.
Perhaps the best-known example is the ‘desktop’ paradigm of laptop computers36. Consider
the process of deleting a text file. The text file ‘is’ an icon on the desktop. You use the mouse to
‘drag it’ to the ‘trash’, which is another icon on the desktop. As you place the text file on top of the
trash, it magically disappears. What really happened? That depends. To the interface program-
mer you moved the mouse, thus defining a sequence of screen locations. The program writes
the empty desktop over the text file icon, then writes the text file icon in its new location over
the desktop. This process is terminated once the mouse is over the trash. The text file icon is not
redrawn; instead a message is sent to the file manager. The file manager is another program. It
manages nested lists of files. It deletes the text file from the list. This deletion generates a signal
to the ‘system’ (another program) that ‘frees’ the space on the disk (or somewhere else) where
the text file was stored. Nothing happened to the text file (a hacker may still ‘retrieve it’). Only
a reference was deleted and the desktop picture changed. The systems programmer has another
story. The electronics engineer another story still. The chips technician has yet another story, and
so forth. The user doesn’t have to know, nor does the user want to know. The fact that the text file
icon suddenly disappeared was encouraging (the ‘text file disappeared in the trash’). Are text files
like such icons? No way! The text file is different things to different people. Fortunately, the user
doesn’t need to know.
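The file-manager story can be condensed into a toy sketch. The data structures below are illustrative assumptions, not any real file system:

```python
# Toy file manager: 'deleting' removes only the reference, not the bytes.
# All structures here are illustrative assumptions.

disk = {7: b"Dear Sir, ..."}    # block store: block id -> raw bytes
directory = {"letter.txt": 7}   # the file manager's list: name -> block id
free_blocks = set()             # blocks the 'system' may reuse

def delete(name: str) -> None:
    block = directory.pop(name)  # the file manager drops the list entry
    free_blocks.add(block)       # the space is marked reusable, not wiped

delete("letter.txt")

print("letter.txt" in directory)  # False: the icon/reference is gone
print(disk[7])                    # the bytes survive on the block store,
                                  # which is why they may still be 'retrieved'
```

Nothing happened to the stored bytes; only the reference was dropped, which is exactly why a hacker may still recover the ‘deleted’ text file.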
It is actually a good thing not to know what goes on in the physical environment you find your-
self in. You don’t want to be a systems programmer, an electronic technician, a chip specialist, a
solid-state physicist, a quantum mechanics expert . . . just to delete a text file! Moreover, you don’t
want to know what is inside the box you call ‘computer’ (vacuum tubes, transistors, silicon chips,
sawdust, empty beer cans, or what have you). Thus desktop interfaces are good. Everybody agrees on
that. The surprising thing is that people somehow hesitate when talking about perception and the
physical world. Most contemporary philosophers consider it problematic that we do not have the
kind of awareness that might be designated ‘veridical’. (Strangely enough, it is usually silently understood that we all know what might be meant by ‘veridical’. Does it include string theory37? This is the
God’s Eye View again.)
Biological evolution38 doesn’t care about all this. It simply optimizes biological fitness39. As a
consequence strange things may happen, as is amply recorded in ethological research. It is not
that humans are exempt from such strange behaviours either. After all, rain dancing40, black magic41,
and various religious beliefs42 are still widespread. Many of these cases are beneficial to the agents,
some not. In all cases the agents are ‘fooled’ by their user interfaces.
36 See <http://en.wikipedia.org/wiki/Desktop_metaphor>.
37 See <http://en.wikipedia.org/wiki/String_theory>.
38 See <http://en.wikipedia.org/wiki/Evolution>.
39 See <http://en.wikipedia.org/wiki/Fitness_%28biology%29>.
40 See <http://en.wikipedia.org/wiki/Rainmaking_%28ritual%29>.
41 See <http://en.wikipedia.org/wiki/Black_magic>.
42 See <http://en.wikipedia.org/wiki/Religion>.
and so forth, possibly organized on a page in some pleasant pattern, unless you are using the
UNIX vi editor), it simply stands for it like a name. The icon has nothing to do with what you mean beyond a mere conventional association.
Most people are not aware of this, or prefer to forget it. When they have accidentally deleted their text file, they start searching for its icon(!). Yet only the icon is really gone,
whereas the text file (at least immediately after the act of deletion) can probably still be recovered,
thus is still ‘on’ your computer. The icon is like the Gestalt, quality, or meaning, in your visual
awareness. Although the elements of your immediate awareness are not physical objects, they are
indeed your reality. But they are your reality, and nothing beyond that. That does not mean they
have no useful existence. As you change your text file (you probably wrote it in the first place),
it will have different effects as you send it as a letter. Using the internet—at considerable remove
from ‘daily reality’—you can donate half of your income to a house for stray cats. This will have
real consequences to your life, for instance, it may prevent you from paying your rent, causing
you to have to sleep in the streets. Although sleeping in the streets is tough (‘real’ reality), it is still
experienced in terms of your user interface. Everything is.
In immediate visual awareness you encounter qualities and meanings, packaged as Gestalts.
These are, no doubt, elements of your optical user interface. They are template objects. Consider
a few common templates:
• figures and grounds43;
• volumetric objects;
• causal interactions (Michotte 1946);
and so forth.
What about them? The familiar phenomenon of ‘figure–ground reversal’ is sufficient evidence for the volatile nature of this distinction. You know, no doubt, that you see only the frontal surfaces of ‘volumetric’ objects. The apple you see may actually turn out to be hollow on turning it around. Causal interactions may be faked, as in a magician’s show, or in the interaction of the text file with the trash icon.
Here I will discuss a few fairly obvious and common reflections on the fact that human visual awareness is a ‘user interface’. One spots this because the elements of the user interface tend to be abiding templates, rather than ‘solutions of the inverse optics problem’. I simply give some obvious examples. Many more can be found; one need only look for them. What is perhaps surprising is that mainstream vision research has failed to notice these facts, for one is not talking of minor effects! The reason is, no doubt, that they were never looked for.
External local sign. ‘Local sign’ is a concept due to Lotze (1852). It is a place label on fibres of the optic nerve, a solution to the problem of how the brain ‘knows where the fibres are from’. Tarachopia (Hess 1982) appears to be an amblyopia revealing a defective local sign. ‘External local sign’ (Koenderink, van Doorn, and Todd 2009) assigns a ‘visual ray’ (Burton 1945), that is, a direction in the world, in oculocentric coordinates, to fibres in the optic nerve. Early speculations about the origin of external local sign are due to Berkeley (1709). Otherwise hardly any phenomenological research exists on the topic.
In a simple experiment we mapped external local sign throughout the field of view of a few dozen observers. One simple overall measure of external local sign is the angular spread of the visual rays over the full field of view. This is the diameter of the ‘visual field’, which is the subjective correlate of the field of view. Whereas the field of view of the human eye subtends about 180º, we find a wide spectrum of visual field diameters. The distribution appears to be bimodal, most observers having a visual field of about 90º across. Thus most observers experience visual objects as far more ‘in front of them’ than they are.
43 See <http://en.wikipedia.org/wiki/Figure%E2%80%93ground_%28perception%29>.
Gestalts as Ecological Templates 1059
External local sign appears to be an important rigid ‘template’ that strongly influences human awareness of visual space. We found that virtually all observers commit huge errors (exceeding 100º) when asked to rotate (under remote control) one of two congruent objects in the scene in front of them so that the two were geometrically parallel (Koenderink, van Doorn, de Ridder, and Oomes 2010). We also showed that visual observers make huge mistakes in judging whether a number of people in front of them are arranged in strict military order. Such non-veridical observations are due to the application of a rigid template that fails to implement the optical fact that visual directions fan out from the eye into a half-space.
Linear perspective of pictorial box spaces. Pictorial ‘box spaces’ are renderings of cubicles (Panofski 1927). They were common in the woodcuts of the Middle Ages, but are still in use today. The early renderings are in a free style reminiscent of one-point perspective. Later, true one-point perspective was used, which is very simple in the case of cubes. The front and back faces of the cube are rendered as squares, the image of the back face smaller than that of the front one. Then corresponding vertices are joined (the ‘orthogonals’) so as to define the side faces. The front face is left ‘transparent’, so the cubicle is open to view. In a true linear perspective the orthogonals would be concurrent lines. The construction is so simple that many draftsmen sketch it freehand. The cubicle then acts as a ‘stage’ that the artist may fill with any content. The stage defines the pictorial space; it acts as a scaffold or skeleton for the pictorial structure.
In linear perspective there is a well-defined viewpoint and thus a well-defined angular size of the cube. Given the viewpoint, the ratio of the sizes of the front and back faces is fixed. As you change this ratio, the prediction is that the cubicle will appear either as a thin slab (ratio nearer to unity) or as a deep corridor (ratio larger than the fiducial value). In an experiment we asked observers to adjust the ratio such that their awareness was of a true cubicle (Pont, Nefs, van Doorn, Wijntjes, te Pas, de Ridder, and Koenderink 2012). We did this for a wide range of viewpoints, varying both distance and angular size. The result was clear-cut in that the prediction was not borne out at all. What observers do is set a fixed ratio. They impose a template, even when it is ‘not applicable’.
The result may account for the fact that observers judge wide-angle photographs or telephotographs as ‘distorted’ compared with photographs taken with a ‘normal lens’ (field of view about 40–50º). They do this even when the viewpoint is perspectively ‘correct’. Apparently they apply templates for familiar things, and experience obvious deviations from the template as distortions. That is no doubt why artists ‘correct for distortions’ when depicting wide-angle scenes (Pirenne 1970).
Shape from shading. ‘Shading’ is an important shape cue for visual artists. It has been used from the earliest times on. An interpretation in terms of optics starts in Renaissance art, and becomes a proper (applied) science in the seventeenth and eighteenth centuries. Shading was taught as a discipline in western academies of art till the early twentieth century (Baxandall 1995).
The perception of shading was initially studied with the simplest patterns, designed to isolate the ‘shading cue’ in its purest form. The canonical stimulus has been a circular disk filled with a linear lightness gradient. From an optical analysis one finds that such a pattern can be due to the illumination of a curved surface in infinitely many ways. Assuming a uniform, unidirectional illumination, the possible surfaces would be quadrics: spherical, cylindrical, saddle-shaped, and anything in between. From the phenomenology we know that observers are aware only of spherical patches, though. In order to become aware of a cylinder one needs to change the shape of the patch from circular to square, whereas saddle shapes are never reported. Perhaps surprisingly, an analysis reveals that the prior is biased towards saddle shapes
(about 57 per cent of the area of a Gaussian random surface is saddle-shaped; Koenderink and
van Doorn 2003).
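The ambiguity can be made explicit with a short first-order calculation (my own sketch and notation, not the chapter's). Write the surface patch as a quadric with principal curvatures $\kappa_1, \kappa_2$ and assume Lambertian shading under a uniform, unidirectional illuminant $\mathbf{l} = (l_1, l_2, l_3)$:

```latex
z(x,y) = \tfrac{1}{2}\bigl(\kappa_1 x^2 + \kappa_2 y^2\bigr),
\qquad
\mathbf{n}(x,y) \propto (-\kappa_1 x,\ -\kappa_2 y,\ 1),

I(x,y) \propto \mathbf{n}\cdot\mathbf{l}
\approx l_3 \;-\; l_1\kappa_1\, x \;-\; l_2\kappa_2\, y .
```

A prescribed linear gradient constrains only the products $l_1\kappa_1$ and $l_2\kappa_2$, so with suitably oriented axes a spherical patch ($\kappa_1\kappa_2 > 0$), a cylinder ($\kappa_2 = 0$), and a saddle ($\kappa_1\kappa_2 < 0$) are all consistent with the same gradient-filled disk.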
Apparently human visual observers apply templates that do not include a saddle shape. This may be due to a general disregard of saddle shapes. For instance, Alberti, writing in the fifteenth century, proposed a ‘complete’ catalogue of shapes that lacks saddles (Alberti 1435). Apparently, they never occurred to this highly educated intellectual. The correct taxonomy came only with Gauss in the nineteenth century (Gauss 1828).
An interpretation might be that spheres and cylinders are ‘thing-like’ whereas saddle shapes cannot be (you can’t have an object bounded by a saddle-like surface throughout). Thus the template might be biased towards ‘things’, that is to say, volumetric objects of manipulable size.
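The saddle-dominance statistic mentioned above can be checked numerically. The sketch below is my own: the filtered-noise surface model and the parameter values are illustrative assumptions, not taken from the cited analysis. It classifies points of a random smooth surface by the sign of the Hessian determinant (which shares its sign with the Gaussian curvature):

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 256, 6.0  # grid size and smoothing scale in pixels (my choices)

# Isotropic Gaussian random surface: white noise low-pass filtered with a
# Gaussian kernel, applied in the Fourier domain (periodic boundaries).
noise = rng.standard_normal((n, n))
f = np.fft.fftfreq(n)  # spatial frequencies in cycles per pixel
filt = np.exp(-2.0 * (np.pi * sigma) ** 2 * (f[:, None] ** 2 + f[None, :] ** 2))
surface = np.fft.ifft2(np.fft.fft2(noise) * filt).real

# Second derivatives by finite differences; the sign of the Hessian
# determinant classifies each point: positive -> elliptic (sphere-like),
# negative -> hyperbolic (saddle-shaped).
zx, zy = np.gradient(surface)
zxx, zxy = np.gradient(zx)
_, zyy = np.gradient(zy)
frac_saddle = float((zxx * zyy - zxy ** 2 < 0).mean())
```

Under these assumptions the estimated saddle fraction should come out close to the roughly 57 per cent cited in the text, with elliptic points making up most of the remainder.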
Conclusion
Human visual awareness is perhaps best characterized as an optical user interface. The elements of the interface are template-like. They have qualities and meanings that derive from their functional role in the interface. Thus, awareness is non-veridical by design. Evolution optimizes biological fitness, rather than physical veridicality. In this, human visual awareness is not unlike the structure of animal vision as described by ethology.
Throughout the paper I have consistently used three principles that appear fundamental to the
understanding of visual awareness (the epithets are mine, and perhaps not entirely fair):
• Sherlock’s principle: The meaning of an answer is in the question, questions derive from a plot.
• Schrödinger’s principle: The occurrence of awareness corresponds to the falsification of an
expectation.
• Von Uexküll’s principle: The form of awareness reflects prior experience. There is no awareness
from ‘revelation’.
Many of the conceptual leads are due to von Uexküll, who has indeed left his mark on various strands of modern biology, psychology, philosophy, semiotics, artificial intelligence and robotics, and so forth.
Can the user interface be changed, or extended, in the course of the life of an individual? The quick answer appears to be ‘No!’, or at least ‘Hardly!’ Invertebrate animals appear to have fixed interfaces, and the majority of vertebrates are not that far ahead. Even primates (including humans) appear to have predominantly fixed interfaces, although these develop over a number of years in the child. The human interface has many traits in common with those of all vertebrates, is still adapted to savannah hunter-gatherer life, and so forth. Yet it appears that the human interface has at least some (very limited) flexibility. Most adjustments to the technological age are on the level of reflective thought and novel sensorimotor and cognitive adaptations. They tend to be in the margin of visual awareness per se, more like a layer of (painfully cognitive) ‘corrections’. Yet it is obvious how novelty might arise. It has to be through the formation of novel functional loops, slowly developing novel ‘functional tones’.
One might wonder why the ‘application of templates’ would lead to awareness at all. At first blush it would seem to run counter to Schrödinger’s principle. But notice that the implementation of the ‘application of a template’ would be the launching of a microgenetic thread that would still have to pass a reality check. A standard template is likely to be violated in such checks, and to be fine-tuned to fit (or be killed). Thus, the templates are more like plots, enabling the system to come to terms with the optical structure impinging upon it. There is no reason to think they would not lead to the falsification of expectations on various levels.
References
Alberti, L. B. (1435). De Pictura. (On Painting, trans. C. Grayson, ed. M. Kemp. Harmondsworth: Penguin,
1972.)
Baxandall, M. (1995). Shadows and Enlightenment (London and New Haven: Yale University Press).
Berkeley, G. (1709). An Essay Towards a New Theory of Vision (Dublin: Pepyat).
Brown, J. W. (1972). Aphasia, Apraxia and Agnosia (Springfield: Charles C. Thomas).
Brown, J. W. (1977). Mind, Brain and Consciousness (New York: Academic Press).
Brown, J. W. (1996). Time, Will and Mental Process (New York: Plenum Press).
Burton, H. E. (1945). ‘The Optics of Euclid’. J Opt Soc Am 35: 357–372.
Cook, F. H. (1977). Hua-Yen Buddhism: The Jewel Net of Indra (University Park and London: Pennsylvania State University Press).
Gauss, C. F. (1828). Disquisitiones generales circa superficies curvas (Gottingae: Typis Dieterichianis).
Geem, Z. W., J. H. Kim, and G. V. Loganathan (2001). ‘A New Heuristic Optimization
Algorithm: Harmony Search’. Simulation 76(2): 60–68.
Gibson, J. J. (1986). The Ecological Approach to Visual Perception, pp. 138–139 (London: Routledge).
Hess, R. (1982). ‘Developmental sensory impairment: Amblyopia or tarachopia’. Human Neurobiology
1: 1–29.
Hoffman, D. (2008). ‘Sensory Experiences as Cryptic Symbols of a Multi-modal User Interface’
[Computer, Felsen, Gehirne und Sterne: Raetselhafte Zeichen einer multimodalen
Benutzerschnittstelle]. In Kunst und Kognition, ed. M. Bauer, F. Liptay, and S. Marschall, 261–279
(Munich: Wilhelm Fink).
Hoffman, D. (2009). ‘The Interface Theory of Perception: Natural Selection Drives True Perception to Swift
Extinction’. In Object Categorization: Computer and Human Vision Perspectives, ed. S. Dickinson,
M. Tarr, A. Leonardis, and B. Schiele, pp. 148–165 (Cambridge: Cambridge University Press).
Kanizsa, G. (1955). ‘Margini quasi-percettivi in campi con stimolazione omogenea’. Rivista di Psicologia
49(1): 7–30.
Koenderink, J. J. and A. J. van Doorn (2003). ‘Shape and shading’. In The Visual Neurosciences, ed.
L. M. Chalupa and J. S. Werner, pp. 1090–1105 (Cambridge, MA: MIT Press).
Koenderink, J. J., A. J. van Doorn, and J. T. Todd (2009). ‘Wide Distribution of External Local Sign in the
Normal Population.’ Psychological Research 73: 14–22.
Koenderink, J. J., A. J. van Doorn, H. de Ridder, and S. Oomes. (2010). ‘Visual rays are parallel.’ Perception
39(9): 1163–1171.
Koenderink, J. J., W. A. Richards, and A. J. van Doorn (2012). ‘Space-time disarray and visual awareness.’
i-Perception 3: 159–165.
Köhler, W. (1920/1955). Die physischen Gestalten in Ruhe und im stationären Zustand. Abridged trans. in A Source Book of Gestalt Psychology, ed. W. D. Ellis, pp. 71–88 (New York: The Humanities Press). (Original work published in 1920.)
Leibniz, G. W. (1991). La Monadologie, ed. E. Boutroux (Paris: LGF).
Lotze, R. H. (1852). Medicinische Psychologie oder Physiologie der Seele (Leipzig: Weidmann’sche
Buchhandlung).
MacIver, M. A. (2009). ‘Neuroethology: From Morphological Computation to Planning’. In The Cambridge
Handbook of Situated Cognition, ed. P. Robbins and M. Aydede, pp. 480–504 (New York: Cambridge
University Press).
Manzotti, R. and P. Moderato (2010). ‘Is neuroscience the forthcoming “mind science”?’ Behaviour and
Philosophy 38(1): 1–28.
Meinong, A. (1899). ‘Über Gegenstände höherer Ordnung und deren Verhältniss zur inneren
Wahrnehmung.’ Zeitschrift für Psychologie und Physiologie der Sinnesorgane 21: 187–272.
Michotte, A. (1946). La perception de la causalité (Louvain: Institut Supérieur de Philosophie).
Minsky, M. (1975). ‘A Framework for Representing Knowledge.’ In The Psychology of Computer Vision, ed. P. H. Winston (New York: McGraw-Hill).
Montag, C., J. Gallinat, and A. Heinz (2008). ‘Theodor Lipps and the Concept of Empathy: 1851–1914.’ Am J Psychiatry 165: 1261.
Nagel, T. (1974). ‘What is it Like to be a Bat?’ The Philosophical Review 83(4) (October): 435–450.
Necker, L. A. (1832). ‘Observations on some Remarkable Optical Phaenomena seen in Switzerland; and on
an Optical Phaenomenon which Occurs on Viewing a Figure of a Crystal or Geometrical Solid.’ London
and Edinburgh Philosophical Magazine and Journal of Science 1(5): 329–337.
Panofski, E. (1927). Die Perspektive als ‘symbolische Form’. Vorträge in der Bibliothek Warburg 1924/1925
(Leipzig: Teubner).
Pirenne, M. H. (1970). Optics, Painting, and Photography (Cambridge: Cambridge University Press).
Poggio, T. (1985). ‘Early Vision: From Computational Structure to Algorithms and Parallel Hardware.’
Computer Vision, Graphics, and Image Processing 31: 139–155.
Pont, S. C., H. T. Nefs, A. J. van Doorn, M. W. A. Wijntjes, S. F. te Pas, H. de Ridder, and J. J. Koenderink
(2012). ‘Depth in Box Spaces.’ Seeing and Perceiving 25(3–4): 339–349.
Richards, W. A. (1982). ‘How to Play 20 Questions with Nature and Win.’ MIT A.I. Memo No. 660
(December).
Rumelhart, D. E. (1980). ‘Schemata: The Building Blocks of Cognition’. In Theoretical Issues in Reading
Comprehension, ed. R. J. Spiro et al., pp. 33–58 (Hillsdale, NJ: Lawrence Erlbaum).
Schopenhauer, A. (1818–1819/1966). The World as Will and Representation [Die Welt als Wille und
Vorstellung], vol. 1; vol. 2 (1844/1966) (New York: Dover Publications).
Schrödinger, E. (1958). Mind and Matter: The Tarner Lectures (Cambridge: Cambridge University Press).
Searle, J. (1983). Intentionality: An Essay in the Philosophy of Mind, vol. 9 (Cambridge: Cambridge
University Press).
Stein, E. (1917). Zum Problem der Einfühlung (Halle an der Saale). Reprinted in Herder Edith-Stein-Gesamtausgabe, vol. 5, ed. A. U. Müller (Freiburg: Herder, 2008).
Turvey, M. T., R. E. Shaw, E. S. Reed, and W. M. Mace (1981). ‘Ecological Laws of Perceiving and Acting: In
Reply to Fodor and Pylyshyn.’ Cognition 9: 237–304.
Twain, M. (1903/1997). ‘Was the World Made for Man?’ Reprinted in J. Carey, Eyewitness to Science, p. 250 (Boston: Harvard University Press).
VanRullen, R. and C. Koch (2003). ‘Is Perception Discrete or Continuous?’ Trends in Cognitive Science
7(5): 207–213.
von Holst, E. and H. Mittelstaedt (1950). ‘The Reafference Principle: Interaction between the Central Nervous System and the Periphery’. In Selected Papers of Erich von Holst, vol. 1: The Behavioural Physiology of Animals and Man, trans. R. Martin, pp. 139–173 (London: Methuen). (From German.)
von Uexküll, J. J. (1926). Theoretical Biology (London: Kegan Paul, Trubner).
von Uexküll, J. J. (2011). A Foray into the Worlds of Animals and Humans, with A Theory of Meaning, trans.
Joseph D. O’Neil, introduction by Dorion Sagan (Minneapolis: University of Minnesota Press).
Index of Names
Note: page numbers in italics refer to figures. References to footnotes are indicated by the suffix ‘n’, followed by the note number, for example 282n6.
Economou, E. and Gilchrist, A. 405
Edelman, G.M. 1039
Edelman, S. and Bülthoff, H.H. 921
Egeth, H.E. and Yantis, S. 971
Eglash, R. 880
Egly, R. 743
Egner, T. and Hirsch, J. 980
Eguilez, V.M. 993
Ehrenfels, C. von 5, 30, 871
Ehrenstein, W. 302, 303
Eidels, A. 962, 963
Eimer, M. 979
Ekroll, V. 399
Ekroll, V. and Faul, F. 427, 455
Elder, J.H. 218, 219, 228
Elder, J.H. and Goldberg, R.M. 197, 212, 213, 214, 214–5, 215, 216
Elder, J.H. and Velisavljević, L. 207–8, 209, 224
Elder, J.H. and Zucker, S.W. 215, 220–1, 378
Elhilali, M. 611
Elliot, J. 436
Elliot, M.A. and Müller, H.J. 720
Ellis, R.R. and Lederman, S.J. 629
Ellis, W.D. 399
Elman, J. 1018
Endler, J.A. 852
Engel, A.K. and Singer, W. 998
Enns, J.T. 135, 136
Enns, J.T. and Rensink, R.A. 971
Eriksen, B.A. and Eriksen, C.W. 980
Ernst, M.O. and Banks, M.S. 516, 657
Ernst, U.A. 197–8
Escera, C. 610
Estrada, F. and Elder, J.H. 226
Evans, K.K. and Treisman, A. 832
Exner, S. 4, 488, 825
Fahrenfort, J.J. 275
Faivre, N. 807
Faivre, N., Berthet, V., and Kouider, S. 807
Falconbridge, M. 809
Fang, F. 721, 805, 812
Fang, F., Boyaci, H., and Kersten, D. 349
Fantoni, C. 298
Fantoni, C. and Gerbino, W. 296, 299
Farah, M.J. 760
Farid, H. and Adelson, E.H. 822
Farroni, T. 696–7
Faul, F. and Ekroll, V. 453, 469
Fechner, G.T. 42, 117
Feldman, J. 222, 937, 941–2, 1013, 1014, 1015, 1034, 1035, 1036
Feldman, J. and Singh, M. 246, 249, 287–8, 939, 976, 1015–6
Fell, J. 998
Felleman, D. and Essen, D.V. 377
Felleman, D. and Van Essen, D.C. 969
Felzenszwalb, P.F. 929
Felzenszwalb, P.F. and Huttenlocher, D.P. 926
Fennema, C.L. and Thompson, W.B. 507
Ferguson, G., Messenger, J., and Budelmann, B. 855–6
Feynman, R. 1039
Ffytche, D.H. and Zeki, S. 309
Field, D.J. 190, 191, 192, 197, 213
Field, D.J., Hayes, A., and Hess, R. 215
Fific, M. and Townsend, J.T. 959–60
de Finetti, B. 1011
Fiorani, M. 975
Fisher, R. 1011n3
Fishman, Y.I. 609
Fitzgibbon, S.P. 996
Fitzpatrick, D. 938, 975
Forkman, B. and Vallortigara, G. 310
Förster, J. and Higgins, E. 722
Foster, R.M. and Franz, V.H. 682
Fox, K. 335
Fox, M.D. and Raichle, M.E. 992
Fox, R. and Check, R. 784
Foxe, J.J. and Simpson, G.V. 977
Foxe, J.J. and Snyder, A.C. 994
Francis, J.E. 876
Franconeri, S. 822
Franz, V.H. 685
Fraser, S. 854
Freeman, E. 748–9
Freeman, E. and Driver, J. 516, 832
Freeman, W.J. 982
Freeman, W.J. and van Dijk, B.W. 982
Freiwald, W.A. 768
Friedman, H.S. 445
Fries, P. 993
Friston, K. 722
Frith, U. 716, 727
Froyen, V. 355, 356
Fry, G.A. and Alpern, M. 403
Fu, K.-S. 922
Fujimoto, K. and Yagi, A. 582, 583
Fujisaki, W. 829
Fujisaki, W. and Nishida, S. 645–6
Fulvio, J.M. and Singh, M. 249–50
Fulvio, J.M., Singh, M., and Maloney, L.T. 241–2
Gabo, N., Constructed Head No. 2 906
Gaillard, R. 999
Gamboni, D. 915
Ganel, T. 676, 677, 681–2
Ganel, T. and Goodale, M.A. 678
Gao, Z. 765
Garner, W.R. 99, 766, 953–5, 962, 963, 980–1, 1029, 1031
Gasper, K. and Clore, G.L. 722
Gauss, C.F. 1060
Geem, Z.W., Kim, J.H., and Loganathan, G.V. 1056
Geisler, W.S. 197, 213, 216, 804, 1013
Geisler, W.S. and Diehl, R.L. 1022
Geisler, W.S. and Perry, J.S. 215
Gelb, A. 11, 394, 397, 398, 458
Geman, S. 922
Gentaz, E. 634
Gepshtein, S. and Kubovy, M. 72–3, 74, 76, 972
Gerbino, W. 427
Gerbino, W. and Salmaso, D. 299, 300
Ghim, H.R. 695
Ghim, H.R. and Eimas, P.D. 694–5
Gibson, B.S. 289
Gibson, J.J. 16, 167, 396, 625, 626, 872, 972, 1052–3
Giese, M.A. 579
Giese, M.A. and Poggio, T. 586, 588
Gilaie-Dotan, S. 975
Gilbert, A. 444
Gilbert, C.D. 1039
Gilbert, G.M. 646
Gilchrist, A. 391, 394, 399, 400, 402, 407–8, 448, 455, 470, 938
Gilchrist, I. 215
Gillam, B.J. 305, 306
Gillam, B.J. and Grove, P.M. 266, 286
Gillam, B.J. and Nakayama, K. 809
Gintautas, V. 224
Giralt, N. and Bloom, P. 283
Girshick, A.R., Landy, M.S., and Simoncelli, E.P. 156
Glass, L. 114
Glass, L. and Switkes, E. 215
Glover, S. and Dixon, P. 676
Glynn, A.J. 294n2
Godfrey, D., Lythgoe, J.N., and Rumball, D.A. 857
Goethe, J.W. 5
Goffaux, V. 768
Gogel, W.C. and Mershon, D.H. 403
Goldberg, R. 864
Goldberger, P. 881
Goldmeier, E. 11
Goldreich, D. and Peterson, M.A. 265
Goldsmith, M. and Yeari, M. 749
Goldstein, K. and Gelb, A. 443
Gombrich, E.H. 880
Gong, P. 993, 994, 997
Gong, P. and van Leeuwen, C. 982, 991, 993
Gonzalez, C.L.R. 676–7
Goodale, M.A. and Milner, A.D. 672, 972
Goodbourn, P.T. 723–4
Goodman, N.D. 935
Goodwin, A.W. 623
Gordon, I.A. and Morrison, V. 623
Gordon, J. and Shapley, R. 364
Goryo, K. 788
Gottschaldt, K. 9, 10, 14, 15
Graf, E.W. 516
Graham, D.J. and Field, D.J. 875
Granit, R. 9
Grassmann, H. 438–9
Gray, C.M. 998, 1039
Gray, K.L. 789
Green, D.M. and Swets, J.A. 955
Gregory, R. 307, 674, 811, 1018
Grelling, K. 11
Griffiths, T.D. and Warren, J.D. 603–4
Grinter, E.J. 724, 727
Grosof, D.H. 975
Gross, J. 996
Grossberg, S. 328
Grosseteste, R. 436–7
Grossman, E.D. 583
Gutschalk, A. 610
Hafed, Z.M. and Krauzlis, R.J. 515
Haffenden, A.M. and Goodale, M.A. 685–6
Häkkinen, J. and Nyman, G. 809
Halko, M.A. 304
Hall, J.R. 856
Halliday, A. and Mingay, R. 829
Hamers, J.F. and Lambert, W.E. 980
Han, S. 72, 79, 142, 749, 977–978
Han, S. and Humphreys, G. 748
Han, X. 383
Hanlon, R.T. 848
Hanlon, R.T. and Messenger, J.B. 850
Hansmeyer, M. 875
Happé, F.G. 727
Harbisson, N. 666
Harding, G., Harris, J.H., and Bloj, M. 456
Harnad, S. 935
Harrar, V. 646–7
Harrar, V. and Harris, L.R. 642–3n4, 829
Harris, A. and Aguirre, G.K. 768
Harris, J.J. 809
Harris, L.R. 825
Harrison, S. and Feldman, J. 938
Hartline, H.K. 363
Hassenstein, B. and Reichardt, W. 489
Hatfield, G. and Epstein, W. 1018
Hayden, A. 701, 703, 705–6
Haynes, J.-D. and Rees, G. 800, 808
Haynes, J.-D., Driver, J., and Rees, G. 806
He, D., Kersten, D., and Fang, F. 805
He, Y. 991
Heath, M. 682
Heath Robinson, W. 864
Hebb, D.O. 692
Heeger, D.J. and Bergen, J.R. 174, 176
Heider, F. 15
Heider, F. and Simmel, M. 872
Helmholtz, H. von 295, 392, 395, 402, 415, 632, 786, 1008, 1029
Henshilwood, C.S. 880
Hering, E. 24, 27, 393, 395, 396, 398, 400, 786
Hernandez, A. 995
Heron, J. 832
Hess, C.V. 825
Hess, R. 1058
Hess, R.F. and Dakin, S.C. 198
Hess, R.F. and Field, D.J. 194
Hesselmann, G. 974, 992
Hildebrand, A. 869, 872
Hillebrand, F. 632
Hillier, B. and Hanson, J. 879
Hillyard, S.A. 974
Hiris, E. 583
Hochberg, J. and Hardy, D. 214
Hochberg, J. and McAlister, E. 81, 1018, 1028, 1029
Hochstein, S. and Ahissar, M. 143, 973
Hock, H.S. and Nichols, D.F. 561, 564–5, 570
Hoffman, D.D. 1056
Hoffman, D.D. and Richards, W.A. 243
Hoffman, D.D. and Singh, M. 262
Hohmuth, A. 627
Holcombe, A.O. and Cavanagh, P. 823
Holcombe, A.O. and Clifford, C.W. 820–1
Holcombe, A.O., Kanwisher, N., and Treisman, A. 821
van Leeuwen, C. 982, 991, 996
van Leeuwen, C. and Bakker, L. 981, 996
van Leeuwen, C. and Smit, D.J.A. 994
van Lier, R. 298, 304–5, 989
van Lier, R. and De Weert, C.M.M. 779
van Lier, R. and Wagemans, J. 298
van Lier, R.J., van der Helm, P.A., and Leeuwenberg, E.L.J. 1034–5
Van Loon, A.M. 714
van Noorden, L.P.A.S. 605–6
van Polanen, V. 622
Vanrie, J. 578
VanRullen, R. and Koch, C. 1046
Van Tonder, G.J. 877, 878
Van Tonder, G.J. and Lyons, M.J. 867
van Wassenhove, V. 997
Varela, F. 997, 998
Vecera, S.P. 264, 268, 989
Vecera, S.P. and Farah, M.J. 270
Vecera, S.P. and O’Reilly, R.C. 275, 350, 351
Vecera, S.P. and Palmer, S.E. 264
Vickery, T.J. 71
Vickery, T.J. and Jiang, Y.V. 75–76
Vierling-Claassen, D. 995, 996
Vischer, R. 872
Vladusich, T. 480
Vogels, I.M.L.C. 625–6
Vogels, R. 937
von der Heydt, R. 309, 343, 356, 366
von der Heydt, R. and Peterhans, E. 975
von der Malsburg, C. 989, 993
von Frisch, K. 1050–1
von Hildebrand, A. 902
von Holst, E. and Mittelstaedt, H. 1053
von Skramlik, E. 632
von Stein, A. 995
von Uexküll, J. 1050n16, 1051, 1052–4, 1060
Vrins, S. 301–2
Vroomen, J. and Keetels, M. 828
Vuilleumier, P. 747, 807
Wagemans, J. 15, 16, 21, 48, 61, 88, 89, 108, 110, 111, 114, 118, 119, 120, 121, 129, 139, 169, 195, 262, 294, 298, 364, 398, 488, 530, 569, 602, 607, 639, 691, 714, 717, 723, 871, 936, 937, 938
Walker, P. 788
Wallace, M.T., Wilkinson, L.K., and Stein, B.E. 830
Wallach, H. 13, 392–3, 428, 504, 511, 512, 530–1, 547
  biographical notes 10
Wallach, H. and O’Connell, D.N. 531
Wandell, B.A., Dunmoulin, S.O., and Brewer, A.A. 809
Wang, B. 283
Wang, J. 928
Wang, L. and Jiang, J. 582
Wang, L., Weng, X., and He, S. 809
Wang, S., Wang, Y., and Zhu, S.-C. 926
Wanning, A. 979
Ward, J. and Meijer, P. 658–9
Ward, R. 741
Watanabe, K. and Shimojo, S. 643
Watkins, S. 813
Watt, R. 197
Watt, R.J. and Phillips, W.A. 561n1
Watts, D. and Strogatz, S. 991
Weber, E.H. 42, 117, 629, 680
de Weert, C.M.M. and van Kruysbergen, N.A.W.H. 449
Weil, R.S. 806
Weiss, Y. 1013
Wenger, M.J. and Ingvalson, E.M. 961
Wenger, M.J. and Townsend, J.T. 959
Werner, H. 872
Wertheimer, M. 3–5, 6, 488, 871, 1028
  ‘Gestalt laws’ 9–10
  good continuation principle 239
  on perceptual grouping 57, 60, 61–2, 66, 76, 79–80, 560, 562–3
  on transparency 417
  on wholes and parts 29–30
Westland, S. and Ripamonti, C. 452–3
Westwood, D.A. and Goodale, M.A. 676
Weyl, H. 880
Wheatstone, C. 777
White, A.L., Linares, D., and Holcombe, A.O. 825, 826
White, M. 405
White, S.J. and Saldaña, D. 729
Whittle, P. 778, 779
Wijntjes, M.W.A. 625, 631
Wilder, J. 1016
Wilder, J., Feldman, J., and Singh, M. 251–2
Williams, C.B. and Hess, R.F. 199
Williams, K. 337
Williams, M.A. 807
Wilson, H.R. 507, 783
Wilson, J.A. and Anstis, S.M. 825
Windmann, S. 349
Winkler, I. 603, 603–4, 607, 608, 610–1, 612
Winkler, I. and Cowan, N. 603
Witkin, H. 716, 724
Wittman, M. 821
Witzel, C. and Gegenfurtner, K.R. 444
Wohlschlager, A. 515
Wokke, M.E. 810
Wolfe, J.M. 804, 970
Wolfe, J.M. and Cave, K.R. 972
Wolfe, J.M. and Horowitz, T.S. 89, 103
Wolff, W. 394
Wolfson, S.S. and Landy, M.S. 323
Wolpert, D.M. 586, 587
Wong, Y.K. and Gauthier, I. 765
Wood, G., American Gothic 909, 908
Wouterlood, D. and Boselie, F. 296, 298
Wrobel, A. 996
Wu, T. and Zhu, S.-C. 921
Wuerger, S.M., Maloney, L.T., and Krauskopf, J. 437–8
Wulf, F. 9
Xian, S.X. and Shevell, S.K. 449
Yabe, H. 611
Yabe, Y. 516
Yamada, T. and Fujisaka, H. 993
Yanagi, S. 865
Yang, E. 726, 789
Yang, E. and Blake, R. 804, 807
Yang, E., Zald, D.H., and Blake, R. 807
Yang, Z.Y. and Purves, D. 1036
Yao, R. 645
Yarbus, A.L. 397
Yarrow, K. 825
Yazdanbakhsh, A. and Livingstone, M.S. 358
Yen, S.-C. and Finkel, L.H. 197
Yin, C. 299
Yin, R.K. 758
Yo, C. and Wilson, H.R. 508
Yokoi, I. and Komatsu, H. 972
Yong, E. 729
Young, A.W. 742, 760
Young, M.P. and Yamane, S. 970
Young, T. 436
Yovel, G. and Duchaine, B. 763
Zaidi, Q. and Li, A. 447
Zanforlin, M. 525–7
Zangenehpour, S. and Zatorre, R.J. 663
Zaretskaya, N. 721, 791, 803, 805
Zemel, R.S. 76
Zhang, N. and von der Heydt, R. 938
Zhaoping, L. 355
Zhou, H. 267, 328, 345, 977
Zhou, H., Friedmann, H., and von der Heydt, R. 366, 367
Zhou, K. 283
Zhou, W. 790
Zhou, W. and Chen, D. 802
Zhu, S.-C. 922, 928, 929
Zhu, S.-C. and Mumford, D. 922
Zimba, L.D. and Blake, R. 807
Zipf, G.K. 875
Zipser, K. 974
Zuckerman, C.B. and Rock, I. 691–2
Zuidhoek, S. 633
Zylinski, S. 310, 851
Zylinski, S., Osorio, D., and Shohet, A.J. 850, 856
Subject Index
Note: page numbers in italics refer to figures. References to footnotes are indicated by the suffix ‘n’,
followed by the note number, for example 267n4.